STS relies on the Hive execution engine. My Hive uses the Spark execution
engine, so STS passes the SQL to Hive, lets it do the work, and returns the
result set.

 which beeline
/usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p xxxxxxxx
Connecting to jdbc:hive2://rhes564:10055
Connected to: Spark SQL (version 2.0.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://rhes564:10055>

jdbc:hive2://rhes564:10055> select count(1) from test.prices;
OK, I ran a simple query in STS. You will see this in hive.log:

2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database: test
2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database: test
2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test tbl=prices
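Those HiveMetaStore.audit lines are easy to summarise. A minimal sketch that counts audit calls per metastore operation — sample lines from this thread are inlined so the sketch is self-contained; in practice you would point it at your real hive.log instead:

```shell
# Tally HiveMetaStore audit calls by operation (get_database, get_table, ...).
log=$(mktemp)
cat > "$log" <<'EOF'
2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
EOF
# Extract each metastore operation name and count occurrences.
counts=$(grep -o 'get_[a-z_]*' "$log" | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
rm -f "$log"
```

Each query in beeline should produce a burst of these entries, confirming the call reached the Hive metastore.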

I think it is a good idea to switch to the Spark engine (as opposed to MR). My
tests showed that Hive on Spark, with its DAG execution and in-memory
processing, runs at least an order of magnitude faster than MapReduce.
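For anyone who wants to try the switch, the engine is an ordinary Hive setting. A minimal sketch, assuming HiveServer2 on its default port 10000 and reusing this thread's example user and table; it also assumes your Hive build has the Spark client libraries on its classpath:

```shell
# Per-session switch to the Spark engine; setting
# hive.execution.engine=spark in hive-site.xml makes it permanent.
# Host, port, user and password here are illustrative.
${HIVE_HOME}/bin/beeline -u jdbc:hive2://rhes564:10000 -n hduser -p xxxxxxxx \
  -e 'set hive.execution.engine=spark; select count(1) from test.prices;'
```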

You can connect with either the beeline client from $HIVE_HOME/... or the one
from $SPARK_HOME.
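Both are the same Beeline JDBC client shipped with different distributions, and both speak the same Thrift protocol, so either invocation below should behave identically (paths and credentials are this thread's example values; adjust to your installation):

```shell
# Two equivalent ways to reach the same Thrift endpoint.
$HIVE_HOME/bin/beeline  -u jdbc:hive2://rhes564:10055 -n hduser -p xxxxxxxx
$SPARK_HOME/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p xxxxxxxx
```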

HTH




Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 13 September 2016 at 23:28, Benjamin Kim <bbuil...@gmail.com> wrote:

> Mich,
>
> It sounds like there would be no harm in changing, then. Are you
> saying that STS would still use MapReduce to run the SQL statements?
> What our users are doing in our CDH 5.7.2 installation is changing the
> execution engine to Spark when connected to HiveServer2 to get faster
> results. Would they still have to do this using STS? Lastly, we are seeing
> zombie YARN jobs left behind even after a user disconnects. Are you seeing
> this happen with STS? If not, then this would be even better.
>
> Thanks for your fast reply.
>
> Cheers,
> Ben
>
> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi,
>
> Spark Thrift server (STS) still uses the Hive Thrift server. If you look at
> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2):
>
> function usage {
>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
>   pattern="usage"
>   pattern+="\|Spark assembly has been built with Hive"
>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>   pattern+="\|Spark Command: "
>   pattern+="\|======="
>   pattern+="\|--help"
>
>
> Indeed, when you start STS, you pass the hiveconf parameter to it:
>
> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>                 --master  \
>                 --hiveconf hive.server2.thrift.port=10055 \
>
> and STS bypasses the Spark optimizer and uses the Hive optimizer and
> execution engine. You will see this in the hive.log file.
>
> So I don't think it is going to make much difference, unless they have
> recently changed the design of STS.
>
> HTH
>
>
>
>
> On 13 September 2016 at 22:32, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Does anyone have any thoughts about using Spark SQL Thriftserver in Spark
>> 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for
>> it. Some advice and gotcha’s would be nice to know.
>>
>> Thanks,
>> Ben
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
