Hi

AFAIK STS uses Spark SQL and not MapReduce. Is that not correct?

Best
Ayan

On Wed, Sep 14, 2016 at 8:51 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> STS will rely on the Hive execution engine. My Hive uses the Spark execution
> engine, so STS will pass the SQL to Hive, let it do the work, and return
> the result set.
>
>  which beeline
> /usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
> ${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
> xxxxxxxx
> Connecting to jdbc:hive2://rhes564:10055
> Connected to: Spark SQL (version 2.0.0)
> Driver: Hive JDBC (version 1.2.1.spark2)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1.spark2 by Apache Hive
> 0: jdbc:hive2://rhes564:10055>
>
> 0: jdbc:hive2://rhes564:10055> select count(1) from test.prices;
> OK, I ran a simple query in STS. You will see this in hive.log:
>
> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
> get_database: test
> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
> get_database: test
> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
>
> I think it is a good idea to switch to the Spark engine (as opposed to MR).
> My tests showed that Hive on Spark, with its DAG and in-memory execution,
> runs at least an order of magnitude faster than MapReduce.
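> As a hedged sketch (illustrative only, not the exact config from my
> cluster): hive.execution.engine is the standard Hive property for this
> switch, and it can be set per session or permanently in hive-site.xml.
> The variable and the generated XML below are just for illustration:

```shell
# Per-session switch (run inside the hive CLI or a beeline session):
#   SET hive.execution.engine=spark;
# Permanent switch: add this property to hive-site.xml.
ENGINE="spark"   # other known values: mr (legacy default), tez, spark
cat <<EOF
<property>
  <name>hive.execution.engine</name>
  <value>${ENGINE}</value>
</property>
EOF
```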
>
> You can connect with either the beeline under $HIVE_HOME/... or the beeline
> under $SPARK_HOME.
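> For illustration only (paths assumed from a typical install; host, port
> and user are the ones shown earlier in this thread), the two clients speak
> the same HiveServer2 Thrift/JDBC protocol and take the same URL:

```shell
# Either beeline binary can be pointed at the STS endpoint.
JDBC_URL="jdbc:hive2://rhes564:10055"
HIVE_CLIENT="${HIVE_HOME}/bin/beeline"    # Hive's own beeline
SPARK_CLIENT="${SPARK_HOME}/bin/beeline"  # beeline bundled with Spark
# Example invocation (built and shown, not executed here):
echo "${SPARK_CLIENT} -u ${JDBC_URL} -n hduser"
```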
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 13 September 2016 at 23:28, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Mich,
>>
>> It sounds like there would be no harm in changing, then. Are you saying
>> that STS would still use MapReduce to run the SQL statements?
>> What our users are doing in our CDH 5.7.2 installation is changing the
>> execution engine to Spark when connected to HiveServer2 to get faster
>> results. Would they still have to do this using STS? Lastly, we are seeing
>> zombie YARN jobs left behind even after a user disconnects. Are you seeing
>> this happen with STS? If not, then this would be even better.
>>
>> Thanks for your fast reply.
>>
>> Cheers,
>> Ben
>>
>> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> Spark Thrift Server (STS) still uses the Hive Thrift server. If you look at
>> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2):
>>
>> function usage {
>>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
>>   pattern="usage"
>>   pattern+="\|Spark assembly has been built with Hive"
>>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>>   pattern+="\|Spark Command: "
>>   pattern+="\|======="
>>   pattern+="\|--help"
>>
>>
>> Indeed, when you start STS, you pass hiveconf parameters to it:
>>
>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>                 --master  \
>>                 --hiveconf hive.server2.thrift.port=10055 \
>>
>> and STS bypasses the Spark optimiser, using the Hive optimiser and
>> execution engine instead. You will see this in the hive.log file.
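>> As a quick sanity check (a sketch; the endpoint is the one from this
>> thread, and no live server is assumed), you can ask the server which
>> engine Hive will use — SET with no value prints the current setting:

```shell
# "SET hive.execution.engine;" is a standard Hive command that echoes
# the current value. We only build the command string here rather than
# run it, since the server in this thread is not reachable.
CHECK_CMD='beeline -u jdbc:hive2://rhes564:10055 -n hduser -e "SET hive.execution.engine;"'
echo "$CHECK_CMD"
```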
>>
>> So I don't think it is going to make much of a difference, unless they
>> have recently changed the design of STS.
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>> On 13 September 2016 at 22:32, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Does anyone have any thoughts about using the Spark SQL Thriftserver in
>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>> HiveServer2 for it. Some advice and gotchas would be nice to know.
>>>
>>> Thanks,
>>> Ben
>>>
>>
>>
>


-- 
Best Regards,
Ayan Guha
