Hi all,

Spark STS just uses HiveContext internally and does not use MR. That said, Spark STS is missing some HiveServer2 functionality, such as HA (see: https://issues.apache.org/jira/browse/SPARK-11100), and has some known issues. So you'd be better off reviewing all the JIRA issues related to STS before considering the replacement.
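[For context: STS speaks the ordinary HiveServer2 Thrift/JDBC protocol, which is why beeline can connect to it unchanged. A minimal sketch of assembling the connection URL and beeline invocation seen later in this thread; the helper names `sts_jdbc_url` and `beeline_command` and the `/default` database path are my own illustrative assumptions, not from the thread — only the `jdbc:hive2://` scheme and the rhes564:10055 endpoint appear in the messages below.]

```python
def sts_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL for a Spark Thrift Server endpoint.

    10000 is the conventional HiveServer2 default port; the thread uses 10055.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str) -> list:
    """Assemble the beeline argument list used to open that URL."""
    return ["beeline", "-u", url, "-n", user]


if __name__ == "__main__":
    url = sts_jdbc_url("rhes564", 10055)
    print(url)  # jdbc:hive2://rhes564:10055/default
    print(" ".join(beeline_command(url, "hduser")))
```

Because the wire protocol is the same, existing HiveServer2 JDBC/ODBC clients generally need only the URL changed to point at the STS port.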
// maropu

On Wed, Sep 14, 2016 at 8:55 AM, ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> AFAIK STS uses Spark SQL and not MapReduce. Is that not correct?
>
> Best
> Ayan
>
> On Wed, Sep 14, 2016 at 8:51 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> STS will rely on the Hive execution engine. My Hive uses the Spark
>> execution engine, so STS will pass the SQL to Hive, let it do the work
>> and return the result set.
>>
>> which beeline
>> /usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
>>
>> ${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p xxxxxxxx
>> Connecting to jdbc:hive2://rhes564:10055
>> Connected to: Spark SQL (version 2.0.0)
>> Driver: Hive JDBC (version 1.2.1.spark2)
>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>> Beeline version 1.2.1.spark2 by Apache Hive
>> 0: jdbc:hive2://rhes564:10055>
>>
>> jdbc:hive2://rhes564:10055> select count(1) from test.prices;
>>
>> OK, I ran a simple query in STS. You will see this in hive.log:
>>
>> 2016-09-13T23:44:50,996 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:50,996 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:50,998 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:50,998 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,007 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,007 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,021 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:51,021 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:51,023 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,023 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,029 INFO [pool-4-thread-4]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table : db=test tbl=prices
>> 2016-09-13T23:44:51,029 INFO [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices
>>
>> I think it is a good idea to switch to the Spark engine (as opposed to MR).
>> My tests showed that Hive on Spark, with its DAG and in-memory execution,
>> runs at least an order of magnitude faster than MapReduce.
>>
>> You can connect with either the beeline under $HIVE_HOME or the beeline
>> under $SPARK_HOME.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On 13 September 2016 at 23:28, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Mich,
>>>
>>> It sounds like there would be no harm in changing, then. Are you
>>> saying that using STS would still use MapReduce to run the SQL statements?
>>> What our users are doing in our CDH 5.7.2 installation is changing the
>>> execution engine to Spark when connected to HiveServer2 to get faster
>>> results. Would they still have to do this using STS? Lastly, we are seeing
>>> zombie YARN jobs left behind even after a user disconnects. Are you seeing
>>> this happen with STS? If not, then this would be even better.
>>>
>>> Thanks for your fast reply.
>>>
>>> Cheers,
>>> Ben
>>>
>>> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> The Spark Thrift server (STS) still uses the Hive thrift server. If you
>>> look at $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2):
>>>
>>> function usage {
>>>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
>>>   pattern="usage"
>>>   pattern+="\|Spark assembly has been built with Hive"
>>>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>>>   pattern+="\|Spark Command: "
>>>   pattern+="\|======="
>>>   pattern+="\|--help"
>>>
>>> Indeed, when you start STS you pass hiveconf parameters to it:
>>>
>>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>>   --master \
>>>   --hiveconf hive.server2.thrift.port=10055 \
>>>
>>> and STS bypasses the Spark optimiser and uses the Hive optimiser and
>>> execution engine. You will see this in the hive.log file.
>>>
>>> So I don't think it is going to make much difference, unless they
>>> have recently changed the design of STS.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 13 September 2016 at 22:32, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>>> Does anyone have any thoughts about using the Spark SQL Thrift server in
>>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>>> HiveServer2 for it. Some advice and gotchas would be nice to know.
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

> --
> Best Regards,
> Ayan Guha

--
---
Takeshi Yamamuro
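
[Pulling together the commands scattered through the thread, an ops sketch of starting STS on a non-default port and connecting to it. The port, user, and beeline flags are taken from Mich's messages; the `--master` value was elided in the original and is left out here rather than guessed.]

```shell
# Start the Spark Thrift Server on port 10055 (default would be 10000).
# Pass Hive settings via --hiveconf, as shown in the thread.
${SPARK_HOME}/sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10055

# Connect with the beeline shipped with Spark; any HiveServer2-compatible
# JDBC client can use the same jdbc:hive2:// URL.
${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p xxxxxxxx
```

Stopping the server symmetrically uses `${SPARK_HOME}/sbin/stop-thriftserver.sh`.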