@Xiao, it is tracked in SPARK-15345 <https://issues.apache.org/jira/browse/SPARK-15345>
On Fri, May 20, 2016 at 4:20 AM, Xiao Li <gatorsm...@gmail.com> wrote:

> -1
>
> Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and
> SparkSession. Both failed. It always uses in-memory catalog. Anybody else
> hit the same issue?
>
> Method 1: SparkSession
>
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", line 494, in sql
>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 57, in deco
>     return f(*a, **kw)
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
>   at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:211)
>   at java.lang.Thread.run(Thread.java:745)
>
> Method 2: Using HiveContext
>
> >>> from pyspark.sql import HiveContext
> >>> sqlContext = HiveContext(sc)
> >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", line 346, in sql
>     return self.sparkSession.sql(sqlQuery)
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", line 494, in sql
>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 57, in deco
>     return f(*a, **kw)
>   File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
>   at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:211)
>   at java.lang.Thread.run(Thread.java:745)
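
(A minimal, unverified sketch for narrowing this down: read back which catalog
implementation the session actually picked up. The key below,
spark.sql.catalogImplementation, is the internal setting that
enableHiveSupport() is supposed to switch from "in-memory" to "hive"; the app
name and the exact readability of the key through the runtime conf are
assumptions, and this is not a confirmed workaround for the bug above.)

from pyspark.sql import SparkSession

# Request Hive support explicitly when building the session.
spark = (SparkSession.builder
         .appName("hive-catalog-check")   # illustrative name
         .enableHiveSupport()
         .getOrCreate())

# Read the setting back; "in-memory" here reproduces the behaviour reported
# above, "hive" means the Hive metastore catalog is actually in use.
print(spark.conf.get("spark.sql.catalogImplementation", "in-memory"))

# A second sanity check via the catalog API: a Hive-backed catalog should show
# the table from a fresh session against the same metastore, an in-memory
# catalog will not.
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
print(spark.catalog.listTables())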
> 2016-05-19 12:49 GMT-07:00 Herman van Hövell tot Westerflier <hvanhov...@questtec.nl>:
>
>> +1
>>
>> 2016-05-19 18:20 GMT+02:00 Xiangrui Meng <m...@databricks.com>:
>>
>>> +1
>>>
>>> On Thu, May 19, 2016 at 9:18 AM Joseph Bradley <jos...@databricks.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, May 18, 2016 at 10:49 AM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> Hi Ovidiu-Cristian,
>>>>>
>>>>> The best source of truth is to change the filter to target version
>>>>> 2.1.0. Not a lot of tickets have been targeted yet, but I'd imagine that
>>>>> as we get closer to the 2.0 release, more will be retargeted at 2.1.0.
>>>>>
>>>>> On Wed, May 18, 2016 at 10:43 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>
>>>>>> Yes, I can filter. I did that, and for example:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/SPARK-15370?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20%3D%202.0.0
>>>>>>
>>>>>> To rephrase: for 2.0, do you have specific issues that are not a
>>>>>> priority and will maybe be released with 2.1, for example?
>>>>>>
>>>>>> Keep up the good work!
>>>>>>
>>>>>> On 18 May 2016, at 18:19, Reynold Xin <r...@databricks.com> wrote:
>>>>>>
>>>>>> You can find that by changing the filter to target version = 2.0.0.
>>>>>> Cheers.
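
(For reference, the same filter can also be run outside the web UI. Below is a
small illustrative Python sketch against JIRA's public REST search endpoint;
the JQL string and the "Target Version/s" field name are assumptions based on
the links above, and the requests package is assumed to be installed.)

import requests

# JQL mirroring the filter discussed above (assumed field name for the
# target version used by Spark's JIRA).
jql = 'project = SPARK AND resolution = Unresolved AND "Target Version/s" = 2.0.0'

resp = requests.get(
    "https://issues.apache.org/jira/rest/api/2/search",
    params={"jql": jql, "fields": "key,summary", "maxResults": 50},
)
resp.raise_for_status()

# Print the key and summary of each unresolved ticket still targeted at 2.0.0.
for issue in resp.json()["issues"]:
    print(issue["key"], "-", issue["fields"]["summary"])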
>>>>>>
>>>>>> On Wed, May 18, 2016 at 9:00 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>>
>>>>>>> +1 Great. I see the list of resolved issues; do you have a list of
>>>>>>> known issues you plan to keep open for this release?
>>>>>>>
>>>>>>> Built with
>>>>>>> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -DskipTests clean package
>>>>>>>
>>>>>>> mvn -version
>>>>>>> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T17:41:47+01:00)
>>>>>>> Maven home: /Users/omarcu/tools/apache-maven-3.3.9
>>>>>>> Java version: 1.7.0_80, vendor: Oracle Corporation
>>>>>>> Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre
>>>>>>> Default locale: en_US, platform encoding: UTF-8
>>>>>>> OS name: "mac os x", version: "10.11.5", arch: "x86_64", family: "mac"
>>>>>>>
>>>>>>> [INFO] Reactor Summary:
>>>>>>> [INFO]
>>>>>>> [INFO] Spark Project Parent POM ........................... SUCCESS [  2.635 s]
>>>>>>> [INFO] Spark Project Tags ................................. SUCCESS [  1.896 s]
>>>>>>> [INFO] Spark Project Sketch ............................... SUCCESS [  2.560 s]
>>>>>>> [INFO] Spark Project Networking ........................... SUCCESS [  6.533 s]
>>>>>>> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.176 s]
>>>>>>> [INFO] Spark Project Unsafe ............................... SUCCESS [  4.809 s]
>>>>>>> [INFO] Spark Project Launcher ............................. SUCCESS [  6.242 s]
>>>>>>> [INFO] Spark Project Core ................................. SUCCESS [01:20 min]
>>>>>>> [INFO] Spark Project GraphX ............................... SUCCESS [  9.148 s]
>>>>>>> [INFO] Spark Project Streaming ............................ SUCCESS [ 22.760 s]
>>>>>>> [INFO] Spark Project Catalyst ............................. SUCCESS [ 50.783 s]
>>>>>>> [INFO] Spark Project SQL .................................. SUCCESS [01:05 min]
>>>>>>> [INFO] Spark Project ML Local Library ..................... SUCCESS [  4.281 s]
>>>>>>> [INFO] Spark Project ML Library ........................... SUCCESS [ 54.537 s]
>>>>>>> [INFO] Spark Project Tools ................................ SUCCESS [  0.747 s]
>>>>>>> [INFO] Spark Project Hive ................................. SUCCESS [ 33.032 s]
>>>>>>> [INFO] Spark Project HiveContext Compatibility ............ SUCCESS [  3.198 s]
>>>>>>> [INFO] Spark Project REPL ................................. SUCCESS [  3.573 s]
>>>>>>> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  4.617 s]
>>>>>>> [INFO] Spark Project YARN ................................. SUCCESS [  7.321 s]
>>>>>>> [INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 16.496 s]
>>>>>>> [INFO] Spark Project Assembly ............................. SUCCESS [  2.300 s]
>>>>>>> [INFO] Spark Project External Flume Sink .................. SUCCESS [  4.219 s]
>>>>>>> [INFO] Spark Project External Flume ....................... SUCCESS [  6.987 s]
>>>>>>> [INFO] Spark Project External Flume Assembly .............. SUCCESS [  1.465 s]
>>>>>>> [INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [  6.891 s]
>>>>>>> [INFO] Spark Project Examples ............................. SUCCESS [ 13.465 s]
>>>>>>> [INFO] Spark Project External Kafka Assembly .............. SUCCESS [  2.815 s]
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] BUILD SUCCESS
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] Total time: 07:04 min
>>>>>>> [INFO] Finished at: 2016-05-18T17:55:33+02:00
>>>>>>> [INFO] Final Memory: 90M/824M
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>
>>>>>>> On 18 May 2016, at 16:28, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>
>>>>>>> I think it's a good idea. Although releases have been preceded before
>>>>>>> by release candidates for developers, it would be good to get a formal
>>>>>>> preview/beta release ratified for public consumption ahead of a new
>>>>>>> major release. Better to have a little more testing in the wild to
>>>>>>> identify problems before 2.0.0 is finalized.
>>>>>>>
>>>>>>> +1 to the release. License, sigs, etc. check out. On Ubuntu 16 + Java 8,
>>>>>>> compilation and tests succeed for "-Pyarn -Phive -Phive-thriftserver
>>>>>>> -Phadoop-2.6".
>>>>>>>
>>>>>>> On Wed, May 18, 2016 at 6:40 AM, Reynold Xin <r...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In the past the Apache Spark community has created preview packages
>>>>>>> (not official releases) and used them as opportunities to ask community
>>>>>>> members to test upcoming versions of Apache Spark. Several people in the
>>>>>>> Apache community have suggested we conduct votes for these preview
>>>>>>> packages and turn them into formal releases per the Apache Foundation's
>>>>>>> standards. Preview releases are not meant to be functional, i.e. they
>>>>>>> can and highly likely will contain critical bugs or documentation
>>>>>>> errors, but we will be able to post them to the project's website to get
>>>>>>> wider feedback. They should satisfy the legal requirements of Apache's
>>>>>>> release policy (http://www.apache.org/dev/release.html), such as having
>>>>>>> proper licenses.
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 2.0.0-preview. The vote is open until Friday, May 20, 2016 at
>>>>>>> 11:00 PM PDT and passes if a majority of at least 3 +1 PMC votes are
>>>>>>> cast.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 2.0.0-preview
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is 2.0.0-preview
>>>>>>> (8f5a04b6299e3a47aca13cbb40e72344c0114860)
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-bin/
>>>>>>>
>>>>>>> Release artifacts are signed with the following key:
>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/
>>>>>>>
>>>>>>> The list of resolved issues is:
>>>>>>> https://issues.apache.org/jira/browse/SPARK-15351?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.0.0
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Apache Spark workload, running it on this candidate, and
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org

--
Best Regards

Jeff Zhang
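
(To make the last request in the vote email concrete: a minimal, self-contained
PySpark smoke test one might run against this candidate. It is purely
illustrative; the app name, row counts, and assertions are made up, and a real
test should exercise an existing workload.)

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a session against the candidate build (assumes pyspark from the
# 2.0.0-preview artifacts is on the path).
spark = SparkSession.builder.appName("spark-2.0.0-preview-smoke-test").getOrCreate()

# Trivial DataFrame workload: 1000 rows bucketed into 10 groups.
df = spark.range(0, 1000).withColumn("bucket", F.col("id") % 10)
counts = df.groupBy("bucket").count().orderBy("bucket").collect()

# Each bucket should hold exactly 100 rows if the SQL/DataFrame path is healthy.
assert len(counts) == 10
assert all(row["count"] == 100 for row in counts)

spark.stop()
print("smoke test passed")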