I am not sure whether Hive supports changing the metastore settings after it has been initialized; I suspect it does not. Spark SQL relies entirely on the Hive Metastore in HiveContext, which is probably why Q1 does not behave as you expect.
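
If the goal is simply a different warehouse location, it most likely has to be in place before the metastore client is first initialized, rather than switched with SET mid-session. A minimal, untested sketch (the path below is only a placeholder):

======
import org.apache.spark.sql.hive.HiveContext

// Sketch only: set the warehouse location on a fresh HiveContext *before*
// its metastore client is first used, instead of changing it mid-session.
val hc = new HiveContext(sc)
hc.setConf("hive.metastore.warehouse.dir", "hdfs://server:8020/test/w")

// Tables created afterwards should then land under the configured location.
hc.sql("CREATE TABLE src_copy AS SELECT * FROM src")
======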

BTW, in most cases people configure the metastore settings in hive-site.xml and never change them afterwards. Is there a particular reason you need to change them at runtime?
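
If you really do need tables under different locations within one session, one possible workaround might be to give each table an explicit path instead of relying on the warehouse setting, e.g. (untested sketch, using the options overload of saveAsTable with the parquet source):

======
scala> import org.apache.spark.sql.SaveMode
scala> // Sketch: write this table to an explicit location rather than the
scala> // current hive.metastore.warehouse.dir.
scala> sqlContext.sql("SELECT * FROM src").saveAsTable("table2", "parquet", SaveMode.ErrorIfExists, Map("path" -> "/test/w2/table2"))
======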

For Q2, something is probably wrong in the configuration; it seems HDFS is running in pseudo/single-node mode, can you double-check that? Or can you run a DDL statement (like creating a table) from the spark shell with HiveContext, as in the sketch below?
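
For example (just a sketch; the table name "fs_check" is made up):

======
scala> // Create a trivial table through the HiveContext.
scala> sqlContext.sql("CREATE TABLE fs_check (key INT, value STRING)")
scala> // Check where it actually lives: a file:/ location instead of hdfs://
scala> // would suggest the Hadoop config is falling back to the local FS.
scala> sqlContext.sql("DESCRIBE FORMATTED fs_check").collect().foreach(println)
======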

From: Haopu Wang [mailto:hw...@qilinsoft.com]
Sent: Tuesday, March 10, 2015 6:38 PM
To: user; d...@spark.apache.org
Subject: [SparkSQL] Reuse HiveContext to different Hive warehouse?


I'm using Spark 1.3.0 RC3 build with Hive support.



In the Spark shell, I want to reuse the same HiveContext instance with different warehouse locations. Below are the steps of my test (assume I have already loaded a file into table "src").



======
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table1")
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w2")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table2")
======

After these steps, both tables are stored under "/test/w". I expected "table2" to be stored under the "/test/w2" folder.



Another question: if I set "hive.metastore.warehouse.dir" to an HDFS folder, I cannot use saveAsTable(). Is this by design? The exception stack trace is below:

======

15/03/10 18:35:28 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/10 18:35:28 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/space/warehouse/table2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:463)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:118)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
        at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:96)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:125)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
        at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:998)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:964)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:942)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC.<init>(<console>:35)
        at $iwC.<init>(<console>:37)
        at <init>(<console>:39)



Thank you very much!

