I'm using a Spark 1.3.0 RC3 build with Hive support.
In the Spark shell, I want to reuse the same HiveContext instance with different
warehouse locations. Below are the steps for my test (assume I have already
loaded a file into the table "src").
======
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table1")
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w2")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table2")
======
After these steps, both tables are stored under "/test/w" only. I expected
"table2" to be stored under the "/test/w2" folder.
Another question: if I set "hive.metastore.warehouse.dir" to an HDFS folder, I
cannot use saveAsTable(). Is this by design? The exception stack trace is
below:
======
15/03/10 18:35:28 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/10 18:35:28 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/space/warehouse/table2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:463)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:118)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
        at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:96)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:125)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
        at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:998)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:964)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:942)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC.<init>(<console>:35)
        at $iwC.<init>(<console>:37)
        at <init>(<console>:39)
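In case it helps narrow things down, my reading of the "expected: file:///"
part is that the paths are being qualified against the local filesystem rather
than HDFS. Below is only a sketch of the workaround I am considering, not
something I have verified (the namenode address is just taken from the path in
the error message):
======
// Sketch: point the Hadoop configuration used by the shell's SparkContext at
// HDFS before saving, so warehouse paths resolve against hdfs:// instead of
// file:///. "server:8020" is the placeholder namenode from the error above.
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://server:8020")
sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://server:8020/space/warehouse")
sqlContext.sql("SELECT * FROM src").saveAsTable("table2")
======
Should something like that be necessary, or is saveAsTable() expected to pick
up the filesystem from the warehouse path itself?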
Thank you very much!