That val is not really your problem. In general, there is a lot of global state throughout the Hive codebase that makes it unsafe to try to connect to more than one Hive installation from the same JVM.
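For anyone who lands on this thread later, here is a minimal standalone-application sketch of the practical consequence: since the metastore state is global to the JVM, the warehouse location has to be fixed before the HiveContext first touches the metastore, and a different Hive installation needs its own JVM. Everything in the snippet (the app name, the HDFS URI, and the use of a system property to seed the Hive configuration) is illustrative and not something confirmed in this thread.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  object WarehouseDemo {
    def main(args: Array[String]): Unit = {
      // Pick the warehouse before anything initializes the Hive metastore.
      // HiveConf generally applies matching system properties when it is
      // constructed; the HDFS URI here is only a placeholder.
      System.setProperty("hive.metastore.warehouse.dir",
        "hdfs://server:8020/space/warehouse")

      val sc = new SparkContext(new SparkConf().setAppName("warehouse-demo"))
      val hiveContext = new HiveContext(sc)

      // From this point on the warehouse (and metastore) cannot be switched
      // within this JVM; a different Hive installation needs its own process.
      hiveContext.sql("CREATE TABLE IF NOT EXISTS src(key INT, value STRING)")
      sc.stop()
    }
  }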
On Tue, Mar 10, 2015 at 11:36 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

> Hao, thanks for the response.
>
> For Q1, in my case, I have a tool on the Spark shell which serves multiple
> users, and they may use different Hive installations. I took a look at the
> code of HiveContext. It looks like I cannot do that today because the
> "catalog" field cannot be changed after initialization:
>
>   /* A catalyst metadata catalog that points to the Hive Metastore. */
>   @transient
>   override protected[sql] lazy val catalog =
>     new HiveMetastoreCatalog(this) with OverrideCatalog
>
> For Q2, I checked HDFS and it is running as a cluster. I can also run DDL
> from the Spark shell with HiveContext. To reproduce the exception, I just
> run the script below; the exception happens in the last step.
>
> 15/03/11 14:24:48 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
>
> scala> sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://server:8020/space/warehouse")
> scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src(key INT, value STRING)")
> scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> scala> var output = sqlContext.sql("SELECT key,value FROM src")
> scala> output.saveAsTable("outputtable")
>
> ------------------------------
>
> From: Cheng, Hao [mailto:hao.ch...@intel.com]
> Sent: Wednesday, March 11, 2015 8:25 AM
> To: Haopu Wang; user; dev@spark.apache.org
> Subject: RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?
>
> I am not sure whether Hive supports changing the metastore after it has been
> initialized; I guess not. Spark SQL relies entirely on the Hive Metastore in
> HiveContext, which is probably why it doesn't work as expected for Q1.
>
> BTW, in most cases people configure the metastore settings in hive-site.xml
> and never change them afterwards. Is there any reason you want to change them
> at runtime?
>
> For Q2, there is probably something wrong in the configuration; it seems HDFS
> is running in pseudo/single-node mode. Can you double-check that? Or can you
> run DDL (e.g. create a table) from the Spark shell with HiveContext?
>
> From: Haopu Wang [mailto:hw...@qilinsoft.com]
> Sent: Tuesday, March 10, 2015 6:38 PM
> To: user; dev@spark.apache.org
> Subject: [SparkSQL] Reuse HiveContext to different Hive warehouse?
>
> I'm using the Spark 1.3.0 RC3 build with Hive support.
>
> In the Spark shell, I want to reuse the same HiveContext instance with
> different warehouse locations. Below are the steps of my test (assume I have
> already loaded a file into table "src").
>
> ======
> 15/03/10 18:22:59 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
>
> scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w")
> scala> sqlContext.sql("SELECT * from src").saveAsTable("table1")
> scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w2")
> scala> sqlContext.sql("SELECT * from src").saveAsTable("table2")
> ======
>
> After these steps, both tables are stored in "/test/w" only. I expected
> "table2" to be stored in the "/test/w2" folder.
>
> Another question: if I set "hive.metastore.warehouse.dir" to an HDFS folder,
> I cannot use saveAsTable(). Is this by design?
> Exception stack trace is below:
>
> ======
>
> 15/03/10 18:35:28 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
> 15/03/10 18:35:28 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
>
> java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/space/warehouse/table2, expected: file:///
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>         at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:463)
>         at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:118)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
>         at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
>         at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:96)
>         at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:125)
>         at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
>         at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
>         at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
>         at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
>         at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
>         at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
>         at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
>         at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
>         at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:998)
>         at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:964)
>         at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:942)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
>         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
>         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>         at $iwC$$iwC$$iwC.<init>(<console>:33)
>         at $iwC$$iwC.<init>(<console>:35)
>         at $iwC.<init>(<console>:37)
>         at <init>(<console>:39)
>
> Thank you very much!
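Editorial note on the quoted stack trace, with a hedged sketch: the "Wrong FS: hdfs://... expected: file:///" error indicates that the driver's default Hadoop filesystem is the local one while the warehouse path points at HDFS, so the table path fails qualification in the Parquet code path. The usual way to make the two agree is to let the driver see the cluster's core-site.xml (so that fs.defaultFS names the same namenode). The snippet below sets it programmatically purely as an illustration; the namenode address is a placeholder, and this is not a fix confirmed anywhere in the thread.

  // Run in the Spark shell before calling saveAsTable(); illustrative only.
  // Align the driver's default filesystem with the HDFS warehouse location.
  sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://server:8020")

  sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://server:8020/space/warehouse")
  val output = sqlContext.sql("SELECT key, value FROM src")
  output.saveAsTable("outputtable")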