Could you enable the HistoryServer and provide the properties and CLASSPATH 
for spark-shell? Could you also run 'env' to list your environment variables?

By the way, what do the Spark logs say? Enable debug logging to see what's 
going on in spark-shell when it initializes the HiveContext and talks to Hive.
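
For the debug logging, a minimal sketch (assuming the stock log4j template 
that ships with the Spark tarball): copy conf/log4j.properties.template to 
conf/log4j.properties and raise the root logger before relaunching 
spark-shell:

    # $SPARK_HOME/conf/log4j.properties
    log4j.rootCategory=DEBUG, console

That should surface the metastore connection attempts when the HiveContext 
initializes.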



> On Jul 31, 2014, at 19:09, "chenjie" <chenjie2...@gmail.com> wrote:
> 
> Hi, Yin and Andrew, thank you for your reply.
> When I create a table in the Hive CLI, it works correctly and the table can
> be found in HDFS. I had forgotten to start hiveserver2 before, so I started
> it today. Then I ran the command below:
>    spark-shell --master spark://192.168.40.164:7077  --driver-class-path
> conf/hive-site.xml
> Furthermore, I added the following command:
>    hiveContext.hql("SET
> hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")
> But that didn't work for me. I got the same exception as before and found
> the table file in a local directory instead of HDFS.
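> 
> (One quick way to check which warehouse value actually took effect, a
> sketch assuming the same Spark 1.0 HiveContext API used above, is to query
> the property back in spark-shell:
> 
>    hiveContext.hql("SET hive.metastore.warehouse.dir").collect().foreach(println)
> 
> If this still prints the local path, the SET never reached the
> configuration the metastore client is using.)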
> 
> 
> Yin Huai-2 wrote
>> Another way is to set "hive.metastore.warehouse.dir" explicitly to the
>> HDFS directory storing Hive tables by using the SET command. For example:
>> 
>> hiveContext.hql("SET
>> hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
>> 
>> 
>> 
>> 
>> On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:
>> 
>>> Hi All,
>>> 
>>> It has been a while, but what I did to make it work was to make sure of
>>> the following:
>>> 
>>> 1. Hive is working when you run the Hive CLI and JDBC via HiveServer2.
>>> 
>>> 2. Make sure you have the hive-site.xml from the above Hive configuration.
>>> The catch here is that you want the hive-site.xml used by the Hive
>>> metastore. The ones for Hive and HCatalog may be different files. Check
>>> the XML properties in each candidate and pick the one that has the
>>> warehouse property and the JDO setup configured (a sketch of the relevant
>>> properties follows after step 4).
>>> 
>>> 3. Make sure the hive-site.xml from step 2 is included in
>>> $SPARK_HOME/conf, and in your runtime CLASSPATH when you run spark-shell.
>>> 
>>> 4. Use the History Server to check the runtime CLASSPATH and its ordering
>>> to ensure hive-site.xml is included (see the sketch below).
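>>> 
>>> For step 2, the properties to look for are roughly these (the values here
>>> are placeholders, not a working configuration):
>>> 
>>>    <property>
>>>      <name>hive.metastore.warehouse.dir</name>
>>>      <value>hdfs://namenode:8020/user/hive/warehouse</value>
>>>    </property>
>>>    <property>
>>>      <name>javax.jdo.option.ConnectionURL</name>
>>>      <value>jdbc:mysql://metastore-host/metastore</value>
>>>    </property>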
>>> 
>>> HiveContext should then pick up the hive-site.xml and talk to your
>>> running Hive service.
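>>> 
>>> To double-check steps 3 and 4 from inside spark-shell itself, a quick
>>> sketch using the standard JVM system property (assuming a Unix classpath
>>> separator):
>>> 
>>>    // print classpath entries that look like config directories
>>>    System.getProperty("java.class.path").split(":").filter(_.contains("conf")).foreach(println)
>>> 
>>> hive-site.xml is only picked up if its directory shows up here.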
>>> 
>>> Hope these tips help.
>>> 
>>>> On Jul 30, 2014, at 22:47, "chenjie" <chenjie2001@...> wrote:
>>>> 
>>>> Hi, Michael. I have the same problem. My warehouse directory is always
>>>> created locally. I copied the default hive-site.xml into the
>>>> $SPARK_HOME/conf directory on each node. After I executed the code below,
>>>> 
>>>>    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>>>    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
>>>>    hiveContext.hql("FROM src SELECT key, value").collect()
>>>> 
>>>> I got the exception below:
>>>> 
>>>> java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt does not exist
>>>>    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>>>>    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>>>>    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>>>>    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>>>>    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
>>>>    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
>>>>    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>>>>    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
>>>> 
>>>> In the end, I found that /user/hive/warehouse/src/kv1.txt had been
>>>> created on the local filesystem of the node where I started spark-shell.
>>>> 
>>>> The Spark I used is the pre-built Spark 1.0.1 for Hadoop 2.
>>>> 
>>>> Thanks in advance.
> 
