Re: saveAsTable with path not working as expected (pyspark + Scala)

2015-03-27 Thread Tom Walwyn
We can set a path; refer to the unit tests, for example:

  df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append", path=tmpPath)

https://github.com/apache/spark/blob/master/python/pyspark/sql/tests.py

Investigating some more, I found that the data is being written to the
specified location, but the error is still thrown and the table is not
registered in the metastore. This is the code that I ran:

>>> a = [Row(key=k, value=str(k)) for k in range(100)]
>>> df = sc.parallelize(a).toDF()
>>> df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append", path="/tmp/test10")
15/03/27 10:45:13 ERROR RetryingHMSHandler: MetaException(message:file:/user/hive/warehouse/savedjsontable is not a directory or unable to create one)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1239)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
        ...
>>> sqlCtx.tables()
DataFrame[tableName: string, isTemporary: boolean]
>>> exit()
~ cat /tmp/test10/part-0
{"key":0,"value":"0"}
{"key":1,"value":"1"}
{"key":2,"value":"2"}
{"key":3,"value":"3"}
{"key":4,"value":"4"}
{"key":5,"value":"5"}
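
For what it's worth, the written files are readable, so one workaround is
to load the JSON back and register a temporary table (a sketch; the table
name here is just for illustration):

>>> df2 = sqlCtx.jsonFile("/tmp/test10")
>>> df2.registerTempTable("savedJsonTable2")
>>> sqlCtx.sql("SELECT * FROM savedJsonTable2 WHERE key < 5").show()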

Kind Regards,
Tom

On 27 March 2015 at 10:33, Yanbo Liang yblia...@gmail.com wrote:

 saveAsTable will use the default data source configured by
 spark.sql.sources.default.

   def saveAsTable(tableName: String): Unit = {
     saveAsTable(tableName, SaveMode.ErrorIfExists)
   }

 It cannot set a path, if I understand correctly.
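
 A sketch of what that means from PySpark, assuming SQLContext.setConf is
 available in 1.3 (the table name is illustrative):

   # Change the default data source, then save without an explicit path;
   # the table is written with that source into the Hive warehouse.
   sqlCtx.setConf("spark.sql.sources.default", "org.apache.spark.sql.json")
   df.saveAsTable("jsonTable")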

 2015-03-27 15:45 GMT+08:00 Tom Walwyn twal...@gmail.com:

 Hi,

 The behaviour is the same for me in Scala and Python, so posting here in
 Python. When I use DataFrame.saveAsTable with the path option, I expect an
 external Hive table to be created at the specified path. Specifically, when
 I call:

   df.saveAsTable(..., path="/tmp/test")

 I expect an external Hive table to be created pointing to /tmp/test which
 would contain the data in df.

 However, running locally on my Mac, I get an error indicating that Spark
 tried to create a managed table in the location of the Hive warehouse:

 ERROR RetryingHMSHandler: MetaException(message:file:/user/hive/warehouse/savetable is not a directory or unable to create one)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1239)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)

 Am I wrong to expect Spark to create an external table in this case?
 What is the expected behaviour of saveAsTable with the path option?

 Setup: running Spark 1.3.0 locally.

 Kind Regards,
 Tom

saveAsTable with path not working as expected (pyspark + Scala)

2015-03-27 Thread Tom Walwyn
Hi,

The behaviour is the same for me in Scala and Python, so posting here in
Python. When I use DataFrame.saveAsTable with the path option, I expect an
external Hive table to be created at the specified path. Specifically, when
I call:

  df.saveAsTable(..., path="/tmp/test")

I expect an external Hive table to be created pointing to /tmp/test which
would contain the data in df.
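
Roughly the DDL I have in mind, as an illustrative sketch (the column
types and storage details are my guesses, not necessarily what Spark
would actually emit):

  # Needs a HiveContext; table name, columns, and path are illustrative.
  sqlCtx.sql("""
      CREATE EXTERNAL TABLE test (key INT, value STRING)
      LOCATION '/tmp/test'
  """)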

However, running locally on my Mac, I get an error indicating that Spark
tried to create a managed table in the location of the Hive warehouse:

ERROR RetryingHMSHandler: MetaException(message:file:/user/hive/warehouse/savetable is not a directory or unable to create one)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1239)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)

Am I wrong to expect Spark to create an external table in this case? What
is the expected behaviour of saveAsTable with the path option?

Setup: running Spark 1.3.0 locally.

Kind Regards,
Tom


Re: saveAsTable with path not working as expected (pyspark + Scala)

2015-03-27 Thread Tom Walwyn
Another follow-up: saveAsTable works as expected when running on a Hadoop
cluster with Hive installed. It's only locally that I get this strange
behaviour. Any ideas why this is happening?
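
My best guess is that, even with an explicit path, the local metastore
still tries to create the default warehouse directory
(file:/user/hive/warehouse) for the table entry. One workaround to try,
assuming setConf propagates to the Hive configuration (the warehouse path
here is illustrative):

  # Point the metastore warehouse at a directory the local user can write to.
  sqlCtx.setConf("hive.metastore.warehouse.dir", "/tmp/hive-warehouse")
  df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append",
                 path="/tmp/test10")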

Kind Regards,
Tom
