We can set a path; refer to the unit tests for an example:

df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append",
               path=tmpPath)
<https://github.com/apache/spark/blob/master/python/pyspark/sql/tests.py>

Investigating some more, I found that the data is written to the specified
location, but the error is still thrown and the table is not registered in
the metastore. This is the code that I ran:

>>> a = [Row(key=k, value=str(k)) for k in range(100)]
>>> df = sc.parallelize(a).toDF()
>>> df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append",
path="/tmp/test10")
15/03/27 10:45:13 ERROR RetryingHMSHandler:
MetaException(message:file:/user/hive/warehouse/savedjsontable is not a
directory or unable to create one)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1239)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
...
>>> sqlCtx.tables()
DataFrame[tableName: string, isTemporary: boolean]
>>> exit()
~> cat /tmp/test10/part-00000
{"key":0,"value":"0"}
{"key":1,"value":"1"}
{"key":2,"value":"2"}
{"key":3,"value":"3"}
{"key":4,"value":"4"}
{"key":5,"value":"5"}

Kind Regards,
Tom

On 27 March 2015 at 10:33, Yanbo Liang <yblia...@gmail.com> wrote:

> "saveAsTable" will use the default data source configured by
> spark.sql.sources.default.
>
> def saveAsTable(tableName: String): Unit = {
>     saveAsTable(tableName, SaveMode.ErrorIfExists)
>   }
>
> It cannot set "path", if I understand correctly.
>
> 2015-03-27 15:45 GMT+08:00 Tom Walwyn <twal...@gmail.com>:
>
>> Hi,
>>
>> The behaviour is the same for me in Scala and Python, so posting here in
>> Python. When I use DataFrame.saveAsTable with the path option, I expect an
>> external Hive table to be created at the specified path. Specifically, when
>> I call:
>>
>> >>>  df.saveAsTable(..., path="/tmp/test")
>>
>> I expect an external Hive table to be created pointing to /tmp/test which
>> would contain the data in df.
>>
>> However, running locally on my Mac, I get an error indicating that Spark
>> tried to create a managed table in the location of the Hive warehouse:
>>
>> ERROR RetryingHMSHandler:
>> MetaException(message:file:/user/hive/warehouse/savetable is not a
>> directory or unable to create one)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1239)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
>>
>> Am I wrong to expect that Spark create an external table in this case?
>> What is the expected behaviour of saveAsTable with the path option?
>>
>> Setup: running Spark 1.3.0 locally.
>>
>> Kind Regards,
>> Tom
>>
>
>
