[jira] [Updated] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-10 Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-9600:

Priority: Blocker  (was: Critical)

> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> 
>
> Key: SPARK-9600
> URL: https://issues.apache.org/jira/browse/SPARK-9600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Lian
>Assignee: Sudhakar Thota
>Priority: Blocker
> Attachments: SPARK-9600-fl1.txt
>
>
> Get a clean Spark 1.4.1 build:
> {noformat}
> $ git checkout v1.4.1
> $ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1 clean assembly/assembly
> {noformat}
> Stop any running local Hadoop instance and unset all Hadoop environment 
> variables, so that we force Spark to run against the local file system only:
> {noformat}
> $ unset HADOOP_CONF_DIR
> $ unset HADOOP_PREFIX
> $ unset HADOOP_LIBEXEC_DIR
> $ unset HADOOP_CLASSPATH
> {noformat}
> In this way we also ensure that the default Hive warehouse location points to 
> the local file system path {{file:///user/hive/warehouse}}.  Now we create the 
> warehouse directories for testing:
> {noformat}
> $ sudo rm -rf /user  # !! WARNING: IT'S /user RATHER THAN /usr !!
> $ sudo mkdir -p /user/hive/{warehouse,warehouse_hive13}
> $ sudo chown -R lian:staff /user
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     └── warehouse_hive13
> {noformat}
> Create a minimal {{hive-site.xml}} that only overrides the warehouse location, 
> and put it under {{$SPARK_HOME/conf}}:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>file:///user/hive/warehouse_hive13</value>
>   </property>
> </configuration>
> {noformat}
> Now run our test snippets with {{pyspark}}:
> {noformat}
> $ ./bin/pyspark
> In [1]: sqlContext.range(10).coalesce(1).write.saveAsTable("ds")
> {noformat}
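> As a sanity check (not part of the original repro, and the exact {{SET}} output 
> format varies across Spark versions), we can confirm that the session actually 
> picked up the overridden warehouse location:
> {noformat}
> sqlContext.sql("SET hive.metastore.warehouse.dir").collect()
> {noformat}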
> Check warehouse directories:
> {noformat}
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     │   └── ds
>     │       ├── _SUCCESS
>     │       ├── _common_metadata
>     │       ├── _metadata
>     │       └── part-r-0-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
>     └── warehouse_hive13
>         └── ds
> {noformat}
> Here is the weird part: {{ds}} appears under both {{warehouse}} and 
> {{warehouse_hive13}}, but the data files are only written to the former, i.e. 
> the default location rather than the configured one.
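> The same mismatch should be visible from the table metadata (a hedged 
> diagnostic mirroring the {{DESC EXTENDED}} check from the original description; 
> the output details depend on the Hive version): the metastore location points 
> into {{warehouse_hive13}}, while the data source {{path}} property points into 
> {{warehouse}}:
> {noformat}
> for row in sqlContext.sql("DESCRIBE EXTENDED ds").collect():
>     print(row)
> {noformat}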
> Now let's try HiveQL:
> {noformat}
> In [2]: sqlContext.range(10).coalesce(1).registerTempTable("t")
> In [3]: sqlContext.sql("CREATE TABLE ds_ctas AS SELECT * FROM t")
> {noformat}
> Check the directories again:
> {noformat}
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     │   └── ds
>     │       ├── _SUCCESS
>     │       ├── _common_metadata
>     │       ├── _metadata
>     │       └── part-r-0-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
>     └── warehouse_hive13
>         ├── ds
>         └── ds_ctas
>             ├── _SUCCESS
>             └── part-0
> {noformat}
> So HiveQL works fine.  (Hive never writes Parquet summary files, which is why 
> {{_common_metadata}} and {{_metadata}} are missing in {{ds_ctas}}.)
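> Also note that, despite the misplaced data files, both tables should still read 
> back fine (an untested assumption, based on the {{path}} SerDe property shown in 
> the original description): Spark resolves the data source table {{ds}} through 
> its {{path}} property rather than through the metastore location, which is part 
> of why this bug is easy to miss:
> {noformat}
> sqlContext.table("ds").count()       # resolved via the data source path property
> sqlContext.table("ds_ctas").count()  # resolved via the metastore location
> {noformat}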





[jira] [Updated] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-08 Cheng Lian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-9600:
--
Description: 
Get a clean Spark 1.4.1 build:
{noformat}
$ git checkout v1.4.1
$ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1 clean assembly/assembly
{noformat}
Stop any running local Hadoop instance and unset all Hadoop environment 
variables, so that we force Spark to run against the local file system only:
{noformat}
$ unset HADOOP_CONF_DIR
$ unset HADOOP_PREFIX
$ unset HADOOP_LIBEXEC_DIR
$ unset HADOOP_CLASSPATH
{noformat}
In this way we also ensure that the default Hive warehouse location points to 
the local file system path {{file:///user/hive/warehouse}}.  Now we create the 
warehouse directories for testing:
{noformat}
$ sudo rm -rf /user  # !! WARNING: IT'S /user RATHER THAN /usr !!
$ sudo mkdir -p /user/hive/{warehouse,warehouse_hive13}
$ sudo chown -R lian:staff /user
$ tree /user
/user
└── hive
    ├── warehouse
    └── warehouse_hive13
{noformat}
Create a minimal {{hive-site.xml}} that only overrides the warehouse location, 
and put it under {{$SPARK_HOME/conf}}:
{noformat}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///user/hive/warehouse_hive13</value>
  </property>
</configuration>
{noformat}
Now run our test snippets with {{pyspark}}:
{noformat}
$ ./bin/pyspark
In [1]: sqlContext.range(10).coalesce(1).write.saveAsTable("ds")
{noformat}
Check warehouse directories:
{noformat}
$ tree /user
/user
└── hive
    ├── warehouse
    │   └── ds
    │       ├── _SUCCESS
    │       ├── _common_metadata
    │       ├── _metadata
    │       └── part-r-0-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
    └── warehouse_hive13
        └── ds
{noformat}
Here is the weird part: {{ds}} appears under both {{warehouse}} and 
{{warehouse_hive13}}, but the data files are only written to the former, i.e. 
the default location rather than the configured one.

Now let's try HiveQL:
{noformat}
In [2]: sqlContext.range(10).coalesce(1).registerTempTable("t")
In [3]: sqlContext.sql("CREATE TABLE ds_ctas AS SELECT * FROM t")
{noformat}
Check the directories again:
{noformat}
$ tree /user
/user
└── hive
    ├── warehouse
    │   └── ds
    │       ├── _SUCCESS
    │       ├── _common_metadata
    │       ├── _metadata
    │       └── part-r-0-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
    └── warehouse_hive13
        ├── ds
        └── ds_ctas
            ├── _SUCCESS
            └── part-0
{noformat}
So HiveQL works fine.  (Hive never writes Parquet summary files, which is why 
{{_common_metadata}} and {{_metadata}} are missing in {{ds_ctas}}.)

  was:
Even with a {{hive-site.xml}} that sets a non-default 
{{hive.metastore.warehouse.dir}}, Spark SQL still writes data to the default 
warehouse location {{/user/hive/warehouse}} when saving data source tables 
with {{DataFrameWriter.saveAsTable()}}:
{noformat}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore_hive13_hadoop2</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://localhost:9000/user/hive/warehouse_hive13</value>
  </property>
</configuration>
{noformat}
Spark shell snippet to reproduce:
{noformat}
sqlContext.range(10).write.saveAsTable("xxx")
{noformat}
Running {{DESC EXTENDED xxx}} in Hive to check the SerDe properties:
{noformat}
...
location:hdfs://localhost:9000/user/hive/warehouse_hive13/xxx
...
parameters:{path=hdfs://localhost:9000/user/hive/warehouse/xxx, serialization.format=1})
...
{noformat}
We are probably using the execution Hive client's configuration, instead of the 
metastore Hive client's, when calling 
{{HiveMetastoreCatalog.hiveDefaultTableFilePath()}}.


> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> 
>
> Key: SPARK-9600
> URL: https://issues.apache.org/jira/browse/SPARK-9600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Lian
>Assignee: Sudhakar Thota
>Priority: Critical
> Attachments: SPARK-9600-fl1.txt
>
>

[jira] [Updated] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-07 Sudhakar Thota (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudhakar Thota updated SPARK-9600:
--
Attachment: SPARK-9600-fl1.txt

my screen content

> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> 
>
> Key: SPARK-9600
> URL: https://issues.apache.org/jira/browse/SPARK-9600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Lian
>Assignee: Sudhakar Thota
>Priority: Critical
> Attachments: SPARK-9600-fl1.txt
>
>
> Even with a {{hive-site.xml}} that sets a non-default 
> {{hive.metastore.warehouse.dir}}, Spark SQL still writes data to the default 
> warehouse location {{/user/hive/warehouse}} when saving data source tables 
> with {{DataFrameWriter.saveAsTable()}}:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:mysql://localhost/metastore_hive13_hadoop2</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>com.mysql.jdbc.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>password</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>hdfs://localhost:9000/user/hive/warehouse_hive13</value>
>   </property>
> </configuration>
> {noformat}
> Spark shell snippet to reproduce:
> {noformat}
> sqlContext.range(10).write.saveAsTable("xxx")
> {noformat}
> Running {{DESC EXTENDED xxx}} in Hive to check the SerDe properties:
> {noformat}
> ...
> location:hdfs://localhost:9000/user/hive/warehouse_hive13/xxx
> ...
> parameters:{path=hdfs://localhost:9000/user/hive/warehouse/xxx, serialization.format=1})
> ...
> {noformat}
> We are probably using the execution Hive client's configuration, instead of 
> the metastore Hive client's, when calling 
> {{HiveMetastoreCatalog.hiveDefaultTableFilePath()}}.
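> Until this is fixed, a possible workaround (a hedged sketch, assuming 
> {{saveAsTable()}} honors the {{path}} data source option) is to pass the 
> intended location explicitly, so that {{hiveDefaultTableFilePath()}} is never 
> consulted:
> {noformat}
> sqlContext.range(10).write \
>     .option("path", "hdfs://localhost:9000/user/hive/warehouse_hive13/xxx") \
>     .saveAsTable("xxx")
> {noformat}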





[jira] [Updated] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-06 Michael Armbrust (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-9600:

Sprint: Spark 1.5 doc/QA sprint

> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> 
>
> Key: SPARK-9600
> URL: https://issues.apache.org/jira/browse/SPARK-9600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Lian
>Assignee: Sudhakar Thota
>Priority: Critical
>
> Even with a {{hive-site.xml}} that sets a non-default 
> {{hive.metastore.warehouse.dir}}, Spark SQL still writes data to the default 
> warehouse location {{/user/hive/warehouse}} when saving data source tables 
> with {{DataFrameWriter.saveAsTable()}}:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:mysql://localhost/metastore_hive13_hadoop2</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>com.mysql.jdbc.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>password</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>hdfs://localhost:9000/user/hive/warehouse_hive13</value>
>   </property>
> </configuration>
> {noformat}
> Spark shell snippet to reproduce:
> {noformat}
> sqlContext.range(10).write.saveAsTable("xxx")
> {noformat}
> Running {{DESC EXTENDED xxx}} in Hive to check the SerDe properties:
> {noformat}
> ...
> location:hdfs://localhost:9000/user/hive/warehouse_hive13/xxx
> ...
> parameters:{path=hdfs://localhost:9000/user/hive/warehouse/xxx, serialization.format=1})
> ...
> {noformat}
> We are probably using the execution Hive client's configuration, instead of 
> the metastore Hive client's, when calling 
> {{HiveMetastoreCatalog.hiveDefaultTableFilePath()}}.





[jira] [Updated] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-06 Cheng Lian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-9600:
--
Assignee: Sudhakar Thota

> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> 
>
> Key: SPARK-9600
> URL: https://issues.apache.org/jira/browse/SPARK-9600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Lian
>Assignee: Sudhakar Thota
>Priority: Critical
>
> Even with a {{hive-site.xml}} that sets a non-default 
> {{hive.metastore.warehouse.dir}}, Spark SQL still writes data to the default 
> warehouse location {{/user/hive/warehouse}} when saving data source tables 
> with {{DataFrameWriter.saveAsTable()}}:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:mysql://localhost/metastore_hive13_hadoop2</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>com.mysql.jdbc.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>password</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>hdfs://localhost:9000/user/hive/warehouse_hive13</value>
>   </property>
> </configuration>
> {noformat}
> Spark shell snippet to reproduce:
> {noformat}
> sqlContext.range(10).write.saveAsTable("xxx")
> {noformat}
> Running {{DESC EXTENDED xxx}} in Hive to check the SerDe properties:
> {noformat}
> ...
> location:hdfs://localhost:9000/user/hive/warehouse_hive13/xxx
> ...
> parameters:{path=hdfs://localhost:9000/user/hive/warehouse/xxx, serialization.format=1})
> ...
> {noformat}
> We are probably using the execution Hive client's configuration, instead of 
> the metastore Hive client's, when calling 
> {{HiveMetastoreCatalog.hiveDefaultTableFilePath()}}.


