RE: saveAsTable fails to save RDD in Spark SQL 1.3.0
Sun,

Just want to confirm that it was in fact an authentication issue. The issue is resolved now and I can see my tables through the Simba ODBC driver. Thanks a lot.

Shahdad

From: fightf...@163.com [mailto:fightf...@163.com]
Sent: March-17-15 6:33 PM
To: Shahdad Moradi; user
Subject: Re: saveAsTable fails to save RDD in Spark SQL 1.3.0
RE: saveAsTable fails to save RDD in Spark SQL 1.3.0
/user/hive/warehouse is an HDFS location. I've changed the permissions on this location but I'm still having the same issue.

hduser@hadoop01-VirtualBox:/opt/spark/bin$ hdfs dfs -chmod -R 777 /user/hive
hduser@hadoop01-VirtualBox:/opt/spark/bin$ hdfs dfs -ls /user/hive/warehouse
Found 1 items

15/03/18 09:31:47 INFO DAGScheduler: Stage 3 (runJob at newParquet.scala:648) finished in 0.347 s
15/03/18 09:31:47 INFO DAGScheduler: Job 3 finished: runJob at newParquet.scala:648, took 0.549170 s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 191, in saveAsTable
    self._jdf.saveAsTable(tableName, source, jmode, joptions)
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o49.saveAsTable.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/user/hive/warehouse/order04/_temporary/0/task_201503180931_0017_r_01/part-r-2.parquet; isDirectory=false; length=5591; replication=1; blocksize=33554432; modification_time=1426696307000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/user/hive/warehouse/order04/part-r-2.parquet
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
	at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
	at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:649)
	at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:126)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
	at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
	at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
	at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
	at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1018)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)

Any help is appreciated. Thanks.

From: fightf...@163.com [mailto:fightf...@163.com]
Sent: March-17-15 6:33 PM
To: Shahdad Moradi; user
Subject: Re: saveAsTable fails to save RDD in Spark SQL 1.3.0
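[Editorial aside, not part of the original thread: note that the failing path in the trace above carries the file:/ scheme, i.e. the committer was renaming on the local filesystem, so HDFS permissions alone may not cover it. A minimal sketch of reading the scheme off such a URI (the path string is copied from the trace):]

```python
from urllib.parse import urlparse

# Path copied from the stack trace above; a "file" scheme means the
# local filesystem, not HDFS, so local directory permissions apply.
failed = "file:/user/hive/warehouse/order04/part-r-2.parquet"
print(urlparse(failed).scheme)  # -> file
```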
Re: saveAsTable fails to save RDD in Spark SQL 1.3.0
Looks like some authentication issue. Can you check that your current user has permission (r/w/x) to operate on /user/hive/warehouse?

Thanks,
Sun.

fightf...@163.com

From: smoradi
Date: 2015-03-18 09:24
To: user
Subject: saveAsTable fails to save RDD in Spark SQL 1.3.0
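[Editorial aside, not part of the original thread: one way to act on this advice is to inspect and open up the directory mode. A sketch using a throwaway local directory; on HDFS the analogous commands would be `hdfs dfs -ls -d /user/hive/warehouse` and `hdfs dfs -chmod -R 777 /user/hive/warehouse`:]

```shell
# Create a throwaway directory standing in for /user/hive/warehouse
dir=$(mktemp -d)
ls -ld "$dir"            # show owner, group, and mode before the change
chmod -R 777 "$dir"      # grant read/write/execute to user, group, other
stat -c '%a' "$dir"      # prints 777
```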
saveAsTable fails to save RDD in Spark SQL 1.3.0
Hi,

Basically my goal is to make the Spark SQL RDDs available to Tableau software through the Simba ODBC driver. I'm running standalone Spark 1.3.0 on Ubuntu 14.04. I got the source code and compiled it with Maven. Hive is also set up and connected to MySQL, all on the same machine. The hive-site.xml file has been copied to spark/conf. Here is the content of hive-site.xml:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:MySql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to mysql server</description>
  </property>

Both Hive and MySQL work just fine. I can create a table with Hive and find it in MySQL. The thriftserver is also configured and connected to the Spark master. Everything works just fine and I can monitor all the workers and running applications through the Spark master UI.
I have a very simple Python script to convert a JSON file to an RDD, like this:

import json

def transform(data):
    ts = data[:25].strip()
    jss = data[41:].strip()
    jsj = json.loads(jss)
    jsj['ts'] = ts
    return json.dumps(jsj)

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
rdd = sc.textFile("myfile")
tbl = sqlContext.jsonRDD(rdd.map(transform))
tbl.saveAsTable("neworder")

saveAsTable fails with this:

15/03/17 17:22:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 191, in saveAsTable
    self._jdf.saveAsTable(tableName, source, jmode, joptions)
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o31.saveAsTable.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/user/hive/warehouse/neworder/_temporary/0/task_201503171618_0008_r_01/part-r-2.parquet; isDirectory=false; length=5591; replication=1; blocksize=33554432; modification_time=142663430; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/user/hive/warehouse/neworder/part-r-2.parquet
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
	at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
	at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:649)
	at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:126)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
	at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
	at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
	at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
	at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1018)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
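[Editorial aside, not part of the original thread: for anyone adapting the transform above, it assumes a fixed-width log layout with a 25-character timestamp prefix and the JSON payload starting at column 41. A standalone sketch with a made-up sample line; the slice widths are taken from the script, the field names are invented:]

```python
import json

def transform(data):
    ts = data[:25].strip()     # fixed-width timestamp prefix
    jss = data[41:].strip()    # JSON payload starts at column 41
    jsj = json.loads(jss)
    jsj['ts'] = ts             # fold the timestamp into the JSON record
    return json.dumps(jsj)

# Made-up sample line: 25-char timestamp, padding out to column 41, then JSON
line = "2015-03-17T17:22:17.00000" + " " * 16 + '{"order": 42}'
print(transform(line))  # -> {"order": 42, "ts": "2015-03-17T17:22:17.00000"}
```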