RE: saveAsTable fails to save RDD in Spark SQL 1.3.0
/user/hive/warehouse is an HDFS location. I’ve changed the mode for this location but I’m still having the same issue.

    hduser@hadoop01-VirtualBox:/opt/spark/bin$ hdfs dfs -chmod -R 777 /user/hive
    hduser@hadoop01-VirtualBox:/opt/spark/bin$ hdfs dfs -ls /user/hive/warehouse
    Found 1 items

    15/03/18 09:31:47 INFO DAGScheduler: Stage 3 (runJob at newParquet.scala:648) finished in 0.347 s
    15/03/18 09:31:47 INFO DAGScheduler: Job 3 finished: runJob at newParquet.scala:648, took 0.549170 s
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/spark/python/pyspark/sql/dataframe.py", line 191, in saveAsTable
        self._jdf.saveAsTable(tableName, source, jmode, joptions)
      File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o49.saveAsTable.
    : java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/user/hive/warehouse/order04/_temporary/0/task_201503180931_0017_r_01/part-r-2.parquet; isDirectory=false; length=5591; replication=1; blocksize=33554432; modification_time=1426696307000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/user/hive/warehouse/order04/part-r-2.parquet
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
        at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:649)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:126)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
        at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1018)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)

Any help is appreciated.

Thanks

From: fightf...@163.com [mailto:fightf...@163.com]
Sent: March-17-15 6:33 PM
To: Shahdad Moradi; user
Subject: Re: saveAsTable fails to save RDD in Spark SQL 1.3.0

Looks like some authentication issues. Can you check that your current user has authority to operate (maybe r/w/x) on /user/hive/warehouse?

Thanks,
Sun.
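A note on the trace above: both paths in the IOException carry the file: scheme and the status object is a DeprecatedRawLocalFileStatus, which normally indicates that the rename ran against the local filesystem, not HDFS. If that is what happened here, the hdfs dfs -chmod shown above would not change the permissions of the directory that actually failed. The following is only a minimal sketch for checking this, not anything taken from the thread: the helper names check_target and nearest_existing_dir are hypothetical, and it assumes it is run on the machine, and as the user, that runs the Spark job.

```python
import os
from urllib.parse import urlparse


def nearest_existing_dir(path):
    """Walk up from path until a directory that exists is found."""
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def check_target(uri):
    """Report which filesystem a Hadoop-style URI names and, for local
    paths, whether the current user can create and rename entries there."""
    parsed = urlparse(uri)
    report = {
        "scheme": parsed.scheme,
        "path": parsed.path,
        "local": parsed.scheme == "file",
    }
    if report["local"]:
        # The target directory may not exist yet, so check the closest
        # ancestor that does.
        directory = nearest_existing_dir(os.path.dirname(parsed.path))
        report["checked_dir"] = directory
        # Creating or renaming a file needs write and search (execute)
        # permission on the containing directory.
        report["writable"] = os.access(directory, os.W_OK | os.X_OK)
    return report


if __name__ == "__main__":
    # Destination path copied from the IOException in the trace above.
    print(check_target("file:/user/hive/warehouse/order04/part-r-2.parquet"))
```

If the report shows a local path that is not writable, the permissions to look at would be those of the local /user/hive/warehouse directory, not the HDFS one.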
RE: saveAsTable fails to save RDD in Spark SQL 1.3.0
Sun,

Just want to confirm that it was in fact an authentication issue. The issue is resolved now and I can see my tables through the Simba ODBC driver.

Thanks a lot.
Shahdad

From: smoradi <smor...@currenex.com>
Date: 2015-03-18 09:24
To: user <user@spark.apache.org>
Subject: saveAsTable fails to save RDD in Spark SQL 1.3.0

Hi,

Basically my goal is to make the Spark SQL RDDs available to Tableau software through the Simba ODBC driver.
I’m running standalone Spark 1.3.0 on Ubuntu 14.04. I got the source code and compiled it with Maven. Hive is also set up and connected to MySQL, all on the same machine. The hive-site.xml file has been copied to spark/conf. Here is the content of hive-site.xml:

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:MySql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
        <description>metadata is stored in a MySQL server</description>
      </property>
      <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
        <description>user name for connecting to mysql server</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hivepassword</value>
        <description>password for connecting to mysql server</description>
      </property>
    </configuration>

Both Hive and MySQL work just fine. I can create a table with Hive and find it in MySQL. The thriftserver is also configured and connected to the Spark master.
Everything works just fine and I can monitor all the workers and running applications through the Spark master UI.

I have a very simple Python script to convert a JSON file to an RDD like this:

    import json

    from pyspark.sql import HiveContext

    def transform(data):
        # The first 25 characters of each line hold the timestamp.
        ts = data[:25].strip()
        # The JSON payload starts at character offset 41.
        jss = data[41:].strip()
        jsj = json.loads(jss)
        jsj['ts'] = ts  # attach the timestamp to the record
        return json.dumps(jsj)

    # sc is the SparkContext that the pyspark shell provides.
    sqlContext = HiveContext(sc)
    rdd = sc.textFile(myfile)  # myfile is left as written in the original message
    tbl = sqlContext.jsonRDD(rdd.map(transform))
    tbl.saveAsTable("neworder")

The saveAsTable call fails with this:

    15/03/17 17:22:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/spark/python/pyspark/sql/dataframe.py", line 191, in saveAsTable
        self._jdf.saveAsTable(tableName, source, jmode, joptions)
      File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o31.saveAsTable.
    : java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/user/hive/warehouse/neworder/_temporary/0/task_201503171618_0008_r_01/part-r-2.parquet; isDirectory=false; length=5591; replication=1; blocksize=33554432; modification_time=142663430; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/user/hive/warehouse/neworder/part-r-2.parquet
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
        at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:649)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:126)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
        at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
        at org.apache.spark.sql.SQLContext