[jira] [Commented] (SPARK-14331) Exceptions saving to parquetFile after join from dataframes in master

2016-05-20 Thread Davies Liu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294168#comment-15294168 ]

Davies Liu commented on SPARK-14331:
------------------------------------


Could you post the full stack trace? This exception should be caused by another one.
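
For what it's worth, the TreeNodeException raised from attachTree is usually just a wrapper, so the interesting failure sits somewhere in the cause chain. A minimal sketch for dumping the whole chain around the failing write (printCauseChain is a made-up helper for illustration; df and args(0) are from the report):

    // Hypothetical helper: print every level of the cause chain, since the
    // TreeNodeException above is expected to wrap the actual root cause.
    def printCauseChain(t: Throwable): Unit = {
      var cur: Throwable = t
      while (cur != null) {
        println(s"caused by: $cur") // exception class and message at each level
        cur = cur.getCause
      }
      t.printStackTrace()           // full trace, nested causes included
    }

    try {
      df.write.parquet(args(0))     // the write from the report
    } catch {
      case e: Throwable =>
        printCauseChain(e)
        throw e
    }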

> Exceptions saving to parquetFile after join from dataframes in master
> ----------------------------------------------------------------------
>
> Key: SPARK-14331
> URL: https://issues.apache.org/jira/browse/SPARK-14331
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Critical
>
> I'm using master and trying to write a DataFrame to a Parquet file, but I'm 
> seeing the exception below. I'm not sure of the exact state of DataFrames 
> right now, so let me know if this is a known issue.
> I read 2 sources of parquet files, joined them, then saved them back:
> val df_pixels = sqlContext.read.parquet("data1")
> val df_pixels_renamed = df_pixels.withColumnRenamed("photo_id", "pixels_photo_id")
> val df_meta = sqlContext.read.parquet("data2")
> val df = df_meta.as("meta").join(df_pixels_renamed, $"meta.photo_id" === $"pixels_photo_id", "inner").drop("pixels_photo_id")
> df.write.parquet(args(0))
> 16/04/01 17:21:34 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange hashpartitioning(pixels_photo_id#3, 2), None
> +- WholeStageCodegen
>    :  +- Filter isnotnull(pixels_photo_id#3)
>    :     +- INPUT
>    +- Coalesce 0
>       +- WholeStageCodegen
>          :  +- Project [img_data#0,photo_id#1 AS pixels_photo_id#3]
>          :     +- Scan HadoopFiles[img_data#0,photo_id#1] Format: ParquetFormat, PushedFilters: [], ReadSchema: struct
> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
> at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:109)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> at org.apache.spark.sql.execution.InputAdapter.upstreams(WholeStageCodegen.scala:236)
> at org.apache.spark.sql.execution.Sort.upstreams(Sort.scala:104)
> at org.apache.spark.sql.execution.WholeStageCodegen.doExecute(WholeStageCodegen.scala:351)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> at org.apache.spark.sql.execution.InputAdapter.doExecute(WholeStageCodegen.scala:228)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)




[jira] [Commented] (SPARK-14331) Exceptions saving to parquetFile after join from dataframes in master

2016-05-23 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296416#comment-15296416 ]

Thomas Graves commented on SPARK-14331:
----------------------------------------

I was trying to reproduce this to get you the rest, but now I get a different 
exception. This is using SQLContext directly:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

It looks like it's perhaps using the wrong filesystem (using HDFS when it 
should use the local filesystem); see the sketch after the stack trace below.

16/05/23 14:15:08 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Unable to create database default as failed to create its directory hdfs://nn1.com:8020/hadoop/tmp/yarn-local/usercache/tgraves/appcache/application_1463805142339_520258/container_e11_1463805142339_520258_01_01/spark-warehouse
org.apache.spark.SparkException: Unable to create database default as failed to create its directory hdfs://nn1.com:8020/hadoop/tmp/yarn-local/usercache/tgraves/appcache/application_1463805142339_520258/container_e11_1463805142339_520258_01_01/spark-warehouse
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:126)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:122)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:142)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:84)
at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:94)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:94)
at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:110)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:110)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:109)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:62)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:383)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:154)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:419)
at yahoo.spark.SparkFlickrLargeJoin$.main(SparkFlickrLargeJoin.scala:26)
at yahoo.spark.SparkFlickrLargeJoin.main(SparkFlickrLargeJoin.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:617)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=tgraves, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:298)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:182)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$PathResolver.verifyPermissions(FSNamesystem.java:8622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3961)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:989)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server.call(Server.java:2267)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1720)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at s
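
If the failure above really is the default warehouse directory being resolved against HDFS (the container-local spark-warehouse path ends up under hdfs://, where the user has no write access at /), one thing that may be worth trying is pinning spark.sql.warehouse.dir to an explicitly qualified, writable location before the catalog is first touched. This is an unverified sketch, and the file:/tmp path is illustrative only:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Unverified workaround sketch: qualify the warehouse location explicitly
    // so the default database is not created under an unwritable hdfs:// root.
    val conf = new SparkConf()
      .setAppName("WarehouseDirWorkaround")
      .set("spark.sql.warehouse.dir", "file:/tmp/spark-warehouse") // illustrative path
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)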

[jira] [Commented] (SPARK-14331) Exceptions saving to parquetFile after join from dataframes in master

2016-05-23 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296440#comment-15296440 ]

Thomas Graves commented on SPARK-14331:
----------------------------------------

Note: I was running Spark on YARN.




[jira] [Commented] (SPARK-14331) Exceptions saving to parquetFile after join from dataframes in master

2016-05-23 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296705#comment-15296705 ]

Thomas Graves commented on SPARK-14331:
----------------------------------------

That might be https://github.com/apache/spark/pull/13175. I'll let you know if 
I can reproduce it and get the full stack trace.
