[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296440#comment-15296440 ]
Thomas Graves commented on SPARK-14331:
---------------------------------------

Note I was running Spark on YARN.

> Exceptions saving to parquetFile after join from dataframes in master
> ---------------------------------------------------------------------
>
>                 Key: SPARK-14331
>                 URL: https://issues.apache.org/jira/browse/SPARK-14331
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Thomas Graves
>            Priority: Critical
>
> I'm trying to use master and write to a parquet file from a dataframe,
> but am seeing the exception below. Not sure of the exact state of dataframes
> right now, so if this is a known issue let me know.
> I read 2 sources of parquet files, joined them, then saved them back:
>
> val df_pixels = sqlContext.read.parquet("data1")
> val df_pixels_renamed = df_pixels.withColumnRenamed("photo_id", "pixels_photo_id")
> val df_meta = sqlContext.read.parquet("data2")
> val df = df_meta.as("meta").join(df_pixels_renamed, $"meta.photo_id" === $"pixels_photo_id", "inner").drop("pixels_photo_id")
> df.write.parquet(args(0))
>
> 16/04/01 17:21:34 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange hashpartitioning(pixels_photo_id#3, 20000), None
> +- WholeStageCodegen
> :  +- Filter isnotnull(pixels_photo_id#3)
> :     +- INPUT
> +- Coalesce 0
>    +- WholeStageCodegen
>    :  +- Project [img_data#0,photo_id#1 AS pixels_photo_id#3]
>    :     +- Scan HadoopFiles[img_data#0,photo_id#1] Format: ParquetFormat, PushedFilters: [], ReadSchema: struct<img_data:binary,photo_id:string>
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
> 	at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:109)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> 	at org.apache.spark.sql.execution.InputAdapter.upstreams(WholeStageCodegen.scala:236)
> 	at org.apache.spark.sql.execution.Sort.upstreams(Sort.scala:104)
> 	at org.apache.spark.sql.execution.WholeStageCodegen.doExecute(WholeStageCodegen.scala:351)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> 	at org.apache.spark.sql.execution.InputAdapter.doExecute(WholeStageCodegen.scala:228)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
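Editor's aside: the repro above uses a rename-join-drop pattern (rename `photo_id` on one side to `pixels_photo_id` so the join condition is unambiguous, then drop the helper column after the join). For readers unfamiliar with it, here is a minimal pure-Python sketch of the same inner-join semantics; Spark is deliberately not involved, and the sample rows are invented for illustration:

```python
def inner_join(left, right, left_key, right_key):
    """Inner-join two lists of dict-rows on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            out.append({**rrow, **lrow})  # merge matching rows
    return out

# Two "sources" that share the column name photo_id (hence the rename).
df_meta = [{"photo_id": "a", "taken": "2016"}, {"photo_id": "b", "taken": "2015"}]
df_pixels = [{"photo_id": "a", "img_data": b"\x00"}]

# Rename photo_id on one side so the join keys are unambiguous...
df_pixels_renamed = [
    {("pixels_photo_id" if k == "photo_id" else k): v for k, v in row.items()}
    for row in df_pixels
]
joined = inner_join(df_meta, df_pixels_renamed, "photo_id", "pixels_photo_id")

# ...then drop the helper column, mirroring .drop("pixels_photo_id").
result = [{k: v for k, v in row.items() if k != "pixels_photo_id"} for row in joined]
print(result)  # -> [{'img_data': b'\x00', 'photo_id': 'a', 'taken': '2016'}]
```

Only photo "a" appears in both sources, so the inner join keeps one row; the failure in this ticket occurs later, when Spark plans the shuffle (`Exchange hashpartitioning`) for the equivalent distributed join.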