[ https://issues.apache.org/jira/browse/SPARK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296440#comment-15296440 ]
Thomas Graves commented on SPARK-14331:
---------------------------------------

Note I was running Spark on YARN.

> Exceptions saving to parquetFile after join from dataframes in master
> ---------------------------------------------------------------------
>
>                 Key: SPARK-14331
>                 URL: https://issues.apache.org/jira/browse/SPARK-14331
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Thomas Graves
>            Priority: Critical
>
> I'm trying to use master and write to a parquet file from a dataframe,
> but am seeing the exception below. Not sure of the exact state of dataframes
> right now, so if this is a known issue let me know.
> I read 2 sources of parquet files, joined them, then saved them back:
>
> val df_pixels = sqlContext.read.parquet("data1")
> val df_pixels_renamed = df_pixels.withColumnRenamed("photo_id", "pixels_photo_id")
> val df_meta = sqlContext.read.parquet("data2")
> val df = df_meta.as("meta").join(df_pixels_renamed, $"meta.photo_id" === $"pixels_photo_id", "inner").drop("pixels_photo_id")
> df.write.parquet(args(0))
>
> 16/04/01 17:21:34 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange hashpartitioning(pixels_photo_id#3, 20000), None
> +- WholeStageCodegen
> :  +- Filter isnotnull(pixels_photo_id#3)
> :     +- INPUT
> +- Coalesce 0
>    +- WholeStageCodegen
>    :  +- Project [img_data#0,photo_id#1 AS pixels_photo_id#3]
>    :     +- Scan HadoopFiles[img_data#0,photo_id#1] Format: ParquetFormat, PushedFilters: [], ReadSchema: struct<img_data:binary,photo_id:string>
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
> 	at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:109)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> 	at org.apache.spark.sql.execution.InputAdapter.upstreams(WholeStageCodegen.scala:236)
> 	at org.apache.spark.sql.execution.Sort.upstreams(Sort.scala:104)
> 	at org.apache.spark.sql.execution.WholeStageCodegen.doExecute(WholeStageCodegen.scala:351)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
> 	at org.apache.spark.sql.execution.InputAdapter.doExecute(WholeStageCodegen.scala:228)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
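Editor's aside: the repro above uses a rename-join-drop pattern (rename `photo_id` on one side to `pixels_photo_id` so the join condition is unambiguous, then drop the helper column after the join). For readers unfamiliar with it, here is a minimal pure-Python sketch of the same inner-join semantics; Spark is deliberately not involved, and the sample rows are invented for illustration:

```python
def inner_join(left, right, left_key, right_key):
    """Inner-join two lists of dict-rows on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            out.append({**rrow, **lrow})  # merge matching rows
    return out

# Two "sources" that share the column name photo_id (hence the rename).
df_meta = [{"photo_id": "a", "taken": "2016"}, {"photo_id": "b", "taken": "2015"}]
df_pixels = [{"photo_id": "a", "img_data": b"\x00"}]

# Rename photo_id on one side so the join keys are unambiguous...
df_pixels_renamed = [
    {("pixels_photo_id" if k == "photo_id" else k): v for k, v in row.items()}
    for row in df_pixels
]
joined = inner_join(df_meta, df_pixels_renamed, "photo_id", "pixels_photo_id")

# ...then drop the helper column, mirroring .drop("pixels_photo_id").
result = [{k: v for k, v in row.items() if k != "pixels_photo_id"} for row in joined]
print(result)  # -> [{'img_data': b'\x00', 'photo_id': 'a', 'taken': '2016'}]
```

Only photo "a" appears in both sources, so the inner join keeps one row; the failure in this ticket occurs later, when Spark plans the shuffle (`Exchange hashpartitioning`) for the equivalent distributed join.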