[ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171354#comment-15171354 ]
Zuo Wang commented on SPARK-13531:
----------------------------------

Unable to reproduce; here are my steps in spark-shell (imports added for completeness):

{noformat}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}

val rowRDD1 = sc.parallelize(Seq(Row(2), Row(3)))
val schema1 = StructType(StructField("x", IntegerType, true) :: Nil)
val df1 = sqlContext.createDataFrame(rowRDD1, schema1)

val rowRDD2 = sc.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3")))
val schema2 = StructType(StructField("x", IntegerType, true) :: StructField("y", StringType, true) :: Nil)
val df2 = sqlContext.createDataFrame(rowRDD2, schema2)

df1.join(df2, Seq("x")).show
{noformat}

> Some DataFrame joins stopped working with UnsupportedOperationException: No
> size estimation available for objects
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13531
>                 URL: https://issues.apache.org/jira/browse/SPARK-13531
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> this is using spark 2.0.0-SNAPSHOT
>
> dataframe df1:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>       :  +- Project [_1#8 AS x#9]
>       :     +- Scan ExistingRDD[_1#8]{noformat}
> show:
> {noformat}+---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+{noformat}
> dataframe df2:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true), StructField(y,StringType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]{noformat}
> show:
> {noformat}+---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+{noformat}
> i run:
> df1.join(df2, Seq("x")).show
> i get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation available for objects.
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.immutable.List.map(List.scala:285)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
> not sure what changed; this ran about a week ago without issues (in our
> internal unit tests). it is fully reproducible, however when i tried to
> minimize the issue i could not reproduce it by just creating data frames in
> the repl with the same contents, so it probably has something to do with the
> way these are created (from Row objects and StructTypes).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
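A note on the failed reproduction: the reported explain plans contain MapPartitions operators with object-typed columns (obj#135: object), and it is exactly ObjectType.defaultSize that throws in the stack trace. This suggests df1 and df2 had passed through typed Dataset operations before the join, not just createDataFrame, which would explain why the plain-Row repro above does not trigger the bug. A hypothetical sketch along those lines, assuming a 2.0.0-SNAPSHOT spark-shell; the .as[Int].map(identity) step is an assumption, not something taken from the report:

{noformat}
// Hypothetical repro sketch (not from the report): route one side of the
// join through a typed Dataset transformation so that the logical plan
// contains MapPartitions / object-typed nodes like those in the issue.
import sqlContext.implicits._

val df1 = Seq(2, 3).toDF("x")
  .as[Int]            // typed Dataset[Int]
  .map(identity)      // inserts MapPartitions with ObjectType columns
  .toDF("x")

val df2 = Seq((1, "1"), (2, "2"), (3, "3")).toDF("x", "y")

// CanBroadcast calls statistics() on the join children; for the plan above
// that walks into ObjectType.defaultSize, which is where the reported
// UnsupportedOperationException originates.
df1.join(df2, Seq("x")).show
{noformat}

If that sketch reproduces the failure, it would point at the object operators introduced by Dataset transformations rather than at Row/StructType construction itself.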