Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r142796899

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
    @@ -519,3 +519,18 @@ case class CoGroup(
         outputObjAttr: Attribute,
         left: LogicalPlan,
         right: LogicalPlan) extends BinaryNode with ObjectProducer
    +
    +case class FlatMapGroupsInPandas(
    +    groupingAttributes: Seq[Attribute],
    +    functionExpr: Expression,
    +    output: Seq[Attribute],
    +    child: LogicalPlan) extends UnaryNode {
    +  /**
    +   * This is needed because output attributes is considered `reference` when
    +   * passed through the constructor.
    +   *
    +   * Without this, catalyst will complain that output attributes are missing
    +   * from the input.
    +   */
    +  override val producedAttributes = AttributeSet(output)
    --- End diff --

    This is one of the tricky bits. It's because of this code:
    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L135

    Because `productIterator` returns all of the case class's constructor arguments, including `output`, the `references` of the tree node will include all output attributes, and catalyst will complain about missing input:
    ```
    def missingInput: AttributeSet = references -- inputSet -- producedAttributes
    ```

    I think my solution here isn't great, but I don't know the best way to deal with this. If someone with deeper catalyst knowledge can suggest an alternative, I am happy to get rid of this bit.
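
To make the set arithmetic behind `missingInput` concrete, here is a minimal toy sketch. It does not use Spark: `Attr`, `ToyPlan`, `Relation`, `GroupedMapWithoutOverride` and `GroupedMapWithOverride` are hypothetical stand-ins invented for illustration, and the `references` method only approximates the behaviour described in the comment (collecting attributes from the constructor arguments via `productIterator`). It should be read as an assumption-laden model, not as Catalyst's actual implementation.

```scala
// Toy model only: simplified stand-ins, NOT Spark's Catalyst classes.
final case class Attr(name: String)

trait ToyPlan extends Product {
  def children: Seq[ToyPlan]
  def output: Seq[Attr]

  // Attributes this node creates itself (none by default).
  def producedAttributes: Set[Attr] = Set.empty

  // Everything the children make available to this node.
  def inputSet: Set[Attr] = children.flatMap(_.output).toSet

  // Assumption: gather every Attr found among the constructor arguments,
  // either directly or inside a Seq, mimicking the productIterator walk.
  def references: Set[Attr] = productIterator.flatMap {
    case a: Attr   => Seq(a)
    case s: Seq[_] => s.collect { case a: Attr => a }
    case _         => Seq.empty
  }.toSet

  // Same formula as the one quoted in the comment.
  def missingInput: Set[Attr] = references -- inputSet -- producedAttributes
}

// A leaf node that produces its own output attributes.
final case class Relation(output: Seq[Attr]) extends ToyPlan {
  def children: Seq[ToyPlan] = Nil
  override def producedAttributes: Set[Attr] = output.toSet
}

// `output` is a constructor argument, so it leaks into `references`.
final case class GroupedMapWithoutOverride(
    grouping: Seq[Attr],
    output: Seq[Attr],
    child: ToyPlan) extends ToyPlan {
  def children: Seq[ToyPlan] = Seq(child)
}

// Same node, but it declares its output attributes as produced.
final case class GroupedMapWithOverride(
    grouping: Seq[Attr],
    output: Seq[Attr],
    child: ToyPlan) extends ToyPlan {
  def children: Seq[ToyPlan] = Seq(child)
  override def producedAttributes: Set[Attr] = output.toSet
}

object MissingInputDemo {
  def main(args: Array[String]): Unit = {
    val id = Attr("id")
    val v = Attr("v")
    val mean = Attr("mean") // new column created by the grouped function

    val source = Relation(Seq(id, v))
    val without = GroupedMapWithoutOverride(Seq(id), Seq(id, mean), source)
    val withOverride = GroupedMapWithOverride(Seq(id), Seq(id, mean), source)

    println(without.missingInput)      // contains Attr(mean): reported as missing
    println(withOverride.missingInput) // empty: the override subtracts the output
  }
}
```

In this model the new `mean` attribute is not supplied by the child, so it survives `references -- inputSet` unless the node lists it in `producedAttributes`, which is the effect the override in the diff is after.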