Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20085#discussion_r159519672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B
     }
     
    --- End diff --
    
    To support initializing more complicated objects, it makes sense to 
generalize `InitializeJavaBean` to an `InitializeObject` that takes a sequence 
of method names, each paired with a sequence of that method's arguments. It 
seems, though, that during plan analysis Spark fails to resolve the column 
names against the Expression `children` when those child expressions are 
gathered from a `Seq[Expression]`, yielding errors like:
    
    ```
    Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in 
operator 'DeserializeToObject initializeobject(newInstance(class 
org.apache.spark.sql.catalyst.expressions.GenericBean), 
(setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), 
obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with 
the same name appear in the operation: field1,field2. Please check if the right 
attribute(s) are used.;
    org.apache.spark.sql.AnalysisException: Resolved attribute(s) 
'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject 
initializeobject(newInstance(class 
org.apache.spark.sql.catalyst.expressions.GenericBean), 
(setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))), 
obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with 
the same name appear in the operation: field1,field2. Please check if the right 
attribute(s) are used.;
        at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
    ```
    
    Interestingly, if we change the `setters` signature from `Seq[(String, 
Seq[Expression])]` to `Seq[(String, (Expression, Expression))]` (the use case 
for Spark-Avro, where objects are initialized by calling `put` with an integer 
index argument and then some object argument), the plan will resolve. But of 
course, such a function signature would in a sense be hard-coded for Avro.
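
    For concreteness, here is a minimal, Spark-free sketch of the two `setters` 
shapes. Everything in it is a toy stand-in: `Expr`, `Attr`, `Lit`, 
`NewInstance`, and `InitializeAvroObject` are hypothetical names, not Catalyst 
classes. It only illustrates how each signature would flatten its arguments 
into `children`; it does not reproduce the analyzer behavior.

    ```scala
    // Toy stand-ins for Catalyst's Expression hierarchy (assumption: these
    // are NOT Spark's real classes, just enough structure to show the shapes).
    sealed trait Expr { def children: Seq[Expr] }
    case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
    case class Lit(value: Any) extends Expr { val children: Seq[Expr] = Nil }
    case class NewInstance(className: String) extends Expr {
      val children: Seq[Expr] = Nil
    }

    // General form: each method name is paired with a sequence of arguments.
    case class InitializeObject(
        instance: Expr,
        setters: Seq[(String, Seq[Expr])]) extends Expr {
      // Every argument expression is exposed through `children`.
      val children: Seq[Expr] = instance +: setters.flatMap(_._2)
    }

    // Avro-specific form: every call takes exactly (index, value).
    case class InitializeAvroObject(
        instance: Expr,
        setters: Seq[(String, (Expr, Expr))]) extends Expr {
      val children: Seq[Expr] =
        instance +: setters.flatMap { case (_, (index, value)) => Seq(index, value) }
    }

    object SettersSketch {
      def main(args: Array[String]): Unit = {
        val general = InitializeObject(
          NewInstance("GenericBean"),
          Seq(("setField1", Seq(Attr("field1"))), ("setField2", Seq(Attr("field2")))))
        val avroLike = InitializeAvroObject(
          NewInstance("GenericBean"),
          Seq(("put", (Lit(0), Attr("field1"))), ("put", (Lit(1), Attr("field2")))))
        println(general.children)   // instance plus one argument per setter
        println(avroLike.children)  // instance plus (index, value) per put call
      }
    }
    ```

    In both shapes the flattened `children` contain the same attribute 
references, so the difference in analysis would have to come from how the 
constructor arguments themselves are traversed.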
    
    Any ideas why passing a sequence of child expressions would yield the 
analysis error above?


---
