Github user vofque commented on the issue: https://github.com/apache/spark/pull/22708

The original problem is described here: https://issues.apache.org/jira/browse/SPARK-21402

I'll try to explain what happens in detail. Let's consider this data structure:

```
root
 |-- intervals: array
 |    |-- element: struct
 |    |    |-- startTime: long
 |    |    |-- endTime: long
```

And let's say we have a Java bean class with the corresponding structure. When building a deserializer for the field _intervals_ in _JavaTypeInference.deserializerFor_, we construct a _MapObjects_ expression to convert structs to Java beans:

```
case c if listType.isAssignableFrom(typeToken) =>
  val et = elementType(typeToken)
  MapObjects(
    p => deserializerFor(et, Some(p)),
    getPath,
    inferDataType(et)._1,
    customCollectionCls = Some(c))
```

_MapObjects_ requires the _DataType_ of the array elements. It is extracted from the Java element type using _JavaTypeInference.inferDataType_, which gets the Java bean properties and maps them to _StructFields_:

```
case other =>
  // some more code goes here
  val properties = getJavaBeanReadableProperties(other)
  val fields = properties.map { property =>
    val returnType = typeToken.method(property.getReadMethod).getReturnType
    val (dataType, nullable) = inferDataType(returnType, seenTypeSet + other)
    new StructField(property.getName, dataType, nullable)
  }
```

The order of the properties in the resulting _StructType_ may not correspond to their declaration order, because the declaration order is simply unknown at this point. So the resulting element _StructType_ may look like this:

```
root
 |-- endTime: long
 |-- startTime: long
```

This _StructType_ is passed to _MapObjects_ and then to its loop variable _LambdaVariable_. For the deserialization of individual array elements, an _InitializeJavaBean_ expression is created. It contains an _UnresolvedExtractValue_ expression for each field, and these expressions have _LambdaVariable_ as a child.
They are resolved during analysis:

```
case UnresolvedExtractValue(child, fieldName) if child.resolved =>
  ExtractValue(child, fieldName, resolver)
```

For each of the fields _startTime_ and _endTime_ an ordinal is calculated. For that, the child's _DataType_ is used, and in our case this is the _StructType_ of _LambdaVariable_ with the incorrect field order. As a result we get _GetStructField_ expressions with ordinal = 0 for _endTime_ and ordinal = 1 for _startTime_, while in the actual input data _startTime_ sits at ordinal 0 and _endTime_ at ordinal 1, so the two field values end up swapped.
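The unspecified property order can be observed directly with `java.beans.Introspector`, which is what `getJavaBeanReadableProperties` relies on under the hood: `getBeanInfo` returns property descriptors sorted by name, not in declaration order. A minimal sketch with a hypothetical `Interval` bean matching the struct above (the class and method names here are illustrative, not taken from the PR):

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class PropertyOrderDemo {
    // Hypothetical bean for the struct above: startTime is declared first.
    public static class Interval {
        private long startTime;
        private long endTime;
        public long getStartTime() { return startTime; }
        public void setStartTime(long v) { startTime = v; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long v) { endTime = v; }
    }

    // Roughly what a "readable properties" lookup does: collect names of
    // properties that have a getter, skipping the synthetic "class" property.
    public static List<String> readableProperties(Class<?> cls)
            throws IntrospectionException {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd
                : Introspector.getBeanInfo(cls).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                names.add(pd.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        // Descriptors come back sorted alphabetically, so endTime precedes
        // startTime even though startTime is declared first.
        System.out.println(readableProperties(Interval.class));
    }
}
```

This alphabetical ordering is how the inferred element _StructType_ ends up listing _endTime_ before _startTime_, independently of how the bean declares its fields.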