rdblue commented on a change in pull request #3186:
URL: https://github.com/apache/iceberg/pull/3186#discussion_r748883894
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ConstantColumnVector.java
##########
@@ -119,6 +122,8 @@ public UTF8String getUTF8String(int rowId) {
@Override
public ColumnVector getChild(int ordinal) {
- throw new UnsupportedOperationException("ConstantColumnVector only
supports primitives");
+ DataType sparkType = ((StructType) type).fields()[ordinal].dataType();
+ Type childType = SparkSchemaUtil.convert(sparkType);
+ return new ConstantColumnVector(childType, batchSize, ((InternalRow)
constant).get(ordinal, sparkType));
Review comment:
It looks like the approach in this PR is to use existing constant
handling to fix nested column construction. I'm not sure that this is the right
approach. There isn't anything specific to ORC here, so I'm not convinced that
the ORC readers handle nested constant values correctly. Avoiding a problem is
an okay solution as long as we're confident that it is always avoided.
At a minimum, I think we should have tests for partitioning by nested fields
in structs that are not constants and would not get created by
`findMoreCompoundConstants`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]