slessard commented on code in PR #10953:
URL: https://github.com/apache/iceberg/pull/10953#discussion_r1776485623
##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java:
##########
@@ -140,12 +141,18 @@ public static class ConstantVectorHolder<T> extends
VectorHolder {
private final int numRows;
public ConstantVectorHolder(int numRows) {
+ super(new NullVector("_dummy_", numRows), null, new
NullabilityHolder(numRows));
+ nullabilityHolder().setNulls(0, numRows);
this.numRows = numRows;
this.constantValue = null;
}
public ConstantVectorHolder(Types.NestedField icebergField, int numRows, T
constantValue) {
- super(icebergField);
+ super(
+ new NullVector(icebergField.name(), numRows),
Review Comment:
@nastra OK, Now I see what's going on. The Spark tests were failing before
I added a the VectorHolderTest class. I added that class because I realized
that there are three different ways to end up with a dummy holder. I ran that
test class against the main branch and all the tests passed. When I ran that
test class against my branch some of the tests for the isDummy method failed. I
then changed the isDummy method to correctly handle the three cases
1. The dummy holder is a DeletedVectorHolder instance created via the
deletedVectorHolder() method.
2. The dummy holder is a ConstantVectorHolder instance created via the
dummyHolder() method, i.e. vector has no assigned value
3. the dummy holder is a ConstantVectorHolder instance created via the
constantHolder() method, i.e. vector has an assigned value (though this
assigned value could be null)
For DeletedVectorHolder its `vector` instance variable is always `null`.
For ConstantVectorHolder, regardless of whether its constant value is `"a"`
or `null`, its `vector` instance variable is always a `NullVector` instance.
I updated the `isDummy()` method to handle those two cases. This made the
three unit tests related to the isDummy method pass. Fixing the isDummy method
seems to have fixed the Spark tests.
The remaining question is (possibly minor enough to ignore) that a
ConstantVectorHolder, regardless of whether its constant value is `"a"` or
`null`, its `vector` instance variable is always a `NullVector` instance. This
seems especially misleading when the constant value is `"a"`. My suggested
patch above fixes this issue to make ConstantVectorHolder's `vector` have a
`null` value when its constant value is `"a"`. Is this worth fixing? If yes,
then my suggested patch is needed. If no, then my suggested patch is not needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]