walterddr commented on code in PR #11824:
URL: https://github.com/apache/pinot/pull/11824#discussion_r1378937058
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -300,6 +303,27 @@ public void setTransformFunction(@Nullable String
transformFunction) {
_transformFunction = transformFunction;
}
+ /**
+ * Returns whether the column is nullable or not.
+ *
+ * This Java property itself is nullable. In case null is returned, actual
nullability depends on the value of
+ * {@link IndexingConfig#isNullHandlingEnabled()}.
+ *
+ * @return true if the field is nullable, false if it is not and null if
column level nullability should be delegated
+ * to {@link IndexingConfig#isNullHandlingEnabled()}.
+ */
+ @Nullable
+ public Boolean getNullable() {
Review Comment:
sorry for following up late:
> Phase2: ... otherwise delegate on `IndexingConfig.isNullHandlingEnabled()`
?
> Phase 3:
> Deprecate `IndexingConfig.isNullHandlingEnabled()` (although we still need
to support it for compatibility reasons)
^ when it is deprecated in phase 3 what's the logic originally in phase 2?
is it false or true?
> There are potential big performance impact
IMO: 4 scenarios currently
1. `IndexingConfig.isNullHandlingEnabled() = true` and `SET
enableNullHandling=true;`
2. `IndexingConfig.isNullHandlingEnabled() = false` and `SET
enableNullHandling=true;`
3. `IndexingConfig.isNullHandlingEnabled() = true` and `SET
enableNullHandling=false;`
4. `IndexingConfig.isNullHandlingEnabled() = false` and `SET
enableNullHandling=false;`
as long as the following are met we should be good:
- performance on query should be the same on both 3&4 --> we should
guarantee the same if all columns are marked as non-nullable by the user
- performance on query should be the same on both 1&2 and will be worst than
3&4 b/c it accesses the null vector --> we should guarantee better performance
when some columns are marked as non-nullable by the user
> I don't think that is a good idea. Without making this non-primitive we
cannot know whether the value was explicitly defined as false or it has
defaulted to false. We would also be breaking backward compatibility. By using
a primitive boolean with false by default, in order to make a column nullable
users would need to:
> In Phase 1:
> - Set IndexingConfig.isNullHandlingEnabled() to true. This is required
at write time, given in V1 write path ignores FieldSpec nullability.
> - Set each, for each nullable column, FieldSpec nullability to true.
This is a breaking change. All tables already created would need to be
modified, otherwise they would be suddenly treated as not null.
> In Phase 2:
> - Set each, for each nullable column, FieldSpec nullability to true.
there's no backward compatibility here, b/c the default behavior of V1 is
all non-nullable, so setting it to primitive boolean with default to false
exactly achieves that,
you don't need to set nullability = true for all columns, b/c
- in v1 this can be controlled (and should still be controlled by `SET
enableNullHandling=true` option regardless whether this field is `boolean` or
`Boolean`
- in v2 there's no such option, so it will honor the nullability setting
--> which requires it to be set explicitly to true anyway otherwise nothing
works.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]