Copilot commented on code in PR #17504:
URL: https://github.com/apache/pinot/pull/17504#discussion_r2688514517
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field
field) {
Object handleDeprecatedTypes(Object value, Schema.Field field) {
Schema.Type avroColumnType = field.schema().getType();
- if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
- org.apache.avro.Schema nonNullSchema = null;
- for (org.apache.avro.Schema childFieldSchema :
field.schema().getTypes()) {
- if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+ if (avroColumnType == Schema.Type.UNION) {
+ Schema nonNullSchema = null;
+ for (Schema childFieldSchema : field.schema().getTypes()) {
+ if (childFieldSchema.getType() != Schema.Type.NULL) {
if (nonNullSchema == null) {
nonNullSchema = childFieldSchema;
} else {
throw new IllegalStateException("More than one non-null schema in
UNION schema");
}
}
}
+ assert nonNullSchema != null;
- //INT96 is deprecated. We convert to long as we do in the native parquet
extractor.
- if
(nonNullSchema.getName().equals(PrimitiveType.PrimitiveTypeName.INT96.name())) {
- return ParquetNativeRecordExtractor.convertInt96ToLong((byte[]) value);
+ // NOTE:
+ // INT96 is deprecated. We convert to long as we do in the native
parquet extractor.
+ // See org.apache.parquet.avro.AvroSchemaConverter about how INT96 is
converted into Avro schema.
+ // We have to rely on the doc to determine whether a field is INT96.
+ if (nonNullSchema.getType() == Schema.Type.FIXED &&
nonNullSchema.getFixedSize() == 12
+ && "INT96 represented as byte[12]".equals(nonNullSchema.getDoc())) {
Review Comment:
The magic number 12 and the hardcoded documentation string 'INT96
represented as byte[12]' should be extracted as named constants (e.g.,
INT96_BYTE_SIZE and INT96_DOC_STRING) to improve maintainability and make the
intent clearer.
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field
field) {
Object handleDeprecatedTypes(Object value, Schema.Field field) {
Schema.Type avroColumnType = field.schema().getType();
- if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
- org.apache.avro.Schema nonNullSchema = null;
- for (org.apache.avro.Schema childFieldSchema :
field.schema().getTypes()) {
- if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+ if (avroColumnType == Schema.Type.UNION) {
+ Schema nonNullSchema = null;
+ for (Schema childFieldSchema : field.schema().getTypes()) {
+ if (childFieldSchema.getType() != Schema.Type.NULL) {
if (nonNullSchema == null) {
nonNullSchema = childFieldSchema;
} else {
throw new IllegalStateException("More than one non-null schema in
UNION schema");
}
}
}
+ assert nonNullSchema != null;
Review Comment:
The assertion will not execute in production builds where assertions are
disabled. Consider replacing with an explicit null check and throwing an
IllegalStateException with a descriptive message, such as 'All schemas in UNION
are NULL types'.
```suggestion
if (nonNullSchema == null) {
throw new IllegalStateException("All schemas in UNION are NULL
types");
}
```
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field
field) {
Object handleDeprecatedTypes(Object value, Schema.Field field) {
Schema.Type avroColumnType = field.schema().getType();
- if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
- org.apache.avro.Schema nonNullSchema = null;
- for (org.apache.avro.Schema childFieldSchema :
field.schema().getTypes()) {
- if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+ if (avroColumnType == Schema.Type.UNION) {
+ Schema nonNullSchema = null;
+ for (Schema childFieldSchema : field.schema().getTypes()) {
+ if (childFieldSchema.getType() != Schema.Type.NULL) {
if (nonNullSchema == null) {
nonNullSchema = childFieldSchema;
} else {
throw new IllegalStateException("More than one non-null schema in
UNION schema");
}
}
}
+ assert nonNullSchema != null;
- //INT96 is deprecated. We convert to long as we do in the native parquet
extractor.
- if
(nonNullSchema.getName().equals(PrimitiveType.PrimitiveTypeName.INT96.name())) {
- return ParquetNativeRecordExtractor.convertInt96ToLong((byte[]) value);
+ // NOTE:
+ // INT96 is deprecated. We convert to long as we do in the native
parquet extractor.
+ // See org.apache.parquet.avro.AvroSchemaConverter about how INT96 is
converted into Avro schema.
+ // We have to rely on the doc to determine whether a field is INT96.
+ if (nonNullSchema.getType() == Schema.Type.FIXED &&
nonNullSchema.getFixedSize() == 12
+ && "INT96 represented as byte[12]".equals(nonNullSchema.getDoc())) {
Review Comment:
The new INT96 detection logic should have test coverage to verify it
correctly identifies INT96 fields using the fixed-size and documentation
attributes, and that it properly handles edge cases like null documentation or
incorrect fixed sizes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]