arunkumarucet commented on code in PR #17593:
URL: https://github.com/apache/pinot/pull/17593#discussion_r2740880429
##########
pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufRecordExtractor.java:
##########
@@ -68,23 +124,32 @@ private Object getFieldValue(Descriptors.FieldDescriptor fieldDescriptor, Messag
   @Override
   public GenericRow extract(Message from, GenericRow to) {
     Descriptors.Descriptor descriptor = from.getDescriptorForType();
-    if (_extractAll) {
-      for (Descriptors.FieldDescriptor fieldDescriptor : descriptor.getFields()) {
-        Object fieldValue = getFieldValue(fieldDescriptor, from);
-        if (fieldValue != null) {
-          fieldValue = convert(new ProtoBufFieldInfo(fieldValue, fieldDescriptor));
-        }
-        to.putValue(fieldDescriptor.getName(), fieldValue);
-      }
-    } else {
-      for (String fieldName : _fields) {
-        Descriptors.FieldDescriptor fieldDescriptor = descriptor.findFieldByName(fieldName);
-        Object fieldValue = fieldDescriptor == null ? null : getFieldValue(fieldDescriptor, from);
+
+    // Initialize or reinitialize cache if descriptor changed (handles schema evolution)
+    if (_cachedDescriptorFullName == null || !_cachedDescriptorFullName.equals(descriptor.getFullName())) {
Review Comment:
No, we won't miss any fields. Here's how each case works:

For extractAll mode: We call descriptor.getFields(), which returns every field
defined in the proto schema, whether or not it is set in a particular message.

For subset mode: We iterate over the Pinot schema fields and call
findFieldByName() for each one. If a field exists in the Pinot schema but not
in the proto, findFieldByName() returns null, and we handle that by setting the
value to null.

For proto3 presence: The existing getFieldValue() method already handles this
correctly:
- Fields with explicit presence (optional keyword): returns null if not set
- Fields with implicit presence (regular proto3): returns the default value
- Repeated fields/maps: returns an empty collection if not set
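
To make the three cases concrete, here is a minimal sketch that models them with plain Java collections. It is not the extractor's code and it does not use the protobuf-java API: `ExtractionSketch`, `Field`, `setValues`, and `pinotFields` are hypothetical stand-ins, and the presence check is only an assumption about what getFieldValue() does (presumably via `FieldDescriptor.hasPresence()` and `Message.hasField()`).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExtractionSketch {
  /** Hypothetical stand-in for a proto field descriptor (not the protobuf-java type). */
  static final class Field {
    final String name;
    final boolean hasPresence;  // true for fields declared with the `optional` keyword
    final Object defaultValue;  // e.g. 0, "", or an empty list for repeated fields

    Field(String name, boolean hasPresence, Object defaultValue) {
      this.name = name;
      this.hasPresence = hasPresence;
      this.defaultValue = defaultValue;
    }
  }

  /** Returns null only when the field tracks presence and was not set. */
  static Object getFieldValue(Field field, Map<String, Object> setValues) {
    if (setValues.containsKey(field.name)) {
      return setValues.get(field.name);
    }
    return field.hasPresence ? null : field.defaultValue;
  }

  /**
   * Models both modes: extractAll when pinotFields is null, otherwise subset mode
   * driven by the Pinot schema field names.
   */
  static Map<String, Object> extract(List<Field> declared, Map<String, Object> setValues,
      List<String> pinotFields) {
    // Index declared fields by name, standing in for findFieldByName().
    Map<String, Field> byName = new LinkedHashMap<>();
    for (Field field : declared) {
      byName.put(field.name, field);
    }

    Map<String, Object> row = new LinkedHashMap<>();
    if (pinotFields == null) {
      // extractAll: every declared field produces a key, set or not.
      for (Field field : declared) {
        row.put(field.name, getFieldValue(field, setValues));
      }
    } else {
      // Subset: a name unknown to the proto resolves to null, and so does its value.
      for (String name : pinotFields) {
        Field field = byName.get(name);
        row.put(name, field == null ? null : getFieldValue(field, setValues));
      }
    }
    return row;
  }
}
```

With declared fields `id` (implicit presence, default 0), `nickname` (optional), and `tags` (repeated), and a message where only `id = 7` is set, extractAll mode in this model yields `{id=7, nickname=null, tags=[]}`, and subset mode over `["id", "missing_in_proto"]` yields `{id=7, missing_in_proto=null}`. In both modes every requested field ends up as a key in the row.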
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]