[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552379#comment-17552379 ]
ASF GitHub Bot commented on PARQUET-1020: ----------------------------------------- dossett commented on code in PR #963: URL: https://github.com/apache/parquet-mr/pull/963#discussion_r893846424 ########## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ########## @@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer recordConsumer) { public WriteContext init(Configuration configuration) { // if no protobuf descriptor was given in constructor, load descriptor from configuration (set with setProtobufClass) - if (protoMessage == null) { - Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class); - if (pbClass != null) { - protoMessage = pbClass; - } else { - String msg = "Protocol buffer class not specified."; - String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method."; - throw new BadConfigurationException(msg + hint); + if (descriptor == null) { + if (protoMessage == null) { + Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class); + if (pbClass != null) { + protoMessage = pbClass; + } else { + String msg = "Protocol buffer class or descriptor not specified."; + String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method."; + throw new BadConfigurationException(msg + hint); + } } + descriptor = Protobufs.getMessageDescriptor(protoMessage); + } else { + //Assume no specific Message extending class, so use DynamicMessage + protoMessage = DynamicMessage.class; Review Comment: Should this just be left null? Having it set implies it usable, but it's only used to determine the descriptor and get a protoschemaconverter and then only in cases when we don't have a descriptor. I just worry that setting it could lead to assumptions later on that it's as usable as if it were a generated message and not a dynamic message. ########## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ########## @@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer recordConsumer) { public WriteContext init(Configuration configuration) { // if no protobuf descriptor was given in constructor, load descriptor from configuration (set with setProtobufClass) - if (protoMessage == null) { - Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class); - if (pbClass != null) { - protoMessage = pbClass; - } else { - String msg = "Protocol buffer class not specified."; - String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method."; - throw new BadConfigurationException(msg + hint); + if (descriptor == null) { + if (protoMessage == null) { + Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class); + if (pbClass != null) { + protoMessage = pbClass; + } else { + String msg = "Protocol buffer class or descriptor not specified."; + String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method."; + throw new BadConfigurationException(msg + hint); + } } + descriptor = Protobufs.getMessageDescriptor(protoMessage); + } else { + //Assume no specific Message extending class, so use DynamicMessage + protoMessage = DynamicMessage.class; Review Comment: Should this just be left null? Having it set implies it usable, but it's only used to determine the descriptor and get a protoschemaconverter and then only in cases when we don't have a descriptor. I just worry that setting it could lead to assumptions later on that it's as usable as if it were a generated message and not a dynamic message. > Add support for Dynamic Messages in parquet-protobuf > ---------------------------------------------------- > > Key: PARQUET-1020 > URL: https://issues.apache.org/jira/browse/PARQUET-1020 > Project: Parquet > Issue Type: New Feature > Reporter: Alex Buck > Assignee: Alex Buck > Priority: Major > > Hello. We would like to pass in a DynamicMessage rather than using the > generated protobuf classes to allow us to make our job very generic. > I think this could be achieved by setting the descriptor upfront, similarly > to how there is a ProtoParquetOutputFormat today. > In ProtoWriteSupport in the init method it could then generate the parquet > schema created by ProtoSchemaConverter using the passed in descriptor, rather > than taking it from the generated proto class. > Would there be interest in incorporating this change? If so does the approach > above sound sensible? I am happy to do a pull request > initial PR here: https://github.com/apache/parquet-mr/pull/414 -- This message was sent by Atlassian Jira (v8.20.7#820007)