[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292627#comment-15292627 ] David Cardon commented on PARQUET-388: -- A workaround I found was to replace all of the {{Message}} references in [ProtoReadSupport|https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoReadSupport.java] with {{MessageOrBuilder}}. You can then fairly easily resolve the {{Builder}} into a {{Message}} outside of the reading mechanism. > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127838#comment-15127838 ] Matt Martin commented on PARQUET-388: - I agree with [~wxiang7] that there seems to be something a little bit off here. I'm also getting an unexpected ClassCastException in the following Scala code: {code} val reader = ParquetReader.builder(new ProtoReadSupport[SomeMessageClass](), new Path(file.toURI)).build ... reader.read {code} At reader.read I get the following exception: {code} ClassCastException: : SomeMessageClass$Builder cannot be cast to SomeMessageClass {code} I cannot change the declaration of reader to the following: {code} val reader = ParquetReader.builder(new ProtoReadSupport[SomeMessageClass$Builder](), new Path(file.toURI)).build {code} because then I get the following error: {code} type arguments [SomeMessageClass$Builder] do not conform to class ProtoReadSupport's type parameter bounds [T <: com.google.protobuf.Message] {code} > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987434#comment-14987434 ] Lukas Nalezenec commented on PARQUET-388: - > To be sure, when you say 'The cast is correct', do you mean by that > ProtoRecordConverter::getCurrentRecord would work correctly ? No, it won't work correctly but ProtoRecordConverter::getCurrentRecord will work correctly. > The signature for ProtoReadSupport is as follows, which leads > ProtoReadSupport to compile time error. It looks like there is bug in the signature. IMO it should be public class ProtoReadSupport extends ReadSupport {} I will look at it later today. Thank you very much. > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987136#comment-14987136 ] Lukas Nalezenec commented on PARQUET-388: - Hi, author here. The cast is of correct and the buildBefore flag is extra feature, its not used much. Do you really need data converted from Builder to Message ? Using Builder is more efficient. > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987875#comment-14987875 ] Reuben Kuhnert commented on PARQUET-388: Just as a general comment, returning {{}} does feel very mysterious to me. Does it make sense to have two explicit methods like {{asBuilder}} and {{asMessage}} in addition to the current {{getCurrentRecord}}? > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988762#comment-14988762 ] Wu Xiang commented on PARQUET-388: -- One more suggestion, let's just be honest to align the type signature of {{ProtoRecordConverter}} class definition and {{getCurrentRecord}} method. I mean, when we use {{ProtoRecordConverter}} (or ProtoReadSupport), only MyProtoMessage instances would be returned from getCurrentRecord. On the other hand, when we use {{ProtoRecordConverter}} or ProtoReadSupport, only MyProtoMessage.Builder instances returned. In this way, I would feel less confused about ProtoRecordConverter API. > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987295#comment-14987295 ] Wu Xiang commented on PARQUET-388: -- Hi Lukas, Thanks for your reply. To be sure, when you say 'The cast is correct', do you mean by that {{ProtoRecordConverter::getCurrentRecord}} would work correctly ? In my case, I created a instance of {{ProtoReadSupport}}, which then created {{RecordMaterializer}} and {{ProtoRecordConverter}} instances internally. Since the {{buildBefore}} flag is not set, {{ProtoRecordConverter::getCurrentRecord}} would definitely cast a {{MyProtoMessage.Builder}} instance to {{MyProtoMessage}}, with runtime exceptions. P.S. The signature for {{ProtoReadSupport}} is as follows, which leads {{ProtoReadSupport}} to compile time error. {code} public class ProtoReadSupport extends ReadSupport {} {code} wu > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986645#comment-14986645 ] Wu Xiang commented on PARQUET-388: -- hi Reuben, Yes, I agree that current record could be either Message or Message.Builder. But the problem is when I use it, the 'buildBefore' flag would be always false, which means the Builder instance is always returned. More details, I'm implementing a wrapper for cascading to read/write parquet|protobuf data. And the default ProtoReadSupport class is used to help with that. The signature for ProtoReadSupport is as follows, where only Mesage type is declared as the type parameter. {code:title=ProtoReadSupport.java|borderStyle=solid} public class ProtoReadSupport extends ReadSupport { } {code} But the RecordMaterializer inside ProtoReadSupport is actually returning Message.Builder. This is where type inconsistency comes from. To make it work, I need to add some hacks for RecordMaterializer (ProtoRecordConverter), by switch the 'buildBefore' flag. {code:title=ProtoReadSupport.java|borderStyle=solid} @Override public RecordMaterializer prepareForRead(Configuration configuration, MapkeyValueMetaData, MessageType fileSchema, ReadContext readContext) { String headerProtoClass = keyValueMetaData.get(PB_CLASS); String configuredProtoClass = configuration.get(PB_CLASS); if (configuredProtoClass != null) { LOG.debug("Replacing class " + headerProtoClass + " by " + configuredProtoClass); headerProtoClass = configuredProtoClass; } if (headerProtoClass == null) { throw new RuntimeException("I Need parameter " + PB_CLASS + " with Protocol Buffer class"); } LOG.debug("Reading data with Protocol Buffer class " + headerProtoClass); MessageType requestedSchema = readContext.getRequestedSchema(); Class protobufClass = Protobufs.getProtobufClass(headerProtoClass); ProtoRecordMaterializer materializer = new ProtoRecordMaterializer(requestedSchema, protobufClass); ((ProtoRecordConverter)materializer.getRootConverter()).setBuildBefore(true); return materializer; } {code} > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)