[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614878#comment-17614878 ]
ASF GitHub Bot commented on PARQUET-1711: ----------------------------------------- jinyius commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272784474 > Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into one. Given this PR is earlier, would it be a good idea to incorporate #995 into this PR for what is missing? @matthieun can add @jinyius as a co-author in that case. > > Does it make sense to both of you? i don't think merging will help here. both approaches do similar things in terms of traversing and expanding out the schema on recursive fields. the differ on the state used during the traversal, and they differ on how to deal with the remaining recursive data (this one silently ignores, but the mine stores as serialized bytes). i don't care about authorship. i want this to get fixed, and fixed properly. > [parquet-protobuf] stack overflow when work with well known json type > --------------------------------------------------------------------- > > Key: PARQUET-1711 > URL: https://issues.apache.org/jira/browse/PARQUET-1711 > Project: Parquet > Issue Type: Bug > Affects Versions: 1.10.1 > Reporter: Lawrence He > Priority: Major > > Writing following protobuf message as parquet file is not possible: > {code:java} > syntax = "proto3"; > import "google/protobuf/struct.proto"; > package test; > option java_outer_classname = "CustomMessage"; > message TestMessage { > map<string, google.protobuf.ListValue> data = 1; > } {code} > Protobuf introduced "well known json type" such like > [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue] > to work around json schema conversion. > However writing above messages traps parquet writer into an infinite loop due > to the "general type" support in protobuf. Current implementation will keep > referencing 6 possible types defined in protobuf (null, bool, number, string, > struct, list) and entering infinite loop when referencing "struct". > {code:java} > java.lang.StackOverflowErrorjava.lang.StackOverflowError at > java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418) at > java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410) at > java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044) > at > java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043) > at > org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64) > at > org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) > at > org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66) > at > org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) > at > org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66) > at > org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) > at > org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66) > at > org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) > at > org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66) > at > org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)