[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-05-26 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542604#comment-17542604
 ] 

J Y commented on PARQUET-1711:
--

i believe google's 
[Struct|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto]
 runs afoul of this as well.

any new progress here?
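
For context: the well-known types in struct.proto are mutually recursive at the descriptor level (Struct -> Value -> Struct, and Value -> ListValue -> Value), which is exactly the cycle a naive descriptor walk like ProtoSchemaConverter's never exits. A minimal sketch, not part of parquet-mr and only assuming protobuf-java on the classpath, that makes the cycle visible:

{code:java}
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Struct;

import java.util.ArrayDeque;
import java.util.Deque;

// Follows message-typed fields starting from google.protobuf.Struct (the same
// traversal a schema converter performs) and prints each cycle instead of
// recursing into it forever.
public class StructCycleDemo {
  public static void main(String[] args) {
    walk(Struct.getDescriptor(), new ArrayDeque<>());
  }

  static void walk(Descriptor type, Deque<String> path) {
    if (path.contains(type.getFullName())) {
      System.out.println("cycle: " + String.join(" -> ", path) + " -> " + type.getFullName());
      return;
    }
    path.addLast(type.getFullName());
    for (FieldDescriptor field : type.getFields()) {
      if (field.getJavaType() == FieldDescriptor.JavaType.MESSAGE) {
        walk(field.getMessageType(), path);
      }
    }
    path.removeLast();
  }
}
{code}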

> [parquet-protobuf] stack overflow when work with well known json type
> -
>
> Key: PARQUET-1711
> URL: https://issues.apache.org/jira/browse/PARQUET-1711
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Lawrence He
>Priority: Major
>
> Writing the following protobuf message as a parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>   map<string, google.protobuf.ListValue> data = 1;
> } {code}
> Protobuf introduced "well known json types" such as 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around json schema conversion. 
> However, writing the above message traps the parquet writer in an infinite loop due 
> to the "general type" support in protobuf. The current implementation keeps 
> referencing the 6 possible types defined in protobuf (null, bool, number, string, 
> struct, list) and enters an infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowError
>   at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>   at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>   at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>   at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-06-09 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552396#comment-17552396
 ] 

J Y commented on PARQUET-1711:
--

i'd be ok with that approach: a proto option annotation to set the recursion 
limit, then failing over to treating the field as proto bytes.  if the recursion 
limit is omitted, just treat the recursive definition as bytes after the first 
occurrence.

forgive me if this is a naive question, but what's the difficulty in supporting 
"typing" properly to handle recursive nesting?

> [parquet-protobuf] stack overflow when work with well known json type
> -
>
> Key: PARQUET-1711
> URL: https://issues.apache.org/jira/browse/PARQUET-1711
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Lawrence He
>Priority: Major
>
> Writing the following protobuf message as a parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>   map<string, google.protobuf.ListValue> data = 1;
> } {code}
> Protobuf introduced "well known json types" such as 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around json schema conversion. 
> However, writing the above message traps the parquet writer in an infinite loop due 
> to the "general type" support in protobuf. The current implementation keeps 
> referencing the 6 possible types defined in protobuf (null, bool, number, string, 
> struct, list) and enters an infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowError
>   at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>   at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>   at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>   at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-06-09 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552396#comment-17552396
 ] 

J Y edited comment on PARQUET-1711 at 6/9/22 7:16 PM:
--

i'd be ok with that approach: a proto option annotation to set the recursion 
limit, then failing over to treating the field as proto bytes.  if the recursion 
limit is omitted, just treat the recursive definition as bytes after the first 
occurrence.

forgive me if this is a naive question, but what's the difficulty in supporting 
"typing" properly to handle recursive nesting?

 

PARQUET-129 is very much related/the same issue...


was (Author: jinyius):
i'd be ok with that approach: a proto option annotation to set the recursion 
limit, then failing over to treating the field as proto bytes.  if the recursion 
limit is omitted, just treat the recursive definition as bytes after the first 
occurrence.

forgive me if this is a naive question, but what's the difficulty in supporting 
"typing" properly to handle recursive nesting?

> [parquet-protobuf] stack overflow when work with well known json type
> -
>
> Key: PARQUET-1711
> URL: https://issues.apache.org/jira/browse/PARQUET-1711
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Lawrence He
>Priority: Major
>
> Writing the following protobuf message as a parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>   map<string, google.protobuf.ListValue> data = 1;
> } {code}
> Protobuf introduced "well known json types" such as 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around json schema conversion. 
> However, writing the above message traps the parquet writer in an infinite loop due 
> to the "general type" support in protobuf. The current implementation keeps 
> referencing the 6 possible types defined in protobuf (null, bool, number, string, 
> struct, list) and enters an infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowError
>   at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>   at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>   at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>   at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>   at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>   at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (PARQUET-2180) make the default behavior for proto writing not-backwards compatible

2022-08-30 Thread J Y (Jira)
J Y created PARQUET-2180:


 Summary: make the default behavior for proto writing not-backwards 
compatible
 Key: PARQUET-2180
 URL: https://issues.apache.org/jira/browse/PARQUET-2180
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-protobuf
Reporter: J Y


https://issues.apache.org/jira/browse/PARQUET-968 introduced support for writing 
maps and lists in a spec-compliant way.  however, to avoid breaking existing 
libraries, a flag was introduced that defaulted the write behavior to NOT use the 
spec-compliant layout.

it's been over 5 years, and people should really be off of it by now.  so much so 
that trying to use the new parquet-cli tool to read parquet files generated by 
flink doesn't work b/c it's hard coded to never allow the old style.  the 
deprecated parquet-tools reads these files fine b/c it expects the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped. 
 we really should just move on at this point, imho.  protobufs often have 
repeated primitives and maps now, so it just makes sense to make the 
spec-compliant layout the default.  we should keep the flag around and let people 
override it back to the backwards-compatible behavior though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, and i'm unclear about the deep downstream 
ramifications of this change, so i would love to get feedback in this area.
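
For reference, the flag in question appears to be the writeSpecsCompliant switch that PARQUET-968 added to ProtoWriteSupport; as I understand the current parquet-protobuf API, a writer has to opt in explicitly, roughly like this (a sketch, assuming Hadoop Configuration plumbing):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.proto.ProtoWriteSupport;

public class OptInToSpecCompliantWrites {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // true = spec-compliant LIST/MAP groups; the current default (false) keeps the
    // legacy layout, which is the default this ticket proposes to flip.
    ProtoWriteSupport.setWriteSpecsCompliant(conf, true);
    // pass `conf` to whatever builds the ProtoParquetWriter / ParquetOutputFormat
  }
}
{code}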



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2180) make the default behavior for proto writing not-backwards compatible

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2180:
-
Description: 
https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
and lists in a spec compliant way.  however, to not break existing libraries, a 
flag was introduced and defaulted the write behavior to NOT use the specs 
compliant writes.

it's been over 5 years, and people should be really off of it.  so much so, 
that trying to use the new parquet-cli tool to read parquet files generated by 
flink doesn't work b/c it's hard coded to never allow the old style.  the 
deprecated parquet-tools reads these files fine b/c it's the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped. 
 we really should just move on at this point, imho.  protobufs often have 
repeated primitives and maps now, so it just makes sense to move on at this 
point.  we should keep the flag around and let people override it back to being 
backwards compatible though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
ramifications of this change, so i would love to get feedback in this area.

  was:
https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
and lists in a spec compliant way.  however, to not break existing libraries, a 
flag was introduced and defaulted the write behavior to NOT use the specs 
compliant writes.

it's been over 5 years, and people should be really off of it.  so much so, 
that trying to use the new parquet-cli tool to read parquet files generated by 
flink using doesn't work b/c it's hard coded to never allow the old style.  the 
deprecated parquet-tools reads these files fine b/c it's the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped. 
 we really should just move on at this point, imho.  protobufs often have 
repeated primitives and maps now, so it just makes sense to move on at this 
point.  we should keep the flag around and let people override it back to being 
backwards compatible though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
ramifications of this change, so i would love to get feedback in this area.


> make the default behavior for proto writing not-backwards compatible
> 
>
> Key: PARQUET-2180
> URL: https://issues.apache.org/jira/browse/PARQUET-2180
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Reporter: J Y
>Priority: Minor
>
> https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
> and lists in a spec compliant way.  however, to not break existing libraries, 
> a flag was introduced and defaulted the write behavior to NOT use the specs 
> compliant writes.
> it's been over 5 years, and people should be really off of it.  so much so, 
> that trying to use the new parquet-cli tool to read parquet files generated 
> by flink doesn't work b/c it's hard coded to never allow the old style.  the 
> deprecated parquet-tools reads these files fine b/c it's the older style.
> i started coding up a workaround in flink-parquet and parquet-cli, but 
> stopped.  we really should just move on at this point, imho.  protobufs often 
> have repeated primitives and maps now, so it just makes sense to move on at 
> this point.  we should keep the flag around and let people override it back 
> to being backwards compatible though.
> i have the code written and can submit a PR if you'd like.
> i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
> ramifications of this change, so i would love to get feedback in this area.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2180) make the default behavior for proto writing not-backwards compatible

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2180:
-
Description: 
https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
and lists in a spec compliant way.  however, to not break existing libraries, a 
flag was introduced and defaulted the write behavior to NOT use the specs 
compliant writes.

it's been over 5 years, and people should be really off of it.  so much so, 
that trying to use the new parquet-cli tool to read parquet files generated by 
flink doesn't work b/c it's hard coded to never allow the old style.  the 
deprecated parquet-tools reads these files fine b/c it's the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped. 
 we really should just move on at this point, imho.  protobufs often have 
repeated primitives and maps, so it's more pressing to get proper specs 
compliant support for it now.  we should keep the flag around and let people 
override it back to being backwards compatible though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
ramifications of this change, so i would love to get feedback in this area.

  was:
https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
and lists in a spec compliant way.  however, to not break existing libraries, a 
flag was introduced and defaulted the write behavior to NOT use the specs 
compliant writes.

it's been over 5 years, and people should be really off of it.  so much so, 
that trying to use the new parquet-cli tool to read parquet files generated by 
flink doesn't work b/c it's hard coded to never allow the old style.  the 
deprecated parquet-tools reads these files fine b/c it's the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped. 
 we really should just move on at this point, imho.  protobufs often have 
repeated primitives and maps now, so it just makes sense to move on at this 
point.  we should keep the flag around and let people override it back to being 
backwards compatible though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
ramifications of this change, so i would love to get feedback in this area.


> make the default behavior for proto writing not-backwards compatible
> 
>
> Key: PARQUET-2180
> URL: https://issues.apache.org/jira/browse/PARQUET-2180
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Reporter: J Y
>Priority: Minor
>
> https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps 
> and lists in a spec compliant way.  however, to not break existing libraries, 
> a flag was introduced and defaulted the write behavior to NOT use the specs 
> compliant writes.
> it's been over 5 years, and people should be really off of it.  so much so, 
> that trying to use the new parquet-cli tool to read parquet files generated 
> by flink doesn't work b/c it's hard coded to never allow the old style.  the 
> deprecated parquet-tools reads these files fine b/c it's the older style.
> i started coding up a workaround in flink-parquet and parquet-cli, but 
> stopped.  we really should just move on at this point, imho.  protobufs often 
> have repeated primitives and maps, so it's more pressing to get proper specs 
> compliant support for it now.  we should keep the flag around and let people 
> override it back to being backwards compatible though.
> i have the code written and can submit a PR if you'd like.
> i'm not an expert in parquet though, so i'm unclear as to the deep downstream 
> ramifications of this change, so i would love to get feedback in this area.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)
J Y created PARQUET-2181:


 Summary: parquet-cli fails at supporting parquet-protobuf 
generated schemas that have repeated primitives in them
 Key: PARQUET-2181
 URL: https://issues.apache.org/jira/browse/PARQUET-2181
 Project: Parquet
  Issue Type: Bug
  Components: parquet-cli
Reporter: J Y


i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

using the old parquet-tools binary to cat this file works fine.
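
The conversion described above can be reproduced without parquet-cli; a small sketch follows, where SomeEvent stands for the generated class of the (obfuscated) proto and the boolean constructor argument is the spec-compliant switch added by PARQUET-968, as I understand the ProtoSchemaConverter API:

{code:java}
import org.apache.parquet.proto.ProtoSchemaConverter;
import org.apache.parquet.schema.MessageType;

public class PrintProtoSchema {
  public static void main(String[] args) {
    // `SomeEvent` is the hypothetical generated class for the proto definition above.
    MessageType schema = new ProtoSchemaConverter(true).convert(SomeEvent.class);
    // prints the 3-level LIST group that the Avro-based reader path in parquet-cli
    // chokes on (ClassCastException: required int32 element is not a group)
    System.out.println(schema);
  }
}
{code}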



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {{}}
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apach

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list }
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.p

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list }
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.p

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {{}}
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apach

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:
{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
optional IndexPath client_position = 1;
}
{quote}
this gets translated to the following parquet schema using the new compliant 
schema for lists:
{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}{quote}
this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more
{quote}
using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
  }
    }
  }
}{quote}

this causes parquet-cli cat to barf on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{code:java}
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{code}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{code:java}
message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
{code}

this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
        at 
org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
        at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at 
org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more
{quote}
using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{code:java}
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
optional IndexPath client_position = 1;
}
{code}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{code:java}
message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
{code}

this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{code:java}
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
optional IndexPath client_position = 1;
}
{code}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{code:java}
message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
{code}

this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
        at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more
{quote}
using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:
{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
optional IndexPath client_position = 1;
}
{quote}
this gets translated to the following parquet schema using the new compliant 
schema for lists:
{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}{quote}
this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        at 
org.apache.pa

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Description: 
i generated a parquet file using a protobuf with this proto definition:

{code:java}
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{code}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{code:java}
message SomeEvent {
  optional group client_position = 1 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
{code}

this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
        at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more
{quote}
using the old parquet-tools binary to cat this file works fine.

  was:
i generated a parquet file using a protobuf with this proto definition:

{code:java}
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{code}

this gets translated to the following parquet schema using the new compliant 
schema for lists:

{code:java}
message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
{code}

this causes parquet-cli cat to barf on a file containing these events:
{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at 
org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
        at 
org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
        at 
org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at 
org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
        a

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Attachment: samples.tgz

> parquet-cli fails at supporting parquet-protobuf generated schemas that have 
> repeated primitives in them
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Major
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598196#comment-17598196
 ] 

J Y commented on PARQUET-2181:
--

I've attached some parquet files that all read fine using parquet-tools (both 
the deprecated version from parquet-mr and the [one written in 
python|https://github.com/ktrueda/parquet-tools]) *but do not read at all using 
parquet-cli*. parquet-cli's meta command works fine.

It turns out there are other stack traces when trying to use parquet-cli to read 
these files. In addition to the repeated-primitive issue highlighted 
originally, these files exhibit two other problems, such as the following:

{quote}--- ./raw/delivery-log/dt=2022-08-10/hour=04/part-02a95a0e-bd21-4476-9d0f-d1896687b12a-0
Argument error: Map key type must be binary (UTF8): required int32 key

--- ./raw/user/dt=2022-08-10/hour=04/part-8cac1d0c-fb7f-4a9a-b77e-b3dd59f89333-0
Unknown error
java.lang.RuntimeException: Failed on record 0
at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
at org.apache.parquet.cli.Main.run(Main.java:157)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'null_value' not found
at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:221)
{quote}

Is using AvroReadSupport and AvroRecordConverter the right way to go for 
protobufs? It looks like the parquet-tools that was deprecated in 1.12.3+ 
doesn't use the parquet-avro approach to reading (it uses [its own 
SimpleReadSupport 
approach|https://github.com/apache/parquet-mr/tree/apache-parquet-1.12.2/parquet-tools-deprecated/src/main/java/org/apache/parquet/tools/read]), 
which makes sense to me given that the underlying schema and data written to 
parquet-protobuf generated files are not Avro...

Should we move parquet-cli back to SimpleReadSupport instead of relying on what 
appears to be a broken AvroReadSupport when dealing with proto-generated files?
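
(For comparison, here is a minimal sketch of what a non-Avro, Group-based reader 
looks like, in the spirit of the deprecated parquet-tools SimpleReadSupport 
approach. This is not a proposed patch to parquet-cli; the class name and the 
file-path argument are made up for illustration.)

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class SimpleCat {
  public static void main(String[] args) throws Exception {
    // Reads records as untyped Groups directly against the file's own schema,
    // with no Avro schema translation in between.
    try (ParquetReader<Group> reader =
             ParquetReader.builder(new GroupReadSupport(), new Path(args[0])).build()) {
      for (Group record = reader.read(); record != null; record = reader.read()) {
        System.out.println(record);  // Group.toString() dumps the fields
      }
    }
  }
}
{code}

Something along these lines sidesteps the Avro translation entirely, which is 
essentially what the old parquet-tools did.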

> parquet-cli fails at supporting parquet-protobuf generated schemas that have 
> repeated primitives in them
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Major
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.

[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Priority: Critical  (was: Major)

> parquet-cli fails at supporting parquet-protobuf generated schemas that have 
> repeated primitives in them
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-08-30 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Summary: parquet-cli fails at supporting parquet-protobuf generated files  
(was: parquet-cli fails at supporting parquet-protobuf generated schemas that 
have repeated primitives in them)

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-08-31 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598508#comment-17598508
 ] 

J Y commented on PARQUET-2181:
--

The more I think about this, the more I believe using Avro reading as the basis 
for parquet reading is broken. For example, {{Argument error: Map key type 
must be binary (UTF8): required int32 key}} is due to Avro requiring all map 
keys to be strings. Parquet and protos do not have this limitation. Avro as 
the schema definition doesn't seem expressive enough to interoperate easily 
with these other formats.
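
(A minimal illustration of that limitation, using only Avro's own schema API; 
the class name is made up for the example.)

{code:java}
import org.apache.avro.Schema;

public class AvroMapKeys {
  public static void main(String[] args) {
    // Avro's map type takes only a value schema: keys are always strings.
    Schema mapSchema = Schema.createMap(Schema.create(Schema.Type.LONG));
    System.out.println(mapSchema);  // {"type":"map","values":"long"}

    // There is no way to declare an Avro map keyed by int, so a parquet MAP
    // (or a proto map) with int32 keys cannot survive the Avro translation --
    // hence "Map key type must be binary (UTF8): required int32 key".
  }
}
{code}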

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-02 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Attachment: sample-depth-1.tgz

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-02 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599721#comment-17599721
 ] 

J Y commented on PARQUET-2181:
--

So here's an interesting update. One of my colleagues who is more familiar 
with this area suggested that the schema itself is probably too large. I 
regenerated this data using a more limited recursion depth, and the parquet-cli 
tool worked fine. I'm attaching the set of files that work 
([^sample-depth-1.tgz]).

So it seems the core issue here is that there's a hard limit somewhere on the 
schema size...
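
(To make the "limited recursion depth" idea concrete, here is a rough, 
hypothetical sketch of a depth-limited walk over a proto descriptor, just to 
show the general shape of the workaround. It is not the parquet-protobuf 
converter code; {{maxDepth}} and the "truncated to bytes" handling are 
assumptions.)

{code:java}
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;

public final class DepthLimitedSchemaWalker {
  // Stop expanding nested message fields once maxDepth is reached, e.g. by
  // emitting the field as raw bytes instead of recursing into another group.
  public static void walk(Descriptor message, int depth, int maxDepth) {
    for (FieldDescriptor field : message.getFields()) {
      String indent = "  ".repeat(depth);
      if (field.getJavaType() == FieldDescriptor.JavaType.MESSAGE) {
        if (depth >= maxDepth) {
          System.out.println(indent + field.getName() + ": truncated to bytes");
        } else {
          System.out.println(indent + field.getName() + ": group");
          walk(field.getMessageType(), depth + 1, maxDepth);
        }
      } else {
        System.out.println(indent + field.getName() + ": " + field.getJavaType());
      }
    }
  }
}
{code}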

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-03 Thread J Y (Jira)


[ https://issues.apache.org/jira/browse/PARQUET-2181 ]


J Y deleted comment on PARQUET-2181:
--

was (Author: jinyius):
so here's an interesting update.  one of my colleagues who is more familiar 
with this area suggested that the schema itself is probably too large.  i 
regenerated this data using a more limited recursion depth, and the parquet-cli 
tool worked fine.  i'm attaching the set of files that work 
([^sample-depth-1.tgz]).

so it seems the core issue here is that there's a hard limit somewhere on the 
schema sizes... 

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-03 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Attachment: (was: sample-depth-1.tgz)

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-03 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599874#comment-17599874
 ] 

J Y commented on PARQUET-2181:
--

Here's the same data with a more limited recursion depth to keep the schema 
more manageable: [^sample-depth-1.tgz]

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-03 Thread J Y (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Y updated PARQUET-2181:
-
Attachment: sample-depth-1.tgz

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated files

2022-09-07 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601491#comment-17601491
 ] 

J Y commented on PARQUET-2181:
--

[~theosib-amazon], I think they're similar in that there are schema issues with 
the Avro conversion. I believe the root cause is that using Avro internally to 
read parquet files loses expressiveness, so you get Avro schema validation or 
mismatched-schema issues as a consequence.

Spec-wise, you have to work around the limitations of Avro's DSL to truly 
capture the parquet schema. I believe people typically don't hit this, since 
the usual open source pattern is to start with an Avro schema as the basis; 
for those who don't, there will be problems.

> parquet-cli fails at supporting parquet-protobuf generated files
> 
>
> Key: PARQUET-2181
> URL: https://issues.apache.org/jira/browse/PARQUET-2181
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Reporter: J Y
>Priority: Critical
> Attachments: sample-depth-1.tgz, samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:539)
>         at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:489)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:137)
>         at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:91)
>         at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>         at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at 
> org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-09-21 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608085#comment-17608085
 ] 

J Y commented on PARQUET-1711:
--

Ping: the PR I wrote is the byte-encoding version that 
[~emkornfi...@gmail.com] suggested in 
https://issues.apache.org/jira/browse/PARQUET-1711?focusedCommentId=17543672&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17543672
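
(For context, the "byte encoding" idea is to stop expanding a recursive message 
at some depth and store the remaining subtree as its serialized proto bytes in 
a single BINARY value. The sketch below only illustrates that round trip with 
the well-known {{Struct}} type; it is not the code from the PR.)

{code:java}
import com.google.protobuf.ByteString;
import com.google.protobuf.Struct;
import org.apache.parquet.io.api.Binary;

public final class RecursiveFieldAsBytes {
  // Instead of expanding Value -> Struct -> Value -> ... forever, the nested
  // message is written as a single BINARY value holding its wire format.
  public static Binary toBinary(Struct nested) {
    return Binary.fromConstantByteArray(nested.toByteArray());
  }

  // Reading it back is just parsing the bytes with the generated parser.
  public static Struct fromBinary(Binary column) throws Exception {
    return Struct.parseFrom(ByteString.copyFrom(column.toByteBuffer()));
  }
}
{code}

The trade-off is that the truncated subtree is opaque to parquet (no column 
projection or filtering inside it), but schema conversion and writes terminate.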



[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-11-23 Thread J Y (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638109#comment-17638109
 ] 

J Y commented on PARQUET-1711:
--

this should now be resolved by #995.
