[https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362991#comment-17362991]
Andreas Hailu commented on PARQUET-1681:
----------------------------------------
Hi folks, we're running into this issue as well. Bob Smith was able to provide
a unit test that reproduces it in PARQUET-1254.
I personally like Xinli's idea of storing information such as the list type in
a separate 'metadata' field rather than mangling the actual schema. That way,
Avro <-> Parquet schema conversion always stays compatible and doesn't create
unforeseen pitfalls.
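The sketch below illustrates the metadata idea. The writer records which list layout it used as a key-value entry next to the file's other footer metadata. The reader then looks that entry up instead of inferring the layout from names.

This is a plain-Scala model under stated assumptions. The key `example.avro.list-encoding`, the `ListEncodingMetadata` object and its two encodings are hypothetical. They are not an existing parquet-avro key or API.

```scala
object ListEncodingMetadata {
  // Hypothetical key name for illustration only; parquet-avro defines no such key.
  val ListEncodingKey = "example.avro.list-encoding"

  sealed trait ListEncoding { def tag: String }
  case object TwoLevel extends ListEncoding { val tag = "two-level" }
  case object ThreeLevel extends ListEncoding { val tag = "three-level" }

  private val byTag: Map[String, ListEncoding] =
    List[ListEncoding](TwoLevel, ThreeLevel).map(e => e.tag -> e).toMap

  // Writer side: add the list layout to the file's key-value metadata.
  def record(meta: Map[String, String], enc: ListEncoding): Map[String, String] =
    meta + (ListEncodingKey -> enc.tag)

  // Reader side: use the recorded layout when present. None means the reader
  // must fall back to inferring the layout (e.g. files written before the key existed).
  def lookup(meta: Map[String, String]): Option[ListEncoding] =
    meta.get(ListEncodingKey).flatMap(byTag.get)
}
```

The design point is that the schema itself stays untouched. The extra hint lives alongside it, so converting Avro <-> Parquet never has to rename anything to encode the list type.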
> Avro's isElementType() change breaks the reading of some parquet(1.8.1) files
> -----------------------------------------------------------------------------
>
> Key: PARQUET-1681
> URL: https://issues.apache.org/jira/browse/PARQUET-1681
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-avro
> Affects Versions: 1.10.0, 1.9.1, 1.11.0
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Critical
>
> When the Avro schema below is used to write a file with parquet 1.8.1 and the
> file is then read back with parquet 1.10.1 without passing any schema, the
> read throws the exception "XXX is not a group". Reading with parquet 1.8.1
> works fine.
> {
>   "name": "phones",
>   "type": [
>     "null",
>     {
>       "type": "array",
>       "items": {
>         "type": "record",
>         "name": "phones_items",
>         "fields": [
>           {
>             "name": "phone_number",
>             "type": ["null", "string"],
>             "default": null
>           }
>         ]
>       }
>     }
>   ],
>   "default": null
> }
> The code used to read the file is:
> import org.apache.hadoop.conf.Configuration
> import org.apache.parquet.avro.AvroParquetReader
> val reader =
>   AvroParquetReader.builder[SomeRecordType](parquetPath).withConf(new Configuration).build()
> reader.read()
> PARQUET-651 changed isElementType() to rely on Avro's
> checkReaderWriterCompatibility() for the compatibility check. However,
> checkReaderWriterCompatibility() considers the Parquet schema and the Avro
> schema (converted from the file schema) incompatible. The record name in the
> Avro schema is ‘phones_items’, but the name in the Parquet schema is ‘array’.
> isElementType() therefore returns false, and the “phone_number” field in the
> schema above is treated as a group type, which it is not. The exception is
> then thrown from .asGroupType().
> I haven't tried whether writing with parquet 1.10.1 reproduces the same
> problem. It might, because the translation of the Avro schema to a Parquet
> schema has not changed (not verified yet).
> I hesitate to revert PARQUET-651 because it solved several problems. I would
> like to hear the community's thoughts on it.
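The isElementType() failure described in the issue can be modeled in a few lines. The model rests on these assumptions:

- parquet 1.8.1 writes "phones" as a two-level list: a repeated group named `array` whose single child is the optional `phone_number`.
- The compatibility check rejects records whose names differ. This approximates Avro's behavior as the issue reports it.

`PType`, `PPrimitive`, `PGroup`, `AvroRecord` and `ElementTypeSketch` are hypothetical simplified stand-ins. They are not parquet-mr's or Avro's classes, and the heuristic below is a simplification of the real code.

```scala
// Hypothetical minimal model of Parquet types (not org.apache.parquet.schema).
sealed trait PType { def name: String; def repeated: Boolean }
final case class PPrimitive(name: String, repeated: Boolean = false) extends PType
final case class PGroup(name: String, fields: List[PType], repeated: Boolean = false) extends PType

// Hypothetical stand-in for an Avro record schema: a name plus field names.
final case class AvroRecord(name: String, fieldNames: List[String])

object ElementTypeSketch {
  // Name-sensitive check, approximating how the compatibility check treats
  // records with different names as incompatible.
  def compatible(reader: AvroRecord, writer: AvroRecord): Boolean =
    reader.name == writer.name && reader.fieldNames.forall(f => writer.fieldNames.contains(f))

  // Simplified heuristic: is the repeated group itself the list element,
  // or a synthetic wrapper around a single element field?
  def isElementType(repeated: PType, element: AvroRecord): Boolean = repeated match {
    case _: PPrimitive => true
    case g: PGroup if g.fields.size != 1 || g.fields.head.repeated => true
    case g: PGroup =>
      // The repeated group converts to a record named after the group ("array").
      compatible(element, AvroRecord(g.name, g.fields.map(_.name)))
  }

  // If the repeated group is judged a wrapper, its single child becomes the
  // element; a record element then requires that child to be a group.
  def elementGroup(repeated: PType, element: AvroRecord): PGroup = {
    val candidate: PType = repeated match {
      case g: PGroup if !isElementType(g, element) => g.fields.head
      case other => other
    }
    candidate match {
      case g: PGroup => g
      case p: PPrimitive => throw new ClassCastException(s"${p.name} is not a group")
    }
  }
}
```

With the element record named `phones_items`, `isElementType` returns false. `elementGroup` then picks `phone_number` and fails with "phone_number is not a group", mirroring the reported error. If the record were named `array`, the check would pass and the repeated group itself would be used as the element.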
--
This message was sent by Atlassian Jira
(v8.3.4#803005)