[ 
https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362991#comment-17362991
 ] 

Andreas Hailu edited comment on PARQUET-1681 at 6/14/21, 7:49 PM:
------------------------------------------------------------------

Hi folks, we're running into this issue as well. Bob Smith provided a unit 
test in 2018 that reproduces it in PARQUET-1254.

I personally like [~sha...@uber.com]'s idea of having some sort of 'metadata' 
field to store information like the list type rather than mangling the actual 
schema, so that Avro <-> Parquet schema conversion always stays compatible and 
does not create unforeseen pitfalls.


was (Author: ahailu):
Hi folks, we're running into this issue as well. Bob Smith provided a unit 
test that reproduces it in PARQUET-1254.

I personally like Xinli's idea of having some sort of 'metadata' field to store 
information like the list type rather than mangling the actual schema, so that 
Avro <-> Parquet schema conversion always stays compatible and does not create 
unforeseen pitfalls.

> Avro's isElementType() change breaks the reading of some parquet(1.8.1) files
> -----------------------------------------------------------------------------
>
>                 Key: PARQUET-1681
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1681
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-avro
>    Affects Versions: 1.10.0, 1.9.1, 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Critical
>
> When the Avro schema below is used to write a Parquet 1.8.1 file and the file 
> is then read back with Parquet 1.10.1 without passing any schema, the read 
> throws the exception "XXX is not a group". Reading with Parquet 1.8.1 is fine. 
>            {
>               "name": "phones",
>               "type": [
>                 "null",
>                 {
>                   "type": "array",
>                   "items": {
>                     "type": "record",
>                     "name": "phones_items",
>                     "fields": [
>                       {
>                         "name": "phone_number",
>                         "type": [
>                           "null",
>                           "string"
>                         ],
>                         "default": null
>                       }
>                     ]
>                   }
>                 }
>               ],
>               "default": null
>             }
> The code used to read the file is: 
>     val reader = AvroParquetReader.builder[SomeRecordType](parquetPath)
>       .withConf(new Configuration).build()
>     reader.read()
> PARQUET-651 changed isElementType() to rely on Avro's 
> checkReaderWriterCompatibility() for the compatibility check. However, 
> checkReaderWriterCompatibility() considers the Parquet schema and the Avro 
> schema (converted from the file schema) incompatible, because the record is 
> named 'phones_items' in the Avro schema but 'array' in the Parquet schema. 
> isElementType() therefore returns false, which causes the "phone_number" 
> field in the above schema to be treated as a group type, which it is not. 
> The exception is then thrown at .asGroupType(). 
> I have not tried whether writing with Parquet 1.10.1 reproduces the same 
> problem. It could, because the translation from Avro schema to Parquet schema 
> has not changed (not verified yet). 
> I hesitate to revert PARQUET-651 because it solved several problems. I would 
> like to hear the community's thoughts on it. 
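
To make the failure mode in the quoted description concrete, here is a minimal Scala sketch of a name-sensitive element check. It is a deliberately simplified model, not parquet-avro or Avro source code: the types `FieldType` and `RecordType` and the functions `namesCompatible` and `isElementType` are hypothetical stand-ins. The only facts taken from the issue are the two record names ('phones_items' and 'array') and the observation that the mismatch makes the check return false.

```scala
// Simplified, hypothetical model of the behaviour described in the issue.
// None of these types or functions are real parquet-avro or Avro APIs.
sealed trait FieldType
case object StringType extends FieldType
final case class RecordType(name: String, fields: Map[String, FieldType]) extends FieldType

object ElementTypeSketch {
  // Stand-in for a name-sensitive compatibility check: two records count as
  // compatible only if both their names and their field names match.
  def namesCompatible(reader: RecordType, writer: RecordType): Boolean =
    reader.name == writer.name && reader.fields.keySet == writer.fields.keySet

  // Stand-in for isElementType(): the repeated type read from the file is
  // accepted as the list element only when the check above succeeds.
  def isElementType(fromFile: RecordType, expected: RecordType): Boolean =
    namesCompatible(expected, fromFile)

  // Record as named in the Avro schema from the issue.
  val expected: RecordType =
    RecordType("phones_items", Map("phone_number" -> StringType))

  // Same fields, but named 'array' as in the Parquet file schema.
  val fromFile: RecordType =
    RecordType("array", Map("phone_number" -> StringType))
}
```

In this model, `ElementTypeSketch.isElementType(fromFile, expected)` is false purely because of the name difference, even though the fields are identical. That mirrors the situation in the issue, where the reader then descends one level too far and treats "phone_number" as a group.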



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
