[ 
https://issues.apache.org/jira/browse/PARQUET-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736990#comment-17736990
 ] 

Gabor Szadovszky commented on PARQUET-2317:
-------------------------------------------

[~wgtmac], Let me summarize the history of this. parquet-format contains all 
the specification docs and the parquet.thrift itself which is a kind of source 
code and spec at the same time. This is good to have all of these separated 
from the implementations. Meanwhile, since the thrift file is there, it was 
natural to have Thrift code generation and the Util there as well. But it was 
not a good choice since we only had the java code there. In some new features 
we had to extend Util which is clearly related to parquet-mr. So, we decided to 
deprecate all of the java related stuff in parquet-format and moved them to 
parquet-format-structures under parquet-mr.
So, it would be good to not only have Util be removed but all the other java 
classes including the Thrift generated ones to be part of the jar.
The catch is we still need to have some mechanism that validates the thrift 
file so we won't add invalid changes. Also, the distribution should be changed 
because providing a jar file without java classes would not make sense. I 
think, we should release a tarball instead that contains all the specs and the 
thrift file as well. Of course, we would need to update the parquet-mr (and 
maybe other affected implementations) to download that tarball instead of the 
jar file.

> parquet-format and parquet-format-structures defines Util with inconsitent 
> methods provided
> -------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2317
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2317
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Joey Pereira
>            Priority: Major
>
> I have been running into a bug due to {{parquet-format}} and 
> {{parquet-format-structures}} both defining the 
> {{org.apache.parquet.format.Util}} class but doing so inconsistently.
> Examples of this are several methods which include a {{BlockCipher}} 
> parameter that are defined from {{parquet-format-structures}} but not 
> {{{}parquet-format{}}}. While invoking code that happens to use these, such 
> as {{{}org.apache.parquet.hadoop.ParquetFileReader.readFooter{}}}, the code 
> will fail if the {{parquet-format}} happens to be loaded first on the 
> classpath.
> Here is an example stack trace for a Scala Spark application.
> {code:java}
> Caused by: java.lang.NoSuchMethodError: 
> 'org.apache.parquet.format.FileMetaData 
> org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, 
> org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530)
>  ~[parquet_hadoop.jar:1.13.1]
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478)
>  ~[parquet_hadoop.jar:1.13.1]
> ... (my application code invoking the above)
> {code}
> Because of issues external to Parquet that I have yet to figure out (a 
> complex Spark and dependency setup), my classpaths are not deterministically 
> ordered and I am unable to pin the {{parquet-format-structures}} ahead hence 
> why I'm chiming in about this.
> Even if that weren't the case, this is a fairly prickly edge to run into as 
> both modules define overlapping classes. {{Util}} is not the only class that 
> appears to be defined by both, just what I have been focusing on due to this 
> bug.
> It appears these methods were introduced in at least 1.12: 
> [https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to