[ https://issues.apache.org/jira/browse/PARQUET-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279551#comment-17279551 ]
Antoine Pitrou commented on PARQUET-1974:
-----------------------------------------

[~gszadovszky] The status is that the Parquet-C++ developers did all they could to try to achieve compatibility. Now it's up to parquet-mr to pick up the ball. [~mario.luzi] Can you post the Parquet file so that they can take a look? If there is no resolution on the Java side, I will ask for LZ4 to be dropped from the Parquet spec. Right now Parquet is partly a proprietary standard due to under-specification.

> LZ4 decoding is not working over hadoop
> ----------------------------------------
>
>                 Key: PARQUET-1974
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1974
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.1
>            Reporter: mario luzi
>            Priority: Critical
>
> Hello, we just tried the latest apache-arrow version 3.0.0 and the write example included in the low-level API examples, but LZ4 still seems incompatible with Hadoop. We got this error when reading, over Hadoop, a Parquet file produced with the 3.0.0 library:
>
> [leal@sulu parquet]$ ./hadoop-3.2.2/bin/hadoop jar apache-parquet-1.11.1/parquet-tools/target/parquet-tools-1.11.1.jar head --debug parquet_2_0_example2.parquet
> 2021-02-04 21:24:36,354 INFO hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1500001 records.
> 2021-02-04 21:24:36,355 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
> 2021-02-04 21:24:36,397 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
> 2021-02-04 21:24:36,410 INFO hadoop.InternalParquetRecordReader: block read in memory in 55 ms.
> row count = 434436
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/home/leal/parquet/parquet_2_0_example2.parquet
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:255)
>     at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
>     at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
>     at org.apache.parquet.tools.command.HeadCommand.execute(HeadCommand.java:87)
>     at org.apache.parquet.tools.Main.main(Main.java:223)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.lang.IllegalArgumentException
>     at java.nio.Buffer.limit(Buffer.java:275)
>     at org.apache.hadoop.io.compress.lz4.Lz4Decompressor.decompress(Lz4Decompressor.java:232)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
>     at java.io.DataInputStream.readFully(DataInputStream.java:195)
>
> Any advice? We need to write LZ4 files from C++ and read them in Hadoop jobs, but we are still stuck on this problem.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
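Editor's note on the failure mode: the stack trace goes through Hadoop's `BlockDecompressorStream` and `Lz4Decompressor`, which expect Hadoop's own block framing around LZ4 data (a 4-byte big-endian uncompressed length, then length-prefixed compressed chunks), whereas a writer emitting bare LZ4 blocks produces no such headers, so the reader misinterprets compressed bytes as lengths and fails in `Buffer.limit()`. The sketch below illustrates only that framing; it is an assumption-laden simplification (chunking details vary), and an identity transform stands in for real LZ4 compression since the container, not the codec, is the point.

```python
import struct

def hadoop_frame(raw: bytes, chunk_size: int = 4) -> bytes:
    """Frame data roughly the way Hadoop's block codecs do (sketch):
    4-byte big-endian total uncompressed length, then for each chunk a
    4-byte big-endian compressed length followed by the chunk payload.
    An identity transform stands in for the real LZ4 block compressor."""
    out = [struct.pack(">I", len(raw))]
    for i in range(0, len(raw), chunk_size):
        chunk = raw[i:i + chunk_size]        # a real codec would LZ4-compress here
        out.append(struct.pack(">I", len(chunk)))
        out.append(chunk)
    return b"".join(out)

def hadoop_unframe(framed: bytes) -> bytes:
    """Inverse of hadoop_frame: read the declared total length, then
    concatenate the length-prefixed chunks. Feeding this parser a bare
    LZ4 block (which has no such headers) makes it treat compressed data
    as lengths -- the kind of mismatch behind the Buffer.limit() error."""
    (total,) = struct.unpack_from(">I", framed, 0)
    pos, parts, got = 4, [], 0
    while got < total:
        (clen,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        chunk = framed[pos:pos + clen]       # a real codec would LZ4-decompress here
        pos += clen
        parts.append(chunk)
        got += len(chunk)
    return b"".join(parts)

data = b"compatibility test payload"
assert hadoop_unframe(hadoop_frame(data)) == data
```

Round-tripping through both helpers recovers the original bytes, while stripping the headers (as a bare-LZ4 writer effectively does) breaks the parser. This mismatch is why the Parquet community later specified a separate, unambiguous LZ4_RAW codec.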