[
https://issues.apache.org/jira/browse/PARQUET-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514816#comment-14514816
]
Ryan Blue commented on PARQUET-258:
-----------------------------------
I think this duplicates PARQUET-251. [[email protected]], could you
verify that?
> Binary statistics is not updated correctly if an underlying Binary array is
> modified in place
> ---------------------------------------------------------------------------------------------
>
> Key: PARQUET-258
> URL: https://issues.apache.org/jira/browse/PARQUET-258
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.6.0
> Reporter: Konstantin Shaposhnikov
>
> The following test case shows the problem:
> {code}
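> // Two values are written from the same backing array, which is reused in place.
> byte[] bytes = new byte[] { 49 };
> BinaryStatistics reusableStats = new BinaryStatistics();
> reusableStats.updateStats(Binary.fromByteArray(bytes));
> bytes[0] = 50; // modify the array after it has been passed to the statistics
> reusableStats.updateStats(Binary.fromByteArray(bytes, 0, 1));
>
> // Expected: min is the first value written (49), max is the second (50).
> assertArrayEquals(new byte[] { 49 }, reusableStats.getMinBytes());
> assertArrayEquals(new byte[] { 50 }, reusableStats.getMaxBytes());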
> {code}
> I discovered the bug when converting an Avro file to a Parquet file by
> reading GenericRecords from the file using the [DataFileStream.next(D
> reuse)|http://javadox.com/org.apache.avro/avro/1.7.6/org/apache/avro/file/DataFileStream.html#next(D)]
> method. The problem is that the underlying byte array of the Avro Utf8 object
> is passed to Parquet, which saves it as part of BinaryStatistics, and then the
> same array is modified in place on the next read.
> I am not sure what the right way to fix the problem is (in BinaryStatistics
> or in AvroWriteSupport).
> If the BinaryStatistics implementation is correct as it stands (for
> performance reasons), then this behavior should be documented and
> AvroWriteSupport.fromAvroString should be fixed to copy the underlying Utf8
> array.
> I am happy to create a pull request once the desired fix has been agreed
> on.
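
The aliasing described above can be reproduced without Parquet or Avro. The following is only a minimal sketch under stated assumptions: `MinMax`, `copyOnStore`, and `run` are hypothetical names invented for illustration and are not part of parquet-mr, and the class is not the actual BinaryStatistics implementation. It shows how a min/max holder that keeps a reference to a caller-owned array ends up with a wrong minimum once the caller reuses the array, and how a defensive copy avoids that.

```java
import java.util.Arrays;

public class AliasingSketch {

    /** Hypothetical min/max holder; a stand-in, not Parquet's BinaryStatistics. */
    static final class MinMax {
        private final boolean copyOnStore; // assumption: toggles the defensive copy
        private byte[] min;
        private byte[] max;

        MinMax(boolean copyOnStore) {
            this.copyOnStore = copyOnStore;
        }

        void update(byte[] value) {
            if (min == null || compare(value, min) < 0) {
                min = keep(value);
            }
            if (max == null || compare(value, max) > 0) {
                max = keep(value);
            }
        }

        // Either stores the caller's array itself or a private copy of it.
        private byte[] keep(byte[] value) {
            return copyOnStore ? Arrays.copyOf(value, value.length) : value;
        }

        // Unsigned lexicographic comparison of two byte arrays.
        private static int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) {
                    return d;
                }
            }
            return a.length - b.length;
        }

        byte[] getMin() { return min; }
        byte[] getMax() { return max; }
    }

    /** Mirrors the reported test case: one buffer, modified in place between updates. */
    static MinMax run(boolean copyOnStore) {
        byte[] buffer = new byte[] { 49 };
        MinMax stats = new MinMax(copyOnStore);
        stats.update(buffer);
        buffer[0] = 50; // the caller reuses the same array for the next value
        stats.update(buffer);
        return stats;
    }

    public static void main(String[] args) {
        MinMax aliased = run(false);
        MinMax copied = run(true);
        // Without a copy, min silently follows the reused buffer.
        System.out.println("aliased min=" + Arrays.toString(aliased.getMin())
                + " max=" + Arrays.toString(aliased.getMax()));
        // With a copy, min keeps the first value.
        System.out.println("copied  min=" + Arrays.toString(copied.getMin())
                + " max=" + Arrays.toString(copied.getMax()));
    }
}
```

The sketch places the copy inside the holder, but the same copy could equally be made by the caller before handing the array over, which corresponds to the AvroWriteSupport option mentioned in the description. Which side should pay for the copy is the open question of this issue.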