[
https://issues.apache.org/jira/browse/PARQUET-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alosh Bennett updated PARQUET-244:
----------------------------------
Description:
DeltaByteArrayReader.readBytes() fails with an ArrayIndexOutOfBoundsException
soon after it has processed a new page via initFromPage(). The issue can be
reproduced by reading a Binary column that is encoded with the delta byte
array encoding and spans multiple pages.
This happens because ColumnReaderImpl.initDataReader() creates a new
ValuesReader every time a new page is processed (see _this.dataColumn =
dataEncoding.getValuesReader(path, VALUES)_). The DeltaByteArrayReader is
stateful: it needs to remember the _previous_ Binary value it read, including
across page boundaries, because each value is rebuilt from a prefix of the
previous one. When a new DeltaByteArrayReader is created, this information is
lost.
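The failure mode can be illustrated with a minimal, self-contained sketch. It
does not use the parquet-mr classes: ToyDeltaReader, readSecondPage() and the
sample values are hypothetical names made up for this illustration, and the
(prefix length, suffix) layout is a simplification of the encoding. It only
assumes that each value is rebuilt from a prefix of the previously read value.

```java
import java.nio.charset.StandardCharsets;

public class DeltaPrefixSketch {

    // Hypothetical toy reader, NOT the parquet-mr DeltaByteArrayReader.
    // Each value is stored as (prefix length taken from previous value, suffix).
    static final class ToyDeltaReader {
        // State that has to survive page boundaries.
        private byte[] previous = new byte[0];

        String read(int prefixLength, String suffix) {
            byte[] tail = suffix.getBytes(StandardCharsets.UTF_8);
            byte[] out = new byte[prefixLength + tail.length];
            // Fails with an index-out-of-bounds error when 'previous' is
            // shorter than prefixLength, e.g. on a freshly created reader.
            System.arraycopy(previous, 0, out, 0, prefixLength);
            System.arraycopy(tail, 0, out, prefixLength, tail.length);
            previous = out;
            return new String(out, StandardCharsets.UTF_8);
        }
    }

    // Reads the last value of "page 1" and the first value of "page 2".
    // keepReader == false mimics creating a new reader for every page.
    static String readSecondPage(boolean keepReader) {
        ToyDeltaReader reader = new ToyDeltaReader();
        reader.read(0, "apple");             // last value on page 1
        if (!keepReader) {
            reader = new ToyDeltaReader();   // previous value is dropped here
        }
        return reader.read(2, "ricot");      // first value on page 2
    }

    public static void main(String[] args) {
        System.out.println(readSecondPage(true));
        try {
            readSecondPage(false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("fresh reader failed: " + e.getClass().getSimpleName());
        }
    }
}
```

In this sketch, reusing the reader lets the second page resolve its two-byte
prefix against "apple", while the fresh reader has an empty previous value and
the copy fails. A possible fix along the same lines would be to carry the
previous value over when the reader for the next page is set up.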
was:
DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException soon
after it has processed a new page via initFromPage(). This issue can be
reproduced by trying to read a Binary column that is encoded using delta byte
array and spans multiple pages.
This is happening because ColumnReaderImpl.initDataReader() creates a new
ValueReader every time a new page is processed (see _this.dataColumn =
dataEncoding.getValuesReader(path, VALUES)_). The DeltaByteArrayReader is
stateful and needs to remember the _previous_ Binary value across pages. When a
new DeltaByteArrayReader is created, this information is lost.
> DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving
> across pages
> ---------------------------------------------------------------------------------------
>
> Key: PARQUET-244
> URL: https://issues.apache.org/jira/browse/PARQUET-244
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: parquet-mr_1.6.0
> Reporter: Alosh Bennett