[
https://issues.apache.org/jira/browse/PARQUET-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348030#comment-14348030
]
Ryan Blue commented on PARQUET-180:
-----------------------------------
I updated the PR to use the one-time reflection strategy. I verified that the
result is binary compatible with Thrift 0.9.2, but it cannot be built against
0.9.2.
I'd like to find a better way to introduce this check, maybe one that works
with all of the protocols by limiting the ByteBuffer (or whatever) passed to
thrift?
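
For readers who have not opened the PR, here is a minimal sketch of what a
one-time reflection lookup can look like. It is not the code from the PR: the
class name ReadLengthLimiter and the two stand-in protocol classes are
hypothetical, and no Thrift types are used so that the example stays
self-contained. The idea is to look up setReadLength(int) once, remember
whether it exists, and skip the call when it does not.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: resolves setReadLength(int) once and reuses the result. */
public class ReadLengthLimiter {
    // Null when the protocol class has no public setReadLength(int).
    private final Method setReadLength;

    public ReadLengthLimiter(Class<?> protocolClass) {
        Method m = null;
        try {
            // Single reflective lookup, done at construction time.
            m = protocolClass.getMethod("setReadLength", int.class);
        } catch (NoSuchMethodException e) {
            // Method is absent; the limit is simply not applied.
        }
        this.setReadLength = m;
    }

    /** Applies the limit if the method exists; returns whether it was applied. */
    public boolean limit(Object protocol, int maxLength) {
        if (setReadLength == null) {
            return false;
        }
        try {
            setReadLength.invoke(protocol, maxLength);
            return true;
        } catch (ReflectiveOperationException e) {
            // Treat a failed call the same as an unavailable method.
            return false;
        }
    }

    /** Stand-in for a protocol class that still has setReadLength. */
    public static class WithLimit {
        public int readLength = -1;
        public void setReadLength(int length) { this.readLength = length; }
    }

    /** Stand-in for a protocol class where the method was removed. */
    public static class WithoutLimit {
    }
}
```

Because the call goes through a Method handle rather than a direct method
reference, code written this way has no compile-time dependency on
setReadLength being present.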
> Parquet-thrift compile issue with 0.9.2.
> ----------------------------------------
>
> Key: PARQUET-180
> URL: https://issues.apache.org/jira/browse/PARQUET-180
> Project: Parquet
> Issue Type: Bug
> Reporter: Ryan Blue
>
> Thrift 0.9.2 removed
> [{{setReadLength}}|https://github.com/apache/thrift/commit/2ca9c2028593782621c8876817d8772aa5f46ac7].
> This breaks the parquet-thrift build because the method is called on
> TBinaryProtocol.
> The reason we use it is defensive: a size is read from the data and then that
> many bytes are read, so using this method sets a maximum and causes an
> exception rather than a strange failure later on. The code also has a comment
> that says it is okay when it can't be used.
> {code}
> /* Reduce the chance of OOM when data is corrupted. When readBinary is
> called on TBinaryProtocol, it reads the length of the binary first,
> so if the data is corrupted, it could read a big integer as the length
> of the binary and therefore cause an OOM.
> Currently this fix only applies to TBinaryProtocol, which has
> setReadLength defined.
> */
> if (protocol instanceof TBinaryProtocol) {
>   ((TBinaryProtocol) protocol).setReadLength(record.getLength());
> }
> {code}
> I think the fix is to remove the section above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)