Saurabh Seth created HIVE-20664:
-----------------------------------
Summary: Potential ArrayIndexOutOfBoundsException in
VectorizedOrcAcidRowBatchReader.findMinMaxKeys
Key: HIVE-20664
URL: https://issues.apache.org/jira/browse/HIVE-20664
Project: Hive
Issue Type: Bug
Components: Transactions
Reporter: Saurabh Seth
Assignee: Saurabh Seth
[~ekoifman], could you please confirm if my understanding is correct and if so,
review the fix?
In the method {{VectorizedOrcAcidRowBatchReader.findMinMaxKeys}}, the code
snippet that identifies the first and last stripe indices for the current split
could throw an ArrayIndexOutOfBoundsException if the entire split lies strictly
inside a single stripe:
{noformat}
for(int i = 0; i < stripes.size(); i++) {
  StripeInformation stripe = stripes.get(i);
  long stripeEnd = stripe.getOffset() + stripe.getLength();
  if(firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
    firstStripeIndex = i;
  }
  if(lastStripeIndex == -1 && splitEnd <= stripeEnd &&
      stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset() ) {
    //the last condition is for when both splitStart and splitEnd are in
    // the same stripe
    lastStripeIndex = i;
  }
}
{noformat}
Consider an example with 2 stripes, covering offsets 0-500 and 500-1000, where
splitStart is 600 and splitEnd is 800.
In the first iteration of the loop, stripe.getOffset() is 0 and stripeEnd is
500. In this iteration, neither of the if statement conditions is met, so
firstStripeIndex and lastStripeIndex both remain -1.
In the second iteration of the loop, stripe.getOffset() is 500 and stripeEnd is
1000. The first if statement condition is not met because the stripe's offset
(500) is not greater than or equal to splitStart (600). However, in the second
if statement, splitEnd (800) is <= stripeEnd (1000), so the code goes on to
evaluate the last condition,
{{stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset()}}. This will
throw an ArrayIndexOutOfBoundsException because firstStripeIndex is still -1.
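The walkthrough above can be reproduced outside Hive with a small standalone
sketch. Everything here is an assumption made for illustration: the {{Stripe}}
class is a hypothetical stand-in for {{StripeInformation}}, and the
{{StripeRangeSketch}} class and its method names are invented. The {{guarded}}
flag adds one possible null-style check on firstStripeIndex; it is not
necessarily the fix proposed for this issue and it only avoids the exception.
The sketch catches IndexOutOfBoundsException, the parent of
ArrayIndexOutOfBoundsException, because the exact subclass depends on the List
implementation and the Java version.

```java
import java.util.Arrays;
import java.util.List;

public class StripeRangeSketch {
    // Hypothetical stand-in for ORC's StripeInformation: offset and length only.
    static final class Stripe {
        final long offset;
        final long length;
        Stripe(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // The two stripes from the example: offsets 0-500 and 500-1000.
    static List<Stripe> exampleStripes() {
        return Arrays.asList(new Stripe(0, 500), new Stripe(500, 500));
    }

    // Mirrors the quoted loop and returns {firstStripeIndex, lastStripeIndex}.
    // With guarded == true, an extra check (an assumption, not the actual
    // patch) skips the lookup while firstStripeIndex is still -1.
    static int[] findStripeRange(List<Stripe> stripes, long splitStart,
                                 long splitEnd, boolean guarded) {
        int firstStripeIndex = -1;
        int lastStripeIndex = -1;
        for (int i = 0; i < stripes.size(); i++) {
            Stripe stripe = stripes.get(i);
            long stripeEnd = stripe.offset + stripe.length;
            if (firstStripeIndex == -1 && stripe.offset >= splitStart) {
                firstStripeIndex = i;
            }
            if (lastStripeIndex == -1 && splitEnd <= stripeEnd
                    && (!guarded || firstStripeIndex != -1)
                    && stripes.get(firstStripeIndex).offset <= stripe.offset) {
                lastStripeIndex = i;
            }
        }
        return new int[] {firstStripeIndex, lastStripeIndex};
    }

    // True if the unguarded loop throws for the given split boundaries.
    static boolean throwsOnSplit(long splitStart, long splitEnd) {
        try {
            findStripeRange(exampleStripes(), splitStart, splitEnd, false);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Split 600-800 lies strictly inside the second stripe.
        System.out.println("unguarded throws: " + throwsOnSplit(600, 800));
        int[] range = findStripeRange(exampleStripes(), 600, 800, true);
        System.out.println("guarded result: " + Arrays.toString(range));
    }
}
```

With the guard in place both indices stay at -1 for this split, so the caller
would still need to decide how to treat that case; the sketch does not address
that. A split aligned to stripe boundaries, such as 0-1000, does not hit the
problem in either variant.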
I'm not sure whether this scenario can actually occur, hence logging this as a
low priority issue. Perhaps block based split generation using BISplitStrategy
could trigger this?