[ https://issues.apache.org/jira/browse/PIG-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5380:
------------------------------
    Attachment: pig-5380-v01.patch

Attaching a patch {{pig-5380-v01.patch}}.
Without the change to SortedDataBag, test cases will fail with
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation took 0.213 sec
    Caused an ERROR
null
java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:348)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation(TestDataBag.java:1333)

{noformat}
and
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation2 took 1.012 sec
    FAILED
tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
junit.framework.AssertionFailedError: tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation2(TestDataBag.java:1419)

Testcase: testSortedFirstSpillDuringRead took 0.003 sec
{noformat}
Basically, a ConcurrentModificationException can happen when a new spill file is
added while SortedDataBag is creating the priority queue at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L344-L360]

and a missing value can happen when a spill occurs after the spill files are read
but before the in-memory contents are checked at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L361]
This also requires the smallest value to be in memory.
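The first race is ordinary Java fail-fast iterator behavior: an ArrayList iterator throws ConcurrentModificationException (on a best-effort basis) when the list is structurally modified after the iterator was created. The sketch below is a minimal, single-threaded illustration under stated assumptions: SpillListDemo, addSpillFile and openAllSnapshot are hypothetical names, spill files are modeled as strings, and a callback stands in for a spill arriving from another thread. It also shows one common mitigation (copying the list under the lock that guards additions); this is not necessarily what pig-5380-v01.patch does.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SpillListDemo {
    // Hypothetical stand-in for the bag's list of spill files.
    private final List<String> spillFiles = new ArrayList<>();

    // A spill appends a new file name under the list's lock.
    public void addSpillFile(String name) {
        synchronized (spillFiles) {
            spillFiles.add(name);
        }
    }

    public int size() {
        synchronized (spillFiles) {
            return spillFiles.size();
        }
    }

    // Unsafe: iterates the live list, so a spill added mid-iteration
    // makes the iterator's next() call throw ConcurrentModificationException.
    public List<String> openAllUnsafe(Runnable duringIteration) {
        List<String> opened = new ArrayList<>();
        for (String f : spillFiles) {
            opened.add(f);
            duringIteration.run(); // simulates a concurrent spill arriving here
        }
        return opened;
    }

    // Mitigation sketch: copy the list under the same lock used by
    // addSpillFile, then iterate the private copy.
    public List<String> openAllSnapshot(Runnable duringIteration) {
        List<String> snapshot;
        synchronized (spillFiles) {
            snapshot = new ArrayList<>(spillFiles);
        }
        List<String> opened = new ArrayList<>();
        for (String f : snapshot) {
            opened.add(f);
            duringIteration.run(); // additions now go to the live list only
        }
        return opened;
    }

    public static void main(String[] args) {
        SpillListDemo unsafe = new SpillListDemo();
        unsafe.addSpillFile("spill-0");
        unsafe.addSpillFile("spill-1");
        try {
            unsafe.openAllUnsafe(() -> unsafe.addSpillFile("spill-late"));
            System.out.println("unsafe: no exception");
        } catch (ConcurrentModificationException e) {
            // prints "unsafe: ConcurrentModificationException"
            System.out.println("unsafe: ConcurrentModificationException");
        }

        SpillListDemo safe = new SpillListDemo();
        safe.addSpillFile("spill-0");
        safe.addSpillFile("spill-1");
        List<String> opened = safe.openAllSnapshot(() -> safe.addSpillFile("spill-late"));
        // prints "snapshot opened [spill-0, spill-1], list now has 4 files"
        System.out.println("snapshot opened " + opened + ", list now has " + safe.size() + " files");
    }
}
```

With a snapshot, files added after the copy is taken are not seen by that pass, so a real fix still has to account for them; that gap is exactly where the missing-value race lives.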

In short, a ConcurrentModificationException can happen when there are many
spills, but the chance of a missing value is very small. Please note that the test
cases may not fail reliably; I inserted a short sleep to increase the chances of
reproducing these race conditions.
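The missing-value race is a check-then-act problem: the spill files and the in-memory contents are examined in two separate steps, and a spill that lands between them moves values out of memory after the files have already been read, so neither step sees them. The sketch below is a hypothetical, heavily simplified model (no sorting, no priority queue, no real files); SpillRaceDemo, collectRacy and collectAtomic are invented names, and the Runnable hook plays the role of the short sleep, but deterministically.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillRaceDemo {
    // Hypothetical model: each value lives either in memory or in a spill "file".
    private final List<Integer> inMemory = new ArrayList<>();
    private final List<List<Integer>> spillFiles = new ArrayList<>();
    private final Object lock = new Object();

    public void add(int v) {
        synchronized (lock) {
            inMemory.add(v);
        }
    }

    // Moves everything in memory into a new spill file, as a memory-pressure spill would.
    public void spill() {
        synchronized (lock) {
            if (inMemory.isEmpty()) {
                return;
            }
            spillFiles.add(new ArrayList<>(inMemory));
            inMemory.clear();
        }
    }

    // Racy read: files and memory are inspected in two separate critical sections.
    // The hook runs in the window between them, standing in for the inserted sleep.
    public List<Integer> collectRacy(Runnable betweenSteps) {
        List<Integer> seen = new ArrayList<>();
        synchronized (lock) {
            for (List<Integer> f : spillFiles) {
                seen.addAll(f);
            }
        }
        betweenSteps.run();
        synchronized (lock) {
            seen.addAll(inMemory);
        }
        return seen;
    }

    // Consistent read: both sources are inspected under one lock, so a spill
    // from another thread must wait and cannot fall between the two steps.
    public List<Integer> collectAtomic() {
        List<Integer> seen = new ArrayList<>();
        synchronized (lock) {
            for (List<Integer> f : spillFiles) {
                seen.addAll(f);
            }
            seen.addAll(inMemory);
        }
        return seen;
    }

    public static void main(String[] args) {
        SpillRaceDemo bag = new SpillRaceDemo();
        bag.add(3);
        bag.spill();   // spill file #1 holds [3]
        bag.add(1);
        bag.add(2);    // memory holds [1, 2]
        List<Integer> racy = bag.collectRacy(bag::spill);
        System.out.println("racy read:   " + racy);                // [3]: 1 and 2 are missed
        System.out.println("atomic read: " + bag.collectAtomic()); // [3, 1, 2]
    }
}
```

Holding one lock across both steps closes the window in this model; the trade-off is that a spill triggered by memory pressure can be delayed while a reader holds the lock.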

Also note that we probably did not observe these bugs because our framework
stopped using SortedDataBag long ago, when we switched to InternalSortedBag.

> SortedDataBag hitting ConcurrentModificationException or producing incorrect output in a corner-case
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-5380
>                 URL: https://issues.apache.org/jira/browse/PIG-5380
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5380-v01.patch
>
>
> A user had a UDF that created a large SortedDataBag. This UDF was failing with
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
>   at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
