[ https://issues.apache.org/jira/browse/PIG-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5380:
------------------------------
    Attachment: pig-5380-v01.patch

Attaching a patch {{pig-5380-v01.patch}}.

Without the change to SortedDataBag, the test cases fail with
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation took 0.213 sec
    Caused an ERROR
null
java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:348)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
    at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation(TestDataBag.java:1333)
{noformat}
and
{noformat}
Testcase: testSortedSpillDuringPriorityQueueCreation2 took 1.012 sec
    FAILED
tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
junit.framework.AssertionFailedError: tuples should be the same expected:<(-2147483648)> but was:<(-2055861747)>
    at org.apache.pig.test.TestDataBag.testSortedSpillDuringPriorityQueueCreation2(TestDataBag.java:1419)

Testcase: testSortedFirstSpillDuringRead took 0.003 sec
{noformat}

Basically, a ConcurrentModificationException can happen when a new spill file is added while SortedDataBag is creating its priority queue at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L344-L360]
and a missing value can happen when a spill occurs after the spill files have been read but before the in-memory contents are checked at
[https://github.com/apache/pig/blob/01b7a50657b46d346f0a8f472c92fdba72819a24/src/org/apache/pig/data/SortedDataBag.java#L361]
For a value to go missing, the smallest value also has to be in memory at that point.

In short, the ConcurrentModificationException can happen when there are a lot of spills, but the chance of a missing value is very small.
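To illustrate the first race, here is a minimal sketch of how adding to an ArrayList while it is being iterated triggers the exception, and how iterating a snapshot taken under a lock avoids it. This is not the content of the attached patch: the class {{SpillListDemo}}, the field {{spillFiles}} and the methods {{spill()}}, {{unsafeOpenAll()}} and {{safeOpenAll()}} are hypothetical names, and the spill is simulated from the same thread so that the failure is deterministic.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SpillListDemo {
    // Hypothetical stand-in for the bag's list of spill files.
    private final List<String> spillFiles = new ArrayList<>();

    public SpillListDemo() {
        spillFiles.add("spill-0");
        spillFiles.add("spill-1");
    }

    // Simulates a spill that registers one more spill file.
    public void spill() {
        synchronized (spillFiles) {
            spillFiles.add("spill-" + spillFiles.size());
        }
    }

    // Unsafe: iterates the live list. A spill in the middle of the loop
    // changes the list's modCount, so the next call to the iterator's
    // next() throws ConcurrentModificationException.
    public boolean unsafeOpenAll() {
        try {
            for (String file : spillFiles) {
                spill(); // stands in for a spill arriving mid-iteration
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Safer: copy the list under the same lock that spill() takes, then
    // iterate the copy. Later spills no longer disturb the iteration.
    public int safeOpenAll() {
        List<String> snapshot;
        synchronized (spillFiles) {
            snapshot = new ArrayList<>(spillFiles);
        }
        int opened = 0;
        for (String file : snapshot) {
            spill(); // same simulated mid-iteration spill
            opened++;
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("unsafe completed: " + new SpillListDemo().unsafeOpenAll());
        System.out.println("safe opened: " + new SpillListDemo().safeOpenAll());
    }
}
```

Note that a snapshot by itself only addresses the exception. Tuples spilled after the snapshot is taken would still have to be accounted for, which is the second (missing value) race described above.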

Please note that the test cases may not fail reliably. I tried inserting a short sleep to increase the chances of reproducing these race conditions.

Also, note that we probably didn't observe these bugs because our framework stopped using SortedDataBag a long time ago, when we switched to InternalSortedBag.

> SortedDataBag hitting ConcurrentModificationException or producing incorrect
> output in a corner-case
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-5380
>                 URL: https://issues.apache.org/jira/browse/PIG-5380
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5380-v01.patch
>
>
> A user had a UDF that created a large SortedDataBag. This UDF was failing with
> {noformat}
> java.util.ConcurrentModificationException
>     at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>     at java.util.ArrayList$Itr.next(ArrayList.java:851)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.readFromPriorityQ(SortedDataBag.java:346)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.next(SortedDataBag.java:322)
>     at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.hasNext(SortedDataBag.java:235)
> {noformat}