bright chen created APEXMALHAR-2197:
---------------------------------------
Summary: TimeBasedPriorityQueue.removeLRU throws
NoSuchElementException
Key: APEXMALHAR-2197
URL: https://issues.apache.org/jira/browse/APEXMALHAR-2197
Project: Apache Apex Malhar
Issue Type: Bug
Reporter: bright chen
Assignee: bright chen
When there are lots of keys and large throughput, the following exception will
be throw.
java.util.NoSuchElementException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at
org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
at
org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
at
org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
at
com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)
The reason is as following:
In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only
compare by time. While in equals(), compare both time and key.
Generally, "compareTo() == 0” means “equals() == true”. But in
TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it would
be OK only compare by time.
But it would cause problem in removeLRU() if lots of key written in same time.
In this case, one time(even different key) is one entry of sortedTimestamp, but
the counter and count was calculated by key.
So the counter could be much larger than the number of entry of
sortedTimestamp. And NoSuchElementException would throw when reach the end of
the iterator.
Solutions:
Add key as the less important factor could solve this problem. But probably
need to limit key to support compareTo()
Another idea I think also work is use two maps, the benefit is don’t need to
compare by key( only compare by time)
- timeToKeys: multiple list map from time to keys
- keyToRecentTime: key to most recent time
implementation:
- upSert(T key): get time from keyToRecentTime map; if don’t have time, put
(key, time) to both keyToRecentTime and timeToKeys; if have time and time is
different, remove key from timeToKeys, update keyToRecentTime
- removeLRU(int count): get time sortedTimestamp, and then get and remove keys
from timeToKeys, and remove entry from keyToRecentTime.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)