[ 
https://issues.apache.org/jira/browse/HBASE-18168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gavin updated HBASE-18168:
--------------------------
    Comment: was deleted

(was: A comment with security level 'jira-users' was removed.)

> NoSuchElementException when rolling the log
> -------------------------------------------
>
>                 Key: HBASE-18168
>                 URL: https://issues.apache.org/jira/browse/HBASE-18168
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.11
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 1.1.11
>
>         Attachments: HBASE-18168-branch-1.1.patch, 
> HBASE-18168-branch-1.1.v2.patch, HBASE-18168-branch-1.1.v3.patch
>
>
> Today, one of our servers aborted after logging the following error.
> {code}
> 2017-06-06 05:38:47,142 ERROR [regionserver/xxxx.logRoller] regionserver.LogRoller: Log rolling failed
> java.util.NoSuchElementException
>         at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2224)
>         at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2253)
>         at java.util.Collections.min(Collections.java:628)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findEligibleMemstoresToFlush(FSHLog.java:861)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findRegionsToForceFlush(FSHLog.java:886)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:728)
>         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:137)
>         at java.lang.Thread.run(Thread.java:756)
> 2017-06-06 05:38:47,142 FATAL [regionserver/xxxx.logRoller] regionserver.HRegionServer: ABORTING region server xxxx: Log rolling failed
> java.util.NoSuchElementException
> ......
> {code}
> The code is here: 
> {code}
> private byte[][] findEligibleMemstoresToFlush(Map<byte[], Long> regionsSequenceNums) {
>     List<byte[]> regionsToFlush = null;
>     // Keeping the old behavior of iterating unflushedSeqNums under oldestSeqNumsLock.
>     synchronized (regionSequenceIdLock) {
>       for (Map.Entry<byte[], Long> e: regionsSequenceNums.entrySet()) {
>         ConcurrentMap<byte[], Long> m =
>             this.oldestUnflushedStoreSequenceIds.get(e.getKey());
>         if (m == null) {
>           continue;
>         }
>         long unFlushedVal = Collections.min(m.values()); // The exception is thrown here
>         ......
> {code}
> The only explanation I can think of for the NoSuchElementException is that 
> the map 'm' is empty. I then looked at all the code that updates 
> 'oldestUnflushedStoreSequenceIds'. Every update to 
> 'oldestUnflushedStoreSequenceIds' is guarded by synchronizing on 
> 'regionSequenceIdLock', except here:
> {code}
> private ConcurrentMap<byte[], Long> getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(
>       byte[] encodedRegionName) {
>     ......
>     oldestUnflushedStoreSequenceIdsOfRegion =
>         new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
>     // Here, an empty map may be put into 'oldestUnflushedStoreSequenceIds'
>     // with no synchronization.
>     ConcurrentMap<byte[], Long> alreadyPut =
>         oldestUnflushedStoreSequenceIds.putIfAbsent(encodedRegionName,
>           oldestUnflushedStoreSequenceIdsOfRegion);
>     return alreadyPut == null ? oldestUnflushedStoreSequenceIdsOfRegion : alreadyPut;
>   }
> {code}
> This bug should be very rare, but it can cause a region server to abort. It 
> only exists in branch-1.1. 
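
The failure mode described above can be reproduced in isolation. The sketch below is not taken from the attached patches and is not HBase code. It only illustrates, under simplifying assumptions, that Collections.min throws NoSuchElementException on the values of an empty ConcurrentSkipListMap, and shows one possible single-pass guard. The class name EmptyMapMinSketch, the method names, and the use of String keys instead of byte[] are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-alone illustration; not part of FSHLog.
final class EmptyMapMinSketch {

    // Mirrors the failing call: Collections.min calls next() on the
    // iterator without checking hasNext(), so an empty map throws.
    static boolean minThrowsOnEmpty(ConcurrentMap<String, Long> m) {
        try {
            Collections.min(m.values());
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // One possible guard (an assumption, not the committed fix): compute
    // the minimum in a single traversal and report "no value" instead of
    // throwing. This avoids a separate isEmpty() check that could race
    // with a concurrent writer.
    static OptionalLong lowestSequenceId(Map<String, Long> m) {
        return m.values().stream().mapToLong(Long::longValue).min();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> perStoreIds = new ConcurrentSkipListMap<>();
        System.out.println("throws on empty: " + minThrowsOnEmpty(perStoreIds));
        System.out.println("guarded, empty:  " + lowestSequenceId(perStoreIds));

        perStoreIds.put("cf1", 7L);
        perStoreIds.put("cf2", 3L);
        System.out.println("guarded, filled: " + lowestSequenceId(perStoreIds));
    }
}
```

A caller using such a guard would skip the region when the result is empty, much as the existing loop already skips a region when 'm' is null. Whether skipping is the right behavior for the WAL, as opposed to putting the new map under 'regionSequenceIdLock', is a design decision this sketch does not settle.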



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
