[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl reassigned HBASE-13329:
-------------------------------------

    Assignee: Lars Hofhansl

> ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
> -------------------------------------------------------------------------
>
>                 Key: HBASE-13329
>                 URL: https://issues.apache.org/jira/browse/HBASE-13329
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.1
>         Environment: linux-debian-jessie
>                      ec2 - t2.micro instances
>            Reporter: Ruben Aguiar
>            Assignee: Lars Hofhansl
>            Priority: Critical
>         Attachments: 13329-asserts.patch, 13329-v1.patch, HBASE-13329.test.00.branch-1.1.patch
>
> While benchmarking my OpenTSDB cluster, I created a script that always sends the same value (in this case 1) to HBase. After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (it can neither be assigned nor unassigned). From what I saw in the logs, when a memstore flush is triggered on a large region (128 MB) the flush fails and kills the regionserver. On restart, replaying the edits produces the same error, leaving the region unavailable. I tried to manually unassign, assign and close_region the region; none of that worked, because the code that reads/replays the edits crashes as well.
> From my investigation this seems to be an overflow issue. The logs show that getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Looking at the source code, the index is declared as a short and is incremented for as long as the two byte arrays match, which would make it overflow on long arrays of identical data. Changing it to an int should solve the problem (a small sketch of the suspected overflow follows the log excerpt below).
> Below are the Hadoop logs from when the regionserver went down. Any help is appreciated; if you need any other information, please let me know:
> 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
> 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
> at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 > at > org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) > at > org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) > at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953) > ... 7 more > 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: > RegionServer abort: loaded coprocessors are: > [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] -- This message was sent by Atlassian JIRA (v6.3.4#6332)