[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614167#comment-14614167 ]

Dinh Duong Mai commented on HBASE-13329:
----------------------------------------

I ran the Python script below to send data to OpenTSDB:

import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.101", 4242))

start_epoch = 1300000000

for epoch_in_sec in range(start_epoch, start_epoch + 2000):   # 2000 seconds
    for epoch_msec_offset in xrange(0, 1000, 100):            # time resolution of 100 milliseconds
        epoch_in_msec = epoch_in_sec * 1000 + epoch_msec_offset

        for tag in xrange(0, 100, 10):                        # 100 metrics, TAG_1 to TAG_100
            # build a batch of ten telnet-style lines,
            # e.g. "put TAG_1 1300000000000 12.9 stt=good"
            batch = ""
            for i in xrange(1, 11):
                batch += "put TAG_%d %d 12.9 stt=good\n" % (tag + i, epoch_in_msec)
            s.sendall(batch)
    time.sleep(1)                                             # every 1 second

> ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
> ------------------------------------------------------------------------
>
>                 Key: HBASE-13329
>                 URL: https://issues.apache.org/jira/browse/HBASE-13329
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.1
>         Environment: linux-debian-jessie
> ec2 - t2.micro instances
>            Reporter: Ruben Aguiar
>            Priority: Critical
>         Attachments: 13329-asserts.patch, 13329-v1.patch, 
> HBASE-13329.test.00.branch-1.1.patch
>
>
> While benchmarking my OpenTSDB cluster, I created a script that always sends 
> the same value (in this case 1) to HBase. After a few minutes, the whole 
> region server crashes and the region itself becomes impossible to open again 
> (it cannot be assigned or unassigned). From the logs, it appears that when a 
> memstore flush is triggered on a large region (128 MB), the flush fails and 
> kills the regionserver. On restart, replaying the edits produces the same 
> error, leaving the region unavailable. I tried to manually unassign, assign 
> and close_region the region, but that did not work because the code that 
> reads/replays the edits crashes as well.
> This looks like an overflow issue. The logs show that getMinimumMidpointArray 
> tried to access index -32743 of an array, which is extremely close to Java's 
> minimum short value. Looking at the source code, the index is a short that is 
> incremented for as long as the two byte arrays match, so it can overflow on 
> long arrays of equal data; a sketch of this overflow pattern follows the 
> quoted logs below. Changing the index to an int should solve the problem.
> The Hadoop logs from when the regionserver went down follow. Any help is 
> appreciated; please let me know if you need any other information:
> 2015-03-24 18:00:56,187 INFO  [regionserver//10.2.0.73:16020.logRoller] 
> wal.FSHLog: Rolled WAL 
> /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516
>  with entries=143, filesize=134.70 MB; new WAL 
> /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
> 2015-03-24 18:00:56,188 INFO  [regionserver//10.2.0.73:16020.logRoller] 
> wal.FSHLog: Archiving 
> hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
>  to 
> hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> 2015-03-24 18:04:35,722 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Started memstore flush for 
> tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region 
> memstore size 128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: 
> ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. 
> Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
>       at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
>       at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
>       at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
>       at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
>       at 
> org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
>       at 
> org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
>       at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
>       at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
>       at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
>       at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
>       at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
>       at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
>       at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
>       ... 7 more
> 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: 
> RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
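
A minimal, self-contained Java sketch of the overflow pattern described in the quoted report (illustrative only, not the actual CellComparator#getMinimumMidpointArray code; the class name, method name and array sizes are invented): a loop index declared as short wraps to -32768 once the shared prefix of two identical byte arrays exceeds Short.MAX_VALUE (32767), and the next array access with the wrapped value throws an ArrayIndexOutOfBoundsException with a negative index, the same failure mode as the -32743 seen in the stack trace above.

public class ShortIndexOverflowDemo {

    // Scans two byte arrays with a short index, mimicking the pattern the
    // report describes; not the real HBase code.
    static int commonPrefixLength(byte[] left, byte[] right) {
        short idx = 0;                        // too narrow for arrays longer than 32767 bytes
        while (idx < left.length && idx < right.length
                && left[idx] == right[idx]) { // throws once idx has wrapped negative
            idx++;                            // 32767 + 1 wraps to -32768
        }
        return idx;
    }

    public static void main(String[] args) {
        // two identical arrays longer than Short.MAX_VALUE
        byte[] a = new byte[40000];
        byte[] b = new byte[40000];
        // throws java.lang.ArrayIndexOutOfBoundsException: -32768
        System.out.println(commonPrefixLength(a, b));
    }
}

Declaring the index as int, as the reporter suggests, lets the same loop run past 32767 without wrapping.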


