[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

ramkrishna.s.vasudevan (JIRA) Thu, 25 Jun 2015 04:10:30 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601010#comment-14601010
 ]


ramkrishna.s.vasudevan commented on HBASE-13970:
------------------------------------------------

The reason here is that there are 2 compactions getting triggered from the 
CompactionSplitThread on a region that is newly split.  One may be due to the 
split that happened?
So when both the compaction goes parallely then the 
PressureAwareCompactionController is started by the Compactor thread
{code}
@Override
  public void start(String compactionName) {
    activeCompactions.put(compactionName, new ActiveCompaction());
  }
{code}
Now by the time the second compaction is in progress the first compaction 
completes and does a finish
{code}
@Override
  public void finish(String compactionName) {
    ActiveCompaction compaction = activeCompactions.remove(compactionName);
    long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() - 
compaction.startTime);
.....
{code}
The compactionName is going to be common here because 
{code}
    String compactionName =
        store.getRegionInfo().getRegionNameAsString() + "#" + 
store.getFamily().getNameAsString();
{code}
When the second compaction comes for completion the compaction has already been 
removed and hence we get an NPE.

Logs
One compaction started with 4 files
{code}
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Starting compaction of 4 file(s) in info of 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=285.6 M
2015-06-25 22:07:49,165 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:49,954 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365
 with entries=90, filesize=124.14 MB; new WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933
20
{code}
Another compaction has been started with 3 files
{code}
2015-06-25 22:07:53,405 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:53,406 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HStore: Starting compaction of 3 file(s) in info of 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=343.4 M
2015-06-25 22:07:53,411 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:54,211 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=128.23 MB
2015-06-25 22:07:54,639 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034
 with entries=90, filesize=
{code}
{code}
2015-06-25 22:08:09,446 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
compactions.PressureAwareCompactionThroughputController: 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info
 average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is 
27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec
2015-06-25 22:08:09,520 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Completed compaction of 4 (all) file(s) in info of 
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into c044bd80b8684d97bb8ed2552687d22a(size=228.4 M), total size for store is 
686.2 M. This selection was in queue for 0sec, and took 20sec to execute.
2015-06-25 22:08:09,523 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.CompactSplitThread: Completed compaction: Request = 
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
 storeName=info, fileCount=4, fileSize=285.6 M, priority=6, 
time=7539031296069589; duration=20sec
2015-06-25 22:08:09,858 ERROR 
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.CompactSplitThread: Compaction failed Request = 
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
 storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
priority=3, time=7539035566509573
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
        at 
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
        at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
        at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
        at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
        at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-06-25 22:12:26,339 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: 
totalSize=1.83 MB, freeSize=1.74 GB, max=1.74 GB, blockCount=0, accesses=0, 
hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, 
cachingHitsRatio=0,evictions=2
{code}
Both gets completed around the same time and hence you can see that the 
compaction with 4 files was successful but the one with 3 got failed with NPE. 
Though the compaction was actually done but it got stopped due to NPE. This is 
a race condition - so the failed compacted files may be get compacated in the 
next run of compaction when being selected.

> NPE during compaction in trunk
> ------------------------------
>
>                 Key: HBASE-13970
>                 URL: https://issues.apache.org/jira/browse/HBASE-13970
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>
> Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
> all data is flushed out to disk. When the first compaction is triggered we 
> get an NPE and this is very easy to reproduce
> {code}
> 015-06-25 21:33:46,041 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
> event: /hbase/flush-table-proc/acquired
> 2015-06-25 21:33:46,051 INFO  
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
> regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
> 2015-06-25 21:33:46,159 ERROR 
> [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
> regionserver.CompactSplitThread: Compaction failed Request = 
> regionName=TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
>  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
> priority=3, time=7536968291719985
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-06-25 21:33:46,745 INFO  
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
> hasBloomFilter=true, into tmp file 
> hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
> 2015-06-25 21:33:46,772 INFO  
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
> regionserver.HStore: Added 
> hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
>  entries=68116, sequenceid=1534, filesize=68.7 M
> 2015-06-25 21:33:46,773 INFO  
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
> regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
> currentsize=0 B/0 for region 
> TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
>  in 723ms, sequenceid=1534, compaction requested=true
> 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received created 
> event:/hbase/flush-table-proc/reached/TestTable
> 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received created 
> event:/hbase/flush-table-proc/abort/TestTable
> 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
> event: /hbase/flush-table-proc/abort
> 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
> event: /hbase/flush-table-proc/acquired
> 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
> procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
> event: /hbase/flush-table-proc/abort
> {code}
> Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

Reply via email to