[jira] [Commented] (HBASE-12952) Seek with prefixtree may hang
[ https://issues.apache.org/jira/browse/HBASE-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306807#comment-14306807 ]

Matt Corgan commented on HBASE-12952:
-------------------------------------

Unfortunately, I can't say for sure if it's correct now, but if it passes all the existing tests then it's probably an improvement... thanks for debugging that.

Seek with prefixtree may hang
-----------------------------

Key: HBASE-12952
URL: https://issues.apache.org/jira/browse/HBASE-12952
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 1.0.0, 0.98.7, 0.98.8, 0.98.6.1, 0.98.9, 0.98.10
Reporter: sinfox
Assignee: sinfox
Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11
Attachments: hbase_0.98.6.1.patch

I upgraded my HBase cluster from hbase-0.96 to hbase-0.98.6.1, then found compactions hanging on many regionservers, with the CPU at 100%. It looks like there is an infinite loop somewhere. From the log, I found that StoreFileScanner.java: reseekAtOrAfter(HFileScanner s, KeyValue k) entered an infinite loop. Reading the source code, I found an error in PrefixTreeArrayReversibleScanner.java: previousRowInternal().

e.g. a tree with edges:
A -1-> B
A -2-> C
C -3-> D
C -4-> E

A: fan "12", numCells: 1
B: fan "", numCells: 1
C: fan "34", numCells: 0
D: fan "", numCells: 1
E: fan "", numCells: 1

When the current node is D, its previous node is B, but this function returns A.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
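To illustrate the traversal the example describes (this is a stand-alone sketch, not HBase's PrefixTreeArrayReversibleScanner), the correct "previous row" of a node skips siblings' subtrees that contain no cells (like C above) and descends into the previous cell-bearing subtree rather than stopping at the common ancestor. With the tree from the example, the previous row of D is B, not A:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a prefix-tree node; NOT the HBase implementation,
// just an illustration of the previous-row traversal from the example above.
class Node {
    final String name;
    final int numCells;          // rows ending at this node (0 = pure branch node)
    Node parent;
    final List<Node> children = new ArrayList<>();

    Node(String name, int numCells) { this.name = name; this.numCells = numCells; }

    Node addChild(Node c) { c.parent = this; children.add(c); return c; }
}

public class PreviousRowSketch {
    // Rightmost cell-bearing node in the subtree rooted at n (descend last child first).
    static Node lastRowIn(Node n) {
        for (int i = n.children.size() - 1; i >= 0; i--) {
            Node found = lastRowIn(n.children.get(i));
            if (found != null) return found;
        }
        return n.numCells > 0 ? n : null;
    }

    // Previous row before n: try earlier siblings' subtrees, else ascend.
    static Node previousRow(Node n) {
        Node cur = n;
        while (cur.parent != null) {
            List<Node> sibs = cur.parent.children;
            for (int i = sibs.indexOf(cur) - 1; i >= 0; i--) {
                Node found = lastRowIn(sibs.get(i));
                if (found != null) return found;
            }
            cur = cur.parent;
            // A node with no cells (like C in the example) must be skipped, not returned.
            if (cur.numCells > 0) return cur;
        }
        return null;
    }

    public static void main(String[] args) {
        Node a = new Node("A", 1);
        Node b = a.addChild(new Node("B", 1));
        Node c = a.addChild(new Node("C", 0));
        Node d = c.addChild(new Node("D", 1));
        Node e = c.addChild(new Node("E", 1));
        System.out.println(previousRow(d).name); // B, not A
        System.out.println(previousRow(e).name); // D
        System.out.println(previousRow(b).name); // A
    }
}
```

Row order here is A, B, D, E (C holds no cells), so a previousRowInternal() that returns A for D would make a reverse seek jump backwards too far and could loop forever.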
[jira] [Commented] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306809#comment-14306809 ]

Hadoop QA commented on HBASE-12976:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12696676/12976.txt
against master branch at commit 8aeb3acaf959e2905191fd6c92fa56300f7d3597.
ATTACHMENT ID: 12696676

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12706//console

This message is automatically generated.

Default hbase.client.scanner.max.result.size
--------------------------------------------

Key: HBASE-12976
URL: https://issues.apache.org/jira/browse/HBASE-12976
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11
Attachments: 12976.txt

Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2 MB. That is a good compromise between performance and buffer usage on typical networks, avoiding OOMs when the caching was chosen too high. To an HTable client this is completely transparent.
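For reference, the proposed cap can also be set explicitly by operators today. A sketch of the hbase-site.xml entry, assuming the property name from the issue title and the 2 MB value from the proposal above (2 MB = 2097152 bytes):

```xml
<!-- Cap the data returned per scanner RPC at roughly 2 MB, regardless of
     how high hbase.client.scanner.caching is set. -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value>
</property>
```

Because the size cap bounds each RPC independently of the row-count caching setting, a too-aggressive caching value degrades into extra round trips instead of an OOM.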
[jira] [Commented] (HBASE-11567) Write bulk load COMMIT events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306818#comment-14306818 ]

Hadoop QA commented on HBASE-11567:
-----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12696680/HBASE-11567-v4-rebase.patch
against master branch at commit 8aeb3acaf959e2905191fd6c92fa56300f7d3597.
ATTACHMENT ID: 12696680

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12705//console

This message is automatically generated.

Write bulk load COMMIT events to WAL
------------------------------------

Key: HBASE-11567
URL: https://issues.apache.org/jira/browse/HBASE-11567
Project: HBase
Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Alex Newman
Attachments: HBASE-11567-v1.patch, HBASE-11567-v2.patch, HBASE-11567-v4-rebase.patch, hbase-11567-v3.patch, hbase-11567-v4.patch

Similar to writing flush (HBASE-11511), compaction (HBASE-2231), and region open/close (HBASE-11512) events to the WAL, we should persist bulk load events to the WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. A design doc for secondary replica replication can be found at HBASE-11183.
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306814#comment-14306814 ]

Hudson commented on HBASE-12958:
--------------------------------

FAILURE: Integrated in HBase-TRUNK #6090 (See [https://builds.apache.org/job/HBase-TRUNK/6090/])
HBASE-12958 SSH doing hbase:meta get but hbase:meta not assigned (stack: rev 96cdc7987e8894b304a3201f67cb0b9595c68cc3)
* hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionStates.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java

SSH doing hbase:meta get but hbase:meta not assigned
----------------------------------------------------

Key: HBASE-12958
URL: https://issues.apache.org/jira/browse/HBASE-12958
Project: HBase
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: stack
Assignee: stack
Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11
Attachments: 12958.txt, 12958v2.txt, 12958v2.txt

All master threads are blocked waiting on this call to return:
{code}
"MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
        - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
        at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
        at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
        at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
        at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
        - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates)
        at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered:
{noformat}
Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
{noformat}
Will add more detail in a sec.
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308708#comment-14308708 ]

Hudson commented on HBASE-8329:
-------------------------------

SUCCESS: Integrated in HBase-0.98 #839 (See [https://builds.apache.org/job/HBase-0.98/839/])
Amend HBASE-8329 Limit compaction speed (zhangduo) (apurtell: rev 409983a99d21a6a723027e531dfd2b9a0228abb6)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreFileManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

Limit compaction speed
----------------------

Key: HBASE-8329
URL: https://issues.apache.org/jira/browse/HBASE-8329
Project: HBase
Issue Type: Improvement
Components: Compaction
Reporter: binlijin
Assignee: zhangduo
Fix For: 2.0.0, 1.1.0
Attachments: HBASE-8329-0.98-addendum.patch, HBASE-8329-0.98.patch, HBASE-8329-10.patch, HBASE-8329-11.patch, HBASE-8329-12.patch, HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-9-trunk.patch, HBASE-8329-branch-1.patch, HBASE-8329-trunk.patch, HBASE-8329_13.patch, HBASE-8329_14.patch, HBASE-8329_15.patch, HBASE-8329_16.patch, HBASE-8329_17.patch

There is no speed or resource limit for compaction. I think we should add this feature, especially for request bursts.
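The core idea behind limiting compaction speed can be sketched without any HBase code: after each chunk of compaction output, compute how long the writer must pause so the average write rate stays under a configured cap. This is a simplified, hypothetical illustration of that throttling idea, not HBase's actual throughput controller:

```java
// Hypothetical sketch of limiting compaction write speed: after writing a
// chunk, return how long the caller should sleep so that the cumulative
// average rate never exceeds maxBytesPerSecond.
public class ThroughputLimiter {
    private final long maxBytesPerSecond;
    private final long startMillis;   // when throttled writing began
    private long bytesWritten;        // bytes written since startMillis

    public ThroughputLimiter(long maxBytesPerSecond, long nowMillis) {
        this.maxBytesPerSecond = maxBytesPerSecond;
        this.startMillis = nowMillis;
    }

    /** Record a write and return how many ms to sleep to honor the cap (0 if on target). */
    public long control(long bytes, long nowMillis) {
        bytesWritten += bytes;
        // Minimum elapsed time the cap requires for this many bytes.
        long minElapsedMillis = bytesWritten * 1000 / maxBytesPerSecond;
        long actualElapsedMillis = nowMillis - startMillis;
        return Math.max(0, minElapsedMillis - actualElapsedMillis);
    }
}
```

A compactor loop would call control() after each output block and Thread.sleep() for the returned duration; under a burst of requests the compaction simply falls behind gracefully instead of saturating disk and CPU.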
[jira] [Commented] (HBASE-12957) region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data
[ https://issues.apache.org/jira/browse/HBASE-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308709#comment-14308709 ]

Hudson commented on HBASE-12957:
--------------------------------

SUCCESS: Integrated in HBase-0.98 #839 (See [https://builds.apache.org/job/HBase-0.98/839/])
HBASE-12957 region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data (Hongyu Bi) (apurtell: rev beab86116db20a3c9287184084e55625d1bec313)
* bin/region_mover.rb

region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data
---------------------------------------------------------------------------------------

Key: HBASE-12957
URL: https://issues.apache.org/jira/browse/HBASE-12957
Project: HBase
Issue Type: Improvement
Components: scripts
Reporter: hongyu bi
Assignee: hongyu bi
Priority: Minor
Fix For: 2.0.0, 1.1.0, 0.98.11
Attachments: HBASE-12957-v0.patch

region_mover calls isSuccessfulScan after a region has moved, to make sure it is healthy. However, if the moved region has lots of expired data, region_mover#isSuccessfulScan can take a long time to finish, and may even exceed the lease timeout. So I made isSuccessfulScan a get-like scan to achieve a faster response in that case.

Workaround: before graceful_stop/rolling_restart, call major_compact on tables with a small TTL.
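The slowdown matters because an open-ended scan that starts in a run of expired (filtered-out) rows must keep reading until it finds a live one, while a get-like scan is bounded to a single row regardless of its state. A self-contained sketch of that distinction, with a plain sorted map standing in for a region and all names hypothetical:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Contrast an open-ended scan with a single-row, get-like probe over a sorted
// "region" where most rows are expired. Pure-Java illustration, not HBase code.
public class GetLikeScanSketch {
    /** Open-ended scan: keep reading rows until a live (non-expired) one appears. */
    static int rowsTouchedByScan(NavigableMap<String, Boolean> region, String startRow) {
        int touched = 0;
        for (Map.Entry<String, Boolean> e : region.tailMap(startRow, true).entrySet()) {
            touched++;
            if (e.getValue()) break;   // found a live row; the scan can return
        }
        return touched;
    }

    /** Get-like scan: bounded to exactly one row, whether it is live or expired. */
    static int rowsTouchedByGet(NavigableMap<String, Boolean> region, String row) {
        region.get(row);               // examine just this row
        return 1;
    }

    public static void main(String[] args) {
        NavigableMap<String, Boolean> region = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) region.put(String.format("row%05d", i), false); // expired
        region.put("row09999", true);  // one live row at the end
        System.out.println(rowsTouchedByScan(region, "row00000")); // 10000
        System.out.println(rowsTouchedByGet(region, "row00000"));  // 1
    }
}
```

The health check only needs to prove the region is servable, so touching one row is enough; the bounded probe cannot exceed the lease timeout no matter how much expired data the region holds.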
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308790#comment-14308790 ]

Hudson commented on HBASE-8329:
-------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #796 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/796/])
Amend HBASE-8329 Limit compaction speed (zhangduo) (apurtell: rev 409983a99d21a6a723027e531dfd2b9a0228abb6)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreFileManager.java

Limit compaction speed
----------------------

Key: HBASE-8329
URL: https://issues.apache.org/jira/browse/HBASE-8329
Project: HBase
Issue Type: Improvement
Components: Compaction
Reporter: binlijin
Assignee: zhangduo
Fix For: 2.0.0, 1.1.0
Attachments: HBASE-8329-0.98-addendum.patch, HBASE-8329-0.98.patch, HBASE-8329-10.patch, HBASE-8329-11.patch, HBASE-8329-12.patch, HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-9-trunk.patch, HBASE-8329-branch-1.patch, HBASE-8329-trunk.patch, HBASE-8329_13.patch, HBASE-8329_14.patch, HBASE-8329_15.patch, HBASE-8329_16.patch, HBASE-8329_17.patch

There is no speed or resource limit for compaction. I think we should add this feature, especially for request bursts.
[jira] [Commented] (HBASE-12957) region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data
[ https://issues.apache.org/jira/browse/HBASE-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308791#comment-14308791 ]

Hudson commented on HBASE-12957:
--------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #796 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/796/])
HBASE-12957 region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data (Hongyu Bi) (apurtell: rev beab86116db20a3c9287184084e55625d1bec313)
* bin/region_mover.rb

region_mover#isSuccessfulScan may be extremely slow on region with lots of expired data
---------------------------------------------------------------------------------------

Key: HBASE-12957
URL: https://issues.apache.org/jira/browse/HBASE-12957
Project: HBase
Issue Type: Improvement
Components: scripts
Reporter: hongyu bi
Assignee: hongyu bi
Priority: Minor
Fix For: 2.0.0, 1.1.0, 0.98.11
Attachments: HBASE-12957-v0.patch

region_mover calls isSuccessfulScan after a region has moved, to make sure it is healthy. However, if the moved region has lots of expired data, region_mover#isSuccessfulScan can take a long time to finish, and may even exceed the lease timeout. So I made isSuccessfulScan a get-like scan to achieve a faster response in that case.

Workaround: before graceful_stop/rolling_restart, call major_compact on tables with a small TTL.
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308563#comment-14308563 ]

Enis Soztutar commented on HBASE-12956:
---------------------------------------

I've tested the patch on my cluster. {{hbase.regionserver.ipc.address}} works as before. The zk dump does not show anything unusual.

Binding to 0.0.0.0 is broken after HBASE-10569
----------------------------------------------

Key: HBASE-12956
URL: https://issues.apache.org/jira/browse/HBASE-12956
Project: HBase
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Blocker
Fix For: 1.0.0, 2.0.0, 1.1.0
Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch

After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue.
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308624#comment-14308624 ]

Hadoop QA commented on HBASE-12956:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12696917/0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch
against master branch at commit 2583e8de574ae4b002c5dbc80b0da666b42dd699.
ATTACHMENT ID: 12696917

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1942 checkstyle errors (more than the master's current 1941 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12713//console

This message is automatically generated.

Binding to 0.0.0.0 is broken after HBASE-10569
----------------------------------------------

Key: HBASE-12956
URL: https://issues.apache.org/jira/browse/HBASE-12956
Project: HBase
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Blocker
Fix For: 1.0.0, 2.0.0, 1.1.0
Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch, HBASE-12956-v2.txt

After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue.
[jira] [Comment Edited] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308711#comment-14308711 ]

Enis Soztutar edited comment on HBASE-12956 at 2/6/15 6:48 AM:
---------------------------------------------------------------

I was thinking that we do not need both bindAddress and isa in RpcServer. The following line is equal to just using bindAddress:
{code}
isa.equals(bindAddress) ? isa : bindAddress
{code}
What I had in mind is your exact same patch, but without the RpcServer changes, and instead of
{code}
       rpcServer = new RpcServer(rs, name, getServices(),
-        initialIsa, // BindAddress is IP we got for this server.
+        initialIsa,
+        bindAddress, // BindAddress is IP we got for this server.
{code}
just
{code}
       rpcServer = new RpcServer(rs, name, getServices(),
-        initialIsa, // BindAddress is IP we got for this server.
+        bindAddress, // BindAddress is IP we got for this server.
{code}
let me know if this makes sense.
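The equivalence Enis points out can be checked in isolation: whichever branch the ternary takes, the result equals bindAddress, so keeping both fields is redundant. A quick stand-alone check, with variable names borrowed from the snippet above and made-up addresses:

```java
import java.net.InetSocketAddress;

// Demonstrates that `isa.equals(bindAddress) ? isa : bindAddress` always
// yields a value equal to bindAddress, whichever branch is taken.
public class BindAddressCheck {
    static InetSocketAddress choose(InetSocketAddress isa, InetSocketAddress bindAddress) {
        return isa.equals(bindAddress) ? isa : bindAddress;
    }

    public static void main(String[] args) {
        InetSocketAddress specific = new InetSocketAddress("10.0.0.1", 16020);
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 16020);
        // Equal inputs: the ternary returns isa, which equals bindAddress by assumption.
        System.out.println(choose(specific, specific).equals(specific)); // true
        // Unequal inputs: the ternary returns bindAddress directly.
        System.out.println(choose(specific, wildcard).equals(wildcard)); // true
    }
}
```

Since both branches reduce to bindAddress, passing only the bind address into RpcServer loses nothing, which is why the simpler constructor change suffices.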
Binding to 0.0.0.0 is broken after HBASE-10569
----------------------------------------------

Key: HBASE-12956
URL: https://issues.apache.org/jira/browse/HBASE-12956
Project: HBase
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Blocker
Fix For: 1.0.0, 2.0.0, 1.1.0
Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch, HBASE-12956-v2.txt

After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue.
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308730#comment-14308730 ]

Hadoop QA commented on HBASE-12956:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12696964/HBASE-12956-v2.txt
against master branch at commit 2583e8de574ae4b002c5dbc80b0da666b42dd699.
ATTACHMENT ID: 12696964

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1942 checkstyle errors (more than the master's current 1941 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12714//console

This message is automatically generated.

Binding to 0.0.0.0 is broken after HBASE-10569
----------------------------------------------

Key: HBASE-12956
URL: https://issues.apache.org/jira/browse/HBASE-12956
Project: HBase
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Blocker
Fix For: 1.0.0, 2.0.0, 1.1.0
Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch, HBASE-12956-v2.txt

After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue.
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308732#comment-14308732 ] stack commented on HBASE-12949: --- bq. Looks like in the code, if HBase checksum fails, we just turn it off? Suggest we dig in here more. If checksum fails, indeed, parse of KV/Cell is going to be problematic. Looking at logging, the hbase-based checksumming failed so we were supposed to fall back to the hdfs checksumming (you know about this [~jerryhe]? If not, just say and I can point you to right place). Maybe something messed up here. If bad checksum, we should be failing the read for sure. Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch We've encountered problem where compaction hangs and never completes. After looking into it further, we found that the compaction scanner was stuck in a infinite loop. See stack below. {noformat} org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296) org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257) org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697) org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672) org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529) org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) {noformat} We identified the hfile that seems to be corrupted. 
Using HFile tool shows the following: {noformat} [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS Scanning - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 WARNING, previous row is greater then current row filename - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 previous - \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 current - Exception in thread main java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:489) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) {noformat} Turning on Java Assert shows the following: {noformat} 
Exception in thread main java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672) {noformat} It shows that the hfile seems to be corrupted -- the keys don't seem to be right. But the scanner is not able to give a meaningful error; instead it is stuck in an infinite loop here: {code} KeyValueHeap.generalizedSeek() while ((scanner = heap.poll()) != null) { } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
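The hang cited above can be reproduced in miniature. The sketch below is hypothetical: it is not HBase's `KeyValueHeap` code, and `Scanner`, `StuckScanner`, and `generalizedSeek` here are invented stand-ins. It shows how a poll/seek/re-offer loop never terminates when a scanner over a corrupted block reports a successful seek but never actually advances past the seek key.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the infinite loop, not the real HBase classes.
// A "corrupted" scanner claims seek() succeeded but never moves forward,
// so the poll/seek/re-offer loop below would spin forever.
public class SeekLoopSketch {
    interface Scanner {
        int peekKey();          // key the scanner is currently positioned at
        boolean seek(int key);  // position at first key >= key; true on success
    }

    // Models a scanner over a corrupted block: seek() "succeeds" but the
    // position never advances.
    static class StuckScanner implements Scanner {
        public int peekKey() { return 3; }
        public boolean seek(int key) { return true; }
    }

    // Same shape as the loop in the stack trace; maxIters is a demo-only
    // guard -- the real loop has none and spins until the process is killed.
    static int generalizedSeek(Deque<Scanner> heap, int seekKey, int maxIters) {
        int iters = 0;
        Scanner scanner;
        while ((scanner = heap.poll()) != null) {
            iters++;
            if (scanner.peekKey() >= seekKey) break;  // positioned: done
            if (iters >= maxIters) break;             // demo guard only
            if (scanner.seek(seekKey)) heap.add(scanner);  // re-offer and spin
        }
        return iters;
    }

    static int demo() {
        Deque<Scanner> heap = new ArrayDeque<>();
        heap.add(new StuckScanner());
        return generalizedSeek(heap, 5, 1000);
    }

    public static void main(String[] args) {
        // The stuck scanner (key 3) never reaches the seek key (5), so only
        // the demo guard stops the loop.
        System.out.println("iterations before demo guard fired: " + demo());
    }
}
```

This is why a sanity check on the seeked position (did the scanner actually advance?) turns a silent hang into a diagnosable read failure.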
[jira] [Updated] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-12956: -- Attachment: HBASE-12956-v2.txt Hey [~enis] I removed isa as suggested, please let me know if that is what you were thinking. Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch, HBASE-12956-v2.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308711#comment-14308711 ] Enis Soztutar commented on HBASE-12956: --- I was thinking that we do not need both bindAddress and isa in RpcServer. The following line is equal to just using bindAddress: {code} isa.equals(bindAddress) ? isa : bindAddress {code} What I had in mind is your exact same patch, but without the RpcServer changes, and instead of {code} rpcServer = new RpcServer(rs, name, getServices(), - initialIsa, // BindAddress is IP we got for this server. + initialIsa, + bindAddress, // BindAddress is IP we got for this server. {code} just {code} rpcServer = new RpcServer(rs, name, getServices(), - initialIsa, // BindAddress is IP we got for this server. + bindAddress, // BindAddress is IP we got for this server. {code} let me know if this makes sense. Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch, HBASE-12956-v2.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
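The observation that the ternary collapses to `bindAddress` can be checked in isolation. The class below is a hypothetical stand-in (`pick` is an invented name, not RpcServer code): whichever branch is taken, the result is always `equals()` to `bindAddress`, so keeping both fields is redundant.

```java
import java.net.InetSocketAddress;

// Stand-alone check of the observation above: the expression
// (isa.equals(bindAddress) ? isa : bindAddress) always yields a value that
// equals bindAddress, whichever branch is taken. 'pick' is an invented name,
// not part of RpcServer.
public class TernarySketch {
    static InetSocketAddress pick(InetSocketAddress isa, InetSocketAddress bindAddress) {
        return isa.equals(bindAddress) ? isa : bindAddress;
    }

    public static void main(String[] args) {
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 16020);
        // createUnresolved avoids a DNS lookup for the made-up hostname.
        InetSocketAddress named = InetSocketAddress.createUnresolved("rs1.example", 16020);
        // Equal branch: returns isa, which by the equality test equals bindAddress.
        assert pick(wildcard, wildcard).equals(wildcard);
        // Unequal branch: returns bindAddress directly.
        assert pick(named, wildcard).equals(wildcard);
        System.out.println("both branches equal bindAddress");
    }
}
```

Since the two branches are indistinguishable to any caller, passing only `bindAddress` into `RpcServer`, as the comment suggests, cannot change behavior.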
[jira] [Commented] (HBASE-12723) Update ACL matrix to reflect reality
[ https://issues.apache.org/jira/browse/HBASE-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308678#comment-14308678 ] Ashish Singhi commented on HBASE-12723: --- A very small nit: can you change the operation name of the restore_snapshot command from restore to restoreSnapshot? Update ACL matrix to reflect reality Key: HBASE-12723 URL: https://issues.apache.org/jira/browse/HBASE-12723 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Srikanth Srungarapu Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-12723.patch, HBASE-12723_v2.patch, book.html The ACL matrix in the book should be updated with the recent changes. https://hbase.apache.org/book/appendix_acl_matrix.html Also the format is not optimal. There is a hierarchy relation between scopes (GLOBAL > NS > TABLE), but not so much between permissions (A, C, R). Some things to do: - The {{Minimum Permission}} column does not make sense. We should replace it. - Add information about the superuser. - grant is a multi-level thing. Required permissions depend on the scope. - See HBASE-12511 and others that changed some of the permissions. What I would like to see at the end is something like: {code} createNamespace: superuser | global(A) deleteNamespace: superuser | global(A) | NS(A) modifyNamespace: superuser | global(A) | NS(A) getNamespaceDescriptor : superuser | global(A) | NS(A) listNamespaces : All access* createTable: superuser | global(C) | NS(C) grant NS Perm : superuser | global(A) | NS(A) Table Perm : ... revoke NS Perm : superuser | global(A) | NS(A) Table Perm : ... getPerms NS perm : superuser | global(A) | NS(A) Table Perm : ... {code} See HBASE-12511. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12974) Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail
[ https://issues.apache.org/jira/browse/HBASE-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307249#comment-14307249 ] Nicolas Liochon commented on HBASE-12974: - bq. 1 time only We don't keep the history of the exceptions. The time is only about the last exception. So if you have 1 action that failed you will have '1 time'. If 10 actions fail for the same reason you will have '10 times'. Yes, it's kind of useless. We used to start to log after 10 retries or so, so the log should contain more information (at the info level iirc). Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail --- Key: HBASE-12974 URL: https://issues.apache.org/jira/browse/HBASE-12974 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0 Reporter: stack Assignee: stack I'm trying to do longer running tests but when I up the numbers for a task I run into this: {code} 2015-02-04 15:35:10,267 FATAL [IPC Server handler 17 on 43975] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1419986015214_0204_m_02_3 - exited : org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207) at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1658) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:141) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:449) at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:407) at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:355) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} It's telling me an action failed, but 1 time only, with an empty IOE? I'm kinda stumped. Starting up this issue to see if I can get to the bottom of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
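The opaque "IOException: 1 time" line is a natural consequence of summarizing failures by exception name and count only. The sketch below is a hypothetical reconstruction of that reporting style (`summarize` is an invented method, not the real `BatchErrors` code): a single detail-free `IOException` produces exactly a line of this shape, with the cause discarded.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction of AsyncProcess-style error summarizing (not
// the real BatchErrors code): failures are grouped by exception class name
// and reported as counts, so one message-less IOException yields the opaque
// "Failed 1 action: IOException: 1 time," with no stack or cause attached.
public class ErrorSummarySketch {
    static String summarize(List<Throwable> failures) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Throwable t : failures) {
            counts.merge(t.getClass().getSimpleName(), 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder(
                "Failed " + failures.size() + " action" + (failures.size() == 1 ? "" : "s") + ": ");
        counts.forEach((name, n) ->
                sb.append(name).append(": ").append(n).append(n == 1 ? " time," : " times,"));
        return sb.toString();
    }

    public static void main(String[] args) {
        // One action failing with an IOException that carries no message.
        System.out.println(summarize(List.of(new IOException())));
        // prints: Failed 1 action: IOException: 1 time,
    }
}
```

Keeping (or at least logging) the last underlying throwable per group, instead of only its class name, would make this message actionable.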
[jira] [Commented] (HBASE-12952) Seek with prefixtree may hang
[ https://issues.apache.org/jira/browse/HBASE-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306953#comment-14306953 ] ramkrishna.s.vasudevan commented on HBASE-12952: Please check with the latest version or with the 0.98.10RC version. Some fixes related to Prefix Tree have gone into it. Seek with prefixtree may hang - Key: HBASE-12952 URL: https://issues.apache.org/jira/browse/HBASE-12952 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.0, 0.98.7, 0.98.8, 0.98.6.1, 0.98.9, 0.98.10 Reporter: sinfox Assignee: sinfox Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: hbase_0.98.6.1.patch I have upgraded my hbase cluster from hbase-0.96 to hbase-0.98.6.1, then I found some compactions hanging on many regionservers, with the CPU at 100%. It looks like there is an infinite loop somewhere. From the log, I found that StoreFileScanner.java : reseekAtOrAfter(HFileScanner s, KeyValue k) entered an infinite loop. Reading the source code, I found an error in PrefixTreeArrayReversibleScanner.java : previousRowInternal() eg: A: fan:12, numCell:1 A : 1 - B A : 2 - C C: 3 - D C: 4 - E A: fan:12, numCell:1 B: fan, numCell:1 C: fan:34, numCell:0 D: fan, numCell:1 E: fan, numCell:1 when currentNode is D, its previous node is B, but this function will return A. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
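The reporter's example can be modeled with a tiny trie. The sketch below is hypothetical (`Node` and `previousRow` are invented names, not the real `PrefixTreeArrayReversibleScanner` code, and it simplifies by assuming the rightmost descendant of a sibling always bears a row): a node holds a row when its cell count is positive, C is branch-only, and the previous row of D must be B, the deepest node of the previous sibling subtree, not the row-bearing ancestor A that the buggy `previousRowInternal()` returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the example tree in the report (invented names, not
// the real prefix-tree scanner). A holds a row and has children B and C;
// C holds no row (numCells = 0) and has children D and E. The previous row
// of D is B, not A.
public class PreviousRowSketch {
    static class Node {
        final String name;
        final int numCells;                      // > 0 means this node is a row
        final List<Node> children = new ArrayList<>();
        Node parent;
        Node(String name, int numCells) { this.name = name; this.numCells = numCells; }
        Node add(Node child) { child.parent = this; children.add(child); return this; }
    }

    // Walk up until we can step to a previous sibling, then descend to its
    // rightmost node (simplification: assumes that node bears a row). Only
    // return an ancestor if it bears a row itself and no sibling was found.
    static Node previousRow(Node current) {
        for (Node node = current; node.parent != null; ) {
            int idx = node.parent.children.indexOf(node);
            if (idx > 0) {
                Node prev = node.parent.children.get(idx - 1);
                while (!prev.children.isEmpty()) {
                    prev = prev.children.get(prev.children.size() - 1);
                }
                return prev;
            }
            node = node.parent;
            if (node.numCells > 0) return node;  // the ancestor itself is a row
        }
        return null;
    }

    public static void main(String[] args) {
        Node a = new Node("A", 1);
        Node b = new Node("B", 1);
        Node c = new Node("C", 0);               // branch-only: no cells
        Node d = new Node("D", 1);
        Node e = new Node("E", 1);
        a.add(b).add(c);
        c.add(d).add(e);
        System.out.println(previousRow(d).name); // prints: B (the bug returned A)
        System.out.println(previousRow(e).name); // prints: D
    }
}
```

The bug's shape, per the report, is returning the first row-bearing ancestor (A) instead of first checking for a previous sibling subtree to descend into.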
[jira] [Updated] (HBASE-12974) Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail
[ https://issues.apache.org/jira/browse/HBASE-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-12974: -- Issue Type: Sub-task (was: Bug) Parent: HBASE-12978 Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail --- Key: HBASE-12974 URL: https://issues.apache.org/jira/browse/HBASE-12974 Project: HBase Issue Type: Sub-task Components: integration tests Affects Versions: 1.0.0 Reporter: stack Assignee: stack -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
Jerry What is the quick and easy way to monitor corrupted hfiles? We are using HBase 0.98 Thanks Asim On Feb 5, 2015, at 10:47 AM, Jerry He (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307748#comment-14307748 ] Jerry He commented on HBASE-12949: -- Hi, [~stack], [~ram_krish] I agree. It is a balancing act between checking and not checking, and checking more or checking less. We can check less, for example, only the type. Another option I can think of is that we have a property (say 'SanityCheckCell'). We will only do checking when reading the cells if the property is set to true, for people wanting a strong cell sanity check, or for people lacking strong FileSystem protection (checksum, etc.). What do you think? Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307960#comment-14307960 ] Asim Zafir commented on HBASE-12949: Jerry What is the quick and easy way to monitor corrupted hfiles? We are using HBase 0.98 Thanks Asim Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12978) hbase:meta has a row missing hregioninfo and it causes my long-running job to fail
stack created HBASE-12978: - Summary: hbase:meta has a row missing hregioninfo and it causes my long-running job to fail Key: HBASE-12978 URL: https://issues.apache.org/jira/browse/HBASE-12978 Project: HBase Issue Type: Bug Reporter: stack Fix For: 1.0.0 Testing 1.0.0 with long-running tests. A row in hbase:meta was missing its HRI entry. It caused the job to fail. Around the time of the first task failure, there were balances of the hbase:meta region and it was on a server that crashed. I tried to look at what happened around the time of our writing hbase:meta and I ran into another issue: 20 logs of 256MB each filled with WrongRegionException written over a minute or two. The actual update of hbase:meta was not in the logs; it'd been rotated off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12978) hbase:meta has a row missing hregioninfo and it causes my long-running job to fail
[ https://issues.apache.org/jira/browse/HBASE-12978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307972#comment-14307972 ] stack commented on HBASE-12978: --- Let me try and get more data in here. hbase:meta has a row missing hregioninfo and it causes my long-running job to fail -- Key: HBASE-12978 URL: https://issues.apache.org/jira/browse/HBASE-12978 Project: HBase Issue Type: Bug Reporter: stack Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307985#comment-14307985 ] stack commented on HBASE-12976: --- +1 (again). Mighty [~jonathan.lawlor] is working on removing the now unneeded caching. Default hbase.client.scanner.max.result.size Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2 MB. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
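The benefit of a size cap can be made concrete with back-of-envelope arithmetic. The formula and row sizes below are illustrative assumptions, not measured HBase behavior: with a byte cap per scanner RPC, the rows shipped per round trip are roughly bounded by min(caching, maxResultSize / avgRowBytes), so an over-aggressive caching setting can no longer blow out buffers.

```java
// Back-of-envelope sketch (assumed numbers, not measured HBase behavior):
// with a 2 MB cap on bytes returned per scanner RPC, the rows shipped per
// round trip are roughly min(caching, maxResultSize / avgRowBytes), so a
// too-large caching value no longer risks an OOM on either side.
public class ScannerBatchSketch {
    static long rowsPerRpc(long caching, long maxResultSizeBytes, long avgRowBytes) {
        return Math.min(caching, Math.max(1, maxResultSizeBytes / avgRowBytes));
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        // Tiny 100-byte rows: the caching value (100) stays the binding limit.
        assert rowsPerRpc(100, twoMb, 100) == 100;
        // Fat 1 MB rows: the 2 MB cap cuts each RPC to 2 rows even at caching=10000.
        assert rowsPerRpc(10_000, twoMb, 1024 * 1024) == 2;
        System.out.println("size cap bounds rows per RPC");
    }
}
```

This is also why the change is transparent to an HTable client: it only changes how many rows travel per RPC, not what the scan returns.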
[jira] [Updated] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12966: --- Fix Version/s: 1.1.0 1.0.1 Hadoop Flags: Reviewed NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from the ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (3 machines total). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308039#comment-14308039 ] Lars Hofhansl commented on HBASE-12976: --- Cherry-picked and pushed to 0.98, 1.0, 1.1, and 2.0. Making a 0.94 patch now. Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12976: -- Fix Version/s: 1.1.0 Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308042#comment-14308042 ] Hudson commented on HBASE-12958: FAILURE: Integrated in HBase-1.1 #143 (See [https://builds.apache.org/job/HBase-1.1/143/]) Revert HBASE-12958 SSH doing hbase:meta get but hbase:meta not assigned (stack: rev aabc74406fc0c23542a29ac9058a632b5a3edfd8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionStates.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java

SSH doing hbase:meta get but hbase:meta not assigned

Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch

All master threads are blocked waiting on this call to return:
{code}
MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
        - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
        at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
        at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
        at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
        at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
        - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates)
        at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12974) Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail
[ https://issues.apache.org/jira/browse/HBASE-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307976#comment-14307976 ] stack commented on HBASE-12974: --- Making this a subtask of HBASE-12978. Related it seems. The opaque failure shows up first and then I see the missing row, which may be the root cause (speculation).

Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail

Key: HBASE-12974 URL: https://issues.apache.org/jira/browse/HBASE-12974 Project: HBase Issue Type: Sub-task Components: integration tests Affects Versions: 1.0.0 Reporter: stack Assignee: stack

I'm trying to do longer running tests but when I up the numbers for a task I run into this:
{code}
2015-02-04 15:35:10,267 FATAL [IPC Server handler 17 on 43975] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1419986015214_0204_m_02_3 - exited : org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
        at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1658)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:141)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98)
        at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:449)
        at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:407)
        at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:355)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}
It's telling me an action failed, but only 1 time, with an empty IOE? I'm kinda stumped. Starting up this issue to see if I can get to the bottom of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308008#comment-14308008 ] Hadoop QA commented on HBASE-12976: ---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696826/12976-v2.txt against master branch at commit 96cdc7987e8894b304a3201f67cb0b9595c68cc3. ATTACHMENT ID: 12696826

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}.
There are 1 zombie test(s): at org.apache.hadoop.hbase.coprocessor.TestMasterObserver.testRegionTransitionOperations(TestMasterObserver.java:1604)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12709//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/12709//console This message is automatically generated.

Default hbase.client.scanner.max.result.size

Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt

Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
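The cap proposed in HBASE-12976 above is an ordinary client-side configuration property. A minimal sketch of what setting it might look like in hbase-site.xml; the property name is from the issue, and the value is the issue's proposed 2 MB expressed in bytes:

```xml
<!-- Sketch: cap the data returned per scan RPC at ~2 MB (value in bytes). -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value>
</property>
```

As the description notes, this is transparent to an HTable client: the scanner simply issues more RPCs when a batch would exceed the cap.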
[jira] [Updated] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12954: --- Attachment: 12954-v11.txt Patch v11 addresses checkstyle warnings.

Ability impaired using HBase on multihomed hosts

Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v10.txt, 12954-v11.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png

For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IPs per network interface) it would be ideal to have a way to specify both:
# which IP interface the HBase master or region-server will bind to
# what hostname HBase will advertise in Zookeeper, both for a master or region-server process

While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design of the properties available to an administrator for these to be unambiguously specified. One has been able to request {{hbase.master.ipc.address}} or {{hbase.regionserver.ipc.address}}, but one can not specify the desired HBase {{hbase.master.hostname}}. (It was removed in HBASE-1357; further, I am unaware of a region-server equivalent.) I use a configuration management system to generate all of my configuration files on a per-machine basis. As such, an option to generate a file specifying exactly which hostname to use would be helpful. Today, specifying the bind address for HBase works, and one can use an HBase-only DNS for faking what to put in Zookeeper, but this is far from ideal. Network interfaces have no intrinsic IP address, nor hostname. Specifying a DNS server is awkward, as the DNS server may differ from the system's resolver and is a single IP address. Similarly, on hosts which use a transient VIP (e.g. through keepalived) for other services, it means there's a seemingly non-deterministic hostname choice made by HBase depending on the state of the VIP at daemon start-up time. I will attach two networking examples I use which become very difficult to manage under the current properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
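The two knobs the issue asks for can be sketched as per-daemon configuration. The ipc.address properties are the ones named in the description; the hostname property shown is hypothetical, since the issue is precisely that no supported equivalent exists yet:

```xml
<!-- Sketch of per-daemon config for a multihomed region server.
     hbase.regionserver.ipc.address exists today; the hostname
     property below is hypothetical, illustrating the request. -->
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>10.0.0.5</value> <!-- bind the RPC server to this interface -->
</property>
<property>
  <!-- hypothetical: the name this daemon would advertise in Zookeeper -->
  <name>hbase.regionserver.hostname</name>
  <value>rs1.internal.example.com</value>
</property>
```

A configuration management system could then emit both values per machine, removing any dependence on DNS or reverse lookups at daemon start-up.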
[jira] [Commented] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307991#comment-14307991 ] Ted Yu commented on HBASE-12966: [~apurtell]: Let me know if you want this in 0.98. Thanks

NPE in HMaster while recovering tables in Enabling state

Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 2.0.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch

{code}
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210)
        at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142)
        at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695)
        at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459)
        at java.lang.Thread.run(Thread.java:745)
2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210)
        at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142)
        at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695)
        at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459)
        at java.lang.Thread.run(Thread.java:745)
{code}
A table was trying to recover from the ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (3 machines total). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308000#comment-14308000 ] Hadoop QA commented on HBASE-12954: ---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696824/12954-v10.txt against master branch at commit 96cdc7987e8894b304a3201f67cb0b9595c68cc3. ATTACHMENT ID: 12696824

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1943 checkstyle errors (more than the master's current 1941 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+descriptionThis config is for experts: don't set its value unless you really know what you are doing.
+When set to a non-empty value, this represents the (external facing) hostname for the underlying server.
+ new java.lang.String[] { Port, ServerStartCode, ServerCurrentTime, UseThisHostnameInstead, });
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12708//console This message is automatically generated.
Ability impaired using HBase on multihomed hosts Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v10.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IP's per network interface) it would be ideal to have a way to both specify: # which IP interface to which HBase master or region-server will bind # what hostname HBase will advertise in Zookeeper both for a master or region-server process While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design of the properties available to an administrator for
[jira] [Updated] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12976: -- Summary: Set default value for hbase.client.scanner.max.result.size (was: Default hbase.client.scanner.max.result.size)

Set default value for hbase.client.scanner.max.result.size

Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt

Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12979) Use setters instead of return values for handing back statistics from HRegion methods
Andrew Purtell created HBASE-12979: -- Summary: Use setters instead of return values for handing back statistics from HRegion methods Key: HBASE-12979 URL: https://issues.apache.org/jira/browse/HBASE-12979 Project: HBase Issue Type: Improvement Affects Versions: 0.98.10 Reporter: Andrew Purtell Assignee: Andrew Purtell In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks returning this information is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12979) Use setters instead of return values for handing back statistics from HRegion methods
[ https://issues.apache.org/jira/browse/HBASE-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12979: --- Fix Version/s: 0.98.10.1 Use setters instead of return values for handing back statistics from HRegion methods - Key: HBASE-12979 URL: https://issues.apache.org/jira/browse/HBASE-12979 Project: HBase Issue Type: Improvement Affects Versions: 0.98.10 Reporter: Andrew Purtell Assignee: Andrew Purtell Labels: phoenix Fix For: 0.98.10.1 In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks returning this information is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12979) Use setters instead of return values for handing back statistics from HRegion methods
[ https://issues.apache.org/jira/browse/HBASE-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308038#comment-14308038 ] Andrew Purtell commented on HBASE-12979: Not super urgent, let's look at a patch release next week. I'll have a patch here before then. Use setters instead of return values for handing back statistics from HRegion methods - Key: HBASE-12979 URL: https://issues.apache.org/jira/browse/HBASE-12979 Project: HBase Issue Type: Improvement Affects Versions: 0.98.10 Reporter: Andrew Purtell Assignee: Andrew Purtell Labels: phoenix Fix For: 0.98.10.1 In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks handing back this information as return values from HRegion methods is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12975) SplitTranaction, RegionMergeTransaction to should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix)
[ https://issues.apache.org/jira/browse/HBASE-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308035#comment-14308035 ] Rajeshbabu Chintaguntla commented on HBASE-12975: - [~lhofhansl] Most of the failed-split cleanup will be taken care of by HBase itself. If the split fails for the data or index region before the PONR, rollback will be called for the data region; internally the coprocessors call rollback on the index region, so split cleanup happens properly for both regions. If the RS goes down here, the master takes care of cleaning up the failed splits. After the PONR, meta is updated with both the data and index region split mutations together; on any failure from that point onwards the RS will be aborted, so while recovering the master sees the split regions of both the data and index regions in meta and assigns them.

SplitTranaction, RegionMergeTransaction to should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix)

Key: HBASE-12975 URL: https://issues.apache.org/jira/browse/HBASE-12975 Project: HBase Issue Type: Improvement Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: HBASE-12975.patch

Making SplitTransaction, RegionMergeTransaction limited private is required to support the local indexing feature in Phoenix to ensure region colocation. We can ensure region split and region merge in the coprocessors in a few method calls without touching internals like creating zk's, file layout changes or assignments. 1) With stepsBeforePONR, stepsAfterPONR we can ensure the split. 2) Meta entries can pass through coprocessors to atomically update with the normal split/merge. 3) Rollback on failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12979) Use setters instead of return values for handing back statistics from HRegion methods
[ https://issues.apache.org/jira/browse/HBASE-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12979: --- Description: In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks handing back this information as return values from HRegion methods is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. was: In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks returning this information is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. 
Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. Use setters instead of return values for handing back statistics from HRegion methods - Key: HBASE-12979 URL: https://issues.apache.org/jira/browse/HBASE-12979 Project: HBase Issue Type: Improvement Affects Versions: 0.98.10 Reporter: Andrew Purtell Assignee: Andrew Purtell Labels: phoenix Fix For: 0.98.10.1 In HBASE-5162 (and backports such as HBASE-12729) we modified some HRegion methods to return statistics for consumption by callers. The statistics are ultimately passed back to the client as load feedback. [~lhofhansl] thinks handing back this information as return values from HRegion methods is a weird mix of concerns. This also produced a difficult to anticipate binary compatibility issue with Phoenix. There was no compile time issue because the code of course was not structured to assign from a method returning void, yet the method signature changes so the JVM cannot resolve it if older Phoenix binaries are installed into a 0.98.10 release. Let's change the HRegion methods back to returning 'void' and use setters instead. Officially we don't support use of HRegion (HBASE-12566) but we do not need to go out of our way to break things (smile) so I would also like to make a patch release containing just this change to help out our sister project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
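The binary-compatibility argument in HBASE-12979 above can be illustrated in plain Java. This is a hypothetical sketch, not the actual HRegion API: all names are invented. The point is that a method whose descriptor stays `void` never breaks callers compiled against an older jar, while changing a return type does (the JVM resolves methods by exact descriptor, so old callers would hit NoSuchMethodError at runtime).

```java
// Hypothetical sketch of the setter pattern proposed in the issue.
// Names are illustrative only, not the actual HRegion methods.
public class SetterFeedback {

  /** Carrier the callee fills in. Adding fields here later never
   *  changes the signature of the operation method below. */
  public static class OperationStats {
    private int cellsWritten;
    public void setCellsWritten(int n) { cellsWritten = n; }
    public int getCellsWritten() { return cellsWritten; }
  }

  // Before (the breaking change): turning
  //   void mutate(String[] cells)
  // into
  //   OperationStats mutate(String[] cells)
  // alters the method descriptor, so code compiled against the old
  // 'void' form fails with NoSuchMethodError at runtime even though
  // it still compiles against either version.

  /** After: the signature stays 'void'-returning; statistics travel
   *  back through the caller-supplied carrier instead. */
  public static void mutate(String[] cells, OperationStats stats) {
    // ... apply the mutations ...
    stats.setCellsWritten(cells.length); // load feedback, no signature change
  }

  public static void main(String[] args) {
    OperationStats stats = new OperationStats();
    mutate(new String[] {"a", "b", "c"}, stats);
    System.out.println(stats.getCellsWritten()); // prints 3
  }
}
```

Phoenix, compiled against the old signatures, can then keep running on a patched 0.98.10 without recompilation, which is exactly the motivation stated above.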
[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308054#comment-14308054 ] Jonathan Lawlor commented on HBASE-11544: - Thanks for the feedback [~lhofhansl]. Setting 2mb as the default maxResultSize definitely makes sense in the current workflow where caching is set very high. In cases where the size limit is hit before the caching limit this will almost be equivalent to how I was thinking of implementing solution #1. Certainly the move to a full streaming protocol (solution #2) would be the best solution. However, the implementation of solution #1 seems like it would fix the major pain point of running out of memory on the server and may even help us improve the current paradigm of how the network is used. In the move to solution #1 we could improve network usage by reworking the semantics behind Scan#setCaching() (as [~lhofhansl] has called out in HBASE-12976). Rather than looking to fetch a certain number of rows from an RPC request, we would instead try to ensure that we always fill the maxResultSize in the RPC response sent back (if the chunk cannot be filled, just send back what we have). The rework of the caching could then instead change caching into more of a row-limit concept where by default the limit is Long.MAX. Then, in the default case, we could service RPC requests on the basis of Batch and MaxResultSize alone. We would no longer need to see if we had accumulated enough rows to meet the caching limit. Instead we would do the following:
- While the maxResultSize has not been reached, accumulate results. If Scan#setBatch() is not set, each Result will represent a row. If Scan#setBatch() is set, each Result will represent a group of cells from a row.
- The maxResultSize would be checked at the cell level rather than the row level like it currently is.
What this means is that rather than fetching a row and seeing if we have met our maxResultSize cap, we would be fetching a cell and seeing if we have hit our maxResultSize cap. This finer grain size check means it will be possible for us to retrieve rows that would otherwise cause OOME exceptions. Thus, the RPC response would be returning a list of Results where the last Result will likely be a partial result (i.e. will not contain all of the cells for its rows) since the maxResultSize limit would be triggered after fetching a cell. Then on the client side we can determine, maybe based on a new Scan setting, whether or not the caller will be okay with receiving the results for a row in parts (similar to batching). If the caller expects each Result to contain all of the cells for a single row, then we can accumulate the partial results for a row on the client side and then combine them into a single result before adding them to our client side cache (note that as [~lhofhansl] has mentioned, this presents the possibility that the client may OOME when reading large rows because the entire row would need to be constructed client side). In summary, what the implementation of solution #1 does for us is: - Reduce the likelihood of OOME on the server. OOME may still occur if a single cell is too large for the server to handle. This issue could be fixed with the move to a full streaming protocol (solution #2) - Allows the client to retrieve rows that they may otherwise be unable to retrieve currently due to RowTooBigExceptions - Provides a finer grained restriction on the ResultSize on the server -- Since the limit on the result size is checked on a cell by cell basis, we will no longer overshoot the result size limit by large amounts when the rows are large - Addresses points #1 and #2 of [~lhofhansl]'s list above. 
#3 would be addressed by an implementation of streaming. The implementation details would be as in my previous comment, but with the addition of changing the semantics behind caching to act more as a limit on the number of rows. Once again, any feedback is greatly appreciated. Thanks [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Labels: beginner Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
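The accumulation loop described above can be sketched in isolation. This is a hedged illustration only: the Cell and Result records below are hypothetical stand-ins, not HBase's actual classes. It accumulates cells, checks the size limit per cell rather than per row, and flags the final Result as partial when the limit cut a row short.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not HBase's scanner code) of a cell-level maxResultSize
// check: accumulate cells, trip the limit per cell, and mark the last
// Result partial if its row may have been cut off mid-row.
public class CellLevelSizeLimitSketch {

    // Hypothetical simplified stand-ins for HBase's Cell and Result types.
    record Cell(String row, int sizeBytes) {}
    record Result(String row, List<Cell> cells, boolean partial) {}

    static List<Result> fetchChunk(List<Cell> scannerCells, long maxResultSize) {
        List<Result> results = new ArrayList<>();
        List<Cell> current = new ArrayList<>();
        String currentRow = null;
        long accumulated = 0;
        boolean limitHit = false;

        for (Cell cell : scannerCells) {
            if (currentRow != null && !cell.row().equals(currentRow)) {
                // Row boundary crossed: the finished row is a complete Result.
                results.add(new Result(currentRow, current, false));
                current = new ArrayList<>();
            }
            currentRow = cell.row();
            current.add(cell);
            accumulated += cell.sizeBytes();
            if (accumulated >= maxResultSize) {  // checked per cell, not per row
                limitHit = true;
                break;
            }
        }
        if (currentRow != null) {
            // If the limit tripped mid-row, this last Result is partial;
            // the client would stitch partials back together.
            results.add(new Result(currentRow, current, limitHit));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("r1", 400), new Cell("r1", 400),
            new Cell("r2", 400), new Cell("r2", 400));
        for (Result r : fetchChunk(cells, 1000)) {
            System.out.println(r.row() + " cells=" + r.cells().size()
                + " partial=" + r.partial());
        }
    }
}
```

With a 1000-byte limit and 400-byte cells, row r1 returns complete and r2 comes back as a one-cell partial Result, mirroring the "last Result will likely be a partial result" behavior described in the comment.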
[jira] [Commented] (HBASE-12975) SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix)
[ https://issues.apache.org/jira/browse/HBASE-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308048#comment-14308048 ] Lars Hofhansl commented on HBASE-12975: --- So if you had an interface with beforePONR and afterPONR methods, you'd be good? You'd call beforePONR on both transactions and if successful call afterPONR on both? SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix) -- Key: HBASE-12975 URL: https://issues.apache.org/jira/browse/HBASE-12975 Project: HBase Issue Type: Improvement Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: HBASE-12975.patch Making SplitTransaction, RegionMergeTransaction limited private is required to support the local indexing feature in Phoenix and to ensure region colocation. We can ensure region splits and merges from the coprocessors in a few method calls without touching internals like creating zk's, file layout changes or assignments.
1) With stepsBeforePONR and stepsAfterPONR we can ensure the split.
2) Meta entries can pass through coprocessors to atomically update with the normal split/merge.
3) Rollback on failure.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
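The question above suggests a minimal two-phase shape. The sketch below is purely illustrative: the beforePONR/afterPONR names come from the comment (PONR = point of no return), but everything else is hypothetical and not HBase's actual SplitTransaction/RegionMergeTransaction API. Run the reversible phase on both transactions, and only cross the point of no return when both succeed; otherwise roll back what was done.

```java
import java.util.List;

// Hypothetical sketch of the two-phase hook discussed above: a coprocessor
// drives two transactions in lockstep, rolling both back if either fails
// before the point of no return. Names are illustrative, not HBase's API.
public class PonrSketch {

    interface RegionTransaction {
        boolean beforePONR();  // all reversible steps; false = abort
        void afterPONR();      // irreversible completion steps
        void rollback();       // undo the reversible steps
    }

    // Run both transactions' reversible phases; only if all succeed,
    // cross the point of no return for all of them.
    static boolean executeAtomically(List<RegionTransaction> txns) {
        int done = 0;
        for (RegionTransaction t : txns) {
            if (!t.beforePONR()) {
                for (int i = 0; i < done; i++) txns.get(i).rollback();
                return false;
            }
            done++;
        }
        for (RegionTransaction t : txns) t.afterPONR();
        return true;
    }

    public static void main(String[] args) {
        RegionTransaction ok = new RegionTransaction() {
            public boolean beforePONR() { System.out.println("tx1 beforePONR"); return true; }
            public void afterPONR() { System.out.println("tx1 afterPONR"); }
            public void rollback() { System.out.println("tx1 rollback"); }
        };
        RegionTransaction fail = new RegionTransaction() {
            public boolean beforePONR() { System.out.println("tx2 beforePONR failed"); return false; }
            public void afterPONR() { }
            public void rollback() { }
        };
        System.out.println("both ok: " + executeAtomically(List.of(ok, ok)));
        System.out.println("one fails: " + executeAtomically(List.of(ok, fail)));
    }
}
```

The design point is that rollback is only safe before the PONR; once afterPONR begins, both transactions must run to completion, which is why both beforePONR calls happen first.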
[jira] [Commented] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307780#comment-14307780 ] Ted Yu commented on HBASE-12954: Thanks for the comments, Lars. bq. If the hostname is not set there should be absolutely no additional lookups in any code path, forward or reverse. I checked my patch one more time. The above requirement is satisfied. Ability impaired using HBase on multihomed hosts Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IP's per network interface) it would be ideal to have a way to both specify: # which IP interface to which HBase master or region-server will bind # what hostname HBase will advertise in Zookeeper both for a master or region-server process While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design of the properties available to an administrator for these to be unambiguously specified. One has been able to request {{hbase.master.ipc.address}} or {{hbase.regionserver.ipc.address}} but one can not specify the desired HBase {{hbase.master.hostname}}. (It was removed in HBASE-1357, further I am unaware of a region-server equivalent.) I use a configuration management system to generate all of my configuration files on a per-machine basis. As such, an option to generate a file specifying exactly which hostname to use would be helpful. Today, specifying the bind address for HBase works and one can use an HBase-only DNS for faking what to put in Zookeeper but this is far from ideal. Network interfaces have no intrinsic IP address, nor hostname. 
Specifying a DNS server is awkward, as the DNS server may differ from the system's resolver and is a single IP address. Similarly, on hosts which use a transient VIP (e.g. through keepalived) for other services, it means there's a seemingly non-deterministic hostname choice made by HBase depending on the state of the VIP at daemon start-up time. I will attach two networking examples I use which become very difficult to manage under the current properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12954: --- Attachment: 12954-v10.txt Ability impaired using HBase on multihomed hosts Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v10.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IP's per network interface) it would be ideal to have a way to both specify: # which IP interface to which HBase master or region-server will bind # what hostname HBase will advertise in Zookeeper both for a master or region-server process While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design of the properties available to an administrator for these to be unambiguously specified. One has been able to request {{hbase.master.ipc.address}} or {{hbase.regionserver.ipc.address}} but one can not specify the desired HBase {{hbase.master.hostname}}. (It was removed in HBASE-1357, further I am unaware of a region-server equivalent.) I use a configuration management system to generate all of my configuration files on a per-machine basis. As such, an option to generate a file specifying exactly which hostname to use would be helpful. Today, specifying the bind address for HBase works and one can use an HBase-only DNS for faking what to put in Zookeeper but this is far from ideal. Network interfaces have no intrinsic IP address, nor hostname. Specifying a DNS server is awkward as the DNS server may differ from the system's resolver and is a single IP address. Similarly, on hosts which use a transient VIP (e.g.
through keepalived) for other services, it means there's a seemingly non-deterministic hostname choice made by HBase depending on the state of the VIP at daemon start-up time. I will attach two networking examples I use which become very difficult to manage under the current properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
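The two knobs the report asks for can be sketched as a hypothetical hbase-site.xml fragment. Note the hedging: the ipc.address bind property is the one named in the report, but a region-server hostname override did not exist at the time, so that property name and both values are illustrative only.

```xml
<!-- Hypothetical sketch, not a working configuration at the time of this
     report: the bind-address property exists; the hostname override is
     exactly what the issue requests and its name here is illustrative. -->
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>10.0.0.5</value> <!-- which interface to bind -->
</property>
<property>
  <name>hbase.regionserver.hostname</name>
  <value>rs1.internal.example.com</value> <!-- what to advertise in ZooKeeper -->
</property>
```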
[jira] [Updated] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12976: -- Attachment: 12976-v2.txt Adds config to hbase-default.xml. Planning to commit this soon. Default hbase.client.scanner.max.result.size Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HBASE-12976: - Assignee: Lars Hofhansl Default hbase.client.scanner.max.result.size Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
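A rough back-of-the-envelope sketch (plain Java, not HBase code) of why a size-driven limit is safer than a row-count-driven one when rows are large: with caching=1000 and 1 MB rows, a single caching-driven RPC would have to buffer nearly a gigabyte, while a 2 MB maxResultSize bounds every response. The figures below are illustrative only.

```java
// Illustrative arithmetic, not HBase internals: compare how many RPC
// round trips a scan takes, and how much each response must buffer,
// under a row-count (caching) policy vs. a size (maxResultSize) policy.
public class ScannerChunkingSketch {

    // Caching-driven: ceil(rows / caching) RPCs, each carrying
    // caching * rowBytes -- the OOM risk when rows are large.
    static long rpcsByCaching(long rows, long caching) {
        return (rows + caching - 1) / caching;
    }

    // Size-driven: ceil(totalBytes / maxResultSize) RPCs, each bounded.
    static long rpcsBySize(long rows, long rowBytes, long maxResultSize) {
        long total = rows * rowBytes;
        return (total + maxResultSize - 1) / maxResultSize;
    }

    public static void main(String[] args) {
        long rows = 10_000, rowBytes = 1_000_000;             // ~1 MB rows
        long caching = 1000, maxResultSize = 2 * 1024 * 1024; // the proposed 2mb
        System.out.println("caching-driven RPCs: " + rpcsByCaching(rows, caching)
            + ", per-RPC buffer ~" + (caching * rowBytes / (1 << 20)) + " MB");
        System.out.println("size-driven RPCs: " + rpcsBySize(rows, rowBytes, maxResultSize)
            + ", per-RPC buffer <= 2 MB");
    }
}
```

The size-driven policy makes many more (small) round trips, which is the trade the comment calls "a good compromise between performance and buffer usage": bounded memory per response at the cost of extra RPCs.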
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307805#comment-14307805 ] stack commented on HBASE-12958: --- Let me do it [~apurtell] SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.txt, 12958v2.txt, 12958v2.txt All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the 
server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307811#comment-14307811 ] Andrew Purtell commented on HBASE-12958: I already have a patch. Going to push it in a sec, just waiting on a few tests. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.txt, 12958v2.txt, 12958v2.txt All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307814#comment-14307814 ] stack commented on HBASE-12958: --- OK. Sorry about that [~apurtell] Thought I'd built but can't have. This patch is causing issue in branch-1* I am working on fixing it there. Will do same for 0.98 branch if an issue. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.txt, 12958v2.txt, 12958v2.txt All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12747) IntegrationTestMTTR will OOME if launched with mvn verify
[ https://issues.apache.org/jira/browse/HBASE-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306830#comment-14306830 ] Abhishek Singh Chouhan commented on HBASE-12747: [~apurtell] IntegrationTestMTTR will OOME if launched with mvn verify - Key: HBASE-12747 URL: https://issues.apache.org/jira/browse/HBASE-12747 Project: HBase Issue Type: Bug Affects Versions: 0.98.9 Reporter: Andrew Purtell Priority: Minor Attachments: HBASE-12747-v1.patch, HBASE-12747.patch, org.apache.hadoop.hbase.mttr.IntegrationTestMTTR-output.txt.gz IntegrationTestMTTR will OOME if launched like:
{noformat}
cd hbase-it
mvn verify -Dit.test=IntegrationTestMTTR
{noformat}
Linux environment, 7u67. Looks like we should bump the heap on the failsafe argLine in the POM.
{noformat}
2014-12-22 11:24:07,725 ERROR [B.DefaultRpcServer.handler=2,queue=0,port=55672] ipc.RpcServer(2067): Unexpected throwable object
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.regionserver.MemStoreLAB$Chunk.init(MemStoreLAB.java:246)
at org.apache.hadoop.hbase.regionserver.MemStoreLAB.getOrMakeChunk(MemStoreLAB.java:196)
at org.apache.hadoop.hbase.regionserver.MemStoreLAB.allocateBytes(MemStoreLAB.java:114)
at org.apache.hadoop.hbase.regionserver.MemStore.maybeCloneWithAllocator(MemStore.java:274)
at org.apache.hadoop.hbase.regionserver.MemStore.add(MemStore.java:229)
at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:576)
at org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3084)
at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2517)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2284)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2239)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2243)
at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4482)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3665) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3554) {noformat} Another minor issue: After taking the OOME, the test executor will linger indefinitely as a zombie. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
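The suggested fix ("bump the heap on the failsafe argLine in the POM") would look roughly like the following pom.xml fragment. This is a hedged sketch only: the -Xmx value is illustrative, not the value actually committed in the patch.

```xml
<!-- Hedged sketch: one way to raise the heap for failsafe-run integration
     tests, per the suggestion above. The -Xmx value is illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <configuration>
    <argLine>-Xmx2g -XX:+HeapDumpOnOutOfMemoryError</argLine>
  </configuration>
</plugin>
```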
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306836#comment-14306836 ] ramkrishna.s.vasudevan commented on HBASE-12949: +1 on moving to CellUtil. Is checking only the type enough? Why are only rowLength and famLength checked? Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch We've encountered a problem where compaction hangs and never completes. After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See stack below.
{noformat}
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
{noformat}
We identified the hfile that seems to be corrupted. Using the HFile tool shows the following:
{noformat}
[biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS Scanning - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 WARNING, previous row is greater then current row filename - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 previous - \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 current - Exception in thread main java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:489) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) {noformat} Turning on Java Assert shows the following: {noformat} Exception in thread main java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672) {noformat} It shows that the hfile seems to be corrupted -- the keys don't seem to be right. 
But the Scanner is not able to give a meaningful error; instead it gets stuck in an infinite loop here:
{code}
// KeyValueHeap.generalizedSeek()
while ((scanner = heap.poll()) != null) {
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
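A corrupted file should fail fast instead of spinning. The assertion quoted above (StoreScanner.checkScanOrder) does exactly that when Java assertions are enabled. Below is a simplified, self-contained sketch of such an ordering check, using plain byte[] keys rather than HBase's Cell/comparator API, so the shape of the fix is visible without HBase dependencies.

```java
import java.util.Arrays;

// Sketch of a fail-fast scan-order check: if a corrupted file yields a key
// smaller than its predecessor, throw immediately rather than letting the
// seek loop spin forever. Simplified byte[] keys, not HBase's Cell API.
public class ScanOrderCheckSketch {

    // Unsigned lexicographic comparison, as HFile keys are ordered.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static void checkScanOrder(byte[] prevKey, byte[] key) {
        if (prevKey != null && compare(prevKey, key) > 0) {
            throw new IllegalStateException("Key " + Arrays.toString(key)
                + " is smaller than previous key " + Arrays.toString(prevKey)
                + " -- HFile likely corrupted");
        }
    }

    public static void main(String[] args) {
        byte[][] keys = { {1, 2}, {1, 3}, {0, 9} }; // last key out of order
        byte[] prev = null;
        for (byte[] k : keys) {
            try {
                checkScanOrder(prev, k);
                System.out.println("ok: " + Arrays.toString(k));
            } catch (IllegalStateException e) {
                System.out.println("corruption detected at " + Arrays.toString(k));
            }
            prev = k;
        }
    }
}
```

Running the check on every key costs one comparison per cell but turns a silent infinite loop into an immediate, diagnosable error pointing at the corrupt file.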
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308127#comment-14308127 ] Esteban Gutierrez commented on HBASE-12956: --- [~enis] got pulled into some requests and I haven't had the chance to upload the patch rebased to include the changes from [~tedyu] in HBASE-12954. Let me upload a patch this afternoon so we can get a new RC today or tomorrow. Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308124#comment-14308124 ] Andrew Purtell commented on HBASE-12966: Is this a valid issue for 0.98? NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code}
java.lang.NullPointerException
at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210)
at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142)
at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695)
at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459)
at java.lang.Thread.run(Thread.java:745)
2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from ENABLING state and the master got the above exception. Note that the set up was 2 master with 1 RS (total 3 machines). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308182#comment-14308182 ] Hudson commented on HBASE-12958: FAILURE: Integrated in HBase-0.98 #837 (See [https://builds.apache.org/job/HBase-0.98/837/]) Amend HBASE-12958 SSH doing hbase:meta get but hbase:meta not assigned (apurtell: rev 57e947c45697bd38277e39c092c5feb3917cf75c) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7332) [webui] HMaster webui should display the number of regions a table has.
[ https://issues.apache.org/jira/browse/HBASE-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308181#comment-14308181 ] Hudson commented on HBASE-7332: --- FAILURE: Integrated in HBase-0.98 #837 (See [https://builds.apache.org/job/HBase-0.98/837/]) HBASE-7332 [webui] HMaster webui should display the number of regions a table has. (Andrey Stepachev) (apurtell: rev afe33d1db079a51f019ba7979f42599f730d7c6c) * hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java [webui] HMaster webui should display the number of regions a table has. --- Key: HBASE-7332 URL: https://issues.apache.org/jira/browse/HBASE-7332 Project: HBase Issue Type: Bug Components: UI Affects Versions: 2.0.0, 1.1.0 Reporter: Jonathan Hsieh Assignee: Andrey Stepachev Priority: Minor Labels: beginner, operability Fix For: 2.0.0, 1.1.0, 0.98.11 Attachments: HBASE-7332-0.98.patch, HBASE-7332.patch, HBASE-7332.patch, Screen Shot 2014-07-28 at 4.10.01 PM.png, Screen Shot 2015-02-03 at 9.23.57 AM.png Pre-0.96/trunk hbase displayed the number of regions per table in the table listing. Would be good to have this back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308193#comment-14308193 ] Hudson commented on HBASE-12976: FAILURE: Integrated in HBase-1.1 #144 (See [https://builds.apache.org/job/HBase-1.1/144/]) HBASE-12976 Set default value for hbase.client.scanner.max.result.size. (larsh: rev af7b5fa94a70509633ce52af361b00698447e659) * hbase-common/src/main/resources/hbase-default.xml * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.94.27, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308192#comment-14308192 ] Hudson commented on HBASE-12966: FAILURE: Integrated in HBase-1.1 #144 (See [https://builds.apache.org/job/HBase-1.1/144/]) HBASE-12966 NPE in HMaster while recovering tables in Enabling state (Andrey Stepachev) (tedyu: rev 58b943a8421037d839a70a0e490abd52e79910ff) * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/handler/TestEnableTableHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (total 3 machines). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12975) SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix)
[ https://issues.apache.org/jira/browse/HBASE-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308064#comment-14308064 ] Rajeshbabu Chintaguntla commented on HBASE-12975: - [~lhofhansl] Yes, that will be helpful, but I'm just wondering how we can turn a split on a region into a joint split in the case of an auto split. Do we need some hooks again for this? SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix) -- Key: HBASE-12975 URL: https://issues.apache.org/jira/browse/HBASE-12975 Project: HBase Issue Type: Improvement Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: HBASE-12975.patch Making SplitTransaction and RegionMergeTransaction limited-private is required to support the local indexing feature in Phoenix, to ensure region colocation. We can ensure region split and region merge in the coprocessors with a few method calls, without touching internals like creating znodes, file layout changes, or assignments. 1) With stepsBeforePONR and stepsAfterPONR we can ensure the split. 2) Meta entries can be passed through coprocessors to be updated atomically with the normal split/merge. 3) Rollback on failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308074#comment-14308074 ] Hudson commented on HBASE-12958: SUCCESS: Integrated in HBase-1.0 #710 (See [https://builds.apache.org/job/HBase-1.0/710/]) Revert HBASE-12958 SSH doing hbase:meta get but hbase:meta not assigned (stack: rev 19087bc078dfcbfb2a6812e622a49276861a639d) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionStates.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at 
org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12731) Heap occupancy based client pushback
[ https://issues.apache.org/jira/browse/HBASE-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12731: --- Release Note: This feature incorporates reported regionserver heap occupancy in the (optional) client pushback calculations. If client pushback is enabled, the exponential backoff policy will take heap occupancy into account, should it exceed hbase.heap.occupancy.low_water_mark percentage of the heap (default 0.95). Once above the low water mark, heap occupancy is an additional factor scaling from 0.1 up to 1.0 at hbase.heap.occupancy.high_water_mark (default 0.98). At or above the high water mark the client will use the maximum configured backoff. Heap occupancy based client pushback Key: HBASE-12731 URL: https://issues.apache.org/jira/browse/HBASE-12731 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.0.0, 2.0.0, 0.98.10, 1.1.0 Attachments: HBASE-12731-0.98.patch, HBASE-12731-0.98.patch, HBASE-12731.patch, HBASE-12731.patch, HBASE-12731.patch, HBASE-12731.patch If the heap occupancy of a RegionServer is beyond a configurable high water mark (suggestions: 95%, 98%) then we should reject all user RPCs and only allow administrative RPCs until occupancy has dropped below a configurable low water mark (suggestions: 92%). Implement building on the HBASE-5162 changes. It might be expensive to check heap occupancy, in which case we can sample it periodically with a chore and use the last known value in pushback calculations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
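The scaling arithmetic in the release note above can be sketched as follows. This is an illustrative reimplementation, not HBase's actual code; the class and method names are made up, and the water marks are hard-coded at their documented defaults.

```java
// Illustrative sketch of heap-occupancy-based pushback scaling; hypothetical
// names, not the HBASE-12731 implementation.
public class HeapOccupancyBackoff {
    static final double LOW_WATER = 0.95;   // hbase.heap.occupancy.low_water_mark default
    static final double HIGH_WATER = 0.98;  // hbase.heap.occupancy.high_water_mark default

    /** Returns the heap-occupancy factor in [0.0, 1.0] applied to the backoff. */
    static double heapFactor(double occupancy) {
        if (occupancy < LOW_WATER) {
            return 0.0; // below the low water mark, heap plays no role
        }
        if (occupancy >= HIGH_WATER) {
            return 1.0; // at or above high water: maximum configured backoff
        }
        // Scale linearly from 0.1 at the low water mark up to 1.0 at the high water mark.
        double position = (occupancy - LOW_WATER) / (HIGH_WATER - LOW_WATER);
        return 0.1 + position * 0.9;
    }

    public static void main(String[] args) {
        System.out.println(heapFactor(0.965)); // halfway between the water marks
    }
}
```

The factor is then multiplied into the client's exponential backoff, so a regionserver near its high water mark slows clients progressively rather than all at once.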
[jira] [Updated] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-12956: -- Fix Version/s: (was: 1.0.1) 1.0.0 Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-12956: -- Priority: Blocker (was: Major) Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-12956: -- Assignee: Esteban Gutierrez Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11568) Async WAL replication for region replicas
[ https://issues.apache.org/jira/browse/HBASE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308070#comment-14308070 ] Hadoop QA commented on HBASE-11568: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696838/11568-2-branch-1.txt against branch-1 branch at commit 96cdc7987e8894b304a3201f67cb0b9595c68cc3. ATTACHMENT ID: 12696838 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3789 checkstyle errors (more than the master's current 3768 errors). {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.io.hfile.slab.TestSingleSizeCache.testCacheMultiThreadedEviction(TestSingleSizeCache.java:73) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12710//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12710//console This message is automatically generated. Async WAL replication for region replicas - Key: HBASE-11568 URL: https://issues.apache.org/jira/browse/HBASE-11568 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0 Attachments: 11568-2-branch-1.txt, 11568-branch-1.txt, hbase-11568_v2.patch, hbase-11568_v3.patch As mentioned in parent issue, and design docs for phase-1 (HBASE-10070) and Phase-2 (HBASE-11183), implement asynchronous WAL replication from the WAL files of the primary region to the secondary region replicas. The WAL replication will build upon the pluggable replication framework introduced in HBASE-11367, and the distributed WAL replay. Upon having some experience with the patch, we changed the design so that there is only one replication queue for doing the async wal replication to secondary replicas rather than having a queue per region
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308107#comment-14308107 ] Enis Soztutar commented on HBASE-12956: --- Ok, I'm gonna sink the release candidate because this is an important regression to fix. I've reproduced this on a cluster with {{hbase.regionserver.ipc.address=0.0.0.0}}. {code} hbase(main):001:0> zk_dump HBase is rooted at /hbase Active master address: 0:0:0:0:0:0:0:0,16020,1423174431955 Backup master addresses: Region server holding hbase:meta: os-enis-hbase-1.0-test-feb5-1.novalocal,16020,1423173791765 Region servers: 0:0:0:0:0:0:0:0,16020,1423174437687 0:0:0:0:0:0:0:0,16020,1423174436415 0:0:0:0:0:0:0:0,16020,1423174434835 0:0:0:0:0:0:0:0,16020,1423174433653 0:0:0:0:0:0:0:0,16020,1423174438855 {code} Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308122#comment-14308122 ] Enis Soztutar commented on HBASE-12954: --- Slightly related: HBASE-12956 Ability impaired using HBase on multihomed hosts Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v10.txt, 12954-v11.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IP's per network interface) it would be ideal to have a way to both specify: # which IP interface to which HBase master or region-server will bind # what hostname HBase will advertise in Zookeeper both for a master or region-server process While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design of the properties available to an administrator for these to be unambiguously specified. One has been able to request {{hbase.master.ipc.address}} or {{hbase.regionserver.ipc.address}} but one cannot specify the desired HBase {{hbase.master.hostname}}. (It was removed in HBASE-1357; further, I am unaware of a region-server equivalent.) I use a configuration management system to generate all of my configuration files on a per-machine basis. As such, an option to generate a file specifying exactly which hostname to use would be helpful. Today, specifying the bind address for HBase works and one can use an HBase-only DNS for faking what to put in Zookeeper but this is far from ideal. Network interfaces have no intrinsic IP address, nor hostname. Specifying a DNS server is awkward as the DNS server may differ from the system's resolver and is a single IP address. Similarly, on hosts which use a transient VIP (e.g.
through keepalived) for other services, it means there's a seemingly non-deterministic hostname choice made by HBase depending on the state of the VIP at daemon start-up time. I will attach two networking examples I use which become very difficult to manage under the current properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
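To make the reporter's two-part request concrete: the bind side is expressible today through the ipc.address properties mentioned above, but there is no matching property for the advertised hostname. A minimal hbase-site.xml sketch of the existing knobs (the addresses are example values, not from this issue):

```xml
<!-- hbase-site.xml: bind-address properties that exist today; example values -->
<property>
  <name>hbase.master.ipc.address</name>
  <value>10.0.0.5</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>10.0.0.5</value>
</property>
```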
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308138#comment-14308138 ] Andrew Purtell commented on HBASE-8329: --- Pushed to 0.98 Limit compaction speed -- Key: HBASE-8329 URL: https://issues.apache.org/jira/browse/HBASE-8329 Project: HBase Issue Type: Improvement Components: Compaction Reporter: binlijin Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-8329-0.98.patch, HBASE-8329-10.patch, HBASE-8329-11.patch, HBASE-8329-12.patch, HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-9-trunk.patch, HBASE-8329-branch-1.patch, HBASE-8329-trunk.patch, HBASE-8329_13.patch, HBASE-8329_14.patch, HBASE-8329_15.patch, HBASE-8329_16.patch, HBASE-8329_17.patch There is no speed or resource limit for compaction. I think we should add this feature, especially for request bursts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
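The kind of limit being asked for amounts to throttling compaction writes to a configured byte rate. Here is a minimal sketch of that idea with hypothetical names; it is not the HBASE-8329 implementation, only the core arithmetic a compaction writer would use to decide how long to pause.

```java
// Minimal sketch of byte-rate throttling for compactions; hypothetical names,
// not the actual HBASE-8329 code.
public class CompactionThroughputSketch {

    /**
     * How long a writer should pause so that bytesWritten bytes, produced in
     * elapsedMs milliseconds, do not exceed bytesPerSecond.
     */
    static long throttleDelayMs(long bytesWritten, long elapsedMs, long bytesPerSecond) {
        long expectedMs = bytesWritten * 1000 / bytesPerSecond; // time budget at the cap
        return Math.max(0, expectedMs - elapsedMs);
    }

    public static void main(String[] args) {
        // Wrote 4 MB in 1 s against a 2 MB/s cap: the writer should sleep 1 s more.
        System.out.println(throttleDelayMs(4 << 20, 1000, 2 << 20)); // prints 1000
    }
}
```

A compaction loop would call this after each block of writes and `Thread.sleep` for the returned delay, so a request burst no longer lets compactions saturate disk and network.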
[jira] [Updated] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-12958: -- Attachment: 12958.branch-1.v2.txt Branch-1 patch. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck 
trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-12958: --- Reverted branch-1*. Reopening till get them in. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck 
trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-12958: -- Status: Patch Available (was: Reopened) SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to 
find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12966: --- Release Note: (was: test was lost, now it back in the patch.) NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (total 3 machines). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12966: --- Fix Version/s: (was: 1.0.1) 1.0.0 NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (total 3 machines). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11292) Add an undelete operation
[ https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308197#comment-14308197 ] Lars Hofhansl commented on HBASE-11292: --- I will think about this in earnest during the next few days. Related: A scan coprocessor currently cannot ignore delete markers (without reimplementing the entire delete marker logic). I am not sure about a good interface for this (seems like we need to pass down a list of excluded timeranges). [~ghelmling], any ideas? Add an undelete operation --- Key: HBASE-11292 URL: https://issues.apache.org/jira/browse/HBASE-11292 Project: HBase Issue Type: New Feature Components: Deletes Reporter: Gary Helmling Labels: Phoenix While column families can be configured to keep deleted cells (allowing time range queries to still retrieve those cells), deletes are still somewhat unique in that they are irreversible operations. Once a delete has been issued on a cell, the only way to undelete it is to rewrite the data with a timestamp newer than the delete. The idea here is to add an undelete operation, that would make it possible to cancel a previous delete. An undelete operation will be similar to a delete, in that it will be written as a marker (tombstone doesn't seem like the right word). The undelete marker, however, will sort prior to a delete marker, canceling the effect of any following delete. In the absence of a column family configured to KEEP_DELETED_CELLS, we can't be sure if a prior delete marker and the affected cells have already been garbage collected. In this case (column family not configured with KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete operations to avoid creating the appearance of a client contract for undeletes that can't reliably be honored. I think there are additional subtleties of the implementation to be worked out, but I'm also interested in a broader discussion of interest in this capability. 
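To make the proposed marker ordering concrete, here is a small, self-contained sketch of the semantics described above. The integer type codes are invented purely for illustration (they are not HBase's actual KeyValue type bytes): markers for one column are visited in sort order, and an undelete marker, sorting before the delete marker it cancels, leaves the cell visible.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed undelete semantics. Codes are invented for this
// illustration; a real implementation would use KeyValue type bytes.
public class UndeleteSketch {
    static final int UNDELETE = 1; // proposed: sorts before DELETE
    static final int DELETE = 2;
    static final int PUT = 3;

    // Walk one column's markers in sort order; an undelete encountered
    // before a delete cancels that delete, leaving the cell visible.
    static boolean cellVisible(List<Integer> markersInSortOrder) {
        boolean undeleteSeen = false;
        for (int m : markersInSortOrder) {
            if (m == UNDELETE) undeleteSeen = true;
            else if (m == DELETE && !undeleteSeen) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(cellVisible(Arrays.asList(DELETE, PUT)));           // false: delete wins
        System.out.println(cellVisible(Arrays.asList(UNDELETE, DELETE, PUT))); // true: delete cancelled
    }
}
```

Because the undelete sorts first, a scanner needs no lookahead: by the time it reaches the delete marker it already knows whether that delete was cancelled.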
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-11292) Add an undelete operation
[ https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308197#comment-14308197 ] Lars Hofhansl edited comment on HBASE-11292 at 2/5/15 11:01 PM: I will think about this in earnest during the next few days. Related: A scan coprocessor can currently not ignore delete markers (without reimplementing the entire delete marker logic). I am not sure about a good interface for this (seems like we need to pass down a list of excluded timeranges). [~ghelmling], any ideas? was (Author: lhofhansl): I will think about this in earnest during the next few days. Related: A scan coprocessor can currently not ignore delete markers (without reimplementing the entire delete marker logic). I am not about a good interface for this (seems like we need a pass down a list of excluded timeranges). [~ghelmling], any ideas? Add an undelete operation --- Key: HBASE-11292 URL: https://issues.apache.org/jira/browse/HBASE-11292 Project: HBase Issue Type: New Feature Components: Deletes Reporter: Gary Helmling Labels: Phoenix While column families can be configured to keep deleted cells (allowing time range queries to still retrieve those cells), deletes are still somewhat unique in that they are irreversible operations. Once a delete has been issued on a cell, the only way to undelete it is to rewrite the data with a timestamp newer than the delete. The idea here is to add an undelete operation, that would make it possible to cancel a previous delete. An undelete operation will be similar to a delete, in that it will be written as a marker (tombstone doesn't seem like the right word). The undelete marker, however, will sort prior to a delete marker, canceling the effect of any following delete. In the absence of a column family configured to KEEP_DELETED_CELLS, we can't be sure if a prior delete marker and the affected cells have already been garbage collected. 
In this case (column family not configured with KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete operations to avoid creating the appearance of a client contract for undeletes that can't reliably be honored. I think there are additional subtleties of the implementation to be worked out, but I'm also interested in a broader discussion of interest in this capability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12966: --- Resolution: Fixed Status: Resolved (was: Patch Available) 0.98 doesn't seem to have this issue. NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) {code} A table was trying to recover from ENABLING state and the master got the above exception. Note that the set up was 2 master with 1 RS (total 3 machines). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308185#comment-14308185 ] Lars Hofhansl commented on HBASE-11544: --- +1 to everything you said. And +1 to defer streaming for now, that will be somewhat tricky. I think this is implicit but just to be sure: Are you planning to keep the caching setting on Scan (maybe named rowLimit now, or whatever)? (So a client who only wants to see the first N rows can avoid downloading an entire batch) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Labels: beginner Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12975) SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix)
[ https://issues.apache.org/jira/browse/HBASE-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308188#comment-14308188 ] Lars Hofhansl commented on HBASE-12975: --- That'll be the tricky part :) Working out what we need to pull into an interface - and in a way that still allows us to change the actual implementation. SplitTransaction, RegionMergeTransaction should have InterfaceAudience of LimitedPrivate(Coproc,Phoenix) -- Key: HBASE-12975 URL: https://issues.apache.org/jira/browse/HBASE-12975 Project: HBase Issue Type: Improvement Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: HBASE-12975.patch Making SplitTransaction, RegionMergeTransaction limited private is required to support the local indexing feature in Phoenix to ensure region colocation. We can ensure region split and region merge in the coprocessors in a few method calls without touching internals like creating zk's, file layout changes or assignments. 1) With stepsBeforePONR and stepsAfterPONR we can ensure the split. 2) Meta entries can pass through coprocessors to atomically update with the normal split/merge. 3) Rollback on failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308200#comment-14308200 ] Jonathan Lawlor commented on HBASE-11544: - [~lhofhansl] Yes absolutely, the setting will remain in some form so that the use case you described can be controlled. [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Labels: beginner Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass out a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
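The fix stack proposes above is size-bounded batching: stop filling a batch when either the row count or an accumulated byte threshold is reached, instead of honoring the row count alone. A minimal, self-contained sketch of that idea (plain byte arrays stand in for cells; this is illustrative, not HBase's actual scanner code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of size-bounded batching: the batch closes as soon
// as either the row limit or the byte-size threshold is reached, so large
// cells cannot force an arbitrarily large (OOM-prone) response.
public class BoundedBatcher {
    static List<byte[]> nextBatch(Iterator<byte[]> cells, int rowLimit, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        while (cells.hasNext() && batch.size() < rowLimit && bytes < maxBytes) {
            byte[] cell = cells.next();
            batch.add(cell);
            bytes += cell.length;
        }
        return batch;
    }

    public static void main(String[] args) {
        List<byte[]> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(new byte[1024]); // 1 KB cells
        // caching=1000 would return all 1000 rows; a 64 KB byte cap
        // closes the batch after 64 cells instead.
        System.out.println(nextBatch(data.iterator(), 1000, 64 * 1024).size()); // prints 64
    }
}
```

This is why the byte cap is transparent to clients: the scanner simply comes back for the remaining rows in subsequent RPCs.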
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308202#comment-14308202 ] Lars Hofhansl commented on HBASE-12976: --- On second thought... We left default scanner caching set to 1 in 0.94, so this setting by default does not make much sense there. Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
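The proposal above amounts to a one-line client configuration. A sketch of the corresponding hbase-site.xml entry follows; the 2 MB value is the figure proposed in this issue, so confirm the final default against the shipped hbase-default.xml of your release:

```xml
<!-- hbase-site.xml: cap the bytes a single scanner RPC may return, so a
     too-high caching (row count) setting cannot OOM the client.
     2097152 bytes = 2 MB, the value proposed in HBASE-12976. -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value>
</property>
```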
[jira] [Updated] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12976: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12972) Region, a supportable public/evolving subset of HRegion
[ https://issues.apache.org/jira/browse/HBASE-12972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308211#comment-14308211 ] Andrew Purtell commented on HBASE-12972: [~lhofhansl], want to fold what to do about HBASE-12975 in here, after it's hashed out over there? Region, a supportable public/evolving subset of HRegion --- Key: HBASE-12972 URL: https://issues.apache.org/jira/browse/HBASE-12972 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 On HBASE-12566, [~lhofhansl] proposed: {quote} Maybe we can have a {{Region}} interface that is to {{HRegion}} what {{Store}} is to {{HStore}}. Store is marked with {{@InterfaceAudience.Private}} but used in some coprocessor hooks. {quote} For example, coprocessors currently have to reach into HRegion in order to participate in row and region locking protocols; this is one area where the functionality is legitimate for coprocessors but not for users, so an in-between interface makes sense. In addition we should promote {{Store}}'s interface audience to LimitedPrivate(COPROC). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308276#comment-14308276 ] Enis Soztutar commented on HBASE-12956: --- HBASE-12954 will not make it into 1.0. If you have a patch without those changes, I can test it out for branch-1.0. Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308293#comment-14308293 ] Enis Soztutar commented on HBASE-12958: --- I'll wait for this patch as well for the new 1.0 RC. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881) at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208) at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250) at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225) at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634) - locked 0x00041c1f0d80 (a org.apache.hadoop.hbase.master.RegionStates) at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
java.lang.Thread.run(Thread.java:745) {code} Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered: Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0 Will add more detail in a sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308339#comment-14308339 ] Jerry He commented on HBASE-12949: -- Hi, [~stack] Regarding the checksum comment, there were indeed these messages in the logs {noformat} 2015-01-31 14:53:09,848 WARN org.apache.hadoop.hbase.io.hfile.HFile: File hdfs://hdtest009:9000/hbase/data/default/CUMMINS_INSITE_V1/e57ef4908adb6bcfc72cdcfd2ac2564f/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 Stored checksum value of 0 at offset 65613 does not match computed checksum 2413089988, total data size 65666 Checksum data range offset 33 len 16351 Header dump: magic: 4918304907327195946 blockType DATA compressedBlockSizeNoHeader 65600 uncompressedBlockSizeNoHeader 65580 prevBlockOffset 2853080217 checksumType CRC32 bytesPerChecksum 16384 onDiskDataSizeWithHeader 65613 2015-01-31 14:53:09,848 WARN org.apache.hadoop.hbase.io.hfile.HFile: HBase checksum verification failed for file hdfs://hdtest009:9000/hbase/data/default/CUMMINS_INSITE_V1/e57ef4908adb6bcfc72cdcfd2ac2564f/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 at offset 2853145836 filesize 6861765956. Retrying read with HDFS checksums turned on... 2015-01-31 14:53:09,855 WARN org.apache.hadoop.hbase.io.hfile.HFile: HDFS checksum verification suceeded for file hdfs://hdtest009:9000/hbase/data/default/CUMMINS_INSITE_V1/e57ef4908adb6bcfc72cdcfd2ac2564f/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 at offset 2853145836 filesize 6861765956 {noformat} Looks like in the code, if HBase checksum fails, we just turn it off? Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch We've encountered a problem where compaction hangs and never completes. 
After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See stack below. {noformat} org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296) org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257) org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697) org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672) org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529) org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) {noformat} We identified the hfile that seems to be corrupted. Using the HFile tool shows the following: {noformat} [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. 
Instead, use fs.defaultFS Scanning - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 WARNING, previous row is greater then current row filename - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 previous - \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 current - Exception in thread main java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:489) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) {noformat} Turning on Java Assert shows the following:
[jira] [Commented] (HBASE-12954) Ability impaired using HBase on multihomed hosts
[ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308222#comment-14308222 ] Hadoop QA commented on HBASE-12954: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696866/12954-v11.txt against master branch at commit 96cdc7987e8894b304a3201f67cb0b9595c68cc3. ATTACHMENT ID: 12696866 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +descriptionThis config is for experts: don't set its value unless you really know what you are doing. +When set to a non-empty value, this represents the (external facing) hostname for the underlying server. + new java.lang.String[] { Port, ServerStartCode, ServerCurrentTime, UseThisHostnameInstead, }); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12711//console This message is 
automatically generated. Ability impaired using HBase on multihomed hosts Key: HBASE-12954 URL: https://issues.apache.org/jira/browse/HBASE-12954 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Clay B. Assignee: Ted Yu Priority: Minor Attachments: 12954-v1.txt, 12954-v10.txt, 12954-v11.txt, 12954-v7.txt, 12954-v8.txt, Hadoop Three Interfaces.png For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical machines with multiple IPs per network interface) it would be ideal to have a way to specify both: 1) which IP interface the HBase master or region server will bind to, and 2) what hostname HBase will advertise in ZooKeeper for a master or region-server process. While efforts such as HBASE-8640 go a long way to normalize these two sources of information, it is not possible in the current design
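As a concrete sketch of the two knobs this issue asks for: `hbase.regionserver.ipc.address` is the bind-address key mentioned in HBASE-12956 above, while `hbase.regionserver.hostname` is inferred from the `UseThisHostnameInstead` field in this issue's patch and should be treated as provisional until the patch lands. The addresses are example values only.

```xml
<!-- hbase-site.xml sketch for a multihomed region server: bind the RPC
     listener to one interface, but advertise a chosen hostname in
     ZooKeeper rather than the bind address. Values are examples. -->
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>10.0.0.5</value>   <!-- interface to bind to -->
</property>
<property>
  <name>hbase.regionserver.hostname</name>
  <value>rs1.internal.example.com</value> <!-- name to publish in ZK; key name assumed from the patch -->
</property>
```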
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308265#comment-14308265 ] Jerry He commented on HBASE-12949: -- If the header/trailer of the hfile is corrupted, the problem probably shows up very quickly. The hbck tool has a '-checkCorruptHFiles' option, although it only checks that we can create readers on the files: headers/trailers are ok. For corruptions in the data cells, there does not seem to be a quick way. I had to run the HFile tool on the individual suspected region files. The hbck tool could be enhanced with an option to check rows/cells on a table. Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch We've encountered a problem where compaction hangs and never completes. After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See stack below. {noformat} org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296) org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257) org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697) org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672) org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529) org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) {noformat} We identified the hfile that seems to be corrupted. 
Using HFile tool shows the following: {noformat} [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS Scanning - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 WARNING, previous row is greater then current row filename - /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 previous - \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 current - Exception in thread main java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:489) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) {noformat} Turning on Java Assert shows the following: {noformat} 
Exception in thread main java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
{noformat} It shows that the hfile seems to be corrupted -- the keys don't seem to be right. But the Scanner is not able to give a meaningful error; instead it is stuck in an infinite loop here: {code}
KeyValueHeap.generalizedSeek()
while ((scanner = heap.poll()) != null) {
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
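The AssertionError above comes from a key-ordering sanity check that is only active when Java asserts are enabled. A minimal, self-contained sketch of that kind of check (illustrative only; the class and method names are invented, not HBase's actual StoreScanner code) shows the idea: fail fast on an out-of-order key instead of letting the seek loop spin forever.

```java
// Illustrative ordering check in the spirit of StoreScanner.checkScanOrder.
// Class and method names here are invented for the sketch.
public class ScanOrderCheck {
    /** Returns true if {@code current} may legally follow {@code previous}. */
    static boolean isOrdered(String previous, String current) {
        return previous == null || previous.compareTo(current) <= 0;
    }

    /** Fails fast on a corrupt file instead of letting the scanner spin. */
    static void checkScanOrder(String previous, String current) {
        if (!isOrdered(previous, current)) {
            throw new IllegalStateException(
                "Key " + previous + " followed by a smaller key " + current);
        }
    }
}
```

Throwing an unconditional exception, rather than relying on `assert` (disabled by default in production JVMs), is the sort of change that would surface a corrupt HFile as a clear error instead of a hung compaction.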
[jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308264#comment-14308264 ] stack commented on HBASE-12949: --- Another thought: would we not fail checksum if Cells were mangled? [~jerryhe] Just say no to more configs! I'm not mad about adding friction on our reads. In the KV constructor, it is going to decode sizes anyway as part of the parse? Check at this point rather than separately in the CellUtil I suggested above? Thanks. Scanner can be stuck in infinite loop if the HFile is corrupted --- Key: HBASE-12949 URL: https://issues.apache.org/jira/browse/HBASE-12949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.98.10 Reporter: Jerry He Attachments: HBASE-12949-master.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
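stack's suggestion of checking the decoded sizes at parse time can be sketched as a small standalone validity test on a serialized cell buffer. The layout assumed here (4-byte key length, 4-byte value length, then the key and value bytes) follows the classic KeyValue serialization; the class and method names are invented for illustration, not HBase's API. A mangled buffer then fails with a clear result instead of a BufferUnderflowException deep inside the scanner.

```java
// Hypothetical size sanity check run before constructing a cell from bytes.
import java.nio.ByteBuffer;

public class CellSanity {
    /** Returns true if the buffer plausibly holds one serialized cell. */
    static boolean looksValid(ByteBuffer buf) {
        if (buf.remaining() < 8) return false;        // need both length fields
        int keyLen = buf.getInt(buf.position());      // absolute read, no side effect
        int valLen = buf.getInt(buf.position() + 4);
        if (keyLen <= 0 || valLen < 0) return false;  // lengths must be sane
        long total = 8L + keyLen + valLen;            // long math avoids overflow
        return total <= buf.remaining();              // cell must fit in the buffer
    }
}
```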
[jira] [Commented] (HBASE-12897) Minimum memstore size is a percentage
[ https://issues.apache.org/jira/browse/HBASE-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308302#comment-14308302 ] churro morales commented on HBASE-12897: in trunk, HeapMemorySizeUtil will allow for a block cache to be sized at 0. I was thinking we do what Lars said, which is have something like hbase.min.memstore.blockcache.threshold.percentage, have it default to 0.0, and make sure that we check that both the blockcache and memstore are larger than this configuration option. What do you guys think? Minimum memstore size is a percentage - Key: HBASE-12897 URL: https://issues.apache.org/jira/browse/HBASE-12897 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.10, 1.1.0 Reporter: churro morales Assignee: churro morales We have a cluster which is optimized for random reads. Thus we have a large block cache and a small memstore. Currently our heap is 20GB and we wanted to configure the memstore to take 4%, or 800MB. Right now the minimum memstore size is 5%. What do you guys think about reducing the minimum size to 1%? Suppose we log a warning if the memstore is below 5% but allow it? What do you folks think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
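The check being discussed, a configurable floor on each fraction plus a combined cap so the rest of the heap survives, can be sketched like this. Class, method, and constant names are invented for the sketch; the real config key floated above is hbase.min.memstore.blockcache.threshold.percentage, and HBase's actual combined-limit logic lives in HeapMemorySizeUtil.

```java
// Hedged sketch of a memstore/blockcache fraction validation.
public class HeapFractionCheck {
    // Illustrative combined cap: leave at least 20% of heap for everything else.
    static final float MAX_COMBINED = 0.8f;

    static void validate(float memstore, float blockCache, float minFraction) {
        if (memstore < minFraction || blockCache < minFraction) {
            throw new IllegalArgumentException(
                "memstore/blockcache fraction below floor " + minFraction);
        }
        if (memstore + blockCache > MAX_COMBINED) {
            throw new IllegalArgumentException(
                "memstore + blockcache = " + (memstore + blockCache)
                + " exceeds " + MAX_COMBINED);
        }
    }
}
```

With a floor of 0.0 this would admit the 4% memstore from the issue description while still rejecting configurations that hand the entire heap to the two caches.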
[jira] [Updated] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-12956: -- Attachment: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: 0001-HBASE-12956-Binding-to-0.0.0.0-is-broken-after-HBASE.patch After the Region Server and Master code was merged, we lost the functionality to bind to 0.0.0.0 via hbase.regionserver.ipc.address and znodes now get created with the wildcard address which means that RSs and the master cannot connect to each other. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
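For reference, the setup the issue says is broken is the wildcard bind in hbase-site.xml. The property name is quoted from the issue text; the snippet is illustrative:

```xml
<!-- Bind the region server RPC service to all interfaces. Per the
     regression described above, the wildcard also leaked into the znode,
     so peers could not resolve a usable address for each other. -->
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>
```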
[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
[ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308340#comment-14308340 ] Hadoop QA commented on HBASE-12958: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696896/12958.branch-1.v2.txt against branch-1 branch at commit 2583e8de574ae4b002c5dbc80b0da666b42dd699. ATTACHMENT ID: 12696896 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12712//artifact/patchprocess/patchJavadocWarnings.txt Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12712//console This message is automatically generated. SSH doing hbase:meta get but hbase:meta not assigned Key: HBASE-12958 URL: https://issues.apache.org/jira/browse/HBASE-12958 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12958.branch-1.v2.txt, 12958.txt, 12958v2.txt, 12958v2.txt, HBASE-12958-0.98-addendum.patch All master threads are blocked waiting on this call to return: {code} MASTER_SERVER_OPERATIONS-c2020:16020-2 #189 prio=5 os_prio=0 tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168) - locked 0x00041c374f50 (a java.util.concurrent.atomic.AtomicBoolean) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
[jira] [Commented] (HBASE-11568) Async WAL replication for region replicas
[ https://issues.apache.org/jira/browse/HBASE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308341#comment-14308341 ] Enis Soztutar commented on HBASE-11568: --- +1. I checked the patch against the master patch and current code. Two nits we can address on commit. - RegionReplicaReplicationEndpoint.java javadoc refers to the now-gone HLogSplitter. This comes from the old patch; we can replace it with WALSplitter. Also the InterfaceAudience should be the one in the hbase package, not the hadoop package. - In the LogReplayOutputSink.flush() method, dataAvailable.notifyAll() is wrapped with the sync block. It is not needed since all callers of the flush() method already hold the monitor, but we can still have that as an extra guarantee, since it is not documented well. Master code already has this (I think added after the patch), so we can keep that sync block in branch-1 as well. {code}
synchronized(dataAvailable) {
-  dataAvailable.notifyAll();
-}
+controller.dataAvailable.notifyAll();
{code} Async WAL replication for region replicas - Key: HBASE-11568 URL: https://issues.apache.org/jira/browse/HBASE-11568 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0 Attachments: 11568-2-branch-1.txt, 11568-branch-1.txt, hbase-11568_v2.patch, hbase-11568_v3.patch As mentioned in the parent issue, and in the design docs for phase-1 (HBASE-10070) and phase-2 (HBASE-11183), implement asynchronous WAL replication from the WAL files of the primary region to the secondary region replicas. The WAL replication will build upon the pluggable replication framework introduced in HBASE-11367, and the distributed WAL replay. Upon having some experience with the patch, we changed the design so that there is only one replication queue for doing the async WAL replication to secondary replicas rather than having a queue per region replica. This is due to the fact that we do not want to tail the logs of every region server for a single region replica. Handling of flushes/compactions and memstore accounting will be handled in other subtasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
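The review point about the sync block can be demonstrated in isolation: `notifyAll()` requires the caller to hold the object's monitor (otherwise the JVM throws `IllegalMonitorStateException`), and `synchronized` is reentrant, so keeping the wrapping block is harmless even when every caller already holds the lock. A minimal sketch, with invented names:

```java
// Demonstrates why keeping a synchronized block around notifyAll() is a
// cheap extra guarantee: without the monitor, notifyAll() throws.
public class MonitorNotify {
    static boolean notifySafely(Object dataAvailable) {
        synchronized (dataAvailable) { // reentrant: fine if caller already holds it
            dataAvailable.notifyAll();
            return true;
        }
    }

    static boolean notifyWithoutLock(Object dataAvailable) {
        try {
            dataAvailable.notifyAll(); // no monitor held -> IllegalMonitorStateException
            return true;
        } catch (IllegalMonitorStateException e) {
            return false;
        }
    }
}
```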
[jira] [Updated] (HBASE-12897) Minimum memstore size is a percentage
[ https://issues.apache.org/jira/browse/HBASE-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] churro morales updated HBASE-12897: --- Attachment: HBASE-12897.patch This patch will always enforce that the blockcache + memstore sizes are greater than 0. Minimum memstore size is a percentage - Key: HBASE-12897 URL: https://issues.apache.org/jira/browse/HBASE-12897 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.10, 1.1.0 Reporter: churro morales Assignee: churro morales Attachments: HBASE-12897.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12976: -- Fix Version/s: (was: 0.94.27) Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we default hbase.client.scanner.max.result.size to 2 MB. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
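The proposal replaces guessing a row count (scanner caching) with a byte budget per RPC. A toy model of that batching rule, not HBase's implementation, makes the difference concrete:

```java
// Toy model: fill a scan batch by accumulated byte size, not row count.
import java.util.ArrayList;
import java.util.List;

public class SizeLimitedBatch {
    /** Take rows (given as their sizes in bytes) until maxResultSize is reached. */
    static List<Integer> nextBatch(List<Integer> rowSizes, long maxResultSize) {
        List<Integer> batch = new ArrayList<>();
        long total = 0;
        for (int size : rowSizes) {
            batch.add(size);               // always return at least the current row
            total += size;
            if (total >= maxResultSize) {  // stop once the byte budget is spent
                break;
            }
        }
        return batch;
    }
}
```

With rows of roughly 1 KB and a 2 MB budget, a batch carries about 2000 rows regardless of what caching value the client guessed, which is why the change is transparent to an HTable client.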
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308208#comment-14308208 ] Hudson commented on HBASE-12976: FAILURE: Integrated in HBase-1.0 #711 (See [https://builds.apache.org/job/HBase-1.0/711/]) HBASE-12976 Set default value for hbase.client.scanner.max.result.size. (larsh: rev e8a51c07c9e066f9362d849157f0741b62242e2e) * hbase-common/src/main/resources/hbase-default.xml * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12966) NPE in HMaster while recovering tables in Enabling state
[ https://issues.apache.org/jira/browse/HBASE-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308207#comment-14308207 ] Hudson commented on HBASE-12966: FAILURE: Integrated in HBase-1.0 #711 (See [https://builds.apache.org/job/HBase-1.0/711/]) HBASE-12966 NPE in HMaster while recovering tables in Enabling state (Andrey Stepachev) (tedyu: rev 8e938de75ff9d4e1a5f11e04fd14a5c84e4bcccf) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/handler/TestEnableTableHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java NPE in HMaster while recovering tables in Enabling state Key: HBASE-12966 URL: https://issues.apache.org/jira/browse/HBASE-12966 Project: HBase Issue Type: Bug Components: master Reporter: ramkrishna.s.vasudevan Assignee: Andrey Stepachev Fix For: 1.0.0, 2.0.0, 1.1.0 Attachments: HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966-branch-1.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch, HBASE-12966.patch {code} java.lang.NullPointerException at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210) at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142) at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459) at java.lang.Thread.run(Thread.java:745) 2015-02-04 16:11:45,932 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2015-02-04 16:11:45,933 FATAL [stobdtserver3:16040.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
at org.apache.hadoop.hbase.master.handler.EnableTableHandler.handleEnableTable(EnableTableHandler.java:210)
at org.apache.hadoop.hbase.master.handler.EnableTableHandler.process(EnableTableHandler.java:142)
at org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:1695)
at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:416)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:720)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:170)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1459)
at java.lang.Thread.run(Thread.java:745)
{code} A table was trying to recover from the ENABLING state and the master got the above exception. Note that the setup was 2 masters with 1 RS (3 machines total). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308254#comment-14308254 ] Hudson commented on HBASE-12976: FAILURE: Integrated in HBase-TRUNK #6093 (See [https://builds.apache.org/job/HBase-TRUNK/6093/]) HBASE-12976 Set default value for hbase.client.scanner.max.result.size. (larsh: rev 2583e8de574ae4b002c5dbc80b0da666b42dd699) * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-common/src/main/resources/hbase-default.xml Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308281#comment-14308281 ] Esteban Gutierrez commented on HBASE-12956: --- That's fine [~enis], however without HBASE-12954 you cannot bind to an IP other than 0.0.0.0 or the default host IP address (e.g. no way to use 127.0.0.1 in hbase.regionserver.ipc.address) Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
[ https://issues.apache.org/jira/browse/HBASE-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308290#comment-14308290 ] Enis Soztutar commented on HBASE-12956: --- bq. That's fine Enis Soztutar, however without HBASE-12954 you cannot bind to an IP other than 0.0.0.0 or the default host IP address (e.g. no way to use 127.0.0.1 in hbase.regionserver.ipc.address) Not sure I understand it. Users might want to do the same setup as in 0.98 and 0.94, no? HBASE-12954 is not needed for a setup where the RS is multi-homed, the bind address is 0.0.0.0, but hostnames are resolved via reverse DNS. Binding to 0.0.0.0 is broken after HBASE-10569 -- Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Blocker Fix For: 1.0.0, 2.0.0, 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12972) Region, a supportable public/evolving subset of HRegion
[ https://issues.apache.org/jira/browse/HBASE-12972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308289#comment-14308289 ] Lars Hofhansl commented on HBASE-12972: --- Either way :) They are related in principle but probably not in code. Region, a supportable public/evolving subset of HRegion --- Key: HBASE-12972 URL: https://issues.apache.org/jira/browse/HBASE-12972 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 On HBASE-12566, [~lhofhansl] proposed: {quote} Maybe we can have a {{Region}} interface that is to {{HRegion}} what {{Store}} is to {{HStore}}. Store is marked with {{@InterfaceAudience.Private}} but used in some coprocessor hooks. {quote} For example, coprocessors currently have to reach into HRegion in order to participate in row and region locking protocols. This is one area where the functionality is legitimate for coprocessors but not for users, so an in-between interface makes sense. In addition we should promote {{Store}}'s interface audience to LimitedPrivate(COPROC). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
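The Store/HStore analogy would look roughly like this: a narrow interface carrying only what coprocessors legitimately need (e.g. row locking), with HRegion as the private implementation behind it. All names below are illustrative sketches, not the eventual HBASE-12972 API:

```java
// Sketch of an interface/implementation split in the spirit of the proposal.
public class RegionFacade {
    /** The narrow, coprocessor-facing view. */
    interface Region {
        String getRegionName();
        AutoCloseable lockRow(byte[] row) throws Exception;
    }

    /** Private implementation standing in for HRegion. */
    static class HRegionImpl implements Region {
        private final String name;
        HRegionImpl(String name) { this.name = name; }
        public String getRegionName() { return name; }
        public AutoCloseable lockRow(byte[] row) {
            // Acquire the row lock here; the returned handle releases it.
            return () -> { /* release the row lock */ };
        }
    }
}
```

Coprocessor hooks would then accept `Region` rather than `HRegion`, letting the implementation evolve without breaking coprocessor authors.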
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308296#comment-14308296 ] Lars Hofhansl commented on HBASE-12976: --- What the heck. All builds fail with different failures. Pretty certain that none are related to this. Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12897) Minimum memstore size is a percentage
[ https://issues.apache.org/jira/browse/HBASE-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308308#comment-14308308 ] Lars Hofhansl commented on HBASE-12897: --- I meant something different: Remove any minimum size restriction other than saying the cache size cannot be 0. Minimum memstore size is a percentage - Key: HBASE-12897 URL: https://issues.apache.org/jira/browse/HBASE-12897 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.10, 1.1.0 Reporter: churro morales Assignee: churro morales -- This message was sent by Atlassian JIRA (v6.3.4#6332)