[jira] [Updated] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9941: -- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and 0.98. Thanks Gary and Lars for the reviews and advice. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0, 0.99.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch, > 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
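For readers following along, the general technique behind this fix is to swap the thread's context ClassLoader in and out around each coprocessor invocation. The sketch below only illustrates that pattern; the class and method names are made up for this example and it is not the committed HBASE-9941 patch.
{code:java}
import java.util.concurrent.Callable;

public final class ContextClassLoaderSwitcher {
  private ContextClassLoaderSwitcher() {}

  public static <T> T callWithClassLoader(ClassLoader cpClassLoader, Callable<T> call)
      throws Exception {
    Thread current = Thread.currentThread();
    ClassLoader saved = current.getContextClassLoader();
    try {
      // While the hook runs, Thread.currentThread().getContextClassLoader()
      // resolves classes bundled with the coprocessor jar.
      current.setContextClassLoader(cpClassLoader);
      return call.call();
    } finally {
      // Always restore, so other work on this thread is unaffected.
      current.setContextClassLoader(saved);
    }
  }
}
{code}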
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862232#comment-13862232 ] Andrew Purtell commented on HBASE-9941: --- The apparent 'release audit' warnings are the issues below, which are unrelated to this patch; the patch only modifies existing files that already carry license headers: {noformat} [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, no dependency information available [WARNING] Failed to retrieve plugin descriptor for org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, no dependency information available [WARNING] Failed to retrieve plugin descriptor for org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 {noformat} > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch, > 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862226#comment-13862226 ] Hudson commented on HBASE-10279: SUCCESS: Integrated in HBase-0.94-security #378 (See [https://builds.apache.org/job/HBase-0.94-security/378/]) HBASE-10279 TestStore.testDeleteExpiredStoreFiles is flaky (larsh: rev 1555321) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.16 > > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862224#comment-13862224 ] Hudson commented on HBASE-10279: ABORTED: Integrated in HBase-0.94 #1250 (See [https://builds.apache.org/job/HBase-0.94/1250/]) HBASE-10279 TestStore.testDeleteExpiredStoreFiles is flaky (larsh: rev 1555321) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.16 > > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10272) Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862223#comment-13862223 ] Hudson commented on HBASE-10272: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #52 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/52/]) HBASE-10272 Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline (Tedyu: rev 1555313) * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java > Cluster becomes nonoperational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at 
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
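The actual change went into CatalogTracker; purely as an illustration of the retry idea described above (do not let a cached "failed server" rejection abort master startup; wait out the cache window and try again), a self-contained sketch might look like the following. All names here are hypothetical, and IOException stands in for the FailedServerException mentioned in the report.
{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;

public final class FailedServerRetry {
  private FailedServerRetry() {}

  /**
   * Invokes the connection attempt; if it is rejected because the target is
   * still on the client's failed-servers list, waits out the cache window
   * once and retries instead of propagating the failure immediately.
   */
  public static <T> T callWithOneRetry(Callable<T> connect, long failedServerWindowMs)
      throws Exception {
    try {
      return connect.call();
    } catch (IOException rejectedAsFailedServer) { // stand-in for FailedServerException
      Thread.sleep(failedServerWindowMs);          // let the failed-servers entry expire
      return connect.call();                       // one more attempt after the window
    }
  }
}
{code}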
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1386#comment-1386 ] Hudson commented on HBASE-10279: ABORTED: Integrated in HBase-0.94-JDK7 #15 (See [https://builds.apache.org/job/HBase-0.94-JDK7/15/]) HBASE-10279 TestStore.testDeleteExpiredStoreFiles is flaky (larsh: rev 1555321) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.16 > > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10272) Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862219#comment-13862219 ] Hudson commented on HBASE-10272: FAILURE: Integrated in HBase-0.98 #56 (See [https://builds.apache.org/job/HBase-0.98/56/]) HBASE-10272 Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline (Tedyu: rev 1555313) * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java > Cluster becomes nonoperational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) 
> at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)
[ https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862218#comment-13862218 ] Liyin Tang commented on HBASE-8741: --- Good to know :) Not sure whether it is worth amending the above implication in the java doc. Basically, the latestSequenceNums might contain duplicated keys for the same region. In 89-fb, we just use the ConcurrentSkipListMap, just in case this map might be reused for other purpose. Anyway, thanks for the explanation ! Nice feature indeed ! > Scope sequenceid to the region rather than regionserver (WAS: Mutations on > Regions in recovery mode might have same sequenceIDs) > > > Key: HBASE-8741 > URL: https://issues.apache.org/jira/browse/HBASE-8741 > Project: HBase > Issue Type: Bug > Components: MTTR >Affects Versions: 0.95.1 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > Fix For: 0.98.0 > > Attachments: HBASE-8741-trunk-v6.1-rebased.patch, > HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, > HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, > HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, > HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, > HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, > HBASE-8741-v5.patch > > > Currently, when opening a region, we find the maximum sequence ID from all > its HFiles and then set the LogSequenceId of the log (in case the later is at > a small value). This works good in recovered.edits case as we are not writing > to the region until we have replayed all of its previous edits. > With distributed log replay, if we want to enable writes while a region is > under recovery, we need to make sure that the logSequenceId > maximum > logSequenceId of the old regionserver. Otherwise, we might have a situation > where new edits have same (or smaller) sequenceIds. > We can store region level information in the WALTrailer, than this scenario > could be avoided by: > a) reading the trailer of the "last completed" file, i.e., last wal file > which has a trailer and, > b) completely reading the last wal file (this file would not have the > trailer, so it needs to be read completely). > In future, if we switch to multi wal file, we could read the trailer for all > completed WAL files, and reading the remaining incomplete files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
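As a side note for readers, the per-region sequence id bookkeeping discussed in this thread (one monotonic counter per region, resumed past the maximum id recovered at open time) can be pictured with the small sketch below. This is illustrative only; the name latestSequenceNums is borrowed from the comment above, and this is not HBase's actual WAL code.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public final class PerRegionSequenceIds {
  private final ConcurrentMap<String, AtomicLong> latestSequenceNums = new ConcurrentHashMap<>();

  /** Returns the next sequence id for the given region, strictly monotonic within that region. */
  public long next(String encodedRegionName) {
    AtomicLong counter = latestSequenceNums.computeIfAbsent(encodedRegionName, r -> new AtomicLong(0L));
    return counter.incrementAndGet();
  }

  /** On region open, bump the counter past the maximum id recovered from existing files. */
  public void advanceTo(String encodedRegionName, long recoveredMaxSeqId) {
    AtomicLong counter = latestSequenceNums.computeIfAbsent(encodedRegionName, r -> new AtomicLong(0L));
    counter.accumulateAndGet(recoveredMaxSeqId, Math::max);
  }
}
{code}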
[jira] [Commented] (HBASE-10272) Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862216#comment-13862216 ] Hudson commented on HBASE-10272: SUCCESS: Integrated in HBase-TRUNK #4787 (See [https://builds.apache.org/job/HBase-TRUNK/4787/]) HBASE-10272 Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline (Tedyu: rev 1555312) * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java > Cluster becomes nonoperational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > 
at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862211#comment-13862211 ] Hadoop QA commented on HBASE-9941: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621452/9941.patch against trunk revision . ATTACHMENT ID: 12621452 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8339//console This message is automatically generated. 
> The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch, > 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
[ https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862208#comment-13862208 ] Liyin Tang commented on HBASE-10275: The problem you have described is exactly what we want to resolve. Basically, if the sequenceID for each region is strictly monotonic increasing, then in the case of a region moving from A to B, the replication stream in B would know the gap/lag for that region in the previous replication stream A. As you mentioned, but slightly different: the fix is to guarantee that the old hlog entries of a region from the previous region server have been fully replicated before starting to replicate this region from the new region server. > [89-fb] Guarantee the sequenceID in each Region is strictly monotonic > increasing > > > Key: HBASE-10275 > URL: https://issues.apache.org/jira/browse/HBASE-10275 > Project: HBase > Issue Type: New Feature >Reporter: Liyin Tang >Assignee: Liyin Tang > > [HBASE-8741] has implemented the per-region sequence ID. It would be even > better to guarantee that the sequencing is strictly monotonic increasing so > that HLog-Based Async Replication is able to delivery transactions in order > in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
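To make the ordering requirement above concrete, here is a hedged, self-contained sketch of the barrier idea: the new hosting regionserver waits until the previous server's per-region replication watermark has caught up before it ships its own edits for that region. This only illustrates the idea; in a real cluster the shared watermark would live in ZooKeeper or similar, and all names here are hypothetical.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ReplicationBarrier {
  // Highest sequence id already pushed to the peer, per region (conceptually shared state).
  private final Map<String, Long> pushedUpTo = new ConcurrentHashMap<>();

  /** Called by the server currently shipping edits for the region. */
  public void markPushed(String region, long seqId) {
    pushedUpTo.merge(region, seqId, Math::max);
  }

  /** Blocks until everything up to lastSeqIdOnOldServer has been replicated to the peer. */
  public void awaitCaughtUp(String region, long lastSeqIdOnOldServer, long pollMs)
      throws InterruptedException {
    while (pushedUpTo.getOrDefault(region, 0L) < lastSeqIdOnOldServer) {
      Thread.sleep(pollMs);
    }
  }
}
{code}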
[jira] [Resolved] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10279. --- Resolution: Fixed Committed to 0.94 only (similar was added in 0.95, so all other branches have it) > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10279: -- Fix Version/s: 0.94.16 > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.16 > > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862201#comment-13862201 ] Lars Hofhansl commented on HBASE-10279: --- This was fixed a while back with HBASE-6832. IncrementingEnvironmentEdge is a bit different in trunk. I'll commit my patch to 0.94. Thanks for looking, [~apurtell] > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
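For context on why an injectable clock (rather than a larger sleep) removes this kind of flakiness, below is a minimal sketch of the pattern behind IncrementingEnvironmentEdge: the test advances time explicitly instead of relying on wall-clock delays, so a slow machine cannot skew TTL expiry. The types here are illustrative stand-ins, not HBase's actual EnvironmentEdge API.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

interface Clock {
  long currentTimeMillis();
}

final class ManualClock implements Clock {
  private final AtomicLong now = new AtomicLong(0L);
  @Override public long currentTimeMillis() { return now.get(); }
  void advance(long millis) { now.addAndGet(millis); } // the test drives time forward
}

final class TtlChecker {
  private final Clock clock;
  private final long ttlMillis;
  TtlChecker(Clock clock, long ttlMillis) { this.clock = clock; this.ttlMillis = ttlMillis; }
  // A store file is expired once its newest cell is older than the TTL under the injected clock.
  boolean isExpired(long fileMaxTimestamp) {
    return clock.currentTimeMillis() - fileMaxTimestamp > ttlMillis;
  }
}
{code}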
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862198#comment-13862198 ] Hadoop QA commented on HBASE-10263: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621447/HBASE-10263-trunk_v1.patch against trunk revision . ATTACHMENT ID: 12621447 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8338//console This message is automatically generated. 
> make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement > Components: io >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
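A rough sketch of the two knobs described in this issue (user-configurable single/multi/in-memory ratios plus a preemptive "force" mode for in-memory blocks) is given below. The names and the configuration surface are assumptions made for illustration, not the patch's actual API.
{code:java}
public final class CachePartitioning {
  final float singleRatio, multiRatio, memoryRatio;
  final boolean inMemoryForceMode;

  public CachePartitioning(float singleRatio, float multiRatio, float memoryRatio, boolean force) {
    if (Math.abs(singleRatio + multiRatio + memoryRatio - 1.0f) > 1e-6f) {
      throw new IllegalArgumentException("ratios must sum to 1.0");
    }
    this.singleRatio = singleRatio;
    this.multiRatio = multiRatio;
    this.memoryRatio = memoryRatio;
    this.inMemoryForceMode = force;
  }

  /** Capacity budget of one partition, e.g. the in-memory budget is totalCacheBytes * memoryRatio. */
  public long budgetBytes(long totalCacheBytes, float ratio) {
    return (long) (totalCacheBytes * ratio);
  }

  /** With force mode on, an incoming in-memory block may displace ordinary blocks first. */
  public boolean canEvictOrdinaryFor(boolean incomingIsInMemory, long ordinaryBytesCached) {
    return inMemoryForceMode && incomingIsInMemory && ordinaryBytesCached > 0;
  }
}
{code}
With both knobs left at their defaults (1:2:1 ratios, force mode off), the sketch degenerates to the existing behavior, which matches the back-compatibility claim in the description.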
[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
[ https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862193#comment-13862193 ] Feng Honghua commented on HBASE-10275: -- For your reference, [HBASE-9465|https://issues.apache.org/jira/browse/HBASE-9465] describes the problem that there is no guarantee of serial transaction delivery to the peer in case of failover or region move. In essence, it's hard to fix if we don't synchronize the previous regionserver (or the worker regionserver which takes over the hlog pushing for the failed regionserver) and the current hosting regionserver on hlog push. Without synchronization, two different regionservers can push hlog entries of the same region at a different pace. An alternative fix is to guarantee that the old hlog entries of a region have all been pushed to the peer before the region can be opened by a new regionserver. > [89-fb] Guarantee the sequenceID in each Region is strictly monotonic > increasing > > > Key: HBASE-10275 > URL: https://issues.apache.org/jira/browse/HBASE-10275 > Project: HBase > Issue Type: New Feature >Reporter: Liyin Tang >Assignee: Liyin Tang > > [HBASE-8741] has implemented the per-region sequence ID. It would be even > better to guarantee that the sequencing is strictly monotonic increasing so > that HLog-Based Async Replication is able to delivery transactions in order > in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
[ https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862190#comment-13862190 ] Feng Honghua commented on HBASE-10275: -- To achieve the goal of in-order (hlog) transaction delivery, we also need to guarantee that all the older (smaller) hlog entries on the previous regionserver have been successfully pushed (replicated) to the peer before the region is served by the new regionserver, right? Otherwise it's still possible that hlog entries with smaller sequenceids are pushed (replicated) to the peer by the previous hosting regionserver *after* the ones with greater sequenceids from the new/current hosting regionserver, right? For region movement in case of regionserver failover (if we deem it another kind of region movement, though a passive one), the hlog files containing un-pushed entries for the region will be handled by a regionserver other than the region's new hosting regionserver; in this situation, communication/synchronization between these two regionservers is needed to achieve the region's in-order transaction delivery from an overall perspective. > [89-fb] Guarantee the sequenceID in each Region is strictly monotonic > increasing > > > Key: HBASE-10275 > URL: https://issues.apache.org/jira/browse/HBASE-10275 > Project: HBase > Issue Type: New Feature >Reporter: Liyin Tang >Assignee: Liyin Tang > > [HBASE-8741] has implemented the per-region sequence ID. It would be even > better to guarantee that the sequencing is strictly monotonic increasing so > that HLog-Based Async Replication is able to delivery transactions in order > in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862189#comment-13862189 ] Hudson commented on HBASE-10210: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #51 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/51/]) HBASE-10210 during master startup, RS can be you-are-dead-ed by master in error (sershe: rev 1555302) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... 
> 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
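For readers unfamiliar with the race above, the staleness decision at the heart of it can be pictured with the toy comparison below. This is a sketch only; ServerManager's real logic and the eventual fix are more involved, and these names are hypothetical.
{code:java}
public final class ServerStaleness {
  public static final class Server {
    final String host;
    final int port;
    final long startCode;
    public Server(String host, int port, long startCode) {
      this.host = host;
      this.port = port;
      this.startCode = startCode;
    }
  }

  /**
   * An already-registered entry for the same host:port is treated as stale only when the
   * incoming registration carries a strictly newer start code; an equal start code means
   * the same process reporting again and must not be expired as a dead server.
   */
  public static boolean existingLooksStale(Server existing, Server incoming) {
    boolean sameHostPort = existing.host.equals(incoming.host) && existing.port == incoming.port;
    return sameHostPort && incoming.startCode > existing.startCode;
  }
}
{code}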
[jira] [Updated] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9941: -- Attachment: 9941.patch Latest patch removes the microbenchmark and POM changes. This is what I will commit soon. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch, > 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862186#comment-13862186 ] Ted Yu commented on HBASE-10263: [~apurtell]: Do you want this in 0.98 ? > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement > Components: io >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862184#comment-13862184 ] Hudson commented on HBASE-10210: SUCCESS: Integrated in HBase-0.98 #55 (See [https://builds.apache.org/job/HBase-0.98/55/]) HBASE-10210 during master startup, RS can be you-are-dead-ed by master in error (sershe: rev 1555302) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... 
> 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862183#comment-13862183 ] Andrew Purtell commented on HBASE-9941: --- I can see caliper at test scope in the classpath when I run 'mvn dependency:build-classpath' on the command line but it is not showing up in the generated file, which appears to be generated by running the exact same plugin and goal. Beats me. Maven "documentation" offers no clue. If I leave it at test scope the microbenchmark will compile but cannot be run from in tree or an untarred assembly. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10280) Make inMemoryForceMode of LruBlockCache configurable per column-family
[ https://issues.apache.org/jira/browse/HBASE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862181#comment-13862181 ] Feng Honghua commented on HBASE-10280: -- Thanks for the correction of its parent jira number, [~yuzhih...@gmail.com] :-) > Make inMemoryForceMode of LruBlockCache configurable per column-family > -- > > Key: HBASE-10280 > URL: https://issues.apache.org/jira/browse/HBASE-10280 > Project: HBase > Issue Type: Improvement > Components: io, regionserver >Reporter: Feng Honghua >Assignee: Feng Honghua > > An extension of > [HBASE-10263|https://issues.apache.org/jira/browse/HBASE-10263] per > [~yuzhih...@gmail.com]'s suggestion. > brief description of this extension is as below: > 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive > no matter what their per-column-family inMemoryForceMode are; > 2. if the global inMemoryForceMode is off, only in-memory blocks of > column-family with inMemoryForceMode on are preemptive; non-preemptive > inMemory blocks respect the single/multi/memory ratio; > In short, global flag dominates, and per-column-family flag can only control > its own blocks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862180#comment-13862180 ] Feng Honghua commented on HBASE-8751: - The newly added tableCFs config is a per-peer attribute, much like the peer state which indicates whether replication for this peer is currently ON or OFF, the latter is one of peer's attribute to control the peer's behavior, the newly added tableCFs is another peer attribute controlling which data will be pushed(replicated) to the peer in a user-defined(finer and more accurate, hence more flexible) granularity. Sounds natural this attribute keeps the same implementation theme as peer state, right? Currently, in either 0.94.x or in trunk, the peer state has a permanent node in zk, and for the above reason I do want to align the implementation theme of tableCFs with peer state. It would look weird if we implement differently for these two per-peer attributes. :-) > Enable peer cluster to choose/change the ColumnFamilies/Tables it really want > to replicate from a source cluster > > > Key: HBASE-8751 > URL: https://issues.apache.org/jira/browse/HBASE-8751 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-8751-0.94-V0.patch, HBASE-8751-0.94-v1.patch > > > Consider scenarios (all cf are with replication-scope=1): > 1) cluster S has 3 tables, table A has cfA,cfB, table B has cfX,cfY, table C > has cf1,cf2. > 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from > cluster S. > 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S. > Current replication implementation can't achieve this since it'll push the > data of all the replicatable column-families from cluster S to all its peers, > X/Y in this scenario. > This improvement provides a fine-grained replication theme which enable peer > cluster to choose the column-families/tables they really want from the source > cluster: > A). Set the table:cf-list for a peer when addPeer: > hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; > table3:cf2" > B). View the table:cf-list config for a peer using show_peer_tableCFs: > hbase-shell> show_peer_tableCFs "1" > C). Change/set the table:cf-list for a peer using set_peer_tableCFs: > hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2" > In this theme, replication-scope=1 only means a column-family CAN be > replicated to other clusters, but only the 'table:cf-list list' determines > WHICH cf/table will actually be replicated to a specific peer. > To provide back-compatibility, empty 'table:cf-list list' will replicate all > replicatable cf/table. (this means we don't allow a peer which replicates > nothing from a source cluster, we think it's reasonable: if replicating > nothing why bother adding a peer?) > This improvement addresses the exact problem raised by the first FAQ in > "http://hbase.apache.org/replication.html": > "GLOBAL means replicate? Any provision to replicate only to cluster X and > not to cluster Y? or is that for later? > Yes, this is for much later." > I also noticed somebody mentioned "replication-scope" as integer rather than > a boolean is for such fine-grained replication purpose, but I think extending > "replication-scope" can't achieve the same replication granularity > flexibility as providing above per-peer replication configurations. 
> This improvement has been running smoothly in our production clusters > (Xiaomi) for several months. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
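As an aside for readers, the "table:cf-list" strings shown in the shell examples above (e.g. "table1; table2:cf1,cf2; table3:cf2") could be parsed along the lines of the sketch below, where an empty column-family list means all replicatable families of that table. This is purely illustrative and not the patch's actual parser.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class TableCfsParser {
  /** Parses "table1; table2:cf1,cf2" into table -> column families; empty list = all families. */
  public static Map<String, List<String>> parse(String tableCfs) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    if (tableCfs == null || tableCfs.trim().isEmpty()) {
      return result; // empty config = replicate all replicatable tables/families
    }
    for (String entry : tableCfs.split(";")) {
      String[] parts = entry.trim().split(":", 2);
      String table = parts[0].trim();
      if (table.isEmpty()) {
        continue;
      }
      List<String> cfs = new ArrayList<>();
      if (parts.length == 2) {
        for (String cf : parts[1].split(",")) {
          if (!cf.trim().isEmpty()) {
            cfs.add(cf.trim());
          }
        }
      }
      result.put(table, cfs);
    }
    return result;
  }
}
{code}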
[jira] [Commented] (HBASE-10280) Make inMemoryForceMode of LruBlockCache configurable per column-family
[ https://issues.apache.org/jira/browse/HBASE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862175#comment-13862175 ] Ted Yu commented on HBASE-10280: The above description is good. > Make inMemoryForceMode of LruBlockCache configurable per column-family > -- > > Key: HBASE-10280 > URL: https://issues.apache.org/jira/browse/HBASE-10280 > Project: HBase > Issue Type: Improvement > Components: io, regionserver >Reporter: Feng Honghua >Assignee: Feng Honghua > > An extension of > [HBASE-10263|https://issues.apache.org/jira/browse/HBASE-10263] per > [~yuzhih...@gmail.com]'s suggestion. > brief description of this extension is as below: > 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive > no matter what their per-column-family inMemoryForceMode are; > 2. if the global inMemoryForceMode is off, only in-memory blocks of > column-family with inMemoryForceMode on are preemptive; non-preemptive > inMemory blocks respect the single/multi/memory ratio; > In short, global flag dominates, and per-column-family flag can only control > its own blocks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
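To make the two rules in the description above concrete, here is a small hedged sketch of the decision they describe (the global flag dominates; the per-column-family flag only affects its own blocks). The class and method names are hypothetical; this is not the patch's code.
{code:java}
/** Hypothetical illustration of the per-CF inMemoryForceMode semantics described above. */
public final class InMemoryForcePolicy {

  /** True if an IN_MEMORY block may preempt (evict) ordinary blocks regardless of ratios. */
  public static boolean isPreemptive(boolean globalForceMode,
                                     boolean cfForceMode,
                                     boolean blockIsInMemory) {
    if (!blockIsInMemory) {
      return false;     // ordinary blocks always respect the single/multi/memory ratio
    }
    if (globalForceMode) {
      return true;      // rule 1: global flag on, all in-memory blocks are preemptive
    }
    return cfForceMode; // rule 2: global flag off, only CFs with the flag on preempt
  }

  public static void main(String[] args) {
    System.out.println(isPreemptive(false, true, true));   // true: per-CF flag applies
    System.out.println(isPreemptive(false, false, true));  // false: falls back to the ratio
  }
}
{code}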
[jira] [Updated] (HBASE-10280) Make inMemoryForceMode of LruBlockCache configurable per column-family
[ https://issues.apache.org/jira/browse/HBASE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10280: --- Description: An extension of [HBASE-10263|https://issues.apache.org/jira/browse/HBASE-10263] per [~yuzhih...@gmail.com]'s suggestion. brief description of this extension is as below: 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive no matter what their per-column-family inMemoryForceMode are; 2. if the global inMemoryForceMode is off, only in-memory blocks of column-family with inMemoryForceMode on are preemptive; non-preemptive inMemory blocks respect the single/multi/memory ratio; In short, global flag dominates, and per-column-family flag can only control its own blocks. was: An extension of [HBASE-10273|https://issues.apache.org/jira/browse/HBASE-10263] per [~yuzhih...@gmail.com]'s suggestion. brief description of this extension is as below: 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive no matter what their per-column-family inMemoryForceMode are; 2. if the global inMemoryForceMode is off, only in-memory blocks of column-family with inMemoryForceMode on are preemptive; non-preemptive inMemory blocks respect the single/multi/memory ratio; In short, global flag dominates, and per-column-family flag can only control its own blocks. > Make inMemoryForceMode of LruBlockCache configurable per column-family > -- > > Key: HBASE-10280 > URL: https://issues.apache.org/jira/browse/HBASE-10280 > Project: HBase > Issue Type: Improvement > Components: io, regionserver >Reporter: Feng Honghua >Assignee: Feng Honghua > > An extension of > [HBASE-10263|https://issues.apache.org/jira/browse/HBASE-10263] per > [~yuzhih...@gmail.com]'s suggestion. > brief description of this extension is as below: > 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive > no matter what their per-column-family inMemoryForceMode are; > 2. if the global inMemoryForceMode is off, only in-memory blocks of > column-family with inMemoryForceMode on are preemptive; non-preemptive > inMemory blocks respect the single/multi/memory ratio; > In short, global flag dominates, and per-column-family flag can only control > its own blocks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10272) Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10272: --- Fix Version/s: 0.99.0 0.98.0 Hadoop Flags: Reviewed Integrated to 0.98 and trunk. Thanks for the reviews. > Cluster becomes nonoperational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with 
same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
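The failure mode above boils down to the new master treating a cached connection failure ({{FailedServerException}}) as fatal on its second connection attempt. Below is a hedged, conceptual sketch of the alternative behavior, retrying until the failed-servers window expires; it is not the HBASE-10272 patch, and the names are hypothetical.
{code:java}
/** Conceptual retry wrapper; not the actual HBASE-10272 fix. */
public class RetryOnFailedServer {

  /** Any call to a region server that may hit the failed-servers cache. */
  interface RegionServerCall<T> {
    T call() throws Exception;
  }

  private static final long RETRY_SLEEP_MS = 2000L;

  public static <T> T callWithRetries(RegionServerCall<T> call, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        // In the scenario above this would be NoRouteToHostException on the first
        // attempt and FailedServerException afterwards; treat both as transient.
        if (attempt >= maxAttempts) {
          throw e; // retry budget exhausted: propagate instead of aborting earlier
        }
        Thread.sleep(RETRY_SLEEP_MS); // give the failed-servers cache time to expire
      }
    }
  }
}
{code}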
[jira] [Updated] (HBASE-10272) Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10272: --- Summary: Cluster becomes nonoperational if the node hosting the active Master AND ROOT/META table goes offline (was: Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline) > Cluster becomes nonoperational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO 
org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862170#comment-13862170 ] Hadoop QA commented on HBASE-9941: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621442/9941.patch against trunk revision . ATTACHMENT ID: 12621442 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8337//console This message is automatically generated. 
> The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862169#comment-13862169 ] Andrew Purtell commented on HBASE-9941: --- Actually, I agree caliper shouldn't come in except when running tests. Let me figure out some Maven way to do what I want while having caliper at test scope. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862165#comment-13862165 ] Hudson commented on HBASE-10210: SUCCESS: Integrated in HBase-TRUNK #4786 (See [https://builds.apache.org/job/HBase-TRUNK/4786/]) HBASE-10210 during master startup, RS can be you-are-dead-ed by master in error (sershe: rev 1555275) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... 
> 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
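For readers following the race described above, the core of the problem is deciding whether a recorded server entry is "stale" relative to a newly reporting one. The sketch below illustrates, in a hypothetical and simplified form that is not the actual fix, why an equal start code must be treated as the same server rather than as a stale one.
{code:java}
/** Simplified, hypothetical model of a server identity; not HBase code. */
public class ServerIdentity {
  final String hostAndPort;
  final long startCode; // the start timestamp embedded in the server name

  ServerIdentity(String hostAndPort, long startCode) {
    this.hostAndPort = hostAndPort;
    this.startCode = startCode;
  }

  /**
   * The recorded entry is stale only if the newly reported server has a strictly
   * newer start code; an equal start code means the two records describe the
   * same process and must not trigger you-are-dead handling.
   */
  boolean isStaleComparedTo(ServerIdentity newlyReported) {
    return hostAndPort.equals(newlyReported.hostAndPort)
        && startCode < newlyReported.startCode; // '<', not '<='
  }
}
{code}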
[jira] [Commented] (HBASE-9280) Integration tests should use compression.
[ https://issues.apache.org/jira/browse/HBASE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862166#comment-13862166 ] Andrew Purtell commented on HBASE-9280: --- Wait, we have http://svn.apache.org/repos/asf/hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/ChangeCompressionAction.java. Close this issue? > Integration tests should use compression. > - > > Key: HBASE-9280 > URL: https://issues.apache.org/jira/browse/HBASE-9280 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.98.0 >Reporter: Elliott Clark >Assignee: Andrew Purtell > Fix For: 0.98.0 > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862163#comment-13862163 ] Feng Honghua commented on HBASE-10263: -- [~yuzhih...@gmail.com], [HBASE-10280|https://issues.apache.org/jira/browse/HBASE-10280] has been created per your suggestion; please check its description to see whether the per-column-family behavior matches your expectation. Thanks again :-) > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement > Components: io >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10263: - Component/s: io > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement > Components: io >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862162#comment-13862162 ] Andrew Purtell commented on HBASE-9941: --- bq. Only comment is can the caliper dependency be {{test}}? Don't want to ship it as a dependency if it's only for test code. If I make it a test only dependency then this happens: {noformat} ./bin/hbase org.apache.hadoop.hbase.CoprocessorInvocationEvaluation --trials 10 Exception in thread "main" java.lang.NoClassDefFoundError: com/google/caliper/SimpleBenchmark at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) Caused by: java.lang.ClassNotFoundException: com.google.caliper.SimpleBenchmark at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247 {noformat} I will commit this shortly to trunk and 0.98 if no objections. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
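The behavior this issue asks for is the usual save/set/restore dance around each coprocessor call. The snippet below is a minimal sketch of that general pattern, assuming the coprocessor's own ClassLoader is at hand; it is not a copy of the committed patch, and the helper class name is invented.
{code:java}
/** Minimal sketch of setting the context ClassLoader around a coprocessor call. */
public final class ContextClassLoaderScope implements AutoCloseable {
  private final ClassLoader previous;

  public ContextClassLoaderScope(ClassLoader coprocessorLoader) {
    Thread current = Thread.currentThread();
    this.previous = current.getContextClassLoader();
    current.setContextClassLoader(coprocessorLoader);
  }

  @Override
  public void close() {
    // Always restore the caller's ClassLoader, even if the coprocessor threw.
    Thread.currentThread().setContextClassLoader(previous);
  }
}
{code}
A hypothetical call site would open such a scope with the coprocessor's ClassLoader just before invoking the observer method and let try-with-resources restore the previous loader afterwards.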
[jira] [Created] (HBASE-10280) Make inMemoryForceMode of LruBlockCache configurable per column-family
Feng Honghua created HBASE-10280: Summary: Make inMemoryForceMode of LruBlockCache configurable per column-family Key: HBASE-10280 URL: https://issues.apache.org/jira/browse/HBASE-10280 Project: HBase Issue Type: Improvement Components: io, regionserver Reporter: Feng Honghua Assignee: Feng Honghua An extension of [HBASE-10273|https://issues.apache.org/jira/browse/HBASE-10263] per [~yuzhih...@gmail.com]'s suggestion. brief description of this extension is as below: 1. if the global inMemoryForceMode is on, all in-memory blocks are preemptive no matter what their per-column-family inMemoryForceMode are; 2. if the global inMemoryForceMode is off, only in-memory blocks of column-family with inMemoryForceMode on are preemptive; non-preemptive inMemory blocks respect the single/multi/memory ratio; In short, global flag dominates, and per-column-family flag can only control its own blocks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862157#comment-13862157 ] Feng Honghua commented on HBASE-10263: -- [~yuzhih...@gmail.com] bq.Is there plan to make inMemoryForceMode column-family config ? ==> Hmmm... sounds reasonable and feasible, but I'm not sure providing such finer-grained control for this flag is desirable for users. Let me create a new JIRA for it, and I will implement it if there is a request or someone wants it. Thanks for the suggestion :-) > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862153#comment-13862153 ] Feng Honghua commented on HBASE-10263: -- [~stack] bq.I would suggest that this behavior be ON by default in a major release of hbase (0.98 if @apurtell is amenable or 1.0.0 if not); to me, the way this patch is more the 'expected' behavior. ==> The single/multi/memory ratio is by default the same as before (without any tweak): 25%:50%:25%, but users can change it via the new configuration settings. The 'inMemoryForceMode' (preemptive mode for in-memory blocks) is OFF by default. You want to turn 'inMemoryForceMode' ON? Hmm, what about we first stay conservative by keeping it OFF by default, and turn it on later if we find that most users enable it in real use :-) At least we now give users a new option to control what 'in-memory' means for cached blocks and how they behave, and when it's off users can still configure the single/multi/memory ratios. Opinion? > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
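For readers who want to experiment with the knobs discussed above, a hedged configuration sketch follows. The property names are assumptions modeled on the LruBlockCache settings found in later HBase versions and may differ from the exact keys introduced by this patch; the attached patch is the authoritative source.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

/** Sketch only; property names are assumed and may not match the patch exactly. */
public class LruCacheTuningExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Override the default 25%:50%:25% split between single-access,
    // multi-access and in-memory blocks (the three values should sum to 1.0).
    conf.setFloat("hbase.lru.blockcache.single.percentage", 0.20f);
    conf.setFloat("hbase.lru.blockcache.multi.percentage", 0.40f);
    conf.setFloat("hbase.lru.blockcache.memory.percentage", 0.40f);
    // Preemptive mode for IN_MEMORY blocks, OFF by default as discussed above.
    conf.setBoolean("hbase.lru.blockcache.in.memory.force.mode", true);
  }
}
{code}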
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862152#comment-13862152 ] Gary Helmling commented on HBASE-9941: -- +1 on the latest patch. Only comment is can the caliper dependency be {{test}}? Don't want to ship it as a dependency if it's only for test code. Assuming that works, I'm fine to fix that bit on commit. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862151#comment-13862151 ] Lars Hofhansl commented on HBASE-10279: --- Arghh. Should've looked there first. > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862149#comment-13862149 ] Andrew Purtell commented on HBASE-10279: Looks like trunk already has a change like the patch on this issue. > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10263: - Attachment: HBASE-10263-trunk_v1.patch new patch removing unused variables in CacheConfig.java per [~yuzhih...@gmail.com]'s review, thanks [~yuzhih...@gmail.com] :-) > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862146#comment-13862146 ] Feng Honghua commented on HBASE-10263: -- They are not used in CacheConfig.java, they are read from conf in constructor and surely used in LruBlockCache.java > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862144#comment-13862144 ] Feng Honghua commented on HBASE-10263: -- [~yuzhih...@gmail.com]: bq.Is the above variable used(inMemoryForceMode) ? ==> No, it (together with the single/multi/memory factors) is not used. There is a historical reason for these variables: the flag (and the other three factors) is read from the *conf* passed as a parameter to the LruBlockCache constructor. In 0.94.3 (our internal branch) there is an INFO log for the max size before constructing the LruBlockCache, and I added the 'forceMode/single/multi/memory' info to that INFO log as well; they are used for informational purposes only. That INFO log in CacheConfig.java doesn't exist in trunk (it was removed), and I forgot to remove these four just-for-info variables accordingly. *It won't affect correctness*. Thanks for pointing this out :-) > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuition behavior for some user > scenario where in-memory table's read performance is much worse than ordinary > table when two tables' data size is almost equal and larger than > regionserver's cache size (we ever did some such experiment and verified that > in-memory table random read performance is two times worse than ordinary > table). > this patch fixes above issue and provides: > 1. make single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory block preemptive, > by preemptive means when this switch is on in-memory block can kick out any > ordinary block to make room until no ordinary block, when this switch is off > (by default) the behavior is the same as previous, using > single/multi/in-memory ratio to determine evicting. > by default, above two changes are both off and the behavior keeps the same as > before applying this patch. it's client/user's choice to determine whether or > which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862141#comment-13862141 ] Andrew Purtell commented on HBASE-10279: I assume you pinged me to pick this up for trunk [~lhofhansl], so that's what I will do. :-) > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862139#comment-13862139 ] Andrew Purtell commented on HBASE-10272: +1 > Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. 
Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862137#comment-13862137 ] Ted Yu commented on HBASE-10272: +1 [~apurtell]: Do you want this in 0.98 ? > Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have 
the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HBASE-10279: - Assignee: Lars Hofhansl > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862133#comment-13862133 ] Lars Hofhansl commented on HBASE-10279: --- [~apurtell], FYI > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10279: -- Attachment: 10279-0.94.txt Patch for 0.94. Uses EnvironmentEdge instead. The change to Store.java is not needed, but good to have. > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
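The EnvironmentEdge approach mentioned above replaces real sleeps with an injected, manually advanced clock so that file expiry becomes deterministic. Below is a hedged sketch of that pattern using the standard ManualEnvironmentEdge test utility; the surrounding test scaffolding (store setup, compaction call, assertions) is omitted.
{code:java}
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
import org.apache.hadoop.hbase.util.ManualEnvironmentEdge;

/** Sketch of controlling "time" in a test instead of relying on the wall clock. */
public class ManualClockSketch {
  public static void main(String[] args) {
    ManualEnvironmentEdge clock = new ManualEnvironmentEdge();
    clock.setValue(System.currentTimeMillis());
    EnvironmentEdgeManager.injectEdge(clock);
    try {
      // ... flush a store file here ...
      clock.incValue(2000L); // advance past the TTL explicitly, no sleeping
      // ... run the compaction and assert that the expired file was removed ...
    } finally {
      EnvironmentEdgeManager.reset(); // restore the default wall-clock edge
    }
  }
}
{code}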
[jira] [Updated] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10279: -- Attachment: 10279-0.94.txt > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862131#comment-13862131 ] Feng Honghua commented on HBASE-5923: - [~lhofhansl]: Sounds good > Cleanup checkAndXXX logic > - > > Key: HBASE-5923 > URL: https://issues.apache.org/jira/browse/HBASE-5923 > Project: HBase > Issue Type: Improvement > Components: Client, regionserver >Reporter: Lars Hofhansl > Labels: noob > Attachments: 5923-0.94.txt, 5923-trunk.txt, HBASE-10262-trunk_v0.patch > > > 1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via > HTable[Interface]. > 2. there is unnecessary duplicate code in the check{Put|Delete} code in > HRegionServer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10279: -- Attachment: (was: 10279-0.94.txt) > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > Attachments: 10279-0.94.txt > > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
[ https://issues.apache.org/jira/browse/HBASE-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862129#comment-13862129 ] Lars Hofhansl commented on HBASE-10279: --- Or better, use the EnvironmentEdge correctly. > TestStore.testDeleteExpiredStoreFiles is flaky > -- > > Key: HBASE-10279 > URL: https://issues.apache.org/jira/browse/HBASE-10279 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > > TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is > a blip on the machine running the test, first compaction might be delayed > enough in order to compact away multiple of the files, and have the test fail. > The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10273) AssignmentManager.regions(region to regionserver assignment map) and AssignmentManager.servers(regionserver to regions assignment map) are not always updated in tandem with each other
[ https://issues.apache.org/jira/browse/HBASE-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10273: - Attachment: HBASE-10273-0.94_v1.patch new patch per [~lhofhansl]'s feedback is attached, thanks [~lhofhansl] > AssignmentManager.regions(region to regionserver assignment map) and > AssignmentManager.servers(regionserver to regions assignment map) are not > always updated in tandem with each other > --- > > Key: HBASE-10273 > URL: https://issues.apache.org/jira/browse/HBASE-10273 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.16 >Reporter: Feng Honghua >Assignee: Feng Honghua > Fix For: 0.94.16 > > Attachments: HBASE-10273-0.94_v0.patch, HBASE-10273-0.94_v1.patch > > > By definition, AssignmentManager.servers and AssignmentManager.regions are > tied and should be updated in tandem with each other under a lock on > AssignmentManager.regions, but there are two places where this protocol is > broken. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
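As a rough sketch of the invariant HBASE-10273 describes (not the attached patch; the class and method names below are invented), the region-to-server map and the server-to-regions map have to be mutated under the same lock so they cannot drift apart:

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class TandemMapsSketch<R, S> {
  private final Map<R, S> regions = new HashMap<R, S>();           // region -> server
  private final Map<S, Set<R>> servers = new HashMap<S, Set<R>>(); // server -> regions

  public void assign(R region, S server) {
    synchronized (regions) {                 // one lock guards both maps
      S old = regions.put(region, server);
      if (old != null && servers.containsKey(old)) {
        servers.get(old).remove(region);     // drop the stale reverse mapping
      }
      Set<R> set = servers.get(server);
      if (set == null) {
        set = new HashSet<R>();
        servers.put(server, set);
      }
      set.add(region);
    }
  }

  public void unassign(R region) {
    synchronized (regions) {
      S server = regions.remove(region);
      if (server == null) return;            // nothing recorded for this region
      Set<R> set = servers.get(server);
      if (set != null) set.remove(region);
    }
  }
}
{code}

Any code path that updates only one of the two maps, or updates them outside that lock, can leave them inconsistent, which is the situation the issue reports.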
[jira] [Updated] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9941: -- Attachment: 9941.patch Updated patch fixes the part Gary pointed out, adds handling of throwables to WALObserver upcalls, adds classloader context setup for RegionServerCoprocessorHost (missed it previously), and fixes a javadoc nit. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
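The 9941.patch attachments themselves are not shown in this digest; the general pattern it describes, setting the coprocessor's own ClassLoader as the thread context ClassLoader for the duration of an upcall and restoring the previous loader afterwards, looks roughly like the sketch below (the helper class and method names are illustrative, not the actual CoprocessorHost code):

{code}
import java.util.concurrent.Callable;

public final class ContextClassLoaderSketch {
  public static <T> T callWithClassLoader(ClassLoader cpLoader, Callable<T> upcall)
      throws Exception {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(cpLoader);   // coprocessor code now resolves its own classes
    try {
      return upcall.call();                    // e.g. an observer pre/post hook
    } finally {
      current.setContextClassLoader(previous); // never leak the coprocessor loader
    }
  }
}
{code}

The finally block is the important part: restoring the previous loader keeps one coprocessor's ClassLoader from bleeding into unrelated server code on the same handler thread.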
[jira] [Created] (HBASE-10279) TestStore.testDeleteExpiredStoreFiles is flaky
Lars Hofhansl created HBASE-10279: - Summary: TestStore.testDeleteExpiredStoreFiles is flaky Key: HBASE-10279 URL: https://issues.apache.org/jira/browse/HBASE-10279 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl TestStore.testDeleteExpiredStoreFiles relies on wall clock time, if there is a blip on the machine running the test, first compaction might be delayed enough in order to compact away multiple of the files, and have the test fail. The simplest fix is to just double the time given from 1s/file to 2s/file. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9977) Define C interface of HBase Client Asynchronous APIs
[ https://issues.apache.org/jira/browse/HBASE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862100#comment-13862100 ] Hudson commented on HBASE-9977: --- SUCCESS: Integrated in HBase-TRUNK #4785 (See [https://builds.apache.org/job/HBase-TRUNK/4785/]) HBASE-9977 Define C interface of HBase Client Asynchronous APIs (eclark: rev 1555272) * /hbase/trunk/hbase-native-client * /hbase/trunk/hbase-native-client/.gitignore * /hbase/trunk/hbase-native-client/CMakeLists.txt * /hbase/trunk/hbase-native-client/README.md * /hbase/trunk/hbase-native-client/bin * /hbase/trunk/hbase-native-client/bin/build-all.sh * /hbase/trunk/hbase-native-client/bin/build-thirdparty.sh * /hbase/trunk/hbase-native-client/bin/download-thirdparty.sh * /hbase/trunk/hbase-native-client/bin/hbase-client-env.sh * /hbase/trunk/hbase-native-client/cmake_modules * /hbase/trunk/hbase-native-client/cmake_modules/FindGTest.cmake * /hbase/trunk/hbase-native-client/cmake_modules/FindLibEv.cmake * /hbase/trunk/hbase-native-client/src * /hbase/trunk/hbase-native-client/src/async * /hbase/trunk/hbase-native-client/src/async/CMakeLists.txt * /hbase/trunk/hbase-native-client/src/async/get-test.cc * /hbase/trunk/hbase-native-client/src/async/hbase_admin.cc * /hbase/trunk/hbase-native-client/src/async/hbase_admin.h * /hbase/trunk/hbase-native-client/src/async/hbase_client.cc * /hbase/trunk/hbase-native-client/src/async/hbase_client.h * /hbase/trunk/hbase-native-client/src/async/hbase_connection.cc * /hbase/trunk/hbase-native-client/src/async/hbase_connection.h * /hbase/trunk/hbase-native-client/src/async/hbase_errno.h * /hbase/trunk/hbase-native-client/src/async/hbase_get.cc * /hbase/trunk/hbase-native-client/src/async/hbase_get.h * /hbase/trunk/hbase-native-client/src/async/hbase_mutations.cc * /hbase/trunk/hbase-native-client/src/async/hbase_mutations.h * /hbase/trunk/hbase-native-client/src/async/hbase_result.cc * /hbase/trunk/hbase-native-client/src/async/hbase_result.h * /hbase/trunk/hbase-native-client/src/async/hbase_scanner.cc * /hbase/trunk/hbase-native-client/src/async/hbase_scanner.h * /hbase/trunk/hbase-native-client/src/async/mutations-test.cc * /hbase/trunk/hbase-native-client/src/core * /hbase/trunk/hbase-native-client/src/core/CMakeLists.txt * /hbase/trunk/hbase-native-client/src/core/admin.cc * /hbase/trunk/hbase-native-client/src/core/admin.h * /hbase/trunk/hbase-native-client/src/core/client.cc * /hbase/trunk/hbase-native-client/src/core/client.h * /hbase/trunk/hbase-native-client/src/core/connection.cc * /hbase/trunk/hbase-native-client/src/core/connection.h * /hbase/trunk/hbase-native-client/src/core/connection_attr.h * /hbase/trunk/hbase-native-client/src/core/delete.cc * /hbase/trunk/hbase-native-client/src/core/delete.h * /hbase/trunk/hbase-native-client/src/core/get.cc * /hbase/trunk/hbase-native-client/src/core/get.h * /hbase/trunk/hbase-native-client/src/core/hbase_connection_attr.cc * /hbase/trunk/hbase-native-client/src/core/hbase_connection_attr.h * /hbase/trunk/hbase-native-client/src/core/hbase_macros.h * /hbase/trunk/hbase-native-client/src/core/hbase_types.h * /hbase/trunk/hbase-native-client/src/core/mutation.cc * /hbase/trunk/hbase-native-client/src/core/mutation.h * /hbase/trunk/hbase-native-client/src/core/put.cc * /hbase/trunk/hbase-native-client/src/core/put.h * /hbase/trunk/hbase-native-client/src/core/scanner.cc * /hbase/trunk/hbase-native-client/src/core/scanner.h * /hbase/trunk/hbase-native-client/src/rpc * 
/hbase/trunk/hbase-native-client/src/rpc/CMakeLists.txt * /hbase/trunk/hbase-native-client/src/sync * /hbase/trunk/hbase-native-client/src/sync/CMakeLists.txt * /hbase/trunk/hbase-native-client/src/sync/hbase_admin.cc * /hbase/trunk/hbase-native-client/src/sync/hbase_admin.h * /hbase/trunk/hbase-native-client/src/sync/hbase_connection.cc * /hbase/trunk/hbase-native-client/src/sync/hbase_connection.h > Define C interface of HBase Client Asynchronous APIs > > > Key: HBASE-9977 > URL: https://issues.apache.org/jira/browse/HBASE-9977 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 0.99.0 > > Attachments: HBASE-9977-0.patch, HBASE-9977-1.patch, > HBASE-9977-2.patch, HBASE-9977-3.patch, HBASE-9977-4.patch, HBASE-9977-5.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10278) Provide better write predictability
[ https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-10278: Attachment: Multiwaldesigndoc.pdf > Provide better write predictability > --- > > Key: HBASE-10278 > URL: https://issues.apache.org/jira/browse/HBASE-10278 > Project: HBase > Issue Type: New Feature >Reporter: Himanshu Vashishtha > Attachments: Multiwaldesigndoc.pdf > > > Currently, HBase has one WAL per region server. > Whenever there is any latency in the write pipeline (due to whatever reasons > such as n/w blip, a node in the pipeline having a bad disk, etc), the overall > write latency suffers. > Jonathan Hsieh and I analyzed various approaches to tackle this issue. We > also looked at HBASE-5699, which talks about adding concurrent multi WALs. > Along with performance numbers, we also focussed on design simplicity, > minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. > Considering all these parameters, we propose a new HLog implementation with > WAL Switching functionality. > Please find attached the design doc for the same. It introduces the WAL > Switching feature, and experiments/results of a prototype implementation, > showing the benefits of this feature. > The second goal of this work is to serve as a building block for concurrent > multiple WALs feature. > Please review the doc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10278) Provide better write predictability
Himanshu Vashishtha created HBASE-10278: --- Summary: Provide better write predictability Key: HBASE-10278 URL: https://issues.apache.org/jira/browse/HBASE-10278 Project: HBase Issue Type: New Feature Reporter: Himanshu Vashishtha Currently, HBase has one WAL per region server. Whenever there is any latency in the write pipeline (due to whatever reasons such as n/w blip, a node in the pipeline having a bad disk, etc), the overall write latency suffers. Jonathan Hsieh and I analyzed various approaches to tackle this issue. We also looked at HBASE-5699, which talks about adding concurrent multi WALs. Along with performance numbers, we also focussed on design simplicity, minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. Considering all these parameters, we propose a new HLog implementation with WAL Switching functionality. Please find attached the design doc for the same. It introduces the WAL Switching feature, and experiments/results of a prototype implementation, showing the benefits of this feature. The second goal of this work is to serve as a building block for concurrent multiple WALs feature. Please review the doc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862095#comment-13862095 ] Hadoop QA commented on HBASE-10272: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621405/HBASE-10272.patch against trunk revision . ATTACHMENT ID: 12621405 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8336//console This message is automatically generated. 
> Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862091#comment-13862091 ] Lars Hofhansl commented on HBASE-5923: -- Patch looks good. Unless somebody desperately wants this in 0.94/0.96/0.98, let's just change this in trunk. > Cleanup checkAndXXX logic > - > > Key: HBASE-5923 > URL: https://issues.apache.org/jira/browse/HBASE-5923 > Project: HBase > Issue Type: Improvement > Components: Client, regionserver >Reporter: Lars Hofhansl > Labels: noob > Attachments: 5923-0.94.txt, 5923-trunk.txt, HBASE-10262-trunk_v0.patch > > > 1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via > HTable[Interface]. > 2. there is unnecessary duplicate code in the check{Put|Delete} code in > HRegionServer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
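For point 1 of the issue description, a hedged sketch of what exposing the CompareOp-taking variants on the client interface could look like follows; the interface name is invented and the committed signatures may differ, but Put, Delete and CompareFilter.CompareOp are existing HBase client classes.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

// Illustrative client-side signatures only; not the committed API.
public interface CheckAndMutateSketch {
  boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
      CompareOp compareOp, byte[] value, Put put) throws IOException;

  boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier,
      CompareOp compareOp, byte[] value, Delete delete) throws IOException;
}
{code}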
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862089#comment-13862089 ] Lars Hofhansl commented on HBASE-8751: -- Yeah, no persistent state in ZK... Although, we already broke that anyway, we store the state of the replication queue there, if we blow that away we will lose (not replicate) data in the slave cluster. > Enable peer cluster to choose/change the ColumnFamilies/Tables it really want > to replicate from a source cluster > > > Key: HBASE-8751 > URL: https://issues.apache.org/jira/browse/HBASE-8751 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-8751-0.94-V0.patch, HBASE-8751-0.94-v1.patch > > > Consider scenarios (all cf are with replication-scope=1): > 1) cluster S has 3 tables, table A has cfA,cfB, table B has cfX,cfY, table C > has cf1,cf2. > 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from > cluster S. > 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S. > Current replication implementation can't achieve this since it'll push the > data of all the replicatable column-families from cluster S to all its peers, > X/Y in this scenario. > This improvement provides a fine-grained replication theme which enable peer > cluster to choose the column-families/tables they really want from the source > cluster: > A). Set the table:cf-list for a peer when addPeer: > hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; > table3:cf2" > B). View the table:cf-list config for a peer using show_peer_tableCFs: > hbase-shell> show_peer_tableCFs "1" > C). Change/set the table:cf-list for a peer using set_peer_tableCFs: > hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2" > In this theme, replication-scope=1 only means a column-family CAN be > replicated to other clusters, but only the 'table:cf-list list' determines > WHICH cf/table will actually be replicated to a specific peer. > To provide back-compatibility, empty 'table:cf-list list' will replicate all > replicatable cf/table. (this means we don't allow a peer which replicates > nothing from a source cluster, we think it's reasonable: if replicating > nothing why bother adding a peer?) > This improvement addresses the exact problem raised by the first FAQ in > "http://hbase.apache.org/replication.html": > "GLOBAL means replicate? Any provision to replicate only to cluster X and > not to cluster Y? or is that for later? > Yes, this is for much later." > I also noticed somebody mentioned "replication-scope" as integer rather than > a boolean is for such fine-grained replication purpose, but I think extending > "replication-scope" can't achieve the same replication granularity > flexibility as providing above per-peer replication configurations. > This improvement has been running smoothly in our production clusters > (Xiaomi) for several months. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862082#comment-13862082 ] Lars Hofhansl commented on HBASE-10249: --- bq. sorry about all the troubles That's what the tests are for :) Replication is in principle asynchronous, it might still just be an issue with the test. > Intermittent TestReplicationSyncUpTool failure > -- > > Key: HBASE-10249 > URL: https://issues.apache.org/jira/browse/HBASE-10249 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Demai Ni > Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 > > Attachments: HBASE-10249-trunk-v0.patch > > > New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
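Since replication is asynchronous, one common way to harden such a test, shown here only as a generic sketch (not the attached patch), is to poll for the expected state with a deadline instead of asserting after a fixed sleep:

{code}
public final class WaitForSketch {
  public interface Condition {
    boolean evaluate() throws Exception;
  }

  /** Polls the condition until it holds or the deadline passes. */
  public static void waitFor(long timeoutMs, long intervalMs, Condition condition)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.evaluate()) {
        return;                     // e.g. the expected rows showed up on the peer
      }
      Thread.sleep(intervalMs);
    }
    throw new AssertionError("condition not met within " + timeoutMs + " ms");
  }
}
{code}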
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862081#comment-13862081 ] Hudson commented on HBASE-10264: SUCCESS: Integrated in hbase-0.96-hadoop2 #169 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/169/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555183) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10277) refactor AsyncProcess
Sergey Shelukhin created HBASE-10277: Summary: refactor AsyncProcess Key: HBASE-10277 URL: https://issues.apache.org/jira/browse/HBASE-10277 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin AsyncProcess currently has two patterns of usage, one from HTable flush w/o callback and with reuse, and one from HCM/HTable batch call, with callback and w/o reuse. In the former case (but not the latter), it also does some throttling of actions on initial submit call, limiting the number of outstanding actions per server. The latter case is relatively straightforward. The former appears to be error prone due to reuse - if, as javadoc claims should be safe, multiple submit calls are performed without waiting for the async part of the previous call to finish, fields like hasError become ambiguous and can be used for the wrong call; callback for success/failure is called based on "original index" of an action in submitted list, but with only one callback supplied to AP in ctor it's not clear to which submit call the index belongs, if several are outstanding. I was going to add support for HBASE-10070 to AP, and found that it might be difficult to do cleanly. It would be nice to normalize AP usage patterns; in particular, separate the "global" part (load tracking) from per-submit-call part. Per-submit part can more conveniently track stuff like initialActions, mapping of indexes and retry information, that is currently passed around the method calls. I am not sure yet, but maybe sending of the original index to server in "ClientProtos.MultiAction" can also be avoided. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-10249: Let's reopen the issue. > Intermittent TestReplicationSyncUpTool failure > -- > > Key: HBASE-10249 > URL: https://issues.apache.org/jira/browse/HBASE-10249 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Demai Ni > Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 > > Attachments: HBASE-10249-trunk-v0.patch > > > New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862015#comment-13862015 ] Andrew Purtell commented on HBASE-10210: +1 for 0.98 > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore updated HBASE-10272: --- Affects Version/s: 0.96.1 Status: Patch Available (was: Open) Submitting to HadoopQA. > Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.94.15, 0.96.1 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > 
have the same effect. Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
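As a simplified illustration of the failed-server caching behaviour described above (field and method names below are invented, not the actual HBaseClient internals), a connect failure is recorded with an expiry, and later attempts against the same server fail fast until a recheck window elapses:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class FailedServersSketch {
  private static final long RECHECK_MS = 2000L;                  // assumed recheck window
  private final Map<String, Long> failedUntil = new ConcurrentHashMap<String, Long>();

  /** Record a connect failure; later attempts fail fast until the window elapses. */
  public void addFailure(String hostAndPort) {
    failedUntil.put(hostAndPort, System.currentTimeMillis() + RECHECK_MS);
  }

  public boolean isFailedServer(String hostAndPort) {
    Long until = failedUntil.get(hostAndPort);
    if (until == null) {
      return false;                              // never failed, or already cleared
    }
    if (System.currentTimeMillis() > until) {
      failedUntil.remove(hostAndPort);           // window elapsed: allow a fresh connect attempt
      return false;
    }
    return true;                                 // still cached as failed: fail fast
  }
}
{code}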
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862012#comment-13862012 ] Demai Ni commented on HBASE-10249: -- Too bad this is a tough one. The debug info shows that the data at the source is correct. I need to re-examine the logic of both the test case and the syncup tool. Sorry about all the troubles. > Intermittent TestReplicationSyncUpTool failure > -- > > Key: HBASE-10249 > URL: https://issues.apache.org/jira/browse/HBASE-10249 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Demai Ni > Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 > > Attachments: HBASE-10249-trunk-v0.patch > > > New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore updated HBASE-10272: --- Attachment: HBASE-10272.patch Patch for trunk > Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.96.1, 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch > > > Since HBASE-6364, HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}} which it handles but > since on second attempt crashes with {{FailedServerException}} > Here is the log from one such occurance > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup master will crash with same error and restarting them will > have the same effect. 
Once this happens, the cluster will remain > in-operational until the node with region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862007#comment-13862007 ] Sergey Shelukhin commented on HBASE-10210: -- [~stack] should this also be in 96 branch? > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10276) Hook hbase-native-client up to Native profile.
[ https://issues.apache.org/jira/browse/HBASE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-10276: -- Description: {code}mvn clean package -Pnative{code} should build the new hbase-native-client module. was: mvn clean package -Pnative should build the new hbase-native-client module. > Hook hbase-native-client up to Native profile. > -- > > Key: HBASE-10276 > URL: https://issues.apache.org/jira/browse/HBASE-10276 > Project: HBase > Issue Type: Sub-task > Components: build, Client >Reporter: Elliott Clark >Assignee: Elliott Clark > > {code}mvn clean package -Pnative{code} > should build the new hbase-native-client module. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861999#comment-13861999 ] Lars Hofhansl commented on HBASE-8912: -- Yeah, would be nice if the AM could retain a history of assignments of a region and avoid retrying the same RS over and over, it should also do per region rate limiting. Too risky to add this to 0.94, though. As for the warning... You are probably right. The warning might still be an indication for double assignments (i.e. the region was OPEN already as far as the AM was concerned and yet it got another OPENED message from ZK). I think in 0.94 we should leave the warning in, in case we see more issues here in the future. In 0.96+ it's not an issue anyway. > [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to > OFFLINE > -- > > Key: HBASE-8912 > URL: https://issues.apache.org/jira/browse/HBASE-8912 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.16 > > Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, > HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html, > log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt > > > AM throws this exception which subsequently causes the master to abort: > {code} > java.lang.IllegalStateException: Unexpected state : > testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. > state=PENDING_OPEN, ts=1372891751912, > server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. > at > org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) > at > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:662) > {code} > This exception trace is from the failing test TestMetaReaderEditor which is > failing pretty frequently, but looking at the test code, I think this is not > a test-only issue, but affects the main code path. > https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
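The idea floated in the comment above, keeping a short history of which servers a region was recently tried on so a retry can prefer a different one, could look roughly like this (purely illustrative; not part of any attached patch):

{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class RecentAssignmentsSketch {
  private final Map<String, Deque<String>> recentServersByRegion =
      new HashMap<String, Deque<String>>();
  private final int historySize;

  public RecentAssignmentsSketch(int historySize) {
    this.historySize = historySize;
  }

  /** Remember that this region was just tried on this server. */
  public synchronized void recordAttempt(String regionName, String serverName) {
    Deque<String> history = recentServersByRegion.get(regionName);
    if (history == null) {
      history = new ArrayDeque<String>();
      recentServersByRegion.put(regionName, history);
    }
    history.addLast(serverName);
    if (history.size() > historySize) {
      history.removeFirst();                  // keep only the most recent attempts
    }
  }

  /** Prefer a candidate the region was not recently tried on; fall back to the first one. */
  public synchronized String pick(String regionName, List<String> candidates) {
    Deque<String> history = recentServersByRegion.get(regionName);
    for (String candidate : candidates) {
      if (history == null || !history.contains(candidate)) {
        return candidate;
      }
    }
    return candidates.isEmpty() ? null : candidates.get(0);
  }
}
{code}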
[jira] [Commented] (HBASE-10273) AssignmentManager.regions(region to regionserver assignment map) and AssignmentManager.servers(regionserver to regions assignment map) are not always updated in tandem with each other
[ https://issues.apache.org/jira/browse/HBASE-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861987#comment-13861987 ] Lars Hofhansl commented on HBASE-10273: --- Do we need the warning here: {code} + if (sn == null) { +LOG.warn("No region server for " + region); {code} Could serverRegions be null? {code} +Set serverRegions = this.servers.get(sn); +if (!serverRegions.remove(region)) { + LOG.warn("No " + region + " on " + sn); {code} I guess, I'd prefer: {code} - this.regions.remove(region); + ServerName sn = this.regions.remove(region); + if (sn != null) { +Set serverRegions = this.servers.get(sn); +if (serverRegions == null || !serverRegions.remove(region)) { + LOG.warn("No " + region + " on " + sn); +} + } {code} > AssignmentManager.regions(region to regionserver assignment map) and > AssignmentManager.servers(regionserver to regions assignment map) are not > always updated in tandem with each other > --- > > Key: HBASE-10273 > URL: https://issues.apache.org/jira/browse/HBASE-10273 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.16 >Reporter: Feng Honghua >Assignee: Feng Honghua > Fix For: 0.94.16 > > Attachments: HBASE-10273-0.94_v0.patch > > > By definition, AssignmentManager.servers and AssignmentManager.regions are > tied and should be updated in tandem with each other under a lock on > AssignmentManager.regions, but there are two places where this protocol is > broken. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861925#comment-13861925 ] Jean-Marc Spaggiari edited comment on HBASE-8912 at 1/3/14 10:54 PM: - After the first restart, 36 regions are stuck in transition :( But not any server crashed. What I did: - Restored default balancer to make sure as much regions as possible will move. - Stop/start HBase - Run balancer from shell. Every thing is back up after a 2nd restart. I get many errors like this one: {code} 2014-01-03 16:03:03,958 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received FAILED_OPEN for region b75cb9067c3c4456d6198c9237c143b3 from server node4.domain.com,60020,1388782921790 but region was in the state page,rf.idua.www\x1Fhttp\x1F-1\x1F/fr/brand/fr/audi_fleet_solutions/contact/contact_transport_personnes.html\x1Fnull,1379103792232.b75cb9067c3c4456d6198c9237c143b3. state=CLOSED, ts=1388782983373, server=node4.domain.com,60020,1388782921790 and not in OFFLINE, PENDING_OPEN or OPENING {code} After investigations, I figured that snappy was missing on a server. I fixed that, restart: All seems to be fine. So I restored my customized balancer, restart, balanced. Still some warning in the logs: {code} 2014-01-03 16:21:52,864 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region db8e67acde26bf340da481d3c1b934cd from server node4.domain.com,60020,1388784051197 but region was in the state page,moc.tenretnigruoboc.www\x1Fhttp\x1F-1\x1F/cobourg-and-the-web\x1Fnull,1379103844627.db8e67acde26bf340da481d3c1b934cd. state=OPEN, ts=1388784100392, server=node4.domain.com,60020,1388784051197 and not in expected OFFLINE, PENDING_OPEN or OPENING states {code} But this time all the regions are assigned correctly. I did that one more time (change balancer, stop, start, balance. Change balancer, stop, start, balance). I turned loglevel to warn. {code} 2014-01-03 16:28:51,142 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 17bee313797fc1ce982c0e31fdb6620c from server node8.domain.com,60020,1388784498327 but region was in the state page,rf.ofniecnarf.www\x1Fhttp\x1F-1\x1F/vote/comment/27996/1/vote/zero_vote/c99b0992e5a9cd6bf3a4cfc91769ceeb\x1Fnull,1379104524006.17bee313797fc1ce982c0e31fdb6620c. state=OPEN, ts=1388784531048, server=node8.domain.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,135 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 6dc6290df1855b319f60bf89faa3da41 from server node8.domain.com,60020,1388784498327 but region was in the state page_crc,\x00\x00\x00\x00\xD7\xD9\x97\x8Bvideo.k-wreview.ca,1378042601904.6dc6290df1855b319f60bf89faa3da41. state=OPEN, ts=1388784531793, server=node8.domain.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,712 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region ec4f96b6cedd935aeba279b15d5337af from server node8.domain.com,60020,1388784498327 but region was in the state work_proposed,\x98\xBF\xAF\x90\x00\x00\x00\x00http://feedproxy.google.com/~r/WheatWeeds/~3/Of24fZKcpco/the-eighth-day-of-christmas.html,1378975430143.ec4f96b6cedd935aeba279b15d5337af. 
state=OPEN, ts=1388784532540, server=node8.domain.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,747 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 4f823b5de664556a89cbd86aa41cd0b0 from server node8.domain.com,60020,1388784498327 but region was in the state work_proposed,\x8D4K\xEA\x00\x00\x00\x00http://twitter.com/home?status=CartoonStock%3A++http%3A%2F%2Fwww%2Ecartoonstock%2Ecom%2Fdirectory%2Fc%2Fcream%5Ftea%5Fgifts%2Easp,1378681682935.4f823b5de664556a89cbd86aa41cd0b0. state=OPEN, ts=1388784532552, server=node8.domain.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:53,244 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region da0bd0a6b7187f731fb34d4ac14ca279 from server node8.domain.com,60020,1388784498327 but region was in the state work_proposed,\xB2\xE6\xB6\xBB\x00\x00\x00\x00http://www.canpages.ca/page/QC/notre-dame-des-prairies/concept-beton-design/4550984.html,1378737981443.da0bd0a6b7187f731fb34d4ac14ca279. state=OPEN, ts=1388784533203, server=node8.domain.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states {code} But everything finally got assigned without any restart required, any pretty quickly. Logs from the last run: {code} 2014-01-03 16:32:20,252 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10969,"call":"balance(), rpc version=1, client version=29, methodsFingerPrint=1
[jira] [Created] (HBASE-10276) Hook hbase-native-client up to Native profile.
Elliott Clark created HBASE-10276: - Summary: Hook hbase-native-client up to Native profile. Key: HBASE-10276 URL: https://issues.apache.org/jira/browse/HBASE-10276 Project: HBase Issue Type: Sub-task Components: build, Client Reporter: Elliott Clark Assignee: Elliott Clark mvn clean package -Pnative should build the new hbase-native-client module. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861982#comment-13861982 ] Jean-Marc Spaggiari commented on HBASE-8912: Yep, all 36 regions belonged to a server whith Snappy issues. It will have been nice to see them re-assigned successfuly somewhere else. I can reproduce that easily, I just need to remove the snappy so file... Also, regarding the warning, if I clear what, that gives that: 2014-01-03 16:32:21,278 WARN AssignmentManager: Received OPENED for region 0...a from server node1 but region was in the state state=OPEN, and not in expected OFFLINE, PENDING_OPEN or OPENING states If we get OPENED and region is OPEN, can we not simply discard the warning? That mean region is fine, we got a request to open it and it's already done. So why should we worry? ;) > [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to > OFFLINE > -- > > Key: HBASE-8912 > URL: https://issues.apache.org/jira/browse/HBASE-8912 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.16 > > Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, > HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html, > log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt > > > AM throws this exception which subsequently causes the master to abort: > {code} > java.lang.IllegalStateException: Unexpected state : > testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. > state=PENDING_OPEN, ts=1372891751912, > server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. > at > org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) > at > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:662) > {code} > This exception trace is from the failing test TestMetaReaderEditor which is > failing pretty frequently, but looking at the test code, I think this is not > a test-only issue, but affects the main code path. > https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-9977) Define C interface of HBase Client Asynchronous APIs
[ https://issues.apache.org/jira/browse/HBASE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HBASE-9977. -- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Thanks for all of the reviews. Now comes the interesting part. > Define C interface of HBase Client Asynchronous APIs > > > Key: HBASE-9977 > URL: https://issues.apache.org/jira/browse/HBASE-9977 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 0.99.0 > > Attachments: HBASE-9977-0.patch, HBASE-9977-1.patch, > HBASE-9977-2.patch, HBASE-9977-3.patch, HBASE-9977-4.patch, HBASE-9977-5.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861980#comment-13861980 ] Ted Yu commented on HBASE-10249: Oops, it failed again: https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/50/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/ > Intermittent TestReplicationSyncUpTool failure > -- > > Key: HBASE-10249 > URL: https://issues.apache.org/jira/browse/HBASE-10249 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Demai Ni > Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 > > Attachments: HBASE-10249-trunk-v0.patch > > > New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)
[ https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861978#comment-13861978 ] Himanshu Vashishtha commented on HBASE-8741: Yes, reopening a region is safe. Re-opening a region involves closing and opening it again. On closing, the region is flushed. On flushing, we update the oldestFlushingSeqNums and oldestUnFlushedSeqNums (basically, remove its entry from these maps). Let's say latestSequenceNums still has two entries for that region. There is no corresponding element in oldestUnflushedSeqNums and oldestFlushingSeqNums map for the older entry. It will be ignored when considering that WAL file for archiving. https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L676 > Scope sequenceid to the region rather than regionserver (WAS: Mutations on > Regions in recovery mode might have same sequenceIDs) > > > Key: HBASE-8741 > URL: https://issues.apache.org/jira/browse/HBASE-8741 > Project: HBase > Issue Type: Bug > Components: MTTR >Affects Versions: 0.95.1 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > Fix For: 0.98.0 > > Attachments: HBASE-8741-trunk-v6.1-rebased.patch, > HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, > HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, > HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, > HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, > HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, > HBASE-8741-v5.patch > > > Currently, when opening a region, we find the maximum sequence ID from all > its HFiles and then set the LogSequenceId of the log (in case the later is at > a small value). This works good in recovered.edits case as we are not writing > to the region until we have replayed all of its previous edits. > With distributed log replay, if we want to enable writes while a region is > under recovery, we need to make sure that the logSequenceId > maximum > logSequenceId of the old regionserver. Otherwise, we might have a situation > where new edits have same (or smaller) sequenceIds. > We can store region level information in the WALTrailer, than this scenario > could be avoided by: > a) reading the trailer of the "last completed" file, i.e., last wal file > which has a trailer and, > b) completely reading the last wal file (this file would not have the > trailer, so it needs to be read completely). > In future, if we switch to multi wal file, we could read the trailer for all > completed WAL files, and reading the remaining incomplete files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
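For readers following the bookkeeping above, the sketch below illustrates the archiving check being described: a WAL file is held back only if some region still has an entry in the unflushed or flushing maps at or below the file's highest sequence id for that region; a region with no entry in either map (because it was flushed on close) is simply ignored. This is a simplified, self-contained illustration with assumed names, not the FSHLog code linked above.
{code}
// Simplified sketch of the WAL-archiving decision described in the comment above.
import java.util.HashMap;
import java.util.Map;

public class WalArchiveSketch {
  // highestSeqIdInFile: per region, the largest sequence id written to a given WAL file.
  // oldestUnflushed / oldestFlushing: per region, the oldest edit not yet persisted.
  static boolean canArchive(Map<String, Long> highestSeqIdInFile,
                            Map<String, Long> oldestUnflushed,
                            Map<String, Long> oldestFlushing) {
    for (Map.Entry<String, Long> e : highestSeqIdInFile.entrySet()) {
      Long pending = oldestUnflushed.get(e.getKey());
      if (pending == null) {
        pending = oldestFlushing.get(e.getKey());
      }
      // No entry in either map means the region's edits are all flushed,
      // so a stale entry for that region does not hold the file back.
      if (pending != null && pending <= e.getValue()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Long> inFile = new HashMap<>();
    inFile.put("regionA", 120L);
    Map<String, Long> unflushed = new HashMap<>(); // regionA was flushed on close: no entry
    Map<String, Long> flushing = new HashMap<>();
    System.out.println(canArchive(inFile, unflushed, flushing)); // true
  }
}
{code}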
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861976#comment-13861976 ] Hudson commented on HBASE-10264: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #50 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/50/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555182) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861970#comment-13861970 ] Lars Hofhansl commented on HBASE-8912: -- Cool, thanks JM! The warnings are expected to some extent. The 36 regions that got stuck after the first restart, were they assigned to the RS that had SNAPPY missing? > [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to > OFFLINE > -- > > Key: HBASE-8912 > URL: https://issues.apache.org/jira/browse/HBASE-8912 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.16 > > Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, > HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html, > log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt > > > AM throws this exception which subsequently causes the master to abort: > {code} > java.lang.IllegalStateException: Unexpected state : > testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. > state=PENDING_OPEN, ts=1372891751912, > server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. > at > org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) > at > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:662) > {code} > This exception trace is from the failing test TestMetaReaderEditor which is > failing pretty frequently, but looking at the test code, I think this is not > a test-only issue, but affects the main code path. > https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9977) Define C interface of HBase Client Asynchronous APIs
[ https://issues.apache.org/jira/browse/HBASE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861963#comment-13861963 ] Jean-Daniel Cryans commented on HBASE-9977: --- +1, also left a comment on review board. > Define C interface of HBase Client Asynchronous APIs > > > Key: HBASE-9977 > URL: https://issues.apache.org/jira/browse/HBASE-9977 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HBASE-9977-0.patch, HBASE-9977-1.patch, > HBASE-9977-2.patch, HBASE-9977-3.patch, HBASE-9977-4.patch, HBASE-9977-5.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
Liyin Tang created HBASE-10275: -- Summary: [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing Key: HBASE-10275 URL: https://issues.apache.org/jira/browse/HBASE-10275 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang [HBASE-8741] has implemented the per-region sequence ID. It would be even better to guarantee that the sequencing is strictly monotonically increasing so that HLog-Based Async Replication is able to deliver transactions in order in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
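As a rough illustration of the property HBASE-10275 asks for, the sketch below keeps a per-region sequence id strictly increasing across a region move by seeding the counter from the highest id already persisted. It is a hypothetical example, not the 89-fb implementation.
{code}
// Hypothetical sketch: a per-region counter that never goes backwards,
// even when the region is re-opened on another server.
import java.util.concurrent.atomic.AtomicLong;

public class RegionSequenceIdSketch {
  private final AtomicLong lastSeqId;

  // On region open, seed from the maximum sequence id found in the region's
  // HFiles/WAL so new edits are strictly greater than anything already written.
  public RegionSequenceIdSketch(long maxPersistedSeqId) {
    this.lastSeqId = new AtomicLong(maxPersistedSeqId);
  }

  public long nextSeqId() {
    return lastSeqId.incrementAndGet(); // strictly monotonically increasing
  }

  public static void main(String[] args) {
    RegionSequenceIdSketch region = new RegionSequenceIdSketch(41L);
    System.out.println(region.nextSeqId()); // 42
    System.out.println(region.nextSeqId()); // 43
  }
}
{code}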
[jira] [Commented] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
[ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861956#comment-13861956 ] Aditya Kishore commented on HBASE-10272: Couldn't find a way to simulate the entire host becoming offline at once. All the kill() and abort() methods close the regions, which cleans up the information in ZK that leads up to this situation. > Cluster becomes in-operational if the node hosting the active Master AND > ROOT/META table goes offline > - > > Key: HBASE-10272 > URL: https://issues.apache.org/jira/browse/HBASE-10272 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 0.94.15 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Attachments: HBASE-10272_0.94.patch > > > Since HBASE-6364, the HBase client caches a connection failure to a server and > any subsequent attempt to connect to the server throws a > {{FailedServerException}} > Now if a node which hosted the active Master AND ROOT/META table goes > offline, the newly anointed Master's initial attempt to connect to the dead > region server will fail with {{NoRouteToHostException}}, which it handles, but > the second attempt crashes with {{FailedServerException}} > Here is the log from one such occurrence > {noformat} > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master > server abort: loaded coprocessors are: [] > 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: xxx02/192.168.1.102:60020 > at > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) > at > org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > at $Proxy9.getProtocolVersion(Unknown Source) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376) > at java.lang.Thread.run(Thread.java:662) > 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2013-11-20 10:58:00,162 INFO 
org.apache.hadoop.ipc.HBaseServer: Stopping > server on 6 > {noformat} > Each of the backup masters will crash with the same error and restarting them will > have the same effect. Once this happens, the cluster will remain > in-operational until the node with the region server is brought online (or the > Zookeeper node containing the root region server and/or META entry from the > ROOT table is deleted). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
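The failed-servers list this report describes is essentially a time-bounded negative cache on the client side; the bug occurs when the master's retry lands inside the expiry window. Below is a generic, self-contained sketch of that mechanism with assumed names and an arbitrary 2-second expiry, not the actual HBaseClient code.
{code}
// Generic sketch of a client-side "failed servers" negative cache with expiry.
import java.util.HashMap;
import java.util.Map;

public class FailedServersSketch {
  private final Map<String, Long> failedUntil = new HashMap<>();
  private final long expiryMillis;

  public FailedServersSketch(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  public synchronized void markFailed(String hostAndPort) {
    failedUntil.put(hostAndPort, System.currentTimeMillis() + expiryMillis);
  }

  // A connection attempt inside the expiry window fails fast instead of retrying the host.
  public synchronized boolean isFailed(String hostAndPort) {
    Long until = failedUntil.get(hostAndPort);
    if (until == null) return false;
    if (System.currentTimeMillis() > until) {
      failedUntil.remove(hostAndPort);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    FailedServersSketch cache = new FailedServersSketch(2000);
    cache.markFailed("xxx02:60020");
    System.out.println(cache.isFailed("xxx02:60020")); // true: fast-fail within the window
  }
}
{code}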
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861947#comment-13861947 ] Hudson commented on HBASE-10264: SUCCESS: Integrated in HBase-0.98 #54 (See [https://builds.apache.org/job/HBase-0.98/54/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555182) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader when using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861946#comment-13861946 ] Hadoop QA commented on HBASE-10078: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621379/hbase-10078.patch against trunk revision . ATTACHMENT ID: 12621379 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8335//console This message is automatically generated. > Dynamic Filter - Not using DynamicClassLoader when using FilterList > --- > > Key: HBASE-10078 > URL: https://issues.apache.org/jira/browse/HBASE-10078 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 0.94.13 >Reporter: Federico Gaule >Assignee: Jimmy Xiang >Priority: Minor > Attachments: 0.94-10078.patch, hbase-10078.patch > > > I've tried to use dynamic jar load > (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue > with FilterList. 
> Here is some log from my app where i send a Get with a FilterList containing > AFilter and other with BFilter. > {noformat} > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found > - using dynamical class loader > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any > 2013-12-02 13:55:42,677 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: > d.p.AFilter > 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: > Can't find class d.p.BFilter > java.lang.ClassNotFoundException: d.p.BFilter > at java.net.URLClassLoader$1.run(URLClassLoader.java:202) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:190) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:247) > at java.lang.Class.forName0(
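The FilterList failure above comes down to deserialization resolving the nested filter classes with only the default class loader. The sketch below shows the generic fallback pattern (try the default loader, then a secondary loader over a jar directory); the class and directory are hypothetical, and this is not the actual DynamicClassLoader or HbaseObjectWritable code.
{code}
// Generic sketch: resolve a class with the default loader first, then fall back
// to a secondary URLClassLoader pointed at a directory of dynamically added jars.
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class FallbackClassResolver {
  private final ClassLoader dynamicLoader;

  public FallbackClassResolver(File jarDir) throws Exception {
    List<URL> urls = new ArrayList<>();
    File[] jars = jarDir.listFiles((dir, name) -> name.endsWith(".jar"));
    if (jars != null) {
      for (File jar : jars) {
        urls.add(jar.toURI().toURL());
      }
    }
    this.dynamicLoader = new URLClassLoader(urls.toArray(new URL[0]),
        FallbackClassResolver.class.getClassLoader());
  }

  public Class<?> resolve(String className) throws ClassNotFoundException {
    try {
      return Class.forName(className); // default loader first
    } catch (ClassNotFoundException e) {
      // Every nested filter (e.g. inside a FilterList) must go through this fallback too.
      return Class.forName(className, true, dynamicLoader);
    }
  }

  public static void main(String[] args) throws Exception {
    FallbackClassResolver resolver = new FallbackClassResolver(new File("."));
    System.out.println(resolver.resolve("java.lang.String")); // found by the default loader
  }
}
{code}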
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861939#comment-13861939 ] Hudson commented on HBASE-10264: FAILURE: Integrated in hbase-0.96 #249 (See [https://builds.apache.org/job/hbase-0.96/249/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555183) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861941#comment-13861941 ] Ted Yu commented on HBASE-10263: In CacheConfig.java: {code} +boolean inMemoryForceMode = conf.getBoolean("hbase.rs.inmemoryforcemode", +false); {code} Is the above variable used? {code} + * configuration, inMemoryForceMode is a cluster-wide configuration {code} Is there a plan to make inMemoryForceMode a column-family config? > make LruBlockCache single/multi/in-memory ratio user-configurable and provide > preemptive mode for in-memory type block > -- > > Key: HBASE-10263 > URL: https://issues.apache.org/jira/browse/HBASE-10263 > Project: HBase > Issue Type: Improvement >Reporter: Feng Honghua >Assignee: Feng Honghua > Attachments: HBASE-10263-trunk_v0.patch > > > currently the single/multi/in-memory ratio in LruBlockCache is hardcoded > 1:2:1, which can lead to somewhat counter-intuitive behavior in user > scenarios where an in-memory table's read performance is much worse than an > ordinary table's when the two tables' data sizes are almost equal and larger > than the regionserver's cache size (we ran such an experiment and verified > that in-memory table random read performance is two times worse than that of > an ordinary table). > this patch fixes the above issue and provides: > 1. make the single/multi/in-memory ratio user-configurable > 2. provide a configurable switch which can make in-memory blocks preemptive; > preemptive means that when this switch is on, an in-memory block can kick out > any ordinary block to make room until no ordinary blocks remain; when this > switch is off (by default) the behavior is the same as before, using the > single/multi/in-memory ratio to decide eviction. > by default, both changes above are off and the behavior stays the same as > before applying this patch. it's the client/user's choice to decide whether > and which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
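For context on the review question above: hbase.rs.inmemoryforcemode is the only key visible in the quoted diff, so the sketch below uses made-up placeholder names for the ratio keys. It only illustrates how such cache settings would typically be read from a Hadoop Configuration, not the HBASE-10263 patch itself.
{code}
// Sketch of reading user-configurable LruBlockCache ratios; key names other than
// "hbase.rs.inmemoryforcemode" are hypothetical placeholders.
import org.apache.hadoop.conf.Configuration;

public class CacheRatioConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Defaults mirror the hardcoded 1:2:1 split described in the issue.
    float singleRatio = conf.getFloat("example.blockcache.single.ratio", 0.25f);
    float multiRatio = conf.getFloat("example.blockcache.multi.ratio", 0.50f);
    float inMemoryRatio = conf.getFloat("example.blockcache.inmemory.ratio", 0.25f);
    boolean inMemoryForceMode = conf.getBoolean("hbase.rs.inmemoryforcemode", false);

    if (Math.abs(singleRatio + multiRatio + inMemoryRatio - 1.0f) > 0.001f) {
      throw new IllegalArgumentException("single/multi/in-memory ratios must sum to 1.0");
    }
    System.out.println("force mode=" + inMemoryForceMode + ", ratios=" + singleRatio
        + "/" + multiRatio + "/" + inMemoryRatio);
  }
}
{code}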
[jira] [Commented] (HBASE-9858) Integration test and LoadTestTool support for cell Visibility
[ https://issues.apache.org/jira/browse/HBASE-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861937#comment-13861937 ] Hudson commented on HBASE-9858: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #40 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/40/]) HBASE-9858 Integration test and LoadTestTool support for cell Visibility (anoopsamjohn: rev 1555145) * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestDataGenerator.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngest.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithTags.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithVisibilityLabels.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestLazyCfLoading.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/StripeCompactionsPerformanceEvaluation.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityLabelFilter.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/LoadTestDataGeneratorWithVisibilityLabels.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestDataGeneratorWithTags.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedAction.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedUpdater.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriterBase.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadParallel.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadSequential.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/test * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/test/LoadTestDataGenerator.java > Integration test and LoadTestTool support for cell Visibility > - > > Key: HBASE-9858 > URL: https://issues.apache.org/jira/browse/HBASE-9858 > Project: HBase > Issue Type: Sub-task > Components: security >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-9858.patch, HBASE-9858_V2.patch, > HBASE-9858_V3.patch, HBASE-9858_V4.patch > > > Cell level visibility should have an integration test and LoadTestTool > support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861936#comment-13861936 ] Hudson commented on HBASE-10264: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #40 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/40/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555178) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861927#comment-13861927 ] Jimmy Xiang commented on HBASE-10210: - Yes, it is ok with me. > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861925#comment-13861925 ] Jean-Marc Spaggiari commented on HBASE-8912: After the first restart, 36 regions are stuck in transition :( But no server crashed. What I did: - Restored the default balancer to make sure as many regions as possible will move. - Stop/start HBase - Run balancer from shell. Everything is back up after a 2nd restart. I got many errors like this one: {code} 2014-01-03 16:03:03,958 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received FAILED_OPEN for region b75cb9067c3c4456d6198c9237c143b3 from server node4.domain.com,60020,1388782921790 but region was in the state page,rf.idua.www\x1Fhttp\x1F-1\x1F/fr/brand/fr/audi_fleet_solutions/contact/contact_transport_personnes.html\x1Fnull,1379103792232.b75cb9067c3c4456d6198c9237c143b3. state=CLOSED, ts=1388782983373, server=node4.domain.com,60020,1388782921790 and not in OFFLINE, PENDING_OPEN or OPENING {code} After investigating, I figured that snappy was missing on a server. I fixed that and restarted: all seems to be fine. So I restored my customized balancer, restarted, balanced. Still some warnings in the logs: {code} 2014-01-03 16:21:52,864 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region db8e67acde26bf340da481d3c1b934cd from server node4.domain.com,60020,1388784051197 but region was in the state page,moc.tenretnigruoboc.www\x1Fhttp\x1F-1\x1F/cobourg-and-the-web\x1Fnull,1379103844627.db8e67acde26bf340da481d3c1b934cd. state=OPEN, ts=1388784100392, server=node4.distparser.com,60020,1388784051197 and not in expected OFFLINE, PENDING_OPEN or OPENING states {code} But this time all the regions are assigned correctly. I did that one more time (change balancer, stop, start, balance. Change balancer, stop, start, balance). I turned the loglevel to warn. {code} 2014-01-03 16:28:51,142 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 17bee313797fc1ce982c0e31fdb6620c from server node8.domain.com,60020,1388784498327 but region was in the state page,rf.ofniecnarf.www\x1Fhttp\x1F-1\x1F/vote/comment/27996/1/vote/zero_vote/c99b0992e5a9cd6bf3a4cfc91769ceeb\x1Fnull,1379104524006.17bee313797fc1ce982c0e31fdb6620c. state=OPEN, ts=1388784531048, server=node8.distparser.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,135 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 6dc6290df1855b319f60bf89faa3da41 from server node8.domain.com,60020,1388784498327 but region was in the state page_crc,\x00\x00\x00\x00\xD7\xD9\x97\x8Bvideo.k-wreview.ca,1378042601904.6dc6290df1855b319f60bf89faa3da41. state=OPEN, ts=1388784531793, server=node8.distparser.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,712 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region ec4f96b6cedd935aeba279b15d5337af from server node8.domain.com,60020,1388784498327 but region was in the state work_proposed,\x98\xBF\xAF\x90\x00\x00\x00\x00http://feedproxy.google.com/~r/WheatWeeds/~3/Of24fZKcpco/the-eighth-day-of-christmas.html,1378975430143.ec4f96b6cedd935aeba279b15d5337af. 
state=OPEN, ts=1388784532540, server=node8.distparser.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:52,747 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 4f823b5de664556a89cbd86aa41cd0b0 from server node8.distparser.com,60020,1388784498327 but region was in the state work_proposed,\x8D4K\xEA\x00\x00\x00\x00http://twitter.com/home?status=CartoonStock%3A++http%3A%2F%2Fwww%2Ecartoonstock%2Ecom%2Fdirectory%2Fc%2Fcream%5Ftea%5Fgifts%2Easp,1378681682935.4f823b5de664556a89cbd86aa41cd0b0. state=OPEN, ts=1388784532552, server=node8.distparser.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states 2014-01-03 16:28:53,244 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region da0bd0a6b7187f731fb34d4ac14ca279 from server node8.domain.com,60020,1388784498327 but region was in the state work_proposed,\xB2\xE6\xB6\xBB\x00\x00\x00\x00http://www.canpages.ca/page/QC/notre-dame-des-prairies/concept-beton-design/4550984.html,1378737981443.da0bd0a6b7187f731fb34d4ac14ca279. state=OPEN, ts=1388784533203, server=node8.distparser.com,60020,1388784498327 and not in expected OFFLINE, PENDING_OPEN or OPENING states {code} But everything finally got assigned without any restart required, and pretty quickly. Logs from the last run: {code} 2014-01-03 16:32:20,252 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10969,"call":"balance(), rpc version=1, client version=29, methodsFingerPrint=1886733559","client":"
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861920#comment-13861920 ] Enis Soztutar commented on HBASE-10210: --- I think the last patch is good to go. Any more comments Jimmy, Liang? > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861914#comment-13861914 ] Sergey Shelukhin commented on HBASE-10210: -- should this be ok to commit? > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861910#comment-13861910 ] Hudson commented on HBASE-10264: SUCCESS: Integrated in HBase-TRUNK #4784 (See [https://builds.apache.org/job/HBase-TRUNK/4784/]) HBASE-10264 CompactionTool in mapred mode is missing classes in its classpath (Himanshu Vashishtha) (ndimiduk: rev 1555178) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java > [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath > -- > > Key: HBASE-10264 > URL: https://issues.apache.org/jira/browse/HBASE-10264 > Project: HBase > Issue Type: Bug > Components: Compaction, mapreduce >Affects Versions: 0.98.0, 0.99.0 >Reporter: Aleksandr Shulman >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBase-10264.patch > > > Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related > issues in both MRv1 and MRv2. > {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred > -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} > Results: > {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : > attempt_1388179525649_0011_m_00_2, Status : FAILED > Error: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.TableInfoMissingException > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9941) The context ClassLoader isn't set while calling into a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861879#comment-13861879 ] Hadoop QA commented on HBASE-9941: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621363/9941.patch against trunk revision . ATTACHMENT ID: 12621363 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8334//console This message is automatically generated. > The context ClassLoader isn't set while calling into a coprocessor > -- > > Key: HBASE-9941 > URL: https://issues.apache.org/jira/browse/HBASE-9941 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Affects Versions: 0.96.0 >Reporter: Benoit Sigoure >Assignee: Andrew Purtell > Fix For: 0.98.0 > > Attachments: 9941.patch, 9941.patch, 9941.patch > > > Whenever one of the methods of a coprocessor is invoked, the context > {{ClassLoader}} isn't set to be the {{CoprocessorClassLoader}}. It's only > set properly when calling the coprocessor's {{start}} method. 
This means > that if the coprocessor code attempts to load classes using the context > {{ClassLoader}}, it will fail to find the classes it's looking for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10210: --- Fix Version/s: 0.99.0 0.98.0 > during master startup, RS can be you-are-dead-ed by master in error > --- > > Key: HBASE-10210 > URL: https://issues.apache.org/jira/browse/HBASE-10210 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-10210.01.patch, HBASE-10210.02.patch, > HBASE-10210.03.patch, HBASE-10210.04.patch, HBASE-10210.05.patch, > HBASE-10210.patch > > > Not sure of the root cause yet, I am at "how did this ever work" stage. > We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. > It looks like RS information arriving from 2 sources - ZK and server itself, > can conflict. Master doesn't handle such cases (timestamp match), and anyway > technically timestamps can collide for two separate servers. > So, master YouAreDead-s the already-recorded reporting RS, and adds it too. > Then it discovers that the new server has died with fatal error! > Note the threads. > Addition is called from master initialization and from RPC. > {noformat} > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Finished waiting for region servers count to settle; checked in 2, slept for > 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: > Registering > server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,290 INFO > [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered > server found up in zk but who has not yet reported in: > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Triggering server recovery; existingServer > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > looks stale, new > server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] > master.ServerManager: Master doesn't enable ServerShutdownHandler during > initialization, delay expiring server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > ... > 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] > master.HMaster: Region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 > reported a fatal error: > ABORTING region server > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing > h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as > dead server > {noformat} > Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader when using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861866#comment-13861866 ] Jimmy Xiang commented on HBASE-10078: - The 0.94 patch fixed the issue in 0.94. The trunk patch just added tests for using DynamicClassLoader in FilterList. > Dynamic Filter - Not using DynamicClassLoader when using FilterList > --- > > Key: HBASE-10078 > URL: https://issues.apache.org/jira/browse/HBASE-10078 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 0.94.13 >Reporter: Federico Gaule >Assignee: Jimmy Xiang >Priority: Minor > Attachments: 0.94-10078.patch, hbase-10078.patch > > > I've tried to use dynamic jar loading > (https://issues.apache.org/jira/browse/HBASE-1936) but there seems to be an issue > with FilterList. > Here is some log output from my app where I send a Get with a FilterList containing > AFilter and another with BFilter. > {noformat} > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found > - using dynamical class loader > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter > 2013-12-02 13:55:42,564 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any > 2013-12-02 13:55:42,677 DEBUG > org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: > d.p.AFilter > 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: > Can't find class d.p.BFilter > java.lang.ClassNotFoundException: d.p.BFilter > at java.net.URLClassLoader$1.run(URLClassLoader.java:202) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:190) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:247) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:247) > at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) > at > org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324) > at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) > at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) > at > org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116) > at > org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) > at > org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748) > 
at > org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > AFilter is not found by the default loader, so it is resolved with the DynamicClassLoader, but when it tries > to load BFilter, it uses the URLClassLoader and fails without checking for > dynamic jars. > I think the issue is related to FilterList#readFields > {code:title=FilterList.java|borderStyle=solid} > public void readFields(final DataInput in) throws IOException { > byte opByte = in.readByte(); > operator = Operator.values()[opByte]; > int size = in.readInt(); > if (size > 0) { > filters = new ArrayList<Filter>(size); > for (int i = 0; i < size; i++) { > Filter filter = (Filter)Hbas