[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873952#comment-13873952 ] Andrew Purtell commented on HBASE-10364: +1 Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but it wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
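A minimal sketch of how such an option might be wired, assuming a hypothetical `-zk_root` flag and a plain map standing in for an HBase `Configuration`; the actual option name and plumbing in the committed LoadTestTool patch may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class ZkRootOption {
    // Hypothetical flag name; the actual LoadTestTool option added by
    // HBASE-10364 may be spelled differently.
    static final String OPT = "-zk_root";
    // This configuration key is the real one from the error above.
    static final String ZK_PARENT_KEY = "zookeeper.znode.parent";

    /**
     * Scan command-line args for the parent-znode option and apply it to a
     * configuration map (a stand-in for an HBase Configuration object).
     * Returns the effective parent znode, defaulting to "/hbase".
     */
    public static String applyZkRoot(String[] args, Map<String, String> conf) {
        for (int i = 0; i < args.length - 1; i++) {
            if (OPT.equals(args[i])) {
                conf.put(ZK_PARENT_KEY, args[i + 1]);
            }
        }
        return conf.getOrDefault(ZK_PARENT_KEY, "/hbase");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String parent = applyZkRoot(new String[] {"-zk_root", "/hbase-secure"}, conf);
        System.out.println(parent); // prints /hbase-secure
    }
}
```

With the option absent, the tool would keep the stock default and hit exactly the mismatch described in the report.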
[jira] [Comment Edited] (HBASE-6873) Clean up Coprocessor loading failure handling
[ https://issues.apache.org/jira/browse/HBASE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873957#comment-13873957 ] Andrew Purtell edited comment on HBASE-6873 at 1/16/14 9:17 PM: I checked the build report of HBase-TRUNK #4825 (https://builds.apache.org/job/HBase-TRUNK/4825/). This is an occasional failure with TestHBaseFsck that turns up now and again in some environments, generally builds.apache.org, and is not related. was (Author: apurtell): I checked the build report of HBase-TRUNK #4825 (https://builds.apache.org/job/HBase-TRUNK/4825/). This is an occasional failure with TestHBaseFsck that turns up now and again and is not related. Clean up Coprocessor loading failure handling - Key: HBASE-6873 URL: https://issues.apache.org/jira/browse/HBASE-6873 Project: HBase Issue Type: Sub-task Components: Coprocessors, regionserver Affects Versions: 0.98.0 Reporter: David Arthur Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch When registering a coprocessor with a missing dependency, the regionserver gets stuck in an infinite fail loop. Restarting the regionserver and/or master has no effect. E.g., load a coprocessor from my-coproc.jar that uses an external dependency (kafka) that is not included with HBase. 
{code} 12/09/24 13:13:15 INFO handler.OpenRegionHandler: Opening of region {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} failed, marking as FAILED_OPEN in ZK 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 INFO regionserver.HRegionServer: Received request to open region: documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b. 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG regionserver.HRegion: Opening region: {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} 12/09/24 13:13:15 INFO regionserver.HRegion: Setting up tabledescriptor config now ... 12/09/24 13:13:15 INFO coprocessor.CoprocessorHost: Class com.mycompany.hbase.documents.DocumentObserverCoprocessor needs to be loaded from a file - file:/path/to/my-coproc.jar. 12/09/24 13:13:16 ERROR handler.OpenRegionHandler: Failed open of region=documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b., starting to roll back the global memstore size. java.lang.IllegalStateException: Could not instantiate a region instance. 
at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3595) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3733) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3592) ... 7 more Caused by: java.lang.NoClassDefFoundError: kafka/common/NoBrokersForPartitionException at java.lang.Class.getDeclaredConstructors0(Native Method) at
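The cleanup this issue tracks is to treat an unloadable coprocessor as a deployment error with a defined policy (fail fast or skip the coprocessor) instead of retrying the region open forever. A pure-JDK sketch of that policy follows; the flag mirrors the `hbase.coprocessor.abortonerror` setting touched by this change, but the class and method here are illustrative stand-ins, not the actual CoprocessorHost code.

```java
public class CoprocessorLoadPolicy {
    /**
     * Try to load a coprocessor class. NoClassDefFoundError (as in the
     * missing-kafka report above) is an Error, not an Exception, so the
     * catch must be broad. Depending on the abort-on-error policy, either
     * fail fast or log and skip the coprocessor.
     */
    public static Class<?> load(String className, boolean abortOnError) {
        try {
            return Class.forName(className);
        } catch (Throwable t) { // ClassNotFoundException, NoClassDefFoundError, ...
            if (abortOnError) {
                throw new IllegalStateException("Failed to load coprocessor " + className, t);
            }
            System.err.println("Skipping coprocessor " + className + ": " + t);
            return null; // region open proceeds without the coprocessor
        }
    }

    public static void main(String[] args) {
        // A class that is present loads normally:
        System.out.println(load("java.lang.String", true).getName());
        // A class that is missing (like the kafka dependency) is skipped:
        System.out.println(load("kafka.common.NoBrokersForPartitionException", false)); // prints null
    }
}
```

Either outcome is better than the FAILED_OPEN/reassign loop in the log, because the failure surfaces once, at load time, with a clear cause.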
[jira] [Commented] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873969#comment-13873969 ] Jonathan Hsieh commented on HBASE-10363: Yes. I ran these tests against the Apache 0.94.15 release and also the tip of 0.94 this morning. [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Critical Fix For: 0.94.16 From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE! Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
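The "Failed getting constructor" errors suggest the hadoopbackport sampler locates a Hadoop constructor reflectively, and the signature it expects differs between the Hadoop profiles. A generic pure-JDK sketch of that failure mode, with illustrative classes standing in for the Hadoop ones:

```java
import java.lang.reflect.Constructor;

public class ReflectiveCtor {
    /**
     * Fetch a constructor reflectively, the way backport code often binds to
     * classes whose signatures differ across Hadoop versions. A missing
     * signature surfaces as the kind of "Failed getting constructor" error
     * seen in the test output above.
     */
    public static Constructor<?> getCtor(Class<?> cls, Class<?>... params) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor(params);
            c.setAccessible(true);
            return c;
        } catch (NoSuchMethodException e) {
            throw new RuntimeException("Failed getting constructor", e);
        }
    }

    public static void main(String[] args) {
        // String(char[]) exists, so this succeeds:
        System.out.println(getCtor(String.class, char[].class) != null); // prints true
        // A signature that does not exist triggers the failure path:
        try {
            getCtor(String.class, java.util.Map.class);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints Failed getting constructor
        }
    }
}
```

If that diagnosis is right, the fix is to probe for each known signature (or catch NoSuchMethodException per profile) rather than assuming one.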
[jira] [Updated] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-10363: --- Assignee: (was: Jonathan Hsieh) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Priority: Critical Fix For: 0.94.16 From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE! Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10365) HBaseFsck should clean up connection properly when repair is completed
Ted Yu created HBASE-10365: -- Summary: HBaseFsck should clean up connection properly when repair is completed Key: HBASE-10365 URL: https://issues.apache.org/jira/browse/HBASE-10365 Project: HBase Issue Type: Bug Reporter: Ted Yu At the end of the exec() method, connections to the cluster are not properly released. Connections should be released upon completion of repair. This was mentioned by Jean-Marc in the thread '[VOTE] The 1st hbase 0.94.16 release candidate is available for download' -- This message was sent by Atlassian JIRA (v6.1.5#6160)
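The shape of the fix is to release the connection on every exit path, e.g. with try-with-resources or a finally block. A minimal sketch, where `Connection` and `repair()` are illustrative stand-ins, not HBaseFsck's actual API:

```java
import java.io.Closeable;

public class FsckCleanupSketch {
    /** Stand-in for the cluster connection HBaseFsck holds open. */
    static class Connection implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    /**
     * Run the repair work and guarantee the connection is released, even if
     * the repair throws. try-with-resources closes the connection on both
     * the normal and the exceptional path.
     */
    public static Connection exec(Runnable repair) {
        Connection conn = new Connection();
        try (Connection c = conn) {
            repair.run();
        }
        return conn; // returned only so callers can observe that it was closed
    }

    public static void main(String[] args) {
        Connection c = exec(() -> { /* repair work */ });
        System.out.println(c.closed); // prints true
    }
}
```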
[jira] [Updated] (HBASE-10349) Table became unusable when master balanced its region after table was dropped
[ https://issues.apache.org/jira/browse/HBASE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10349: --- Affects Version/s: 0.98.0 Fix Version/s: 0.99.0 0.98.0 Ted said he saw this while testing 0.98, bringing it in Table became unusable when master balanced its region after table was dropped - Key: HBASE-10349 URL: https://issues.apache.org/jira/browse/HBASE-10349 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Ted Yu Assignee: Jimmy Xiang Fix For: 0.98.0, 0.99.0 Attachments: 10349-hadoop-hdfs-namenode-hor11n14.gq1.ygridcore.net.zip, 10349-output.log, 10349-v1.txt, 10349-v2.txt, HBASE-10349-meta-test-and-debug.patch, hbase-hbase-master-hor15n05.gq1.ygridcore.net.log.tar.gz 0.98 was used. This was sequence of events: create 'tablethree_mod' snapshot 'tablethree_mod', 'snapshot_tablethree_mod' disable 'tablethree_mod' 2014-01-15 09:34:51,749 restore_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:35:07,210 enable 'tablethree_mod' 2014-01-15 09:35:46,134 delete_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:41:42,210 disable 'tablethree_mod' 2014-01-15 09:41:43,610 drop 'tablethree_mod' create 'tablethree_mod' For the last table creation request: {code} 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'create 'tablethree_mod', {NAME = 'f1', VERSIONS = 3} , {NAME = 'f2', VERSIONS = 3} , {NAME = 'f3', VERSIONS = 3} ' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'exists 'tablethree_mod'' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '0', 'f1:q1', 'value-0', 10' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '1', 'f1:q1', 'value-1', 20' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '2', 'f2:q2', 'value-2', 30' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '3', 'f3:q3', 'value-3', 40' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '4', 
'f3:q3', 'value-4', 50' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO|Done writing commands to file. Will execute them now. 2014-01-15 10:03:53,000|beaver.machine|INFO|RUNNING: /usr/lib/hbase/bin/hbase shell /grid/0/tmp/hwqe/artifacts/tmp-471142 2014-01-15 10:03:55,878|beaver.machine|INFO|2014-01-15 10:03:55,878 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 2014-01-15 10:03:57,283|beaver.machine|INFO|2014-01-15 10:03:57,283 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,669|beaver.machine|INFO|2014-01-15 10:03:57,669 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,720|beaver.machine|INFO|2014-01-15 10:03:57,720 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,997|beaver.machine|INFO| 2014-01-15 10:03:57,997|beaver.machine|INFO|ERROR: Table already exists: tablethree_mod! 2014-01-15 10:03:57,997|beaver.machine|INFO| {code} This was an intermittent issue after using Snapshots, a table is not properly dropped / and not able to properly re-create with the same name. And a HRegion is empty or null Error occurs. (When you try to drop the table it says it does not exist, and when you try to create the table it says that it does already exist). 
{code} 2014-01-15 10:04:02,462|beaver.machine|INFO|ERROR: HRegionInfo was null or empty in hbase:meta, row=keyvalues= {tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:seqnumDuringOpen/1389778905355/Put/vlen=8/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:server/1389778905355/Put/vlen=32/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:serverstartcode/1389778905355/Put/vlen=8/mvcc=0} {code} Thanks to Huned who discovered this issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6873) Clean up Coprocessor loading failure handling
[ https://issues.apache.org/jira/browse/HBASE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873978#comment-13873978 ] Hudson commented on HBASE-6873: --- SUCCESS: Integrated in HBase-0.98 #86 (See [https://builds.apache.org/job/HBase-0.98/86/]) HBASE-6873. Clean up Coprocessor loading failure handling (apurtell: rev 1558870) * /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/constraint/TestConstraint.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java * /hbase/branches/0.98/hbase-shell/src/test/java/org/apache/hadoop/hbase/client/TestShell.java Clean up Coprocessor 
loading failure handling - Key: HBASE-6873 URL: https://issues.apache.org/jira/browse/HBASE-6873 Project: HBase Issue Type: Sub-task Components: Coprocessors, regionserver Affects Versions: 0.98.0 Reporter: David Arthur Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch When registering a coprocessor with a missing dependency, the regionserver gets stuck in an infinite fail loop. Restarting the regionserver and/or master has no effect. E.g., load a coprocessor from my-coproc.jar that uses an external dependency (kafka) that is not included with HBase. {code} 12/09/24 13:13:15 INFO handler.OpenRegionHandler: Opening of region {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} failed, marking as FAILED_OPEN in ZK 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 INFO regionserver.HRegionServer: Received request to open region: documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b. 
12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG regionserver.HRegion: Opening region: {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} 12/09/24 13:13:15 INFO regionserver.HRegion: Setting up tabledescriptor config now ... 12/09/24 13:13:15 INFO coprocessor.CoprocessorHost: Class com.mycompany.hbase.documents.DocumentObserverCoprocessor needs to be loaded from a file - file:/path/to/my-coproc.jar. 12/09/24 13:13:16 ERROR handler.OpenRegionHandler: Failed open of region=documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b., starting to roll back the global memstore size. java.lang.IllegalStateException: Could not instantiate a region instance. at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3595) at
[jira] [Commented] (HBASE-6873) Clean up Coprocessor loading failure handling
[ https://issues.apache.org/jira/browse/HBASE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873982#comment-13873982 ] Hudson commented on HBASE-6873: --- SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #78 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/78/]) HBASE-6873. Clean up Coprocessor loading failure handling (apurtell: rev 1558870) * /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/constraint/TestConstraint.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java * 
/hbase/branches/0.98/hbase-shell/src/test/java/org/apache/hadoop/hbase/client/TestShell.java Clean up Coprocessor loading failure handling - Key: HBASE-6873 URL: https://issues.apache.org/jira/browse/HBASE-6873 Project: HBase Issue Type: Sub-task Components: Coprocessors, regionserver Affects Versions: 0.98.0 Reporter: David Arthur Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch When registering a coprocessor with a missing dependency, the regionserver gets stuck in an infinite fail loop. Restarting the regionserver and/or master has no effect. E.g., load a coprocessor from my-coproc.jar that uses an external dependency (kafka) that is not included with HBase. {code} 12/09/24 13:13:15 INFO handler.OpenRegionHandler: Opening of region {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} failed, marking as FAILED_OPEN in ZK 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 INFO regionserver.HRegionServer: Received request to open region: documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b. 
12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG regionserver.HRegion: Opening region: {NAME = 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY = '', ENDKEY = '', ENCODED = 6d1e1b7bb93486f096173bd401e8ef6b,} 12/09/24 13:13:15 INFO regionserver.HRegion: Setting up tabledescriptor config now ... 12/09/24 13:13:15 INFO coprocessor.CoprocessorHost: Class com.mycompany.hbase.documents.DocumentObserverCoprocessor needs to be loaded from a file - file:/path/to/my-coproc.jar. 12/09/24 13:13:16 ERROR handler.OpenRegionHandler: Failed open of region=documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b., starting to roll back the global memstore size. java.lang.IllegalStateException: Could not instantiate a region instance. at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3595) at
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873989#comment-13873989 ] Enis Soztutar commented on HBASE-9721: -- Ok, let's do the revert. I'll inspect the test cases to find out why they started failing sporadically. RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, the following events happened with ITBLL and CM, leading to meta being unavailable until the master was restarted. An RS carrying meta died, and the master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned to also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it 
from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 
2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, expected state=OFFLINE/SPLITTING/MERGING 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} to {1588230740 state=OFFLINE, ts=1380843008791, server=null} 2013-10-03 23:30:09,809 INFO
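The fix direction named in the issue title can be sketched with toy classes (these are illustrative stand-ins, not HBase's actual ServerName or RPC classes): the RS compares the full ServerName (host, port, startcode) on an incoming openRegion RPC, and a startcode mismatch means the request was addressed to the previous, dead incarnation on the same host:port.

```java
import java.util.Objects;

public class OpenRegionCheck {
    // Toy ServerName: host + port + startcode identifies one RS process.
    static final class ServerName {
        final String host; final int port; final long startcode;
        ServerName(String host, int port, long startcode) {
            this.host = host; this.port = port; this.startcode = startcode;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ServerName)) return false;
            ServerName s = (ServerName) o;
            return host.equals(s.host) && port == s.port && startcode == s.startcode;
        }
        @Override public int hashCode() { return Objects.hash(host, port, startcode); }
    }

    // Accept the open only if the RPC's intended target is exactly this
    // process; same host:port with an older startcode means the RPC was
    // meant for the dead predecessor.
    static boolean acceptOpen(ServerName self, ServerName intended) {
        return self.equals(intended);
    }

    public static void main(String[] args) {
        // Startcodes taken from the log above: old vs restarted process.
        ServerName dead = new ServerName("hbase-5", 60020, 1380842900820L);
        ServerName self = new ServerName("hbase-5", 60020, 1380843006362L);
        System.out.println("accept=" + acceptOpen(self, dead));
    }
}
```

With such a check, the restarted RS would have rejected the stale assignment instead of opening meta on behalf of its dead predecessor.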
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873993#comment-13873993 ] Andrew Purtell commented on HBASE-9721: --- Ok, I can do it if you don't. .. ? RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO 
[RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 
2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, expected state=OFFLINE/SPLITTING/MERGING 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} to {1588230740 state=OFFLINE, ts=1380843008791, server=null} 2013-10-03 23:30:09,809 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2]
[jira] [Updated] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10249: --- Attachment: HBASE-10249-0.94-v0.patch Two things I've noticed that I'm fixing in the attached patch for 0.94: - The multi path doesn't check if the znode that we're moving is ours, so we end up deleting our own queue (!!!). - Looking at the link for the latest failure, we do check that in the non-multi path, but when we do it, it takes a few hundred milliseconds. It seems that they all end up counting towards the 10-second limit that we have in order to clear all the queues. I moved the checking of the path before the sleeping in NodeFailoverWorker.run so that we don't waste time on ourselves. Regardless, this code is racy: {noformat} int numberOfOldSource = 1; // default wait once while (numberOfOldSource > 0) { Thread.sleep(SLEEP_TIME); numberOfOldSource = manager.getOldSources().size(); } {noformat} We basically say let's wait 10 seconds and see if we can transfer _all_ the queues during that time. If some queues are still being transferred, and the others we did transfer are already done, they won't count as an oldSource, and so we can miss them. The most extreme case is moving 1 queue with enough znodes that it takes more than 10 seconds to move (I've seen that), in which case the sync tool will stop even though there might be many more queues to transfer. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
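The race in that wait loop can be shown with toy stand-ins (these are not the real ReplicationSourceManager classes): the waiter only observes queues that are currently in flight, so a queue that is adopted and finished between two polls is never counted and the waiter can exit early.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncUpRace {
    // Toy manager: getOldSources() returns queues currently in transfer.
    static class ToyManager {
        final List<String> oldSources = new ArrayList<>();
        List<String> getOldSources() { return oldSources; }
    }

    // Mirrors the racy wait: poll the in-flight count until it hits zero
    // or the poll budget (standing in for the 10-second limit) runs out.
    static int racyWait(ToyManager manager, int budgetPolls) {
        int numberOfOldSource = 1; // default: wait at least once
        while (numberOfOldSource > 0 && budgetPolls-- > 0) {
            // During this "sleep", another worker adopts queue q1 AND
            // finishes it, so the next poll never sees it:
            manager.getOldSources().add("q1");
            manager.getOldSources().remove("q1");
            numberOfOldSource = manager.getOldSources().size();
        }
        return numberOfOldSource;
    }

    public static void main(String[] args) {
        // The waiter concludes all queues are done after one poll, even
        // though q1's transfer happened entirely unobserved.
        System.out.println("observed=" + racyWait(new ToyManager(), 3));
    }
}
```

The same shape explains the extreme case in the comment: one slow transfer can outlive the budget while finished transfers vanish from the count.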
[jira] [Commented] (HBASE-9360) Enable 0.94 - 0.96 replication to minimize upgrade down time
[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874048#comment-13874048 ] Nick Dimiduk commented on HBASE-9360: - Should this ticket be resolved then? Is this code we can put in a contrib directory, similar to our dev-support directory, or are we happy with it in an external repo? Enable 0.94 - 0.96 replication to minimize upgrade down time - Key: HBASE-9360 URL: https://issues.apache.org/jira/browse/HBASE-9360 Project: HBase Issue Type: Brainstorming Components: migration Affects Versions: 0.98.0, 0.96.0 Reporter: Jeffrey Zhong As we know 0.96 is a singularity release, so as of today a 0.94 hbase user has to do an in-place upgrade: make corresponding client changes, recompile client application code, fully shut down the existing 0.94 hbase cluster, deploy the 0.96 binary, run the upgrade script and then start the upgraded cluster. You can imagine the down time being extended if something goes wrong in between. To minimize the down time, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch the traffic to the 0.96 cluster and decommission the old one. The ideal steps would be: 1) Set up a 0.96 cluster 2) Set up replication between a running 0.94 cluster and the newly created 0.96 cluster 3) Wait till they're in sync in replication 4) Start duplicated writes to both the 0.94 and 0.96 clusters (could stop replication now) 5) Forward read traffic to the slave 0.96 cluster 6) After a certain period, stop writes to the original 0.94 cluster if everything is good and complete the upgrade To get us there, there are two tasks: 1) Enable replication from 0.94 to 0.96 I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. 
Currently it seems the best approach is to build a very similar service, or build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, with support for three commands: replicateLogEntries, multi and delete. Inside the three commands, we just pass the corresponding requests down to the destination 0.96 cluster as a bridge. The reason to support multi and delete is so CopyTable can copy data from a 0.94 cluster to a 0.96 one. The other approach is to provide limited support for the 0.94 RPC protocol in 0.96. An issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server. Therefore, we would need a fake ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code. 2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time, we need to load both hbase clients into one single JVM using different class loaders. Let me know if you think this is worth doing and of any better approach we could take. Thanks! -- This message was sent by Atlassian JIRA (v6.1.5#6160)
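Task (2) above — two HBase client versions in one JVM — is classically done with a child-first (parent-last) classloader so each client's classes resolve from its own jar before the application classpath. A minimal sketch, not a production implementation; real use would point each loader at a different client jar:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Child-first classloader: tries its own URLs before delegating to the
// parent, so two instances can hold two incompatible client versions.
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name);              // try our own jars first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, false); // then delegate upward
                }
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }
}
// Usage sketch: one loader per client version, each seeing only its jar:
//   ClassLoader cl94 = new ChildFirstClassLoader(new URL[]{ hbase094Jar }, parent);
//   ClassLoader cl96 = new ChildFirstClassLoader(new URL[]{ hbase096Jar }, parent);
```

Core classes (java.*) still come from the bootstrap loader via the delegation fallback, so only the conflicting client classes are isolated.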
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874049#comment-13874049 ] Enis Soztutar commented on HBASE-9721: -- Reverted both on trunk and 0.98. RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO 
[RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 
2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, expected state=OFFLINE/SPLITTING/MERGING 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} to {1588230740 state=OFFLINE, ts=1380843008791, server=null} 2013-10-03 23:30:09,809 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2]
[jira] [Resolved] (HBASE-10341) TestAssignmentManagerOnCluster fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10341. Resolution: Fixed Fix Version/s: (was: 0.99.0) (was: 0.98.0) Assignee: (was: Andrew Purtell) Resolved by revert of HBASE-9721 TestAssignmentManagerOnCluster fails occasionally - Key: HBASE-10341 URL: https://issues.apache.org/jira/browse/HBASE-10341 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Attachments: good-test.log.gz, test.log.gz TestAssignmentManagerOnCluster has recently started failing occasionally in 0.98 branch unit test runs. No failure trace available yet. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10249: --- Attachment: HBASE-10249-0.94-v1.patch Actually the last patch's new method wasn't named correctly; the new patch includes this cosmetic change. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874070#comment-13874070 ] Hadoop QA commented on HBASE-10364: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623468/10364-v2.txt against trunk revision . ATTACHMENT ID: 12623468 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8448//console This message is automatically generated. 
Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but it wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
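The shape of the change this issue asks for is small: parse an option and write it into the client Configuration under the key named in the exception. A hedged sketch with toy argument parsing; the actual option name in the committed patch may differ, and a plain Map stands in for Hadoop's Configuration to stay self-contained:

```java
import java.util.HashMap;
import java.util.Map;

public class ZkRootOption {
    // Parse a hypothetical "-zk_root <path>" flag into the config map,
    // defaulting to the standard /hbase parent znode.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("zookeeper.znode.parent", "/hbase"); // HBase default
        for (int i = 0; i < args.length - 1; i++) {
            if ("-zk_root".equals(args[i])) {
                conf.put("zookeeper.znode.parent", args[i + 1]);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf =
            parseArgs(new String[]{"-zk_root", "/services/hbase"});
        System.out.println(conf.get("zookeeper.znode.parent"));
    }
}
```

With the key set, the client looks under the Hoya-managed parent path instead of /hbase, avoiding the MasterNotRunningException in the description.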
[jira] [Created] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
Jeffrey Zhong created HBASE-10366: - Summary: 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. Code written for 0.94 or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
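The skip can be demonstrated with self-contained toy classes (these are not HBase's Filter classes; the reflection-based hasFilterRow here just mimics the contract the description attributes to HBASE-6429): a 0.94-style filter overriding only the no-arg filterRow() never advertises itself as a row filter, so the scanner never calls it.

```java
import java.util.List;

public class FilterCompat {
    static abstract class FilterBase {
        public boolean filterRow() { return false; }  // old 0.94-era hook
        public void filterRow(List<String> kvs) { }   // newer List-taking hook

        // Mimics the problematic expectation: report a row filter only
        // when the List-taking overload is overridden in a subclass.
        public boolean hasFilterRow() {
            try {
                return getClass().getMethod("filterRow", List.class)
                        .getDeclaringClass() != FilterBase.class;
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
    }

    // A 0.94-style filter: overrides only the no-arg filterRow().
    static class LegacyRowFilter extends FilterBase {
        @Override public boolean filterRow() { return true; } // drop every row
    }

    // Scanner core consults hasFilterRow() before calling filterRow(),
    // so the legacy filter's row filtering is silently skipped.
    static boolean rowIsFiltered(FilterBase f) {
        return f.hasFilterRow() && f.filterRow();
    }

    public static void main(String[] args) {
        // Despite filterRow() returning true, the row is NOT filtered.
        System.out.println("row dropped? " + rowIsFiltered(new LegacyRowFilter()));
    }
}
```

The rows the legacy filter intended to drop leak through, which is exactly the "scans return unexpected KeyValues" symptom in the description.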
[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874087#comment-13874087 ] Ted Yu commented on HBASE-10364: Integrated to 0.98 and trunk. Thanks for the review, Andy. Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following running Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by themaster. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from correct zookeeper quorum but it wasn't able to find parent znode. An option should be added to LoadTestTool so that user can specify parent znode in zookeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10366: -- Status: Patch Available (was: Open) 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. Code written for 0.94 or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10366: -- Attachment: hbase-10366.patch 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. Code written for 0.94 or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10017) HRegionPartitioner, rows directed to last partition are wrongly mapped.
[ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874103#comment-13874103 ] Nick Dimiduk commented on HBASE-10017: -- The wind has fallen from the sails on this issue. Can the reported data loss be confirmed and corrected (my attempts were unsuccessful)? If so, let's bump to blocker and get it fixed for 0.98. HRegionPartitioner, rows directed to last partition are wrongly mapped. --- Key: HBASE-10017 URL: https://issues.apache.org/jira/browse/HBASE-10017 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.6 Reporter: Roman Nikitchenko Priority: Critical Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, TEST-org.apache.hadoop.hbase.mapreduce.IntegrationTestBulkLoad.xml.gz, TestHRegionServerBulkLoad-more-splits.txt, TestHRegionServerBulkLoad-more-splits.txt, patchSiteOutput.txt Inside the HRegionPartitioner class there is a getPartition() method which should map the first numPartitions regions to partitions 1:1. But based on its condition, the last region is hashed, which can leave the last reducer without any data. This is considered a serious issue. I reproduced this only starting from 16 regions per table. The original defect was found in 0.94.6, but at least today's trunk and the 0.91 branch head have the same HRegionPartitioner code in this part, which means the same issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
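The off-by-one described in the report can be illustrated numerically. This is a hedged reconstruction, not the actual HBase source: with numRegions equal to numPartitions, a buggy comparison hashes the last region into partitions 0..n-2, so the last reducer may receive no data at all, while the corrected comparison keeps the 1:1 mapping for every in-range region.

```java
public class PartitionDemo {
    // Illustrative buggy version: the last eligible region falls into
    // the hash branch instead of mapping 1:1.
    static int buggyPartition(int region, int numPartitions) {
        if (region < numPartitions - 1) {
            return region;                                  // 1:1, but excludes the last region
        }
        return (region & Integer.MAX_VALUE) % (numPartitions - 1); // last region hashed
    }

    // Corrected version: 1:1 for all in-range regions; only overflow
    // regions (region >= numPartitions) get hashed.
    static int fixedPartition(int region, int numPartitions) {
        if (region < numPartitions) {
            return region;
        }
        return (region & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 16; // the report says the issue reproduces from 16 regions
        System.out.println("buggy last -> " + buggyPartition(15, n));
        System.out.println("fixed last -> " + fixedPartition(15, n));
    }
}
```

Under the buggy mapping no region ever lands in partition 15, matching the "last reducer without any data" symptom.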
[jira] [Updated] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated HBASE-10366: - Tags: Phoenix 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. Code written for 0.94 or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10367) RegionServer graceful stop / decommissioning
Enis Soztutar created HBASE-10367: - Summary: RegionServer graceful stop / decommissioning Key: HBASE-10367 URL: https://issues.apache.org/jira/browse/HBASE-10367 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Right now, we have a weird way of node decommissioning / graceful stop, which is a graceful_stop.sh bash script, and a region_mover ruby script, and some draining server support which you have to manually write to a znode (really!). Also, draining servers are only partially supported in LB operations (the LB does take them into account for round-robin assignment, but not for normal balance). See http://hbase.apache.org/book/node.management.html and HBASE-3071 I think we should support graceful stop as a first-class citizen. Thinking about it, it seems that the difference between regionserver stop and graceful stop is that regionserver stop will close the regions, but the master will only assign them after the znode is deleted. In the new master design (or even before), if we allow the RS to close regions on its own (without the master initiating it), then graceful stop becomes regular stop. The RS already closes the regions cleanly, and will reject new region assignments, so that we don't need much of the balancer or draining server trickery. This ties into the new master/AM redesign (HBASE-5487), but still deserves its own jira. Let's use this to brainstorm on the design. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
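The proposal's core idea — graceful stop collapses into a regular stop once the RS can refuse new assignments and close its own regions — can be sketched with toy classes (names here are illustrative, not HBase's):

```java
import java.util.ArrayList;
import java.util.List;

public class DrainingRegionServer {
    private final List<String> regions = new ArrayList<>();
    private volatile boolean draining = false;

    // Normal assignment path; refused once the RS is draining.
    public void open(String region) {
        if (draining) {
            throw new IllegalStateException("draining: rejecting " + region);
        }
        regions.add(region);
    }

    // Graceful stop: flip to draining, then close regions one by one.
    // Each closed region can be reassigned by the master immediately,
    // rather than only after the RS znode is deleted.
    public List<String> gracefulStop() {
        draining = true;
        List<String> closed = new ArrayList<>(regions);
        regions.clear();
        return closed;
    }

    public static void main(String[] args) {
        DrainingRegionServer rs = new DrainingRegionServer();
        rs.open("region-a");
        rs.open("region-b");
        System.out.println("closed=" + rs.gracefulStop().size());
    }
}
```

With this shape, the external region_mover script and the manually written draining znode both become unnecessary: the drain state lives in the RS itself.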
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874124#comment-13874124 ] Jonathan Hsieh commented on HBASE-10249: Is the patch relevant and does it apply to trunk/0.98/0.96 as well? Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874124#comment-13874124 ] Jonathan Hsieh edited comment on HBASE-10249 at 1/16/14 11:19 PM: -- Is the patch relevant to, and does it apply to, trunk/0.98/0.96 as well? was (Author: jmhsieh): Is the patch relevant and does it apply to trunk/0.98/0.96 as well? Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10364: --- Resolution: Fixed Status: Resolved (was: Patch Available) Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but it wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
Enis Soztutar created HBASE-10368: - Summary: Add Mutation.setWriteToWAL() back to 0.98 Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile with 0.94.7- and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated mode. [~daijy] says that it may still be important to be able to compile with 0.94.7-, which I kind of agree. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
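The re-add described above can be sketched as a deprecated delegate onto the newer Durability API. This is a simplified stand-in, not the actual HBASE-10368 patch: the class and enum below only illustrate the pattern, and the SKIP_WAL mapping is an assumption about how a boolean flag would translate.

```java
// Simplified stand-in for org.apache.hadoop.hbase.client.Mutation, showing
// how a removed method can be re-added as a deprecated delegate so that
// pre-0.94.7 client code still compiles against 0.98.
class Mutation {
    enum Durability { USE_DEFAULT, SKIP_WAL, SYNC_WAL }

    private Durability durability = Durability.USE_DEFAULT;

    // The replacement API available since 0.94.7.
    public void setDurability(Durability d) { this.durability = d; }
    public Durability getDurability() { return durability; }

    /**
     * Re-added for source compatibility with older clients.
     * @deprecated Use {@link #setDurability(Durability)} instead.
     */
    @Deprecated
    public void setWriteToWAL(boolean write) {
        // Assumed mapping: false means skip the WAL, true falls back to default.
        setDurability(write ? Durability.USE_DEFAULT : Durability.SKIP_WAL);
    }

    /** @deprecated Use {@link #getDurability()} instead. */
    @Deprecated
    public boolean getWriteToWAL() {
        return durability != Durability.SKIP_WAL;
    }
}
```

With a shim like this, Pig and other tools can keep calling setWriteToWAL() without shims or reflection, while new code migrates to setDurability().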
[jira] [Updated] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
[ https://issues.apache.org/jira/browse/HBASE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10368: -- Attachment: hbase-10368_v1.patch Attaching simple patch. Add Mutation.setWriteToWAL() back to 0.98 - Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-10368_v1.patch This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile with 0.94.7- and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated mode. [~daijy] says that it may still be important to be able to compile with 0.94.7-, which I kind of agree. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
[ https://issues.apache.org/jira/browse/HBASE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10368: -- Status: Patch Available (was: Open) Add Mutation.setWriteToWAL() back to 0.98 - Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-10368_v1.patch This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile with 0.94.7- and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated mode. [~daijy] says that it may still be important to be able to compile with 0.94.7-, which I kind of agree. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874152#comment-13874152 ] Jean-Daniel Cryans commented on HBASE-10249: In trunk, 0.98, and 0.96 the check is in place but it's done after we sleep so hitting the race is what makes it fail. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
Ted Yu created HBASE-10369: -- Summary: LabelExpander#createLabels() should close scanner in finally clause Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws exception, scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
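The leak described above is the classic close-outside-finally bug, and the fix is the standard close-in-finally pattern. A minimal self-contained sketch, using a hypothetical Scanner interface rather than HBase's actual ResultScanner:

```java
import java.util.ArrayList;
import java.util.List;

class ScannerDemo {
    // Stand-in for an HBase ResultScanner: next() may throw mid-scan,
    // close() must run regardless.
    interface Scanner {
        String next() throws Exception;
        void close();
    }

    static List<String> drain(Scanner scanner) throws Exception {
        List<String> rows = new ArrayList<>();
        try {
            String row;
            while ((row = scanner.next()) != null) {
                rows.add(row);
            }
        } finally {
            // Moved out of the loop body: even if next() throws,
            // the scanner is still released.
            scanner.close();
        }
        return rows;
    }
}
```

In the original code the scanner.close() call sits after the loop but before the finally block, so any exception from scanner.next() skips it; hoisting the close into finally (or using try-with-resources on Java 7+) removes the leak.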
[jira] [Commented] (HBASE-10367) RegionServer graceful stop / decommissioning
[ https://issues.apache.org/jira/browse/HBASE-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874165#comment-13874165 ] Nick Dimiduk commented on HBASE-10367: -- The usability of graceful stop just came up in a conversation I had today as well. Thanks for bringing it up, @enis! RegionServer graceful stop / decommissioning Key: HBASE-10367 URL: https://issues.apache.org/jira/browse/HBASE-10367 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Right now, we have a weird way of node decommissioning / graceful stop, which is a graceful_stop.sh bash script, and a region_mover ruby script, and some draining server support which you have to manually write to a znode (really!). Also, draining servers are only partially supported in LB operations (the LB does take that into account for roundRobin assignment, but not for normal balance). See http://hbase.apache.org/book/node.management.html and HBASE-3071 I think we should support graceful stop as a first class citizen. Thinking about it, it seems that the difference between regionserver stop and graceful stop is that regionserver stop will close the regions, but the master will only assign them after the znode is deleted. In the new master design (or even before), if we allow the RS to close regions on its own (without the master initiating it), then graceful stop becomes regular stop. The RS already closes the regions cleanly, and will reject new region assignments, so we don't need much of the balancer or draining server trickery. This ties into the new master/AM redesign (HBASE-5487), but still deserves its own jira. Let's use this to brainstorm on the design. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10369: --- Attachment: 10369-v1.txt LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws exception, scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-10369: -- Assignee: Ted Yu LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws exception, scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10369: --- Status: Patch Available (was: Open) LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws exception, scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10249: --- Hadoop Flags: (was: Reviewed) Status: Patch Available (was: Reopened) Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10249: --- Attachment: HBASE-10249-trunk-v1.patch Patch for trunk, kind of the same thing. While doing it I also saw that I missed something: isThisOurZnode in the v1 0.94 patch shouldn't talk about parents (not a functional change though). Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
[ https://issues.apache.org/jira/browse/HBASE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874179#comment-13874179 ] Andrew Purtell commented on HBASE-10368: +1, same reason as for HBASE-10339 Add Mutation.setWriteToWAL() back to 0.98 - Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-10368_v1.patch This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile with 0.94.7- and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated mode. [~daijy] says that it may still be important to be able to compile with 0.94.7-, which I kind of agree. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874193#comment-13874193 ] Andrew Purtell commented on HBASE-10363: [~yuzhih...@gmail.com] says this shows up here also: https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/1/testReport/junit/org.apache.hadoop.hbase.mapreduce.hadoopbackport/TestInputSampler/testSplitSampler/ [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Priority: Critical Fix For: 0.94.16 From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE! Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874196#comment-13874196 ] Andrew Purtell commented on HBASE-9721: --- Thanks Enis. RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] 
master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 
2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, expected state=OFFLINE/SPLITTING/MERGING 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} to {1588230740 state=OFFLINE, ts=1380843008791, server=null} 2013-10-03 23:30:09,809 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] zookeeper.ZooKeeperNodeTracker:
[jira] [Commented] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874199#comment-13874199 ] Andrew Purtell commented on HBASE-10369: +1 for trunk and 0.98 LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws exception, scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874202#comment-13874202 ] Andrew Purtell commented on HBASE-10249: I skimmed the trunk patch, +1 for 0.98 Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874203#comment-13874203 ] Hadoop QA commented on HBASE-10366: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623509/hbase-10366.patch against trunk revision . ATTACHMENT ID: 12623509 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8449//console This message is automatically generated. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both filterRow & filterRow(List<KeyValue> kvs) functions in Filter.
With 0.94 code or older, a filter may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not the plain filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue will cause scans to return unexpected keyvalues and break backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
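The hazard described above can be illustrated with simplified stand-ins (these are not the real org.apache.hadoop.hbase.filter classes, just a sketch of the gating logic): the scanner only consults filterRow() when hasFilterRow() reports true, so a 0.94-era filter that overrides filterRow() but inherits the base hasFilterRow() is silently bypassed.

```java
class FilterCompatDemo {
    // Simplified stand-in for FilterBase: the base default says
    // "I have no row filtering", which is wrong for old subclasses.
    static class Filter {
        public boolean hasFilterRow() { return false; }
        public boolean filterRow() { return false; } // false = keep row
    }

    // A 0.94-style filter: overrides only filterRow().
    static class OldRowFilter extends Filter {
        @Override
        public boolean filterRow() { return true; } // wants to drop every row
    }

    // Sketch of the post-HBASE-6429 scanner logic: filterRow() is gated
    // on hasFilterRow(), so OldRowFilter's override is never consulted.
    static boolean rowIsKept(Filter f) {
        if (f.hasFilterRow() && f.filterRow()) {
            return false; // row filtered out
        }
        return true; // row kept, possibly incorrectly
    }
}
```

Under this sketch, OldRowFilter's rows survive the scan even though its filterRow() says to drop them, which is exactly the 0.94-to-0.96 behavior change the issue reports; overriding hasFilterRow() to return true restores the intended filtering.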
[jira] [Updated] (HBASE-10333) Assignments are not retained on a cluster start
[ https://issues.apache.org/jira/browse/HBASE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10333: Resolution: Fixed Fix Version/s: 0.99.0 0.96.2 0.98.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk, 0.98, and 0.96. Thanks. Assignments are not retained on a cluster start --- Key: HBASE-10333 URL: https://issues.apache.org/jira/browse/HBASE-10333 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Devaraj Das Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10333.patch When a cluster is fully shutdown and then started up again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. Upon digging, it seems like HBASE-10101 made a change due to which the server holding the META previously is added to dead-servers (in _HMaster.assignMeta_). Later on, this makes the AssignmentManager think that the master recovered from a failure as opposed to a fresh cluster start (the ServerManager.deadServers list is not empty in the check within _AssignmentManager.processDeadServersAndRegionsInTransition_) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874220#comment-13874220 ] Hudson commented on HBASE-9721: --- SUCCESS: Integrated in HBase-0.98 #87 (See [https://builds.apache.org/job/HBase-0.98/87/]) HBASE-9721 RegionServer should not accept regionOpen RPC intended for another(previous) server -- REVERT (enis: rev 1558937) * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java * /hbase/branches/0.98/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/branches/0.98/hbase-protocol/src/main/protobuf/Admin.proto * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerNoMaster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java RegionServer should 
not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 
23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering
[jira] [Updated] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10156: -- Attachment: 10156v16.txt Forgot a divide by 1M. Fixed TestLogRolling (a refactor of a method dropped our ability to roll over a failed close of the WAL when so configured). Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v10.txt, 10156v11.txt, 10156v12.txt, 10156v12.txt, 10156v13.txt, 10156v16.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, 10156v5.txt, 10156v6.txt, 10156v7.txt, 10156v9.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10156: -- Attachment: 10156v17.txt Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v10.txt, 10156v11.txt, 10156v12.txt, 10156v12.txt, 10156v13.txt, 10156v16.txt, 10156v17.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, 10156v5.txt, 10156v6.txt, 10156v7.txt, 10156v9.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10349) Table became unusable when master balanced its region after table was dropped
[ https://issues.apache.org/jira/browse/HBASE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10349: Attachment: hbase-10349.patch Table became unusable when master balanced its region after table was dropped - Key: HBASE-10349 URL: https://issues.apache.org/jira/browse/HBASE-10349 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Ted Yu Assignee: Jimmy Xiang Fix For: 0.98.0, 0.99.0 Attachments: 10349-hadoop-hdfs-namenode-hor11n14.gq1.ygridcore.net.zip, 10349-output.log, 10349-v1.txt, 10349-v2.txt, HBASE-10349-meta-test-and-debug.patch, hbase-10349.patch, hbase-hbase-master-hor15n05.gq1.ygridcore.net.log.tar.gz 0.98 was used. This was the sequence of events: create 'tablethree_mod' snapshot 'tablethree_mod', 'snapshot_tablethree_mod' disable 'tablethree_mod' 2014-01-15 09:34:51,749 restore_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:35:07,210 enable 'tablethree_mod' 2014-01-15 09:35:46,134 delete_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:41:42,210 disable 'tablethree_mod' 2014-01-15 09:41:43,610 drop 'tablethree_mod' create 'tablethree_mod' For the last table creation request: {code} 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'create 'tablethree_mod', {NAME => 'f1', VERSIONS => 3} , {NAME => 'f2', VERSIONS => 3} , {NAME => 'f3', VERSIONS => 3} ' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'exists 'tablethree_mod'' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '0', 'f1:q1', 'value-0', 10' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '1', 'f1:q1', 'value-1', 20' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '2', 'f2:q2', 'value-2', 30' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '3', 'f3:q3', 'value-3', 40' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '4', 'f3:q3', 'value-4', 50'
2014-01-15 10:03:53,000|beaver.component.hbase|INFO|Done writing commands to file. Will execute them now. 2014-01-15 10:03:53,000|beaver.machine|INFO|RUNNING: /usr/lib/hbase/bin/hbase shell /grid/0/tmp/hwqe/artifacts/tmp-471142 2014-01-15 10:03:55,878|beaver.machine|INFO|2014-01-15 10:03:55,878 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 2014-01-15 10:03:57,283|beaver.machine|INFO|2014-01-15 10:03:57,283 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,669|beaver.machine|INFO|2014-01-15 10:03:57,669 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,720|beaver.machine|INFO|2014-01-15 10:03:57,720 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,997|beaver.machine|INFO| 2014-01-15 10:03:57,997|beaver.machine|INFO|ERROR: Table already exists: tablethree_mod! 2014-01-15 10:03:57,997|beaver.machine|INFO| {code} This was an intermittent issue: after using snapshots, a table is not properly dropped and cannot be properly re-created with the same name, and an 'HRegionInfo was null or empty' error occurs. (When you try to drop the table it says it does not exist, and when you try to create the table it says it already exists.) {code} 2014-01-15 10:04:02,462|beaver.machine|INFO|ERROR: HRegionInfo was null or empty in hbase:meta, row=keyvalues= {tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:seqnumDuringOpen/1389778905355/Put/vlen=8/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:server/1389778905355/Put/vlen=32/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:serverstartcode/1389778905355/Put/vlen=8/mvcc=0} {code} Thanks to Huned who discovered this issue.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time
[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874235#comment-13874235 ] stack commented on HBASE-9360: -- Should at least doc its existence in refguide? Or play a loud trumpet so a bunch get to hear about it (Or post on user list?) Enable 0.94 -> 0.96 replication to minimize upgrade down time - Key: HBASE-9360 URL: https://issues.apache.org/jira/browse/HBASE-9360 Project: HBase Issue Type: Brainstorming Components: migration Affects Versions: 0.98.0, 0.96.0 Reporter: Jeffrey Zhong As we know 0.96 is a singularity release, as of today a 0.94 hbase user has to do an in-place upgrade: make corresponding client changes, recompile client application code, fully shut down the existing 0.94 hbase cluster, deploy the 0.96 binary, run the upgrade script and then start the upgraded cluster. You can imagine the down time being extended if something goes wrong in between. To minimize the down time, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch the traffic to the 0.96 cluster and decommission the old one. The ideal steps will be: 1) Set up a 0.96 cluster 2) Set up replication from a running 0.94 cluster to the newly created 0.96 cluster 3) Wait till they're in sync in replication 4) Start duplicated writes to both 0.94 and 0.96 clusters (could stop replication now) 5) Forward read traffic to the slave 0.96 cluster 6) After a certain period, stop writes to the original 0.94 cluster if everything is good and complete the upgrade To get us there, there are two tasks: 1) Enable replication from 0.94 -> 0.96 I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. 
Currently it seems the best approach is to build a very similar service, or build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, with support for three commands: replicateLogEntries, multi and delete. Inside the three commands, we just pass the corresponding requests down to the destination 0.96 cluster as a bridge. The reason to support multi and delete is so CopyTable can copy data from a 0.94 cluster to a 0.96 one. The other approach is to provide limited support for the 0.94 RPC protocol in 0.96. An issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server, so we would need a fake ZooKeeper setup in front of a 0.96 cluster for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code. 2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time, we need to load both hbase clients into one single JVM using different class loaders. Let me know if you think this is worth doing and of any better approach we could take. Thanks! -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10349) Table became unusable when master balanced its region after table was dropped
[ https://issues.apache.org/jira/browse/HBASE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10349: Status: Patch Available (was: Open) [~yuzhih...@gmail.com], [~mbertozzi], could you please take a look at the patch? Whenever a table is deleted, we remove all its regions from the region states, so we won't assign any of them by mistake any more (either from the balancer, or a region move from the shell). Table became unusable when master balanced its region after table was dropped - Key: HBASE-10349 URL: https://issues.apache.org/jira/browse/HBASE-10349 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Ted Yu Assignee: Jimmy Xiang Fix For: 0.98.0, 0.99.0 Attachments: 10349-hadoop-hdfs-namenode-hor11n14.gq1.ygridcore.net.zip, 10349-output.log, 10349-v1.txt, 10349-v2.txt, HBASE-10349-meta-test-and-debug.patch, hbase-10349.patch, hbase-hbase-master-hor15n05.gq1.ygridcore.net.log.tar.gz 0.98 was used. This was the sequence of events: create 'tablethree_mod' snapshot 'tablethree_mod', 'snapshot_tablethree_mod' disable 'tablethree_mod' 2014-01-15 09:34:51,749 restore_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:35:07,210 enable 'tablethree_mod' 2014-01-15 09:35:46,134 delete_snapshot 'snapshot_tablethree_mod' 2014-01-15 09:41:42,210 disable 'tablethree_mod' 2014-01-15 09:41:43,610 drop 'tablethree_mod' create 'tablethree_mod' For the last table creation request: {code} 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'create 'tablethree_mod', {NAME => 'f1', VERSIONS => 3} , {NAME => 'f2', VERSIONS => 3} , {NAME => 'f3', VERSIONS => 3} ' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'exists 'tablethree_mod'' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '0', 'f1:q1', 'value-0', 10' 2014-01-15 10:03:52,999|beaver.component.hbase|INFO| 'put 'tablethree_mod', '1', 'f1:q1', 'value-1', 20' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '2', 'f2:q2', 'value-2', 30' 
2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '3', 'f3:q3', 'value-3', 40' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO| 'put 'tablethree_mod', '4', 'f3:q3', 'value-4', 50' 2014-01-15 10:03:53,000|beaver.component.hbase|INFO|Done writing commands to file. Will execute them now. 2014-01-15 10:03:53,000|beaver.machine|INFO|RUNNING: /usr/lib/hbase/bin/hbase shell /grid/0/tmp/hwqe/artifacts/tmp-471142 2014-01-15 10:03:55,878|beaver.machine|INFO|2014-01-15 10:03:55,878 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 2014-01-15 10:03:57,283|beaver.machine|INFO|2014-01-15 10:03:57,283 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,669|beaver.machine|INFO|2014-01-15 10:03:57,669 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,720|beaver.machine|INFO|2014-01-15 10:03:57,720 WARN [main] conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append; Ignoring. 2014-01-15 10:03:57,997|beaver.machine|INFO| 2014-01-15 10:03:57,997|beaver.machine|INFO|ERROR: Table already exists: tablethree_mod! 2014-01-15 10:03:57,997|beaver.machine|INFO| {code} This was an intermittent issue: after using snapshots, a table is not properly dropped and cannot be properly re-created with the same name, and an 'HRegionInfo was null or empty' error occurs. (When you try to drop the table it says it does not exist, and when you try to create the table it says it already exists.) 
{code} 2014-01-15 10:04:02,462|beaver.machine|INFO|ERROR: HRegionInfo was null or empty in hbase:meta, row=keyvalues= {tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:seqnumDuringOpen/1389778905355/Put/vlen=8/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:server/1389778905355/Put/vlen=32/mvcc=0, tablethree_mod,,1389778226606.afc82d1ceabbaca36a504b83b65fc0c9./info:serverstartcode/1389778905355/Put/vlen=8/mvcc=0} {code} Thanks to Huned who discovered this issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
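The fix described above (when a table is dropped, purge its regions from the master's in-memory region states so the balancer or a shell move can no longer reassign them) can be sketched with stand-in types. RegionStatesSketch below is a toy model, not the real org.apache.hadoop.hbase.master.RegionStates:

```java
import java.util.HashMap;
import java.util.Map;

public class RegionStatesSketch {
    // region name -> assignment state, e.g. "OPEN", "OFFLINE"
    private final Map<String, String> states = new HashMap<>();

    void regionOnline(String region) {
        states.put(region, "OPEN");
    }

    // Called from the delete-table path: forget every region of the table so
    // neither the balancer nor a shell 'move' can reassign a stale region.
    void tableDeleted(String table) {
        // HBase region names start with "<table>,<startkey>,<timestamp>..."
        states.keySet().removeIf(r -> r.startsWith(table + ","));
    }

    boolean isKnown(String region) {
        return states.containsKey(region);
    }

    public static void main(String[] args) {
        RegionStatesSketch rs = new RegionStatesSketch();
        rs.regionOnline("tablethree_mod,,1389778226606.afc82d1ce.");
        rs.regionOnline("othertable,,1389778226607.bbb.");
        rs.tableDeleted("tablethree_mod");
        System.out.println(rs.isKnown("tablethree_mod,,1389778226606.afc82d1ce.")
            + " " + rs.isKnown("othertable,,1389778226607.bbb."));
    }
}
```

After the purge, a lookup for the dropped table's region fails while other tables are untouched, which is exactly the property that prevents the stale hbase:meta row seen in the report.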
[jira] [Updated] (HBASE-10346) Add Documentation for stateless scanner
[ https://issues.apache.org/jira/browse/HBASE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-10346: - Attachment: HBASE-10346.0.patch Patch for stateless scanner documentation. Add Documentation for stateless scanner --- Key: HBASE-10346 URL: https://issues.apache.org/jira/browse/HBASE-10346 Project: HBase Issue Type: Sub-task Components: REST Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10346.0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10346) Add Documentation for stateless scanner
[ https://issues.apache.org/jira/browse/HBASE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-10346: - Status: Patch Available (was: Open) Add Documentation for stateless scanner --- Key: HBASE-10346 URL: https://issues.apache.org/jira/browse/HBASE-10346 Project: HBase Issue Type: Sub-task Components: REST Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10346.0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6873) Clean up Coprocessor loading failure handling
[ https://issues.apache.org/jira/browse/HBASE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874249#comment-13874249 ] Hudson commented on HBASE-6873: --- FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #55 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/55/]) HBASE-6873. Clean up Coprocessor loading failure handling (apurtell: rev 1558869) * /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerCoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/constraint/TestConstraint.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java * /hbase/trunk/hbase-shell/src/test/java/org/apache/hadoop/hbase/client/TestShell.java Clean up Coprocessor loading failure handling - Key: HBASE-6873 URL: 
https://issues.apache.org/jira/browse/HBASE-6873 Project: HBase Issue Type: Sub-task Components: Coprocessors, regionserver Affects Versions: 0.98.0 Reporter: David Arthur Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch, 6873.patch When registering a coprocessor with a missing dependency, the regionserver gets stuck in an infinite fail loop. Restarting the regionserver and/or master has no effect. E.g., load a coprocessor from my-coproc.jar that uses an external dependency (Kafka) that is not included with HBase. {code} 12/09/24 13:13:15 INFO handler.OpenRegionHandler: Opening of region {NAME => 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY => '', ENDKEY => '', ENCODED => 6d1e1b7bb93486f096173bd401e8ef6b,} failed, marking as FAILED_OPEN in ZK 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN 12/09/24 13:13:15 INFO regionserver.HRegionServer: Received request to open region: documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b. 
12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Attempting to transition node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG zookeeper.ZKAssign: regionserver:60020-0x139f43af2a70043 Successfully transitioned node 6d1e1b7bb93486f096173bd401e8ef6b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 12/09/24 13:13:15 DEBUG regionserver.HRegion: Opening region: {NAME => 'documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b.', STARTKEY => '', ENDKEY => '', ENCODED => 6d1e1b7bb93486f096173bd401e8ef6b,} 12/09/24 13:13:15 INFO regionserver.HRegion: Setting up tabledescriptor config now ... 12/09/24 13:13:15 INFO coprocessor.CoprocessorHost: Class com.mycompany.hbase.documents.DocumentObserverCoprocessor needs to be loaded from a file - file:/path/to/my-coproc.jar. 12/09/24 13:13:16 ERROR handler.OpenRegionHandler: Failed open of region=documents,,1348505987177.6d1e1b7bb93486f096173bd401e8ef6b., starting to roll back the global memstore size. java.lang.IllegalStateException: Could not instantiate a region instance. at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3595) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3733) at
[jira] [Commented] (HBASE-10336) Remove deprecated usage of Hadoop HttpServer in InfoServer
[ https://issues.apache.org/jira/browse/HBASE-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874252#comment-13874252 ] stack commented on HBASE-10336: --- I took a quick look. Seems like you are copying into hbase the hadoop httpserver and servlets (and tests) to undo our dependency. You also move around a few classes to put them into places that make more sense now httpserver is in hbase. That right? What are implications of applying this to our trunk Eric? (I did not see adding jetty to our pom. Do we need it or are we transitively including it? Do we need to exclude import of jetty from hadoop now?). Good work. Remove deprecated usage of Hadoop HttpServer in InfoServer -- Key: HBASE-10336 URL: https://issues.apache.org/jira/browse/HBASE-10336 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: Eric Charles Attachments: HBASE-10336-1.patch, HBASE-10336-2.patch, HBASE-10336-3.patch, HBASE-10336-4.patch Recent changes in Hadoop HttpServer give NPE when running on hadoop 3.0.0-SNAPSHOT. This way we use HttpServer is deprecated and will probably be not fixed (see HDFS-5760). We'd better move to the new proposed builder pattern, which means we can no more use inheritance to build our nice InfoServer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874261#comment-13874261 ] Hadoop QA commented on HBASE-10369: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623537/10369-v1.txt against trunk revision . ATTACHMENT ID: 12623537 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 1.3.9) to fail. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8450//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8450//console This message is automatically generated. 
LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is the related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws an exception, the scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
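The fix is the standard try/finally resource pattern: move close() into the finally clause so an exception from next() cannot leak the scanner. A self-contained sketch of the pattern — the Scanner interface here is a stand-in for HBase's ResultScanner, not the real API:

```java
import java.util.ArrayList;
import java.util.List;

public class ScannerCloseSketch {
    // Stand-in for HBase's ResultScanner: next() returns null when exhausted.
    interface Scanner {
        String next() throws Exception;
        void close();
    }

    // Drain the scanner; the finally clause guarantees close() runs even
    // when next() throws mid-iteration.
    static List<String> drain(Scanner scanner) throws Exception {
        List<String> rows = new ArrayList<>();
        try {
            String row;
            while ((row = scanner.next()) != null) {
                rows.add(row);
            }
        } finally {
            scanner.close(); // previously only reached on the success path
        }
        return rows;
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        Scanner failing = new Scanner() {
            public String next() throws Exception { throw new Exception("boom"); }
            public void close() { closed[0] = true; }
        };
        try {
            drain(failing);
        } catch (Exception expected) {
            // expected: we only care that close() still ran
        }
        System.out.println("closed=" + closed[0]);
    }
}
```

Even though next() throws on the first call, the scanner is still closed, which is the guarantee the buggy code above lacked.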
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874267#comment-13874267 ] Hudson commented on HBASE-9721: --- FAILURE: Integrated in HBase-TRUNK #4827 (See [https://builds.apache.org/job/HBase-TRUNK/4827/]) HBASE-9721 RegionServer should not accept regionOpen RPC intended for another(previous) server -- REVERT (enis: rev 1558935) * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java * /hbase/trunk/hbase-protocol/src/main/java/com/google/protobuf/ZeroCopyLiteralByteString.java * /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/hbase-protocol/src/main/protobuf/Admin.proto * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerNoMaster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java RegionServer should not accept regionOpen 
RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, the following events happened with ITBLL and CM, leading to meta being unavailable until the master was restarted. An RS carrying meta died, and the master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta was recently assigned to also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG 
[RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering
[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874277#comment-13874277 ] Enis Soztutar commented on HBASE-10156: --- Amazing work Stack. Should this be a blocker for 0.98 because of the regression? I guess one has to spend some time to really understand the patch though. Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v10.txt, 10156v11.txt, 10156v12.txt, 10156v12.txt, 10156v13.txt, 10156v16.txt, 10156v17.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, 10156v5.txt, 10156v6.txt, 10156v7.txt, 10156v9.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874279#comment-13874279 ] Andrew Purtell commented on HBASE-10156: bq. Should this be a blocker for 0.98 because of the regression? No, it should not be. Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v10.txt, 10156v11.txt, 10156v12.txt, 10156v12.txt, 10156v13.txt, 10156v16.txt, 10156v17.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, 10156v5.txt, 10156v6.txt, 10156v7.txt, 10156v9.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874283#comment-13874283 ] Ted Yu commented on HBASE-10369: Integrated to 0.98 and trunk. Thanks for the review, Andy. LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt Here is the related code: {code} while (true) { Result next = scanner.next(); if (next == null) { break; } byte[] row = next.getRow(); byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER); labels.put(Bytes.toString(value), Bytes.toInt(row)); } scanner.close(); } finally { {code} If scanner.next() throws an exception, the scanner would be left open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874288#comment-13874288 ] Andrew Purtell commented on HBASE-10366: Is this for Hive [~jeffreyz]? 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. Filter code written for 0.94 or older may not implement hasFilterRow() as HBASE-6429 expects, because hasFilterRow() only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, the issue will cause scans to return unexpected KeyValues and break backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
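The compatibility gap can be illustrated with stand-in classes. FilterBase below only models the dispatch contract described in the issue, not the real org.apache.hadoop.hbase.filter.FilterBase: the scan path consults filterRow() only when hasFilterRow() returns true, so a 0.94-era filter that overrides only filterRow() is silently skipped, while one that also overrides hasFilterRow() is honored:

```java
public class FilterRowSketch {
    // Toy model of the Filter dispatch contract after HBASE-6429.
    static class FilterBase {
        public boolean hasFilterRow() { return false; } // default: filterRow() is skipped
        public boolean filterRow() { return false; }
    }

    // 0.94-style filter: overrides filterRow() but not hasFilterRow(),
    // so the framework never invokes its row-exclusion logic.
    static class LegacyFilter extends FilterBase {
        @Override public boolean filterRow() { return true; } // wants every row excluded
    }

    // Fixed filter: also advertises that it implements filterRow().
    static class FixedFilter extends LegacyFilter {
        @Override public boolean hasFilterRow() { return true; }
    }

    // Mimics the 0.96 scan path: consult filterRow() only when advertised.
    static boolean rowExcluded(FilterBase f) {
        return f.hasFilterRow() && f.filterRow();
    }

    public static void main(String[] args) {
        System.out.println("legacy excluded=" + rowExcluded(new LegacyFilter()));
        System.out.println("fixed excluded=" + rowExcluded(new FixedFilter()));
    }
}
```

The legacy filter's exclusion logic never runs even though it is overridden, which is why scans against ported 0.94 filters could return rows the filter meant to drop.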
[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874287#comment-13874287 ] Hudson commented on HBASE-10364: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #79 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/79/]) HBASE-10364 Allow configuration option for parent znode in LoadTestTool (Tedyu: rev 1558940) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following while running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
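The option described above boils down to overriding the zookeeper.znode.parent key (the real HBase configuration key named in the exception message) in the client configuration before the tool connects. A minimal sketch — Properties stands in for org.apache.hadoop.hbase.HBaseConfiguration, and the option wiring is illustrative, not the actual patch:

```java
import java.util.Properties;

public class ZnodeOptionSketch {
    // Real HBase configuration key, quoted in the MasterNotRunningException above.
    static final String ZK_PARENT_KEY = "zookeeper.znode.parent";

    // Apply the parent znode from a command-line option value, falling back
    // to HBase's default /hbase when the option is absent.
    static Properties applyZkRoot(Properties conf, String zkRootArg) {
        conf.setProperty(ZK_PARENT_KEY, zkRootArg != null ? zkRootArg : "/hbase");
        return conf;
    }

    public static void main(String[] args) {
        // E.g. Hoya-managed clusters register under a non-default parent znode.
        Properties conf = applyZkRoot(new Properties(), "/services/hbase");
        System.out.println(conf.getProperty(ZK_PARENT_KEY));
    }
}
```

With the key overridden, the client looks up the master's location under the same parent znode the master actually wrote, avoiding the "node /hbase is not in ZooKeeper" failure.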
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874286#comment-13874286 ] Hudson commented on HBASE-9721: --- SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #79 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/79/]) HBASE-9721 RegionServer should not accept regionOpen RPC intended for another(previous) server -- REVERT (enis: rev 1558937) * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java * /hbase/branches/0.98/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/branches/0.98/hbase-protocol/src/main/protobuf/Admin.proto * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerNoMaster.java * 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch, hbase-9721_v3.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer 
gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering
[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874268#comment-13874268 ] Hudson commented on HBASE-10364: FAILURE: Integrated in HBase-TRUNK #4827 (See [https://builds.apache.org/job/HBase-TRUNK/4827/]) HBASE-10364 Allow configuration option for parent znode in LoadTestTool (Tedyu: rev 1558941) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following while running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but it wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
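One minimal way to wire such an option — a sketch only; the flag name `-zk_root` and the plumbing are assumptions for illustration, not taken from the attached patches — is to copy the flag's value into the same configuration key the exception above complains about, with a plain Map standing in for HBase's Configuration:

```java
// Sketch: copy a hypothetical command-line parent-znode flag into the
// configuration key that the HConnection error message above refers to
// ("zookeeper.znode.parent"). The flag name -zk_root is an assumption;
// a Map stands in for HBase's Configuration object.
import java.util.Map;

class ZkRootOption {
    static Map<String, String> apply(String[] args, Map<String, String> conf) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-zk_root".equals(args[i])) {
                // Override the default parent znode (/hbase) with the
                // value the user supplied on the command line.
                conf.put("zookeeper.znode.parent", args[i + 1]);
            }
        }
        return conf;
    }
}
```

With this in place, a run against a non-default znode would pass something like `-zk_root /hbase-hoya` and the client would look up the master under that parent instead of `/hbase`.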
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874291#comment-13874291 ] Andrew Purtell commented on HBASE-10366: Is there a unit test that covers the old method and behavior we put back here? 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10366: -- Description: HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because the 0.94 (old) hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. was: HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because the 0.94 (old) hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874289#comment-13874289 ] Andrew Purtell commented on HBASE-10366: Never mind, I see the Phoenix tag. Same difference. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874293#comment-13874293 ] Jeffrey Zhong commented on HBASE-10366: --- I manually verified in Phoenix on 0.96. Let me try to add a filter test case to cover this change. Thanks. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because the 0.94 (old) hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874296#comment-13874296 ] Andrew Purtell commented on HBASE-10366: Thanks, +1 for trunk and 0.98 with a unit test. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because the 0.94 (old) hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10366) 0.94 filterRow() may be skipped in 0.96(or onwards) code
[ https://issues.apache.org/jira/browse/HBASE-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874296#comment-13874296 ] Andrew Purtell edited comment on HBASE-10366 at 1/17/14 1:44 AM: - Thanks, +1 for trunk and 0.98 with a unit test. Edit: Actually without one too, but thanks in advance for one. :-) was (Author: apurtell): Thanks, +1 for trunk and 0.98 with a unit test. 0.94 filterRow() may be skipped in 0.96(or onwards) code Key: HBASE-10366 URL: https://issues.apache.org/jira/browse/HBASE-10366 Project: HBase Issue Type: Bug Components: Filters Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.98.0, 0.96.2 Attachments: hbase-10366.patch HBASE-6429 combines both the filterRow() and filterRow(List<KeyValue> kvs) functions in Filter. 0.94 code or older may not implement hasFilterRow as HBASE-6429 expects, because the 0.94 (old) hasFilterRow only returns true when filterRow(List<KeyValue> kvs) is overridden, not filterRow(). Therefore, filterRow() will be skipped. Since we don't ask 0.94 users to update their existing filter code, this issue causes scans to return unexpected KeyValues and breaks backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
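The skip described in this issue can be illustrated with a minimal, self-contained model of the dispatch — a sketch with heavily simplified signatures (String stands in for KeyValue), not the actual HBase Filter API:

```java
// Minimal model of why a 0.94-style filter that overrides only the no-arg
// filterRow() is skipped by the 0.96 scan path, which consults
// hasFilterRow() before calling either filterRow variant.
import java.util.ArrayList;
import java.util.List;

abstract class ModelFilter {
    // 0.96-era default: only returns true when the List-based overload
    // is overridden, which old filters never do.
    public boolean hasFilterRow() { return false; }
    public void filterRow(List<String> kvs) {}   // stands in for List<KeyValue>
    public boolean filterRow() { return false; } // 0.94-style row filter
}

class OldStyleFilter extends ModelFilter {
    // A 0.94 filter overrides only filterRow(); hasFilterRow() stays false.
    @Override public boolean filterRow() { return true; } // exclude every row
}

public class FilterRowSkip {
    // Simplified scan step: row-level filtering only runs when
    // hasFilterRow() reports that it is in use.
    static boolean rowIsExcluded(ModelFilter f, List<String> kvs) {
        if (f.hasFilterRow()) {
            f.filterRow(kvs);
            return f.filterRow();
        }
        return false; // the old filter is silently skipped -- the bug
    }

    public static void main(String[] args) {
        // The old filter wants to exclude every row, but the row survives:
        System.out.println(rowIsExcluded(new OldStyleFilter(), new ArrayList<>()));
    }
}
```

In this model the fix amounts to making hasFilterRow() also return true when the no-arg filterRow() is overridden, so that old filters keep being consulted.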
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874313#comment-13874313 ] Hadoop QA commented on HBASE-10249: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623538/HBASE-10249-trunk-v1.patch against trunk revision . ATTACHMENT ID: 12623538 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//console This message is automatically generated. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10347) HRegionInfo changes for adding replicaId and MetaEditor/MetaReader changes for region replicas
[ https://issues.apache.org/jira/browse/HBASE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-10347: Assignee: Devaraj Das This is the RB request https://reviews.apache.org/r/17018/ HRegionInfo changes for adding replicaId and MetaEditor/MetaReader changes for region replicas -- Key: HBASE-10347 URL: https://issues.apache.org/jira/browse/HBASE-10347 Project: HBase Issue Type: Sub-task Components: Region Assignment Reporter: Enis Soztutar Assignee: Devaraj Das Fix For: 0.99.0 As per the parent jira, the cleanest way to add region replicas, we think, is to actually create one more region per replica per primary region. So for example, if a table has 10 regions with replication = 3, the table would indeed be created with 30 regions. These regions will be handled and assigned individually for AM purposes. We can add a replicaId to HRegionInfo, and use this to differentiate different replicas of the same region. So, the primary replica would have replicaId = 0, and the others will have replicaId > 0. These replicas will share the same regionId prefix, but differ in an appended replicaId. The primary will not contain the replicaId so that no changes would be needed for existing tables. In meta, the replica regions are kept in the same row as the primary (so for the above example, there will be 10 rows in meta). The servers for the replicas are kept in columns like server+replicaId. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
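The naming scheme described above can be sketched as follows; the suffix format (separator and hex width) is an assumption for illustration, the invariant being that replicaId = 0 yields the unchanged primary name:

```java
// Sketch of replica region naming per the description above: replicas share
// the primary's region name and differ only by an appended replicaId; the
// primary (replicaId 0) keeps the plain name, so existing tables are
// unaffected. The "_%04x" suffix format is an assumption.
class ReplicaRegionName {
    static String encode(String primaryRegionName, int replicaId) {
        if (replicaId == 0) {
            return primaryRegionName; // primary: no suffix, backward compatible
        }
        return primaryRegionName + "_" + String.format("%04x", replicaId);
    }
}
```

For example, with replication = 3 a primary named `region1` would yield two sibling replicas whose names share its prefix but carry replica suffixes.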
[jira] [Updated] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
[ https://issues.apache.org/jira/browse/HBASE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10368: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk and 0.98. Thanks Andrew for review. Add Mutation.setWriteToWAL() back to 0.98 - Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-10368_v1.patch This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile with 0.94.7- and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated mode. [~daijy] says that it may still be important to be able to compile with 0.94.7-, which I kind of agree. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
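The kind of compatibility shim this issue describes can be sketched as below — signatures are simplified and the boolean-to-Durability mapping is an assumption, not the committed patch:

```java
// Sketch: re-adding a deprecated setWriteToWAL(boolean) that delegates to
// the newer Durability-based API, so pre-0.94.7 client code (Pig etc.)
// compiles unchanged. Enum values and the mapping are simplified assumptions.
class MutationSketch {
    enum Durability { USE_DEFAULT, SKIP_WAL }

    private Durability durability = Durability.USE_DEFAULT;

    void setDurability(Durability d) { durability = d; }
    Durability getDurability() { return durability; }

    /** @deprecated use {@link #setDurability} instead. */
    @Deprecated
    void setWriteToWAL(boolean write) {
        // false used to mean "skip the WAL"; map it onto the new API.
        setDurability(write ? Durability.USE_DEFAULT : Durability.SKIP_WAL);
    }
}
```

Keeping the method as a thin deprecated delegate avoids shims or reflection in downstream tools while steering new code to the Durability API.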
[jira] [Updated] (HBASE-8410) Basic quota support for namespaces
[ https://issues.apache.org/jira/browse/HBASE-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-8410: Attachment: HBASE-8410_trunk_12.patch Basic quota support for namespaces -- Key: HBASE-8410 URL: https://issues.apache.org/jira/browse/HBASE-8410 Project: HBase Issue Type: Sub-task Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Attachments: HBASE-8410_trunk_10.patch, HBASE-8410_trunk_10.patch, HBASE-8410_trunk_11.patch, HBASE-8410_trunk_12.patch, HBASE-8410_trunk_2.patch, HBASE-8410_trunk_3.patch, HBASE-8410_trunk_4.patch, HBASE-8410_trunk_4.patch, HBASE-8410_trunk_5.patch, HBASE-8410_trunk_6.patch, HBASE-8410_trunk_7.patch, HBASE-8410_trunk_8.patch, HBASE-8410_trunk_9.patch, HBASE_8410.patch, HBASE_8410_1_trunk.patch This task involves creating an observer which provides basic quota support to namespaces in terms of (1) number of tables and (2) number of regions. The quota support can be enabled by setting: <property> <name>hbase.coprocessor.region.classes</name> <value>org.apache.hadoop.hbase.namespace.NamespaceController</value> </property> <property> <name>hbase.coprocessor.master.classes</name> <value>org.apache.hadoop.hbase.namespace.NamespaceController</value> </property> in hbase-site.xml. To add quotas to a namespace, properties need to be added while creating the namespace. Examples: 1. namespace_create 'ns1', {'hbase.namespace.quota.maxregion'=>'10'} 2. namespace_create 'ns2', {'hbase.namespace.quota.maxtables'=>'2'}, {'hbase.namespace.quota.maxregion'=>'5'} The quotas can be modified/added to a namespace at any point of time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9343) Implement stateless scanner for Stargate
[ https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9343: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.98 and trunk. Thanks for another nice improvement, [~avandana]! Implement stateless scanner for Stargate Key: HBASE-9343 URL: https://issues.apache.org/jira/browse/HBASE-9343 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.11 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch, HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch, HBASE-9343_trunk.03.patch, HBASE-9343_trunk.04.patch, HBASE-9343_trunk.05.patch The current scanner implementation stores state and hence is not very suitable for REST server failure scenarios. This JIRA proposes to implement a stateless scanner. In the first version of the patch, a new resource class ScanResource has been added and all the scan parameters are specified as query params. The following are the scan parameters: startrow - The start row for the scan. endrow - The end row for the scan. columns - The columns to scan. starttime, endtime - To only retrieve columns within a specific range of version timestamps, both start and end time must be specified. maxversions - To limit the number of versions of each column to be returned. batchsize - To limit the maximum number of values returned for each call to next(). limit - The number of rows to return in the scan operation. More on the start row, end row and limit parameters. 1. If start row, end row and limit are not specified, then the whole table will be scanned. 2. If start row and limit (say N) are specified, then the scan operation will return N rows from the start row specified. 3.
If only the limit parameter is specified, then the scan operation will return N rows from the start of the table. 4. If limit and end row are specified, then the scan operation will return N rows from the start of the table up to the end row. If the end row is reached before N rows (say M, with M < N), then M rows will be returned to the user. 5. If start row, end row and limit (say N) are specified and N < the number of rows between start row and end row, then N rows from the start row will be returned to the user. If N > the number of rows between start row and end row (say M), then M rows will be returned to the user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
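Since every scan option travels in the query string, a request can be assembled without any server-side scanner state. A sketch of such a URL builder — the endpoint path (`/scanner`) and host are assumptions for illustration, not taken from the patch:

```java
// Hypothetical helper that assembles a stateless-scan request URL from the
// query parameters listed above (startrow, endrow, columns, limit, ...).
// The "/scanner" path segment is an assumed endpoint for illustration.
import java.util.Map;
import java.util.StringJoiner;

class ScanUrl {
    static String build(String base, String table, Map<String, String> params) {
        StringJoiner q = new StringJoiner("&");
        // Only the parameters the caller sets are sent; the server applies
        // its defaults (e.g. whole-table scan) for anything omitted.
        for (Map.Entry<String, String> e : params.entrySet()) {
            q.add(e.getKey() + "=" + e.getValue());
        }
        String query = q.toString();
        return base + "/" + table + "/scanner" + (query.isEmpty() ? "" : "?" + query);
    }
}
```

For example, a scan of 25 rows starting at `row-000` would become a single GET with `startrow` and `limit` in the query string, repeatable against any REST server instance.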
[jira] [Updated] (HBASE-9343) Implement stateless scanner for Stargate
[ https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9343: Fix Version/s: (was: 0.98.1) 0.98.0 Implement stateless scanner for Stargate Key: HBASE-9343 URL: https://issues.apache.org/jira/browse/HBASE-9343 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.11 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch, HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch, HBASE-9343_trunk.03.patch, HBASE-9343_trunk.04.patch, HBASE-9343_trunk.05.patch The current scanner implementation stores state and hence is not very suitable for REST server failure scenarios. This JIRA proposes to implement a stateless scanner. In the first version of the patch, a new resource class ScanResource has been added and all the scan parameters are specified as query params. The following are the scan parameters: startrow - The start row for the scan. endrow - The end row for the scan. columns - The columns to scan. starttime, endtime - To only retrieve columns within a specific range of version timestamps, both start and end time must be specified. maxversions - To limit the number of versions of each column to be returned. batchsize - To limit the maximum number of values returned for each call to next(). limit - The number of rows to return in the scan operation. More on the start row, end row and limit parameters. 1. If start row, end row and limit are not specified, then the whole table will be scanned. 2. If start row and limit (say N) are specified, then the scan operation will return N rows from the start row specified. 3. If only the limit parameter is specified, then the scan operation will return N rows from the start of the table. 4.
If limit and end row are specified, then the scan operation will return N rows from the start of the table up to the end row. If the end row is reached before N rows (say M, with M < N), then M rows will be returned to the user. 5. If start row, end row and limit (say N) are specified and N < the number of rows between start row and end row, then N rows from the start row will be returned to the user. If N > the number of rows between start row and end row (say M), then M rows will be returned to the user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10346) Add Documentation for stateless scanner
[ https://issues.apache.org/jira/browse/HBASE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10346: - Issue Type: Improvement (was: Sub-task) Parent: (was: HBASE-9343) Add Documentation for stateless scanner --- Key: HBASE-10346 URL: https://issues.apache.org/jira/browse/HBASE-10346 Project: HBase Issue Type: Improvement Components: REST Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10346.0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10346) Add Documentation for stateless scanner
[ https://issues.apache.org/jira/browse/HBASE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874344#comment-13874344 ] Nick Dimiduk commented on HBASE-10346: -- Converting to task since the parent issue has been resolved. Add Documentation for stateless scanner --- Key: HBASE-10346 URL: https://issues.apache.org/jira/browse/HBASE-10346 Project: HBase Issue Type: Improvement Components: REST Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10346.0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9345) Add support for specifying filters in scan
[ https://issues.apache.org/jira/browse/HBASE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9345: Issue Type: Improvement (was: Sub-task) Parent: (was: HBASE-9343) Add support for specifying filters in scan -- Key: HBASE-9345 URL: https://issues.apache.org/jira/browse/HBASE-9345 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.11 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor In the implementation of stateless scanner from HBase-9343, the support for specifying filters is missing. This JIRA aims to implement support for filter specification. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9345) Add support for specifying filters in scan
[ https://issues.apache.org/jira/browse/HBASE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874346#comment-13874346 ] Nick Dimiduk commented on HBASE-9345: - Converting to task since the parent issue has been resolved. Add support for specifying filters in scan -- Key: HBASE-9345 URL: https://issues.apache.org/jira/browse/HBASE-9345 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.11 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor In the implementation of stateless scanner from HBase-9343, the support for specifying filters is missing. This JIRA aims to implement support for filter specification. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8410) Basic quota support for namespaces
[ https://issues.apache.org/jira/browse/HBASE-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874350#comment-13874350 ] Vandana Ayyalasomayajula commented on HBASE-8410: - [~te...@apache.org] The new patch should fix the warnings. Basic quota support for namespaces -- Key: HBASE-8410 URL: https://issues.apache.org/jira/browse/HBASE-8410 Project: HBase Issue Type: Sub-task Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Attachments: HBASE-8410_trunk_10.patch, HBASE-8410_trunk_10.patch, HBASE-8410_trunk_11.patch, HBASE-8410_trunk_12.patch, HBASE-8410_trunk_2.patch, HBASE-8410_trunk_3.patch, HBASE-8410_trunk_4.patch, HBASE-8410_trunk_4.patch, HBASE-8410_trunk_5.patch, HBASE-8410_trunk_6.patch, HBASE-8410_trunk_7.patch, HBASE-8410_trunk_8.patch, HBASE-8410_trunk_9.patch, HBASE_8410.patch, HBASE_8410_1_trunk.patch This task involves creating an observer which provides basic quota support to namespaces in terms of (1) number of tables and (2) number of regions. The quota support can be enabled by setting: <property> <name>hbase.coprocessor.region.classes</name> <value>org.apache.hadoop.hbase.namespace.NamespaceController</value> </property> <property> <name>hbase.coprocessor.master.classes</name> <value>org.apache.hadoop.hbase.namespace.NamespaceController</value> </property> in hbase-site.xml. To add quotas to a namespace, properties need to be added while creating the namespace. Examples: 1. namespace_create 'ns1', {'hbase.namespace.quota.maxregion'=>'10'} 2. namespace_create 'ns2', {'hbase.namespace.quota.maxtables'=>'2'}, {'hbase.namespace.quota.maxregion'=>'5'} The quotas can be modified/added to a namespace at any point of time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10370) Compaction in out-of-date Store causes region split failed
Liu Shaohui created HBASE-10370: --- Summary: Compaction in out-of-date Store causes region split failed Key: HBASE-10370 URL: https://issues.apache.org/jira/browse/HBASE-10370 Project: HBase Issue Type: Bug Components: Compaction Reporter: Liu Shaohui Priority: Critical In our production cluster, we encountered a problem where two daughter regions could not be opened due to a FileNotFoundException. {quote} 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,x,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf {quote} The reason is that a compaction in an out-of-date Store deletes the hfiles which are referenced by the daughter regions after the split. This leaves the daughter regions unable to open, forever. The timeline is as follows. Assumption: there are two hfiles, a and b, in Store A in Region R. t0: A compaction request for Store A(a+b) in Region R is sent. t1: A split for Region R. The split times out and is rolled back. In the rollback, the region reinitializes all store objects; see SplitTransaction #824. Now the store in Region R is A'(a+b). t2: The compaction runs (a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived. t3: A split for Region R. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b). t4: Since hfiles a and b have been deleted, opening regions R.0 and R.1 fails with a FileNotFoundException. I have added a test to identify this problem. After searching JIRA, maybe HBASE-8502 is the same problem. [~goldin] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
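The t0-t4 timeline above can be replayed with a toy model of the store: a compaction request captured before a rolled-back split ends up archiving exactly the files that the later split's daughter references point at. This is an illustrative sketch only, not HBase code:

```java
import java.util.HashSet;
import java.util.Set;

// Toy replay of the HBASE-10370 timeline: a stale compaction request
// archives files that a subsequent split still references.
public class StaleCompactionReplay {
    public static boolean daughtersCanOpen() {
        Set<String> storeFiles = new HashSet<>(Set.of("a", "b")); // Store A
        Set<String> staleRequest = new HashSet<>(storeFiles);     // t0: request captures a, b
        // t1: split times out and rolls back; store objects are reinitialized
        // t2: the stale request runs; a and b are archived, c is written
        storeFiles.removeAll(staleRequest);
        storeFiles.add("c");
        Set<String> archived = staleRequest;
        // t3: a new split creates references against the stale view A'(a+b)
        Set<String> daughterRefs = new HashSet<>(Set.of("a", "b"));
        // t4: opening a daughter resolves each reference; archived files are gone
        for (String ref : daughterRefs) {
            if (archived.contains(ref)) return false; // FileNotFoundException
        }
        return true;
    }
}
```

The replay returns false: both daughters dereference archived files, which is the permanent open failure described in the report.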
[jira] [Commented] (HBASE-9360) Enable 0.94 - 0.96 replication to minimize upgrade down time
[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874355#comment-13874355 ] Nick Dimiduk commented on HBASE-9360: - A new section in http://hbase.apache.org/book.html#upgrade0.96 plus a note on user@ should do the trick. Enable 0.94 - 0.96 replication to minimize upgrade down time - Key: HBASE-9360 URL: https://issues.apache.org/jira/browse/HBASE-9360 Project: HBase Issue Type: Brainstorming Components: migration Affects Versions: 0.98.0, 0.96.0 Reporter: Jeffrey Zhong As we know, 0.96 is a singularity release; as of today a 0.94 HBase user has to do an in-place upgrade: make the corresponding client changes, recompile client application code, fully shut down the existing 0.94 cluster, deploy the 0.96 binary, run the upgrade script and then start the upgraded cluster. You can imagine the downtime being extended if something goes wrong in between. To minimize the downtime, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is in sync, a user can switch the traffic to the 0.96 cluster and decommission the old one. The ideal steps would be: 1) Set up a 0.96 cluster 2) Set up replication from the running 0.94 cluster to the newly created 0.96 cluster 3) Wait till they're in sync 4) Start duplicate writes to both the 0.94 and 0.96 clusters (replication could be stopped now) 5) Forward read traffic to the slave 0.96 cluster 6) After a certain period, stop writes to the original 0.94 cluster if everything is good, completing the upgrade. To get us there, there are two tasks: 1) Enable replication from 0.94 to 0.96. I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk].
Currently it seems the best approach is to build a very similar service, or to build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, supporting three commands: replicateLogEntries, multi and delete. Inside these three commands, we just pass the corresponding requests down to the destination 0.96 cluster as a bridge. The reason to support multi and delete is so that CopyTable can copy data from a 0.94 cluster to a 0.96 one. The other approach is to provide limited support for the 0.94 RPC protocol in 0.96. An issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server. Therefore, we would need a fake ZooKeeper setup in front of the 0.96 cluster for the 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code. 2) To support writes to a 0.96 cluster and a 0.94 one at the same time, we need to load both hbase clients into a single JVM using different class loaders. Let me know if you think this is worth doing, and whether there is a better approach we could take. Thanks! -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10368) Add Mutation.setWriteToWAL() back to 0.98
[ https://issues.apache.org/jira/browse/HBASE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874357#comment-13874357 ] Hadoop QA commented on HBASE-10368: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623531/hbase-10368_v1.patch against trunk revision . ATTACHMENT ID: 12623531 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8452//console This message is automatically generated. Add Mutation.setWriteToWAL() back to 0.98 - Key: HBASE-10368 URL: https://issues.apache.org/jira/browse/HBASE-10368 Project: HBase Issue Type: Improvement Components: Client Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-10368_v1.patch This is similar to HBASE-10339, where we deprecated the API Mutation.setWriteToWAL() in 0.96 and removed it in 0.98. 
Although 0.94.7+ contains the Durability API which replaces this, for Pig and other tools to be able to compile against both 0.94.7 and earlier and 0.98 without shims / reflection, the safest way is to add the API back to 0.98 in deprecated form. [~daijy] says that it may still be important to be able to compile against 0.94.7 and earlier, which I kind of agree with. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
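The re-added deprecated method would presumably just delegate to the Durability API that replaced it. The following is a minimal standalone sketch: the enum values mirror HBase's Durability constants, but the class itself is illustrative, not the actual Mutation code:

```java
// Illustrative sketch of re-adding a deprecated setWriteToWAL() that
// delegates to the Durability API. Standalone stand-ins, not HBase classes.
public class MutationSketch {
    public enum Durability { USE_DEFAULT, SKIP_WAL }

    private Durability durability = Durability.USE_DEFAULT;

    public void setDurability(Durability d) { this.durability = d; }
    public Durability getDurability() { return durability; }

    /** @deprecated Use {@link #setDurability(Durability)} instead. */
    @Deprecated
    public void setWriteToWAL(boolean write) {
        // false means "skip the write-ahead log", the old semantics
        setDurability(write ? Durability.USE_DEFAULT : Durability.SKIP_WAL);
    }

    /** @deprecated Use {@link #getDurability()} instead. */
    @Deprecated
    public boolean getWriteToWAL() {
        return durability != Durability.SKIP_WAL;
    }
}
```

This shape lets pre-0.94.7 callers compile unchanged while emitting deprecation warnings pointing at the replacement API.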
[jira] [Updated] (HBASE-10370) Compaction in out-of-date Store causes region split failed
[ https://issues.apache.org/jira/browse/HBASE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HBASE-10370: Description: In our production cluster, we encountered a problem where two daughter regions could not be opened due to a FileNotFoundException. {quote} 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,x,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf {quote} The reason is that a compaction in an out-of-date Store deletes the hfiles which are referenced by the daughter regions after the split. This leaves the daughter regions unable to open, forever. The timeline is as follows. Assumption: there are two hfiles, a and b, in Store A in Region R. t0: A compaction request for Store A(a+b) in Region R is sent. t1: A split for Region R. The split times out and is rolled back. In the rollback, the region reinitializes all store objects; see SplitTransaction #824. Now the store in Region R is A'(a+b). t2: The compaction sent at t0 runs (a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived. t3: A split for Region R. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b). t4: Since hfiles a and b have been deleted, opening regions R.0 and R.1 fails with a FileNotFoundException. I have added a test to identify this problem. After searching JIRA, maybe HBASE-8502 is the same problem. [~goldin] Compaction in out-of-date Store causes region split failed -- Key: HBASE-10370 URL:
[jira] [Commented] (HBASE-10333) Assignments are not retained on a cluster start
[ https://issues.apache.org/jira/browse/HBASE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874359#comment-13874359 ] Hudson commented on HBASE-10333: FAILURE: Integrated in HBase-0.98 #88 (See [https://builds.apache.org/job/HBase-0.98/88/]) HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558964) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Assignments are not retained on a cluster start --- Key: HBASE-10333 URL: https://issues.apache.org/jira/browse/HBASE-10333 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Devaraj Das Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10333.patch When a cluster is fully shutdown and then started up again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. Upon digging, it seems like HBASE-10101 made a change due to which the server holding the META previously is added to dead-servers (in _HMaster.assignMeta_). Later on, this makes the AssignmentManager think that the master recovered from a failure as opposed to a fresh cluster start (the ServerManager.deadServers list is not empty in the check within _AssignmentManager.processDeadServersAndRegionsInTransition_) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
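The fresh-start vs failover decision described above can be modeled in a few lines. The class below is a toy illustration of the reported regression, not the AssignmentManager code; method names are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the HBASE-10333 failure mode: a non-empty dead-server list
// makes the master take the failover path, disabling retained assignment.
public class RetainAssignmentDecision {
    // The master treats a non-empty dead-server list as evidence of failover;
    // retained assignment only happens on a clean fresh start.
    public static boolean looksLikeFailover(Set<String> deadServers) {
        return !deadServers.isEmpty();
    }

    // The reported regression: during assignMeta, the server that previously
    // held META is added to deadServers even on a fresh cluster start,
    // flipping the decision to "failover".
    public static boolean freshStartWithMetaBug(String previousMetaHost) {
        Set<String> deadServers = new HashSet<>();
        deadServers.add(previousMetaHost); // the buggy addition
        return looksLikeFailover(deadServers); // true => assignments not retained
    }
}
```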
[jira] [Commented] (HBASE-10364) Allow configuration option for parent znode in LoadTestTool
[ https://issues.apache.org/jira/browse/HBASE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874360#comment-13874360 ] Hudson commented on HBASE-10364: FAILURE: Integrated in HBase-0.98 #88 (See [https://builds.apache.org/job/HBase-0.98/88/]) HBASE-10364 Allow configuration option for parent znode in LoadTestTool (Tedyu: rev 1558940) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java Allow configuration option for parent znode in LoadTestTool --- Key: HBASE-10364 URL: https://issues.apache.org/jira/browse/HBASE-10364 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.99.0 Attachments: 10364-v1.txt, 10364-v2.txt I saw the following running a Hoya functional test which involves LoadTestTool: {code} 2014-01-16 19:06:03,443 [Thread-2] INFO client.HConnectionManager$HConnectionImplementation (HConnectionManager.java:makeStub(1572)) - getMaster attempt 8 of 35 failed; retrying after sleep of 10098, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} LoadTestTool was reading from the correct ZooKeeper quorum but it wasn't able to find the parent znode. An option should be added to LoadTestTool so that the user can specify the parent znode in ZooKeeper. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
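The option amounts to letting a command-line value override the parent znode the tool connects under. A sketch of the resolution logic (the class and option handling are illustrative, not LoadTestTool's actual code; `zookeeper.znode.parent` is the real configuration key quoted in the log above):

```java
// Sketch of a parent-znode option for a load-test style tool. The class
// is hypothetical; only the configuration key name comes from the report.
public class ZnodeOptionSketch {
    static final String ZK_PARENT_KEY = "zookeeper.znode.parent";

    // A command-line value (if supplied) overrides the default parent znode,
    // e.g. /hbase for a standard deployment or a Hoya-managed path.
    public static String resolveParentZnode(String optValue, String defaultValue) {
        return (optValue != null && !optValue.isEmpty()) ? optValue : defaultValue;
    }
}
```

The resolved value would then be set on the tool's Configuration under `zookeeper.znode.parent` before any connection is made.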
[jira] [Commented] (HBASE-10333) Assignments are not retained on a cluster start
[ https://issues.apache.org/jira/browse/HBASE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874367#comment-13874367 ] Hudson commented on HBASE-10333: FAILURE: Integrated in hbase-0.96 #260 (See [https://builds.apache.org/job/hbase-0.96/260/]) HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558965) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Assignments are not retained on a cluster start --- Key: HBASE-10333 URL: https://issues.apache.org/jira/browse/HBASE-10333 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Devaraj Das Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10333.patch When a cluster is fully shutdown and then started up again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. Upon digging, it seems like HBASE-10101 made a change due to which the server holding the META previously is added to dead-servers (in _HMaster.assignMeta_). Later on, this makes the AssignmentManager think that the master recovered from a failure as opposed to a fresh cluster start (the ServerManager.deadServers list is not empty in the check within _AssignmentManager.processDeadServersAndRegionsInTransition_) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10371) Compact create empty hfile, then select this file for compaction and create empty hfile and over again.
[ https://issues.apache.org/jira/browse/HBASE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-10371: - Description: (1) Select HFile for compaction {code} 2014-01-16 01:01:25,111 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b whose maxTimeStamp is -1 while the max expired timestamp is 1389632485111 {code} (2) Compact {code} 2014-01-16 01:01:26,042 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 01:01:26,045 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 with permission=rwxrwxrwx 2014-01-16 01:01:26,076 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 to hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 1 file(s) in a of storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767. 
into 40de5d79f80e4fb197e409fb99ab0fd8, size=534; total size for store is 399.0 M 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767., storeName=a, fileCount=1, fileSize=534, priority=16, time=18280340606333745; duration=0sec {code} (3) Select HFile for compaction {code} 2014-01-16 03:48:05,120 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 whose maxTimeStamp is -1 while the max expired timestamp is 1389642485120 {code} (4) Compact {code} 2014-01-16 03:50:17,731 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 03:50:17,732 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90 {code} ... this loops forever.
[jira] [Created] (HBASE-10371) Compact create empty hfile, then select this file for compaction and create empty hfile and over again.
binlijin created HBASE-10371: Summary: Compact create empty hfile, then select this file for compaction and create empty hfile and over again. Key: HBASE-10371 URL: https://issues.apache.org/jira/browse/HBASE-10371 Project: HBase Issue Type: Bug Reporter: binlijin (1) Select HFile for compaction 2014-01-16 01:01:25,111 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b whose maxTimeStamp is -1 while the max expired timestamp is 1389632485111 (2) Compact 2014-01-16 01:01:26,042 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 01:01:26,045 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 with permission=rwxrwxrwx 2014-01-16 01:01:26,076 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 to hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 1 file(s) in a of storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767. 
into 40de5d79f80e4fb197e409fb99ab0fd8, size=534; total size for store is 399.0 M 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767., storeName=a, fileCount=1, fileSize=534, priority=16, time=18280340606333745; duration=0sec (3) Select HFile for compaction 2014-01-16 03:48:05,120 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 whose maxTimeStamp is -1 while the max expired timestamp is 1389642485120 (4) Compact 2014-01-16 03:50:17,731 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 03:50:17,732 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90 ... this loops forever. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
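The loop above follows from the "expired" selection treating an empty hfile (maxTimeStamp == -1, keycount=0) as always expired, then compacting it into another empty hfile that gets selected again. A sketch of a guard that breaks the cycle (illustrative only; the real fix belongs in the compaction selection code and may differ):

```java
// Sketch of why the empty-hfile compaction loops, and a guard that breaks it.
// maxTimeStamp == -1 denotes an hfile with no cells, as in the logs above.
public class ExpiredFileSelection {
    public static boolean isExpired(long maxTimeStamp, long maxExpiredTs) {
        return maxTimeStamp < maxExpiredTs;
    }

    // Unguarded selection: an empty file (maxTimeStamp == -1) is always
    // "expired", gets compacted into another empty file, and loops forever.
    public static boolean selectUnguarded(long maxTimeStamp, long maxExpiredTs) {
        return isExpired(maxTimeStamp, maxExpiredTs);
    }

    // Guarded selection: don't re-compact a lone empty file; it can be
    // archived directly instead of feeding the compaction loop.
    public static boolean selectGuarded(long maxTimeStamp, long maxExpiredTs, int fileCount) {
        if (maxTimeStamp == -1 && fileCount == 1) return false;
        return isExpired(maxTimeStamp, maxExpiredTs);
    }
}
```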
[jira] [Commented] (HBASE-10333) Assignments are not retained on a cluster start
[ https://issues.apache.org/jira/browse/HBASE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874387#comment-13874387 ] Hudson commented on HBASE-10333: SUCCESS: Integrated in HBase-TRUNK #4828 (See [https://builds.apache.org/job/HBase-TRUNK/4828/]) HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558963) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Assignments are not retained on a cluster start --- Key: HBASE-10333 URL: https://issues.apache.org/jira/browse/HBASE-10333 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Devaraj Das Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10333.patch When a cluster is fully shutdown and then started up again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. Upon digging, it seems like HBASE-10101 made a change due to which the server holding the META previously is added to dead-servers (in _HMaster.assignMeta_). Later on, this makes the AssignmentManager think that the master recovered from a failure as opposed to a fresh cluster start (the ServerManager.deadServers list is not empty in the check within _AssignmentManager.processDeadServersAndRegionsInTransition_) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10371) Compact create empty hfile, then select this file for compaction and create empty hfile and over again.
[ https://issues.apache.org/jira/browse/HBASE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-10371: - Attachment: HBASE-10371-94.patch Compact create empty hfile, then select this file for compaction and create empty hfile and over again. --- Key: HBASE-10371 URL: https://issues.apache.org/jira/browse/HBASE-10371 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10371-94.patch (1) Select HFile for compaction {code} 2014-01-16 01:01:25,111 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b whose maxTimeStamp is -1 while the max expired timestamp is 1389632485111 {code} (2) Compact {code} 2014-01-16 01:01:26,042 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 01:01:26,045 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 with permission=rwxrwxrwx 2014-01-16 01:01:26,076 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 to hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 1 file(s) in a of storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767. 
into 40de5d79f80e4fb197e409fb99ab0fd8, size=534; total size for store is 399.0 M 2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767., storeName=a, fileCount=1, fileSize=534, priority=16, time=18280340606333745; duration=0sec {code} (3) Select HFile for compaction {code} 2014-01-16 03:48:05,120 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 whose maxTimeStamp is -1 while the max expired timestamp is 1389642485120 {code} (4) Compact {code} 2014-01-16 03:50:17,731 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8, keycount=0, bloomtype=NONE, size=534, encoding=NONE 2014-01-16 03:50:17,732 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90 {code} ... this loops forever. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10371) Compact create empty hfile, then select this file for compaction and create empty hfile and over again.
[ https://issues.apache.org/jira/browse/HBASE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-10371: - Attachment: HBASE-10371-trunk.patch

Compaction creates an empty hfile, then selects this file for compaction and creates an empty hfile, over and over again.
---
Key: HBASE-10371 URL: https://issues.apache.org/jira/browse/HBASE-10371 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10371-94.patch, HBASE-10371-trunk.patch

(1) Select HFile for compaction
{code}
2014-01-16 01:01:25,111 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b whose maxTimeStamp is -1 while the max expired timestamp is 1389632485111
{code}
(2) Compact
{code}
2014-01-16 01:01:26,042 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/f3e38d10d579420494079e17a2557f0b, keycount=0, bloomtype=NONE, size=534, encoding=NONE
2014-01-16 01:01:26,045 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 with permission=rwxrwxrwx
2014-01-16 01:01:26,076 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/.tmp/40de5d79f80e4fb197e409fb99ab0fd8 to hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8
2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 1 file(s) in a of storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767. into 40de5d79f80e4fb197e409fb99ab0fd8, size=534; total size for store is 399.0 M
2014-01-16 01:01:26,142 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=storagetable,01:,1369377609136.7d8941661904fb99a41f79a1fce47767., storeName=a, fileCount=1, fileSize=534, priority=16, time=18280340606333745; duration=0sec
{code}
(3) Select HFile for compaction
{code}
2014-01-16 03:48:05,120 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8 whose maxTimeStamp is -1 while the max expired timestamp is 1389642485120
{code}
(4) Compact
{code}
2014-01-16 03:50:17,731 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://dump002002.cm6:9000/hbase-0.90/storagetable/7d8941661904fb99a41f79a1fce47767/a/40de5d79f80e4fb197e409fb99ab0fd8, keycount=0, bloomtype=NONE, size=534, encoding=NONE
2014-01-16 03:50:17,732 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://dump002002.cm6:9000/hbase-0.90
{code}
... and this loops forever.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
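The logs above show the cycle: a file with maxTimeStamp=-1 and keycount=0 is selected as "expired", compacted into another empty file of the same size, and selected again on the next pass. A minimal sketch of one way to break the cycle, skipping already-empty files during expired-file selection; class and field names here are hypothetical stand-ins, not the actual HBASE-10371 patch or the real HBase StoreFile API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for store-file metadata.
class HFileInfo {
    final long maxTimeStamp; // -1 when the file holds no cells
    final long keyCount;

    HFileInfo(long maxTimeStamp, long keyCount) {
        this.maxTimeStamp = maxTimeStamp;
        this.keyCount = keyCount;
    }

    boolean isEmpty() {
        return keyCount == 0;
    }
}

class ExpiredFileSelector {
    // Select files whose newest cell is older than the TTL cutoff, but skip
    // files that are already empty: rewriting an empty file just produces
    // another empty file with maxTimeStamp=-1, which would be selected again
    // on the next pass, looping forever.
    static List<HFileInfo> selectExpired(List<HFileInfo> files, long expiredCutoff) {
        List<HFileInfo> selected = new ArrayList<>();
        for (HFileInfo f : files) {
            if (f.isEmpty()) {
                continue; // handle by direct deletion, not by compaction
            }
            if (f.maxTimeStamp < expiredCutoff) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

With this guard, the empty file produced in step (2) is never re-selected in step (3), so the selection/compaction cycle terminates.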
[jira] [Updated] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10369: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)

LabelExpander#createLabels() should close scanner in finally clause
---
Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt

Here is the related code:
{code}
  while (true) {
    Result next = scanner.next();
    if (next == null) {
      break;
    }
    byte[] row = next.getRow();
    byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER);
    labels.put(Bytes.toString(value), Bytes.toInt(row));
  }
  scanner.close();
} finally {
{code}
If scanner.next() throws an exception, the scanner is left open.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
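The fix is the standard close-in-finally pattern. A self-contained sketch of that pattern, using a minimal stand-in scanner type rather than the real HBase client API (the real createLabels() also parses row and value and fills a map, elided here):

```java
import java.util.Iterator;
import java.util.List;

// Minimal stand-in for a result scanner; the real HBase ResultScanner differs.
class StubScanner {
    private final Iterator<String> rows;
    boolean closed = false;

    StubScanner(List<String> rows) {
        this.rows = rows.iterator();
    }

    String next() {
        return rows.hasNext() ? rows.next() : null;
    }

    void close() {
        closed = true;
    }
}

class LabelScan {
    // Drain the scanner; close() sits in the finally block so it runs even
    // when next() throws, which is exactly what the patch guarantees.
    static int drain(StubScanner scanner) {
        int count = 0;
        try {
            while (true) {
                String next = scanner.next();
                if (next == null) {
                    break;
                }
                count++; // real code would parse row/value here
            }
        } finally {
            scanner.close();
        }
        return count;
    }
}
```

On Java 7+, try-with-resources achieves the same guarantee more concisely for any AutoCloseable resource.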
[jira] [Commented] (HBASE-10333) Assignments are not retained on a cluster start
[ https://issues.apache.org/jira/browse/HBASE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874392#comment-13874392 ] Hudson commented on HBASE-10333: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #80 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/80/]) HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558964) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Assignments are not retained on a cluster start --- Key: HBASE-10333 URL: https://issues.apache.org/jira/browse/HBASE-10333 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Devaraj Das Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10333.patch When a cluster is fully shut down and then started up again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. Upon digging, it appears that HBASE-10101 made a change that causes the server that previously held META to be added to the dead-servers list (in _HMaster.assignMeta_). Later, this makes the AssignmentManager think that the master recovered from a failure rather than from a fresh cluster start (the ServerManager.deadServers list is not empty in the check within _AssignmentManager.processDeadServersAndRegionsInTransition_).
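The decision described above can be sketched as follows; the method names are illustrative, not the actual AssignmentManager code. Assignments are retained only on a fresh start, and a fresh start is detected by the dead-server list being empty, so adding the old META host to that list flips the outcome to failover handling:

```java
import java.util.Collections;
import java.util.Set;

class StartupMode {
    // A non-empty dead-server list is treated as evidence of a master
    // failover rather than a fresh cluster start.
    static boolean isFailover(Set<String> deadServers) {
        return !deadServers.isEmpty();
    }

    // Old assignments are retained only when the retain flag is on and the
    // cluster is judged to be freshly started. Spuriously adding the
    // previous META host to deadServers therefore defeats retention.
    static boolean retainAssignments(boolean retainFlag, Set<String> deadServers) {
        return retainFlag && !isFailover(deadServers);
    }
}
```

This is why the bug manifests even with the retain flag enabled: the flag is honored, but the fresh-start check it depends on fails.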
[jira] [Commented] (HBASE-10369) LabelExpander#createLabels() should close scanner in finally clause
[ https://issues.apache.org/jira/browse/HBASE-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874393#comment-13874393 ] Hudson commented on HBASE-10369: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #80 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/80/]) HBASE-10369 LabelExpander#createLabels() should close scanner in finally clause (Tedyu: rev 1558976) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java LabelExpander#createLabels() should close scanner in finally clause --- Key: HBASE-10369 URL: https://issues.apache.org/jira/browse/HBASE-10369 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10369-v1.txt

Here is the related code:
{code}
  while (true) {
    Result next = scanner.next();
    if (next == null) {
      break;
    }
    byte[] row = next.getRow();
    byte[] value = next.getValue(LABELS_TABLE_FAMILY, LABEL_QUALIFIER);
    labels.put(Bytes.toString(value), Bytes.toInt(row));
  }
  scanner.close();
} finally {
{code}
If scanner.next() throws an exception, the scanner is left open.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874406#comment-13874406 ] Hadoop QA commented on HBASE-10156: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623548/10156v17.txt against trunk revision . ATTACHMENT ID: 12623548 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 27 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8453//console This message is automatically generated. 
Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v10.txt, 10156v11.txt, 10156v12.txt, 10156v12.txt, 10156v13.txt, 10156v16.txt, 10156v17.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, 10156v5.txt, 10156v6.txt, 10156v7.txt, 10156v9.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874410#comment-13874410 ] Demai Ni commented on HBASE-10249: -- [~jdcryans], [~jmhsieh], many thanks to both of you. You guys are great and fast. I was occupied by my day-time job; I just came back to this and found that the patch is there and testing is done. Appreciate it. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10249: - Assignee: Jean-Daniel Cryans (was: Demai Ni) Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Jean-Daniel Cryans Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874412#comment-13874412 ] Demai Ni commented on HBASE-10249: -- assigned to [~jdcryans], who fixed the bug and deserves the credit where I ran out of ideas. :-) Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Jean-Daniel Cryans Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10249-0.94-v0.patch, HBASE-10249-0.94-v1.patch, HBASE-10249-trunk-v0.patch, HBASE-10249-trunk-v1.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)