[jira] [Updated] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6769: - Attachment: HBASE-6769-0.94-1.patch Here's the patch without the FailedSanityCheckException. I put in comments around everywhere that catches the exception so hopefully that will keep things sane. TestHRegion and TestFromClientSide are both passing on my machine locally. Running the rest of the suite right now. HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
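The usability problem above is that the server-side NoSuchColumnFamilyException is collapsed into a bare DoNotRetryIOException, so the client log never names the offending family. A minimal sketch of the idea behind the fix, using hypothetical stand-in exception classes (not the actual HBase patch):

```java
import java.io.IOException;

public class WrapDemo {
    // Hypothetical stand-ins for the HBase exception classes.
    static class NoSuchColumnFamilyException extends IOException {
        NoSuchColumnFamilyException(String msg) { super(msg); }
    }
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Wrap instead of swallow: the original message (and cause) survive,
    // so the client log names the bad column family instead of showing
    // only "DoNotRetryIOException: 1 time".
    static DoNotRetryIOException wrap(IOException serverSide) {
        return new DoNotRetryIOException(serverSide.getMessage(), serverSide);
    }

    public static void main(String[] args) {
        IOException server =
            new NoSuchColumnFamilyException("Column family bogus does not exist");
        System.out.println(wrap(server).getMessage());
    }
}
```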
[jira] [Resolved] (HBASE-5447) Support for custom filters with PB-based RPC
[ https://issues.apache.org/jira/browse/HBASE-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5447. -- Resolution: Fixed Fix Version/s: 0.96.0 Assignee: Gregory Chanan (was: Todd Lipcon) Hadoop Flags: Reviewed Closing. Assigned Gregory. Regards custom filters, let them come out of the woodwork. We'll help them make the conversion to pb. Support for custom filters with PB-based RPC Key: HBASE-5447 URL: https://issues.apache.org/jira/browse/HBASE-5447 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454679#comment-13454679 ] Hadoop QA commented on HBASE-6769: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544945/HBASE-6769-0.94-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2856//console This message is automatically generated.
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454683#comment-13454683 ] Lars Hofhansl commented on HBASE-6769: -- +1 on 0.94 patch as well. This comment is weird, as it refers to a non existing exception. {code} + // Don't send a FailedSanityCheckException as older clients will not know about + // that class being a subclass of DoNotRetryIOException + // and will retry mutations that will never succeed. {code} Don't post a new patch :) ... I'll change the comment on commit: {code} // Use generic DoNotRetryIOException so that older clients know how to deal with it. {code}
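The subclass concern in the comment above matters because the client's retry loop keys off DoNotRetryIOException: an older client that does not recognize a newer subclass as such would keep retrying a mutation that can never succeed. A hedged sketch of that decision logic, using hypothetical local classes rather than the actual HConnectionManager code:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Returns the number of attempts made before success or giving up.
    static int runWithRetries(Callable<Void> op, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.call();
                return attempt;      // success
            } catch (DoNotRetryIOException e) {
                return attempt;      // permanent failure: stop immediately
            } catch (Exception e) {
                // anything else is treated as transient and retried
            }
        }
        return maxAttempts;
    }

    public static void main(String[] args) {
        // A failure the client recognizes as DoNotRetry stops after one try...
        int permanent = runWithRetries(() -> {
            throw new DoNotRetryIOException("sanity check failed");
        }, 10);
        // ...but a failure it cannot classify burns every retry.
        int unknown = runWithRetries(() -> {
            throw new IOException("unrecognized exception class");
        }, 10);
        System.out.println(permanent + " attempt vs " + unknown + " attempts");
    }
}
```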
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454685#comment-13454685 ] Elliott Clark commented on HBASE-6769: -- haha yeah I guess I have more context than 0.94 source would give a reader. Sorry about that.
[jira] [Commented] (HBASE-5306) Add support for protocol buffer based RPC
[ https://issues.apache.org/jira/browse/HBASE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454688#comment-13454688 ] Gregory Chanan commented on HBASE-5306: --- Do you think there is more to do here Devaraj? Or do HBASE-5705 and HBASE-5451 cover this? Add support for protocol buffer based RPC - Key: HBASE-5306 URL: https://issues.apache.org/jira/browse/HBASE-5306 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Devaraj Das Assignee: Devaraj Das This will help HBase to achieve wire compatibility across versions. The idea (to start with) is to leverage the recent work that has gone in in the Hadoop core in this area.
[jira] [Updated] (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheding updated HBASE-3529: -- Description: Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are: * HBase is highly scalable and distributed * HBase is realtime * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312) * Lucene offers many types of queries not currently available in HBase (eg, AND, OR, NOT, phrase, etc) * It's easier to build scalable realtime systems on top of already architecturally sound, scalable realtime data system, eg, HBase. * Scaling realtime search will be as simple as scaling HBase. Phase 1 - Indexing: * Integrate Lucene into HBase such that an index mirrors a given region. This means cascading add, update, and deletes between a Lucene index and an HBase region (and vice versa). * Define meta-data to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers. * Integrate with the HLog to ensure that index recovery can occur properly (eg, on region server failure) * Mirror region splits with indexes (use Lucene's IndexSplitter?) * When a region is written to HDFS, also write the corresponding Lucene index to HDFS. * A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used because the document/row data is stored in HBase. We will need to solve what the best data structure for efficiently mapping a docid -> row key is. It could be a docstore, field cache, column stride fields, or some other mechanism. * Write unit tests for the above Phase 2 - Queries: * Enable distributed Lucene queries * Regions that have Lucene indexes are inherently available and may be searched on, meaning there's no need for a separate search related system in Zookeeper. * Integrate search with HBase's RPC mechanism Add search to HBase --- Key: HBASE-3529 URL: https://issues.apache.org/jira/browse/HBASE-3529 Project: HBase Issue Type: Improvement Affects Versions: 0.90.0 Reporter: Jason Rutherglen Attachments: HBASE-3529.patch, HDFS-APPEND-0.20-LOCAL-FILE.patch
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454709#comment-13454709 ] Elliott Clark commented on HBASE-6769: -- All tests passed locally for 0.94.
[jira] [Resolved] (HBASE-5971) ServerLoad needs redo; can't be pb based
[ https://issues.apache.org/jira/browse/HBASE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HBASE-5971. -- Resolution: Not A Problem This was fixed in HBASE-6411 which is a sub issue of HBASE-4050 ServerLoad needs redo; can't be pb based Key: HBASE-5971 URL: https://issues.apache.org/jira/browse/HBASE-5971 Project: HBase Issue Type: Bug Components: metrics Reporter: stack Priority: Blocker Fix For: 0.96.0 Here is what happens when we try to register server bean: {code} javax.management.NotCompliantMBeanException: org.apache.hadoop.hbase.master.MXBean: Method org.apache.hadoop.hbase.master.MXBean.getRegionServers has parameter or return type that cannot be translated into an open type at com.sun.jmx.mbeanserver.Introspector.throwException(Introspector.java:412) at com.sun.jmx.mbeanserver.MBeanAnalyzer.init(MBeanAnalyzer.java:101) at com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:87) at com.sun.jmx.mbeanserver.MXBeanIntrospector.getAnalyzer(MXBeanIntrospector.java:53) at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:163) at com.sun.jmx.mbeanserver.MBeanSupport.init(MBeanSupport.java:147) at com.sun.jmx.mbeanserver.MXBeanSupport.init(MXBeanSupport.java:48) at com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:184) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:915) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482) at org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:58) at org.apache.hadoop.hbase.master.HMaster.registerMBean(HMaster.java:1926) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:617) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:367) at java.lang.Thread.run(Thread.java:680) 
Caused by: java.lang.IllegalArgumentException: Method org.apache.hadoop.hbase.master.MXBean.getRegionServers has parameter or return type that cannot be translated into an open type at com.sun.jmx.mbeanserver.ConvertingMethod.from(ConvertingMethod.java:32) at com.sun.jmx.mbeanserver.MXBeanIntrospector.mFrom(MXBeanIntrospector.java:63) at com.sun.jmx.mbeanserver.MXBeanIntrospector.mFrom(MXBeanIntrospector.java:33) at com.sun.jmx.mbeanserver.MBeanAnalyzer.initMaps(MBeanAnalyzer.java:118) at com.sun.jmx.mbeanserver.MBeanAnalyzer.init(MBeanAnalyzer.java:99) ... 14 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: java.util.Map<java.lang.String, org.apache.hadoop.hbase.ServerLoad> at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.ConvertingMethod.init(ConvertingMethod.java:184) at com.sun.jmx.mbeanserver.ConvertingMethod.from(ConvertingMethod.java:27) ... 18 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: class org.apache.hadoop.hbase.ServerLoad at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.OpenConverter.makeTabularConverter(OpenConverter.java:360) at com.sun.jmx.mbeanserver.OpenConverter.makeParameterizedConverter(OpenConverter.java:402) at com.sun.jmx.mbeanserver.OpenConverter.makeConverter(OpenConverter.java:296) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:262) ...
20 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: java.util.List<org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$Coprocessor> at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.OpenConverter.makeCompositeConverter(OpenConverter.java:467) at com.sun.jmx.mbeanserver.OpenConverter.makeConverter(OpenConverter.java:293) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:262) ... 24 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: class org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$Coprocessor at
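The root cause in the trace above is a JMX rule: every type an MXBean exposes must be convertible to an open type, and a protobuf-generated class (or any class without JavaBean getters) is not. A small sketch reproducing the same failure mode, with a hypothetical stand-in for ServerLoad (illustrative only, not the HBase MXBean):

```java
import java.lang.management.ManagementFactory;
import java.util.Collections;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.NotCompliantMBeanException;
import javax.management.ObjectName;

public class MXBeanOpenTypeDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.ServerLoad: no
    // getters, so JMX cannot translate it into a CompositeData open type.
    public static class OpaqueLoad {
        final int requests;
        public OpaqueLoad(int requests) { this.requests = requests; }
    }

    public interface ClusterMXBean {
        Map<String, OpaqueLoad> getRegionServers();
    }

    public static class Cluster implements ClusterMXBean {
        public Map<String, OpaqueLoad> getRegionServers() {
            return Collections.singletonMap("rs1", new OpaqueLoad(42));
        }
    }

    // Returns true if registration fails with NotCompliantMBeanException,
    // which is the HBASE-5971 symptom.
    public static boolean registrationFails() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            server.registerMBean(new Cluster(),
                new ObjectName("demo:type=Cluster"));
            return false;
        } catch (NotCompliantMBeanException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registrationFails());
    }
}
```

The fix direction taken in HBASE-6411 follows from this: expose only open-type-friendly values (strings, numbers, and maps of them) through the bean instead of the pb-based ServerLoad itself.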
[jira] [Updated] (HBASE-6413) Investigate having hbase-env.sh decide which hadoop-compat to include
[ https://issues.apache.org/jira/browse/HBASE-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6413: - Component/s: metrics Investigate having hbase-env.sh decide which hadoop-compat to include - Key: HBASE-6413 URL: https://issues.apache.org/jira/browse/HBASE-6413 Project: HBase Issue Type: Sub-task Components: metrics Reporter: Elliott Clark Allow for one package to be created with both compat jars in and have hbase-env load the correct one. This would simplify shipping tar.gz's
[jira] [Updated] (HBASE-6408) Naming and documenting of the hadoop-metrics2.properties file
[ https://issues.apache.org/jira/browse/HBASE-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6408: - Component/s: metrics Naming and documenting of the hadoop-metrics2.properties file - Key: HBASE-6408 URL: https://issues.apache.org/jira/browse/HBASE-6408 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6408-0.patch hadoop-metrics2.properties is currently where metrics2 loads its sinks. This file could be better named hadoop-hbase-metrics2.properties. In addition, it needs examples like the current hadoop-metrics.properties has.
[jira] [Updated] (HBASE-6412) Move external servers to metrics2 (thrift,thrift2,rest)
[ https://issues.apache.org/jira/browse/HBASE-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6412: - Component/s: metrics Move external servers to metrics2 (thrift,thrift2,rest) --- Key: HBASE-6412 URL: https://issues.apache.org/jira/browse/HBASE-6412 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6412-0.patch, HBASE-6412-1.patch, HBASE-6412-2.patch, HBASE-6412-3.patch, HBASE-6412-4.patch, HBASE-6412-5.patch Implement metrics2 for all the external servers: * Thrift * Thrift2 * Rest
[jira] [Updated] (HBASE-6410) Move RegionServer Metrics to metrics2
[ https://issues.apache.org/jira/browse/HBASE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6410: - Component/s: metrics Move RegionServer Metrics to metrics2 - Key: HBASE-6410 URL: https://issues.apache.org/jira/browse/HBASE-6410 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Alex Baranau Priority: Blocker Attachments: HBASE-6410.patch Move RegionServer Metrics to metrics2
[jira] [Updated] (HBASE-6717) Remove hadoop-metrics.properties when everything has moved over.
[ https://issues.apache.org/jira/browse/HBASE-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6717: - Component/s: metrics Remove hadoop-metrics.properties when everything has moved over. Key: HBASE-6717 URL: https://issues.apache.org/jira/browse/HBASE-6717 Project: HBase Issue Type: Sub-task Components: metrics Reporter: Elliott Clark Assignee: Elliott Clark
[jira] [Updated] (HBASE-6409) Create histogram class for metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6409: - Component/s: metrics Create histogram class for metrics 2 Key: HBASE-6409 URL: https://issues.apache.org/jira/browse/HBASE-6409 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6409-0.patch, HBASE-6409-1.patch, HBASE-6409-2.patch, HBASE-6409-3.patch, HBASE-6409-4.patch Create the replacement for MetricsHistogram and PersistantTimeVaryingRate classes.
[jira] [Commented] (HBASE-4366) dynamic metrics logging
[ https://issues.apache.org/jira/browse/HBASE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454716#comment-13454716 ] Elliott Clark commented on HBASE-4366: -- Seems like this has been addressed in 0.94+: we now have per-region metrics, per-CF metrics, and per-block-type metrics. Are there other requirements or has this been completed? dynamic metrics logging --- Key: HBASE-4366 URL: https://issues.apache.org/jira/browse/HBASE-4366 Project: HBase Issue Type: New Feature Components: metrics Reporter: Ming Ma Assignee: Ming Ma First, if there is an existing solution for this, I would close this jira. Also I realize we already have various overlapping solutions; creating another solution isn't necessarily the best approach. However, I couldn't find anything that can meet the need. So I open this jira for discussion. We have some scenarios in hbase/mapreduce/hdfs that require logging a large number of dynamic metrics. They can be used for troubleshooting, better measurement on the system, and scorecards. For example, 1. HBase. Get metrics such as requests per sec that are specific to a table or column family. 2. Mapreduce job history analysis. Would like to find out all the job ids that are submitted, completed, etc. in a specific time window. For troubleshooting, what people usually do today: 1) Use current machine-level metrics to find out which machine has the issue. 2) Go to that machine and analyze the local log. The characteristics of this kind of metrics: 1. It isn't something that can be predefined. The key such as table name or job id is dynamic. 2. The number of such metrics could be much larger than what the current metrics framework can handle. 3. We don't have a scenario that requires near-real-time query support; e.g., the time from when the metric is generated to when it is available to query can be as long as an hour. 4. How data is consumed is highly application specific. Some ideas: 1.
Provide some interface for any application to log data. 2. The metrics can be written to log files. The log files or log entries will be loaded to HBase or HDFS asynchronously. That could go to a separate cluster. 3. To consume such data, applications could run a MapReduce job on the log files for aggregation, or do random reads directly from HBase. Comments?
[jira] [Updated] (HBASE-6261) Better approximate high-percentile percentile latency metrics
[ https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6261: - Component/s: metrics Better approximate high-percentile percentile latency metrics - Key: HBASE-6261 URL: https://issues.apache.org/jira/browse/HBASE-6261 Project: HBase Issue Type: New Feature Components: metrics Reporter: Andrew Wang Assignee: Andrew Wang Labels: metrics Attachments: Latencyestimation.pdf, MetricsHistogram.data, parse.py, SampleQuantiles.data The existing reservoir-sampling based latency metrics in HBase are not well-suited for providing accurate estimates of high-percentile (e.g. 90th, 95th, or 99th) latency. This is a well-studied problem in the literature (see [1] and [2]), the question is determining which methods best suit our needs and then implementing it. Ideally, we should be able to estimate these high percentiles with minimal memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or .1% on 99th). It's also desirable to provide this over different time-based sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour. I'll note that this would also be useful in HDFS, or really anywhere latency metrics are kept. [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf
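For context on why reservoir sampling struggles at the tail: a nearest-rank percentile from a fixed-size sample rests on very few observations above, say, the 99th percentile (about 10 points in a 1024-entry reservoir), so one outlier moves the estimate wildly. A minimal empirical-percentile sketch (illustrative only, not the HBase MetricsHistogram code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PercentileDemo {
    // Nearest-rank percentile over a sample of latencies (p in 0..100).
    static long percentile(List<Long> sample, double p) {
        List<Long> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank - 1, 0));
    }

    public static void main(String[] args) {
        List<Long> latencies = new ArrayList<>();
        for (long i = 1; i <= 100; i++) latencies.add(i);
        // With 100 samples, the 99th percentile is decided by a single
        // observation -- the weakness HBASE-6261 wants to fix with
        // streaming quantile estimators over sliding windows.
        System.out.println(percentile(latencies, 50) + " " + percentile(latencies, 99));
    }
}
```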
[jira] [Commented] (HBASE-6500) hbck comlaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454770#comment-13454770 ] liuli commented on HBASE-6500: -- Mr. Sorry for wrong type. hbck comlaining, Exception in thread main java.lang.NullPointerException --- Key: HBASE-6500 URL: https://issues.apache.org/jira/browse/HBASE-6500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0 Environment: Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 Reporter: liuli I met problem with starting Hbase: I have 5 machines (Ubuntu) 109.123.121.23 rsmm-master.example.com 109.123.121.24 rsmm-slave-1.example.com 109.123.121.25 rsmm-slave-2.example.com 109.123.121.26 rsmm-slave-3.example.com 109.123.121.27 rsmm-slave-4.example.com Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 After starting HBase, running hbck hduser@rsmm-master:~/hbase/bin$ ./hbase hbck 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Commented] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454768#comment-13454768 ] liuli commented on HBASE-6500: -- Ms. @Lars Hofhansl, yes, it perfectly solves this issue. You can close this ticket now.
hbck complaining, Exception in thread main java.lang.NullPointerException
---
Key: HBASE-6500
URL: https://issues.apache.org/jira/browse/HBASE-6500
Project: HBase
Issue Type: Bug
Components: hbck
Affects Versions: 0.94.0
Environment: Hadoop 0.20.205.0, Zookeeper: zookeeper-3.3.5.jar, HBase: hbase-0.94.0
Reporter: liuli
I met a problem when starting HBase. I have 5 machines (Ubuntu):
109.123.121.23 rsmm-master.example.com
109.123.121.24 rsmm-slave-1.example.com
109.123.121.25 rsmm-slave-2.example.com
109.123.121.26 rsmm-slave-3.example.com
109.123.121.27 rsmm-slave-4.example.com
Hadoop 0.20.205.0, Zookeeper: zookeeper-3.3.5.jar, HBase: hbase-0.94.0
After starting HBase, running hbck:
hduser@rsmm-master:~/hbase/bin$ ./hbase hbck
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Commented] (HBASE-6528) Raise the wait time for TestSplitLogWorker#testAcquireTaskAtStartup to reduce the failure probability
[ https://issues.apache.org/jira/browse/HBASE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454784#comment-13454784 ] ShiXing commented on HBASE-6528: [~lhofhansl] this test case was introduced when I fixed HBASE-6520, which does not have any relationship with TestSplitLogWorker#testAcquireTaskAtStartup. I haven't seen it fail so far.
Raise the wait time for TestSplitLogWorker#testAcquireTaskAtStartup to reduce the failure probability
-
Key: HBASE-6528
URL: https://issues.apache.org/jira/browse/HBASE-6528
Project: HBase
Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
Attachments: HBASE-6528-trunk-v1.patch
In TestSplitLogWorker, only testAcquireTaskAtStartup waits 100ms; the other test cases wait 1000ms. 100ms is short and sometimes causes testAcquireTaskAtStartup to fail.
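As an aside, the usual alternative to raising a fixed sleep is to poll with a deadline, so the full timeout is only paid when the condition genuinely never becomes true. A minimal sketch (hypothetical helper, not the actual TestSplitLogWorker code):

```java
import java.util.function.BooleanSupplier;

// Poll a condition until it holds or a deadline passes. The timeout becomes
// a worst case instead of a constant cost, which is the usual fix for
// flaky too-short waits like the 100ms discussed above.
class WaitFor {
    static boolean waitFor(long timeoutMs, BooleanSupplier cond)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true; // condition met early; no need to sleep longer
            }
            Thread.sleep(10); // small poll interval
        }
        return cond.getAsBoolean(); // one final check at the deadline
    }
}
```

With this pattern a 1000ms deadline costs almost nothing in the common case where the task is acquired within a few milliseconds.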
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454825#comment-13454825 ] Hudson commented on HBASE-6658: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #171 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/171/]) HBASE-6658 Rename WritableByteArrayComparable to something not mentioning Writable (Revision 1384191) Result = FAILURE gchanan : Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BinaryComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BitComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ByteArrayComparable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/CompareFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/DependentColumnFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FamilyFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/NullComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ParseFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/QualifierFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/RowFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SubstringComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ValueFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/WritableByteArrayComparable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFakeKeyInFilter.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java
Rename WritableByteArrayComparable to something not mentioning Writable
---
Key: HBASE-6658
URL: https://issues.apache.org/jira/browse/HBASE-6658
Project: HBase
Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-6658.patch, HBASE-6658-v3.patch, HBASE-6658-v4.patch, HBASE-6658-v5.patch, HBASE-6658-v6.patch
After HBASE-6477, WritableByteArrayComparable will no longer be Writable, so it should be renamed.
Current idea is ByteArrayComparator (since all the derived classes are *Comparator not *Comparable), but I'm open to suggestions.
[jira] [Updated] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated HBASE-6299: --- Attachment: HBASE-6299-v3.patch Considering that a live RS would most likely get to the openRegion() request eventually and process it, it might be best just to return on SocketTimeoutException, since a SocketTimeoutException indicates an uncertain state in the assign process, with potential race conditions. This can happen if an RS temporarily runs out of IPC handlers, or if the RS's response is lost on the wire.
RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
-
Key: HBASE-6299
URL: https://issues.apache.org/jira/browse/HBASE-6299
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually with success. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN
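The return-on-SocketTimeoutException rationale above can be sketched as follows (illustrative Java; the class, method, and enum names are hypothetical, not the actual HBASE-6299 patch):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Sketch of the key distinction: a SocketTimeoutException leaves the open
// request in an UNCERTAIN state (the RS may still complete the open), so the
// master must not immediately reassign; a definite IOException is safe to retry.
class AssignSketch {
    interface RegionServerStub {
        void openRegion(String region) throws IOException;
    }

    enum Outcome { OPENED_OR_PENDING, REASSIGN_ELSEWHERE }

    static Outcome assign(String region, RegionServerStub rs) {
        try {
            rs.openRegion(region);
            return Outcome.OPENED_OR_PENDING;
        } catch (SocketTimeoutException e) {
            // Uncertain: the request may already be queued on the RS.
            // Returning here avoids the double-assignment race described above.
            return Outcome.OPENED_OR_PENDING;
        } catch (IOException e) {
            // Definite failure: safe to choose another server.
            return Outcome.REASSIGN_ELSEWHERE;
        }
    }
}
```

The design point is that after a timeout the normal ZK transition (or a timeout monitor) drives the next step, rather than the master guessing that the open failed.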
[jira] [Commented] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454840#comment-13454840 ] Maryann Xue commented on HBASE-6299: Updated the patch as HBASE-6299-v3.patch.
RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
-
Key: HBASE-6299
URL: https://issues.apache.org/jira/browse/HBASE-6299
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually with success. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
[jira] [Created] (HBASE-6772) Make the Distributed Split HDFS Location aware
nkeywal created HBASE-6772: -- Summary: Make the Distributed Split HDFS Location aware
Key: HBASE-6772
URL: https://issues.apache.org/jira/browse/HBASE-6772
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
During an hlog split, each log file (a single hdfs block) is allocated to a different region server. This region server reads the file and creates the recovery edit files. The allocation to a region server is random. We could take into account the locations of the log file to split:
- the reads would be local, hence faster; this allows short-circuit reads as well.
- less network i/o is used during a failure (and this is important).
- we would be sure to read from a working datanode, hence we're sure we won't have read errors. Read errors slow the split process a lot, as we often run into the timeout-handling path.
We need to limit the calls to the namenode, however. A typical algorithm could be:
- the master gets the locations of the hlog files.
- it writes them into ZK, if possible in one transaction (this way all the tasks are visible all together, allowing some arbitrage by the region servers).
- when a region server receives the event, it checks all logs and all locations.
- if there is a match, it takes the task.
- if not, it waits something like 0.2s (to give other region servers time to take the task if their location matches), and then takes any remaining task.
Drawbacks are:
- a 0.2s delay added if there is no region server available at any of the locations. It's likely possible to remove it with some extra synchronization.
- a small increase in complexity and a dependency on HDFS.
Considering the advantages, it's worth it imho.
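The location-matching step of the proposed algorithm might look roughly like this (illustrative Java; all names are hypothetical, and the 0.2s grace period is elided):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the selection rule: a region server prefers split tasks whose
// HDFS block locations include itself, and only falls back to a non-local
// task (after the proposed ~0.2s grace period, omitted here) when nothing matches.
class SplitTaskPicker {
    static String pickTask(String self, List<String> tasks,
                           Map<String, Set<String>> locations) {
        // First pass: a task whose block replicas include this server,
        // so the hlog read is local (and can use short-circuit reads).
        for (String task : tasks) {
            if (locations.getOrDefault(task, Set.of()).contains(self)) {
                return task;
            }
        }
        // No local match: the real proposal would wait ~0.2s here to let a
        // local server grab the task first, then take any remaining one.
        return tasks.isEmpty() ? null : tasks.get(0);
    }
}
```

Publishing all task locations to ZK in one transaction is what makes this arbitrage possible: every region server sees the full set of tasks before choosing.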
[jira] [Resolved] (HBASE-6536) [replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
[ https://issues.apache.org/jira/browse/HBASE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6536. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
-
Key: HBASE-6536
URL: https://issues.apache.org/jira/browse/HBASE-6536
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster, because the slave can not parse the hlog entry buffer.
[jira] [Resolved] (HBASE-6535) [replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
[ https://issues.apache.org/jira/browse/HBASE-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6535. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
-
Key: HBASE-6535
URL: https://issues.apache.org/jira/browse/HBASE-6535
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster, because the slave can not parse the hlog entry buffer.
[jira] [Commented] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454869#comment-13454869 ] Hadoop QA commented on HBASE-6299: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544974/HBASE-6299-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2857//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2857//console This message is automatically generated. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. 
However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
[jira] [Resolved] (HBASE-6534) [replication] replication will be blocked if WAL compression is set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6534. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave configuration
-
Key: HBASE-6534
URL: https://issues.apache.org/jira/browse/HBASE-6534
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster.
2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster: java.io.IOException: IPC server unable to read call parameters: Error in readFields
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: Error in readFields
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
at $Proxy13.replicateLogEntries(Unknown Source)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
... 1 more
This is because the slave cluster can not parse the hlog entry.
2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89 java.io.IOException: Error in readFields
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
... 11 more
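The failure mode generalizes beyond HBase: a writer and a reader that disagree about compression cannot exchange records. A self-contained illustration using plain java.util.zip (not HBase code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Bytes written through a compressing stream are opaque to a reader that
// expects the plain format, just as a slave without WAL compression cannot
// parse compressed hlog entries shipped by the master.
class MismatchDemo {
    static byte[] writeCompressed(String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // A reader configured WITHOUT compression: sees raw deflate bytes.
    static String readPlain(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A reader configured WITH compression: recovers the payload.
    static String readCompressed(byte[] bytes) throws IOException {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In the HBase case the mismatch surfaces further down the stack, as the EOFException inside KeyValue.readFields shown in the trace above, but the root cause is the same: both ends of the pipe must agree on the wire format.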
[jira] [Commented] (HBASE-6533) [replication] replication will block if WAL compression is set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454872#comment-13454872 ] Michael Drzal commented on HBASE-6533: -- [~terry_zhang] I've cleaned up the duplicate issues for you. [replication] replication will block if WAL compress set differently in master and slave configuration -- Key: HBASE-6533 URL: https://issues.apache.org/jira/browse/HBASE-6533 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.94.0 Reporter: terry zhang Assignee: terry zhang Priority: Critical Fix For: 0.94.3 Attachments: hbase-6533.patch as we know in hbase 0.94.0 we have a configuration below property namehbase.regionserver.wal.enablecompression/name valuetrue/value /property if we enable it in master cluster and disable it in slave cluster . Then replication will not work. It will throw unwrapRemoteException again and again in master cluster. 2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster: java.io.IOException: IPC server unable to read call parameters: Error in readFields at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365) Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call 
parameters: Error in readFields at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151) at $Proxy13.replicateLogEntries(Unknown Source) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616) ... 1 more This is because Slave cluster can not parse the hlog entry . 2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89 java.io.IOException: Error in readFields at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254) at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146) at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682) ... 
11 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
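The configuration quoted in the description above lost its XML tags in extraction; restored, the 0.94 hbase-site.xml property the issue refers to reads:

```xml
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
```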
[jira] [Commented] (HBASE-6533) [replication] replication will block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454873#comment-13454873 ] Michael Drzal commented on HBASE-6533: -- [~jdcryans] should we just close this out since the real fix is HBASE-5778?
[jira] [Commented] (HBASE-6563) s.isMajorCompaction() throws npe will cause current major Compaction checking abort
[ https://issues.apache.org/jira/browse/HBASE-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454875#comment-13454875 ] Michael Drzal commented on HBASE-6563: -- [~zhou wenjian] any response to Ted's comments? s.isMajorCompaction() throws npe will cause current major Compaction checking abort --- Key: HBASE-6563 URL: https://issues.apache.org/jira/browse/HBASE-6563 Project: HBase Issue Type: Bug Components: regionserver Reporter: Zhou wenjian Assignee: Zhou wenjian Fix For: 0.94.1 Attachments: HBASE-6563-trunk.patch, HBASE-6563-trunk-v2.patch, HBASE-6563-trunk-v3.patch 2012-05-05 00:49:43,265 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:938) at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:917) at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:3250) at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1222) at org.apache.hadoop.hbase.Chore.run(Chore.java:66) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
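The chore above dies on the first NullPointerException from Store.isMajorCompaction(), aborting the check for every remaining region. A minimal, hypothetical sketch of the defensive pattern — not the actual HBase patch; `Region` and the method here are stand-ins — would catch per-region failures and keep going:

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionCheckSketch {
    interface Region {
        // may throw NPE, e.g. if store files vanished mid-check
        boolean isMajorCompaction();
    }

    // Returns how many regions were successfully checked; a failure on one
    // region is swallowed (in real code: logged) instead of aborting the pass.
    static int checkAll(List<Region> regions) {
        int checked = 0;
        for (Region r : regions) {
            try {
                r.isMajorCompaction();
                checked++;
            } catch (RuntimeException e) {
                // log and continue with the remaining regions
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(() -> true);
        regions.add(() -> { throw new NullPointerException(); });
        regions.add(() -> false);
        System.out.println(checkAll(regions)); // prints 2
    }
}
```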
[jira] [Commented] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454876#comment-13454876 ] Michael Drzal commented on HBASE-6564: -- [~zhi...@ebaysf.com] can we close this out? HDFS space is not reclaimed when a column family is deleted --- Key: HBASE-6564 URL: https://issues.apache.org/jira/browse/HBASE-6564 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.1 Reporter: J Mohamed Zahoor Assignee: J Mohamed Zahoor Priority: Minor Attachments: HBASE-6564-trunk.patch, HBASE-6564-v2.patch, HBASE-6564-v3.patch, HBASE-6564-v4.patch, HBASE-6564-v5.patch When a column family of a table is deleted, the HDFS space of the column family does not seem to be reclaimed even after a major compaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454877#comment-13454877 ] J Mohamed Zahoor commented on HBASE-6564: - Yes. I guess.
[jira] [Updated] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
[ https://issues.apache.org/jira/browse/HBASE-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6583: - Labels: noob (was: ) Enhance Hbase load test tool to automatically create cf's if not present Key: HBASE-6583 URL: https://issues.apache.org/jira/browse/HBASE-6583 Project: HBase Issue Type: Bug Components: test Reporter: Karthik Ranganathan Labels: noob The load test tool currently disables the table and applies any changes to the cf descriptor if any, but does not create the cf if not present. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6564: -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454885#comment-13454885 ] Michael Drzal commented on HBASE-6591: -- [~gchanan] just to make sure I understand you correctly, you would want metrics at the regionserver level along the lines of: * checkAndPutSuccesses * checkAndPutFailures * checkAndDeleteSuccesses * checkAndDeleteFailures Would they actually be helpful at the regionserver level or would you need them more granular? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
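The four counters proposed in the comment above could be kept as simple atomic longs on the regionserver, incremented from the boolean that checkAndPut/checkAndDelete return. This is only an illustrative sketch using the hypothetical counter names from the comment, not HBase's actual metrics framework:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckAndMutateMetrics {
    // Names follow the comment's proposal; they are not real HBase metrics.
    final AtomicLong checkAndPutSuccesses = new AtomicLong();
    final AtomicLong checkAndPutFailures = new AtomicLong();
    final AtomicLong checkAndDeleteSuccesses = new AtomicLong();
    final AtomicLong checkAndDeleteFailures = new AtomicLong();

    // Record the boolean checkAndPut returns: true means the put was
    // executed, false means the check failed and nothing was written.
    void recordCheckAndPut(boolean executed) {
        (executed ? checkAndPutSuccesses : checkAndPutFailures).incrementAndGet();
    }

    void recordCheckAndDelete(boolean executed) {
        (executed ? checkAndDeleteSuccesses : checkAndDeleteFailures).incrementAndGet();
    }

    public static void main(String[] args) {
        CheckAndMutateMetrics m = new CheckAndMutateMetrics();
        m.recordCheckAndPut(true);
        m.recordCheckAndPut(true);
        m.recordCheckAndPut(false);
        // prints 2/1
        System.out.println(m.checkAndPutSuccesses.get() + "/" + m.checkAndPutFailures.get());
    }
}
```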
[jira] [Created] (HBASE-6773) Make the dfs replication factor configurable per table
nkeywal created HBASE-6773: -- Summary: Make the dfs replication factor configurable per table Key: HBASE-6773 URL: https://issues.apache.org/jira/browse/HBASE-6773 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Today, it's an application-level configuration, so all the HFiles are replicated 3 times by default. There are several reasons to make it per table:
- some tables are critical while others are not. For example, meta would benefit from a higher level of replication, to ensure we keep working even when we lose 20% of the cluster.
- some tables are backed up somewhere else, or used by non-essential processes, so the user may accept a lower level of replication for them.
- it should be a dynamic parameter. For example, during a bulk load we set a replication of 1 or 2, then we increase it. It's in the same space as disabling the WAL for some writes.
The case that seems important to me is meta. We could also handle that one with a specific parameter in the usual hbase-site.xml if we don't want a generic solution.
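The per-table override with a cluster-wide default that the proposal describes boils down to a lookup with a fallback. A toy sketch — table names and factors here are invented, and this is not an HBase API:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicationFactorSketch {
    // The cluster-wide default that dfs.replication gives today.
    static final short DEFAULT_DFS_REPLICATION = 3;

    private final Map<String, Short> perTable = new HashMap<>();

    // A dynamic per-table override, e.g. lowered during a bulk load
    // and raised afterwards.
    void setReplication(String table, short factor) {
        perTable.put(table, factor);
    }

    short replicationFor(String table) {
        return perTable.getOrDefault(table, DEFAULT_DFS_REPLICATION);
    }

    public static void main(String[] args) {
        ReplicationFactorSketch s = new ReplicationFactorSketch();
        s.setReplication(".META.", (short) 5); // critical table: more replicas
        // prints 5 3
        System.out.println(s.replicationFor(".META.") + " " + s.replicationFor("events"));
    }
}
```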
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454899#comment-13454899 ] ramkrishna.s.vasudevan commented on HBASE-6698: --- [~saint@gmail.com] Is this patch fine, Stack? Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation only. This will help in the future: if we have some hooks and the CP handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete.
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454905#comment-13454905 ] ramkrishna.s.vasudevan commented on HBASE-6299: --- @Maryann Thanks for the patch. This is what we just discussed in HBASE-6438. Please take a look at that patch also. We could actually merge both if you feel it is fine, and commit them once others review it. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since HMaster's OpenedRegionHandler has been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code} 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. 
from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301) 2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454906#comment-13454906 ] ramkrishna.s.vasudevan commented on HBASE-6438: --- @Lars/@Ted Maryann has come up with a patch for HBASE-6299 where there is no retry on SocketTimeout. Maybe, if Maryann is fine with it, we can merge both, or we can handle HBASE-6438 separately. RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason like a master restart or an external assign call, we try to assign a region that is already getting opened on an RS. The next call to assign has already changed the state of the znode, so the current assign going on in the RS is affected and it fails. The second assignment that started also fails, getting a RAITE exception. In the end, neither assignment carries on. The idea is to find out whether any such RAITE exception can be retried or not. Here again we have the following cases:
- The znode is yet to be transitioned from OFFLINE to OPENING in the RS.
- The RS may be in the openRegion step.
- The RS may be trying to transition OPENING to OPENED.
- The region is yet to be added to the online regions on the RS side.
For any failure in openRegion() and updateMeta() we move the znode to FAILED_OPEN, so in these cases getting a RAITE should be ok. But in the other cases the assignment is stopped. The idea is to add the current state of the region assignment to the RIT map on the RS side; using that info we can determine whether the assignment can be retried or not on getting a RAITE. Considering the current work going on in the AM, please do share whether this is needed at least in the 0.92/0.94 versions.
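Reduced to a sketch, the proposal is: the RS records how far the open has progressed, and the master uses that state to decide whether an assign that hit RAITE is safe to retry. The state names and the retry rule below are illustrative simplifications, not actual HBase identifiers:

```java
public class RaiteRetrySketch {
    // Rough stages of a region open on the RS, per the issue description.
    enum OpenState { ZNODE_OFFLINE, OPENING_REGION, TRANSITIONING_TO_OPENED, ONLINE }

    // Failures before the region is fully online move the znode to
    // FAILED_OPEN on the RS, so a fresh assign is safe to retry; once the
    // region is online, retrying would double-assign it.
    static boolean canRetryAssign(OpenState state) {
        return state != OpenState.ONLINE;
    }

    public static void main(String[] args) {
        System.out.println(canRetryAssign(OpenState.OPENING_REGION)); // prints true
        System.out.println(canRetryAssign(OpenState.ONLINE)); // prints false
    }
}
```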
[jira] [Created] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
nkeywal created HBASE-6774: -- Summary: Immediate assignment of regions that don't have entries in HLog Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal The algorithm today, after a failure detection, is:
- split the logs
- when all the logs are split, assign the regions
But some regions can have no entries at all in the HLog. There are many reasons for this:
- reference or historical tables: bulk-written sometimes, then read-only.
- sequential rowkeys. In this case, most of the regions will be read-only, but they can be on a regionserver with a lot of writes.
- tables flushed often for safety reasons. I'm thinking about meta here. For meta, we can imagine flushing very often. Hence the recovery for meta, in many cases, will be just the failure detection time.
There are different possible algorithms:
Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; adds a read.
Option 2) The master writes in ZK the number of log files per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows the assign to start even if not all the HLogs are finished, and would allow making some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server. Requires reading the whole file before starting to write.
Option 3) Add some metadata at the end of the log file. The last log file won't have metadata, since if we are recovering it's because the server crashed. But the others will. And the last log file should be smaller (half a block on average).
Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much, we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage.
I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread.
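Option 2 above is essentially a per-region countdown. A toy sketch, with an in-memory map standing in for ZooKeeper and all names invented:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyAssignSketch {
    // region name -> number of HLog files that may still hold edits for it
    // (in the proposal this counter would live in ZK, written by the master)
    private final ConcurrentMap<String, AtomicInteger> pendingLogs = new ConcurrentHashMap<>();

    void registerRegion(String region, int logFileCount) {
        pendingLogs.put(region, new AtomicInteger(logFileCount));
    }

    // A splitter calls this after confirming a given log file holds no
    // edits for the region; returns true once no remaining log file can
    // contain edits for it, i.e. the region is safe to assign early.
    boolean logFileCleared(String region) {
        return pendingLogs.get(region).decrementAndGet() == 0;
    }

    public static void main(String[] args) {
        EarlyAssignSketch s = new EarlyAssignSketch();
        s.registerRegion("refTable,row0", 2);
        System.out.println(s.logFileCleared("refTable,row0")); // prints false
        System.out.println(s.logFileCleared("refTable,row0")); // prints true
    }
}
```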
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454911#comment-13454911 ] nkeywal commented on HBASE-5843: Test with meta: on a real cluster, 3 nodes, dfs.replication = 2, local HD. Start with 2 DNs and 2 RSs. Create a table with 100 regions on the second one. The first holds meta and root. Start another box with a DN and an RS. This box is empty (no regions, no blocks). Unplug the box with meta and root, then try to create a table. The time taken is the recovery time of the box holding meta. No bad surprise. It also means that with the default ZooKeeper timeout, you're losing the cluster for 3 minutes if your meta regionserver dies. HBASE-6772, HBASE-6773 and HBASE-6774 would help to increase meta failure resiliency. Improve HBase MTTR - Mean Time To Recover - Key: HBASE-5843 URL: https://issues.apache.org/jira/browse/HBASE-5843 Project: HBase Issue Type: Umbrella Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit The ideal target is:
- failures impact client applications only by an added delay to execute a query, whatever the failure.
- this delay is always under 1 second.
We're not going to achieve that immediately... Priority will be given to the most frequent issues. Short term:
- software crashes
- standard administrative tasks such as stop/start of a cluster.
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454929#comment-13454929 ] Ted Yu commented on HBASE-6438: --- I think separating the fix would make discussion easier. Thanks
[jira] [Commented] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454954#comment-13454954 ] Lars Hofhansl commented on HBASE-6500: -- Heh, no problem English is not my native language either. hbck comlaining, Exception in thread main java.lang.NullPointerException --- Key: HBASE-6500 URL: https://issues.apache.org/jira/browse/HBASE-6500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0 Environment: Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 Reporter: liuli I met problem with starting Hbase: I have 5 machines (Ubuntu) 109.123.121.23 rsmm-master.example.com 109.123.121.24 rsmm-slave-1.example.com 109.123.121.25 rsmm-slave-2.example.com 109.123.121.26 rsmm-slave-3.example.com 109.123.121.27 rsmm-slave-4.example.com Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 After starting HBase, running hbck hduser@rsmm-master:~/hbase/bin$ ./hbase hbck 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Resolved] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6500. -- Resolution: Duplicate Closing as duplicate of HBASE-6464
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454956#comment-13454956 ] Lars Hofhansl commented on HBASE-6438: -- I'm fine either way. 0.94.2RC0 is not spun yet. If we can get this in quickly I can pull it into that RC. RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason like a master restart or an external assign call, we try to assign a region that is already getting opened in a RS. Now the next call to assign has already changed the state of the znode, so the current assign that is going on in the RS is affected and it fails. The second assignment that started also fails, getting a RAITE exception. In the end, neither assignment carries on. The idea is to find whether any such RAITE exception can be retried or not. Here again we have the following cases: - The znode is yet to be transitioned from OFFLINE to OPENING in the RS - The RS may be in the step of openRegion. - The RS may be trying to transition OPENING to OPENED. - The region is yet to be added to the online regions on the RS side. In openRegion() and updateMeta(), on any failure we move the znode to FAILED_OPEN, so in these cases getting a RAITE should be ok. But in the other cases the assignment is stopped.
The idea is to just add the current state of the region assignment in the RIT map in the RS side and using that info we can determine whether the assignment can be retried or not on getting an RAITE. Considering the current work going on in AM, pls do share if this is needed atleast in the 0.92/0.94 versions? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454957#comment-13454957 ] Lars Hofhansl commented on HBASE-6299: -- Do we still need to unwrap the exception? RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attempts to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it. 6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created. 
{code} 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. 
from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301) 2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
[jira] [Updated] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6769: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Thanks for the patch, Elliott! HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression; since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on to the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This has been present since 0.94.0. Assigning to Elliott because he asked.
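The committed fix revolves around the multi path no longer swallowing the per-action cause. As a hedged illustration of the general idea (plain Java, not the actual HBase patch; all class and method names here are hypothetical stand-ins), a batch call can record the concrete exception for each failed action so the client-side summary can name the real cause instead of a generic failure count:

```java
import java.util.*;

// Illustrative sketch only: a batch call that keeps the concrete exception
// for each failed action instead of collapsing everything into a generic
// DoNotRetryIOException, so the client can report the real cause.
public class BatchErrorDemo {
    // Hypothetical stand-in for HBase's NoSuchColumnFamilyException.
    static class NoSuchColumnFamilyException extends RuntimeException {
        NoSuchColumnFamilyException(String msg) { super(msg); }
    }

    // Sanity check for one action: rejects unknown column families.
    static void checkFamily(String family, Set<String> knownFamilies) {
        if (!knownFamilies.contains(family)) {
            throw new NoSuchColumnFamilyException(
                "Column family " + family + " does not exist");
        }
    }

    // Per-action results: null on success, the concrete exception on failure.
    static List<Throwable> multi(List<String> families, Set<String> known) {
        List<Throwable> results = new ArrayList<>();
        for (String f : families) {
            try {
                checkFamily(f, known);
                results.add(null);   // action succeeded
            } catch (RuntimeException e) {
                results.add(e);      // keep the cause, do not swallow it
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("cf1"));
        List<Throwable> r = multi(Arrays.asList("cf1", "bogus"), known);
        // The failed action carries its real exception type and message.
        System.out.println(r.get(1).getClass().getSimpleName());
    }
}
```

With per-action results like this, a client-side retries-exhausted summary can print the exception class per failed action rather than forcing the user to dig through server logs.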
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6299: - Fix Version/s: 0.94.3 0.96.0 RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6299: -- Fix Version/s: 0.92.3 RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454968#comment-13454968 ] Gregory Chanan commented on HBASE-6591: --- Michael, Those are the metrics I was thinking. What do you think about granularity? regionserver level? region level? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454969#comment-13454969 ] Ted Yu commented on HBASE-6299: --- nit: {code} +else if (t instanceof java.net.SocketTimeoutException {code} 'else' keyword is not needed above. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
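Ted's style nit above can be shown with a minimal sketch (hypothetical names, not the actual patch code): when the preceding branch ends in a return, a trailing `else` adds nothing, so a flat chain of `if` statements behaves identically and reads more cleanly:

```java
// Illustrative sketch of the review comment: each branch returns, so no
// "else" is needed before the next "if". Names are hypothetical.
public class RetryDecisionDemo {
    static boolean isRetriable(Throwable t) {
        if (t instanceof java.net.SocketTimeoutException) {
            return true;  // the ack may simply have been lost; retrying is safe
        }
        // No "else" needed here: the branch above already returned.
        if (t instanceof IllegalStateException) {
            return false; // a state problem will not go away on retry
        }
        return false;     // default: do not retry unknown failures
    }

    public static void main(String[] args) {
        System.out.println(isRetriable(new java.net.SocketTimeoutException()));
        System.out.println(isRetriable(new IllegalStateException()));
    }
}
```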
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454974#comment-13454974 ] Michael Drzal commented on HBASE-6591: -- I don't know. You'd have to tell me what would work for your use case. If you track it at the regionserver level, you could end up with multiple regions affecting this counter. I'm not sure if you'd end up with valuable data in that case. checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
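One way to keep both views from the granularity discussion above is to count at region level and derive the regionserver-level figure by aggregation. A minimal sketch of that shape, with hypothetical names and no dependence on the actual HBase metrics API:

```java
import java.util.*;

// Illustrative sketch: per-region checkAndPut success/failure counters,
// with the regionserver-level numbers derived by summing across regions.
// Class and method names are hypothetical, not HBase's metrics classes.
public class CheckAndPutMetricsDemo {
    static class Counter { long succeeded; long failed; }

    private final Map<String, Counter> byRegion = new HashMap<>();

    // Record one checkAndPut outcome for a region: applied == true means
    // the check passed and the put was executed.
    void record(String region, boolean applied) {
        Counter c = byRegion.computeIfAbsent(region, k -> new Counter());
        if (applied) c.succeeded++; else c.failed++;
    }

    // Regionserver-level view: aggregate over all hosted regions.
    long serverSucceeded() {
        return byRegion.values().stream().mapToLong(c -> c.succeeded).sum();
    }

    long serverFailed() {
        return byRegion.values().stream().mapToLong(c -> c.failed).sum();
    }

    public static void main(String[] args) {
        CheckAndPutMetricsDemo m = new CheckAndPutMetricsDemo();
        m.record("region-a", true);
        m.record("region-a", false);
        m.record("region-b", true);
        System.out.println(m.serverSucceeded() + " / " + m.serverFailed());
    }
}
```

Counting per region and summing upward addresses the concern that a regionserver-level counter alone would mix regions with very different checkAndPut behavior.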
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454986#comment-13454986 ] Jean-Daniel Cryans commented on HBASE-6769: --- Great work Elliott. HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6374) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region
[ https://issues.apache.org/jira/browse/HBASE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amitanand Aiyer resolved HBASE-6374. Resolution: Fixed [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region Key: HBASE-6374 URL: https://issues.apache.org/jira/browse/HBASE-6374 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb This is a feature similar to the batch feature in trunk. We have optimisation for the put path where we batch puts by the regionserver, but for gets and deletes we do batching only per hregion. So, if there are 20 regions on a regionserver, we would be doing 20 RPC when we can potentially batch them together in 1 call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454990#comment-13454990 ] Lars Hofhansl commented on HBASE-6591: -- What would the use of this metric generally be? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6374) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region
[ https://issues.apache.org/jira/browse/HBASE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454995#comment-13454995 ] Lars Hofhansl commented on HBASE-6374: -- So is this in trunk as well (the Get and Delete optimization)? (I know I could just look, but asking is easier :) ) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region Key: HBASE-6374 URL: https://issues.apache.org/jira/browse/HBASE-6374 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb This is a feature similar to the batch feature in trunk. We have optimisation for the put path where we batch puts by the regionserver, but for gets and deletes we do batching only per hregion. So, if there are 20 regions on a regionserver, we would be doing 20 RPC when we can potentially batch them together in 1 call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
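The batching improvement described in this issue boils down to grouping operations by the server hosting each region, so the client issues one RPC per regionserver instead of one per region. A minimal sketch of just the grouping step, using a plain map as a stand-in for the client's region-location cache (hypothetical names, not the actual 0.89-fb code):

```java
import java.util.*;

// Illustrative sketch: bucket operations by the regionserver hosting each
// region, so one multi() call per server replaces one call per region.
public class BatchByServerDemo {
    // opRegions: the region each pending op targets.
    // regionToServer: stand-in for the client's location cache.
    static Map<String, List<String>> groupOpsByServer(
            List<String> opRegions, Map<String, String> regionToServer) {
        Map<String, List<String>> perServer = new HashMap<>();
        for (String region : opRegions) {
            String server = regionToServer.get(region);
            perServer.computeIfAbsent(server, s -> new ArrayList<>()).add(region);
        }
        return perServer; // issue one batched RPC per key
    }

    public static void main(String[] args) {
        Map<String, String> loc = new HashMap<>();
        loc.put("r1", "rs1");
        loc.put("r2", "rs1");
        loc.put("r3", "rs2");
        Map<String, List<String>> grouped =
                groupOpsByServer(Arrays.asList("r1", "r2", "r3"), loc);
        // Three region-targeted ops collapse into two server-level calls.
        System.out.println(grouped.size());
    }
}
```

With 20 regions on one regionserver, this grouping turns 20 per-region calls into a single batched call to that server, which is exactly the win the issue describes for gets and deletes.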
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455011#comment-13455011 ] Hudson commented on HBASE-6769: --- Integrated in HBase-TRUNK #3327 (See [https://builds.apache.org/job/HBase-TRUNK/3327/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384377) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/FailedSanityCheckException.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) 
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455015#comment-13455015 ] Hudson commented on HBASE-6769: --- Integrated in HBase-0.94 #467 (See [https://builds.apache.org/job/HBase-0.94/467/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384378) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) 
{noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455026#comment-13455026 ] Jonathan Hsieh commented on HBASE-6765: --- +1 on v2 on review board. 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Bug Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6765: -- Issue Type: Sub-task (was: Bug) Parent: HBASE-6055 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455027#comment-13455027 ] Jean-Daniel Cryans commented on HBASE-6719: --- bq. Can we rewrite the patch this way? Yeah, I think this works. bq. One concern I have: What if the file is actually gone for some reason? In that case it seems we'd never stop retrying. If you go up in the file you'll see that after we've looked everywhere, I currently don't have a good solution for files that are missing completely. Basically my heuristic was: if I can't open or get to the file and there's another one available, I'll dump it. This indeed doesn't work if there's a transient error that lasts long enough for the retries to exhaust. Should we introduce a quarantine? [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier - Key: HBASE-6719 URL: https://issues.apache.org/jira/browse/HBASE-6719 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.94.1 Reporter: terry zhang Assignee: terry zhang Priority: Critical Fix For: 0.94.3 Attachments: 6719.txt, hbase-6719.patch Please take a look at the code below:
{code:title=ReplicationSource.java|borderStyle=solid}
protected boolean openReader(int sleepMultiplier) {
  ...
  catch (IOException ioe) {
    LOG.warn(peerClusterZnode + " Got: ", ioe);
    // TODO Need a better way to determine if a file is really gone but
    // TODO without scanning all logs dir
    if (sleepMultiplier == this.maxRetriesMultiplier) {
      LOG.warn("Waited too long for this file, considering dumping");
      // Opening the file failed more than maxRetriesMultiplier (default 10) times
      return !processEndOfFile();
    }
  }
  return true;
  ...
}

protected boolean processEndOfFile() {
  if (this.queue.size() != 0) {
    // Skipped this HLog: data loss
    this.currentPath = null;
    this.position = 0;
    return true;
  } else if (this.queueRecovered) {
    // Terminate the failover replication source thread: data loss
    this.manager.closeRecoveredQueue(this);
    LOG.info("Finished recovering the queue");
    this.running = false;
    return true;
  }
  return false;
}
{code}
Sometimes HDFS runs into a problem while the HLog file itself is actually fine, so after HDFS comes back, some data is lost and cannot be found in the slave cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
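The retry heuristic J-D describes above — keep retrying on transient errors, and only dump a log once the file is confirmed gone — can be sketched in isolation. This is a minimal, self-contained illustration, not HBase code: `LogOpener`, `openWithRetries`, and the `fileExists` check are hypothetical stand-ins for `ReplicationSource.openReader` and an explicit existence probe.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

public class RetryOpenSketch {
    // Hypothetical stand-in for opening an HLog for reading.
    interface LogOpener { void open() throws IOException; }

    /**
     * Try to open a log up to maxRetries times.
     * Returns true once opened, false only if the file is confirmed missing;
     * if retries exhaust while the file still exists, surface the error
     * instead of silently dropping the log (the data-loss case above).
     */
    static boolean openWithRetries(LogOpener opener, BooleanSupplier fileExists,
                                   int maxRetries) throws IOException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                opener.open();
                return true;                 // opened successfully
            } catch (IOException ioe) {
                if (!fileExists.getAsBoolean()) {
                    return false;            // really gone: safe to skip
                }
                // transient HDFS error: keep retrying rather than dumping
            }
        }
        throw new IOException("log still present after " + maxRetries + " attempts");
    }

    public static void main(String[] args) throws IOException {
        // Transient failure on the first attempt, then success.
        int[] calls = {0};
        boolean ok = openWithRetries(() -> {
            if (calls[0]++ == 0) throw new IOException("transient");
        }, () -> true, 10);
        System.out.println(ok);

        // File really gone: give up immediately instead of burning retries.
        boolean gone = openWithRetries(() -> { throw new IOException("gone"); },
                                       () -> false, 10);
        System.out.println(gone);
    }
}
```

The key design point is that "retries exhausted" and "file missing" take different exits, so a long transient outage can no longer be mistaken for a deleted log.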
[jira] [Commented] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader
[ https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455028#comment-13455028 ] Hudson commented on HBASE-5997: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5997 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader (Revision 1383792) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java Fix concerns raised in HBASE-5922 related to HalfStoreFileReader Key: HBASE-5997 URL: https://issues.apache.org/jira/browse/HBASE-5997 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0 Reporter: ramkrishna.s.vasudevan Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.2 Attachments: 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch, HBASE-5997_94 V3.patch, Testcase.patch.txt Pls refer to the comment https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346. Raised this issue to solve that comment. Just incase we don't forget it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6286) Upgrade maven-compiler-plugin to 2.5.1
[ https://issues.apache.org/jira/browse/HBASE-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455029#comment-13455029 ] Hudson commented on HBASE-6286: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6286 Upgrade maven-compiler-plugin to 2.5.1 (Revision 1381861) Result = SUCCESS stack : Files : * /hbase/branches/0.94/pom.xml Upgrade maven-compiler-plugin to 2.5.1 -- Key: HBASE-6286 URL: https://issues.apache.org/jira/browse/HBASE-6286 Project: HBase Issue Type: Improvement Components: build Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6286.patch time mvn -PlocalTests clean install -DskipTests With 2.5.1: |user|1m35.634s|1m31.178s|1m31.366s| |sys|0m06.540s|0m05.376s|0m05.488s| With 2.0.2 (current): |user|2m01.168s|1m54.027s|1m57.799s| |sys|0m05.896s|0m05.912s|0m06.032s| -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455030#comment-13455030 ] Hudson commented on HBASE-5922: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5997 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader (Revision 1383792) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 5922.092.txt, HBASE-5922.patch, HBASE-5922.patch, HBASE-5922.v2.patch, HBASE-5922.v3.patch, HBASE-5922.v4.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. 
java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
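The repeating `HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)` frames in the trace above are the signature of a wrapper re-entering its own method instead of its delegate. The toy sketch below reproduces that shape with made-up `Scanner` wrappers (illustrative only, not the actual HalfStoreFileReader fix):

```java
public class SelfRecursionSketch {
    // Hypothetical stand-in for an HFile scanner interface.
    interface Scanner { boolean seekBefore(byte[] key); }

    // Buggy shape: the anonymous wrapper calls its own seekBefore again,
    // so the same frame repeats until the stack overflows.
    static Scanner buggyWrapper(Scanner delegate) {
        return new Scanner() {
            public boolean seekBefore(byte[] key) {
                return this.seekBefore(key);   // re-enters itself forever
            }
        };
    }

    // Fixed shape: forward to the wrapped scanner.
    static Scanner fixedWrapper(Scanner delegate) {
        return key -> delegate.seekBefore(key);
    }

    public static void main(String[] args) {
        Scanner base = key -> true;
        System.out.println(fixedWrapper(base).seekBefore(new byte[0]));
        try {
            buggyWrapper(base).seekBefore(new byte[0]);
            System.out.println("no overflow");
        } catch (StackOverflowError e) {
            // Catching StackOverflowError is only acceptable in a demo.
            System.out.println("StackOverflowError");
        }
    }
}
```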
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455031#comment-13455031 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455032#comment-13455032 ] Hudson commented on HBASE-5631: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5631 ADDENDUM (extra comments) (Revision 1382628) HBASE-5631 hbck should handle case where .tableinfo file is missing (Jie Huang) (Revision 1382530) Result = SUCCESS jmhsieh : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: hbase-5631-addendum.patch, hbase-5631.patch, hbase-5631-v1.patch, hbase-5631-v2.patch 0.92+ branches have a .tableinfo file which could be missing from hdfs. hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455035#comment-13455035 ] Hudson commented on HBASE-6769: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384378) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at 
org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
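The usability problem above is that a batch ("multi") call collapses every server-side failure into a bare `DoNotRetryIOException` count, so the real cause (a bad column family) never reaches the client. A self-contained sketch of the result shape the fix is after — one slot per action carrying either a value or the action's actual exception — might look like this. All names here are illustrative, not the HBase API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiResultSketch {
    // One slot per action: either a value or the exception that failed it.
    static class ActionResult {
        final Object value;
        final Exception error;
        ActionResult(Object value, Exception error) {
            this.value = value;
            this.error = error;
        }
    }

    // Toy "multi" call: keep the informative per-action cause instead of
    // eating it and reporting only an opaque failure count.
    static List<ActionResult> multi(List<String> families, Set<String> knownFamilies) {
        List<ActionResult> results = new ArrayList<>();
        for (String family : families) {
            if (!knownFamilies.contains(family)) {
                results.add(new ActionResult(null,
                    new IllegalArgumentException("Column family " + family + " does not exist")));
            } else {
                results.add(new ActionResult("ok", null));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ActionResult> rs = multi(Arrays.asList("cf1", "bogus"),
                                      new HashSet<>(Arrays.asList("cf1")));
        System.out.println(rs.get(0).value);
        System.out.println(rs.get(1).error.getMessage());
    }
}
```

With per-action errors in the batch response, the client-side retry logic can also classify sanity-check failures as non-retriable without logging on to the server.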
[jira] [Commented] (HBASE-6757) Very inefficient behaviour of scan using FilterList
[ https://issues.apache.org/jira/browse/HBASE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455034#comment-13455034 ] Hudson commented on HBASE-6757: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6757 Very inefficient behaviour of scan using FilterList (Revision 1383749) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java Very inefficient behaviour of scan using FilterList --- Key: HBASE-6757 URL: https://issues.apache.org/jira/browse/HBASE-6757 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.6 Reporter: Jerry Lam Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.2 Attachments: 6757.txt, CopyOfTestColumnPrefixFilter.java, DisplayFilter.java The behaviour of scan is very inefficient when using with FilterList. The FilterList rewrites the return code from NEXT_ROW to SKIP from a filter if Operator.MUST_PASS_ALL is used. This happens when using ColumnPrefixFilter. Even though the ColumnPrefixFilter indicates to jump to NEXT_ROW because no further match can be found, the scan continues to scan all versions of a column in that row and all columns of that row because the ReturnCode from ColumnPrefixFilter has been rewritten by the FilterList from NEXT_ROW to SKIP. This is particularly inefficient when there are many versions in a column because the check is performed on all versions of the column instead of just by checking the qualifier of the column name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
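The FilterList issue above comes down to how per-filter return codes are merged under MUST_PASS_ALL: rewriting a filter's NEXT_ROW hint to SKIP forces the scanner to keep visiting every remaining cell (and version) of the row, while preserving the strongest seek hint lets it jump straight to the next row. The toy enum below illustrates the two merge policies; the real codes live in `org.apache.hadoop.hbase.filter.Filter.ReturnCode`, and this is a simplified sketch, not the actual FilterList logic:

```java
public class FilterListSketch {
    // Simplified subset of the filter return codes.
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Buggy merge: any non-INCLUDE answer becomes SKIP, losing the
    // NEXT_ROW seek hint from filters like ColumnPrefixFilter.
    static ReturnCode mergeBuggy(ReturnCode a, ReturnCode b) {
        if (a == ReturnCode.INCLUDE && b == ReturnCode.INCLUDE) return ReturnCode.INCLUDE;
        return ReturnCode.SKIP;
    }

    // Fixed merge: under MUST_PASS_ALL, if any filter says NEXT_ROW the
    // conjunction cannot pass for the rest of the row, so keep that hint.
    static ReturnCode mergeFixed(ReturnCode a, ReturnCode b) {
        if (a == ReturnCode.NEXT_ROW || b == ReturnCode.NEXT_ROW) return ReturnCode.NEXT_ROW;
        if (a == ReturnCode.SKIP || b == ReturnCode.SKIP) return ReturnCode.SKIP;
        return ReturnCode.INCLUDE;
    }

    public static void main(String[] args) {
        // A prefix-style filter says NEXT_ROW; another filter says INCLUDE.
        System.out.println(mergeBuggy(ReturnCode.NEXT_ROW, ReturnCode.INCLUDE));
        System.out.println(mergeFixed(ReturnCode.NEXT_ROW, ReturnCode.INCLUDE));
    }
}
```

The difference is largest with many versions per column, since SKIP re-evaluates every version while NEXT_ROW bypasses them all in one seek.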
[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455036#comment-13455036 ] Hudson commented on HBASE-6432: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6432 HRegionServer doesn't properly set clusterId in conf (Revision 1381907) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6432_94.patch, HBASE-6432.patch ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to default since getMaster() since it uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455037#comment-13455037 ] Hudson commented on HBASE-5206: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6710 0.92/0.94 compatibility issues due to HBASE-5206 (Revision 1384181) Result = SUCCESS gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableReadOnly.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTableReadOnly.java Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Ted Yu Assignee: Ashutosh Jindal Fix For: 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455038#comment-13455038 ] Hudson commented on HBASE-6715: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6715 TestFromClientSide.testCacheOnWriteEvictOnClose is flaky (Revision 1381678) Result = SUCCESS jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java TestFromClientSide.testCacheOnWriteEvictOnClose is flaky Key: HBASE-6715 URL: https://issues.apache.org/jira/browse/HBASE-6715 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6715.patch Occasionally, this test fails: {noformat} expected:2049 but was:2069 Stacktrace java.lang.AssertionError: expected:2049 but was:2069 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.client.TestFromClientSide.testCacheOnWriteEvictOnClose(TestFromClientSide.java:4248) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {noformat} It could be because there is other thread still accessing the cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6734) Code duplication in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455033#comment-13455033 ] Hudson commented on HBASE-6734: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6734 Code duplication in LoadIncrementalHFiles (Richard Ding) (Revision 1382354) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java Code duplication in LoadIncrementalHFiles - Key: HBASE-6734 URL: https://issues.apache.org/jira/browse/HBASE-6734 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.1 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6734.patch This was due to the merge of two JIRAs:
{code}
if (queue.isEmpty()) {
  LOG.warn("Bulk load operation did not find any files to load in " +
      "directory " + hfofDir.toUri() + ". Does it contain files in " +
      "subdirectories that correspond to column family names?");
  return;
}
if (queue.isEmpty()) {
  LOG.warn("Bulk load operation did not find any files to load in " +
      "directory " + hfofDir.toUri() + ". Does it contain files in " +
      "subdirectories that correspond to column family names?");
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6713) Stopping META/ROOT RS may take 50mins when some region is splitting
[ https://issues.apache.org/jira/browse/HBASE-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455040#comment-13455040 ] Hudson commented on HBASE-6713: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6713 Stopping META/ROOT RS may take 50mins when some region is splitting (Chunhui) (Revision 1382163) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Stopping META/ROOT RS may take 50mins when some region is splitting --- Key: HBASE-6713 URL: https://issues.apache.org/jira/browse/HBASE-6713 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.1 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.2 Attachments: 6713.92-94, 6713v3.patch, HBASE-6713.patch, HBASE-6713v2.patch When we stop the RS carrying ROOT/META, if it is in the splitting for some region, the whole stopping process may take 50 mins. The reason is : 1.ROOT/META region is closed when stopping the regionserver 2.The Split Transaction failed updating META and it will retry 3.The retry num is 100, and the total time is about 50 mins as default; This configuration is set by HConnectionManager#setServerSideHConnectionRetries I think 50 mins is too long to acceptable, my suggested solution is closing MetaTable regions after the compact/split thread is closed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6710) 0.92/0.94 compatibility issues due to HBASE-5206
[ https://issues.apache.org/jira/browse/HBASE-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455041#comment-13455041 ] Hudson commented on HBASE-6710: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6710 0.92/0.94 compatibility issues due to HBASE-5206 (Revision 1384181) Result = SUCCESS gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableReadOnly.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTableReadOnly.java 0.92/0.94 compatibility issues due to HBASE-5206 Key: HBASE-6710 URL: https://issues.apache.org/jira/browse/HBASE-6710 Project: HBase Issue Type: Bug Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6710-v3.patch HBASE-5206 introduces some compatibility issues between {0.94,0.94.1} and {0.92.0,0.92.1}. The release notes of HBASE-5155 describes the issue (HBASE-5206 is a backport of HBASE-5155). I think we can make 0.94.2 compatible with both {0.94.0,0.94.1} and {0.92.0,0.92.1}, although one of those sets will require configuration changes. The basic problem is that there is a znode for each table zookeeper.znode.tableEnableDisable that is handled differently. 
On 0.92.0 and 0.92.1 the states for this table are: [ disabled, disabling, enabling ] or deleted if the table is enabled On 0.94.1 and 0.94.2 the states for this table are: [ disabled, disabling, enabling, enabled ] What saves us is that the location of this znode is configurable. So the basic idea is to have the 0.94.2 master write two different znodes, zookeeper.znode.tableEnableDisabled92 and zookeeper.znode.tableEnableDisabled94 where the 92 node is in 92 format, the 94 node is in 94 format. And internally, the master would only use the 94 format in order to solve the original bug HBASE-5155 solves. We can of course make one of these the same default as exists now, so we don't need to make config changes for one of 0.92 or 0.94 clients. I argue that 0.92 clients shouldn't have to make config changes for the same reason I argued above. But that is debatable. Then, I think the only question left is the question of how to bring along the {0.94.0, 0.94.1} crew. A {0.94.0, 0.94.1} client would work against a 0.94.2 cluster by just configuring zookeeper.znode.tableEnableDisable in the client to be whatever zookeeper.znode.tableEnableDisabled94 is in the cluster. A 0.94.2 client would work against both a {0.94.0, 0.94.1} and {0.92.0, 0.92.1} cluster if it had HBASE-6268 applied. About rolling upgrade from {0.94.0, 0.94.1} to 0.94.2 -- I'd have to think about that. Do the regionservers ever read the tableEnableDisabled znode? On the mailing list, Lars H suggested the following: The only input I'd have is that format we'll use going forward will not have a version attached to it. So maybe the 92 version would still be called zookeeper.znode.tableEnableDisable and the new node could have a different name zookeeper.znode.tableEnableDisableNew (or something). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455042#comment-13455042 ] Hudson commented on HBASE-6288: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6288 In hbase-daemons.sh, description of the default backup-master file path is wrong (Revision 1381219) Result = SUCCESS stack : Files : * /hbase/branches/0.94/bin/master-backup.sh * /hbase/branches/0.94/conf/hbase-env.sh In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Assignee: Benjamin Kim Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6340) HBase RPC should allow protocol extension with common interfaces.
[ https://issues.apache.org/jira/browse/HBASE-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455039#comment-13455039 ] Hudson commented on HBASE-6340: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6340 Reapply with fix for SecureRpcEngine (Revision 1383754) HBASE-6340 HBase RPC should allow protocol extension with common interfaces.: REVERT (Revision 1383537) HBASE-6340 HBase RPC should allow protocol extension with common interfaces. (Revision 1382207) HBASE-6340 HBase RPC should allow protocol extension with common interfaces. (Revision 1382206) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureRpcEngine.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java HBase RPC should allow protocol extension with common interfaces. 
- Key: HBASE-6340 URL: https://issues.apache.org/jira/browse/HBASE-6340 Project: HBase Issue Type: Bug Components: coprocessors, regionserver Affects Versions: 0.92.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.94.2 Attachments: 6340-6762-combined.txt, 6340-6762-combined-v2.txt, 6340-RPCInvocation.patch, RPCInvocation.patch HBase RPC fails if MyProtocol extends an interface, which is not a VersionedProtocol even if MyProtocol also directly extends VersionedProtocol. The reason is that rpc Invocation uses Method.getDeclaringClass(), which returns the interface class rather than the class of MyProtocol. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
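The root cause is standard JDK reflection behavior: looking a method up through a sub-interface still reports the interface that declared it. A minimal standalone sketch (the interface names below are made-up stand-ins, not the actual HBase protocol classes):

```java
import java.lang.reflect.Method;

public class DeclaringClassDemo {
    // Hypothetical stand-ins for VersionedProtocol and MyProtocol.
    interface BaseProtocol { void ping(); }
    interface MyProtocol extends BaseProtocol { void put(); }

    public static void main(String[] args) throws NoSuchMethodException {
        // Even though we look "ping" up on MyProtocol, the Method object
        // reports the super-interface that declared it.
        Method ping = MyProtocol.class.getMethod("ping");
        Method put = MyProtocol.class.getMethod("put");
        System.out.println(ping.getDeclaringClass().getSimpleName()); // BaseProtocol
        System.out.println(put.getDeclaringClass().getSimpleName());  // MyProtocol
    }
}
```

So when the RPC Invocation keys its dispatch on getDeclaringClass(), a call to an inherited method resolves to the super-interface rather than MyProtocol, which is the failure this issue describes.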
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455047#comment-13455047 ] Jonathan Hsieh commented on HBASE-6765: --- Turned into a sub-issue of the snapshots umbrella issue. We'll resolve the umbrella after we get the 3x +1's and merge into trunk. When a sub-issue is resolved, it means it was committed to the dev branch. Sound good? 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6768) HBase Rest server crashes if client tries to retrieve data size 5 MB
[ https://issues.apache.org/jira/browse/HBASE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455054#comment-13455054 ] Andrew Purtell commented on HBASE-6768: --- Are you running REST with {{-XX:OnOutOfMemoryError=kill -9 %p}} (HBASE-4769)? That is one reason why a HBase process might be dead with nothing logged and no hs_err file. Have you tried setting/increasing MaxDirectMemory, e.g. {{-XX:MaxDirectMemorySize=$LARGE_VALUE}}? HBase Rest server crashes if client tries to retrieve data size 5 MB -- Key: HBASE-6768 URL: https://issues.apache.org/jira/browse/HBASE-6768 Project: HBase Issue Type: Bug Components: rest Affects Versions: 0.90.5 Reporter: Mubarak Seyed Labels: noob I have a CF with one qualifier, data size is 5 MB, when i try to read the raw binary data as octet-stream using curl, rest server got crashed and curl throws exception as {code} curl -v -H Accept: application/octet-stream http://abcdefgh-hbase003.test1.test.com:9090/table1/row_key1/cf:qualifer1 /tmp/out * About to connect() to abcdefgh-hbase003.test1.test.com port 9090 * Trying xx.xx.xx.xxx... 
connected
* Connected to abcdefgh-hbase003.test1.test.com (xx.xxx.xx.xxx) port 9090
> GET /table1/row_key1/cf:qualifer1 HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: abcdefgh-hbase003.test1.test.com:9090
> Accept: application/octet-stream
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
< HTTP/1.1 200 OK
< Content-Length: 5129836
< X-Timestamp: 1347338813129
< Content-Type: application/octet-stream
  0 5009k    0 16272    0     0   7460      0  0:11:27  0:00:02  0:11:25 13872
transfer closed with 1148524 bytes remaining to read
 77 5009k   77 3888k    0     0  1765k      0  0:00:02  0:00:02 --:--:-- 3253k
* Closing connection #0
curl: (18) transfer closed with 1148524 bytes remaining to read
{code}
Couldn't find the exception in the REST server log, and no core dump either. This issue is constantly reproducible. I also tried with the HBase REST client (HRemoteTable) and could recreate this issue if the data size is 10 MB (even with the MIME_PROTOBUF accept header). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455057#comment-13455057 ] Gregory Chanan commented on HBASE-6591: --- We had a customer with a very slow-running application that consisted mainly of checkAndPuts. checkAndPut latency looked good. It would have been nice to just look at the metrics and be able to eliminate the possibility that their checkAndPuts simply weren't getting executed. checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
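Until such server-side metrics exist, the client-side bookkeeping mentioned above is easy to sketch. The class below is hypothetical, not part of the HBase API; it simply tallies the booleans that checkAndPut/checkAndDelete return:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side tally for checkAndPut outcomes; feed it the
// boolean that HTable.checkAndPut/checkAndDelete returns.
public class CheckAndMutateTally {
    private final AtomicLong applied = new AtomicLong();
    private final AtomicLong notApplied = new AtomicLong();

    /** Record one operation result: true means the mutation was executed. */
    public void record(boolean wasApplied) {
        (wasApplied ? applied : notApplied).incrementAndGet();
    }

    /** Fraction of operations that actually executed; 1.0 if none recorded yet. */
    public double appliedRatio() {
        long a = applied.get(), n = notApplied.get();
        return (a + n) == 0 ? 1.0 : (double) a / (a + n);
    }
}
```

A cluster-wide metric would aggregate the same two counters on the region server side instead, which is what this issue proposes.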
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455072#comment-13455072 ] Jesse Yates commented on HBASE-6765: @Jon - my bad on the label. +1 on what resolved means. 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6765: -- Component/s: snapshots 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master, snapshots Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5452) Fixes for HBase shell with protobuf-based data
[ https://issues.apache.org/jira/browse/HBASE-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455098#comment-13455098 ] Gregory Chanan commented on HBASE-5452: --- What do you think needs to be done here, Chris? Just manually go through all the shell commands and make sure nothing broke? Fixes for HBase shell with protobuf-based data -- Key: HBASE-5452 URL: https://issues.apache.org/jira/browse/HBASE-5452 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Chris Trezzo -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455111#comment-13455111 ] Jesse Yates commented on HBASE-6765: I'll let the RB sit up there for another day, and then roll it into the dev branch (modulo nit fixes). 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master, snapshots Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6637) Move DaemonThreadFactory into Threads and Threads to hbase-common
[ https://issues.apache.org/jira/browse/HBASE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455119#comment-13455119 ] Jesse Yates commented on HBASE-6637: [~saint@gmail.com] - any thoughts on why hadoopqa isn't running? Should we resubmit (again)? Move DaemonThreadFactory into Threads and Threads to hbase-common - Key: HBASE-6637 URL: https://issues.apache.org/jira/browse/HBASE-6637 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 0.96.0 Attachments: hbase-6637-r1.patch, hbase-6637-v0.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HBASE-5354) Source to standalone deployment script
[ https://issues.apache.org/jira/browse/HBASE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates reopened HBASE-5354: Reopening issue, since it's apparently useful. It actually came up yesterday while testing one of the pom changes I made. Can you give it another spin @stack and commit if you like it? Source to standalone deployment script -- Key: HBASE-5354 URL: https://issues.apache.org/jira/browse/HBASE-5354 Project: HBase Issue Type: New Feature Components: build, scripts Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: bash_HBASE-5354.patch, bash_HBASE-5354-v0.patch, bash_HBASE-5354-v1.patch Automating the testing of source code in a 'real' instance can be a bit of a pain, even getting it into standalone mode. Steps you need to go through:
1) Build the project
2) Copy it to the deployment directory
3) Shut down the current cluster (if it is running)
4) Untar the tar
5) Update the configs to point to a local data cluster
6) Start up the new deployment
Yeah, it's not super difficult, but it would be nice to just have a script to make it button-push easy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6178) LoadTest tool no longer packaged after the modularization
[ https://issues.apache.org/jira/browse/HBASE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455135#comment-13455135 ] Lars Hofhansl commented on HBASE-6178: -- I'll double check the test failures and then commit. LoadTest tool no longer packaged after the modularization - Key: HBASE-6178 URL: https://issues.apache.org/jira/browse/HBASE-6178 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Jesse Yates Attachments: hbase-6178-v0.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6611) Forcing region state offline cause double assignment
[ https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455136#comment-13455136 ] Jimmy Xiang commented on HBASE-6611: Sure, I will do that to make sure existing functionality is not broken and there is no substantial performance drop. Another thing I'd like to address in this jira is that bulk assigning currently doesn't pass the offlined ZK node version to the region server as regular assignment does. I think it is needed to avoid competing assignments of the same region at the same time. Forcing region state offline cause double assignment Key: HBASE-6611 URL: https://issues.apache.org/jira/browse/HBASE-6611 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 In assigning a region, the assignment manager forces the region state offline if it is not. This could cause double assignment: for example, if the region is already assigned and in the Open state, you should not just change its state to Offline and assign it again. I think this could be the root cause for all double assignments IF the region state is reliable. After this loophole is closed, TestHBaseFsck should come up with a different way to create some assignment inconsistencies, for example, calling a region server to open a region directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455143#comment-13455143 ] Lars Hofhansl commented on HBASE-6649: -- Just failed again: https://builds.apache.org/job/PreCommit-HBASE-Build/2852//testReport/ [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455151#comment-13455151 ] Jesse Yates commented on HBASE-5456: Reviving this discussion after talking at a recent pow-wow. In short, powermock has some _interesting_ features - making it very powerful - that will really help to clean up the codebase. For instance, it can help get rid of the test-visible methods. Yes, on one hand you could subclass the class you are testing to get at the protected methods, but then you have the issue of making that class loadable as well. It can easily spiral out of control where everything is dynamically loadable, just so you can check the state of one variable. Also, this can lead to inadvertent race conditions for timing related things, where the test-exposed method could be really simple. Also, it helps you get real objects into a state that is more easily testable. Rather than rejiggering everything through a high-level interface, you can specify things succinctly and more easily when you can introspect the object. Another great use is for managing timing issues. A lot of times to test timing of things we rely on sleeps or adding latches. The former is really brittle and the latter makes the code incredibly more complicated than it needs to be, just for testing. Problems with powermock:
* complicated - yeah, it can be a bit funky, but you get used to it.
* brittle - it's doing reflection, so there are a lot of string method/object names used. That's the problem with introspection of objects and the price we pay for cleaner running code. Tests break when you change stuff though, so you know if something goes awry.
Stack raised a possible concern that he couldn't get powermock working on the current codebase. However, I volunteered to spend the time to figure that out (at least initially) and don't think it will be all that bad. Thoughts?
If people are +1, I'll work on a simple patch that adds powermock to the pom and makes a change to a test to use it. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Ted Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
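For readers unfamiliar with the mechanism: the core trick behind Whitebox-style helpers is ordinary JDK reflection. A minimal sketch under that assumption (the class and method names here are invented for illustration; this is not HBase or PowerMock code):

```java
import java.lang.reflect.Method;

public class WhiteboxSketch {
    // Example target: a class whose private method a test wants to reach
    // without widening its visibility just for testing.
    static class Region {
        private boolean sanityCheck(int n) { return n >= 0; }
    }

    /** Invoke a private method reflectively, roughly what Whitebox.invokeMethod does. */
    static Object invokePrivate(Object target, String name, Class<?>[] types, Object... args)
            throws Exception {
        Method m = target.getClass().getDeclaredMethod(name, types);
        m.setAccessible(true); // lift the private modifier for this Method object
        return m.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokePrivate(new Region(), "sanityCheck",
                new Class<?>[] { int.class }, 5)); // true
    }
}
```

The string method name is exactly the brittleness noted above: rename sanityCheck and the test fails at runtime, not at compile time.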
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Attachment: HBASE-6504-output.txt Shows output with/without gc args Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Attachment: HBASE-6504.patch Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt, HBASE-6504.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Status: Patch Available (was: Open) Fixed this for rolling-restart.sh, start-hbase.sh, and stop-hbase.sh by using head -1 to take the first line. Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt, HBASE-6504.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5306) Add support for protocol buffer based RPC
[ https://issues.apache.org/jira/browse/HBASE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das resolved HBASE-5306. Resolution: Duplicate [~gchanan] Yes, it can be closed, I think. We have taken care of the issue in other jiras, as you noted. Add support for protocol buffer based RPC - Key: HBASE-5306 URL: https://issues.apache.org/jira/browse/HBASE-5306 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Devaraj Das Assignee: Devaraj Das This will help HBase to achieve wire compatibility across versions. The idea (to start with) is to leverage the recent work that has gone into the Hadoop core in this area. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455207#comment-13455207 ] Lars Hofhansl commented on HBASE-6504: -- Should this be head -n 1? head -1 works, but it is not documented that way.
[jira] [Updated] (HBASE-6282) The introspection, etc. of objects in the RPC has to be handled for PB objects
[ https://issues.apache.org/jira/browse/HBASE-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6282: --- Issue Type: Sub-task (was: Bug) Parent: HBASE-5305 The introspection, etc. of objects in the RPC has to be handled for PB objects -- Key: HBASE-6282 URL: https://issues.apache.org/jira/browse/HBASE-6282 Project: HBase Issue Type: Sub-task Components: ipc Reporter: Devaraj Das Priority: Blocker Fix For: 0.96.0 The places where the types of objects are inspected need to be updated to take PB types into consideration. I have noticed Objects.describeQuantity being used, and the private WritableRpcEngine.Server.logResponse method also needs updating (in the PB world, all information about operations/tablenames is contained in one PB argument).
[jira] [Updated] (HBASE-6414) Remove the WritableRpcEngine associated Invocation classes
[ https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6414: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-5305 Remove the WritableRpcEngine associated Invocation classes Key: HBASE-6414 URL: https://issues.apache.org/jira/browse/HBASE-6414 Project: HBase Issue Type: Sub-task Affects Versions: 0.96.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0 Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 6414-4.patch.txt, 6414-4.patch.txt, 6414-5.patch.txt, 6414-5.patch.txt, 6414-5.patch.txt, 6414-6.patch.txt, 6414-6.patch.txt, 6414-6.txt, 6414-initial.patch.txt, 6414-initial.patch.txt, 6414-v7.txt Remove the WritableRpcEngine Invocation classes once HBASE-5705 gets committed and all the protocols are rebased to use PB. Raising this jira in advance.
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455210#comment-13455210 ] Gregory Chanan commented on HBASE-6591: --- Any thoughts on the use case, Michael/Lars? Worth doing? Or push people to actually log the results in the client? If it's worthwhile, what granularity? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients.
[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455211#comment-13455211 ] Michael Drzal commented on HBASE-6504: -- I always use the old-style syntax since it is more portable. From the coreutils info page: For compatibility `head' also supports an obsolete option syntax `-COUNTOPTIONS', which is recognized only if it is specified first. COUNT is a decimal number optionally followed by a size letter (`b', `k', `m') as in `-c', or `l' to mean count by lines, or other option letters (`cqv'). Scripts intended for standard hosts should use `-c COUNT' or `-n COUNT' instead. If your script must also run on hosts that support only the obsolete syntax, it is usually simpler to avoid `head', e.g., by using `sed 5q' instead of `head -5'. I can change it to head -n 1 if you would like.
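For reference, a small sketch of the first-line alternatives discussed in the head -1 vs. head -n 1 exchange above (the temp-file path and contents are illustrative stand-ins for HBaseConfTool output):

```shell
#!/bin/sh
# Illustrative stand-in for HBaseConfTool output; the path is hypothetical.
tmp=/tmp/conftool-demo.$$
printf 'false\nHeap\n' > "$tmp"

head -n 1 "$tmp"   # POSIX-specified form
head -1   "$tmp"   # obsolete option syntax; still accepted by GNU coreutils
sed 1q    "$tmp"   # prints the first line then quits; avoids head entirely

rm -f "$tmp"
```

All three print {{false}}; the coreutils note quoted above recommends the {{-n COUNT}} form, with {{sed 1q}} as the fallback for hosts that only support the obsolete syntax.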
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455213#comment-13455213 ] Michael Drzal commented on HBASE-6591: -- I'm indifferent. It is really up to you. If this is a pain point for you and you feel like putting in the work to create a patch or convincing others that this is important, go for it.