[jira] [Updated] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6769: - Attachment: HBASE-6769-0.94-1.patch Here's the patch without the FailedSanityCheckException. I put in comments around everywhere that catches the exception so hopefully that will keep things sane. TestHRegion and TestFromClientSide are both passing on my machine locally. Running the rest of the suite right now. HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
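The usability problem above is that the server-side NoSuchColumnFamilyException is collapsed into a bare DoNotRetryIOException, so the client log never names the offending family. A minimal sketch of the idea behind the fix, using hypothetical stand-in exception classes (not the actual HBase patch):

```java
import java.io.IOException;

public class WrapDemo {
    // Hypothetical stand-ins for the HBase exception classes.
    static class NoSuchColumnFamilyException extends IOException {
        NoSuchColumnFamilyException(String msg) { super(msg); }
    }
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Wrap instead of swallow: the original message (and cause) survive,
    // so the client log names the bad column family instead of showing
    // only "DoNotRetryIOException: 1 time".
    static DoNotRetryIOException wrap(IOException serverSide) {
        return new DoNotRetryIOException(serverSide.getMessage(), serverSide);
    }

    public static void main(String[] args) {
        IOException server =
            new NoSuchColumnFamilyException("Column family bogus does not exist");
        System.out.println(wrap(server).getMessage());
    }
}
```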
[jira] [Resolved] (HBASE-5447) Support for custom filters with PB-based RPC
[ https://issues.apache.org/jira/browse/HBASE-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5447. -- Resolution: Fixed Fix Version/s: 0.96.0 Assignee: Gregory Chanan (was: Todd Lipcon) Hadoop Flags: Reviewed Closing. Assigned Gregory. Regards custom filters, let them come out of the woodwork. We'll help them make the conversion to pb. Support for custom filters with PB-based RPC Key: HBASE-5447 URL: https://issues.apache.org/jira/browse/HBASE-5447 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454679#comment-13454679 ] Hadoop QA commented on HBASE-6769: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544945/HBASE-6769-0.94-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2856//console This message is automatically generated.
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454683#comment-13454683 ] Lars Hofhansl commented on HBASE-6769: -- +1 on 0.94 patch as well. This comment is weird, as it refers to a non existing exception. {code} + // Don't send a FailedSanityCheckException as older clients will not know about + // that class being a subclass of DoNotRetryIOException + // and will retry mutations that will never succeed. {code} Don't post a new patch :) ... I'll change the comment on commit: {code} // Use generic DoNotRetryIOException so that older clients know how to deal with it. {code}
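The subclass concern in the comment above matters because the client's retry loop keys off DoNotRetryIOException: an older client that does not recognize a newer subclass as such would keep retrying a mutation that can never succeed. A hedged sketch of that decision logic, using hypothetical local classes rather than the actual HConnectionManager code:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Returns the number of attempts made before success or giving up.
    static int runWithRetries(Callable<Void> op, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.call();
                return attempt;      // success
            } catch (DoNotRetryIOException e) {
                return attempt;      // permanent failure: stop immediately
            } catch (Exception e) {
                // anything else is treated as transient and retried
            }
        }
        return maxAttempts;
    }

    public static void main(String[] args) {
        // A failure the client recognizes as DoNotRetry stops after one try...
        int permanent = runWithRetries(() -> {
            throw new DoNotRetryIOException("sanity check failed");
        }, 10);
        // ...but a failure it cannot classify burns every retry.
        int unknown = runWithRetries(() -> {
            throw new IOException("unrecognized exception class");
        }, 10);
        System.out.println(permanent + " attempt vs " + unknown + " attempts");
    }
}
```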
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454685#comment-13454685 ] Elliott Clark commented on HBASE-6769: -- haha yeah I guess I have more context than 0.94 source would give a reader. Sorry about that.
[jira] [Commented] (HBASE-5306) Add support for protocol buffer based RPC
[ https://issues.apache.org/jira/browse/HBASE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454688#comment-13454688 ] Gregory Chanan commented on HBASE-5306: --- Do you think there is more to do here Devaraj? Or do HBASE-5705 and HBASE-5451 cover this? Add support for protocol buffer based RPC - Key: HBASE-5306 URL: https://issues.apache.org/jira/browse/HBASE-5306 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Devaraj Das Assignee: Devaraj Das This will help HBase to achieve wire compatibility across versions. The idea (to start with) is to leverage the recent work that has gone in in the Hadoop core in this area.
[jira] [Updated] (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liusheding updated HBASE-3529: -- Description: Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are: * HBase is highly scalable and distributed * HBase is realtime * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312) * Lucene offers many types of queries not currently available in HBase (eg, AND, OR, NOT, phrase, etc) * It's easier to build scalable realtime systems on top of already architecturally sound, scalable realtime data system, eg, HBase. * Scaling realtime search will be as simple as scaling HBase. Phase 1 - Indexing: * Integrate Lucene into HBase such that an index mirrors a given region. This means cascading add, update, and deletes between a Lucene index and an HBase region (and vice versa). * Define meta-data to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers. * Integrate with the HLog to ensure that index recovery can occur properly (eg, on region server failure) * Mirror region splits with indexes (use Lucene's IndexSplitter?) * When a region is written to HDFS, also write the corresponding Lucene index to HDFS. * A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used because the document/row data is stored in HBase. We will need to solve what the best data structure for efficiently mapping a docid -> row key is. It could be a docstore, field cache, column stride fields, or some other mechanism. * Write unit tests for the above Phase 2 - Queries: * Enable distributed Lucene queries * Regions that have Lucene indexes are inherently available and may be searched on, meaning there's no need for a separate search related system in Zookeeper. * Integrate search with HBase's RPC mechanism Add search to HBase --- Key: HBASE-3529 URL: https://issues.apache.org/jira/browse/HBASE-3529 Project: HBase Issue Type: Improvement Affects Versions: 0.90.0 Reporter: Jason Rutherglen Attachments: HBASE-3529.patch, HDFS-APPEND-0.20-LOCAL-FILE.patch
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454709#comment-13454709 ] Elliott Clark commented on HBASE-6769: -- All tests passed locally for 0.94.
[jira] [Resolved] (HBASE-5971) ServerLoad needs redo; can't be pb based
[ https://issues.apache.org/jira/browse/HBASE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HBASE-5971. -- Resolution: Not A Problem This was fixed in HBASE-6411 which is a sub issue of HBASE-4050 ServerLoad needs redo; can't be pb based Key: HBASE-5971 URL: https://issues.apache.org/jira/browse/HBASE-5971 Project: HBase Issue Type: Bug Components: metrics Reporter: stack Priority: Blocker Fix For: 0.96.0 Here is what happens when we try to register server bean: {code} javax.management.NotCompliantMBeanException: org.apache.hadoop.hbase.master.MXBean: Method org.apache.hadoop.hbase.master.MXBean.getRegionServers has parameter or return type that cannot be translated into an open type at com.sun.jmx.mbeanserver.Introspector.throwException(Introspector.java:412) at com.sun.jmx.mbeanserver.MBeanAnalyzer.init(MBeanAnalyzer.java:101) at com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:87) at com.sun.jmx.mbeanserver.MXBeanIntrospector.getAnalyzer(MXBeanIntrospector.java:53) at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:163) at com.sun.jmx.mbeanserver.MBeanSupport.init(MBeanSupport.java:147) at com.sun.jmx.mbeanserver.MXBeanSupport.init(MXBeanSupport.java:48) at com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:184) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:915) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482) at org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:58) at org.apache.hadoop.hbase.master.HMaster.registerMBean(HMaster.java:1926) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:617) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:367) at java.lang.Thread.run(Thread.java:680) 
Caused by: java.lang.IllegalArgumentException: Method org.apache.hadoop.hbase.master.MXBean.getRegionServers has parameter or return type that cannot be translated into an open type at com.sun.jmx.mbeanserver.ConvertingMethod.from(ConvertingMethod.java:32) at com.sun.jmx.mbeanserver.MXBeanIntrospector.mFrom(MXBeanIntrospector.java:63) at com.sun.jmx.mbeanserver.MXBeanIntrospector.mFrom(MXBeanIntrospector.java:33) at com.sun.jmx.mbeanserver.MBeanAnalyzer.initMaps(MBeanAnalyzer.java:118) at com.sun.jmx.mbeanserver.MBeanAnalyzer.init(MBeanAnalyzer.java:99) ... 14 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: java.util.Map<java.lang.String, org.apache.hadoop.hbase.ServerLoad> at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.ConvertingMethod.init(ConvertingMethod.java:184) at com.sun.jmx.mbeanserver.ConvertingMethod.from(ConvertingMethod.java:27) ... 18 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: class org.apache.hadoop.hbase.ServerLoad at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.OpenConverter.makeTabularConverter(OpenConverter.java:360) at com.sun.jmx.mbeanserver.OpenConverter.makeParameterizedConverter(OpenConverter.java:402) at com.sun.jmx.mbeanserver.OpenConverter.makeConverter(OpenConverter.java:296) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:262) ...
20 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: java.util.List<org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$Coprocessor> at com.sun.jmx.mbeanserver.OpenConverter.openDataException(OpenConverter.java:1411) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:264) at com.sun.jmx.mbeanserver.OpenConverter.makeCompositeConverter(OpenConverter.java:467) at com.sun.jmx.mbeanserver.OpenConverter.makeConverter(OpenConverter.java:293) at com.sun.jmx.mbeanserver.OpenConverter.toConverter(OpenConverter.java:262) ... 24 more Caused by: javax.management.openmbean.OpenDataException: Cannot convert type: class org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$Coprocessor at
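The root cause in the trace above is a JMX rule: every type an MXBean exposes must be convertible to an open type, and a protobuf-generated class (or any class without JavaBean getters) is not. A small sketch reproducing the same failure mode, with a hypothetical stand-in for ServerLoad (illustrative only, not the HBase MXBean):

```java
import java.lang.management.ManagementFactory;
import java.util.Collections;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.NotCompliantMBeanException;
import javax.management.ObjectName;

public class MXBeanOpenTypeDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.ServerLoad: no
    // getters, so JMX cannot translate it into a CompositeData open type.
    public static class OpaqueLoad {
        final int requests;
        public OpaqueLoad(int requests) { this.requests = requests; }
    }

    public interface ClusterMXBean {
        Map<String, OpaqueLoad> getRegionServers();
    }

    public static class Cluster implements ClusterMXBean {
        public Map<String, OpaqueLoad> getRegionServers() {
            return Collections.singletonMap("rs1", new OpaqueLoad(42));
        }
    }

    // Returns true if registration fails with NotCompliantMBeanException,
    // which is the HBASE-5971 symptom.
    public static boolean registrationFails() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            server.registerMBean(new Cluster(),
                new ObjectName("demo:type=Cluster"));
            return false;
        } catch (NotCompliantMBeanException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registrationFails());
    }
}
```

The fix direction taken in HBASE-6411 follows from this: expose only open-type-friendly values (strings, numbers, and maps of them) through the bean instead of the pb-based ServerLoad itself.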
[jira] [Updated] (HBASE-6413) Investigate having hbase-env.sh decide which hadoop-compat to include
[ https://issues.apache.org/jira/browse/HBASE-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6413: - Component/s: metrics Investigate having hbase-env.sh decide which hadoop-compat to include - Key: HBASE-6413 URL: https://issues.apache.org/jira/browse/HBASE-6413 Project: HBase Issue Type: Sub-task Components: metrics Reporter: Elliott Clark Allow for one package to be created with both compat jars in and have hbase-env load the correct one. This would simplify shipping tar.gz's
[jira] [Updated] (HBASE-6408) Naming and documenting of the hadoop-metrics2.properties file
[ https://issues.apache.org/jira/browse/HBASE-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6408: - Component/s: metrics Naming and documenting of the hadoop-metrics2.properties file - Key: HBASE-6408 URL: https://issues.apache.org/jira/browse/HBASE-6408 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6408-0.patch hadoop-metrics2.properties is currently where metrics2 loads its sinks. This file could be better named hadoop-hbase-metrics2.properties. In addition, it needs examples like the current hadoop-metrics.properties has.
[jira] [Updated] (HBASE-6412) Move external servers to metrics2 (thrift,thrift2,rest)
[ https://issues.apache.org/jira/browse/HBASE-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6412: - Component/s: metrics Move external servers to metrics2 (thrift,thrift2,rest) --- Key: HBASE-6412 URL: https://issues.apache.org/jira/browse/HBASE-6412 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6412-0.patch, HBASE-6412-1.patch, HBASE-6412-2.patch, HBASE-6412-3.patch, HBASE-6412-4.patch, HBASE-6412-5.patch Implement metrics2 for all the external servers: * Thrift * Thrift2 * Rest
[jira] [Updated] (HBASE-6410) Move RegionServer Metrics to metrics2
[ https://issues.apache.org/jira/browse/HBASE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6410: - Component/s: metrics Move RegionServer Metrics to metrics2 - Key: HBASE-6410 URL: https://issues.apache.org/jira/browse/HBASE-6410 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Alex Baranau Priority: Blocker Attachments: HBASE-6410.patch Move RegionServer Metrics to metrics2
[jira] [Updated] (HBASE-6717) Remove hadoop-metrics.properties when everything has moved over.
[ https://issues.apache.org/jira/browse/HBASE-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6717: - Component/s: metrics Remove hadoop-metrics.properties when everything has moved over. Key: HBASE-6717 URL: https://issues.apache.org/jira/browse/HBASE-6717 Project: HBase Issue Type: Sub-task Components: metrics Reporter: Elliott Clark Assignee: Elliott Clark
[jira] [Updated] (HBASE-6409) Create histogram class for metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6409: - Component/s: metrics Create histogram class for metrics 2 Key: HBASE-6409 URL: https://issues.apache.org/jira/browse/HBASE-6409 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6409-0.patch, HBASE-6409-1.patch, HBASE-6409-2.patch, HBASE-6409-3.patch, HBASE-6409-4.patch Create the replacement for MetricsHistogram and PersistantTimeVaryingRate classes.
[jira] [Commented] (HBASE-4366) dynamic metrics logging
[ https://issues.apache.org/jira/browse/HBASE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454716#comment-13454716 ] Elliott Clark commented on HBASE-4366: -- Seems like this has been addressed in 0.94+: we now have per-region metrics, per-CF metrics, and per-block-type metrics. Are there other requirements or has this been completed? dynamic metrics logging --- Key: HBASE-4366 URL: https://issues.apache.org/jira/browse/HBASE-4366 Project: HBase Issue Type: New Feature Components: metrics Reporter: Ming Ma Assignee: Ming Ma First, if there is an existing solution for this, I would close this jira. Also I realize we already have various overlapping solutions; creating another solution isn't necessarily the best approach. However, I couldn't find anything that can meet the need. So I open this jira for discussion. We have some scenarios in hbase/mapreduce/hdfs that require logging a large number of dynamic metrics. They can be used for troubleshooting, better measurement on the system, and scorecards. For example, 1. HBase. Get metrics such as requests per sec that are specific to a table or column family. 2. Mapreduce job history analysis. Would like to find out all the job ids that are submitted, completed, etc. in a specific time window. For troubleshooting, what people usually do today: 1) Use current machine-level metrics to find out which machine has the issue. 2) Go to that machine and analyze the local log. The characteristics of this kind of metrics: 1. It isn't something that can be predefined. The key such as table name or job id is dynamic. 2. The number of such metrics could be much larger than what the current metrics framework can handle. 3. We don't have a scenario that requires near-real-time query support; e.g., the time from when the metric is generated to when it is available to query can be as long as an hour. 4. How data is consumed is highly application specific. Some ideas: 1.
Provide some interface for any application to log data. 2. The metrics can be written to log files. The log files or log entries will be loaded to HBase or HDFS asynchronously. That could go to a separate cluster. 3. To consume such data, applications could run a MapReduce job on the log files for aggregation, or do random reads directly from HBase. Comments?
[jira] [Updated] (HBASE-6261) Better approximate high-percentile percentile latency metrics
[ https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6261: - Component/s: metrics Better approximate high-percentile percentile latency metrics - Key: HBASE-6261 URL: https://issues.apache.org/jira/browse/HBASE-6261 Project: HBase Issue Type: New Feature Components: metrics Reporter: Andrew Wang Assignee: Andrew Wang Labels: metrics Attachments: Latencyestimation.pdf, MetricsHistogram.data, parse.py, SampleQuantiles.data The existing reservoir-sampling based latency metrics in HBase are not well-suited for providing accurate estimates of high-percentile (e.g. 90th, 95th, or 99th) latency. This is a well-studied problem in the literature (see [1] and [2]), the question is determining which methods best suit our needs and then implementing it. Ideally, we should be able to estimate these high percentiles with minimal memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or .1% on 99th). It's also desirable to provide this over different time-based sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour. I'll note that this would also be useful in HDFS, or really anywhere latency metrics are kept. [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf
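For context on why reservoir sampling struggles at the tail: a nearest-rank percentile from a fixed-size sample rests on very few observations above, say, the 99th percentile (about 10 points in a 1024-entry reservoir), so one outlier moves the estimate wildly. A minimal empirical-percentile sketch (illustrative only, not the HBase MetricsHistogram code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PercentileDemo {
    // Nearest-rank percentile over a sample of latencies (p in 0..100).
    static long percentile(List<Long> sample, double p) {
        List<Long> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank - 1, 0));
    }

    public static void main(String[] args) {
        List<Long> latencies = new ArrayList<>();
        for (long i = 1; i <= 100; i++) latencies.add(i);
        // With 100 samples, the 99th percentile is decided by a single
        // observation -- the weakness HBASE-6261 wants to fix with
        // streaming quantile estimators over sliding windows.
        System.out.println(percentile(latencies, 50) + " " + percentile(latencies, 99));
    }
}
```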
[jira] [Commented] (HBASE-6500) hbck comlaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454770#comment-13454770 ] liuli commented on HBASE-6500: -- Mr. Sorry for wrong type. hbck comlaining, Exception in thread main java.lang.NullPointerException --- Key: HBASE-6500 URL: https://issues.apache.org/jira/browse/HBASE-6500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0 Environment: Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 Reporter: liuli I met problem with starting Hbase: I have 5 machines (Ubuntu) 109.123.121.23 rsmm-master.example.com 109.123.121.24 rsmm-slave-1.example.com 109.123.121.25 rsmm-slave-2.example.com 109.123.121.26 rsmm-slave-3.example.com 109.123.121.27 rsmm-slave-4.example.com Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 After starting HBase, running hbck hduser@rsmm-master:~/hbase/bin$ ./hbase hbck 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Commented] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454768#comment-13454768 ] liuli commented on HBASE-6500: -- Ms. @Lars Hofhansl, yes, it perfectly solves this issue. You can close this ticket now.
hbck complaining, Exception in thread main java.lang.NullPointerException
---
Key: HBASE-6500
URL: https://issues.apache.org/jira/browse/HBASE-6500
Project: HBase
Issue Type: Bug
Components: hbck
Affects Versions: 0.94.0
Environment: Hadoop 0.20.205.0, Zookeeper: zookeeper-3.3.5.jar, HBase: hbase-0.94.0
Reporter: liuli
I met a problem when starting HBase. I have 5 machines (Ubuntu):
109.123.121.23 rsmm-master.example.com
109.123.121.24 rsmm-slave-1.example.com
109.123.121.25 rsmm-slave-2.example.com
109.123.121.26 rsmm-slave-3.example.com
109.123.121.27 rsmm-slave-4.example.com
Hadoop 0.20.205.0, Zookeeper: zookeeper-3.3.5.jar, HBase: hbase-0.94.0
After starting HBase, running hbck:
hduser@rsmm-master:~/hbase/bin$ ./hbase hbck
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33
28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Commented] (HBASE-6528) Raise the wait time for TestSplitLogWorker#testAcquireTaskAtStartup to reduce the failure probability
[ https://issues.apache.org/jira/browse/HBASE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454784#comment-13454784 ] ShiXing commented on HBASE-6528: [~lhofhansl] this test case was introduced when I fixed HBASE-6520, which does not have any relationship with TestSplitLogWorker#testAcquireTaskAtStartup. I haven't seen it fail so far.
Raise the wait time for TestSplitLogWorker#testAcquireTaskAtStartup to reduce the failure probability
-
Key: HBASE-6528
URL: https://issues.apache.org/jira/browse/HBASE-6528
Project: HBase
Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
Attachments: HBASE-6528-trunk-v1.patch
In TestSplitLogWorker, only testAcquireTaskAtStartup waits 100ms; the other test cases wait 1000ms. 100ms is short and sometimes causes testAcquireTaskAtStartup to fail.
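As an aside, the usual alternative to raising a fixed sleep is to poll with a deadline, so the full timeout is only paid when the condition genuinely never becomes true. A minimal sketch (hypothetical helper, not the actual TestSplitLogWorker code):

```java
import java.util.function.BooleanSupplier;

// Poll a condition until it holds or a deadline passes. The timeout becomes
// a worst case instead of a constant cost, which is the usual fix for
// flaky too-short waits like the 100ms discussed above.
class WaitFor {
    static boolean waitFor(long timeoutMs, BooleanSupplier cond)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true; // condition met early; no need to sleep longer
            }
            Thread.sleep(10); // small poll interval
        }
        return cond.getAsBoolean(); // one final check at the deadline
    }
}
```

With this pattern a 1000ms deadline costs almost nothing in the common case where the task is acquired within a few milliseconds.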
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454825#comment-13454825 ] Hudson commented on HBASE-6658: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #171 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/171/]) HBASE-6658 Rename WritableByteArrayComparable to something not mentioning Writable (Revision 1384191) Result = FAILURE gchanan : Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BinaryComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/BitComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ByteArrayComparable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/CompareFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/DependentColumnFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FamilyFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/NullComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ParseFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/QualifierFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/RowFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SubstringComparator.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/ValueFilter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/WritableByteArrayComparable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFakeKeyInFilter.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java
Rename WritableByteArrayComparable to something not mentioning Writable
---
Key: HBASE-6658
URL: https://issues.apache.org/jira/browse/HBASE-6658
Project: HBase
Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-6658.patch, HBASE-6658-v3.patch, HBASE-6658-v4.patch, HBASE-6658-v5.patch, HBASE-6658-v6.patch
After HBASE-6477, WritableByteArrayComparable will no longer be Writable, so it should be renamed.
Current idea is ByteArrayComparator (since all the derived classes are *Comparator not *Comparable), but I'm open to suggestions.
[jira] [Updated] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated HBASE-6299: --- Attachment: HBASE-6299-v3.patch Considering that a live RS would most likely get to the openRegion() request eventually and process it, it might be best just to return on SocketTimeoutException, since a SocketTimeoutException indicates an uncertain state in the assign process, with potential race conditions. This can happen if an RS temporarily runs out of IPC handlers, or if the RS's response is lost on the wire.
RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
-
Key: HBASE-6299
URL: https://issues.apache.org/jira/browse/HBASE-6299
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually with success. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN
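The return-on-SocketTimeoutException rationale above can be sketched as follows (illustrative Java; the class, method, and enum names are hypothetical, not the actual HBASE-6299 patch):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Sketch of the key distinction: a SocketTimeoutException leaves the open
// request in an UNCERTAIN state (the RS may still complete the open), so the
// master must not immediately reassign; a definite IOException is safe to retry.
class AssignSketch {
    interface RegionServerStub {
        void openRegion(String region) throws IOException;
    }

    enum Outcome { OPENED_OR_PENDING, REASSIGN_ELSEWHERE }

    static Outcome assign(String region, RegionServerStub rs) {
        try {
            rs.openRegion(region);
            return Outcome.OPENED_OR_PENDING;
        } catch (SocketTimeoutException e) {
            // Uncertain: the request may already be queued on the RS.
            // Returning here avoids the double-assignment race described above.
            return Outcome.OPENED_OR_PENDING;
        } catch (IOException e) {
            // Definite failure: safe to choose another server.
            return Outcome.REASSIGN_ELSEWHERE;
        }
    }
}
```

The design point is that after a timeout the normal ZK transition (or a timeout monitor) drives the next step, rather than the master guessing that the open failed.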
[jira] [Commented] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454840#comment-13454840 ] Maryann Xue commented on HBASE-6299: Updated the patch as HBASE-6299-v3.patch.
RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
-
Key: HBASE-6299
URL: https://issues.apache.org/jira/browse/HBASE-6299
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually with success. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
[jira] [Created] (HBASE-6772) Make the Distributed Split HDFS Location aware
nkeywal created HBASE-6772: -- Summary: Make the Distributed Split HDFS Location aware
Key: HBASE-6772
URL: https://issues.apache.org/jira/browse/HBASE-6772
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
During an hlog split, each log file (a single hdfs block) is allocated to a different region server. This region server reads the file and creates the recovery edit files. The allocation to a region server is random. We could take into account the locations of the log file to split:
- the reads would be local, hence faster; this allows short-circuit reads as well.
- less network i/o is used during a failure (and this is important).
- we would be sure to read from a working datanode, hence we're sure we won't have read errors. Read errors slow the split process a lot, as we often run into the timeout-handling path.
We need to limit the calls to the namenode, however. A typical algorithm could be:
- the master gets the locations of the hlog files.
- it writes them into ZK, if possible in one transaction (this way all the tasks are visible all together, allowing some arbitrage by the region servers).
- when a region server receives the event, it checks all logs and all locations.
- if there is a match, it takes the task.
- if not, it waits something like 0.2s (to give other region servers time to take the task if their location matches), and then takes any remaining task.
Drawbacks are:
- a 0.2s delay added if there is no region server available at any of the locations. It's likely possible to remove it with some extra synchronization.
- a small increase in complexity and a dependency on HDFS.
Considering the advantages, it's worth it imho.
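The location-matching step of the proposed algorithm might look roughly like this (illustrative Java; all names are hypothetical, and the 0.2s grace period is elided):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the selection rule: a region server prefers split tasks whose
// HDFS block locations include itself, and only falls back to a non-local
// task (after the proposed ~0.2s grace period, omitted here) when nothing matches.
class SplitTaskPicker {
    static String pickTask(String self, List<String> tasks,
                           Map<String, Set<String>> locations) {
        // First pass: a task whose block replicas include this server,
        // so the hlog read is local (and can use short-circuit reads).
        for (String task : tasks) {
            if (locations.getOrDefault(task, Set.of()).contains(self)) {
                return task;
            }
        }
        // No local match: the real proposal would wait ~0.2s here to let a
        // local server grab the task first, then take any remaining one.
        return tasks.isEmpty() ? null : tasks.get(0);
    }
}
```

Publishing all task locations to ZK in one transaction is what makes this arbitrage possible: every region server sees the full set of tasks before choosing.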
[jira] [Resolved] (HBASE-6536) [replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
[ https://issues.apache.org/jira/browse/HBASE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6536. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
-
Key: HBASE-6536
URL: https://issues.apache.org/jira/browse/HBASE-6536
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster, because the slave can not parse the hlog entry buffer.
[jira] [Resolved] (HBASE-6535) [replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
[ https://issues.apache.org/jira/browse/HBASE-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6535. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave cluster configuration
-
Key: HBASE-6535
URL: https://issues.apache.org/jira/browse/HBASE-6535
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster, because the slave can not parse the hlog entry buffer.
[jira] [Commented] (HBASE-6299) RS starts region open but fails to ack HMaster.sendRegionOpen(), causing inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454869#comment-13454869 ] Hadoop QA commented on HBASE-6299: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544974/HBASE-6299-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2857//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2857//console This message is automatically generated. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. 
However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
[jira] [Resolved] (HBASE-6534) [replication] replication will be blocked if WAL compression is set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal resolved HBASE-6534. -- Resolution: Duplicate
[replication] replication will be blocked if WAL compression is set differently in master and slave configuration
-
Key: HBASE-6534
URL: https://issues.apache.org/jira/browse/HBASE-6534
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, then replication will not work. It will throw unwrapRemoteException again and again in the master cluster.
2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster: java.io.IOException: IPC server unable to read call parameters: Error in readFields
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: Error in readFields
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
at $Proxy13.replicateLogEntries(Unknown Source)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
... 1 more
This is because the slave cluster can not parse the hlog entry.
2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89 java.io.IOException: Error in readFields
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
... 11 more
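The failure mode generalizes beyond HBase: a writer and a reader that disagree about compression cannot exchange records. A self-contained illustration using plain java.util.zip (not HBase code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Bytes written through a compressing stream are opaque to a reader that
// expects the plain format, just as a slave without WAL compression cannot
// parse compressed hlog entries shipped by the master.
class MismatchDemo {
    static byte[] writeCompressed(String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // A reader configured WITHOUT compression: sees raw deflate bytes.
    static String readPlain(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A reader configured WITH compression: recovers the payload.
    static String readCompressed(byte[] bytes) throws IOException {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In the HBase case the mismatch surfaces further down the stack, as the EOFException inside KeyValue.readFields shown in the trace above, but the root cause is the same: both ends of the pipe must agree on the wire format.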
[jira] [Commented] (HBASE-6533) [replication] replication will block if WAL compression is set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454872#comment-13454872 ] Michael Drzal commented on HBASE-6533: -- [~terry_zhang] I've cleaned up the duplicate issues for you. [replication] replication will block if WAL compress set differently in master and slave configuration -- Key: HBASE-6533 URL: https://issues.apache.org/jira/browse/HBASE-6533 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.94.0 Reporter: terry zhang Assignee: terry zhang Priority: Critical Fix For: 0.94.3 Attachments: hbase-6533.patch as we know in hbase 0.94.0 we have a configuration below property namehbase.regionserver.wal.enablecompression/name valuetrue/value /property if we enable it in master cluster and disable it in slave cluster . Then replication will not work. It will throw unwrapRemoteException again and again in master cluster. 2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster: java.io.IOException: IPC server unable to read call parameters: Error in readFields at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365) Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call 
parameters: Error in readFields at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151) at $Proxy13.replicateLogEntries(Unknown Source) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616) ... 1 more This is because Slave cluster can not parse the hlog entry . 2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89 java.io.IOException: Error in readFields at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254) at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146) at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682) ... 
11 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
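The configuration quoted in the description above lost its XML tags in extraction; restored, the 0.94 hbase-site.xml property the issue refers to reads:

```xml
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
```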
[jira] [Commented] (HBASE-6533) [replication] replication will block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454873#comment-13454873 ] Michael Drzal commented on HBASE-6533: -- [~jdcryans] should we just close this out since the real fix is HBASE-5778?
[jira] [Commented] (HBASE-6563) s.isMajorCompaction() throws npe will cause current major Compaction checking abort
[ https://issues.apache.org/jira/browse/HBASE-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454875#comment-13454875 ] Michael Drzal commented on HBASE-6563: -- [~zhou wenjian] any response to Ted's comments? s.isMajorCompaction() throws npe will cause current major Compaction checking abort --- Key: HBASE-6563 URL: https://issues.apache.org/jira/browse/HBASE-6563 Project: HBase Issue Type: Bug Components: regionserver Reporter: Zhou wenjian Assignee: Zhou wenjian Fix For: 0.94.1 Attachments: HBASE-6563-trunk.patch, HBASE-6563-trunk-v2.patch, HBASE-6563-trunk-v3.patch 2012-05-05 00:49:43,265 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:938) at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:917) at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:3250) at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1222) at org.apache.hadoop.hbase.Chore.run(Chore.java:66) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
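The chore above dies on the first NullPointerException from Store.isMajorCompaction(), aborting the check for every remaining region. A minimal, hypothetical sketch of the defensive pattern — not the actual HBase patch; `Region` and the method here are stand-ins — would catch per-region failures and keep going:

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionCheckSketch {
    interface Region {
        // may throw NPE, e.g. if store files vanished mid-check
        boolean isMajorCompaction();
    }

    // Returns how many regions were successfully checked; a failure on one
    // region is swallowed (in real code: logged) instead of aborting the pass.
    static int checkAll(List<Region> regions) {
        int checked = 0;
        for (Region r : regions) {
            try {
                r.isMajorCompaction();
                checked++;
            } catch (RuntimeException e) {
                // log and continue with the remaining regions
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(() -> true);
        regions.add(() -> { throw new NullPointerException(); });
        regions.add(() -> false);
        System.out.println(checkAll(regions)); // prints 2
    }
}
```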
[jira] [Commented] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454876#comment-13454876 ] Michael Drzal commented on HBASE-6564: -- [~zhi...@ebaysf.com] can we close this out? HDFS space is not reclaimed when a column family is deleted --- Key: HBASE-6564 URL: https://issues.apache.org/jira/browse/HBASE-6564 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.1 Reporter: J Mohamed Zahoor Assignee: J Mohamed Zahoor Priority: Minor Attachments: HBASE-6564-trunk.patch, HBASE-6564-v2.patch, HBASE-6564-v3.patch, HBASE-6564-v4.patch, HBASE-6564-v5.patch When a column family of a table is deleted, the HDFS space of the column family does not seem to be reclaimed even after a major compaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454877#comment-13454877 ] J Mohamed Zahoor commented on HBASE-6564: - Yes. I guess.
[jira] [Updated] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
[ https://issues.apache.org/jira/browse/HBASE-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6583: - Labels: noob (was: ) Enhance Hbase load test tool to automatically create cf's if not present Key: HBASE-6583 URL: https://issues.apache.org/jira/browse/HBASE-6583 Project: HBase Issue Type: Bug Components: test Reporter: Karthik Ranganathan Labels: noob The load test tool currently disables the table and applies any changes to the cf descriptor if any, but does not create the cf if not present. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6564) HDFS space is not reclaimed when a column family is deleted
[ https://issues.apache.org/jira/browse/HBASE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6564: -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454885#comment-13454885 ] Michael Drzal commented on HBASE-6591: -- [~gchanan] just to make sure I understand you correctly, you would want metrics at the regionserver level along the lines of: * checkAndPutSuccesses * checkAndPutFailures * checkAndDeleteSuccesses * checkAndDeleteFailures Would they actually be helpful at the regionserver level or would you need them more granular? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
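The four counters proposed in the comment above could be kept as simple atomic longs on the regionserver, incremented from the boolean that checkAndPut/checkAndDelete return. This is only an illustrative sketch using the hypothetical counter names from the comment, not HBase's actual metrics framework:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckAndMutateMetrics {
    // Names follow the comment's proposal; they are not real HBase metrics.
    final AtomicLong checkAndPutSuccesses = new AtomicLong();
    final AtomicLong checkAndPutFailures = new AtomicLong();
    final AtomicLong checkAndDeleteSuccesses = new AtomicLong();
    final AtomicLong checkAndDeleteFailures = new AtomicLong();

    // Record the boolean checkAndPut returns: true means the put was
    // executed, false means the check failed and nothing was written.
    void recordCheckAndPut(boolean executed) {
        (executed ? checkAndPutSuccesses : checkAndPutFailures).incrementAndGet();
    }

    void recordCheckAndDelete(boolean executed) {
        (executed ? checkAndDeleteSuccesses : checkAndDeleteFailures).incrementAndGet();
    }

    public static void main(String[] args) {
        CheckAndMutateMetrics m = new CheckAndMutateMetrics();
        m.recordCheckAndPut(true);
        m.recordCheckAndPut(true);
        m.recordCheckAndPut(false);
        // prints 2/1
        System.out.println(m.checkAndPutSuccesses.get() + "/" + m.checkAndPutFailures.get());
    }
}
```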
[jira] [Created] (HBASE-6773) Make the dfs replication factor configurable per table
nkeywal created HBASE-6773: -- Summary: Make the dfs replication factor configurable per table Key: HBASE-6773 URL: https://issues.apache.org/jira/browse/HBASE-6773 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Today, it's an application-level configuration, so all the HFiles are replicated 3 times by default. There are several reasons to make it per table:
- some tables are critical while others are not. For example, meta would benefit from a higher level of replication, to ensure we keep working even when we lose 20% of the cluster.
- some tables are backed up somewhere else, or used by non-essential processes, so the user may accept a lower level of replication for them.
- it should be a dynamic parameter. For example, during a bulk load we set a replication of 1 or 2, then we increase it. It's in the same space as disabling the WAL for some writes.
The case that seems important to me is meta. We could also handle that one with a specific parameter in the usual hbase-site.xml if we don't want a generic solution.
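The per-table override with a cluster-wide default that the proposal describes boils down to a lookup with a fallback. A toy sketch — table names and factors here are invented, and this is not an HBase API:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicationFactorSketch {
    // The cluster-wide default that dfs.replication gives today.
    static final short DEFAULT_DFS_REPLICATION = 3;

    private final Map<String, Short> perTable = new HashMap<>();

    // A dynamic per-table override, e.g. lowered during a bulk load
    // and raised afterwards.
    void setReplication(String table, short factor) {
        perTable.put(table, factor);
    }

    short replicationFor(String table) {
        return perTable.getOrDefault(table, DEFAULT_DFS_REPLICATION);
    }

    public static void main(String[] args) {
        ReplicationFactorSketch s = new ReplicationFactorSketch();
        s.setReplication(".META.", (short) 5); // critical table: more replicas
        // prints 5 3
        System.out.println(s.replicationFor(".META.") + " " + s.replicationFor("events"));
    }
}
```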
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454899#comment-13454899 ] ramkrishna.s.vasudevan commented on HBASE-6698: --- [~saint@gmail.com] Is this patch fine, Stack? Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation only. This will help in the future: if we have some hooks and the CP handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete.
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454905#comment-13454905 ] ramkrishna.s.vasudevan commented on HBASE-6299: --- @Maryann Thanks for the patch. This is what we just discussed in HBASE-6438. Please take a look at that patch also. We could actually merge both if you feel it is fine, and commit them once others review it. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since HMaster's OpenedRegionHandler has been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code} 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. 
from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301) 2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454906#comment-13454906 ] ramkrishna.s.vasudevan commented on HBASE-6438: --- @Lars/@Ted Maryann has come up with a patch for HBASE-6299 where there is no retry on SocketTimeout. Maybe, if Maryann is fine with it, we can merge both, or we can handle HBASE-6438 separately. RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason like a master restart or an external assign call, we try to assign a region that is already getting opened on an RS. The next call to assign has already changed the state of the znode, so the current assign going on in the RS is affected and it fails. The second assignment that started also fails, getting a RAITE exception. In the end, neither assignment carries on. The idea is to find out whether any such RAITE exception can be retried or not. Here again we have the following cases:
- The znode is yet to be transitioned from OFFLINE to OPENING in the RS.
- The RS may be in the openRegion step.
- The RS may be trying to transition OPENING to OPENED.
- The region is yet to be added to the online regions on the RS side.
For any failure in openRegion() and updateMeta() we move the znode to FAILED_OPEN, so in these cases getting a RAITE should be ok. But in the other cases the assignment is stopped. The idea is to add the current state of the region assignment to the RIT map on the RS side; using that info we can determine whether the assignment can be retried or not on getting a RAITE. Considering the current work going on in the AM, please do share whether this is needed at least in the 0.92/0.94 versions.
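Reduced to a sketch, the proposal is: the RS records how far the open has progressed, and the master uses that state to decide whether an assign that hit RAITE is safe to retry. The state names and the retry rule below are illustrative simplifications, not actual HBase identifiers:

```java
public class RaiteRetrySketch {
    // Rough stages of a region open on the RS, per the issue description.
    enum OpenState { ZNODE_OFFLINE, OPENING_REGION, TRANSITIONING_TO_OPENED, ONLINE }

    // Failures before the region is fully online move the znode to
    // FAILED_OPEN on the RS, so a fresh assign is safe to retry; once the
    // region is online, retrying would double-assign it.
    static boolean canRetryAssign(OpenState state) {
        return state != OpenState.ONLINE;
    }

    public static void main(String[] args) {
        System.out.println(canRetryAssign(OpenState.OPENING_REGION)); // prints true
        System.out.println(canRetryAssign(OpenState.ONLINE)); // prints false
    }
}
```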
[jira] [Created] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
nkeywal created HBASE-6774: -- Summary: Immediate assignment of regions that don't have entries in HLog Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal The algorithm today, after a failure detection, is:
- split the logs
- when all the logs are split, assign the regions
But some regions can have no entries at all in the HLog. There are many reasons for this:
- reference or historical tables: bulk-written sometimes, then read-only.
- sequential rowkeys. In this case, most of the regions will be read-only, but they can be on a regionserver with a lot of writes.
- tables flushed often for safety reasons. I'm thinking about meta here. For meta, we can imagine flushing very often. Hence the recovery for meta, in many cases, will be just the failure detection time.
There are different possible algorithms:
Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; adds a read.
Option 2) The master writes in ZK the number of log files per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows the assign to start even if not all the HLogs are finished, and would allow making some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server. Requires reading the whole file before starting to write.
Option 3) Add some metadata at the end of the log file. The last log file won't have metadata, since if we are recovering it's because the server crashed. But the others will. And the last log file should be smaller (half a block on average).
Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much, we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage.
I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread.
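Option 2 above is essentially a per-region countdown. A toy sketch, with an in-memory map standing in for ZooKeeper and all names invented:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyAssignSketch {
    // region name -> number of HLog files that may still hold edits for it
    // (in the proposal this counter would live in ZK, written by the master)
    private final ConcurrentMap<String, AtomicInteger> pendingLogs = new ConcurrentHashMap<>();

    void registerRegion(String region, int logFileCount) {
        pendingLogs.put(region, new AtomicInteger(logFileCount));
    }

    // A splitter calls this after confirming a given log file holds no
    // edits for the region; returns true once no remaining log file can
    // contain edits for it, i.e. the region is safe to assign early.
    boolean logFileCleared(String region) {
        return pendingLogs.get(region).decrementAndGet() == 0;
    }

    public static void main(String[] args) {
        EarlyAssignSketch s = new EarlyAssignSketch();
        s.registerRegion("refTable,row0", 2);
        System.out.println(s.logFileCleared("refTable,row0")); // prints false
        System.out.println(s.logFileCleared("refTable,row0")); // prints true
    }
}
```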
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454911#comment-13454911 ] nkeywal commented on HBASE-5843: Test with meta: on a real cluster, 3 nodes, dfs.replication = 2, local HD. Start with 2 DNs and 2 RSs. Create a table with 100 regions on the second one. The first holds meta and root. Start another box with a DN and an RS. This box is empty (no regions, no blocks). Unplug the box with meta and root, then try to create a table. The time taken is the recovery time of the box holding meta. No bad surprise. It also means that with the default ZooKeeper timeout, you're losing the cluster for 3 minutes if your meta regionserver dies. HBASE-6772, HBASE-6773 and HBASE-6774 would help to increase meta failure resiliency. Improve HBase MTTR - Mean Time To Recover - Key: HBASE-5843 URL: https://issues.apache.org/jira/browse/HBASE-5843 Project: HBase Issue Type: Umbrella Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit The ideal target is:
- failures impact client applications only by an added delay to execute a query, whatever the failure.
- this delay is always under 1 second.
We're not going to achieve that immediately... Priority will be given to the most frequent issues. Short term:
- software crashes
- standard administrative tasks such as stop/start of a cluster.
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454929#comment-13454929 ] Ted Yu commented on HBASE-6438: --- I think separating the fix would make discussion easier. Thanks
[jira] [Commented] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454954#comment-13454954 ] Lars Hofhansl commented on HBASE-6500: -- Heh, no problem English is not my native language either. hbck comlaining, Exception in thread main java.lang.NullPointerException --- Key: HBASE-6500 URL: https://issues.apache.org/jira/browse/HBASE-6500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0 Environment: Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 Reporter: liuli I met problem with starting Hbase: I have 5 machines (Ubuntu) 109.123.121.23 rsmm-master.example.com 109.123.121.24 rsmm-slave-1.example.com 109.123.121.25 rsmm-slave-2.example.com 109.123.121.26 rsmm-slave-3.example.com 109.123.121.27 rsmm-slave-4.example.com Hadoop 0.20.205.0 Zookeeper: zookeeper-3.3.5.jar Hbase: hbase-0.94.0 After starting HBase, running hbck hduser@rsmm-master:~/hbase/bin$ ./hbase hbck 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:host.name=rsmm-master.example.com 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre1.6.0_33 28/12/12 17:13:29 INFO zookeeper.ZooKeeper: Client
[jira] [Resolved] (HBASE-6500) hbck complaining, Exception in thread main java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6500. -- Resolution: Duplicate Closing as duplicate of HBASE-6464
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454956#comment-13454956 ] Lars Hofhansl commented on HBASE-6438: -- I'm fine either way. 0.94.2RC0 is not spun yet. If we can get this in quickly I can pull it into that RC. RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason like a master restart or an external assign call, we try to assign a region that is already getting opened in a RS. Now the next call to assign has already changed the state of the znode, so the current assign that is going on in the RS is affected and it fails. The second assignment that started also fails, getting a RAITE exception. In the end, neither assignment carries on. The idea is to find whether any such RAITE exception can be retried or not. Here again we have the following cases: - The znode is yet to be transitioned from OFFLINE to OPENING in the RS - The RS may be in the step of openRegion. - The RS may be trying to transition OPENING to OPENED. - The region is yet to be added to the online regions on the RS side. In openRegion() and updateMeta(), on any failure we move the znode to FAILED_OPEN, so in these cases getting a RAITE should be ok. But in the other cases the assignment is stopped.
The idea is to just add the current state of the region assignment in the RIT map in the RS side and using that info we can determine whether the assignment can be retried or not on getting an RAITE. Considering the current work going on in AM, pls do share if this is needed atleast in the 0.92/0.94 versions? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454957#comment-13454957 ] Lars Hofhansl commented on HBASE-6299: -- Do we still need to unwrap the exception? RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attempts to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it. 6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created. 
{code} 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. 
from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301) 2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
[jira] [Updated] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6769: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Thanks for the patch, Elliott! HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression; since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on to the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This has been present since 0.94.0. Assigning to Elliott because he asked.
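The committed fix revolves around the multi path no longer swallowing the per-action cause. As a hedged illustration of the general idea (plain Java, not the actual HBase patch; all class and method names here are hypothetical stand-ins), a batch call can record the concrete exception for each failed action so the client-side summary can name the real cause instead of a generic failure count:

```java
import java.util.*;

// Illustrative sketch only: a batch call that keeps the concrete exception
// for each failed action instead of collapsing everything into a generic
// DoNotRetryIOException, so the client can report the real cause.
public class BatchErrorDemo {
    // Hypothetical stand-in for HBase's NoSuchColumnFamilyException.
    static class NoSuchColumnFamilyException extends RuntimeException {
        NoSuchColumnFamilyException(String msg) { super(msg); }
    }

    // Sanity check for one action: rejects unknown column families.
    static void checkFamily(String family, Set<String> knownFamilies) {
        if (!knownFamilies.contains(family)) {
            throw new NoSuchColumnFamilyException(
                "Column family " + family + " does not exist");
        }
    }

    // Per-action results: null on success, the concrete exception on failure.
    static List<Throwable> multi(List<String> families, Set<String> known) {
        List<Throwable> results = new ArrayList<>();
        for (String f : families) {
            try {
                checkFamily(f, known);
                results.add(null);   // action succeeded
            } catch (RuntimeException e) {
                results.add(e);      // keep the cause, do not swallow it
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("cf1"));
        List<Throwable> r = multi(Arrays.asList("cf1", "bogus"), known);
        // The failed action carries its real exception type and message.
        System.out.println(r.get(1).getClass().getSimpleName());
    }
}
```

With per-action results like this, a client-side retries-exhausted summary can print the exception class per failed action rather than forcing the user to dig through server logs.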
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6299: - Fix Version/s: 0.94.3 0.96.0 RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6299: -- Fix Version/s: 0.92.3 RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454968#comment-13454968 ] Gregory Chanan commented on HBASE-6591: --- Michael, Those are the metrics I was thinking. What do you think about granularity? regionserver level? region level? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454969#comment-13454969 ] Ted Yu commented on HBASE-6299: --- nit: {code} +else if (t instanceof java.net.SocketTimeoutException {code} 'else' keyword is not needed above. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.96.0, 0.92.3, 0.94.3 Attachments: HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attemps to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt. 6. The unassigned ZK node stays and a later unassign fails coz RS_ZK_REGION_CLOSING cannot be created. 
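Ted's style nit above can be shown with a minimal sketch (hypothetical names, not the actual patch code): when the preceding branch ends in a return, a trailing `else` adds nothing, so a flat chain of `if` statements behaves identically and reads more cleanly:

```java
// Illustrative sketch of the review comment: each branch returns, so no
// "else" is needed before the next "if". Names are hypothetical.
public class RetryDecisionDemo {
    static boolean isRetriable(Throwable t) {
        if (t instanceof java.net.SocketTimeoutException) {
            return true;  // the ack may simply have been lost; retrying is safe
        }
        // No "else" needed here: the branch above already returned.
        if (t instanceof IllegalStateException) {
            return false; // a state problem will not go away on retry
        }
        return false;     // default: do not retry unknown failures
    }

    public static void main(String[] args) {
        System.out.println(isRetriable(new java.net.SocketTimeoutException()));
        System.out.println(isRetriable(new IllegalStateException()));
    }
}
```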
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454974#comment-13454974 ] Michael Drzal commented on HBASE-6591: -- I don't know. You'd have to tell me what would work for your use case. If you track it at the regionserver level, you could end up with multiple regions affecting this counter. I'm not sure if you'd end up with valuable data in that case. checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
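One way to keep both views from the granularity discussion above is to count at region level and derive the regionserver-level figure by aggregation. A minimal sketch of that shape, with hypothetical names and no dependence on the actual HBase metrics API:

```java
import java.util.*;

// Illustrative sketch: per-region checkAndPut success/failure counters,
// with the regionserver-level numbers derived by summing across regions.
// Class and method names are hypothetical, not HBase's metrics classes.
public class CheckAndPutMetricsDemo {
    static class Counter { long succeeded; long failed; }

    private final Map<String, Counter> byRegion = new HashMap<>();

    // Record one checkAndPut outcome for a region: applied == true means
    // the check passed and the put was executed.
    void record(String region, boolean applied) {
        Counter c = byRegion.computeIfAbsent(region, k -> new Counter());
        if (applied) c.succeeded++; else c.failed++;
    }

    // Regionserver-level view: aggregate over all hosted regions.
    long serverSucceeded() {
        return byRegion.values().stream().mapToLong(c -> c.succeeded).sum();
    }

    long serverFailed() {
        return byRegion.values().stream().mapToLong(c -> c.failed).sum();
    }

    public static void main(String[] args) {
        CheckAndPutMetricsDemo m = new CheckAndPutMetricsDemo();
        m.record("region-a", true);
        m.record("region-a", false);
        m.record("region-b", true);
        System.out.println(m.serverSucceeded() + " / " + m.serverFailed());
    }
}
```

Counting per region and summing upward addresses the concern that a regionserver-level counter alone would mix regions with very different checkAndPut behavior.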
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454986#comment-13454986 ] Jean-Daniel Cryans commented on HBASE-6769: --- Great work Elliott. HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6374) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region
[ https://issues.apache.org/jira/browse/HBASE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amitanand Aiyer resolved HBASE-6374. Resolution: Fixed [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region Key: HBASE-6374 URL: https://issues.apache.org/jira/browse/HBASE-6374 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb This is a feature similar to the batch feature in trunk. We have optimisation for the put path where we batch puts by the regionserver, but for gets and deletes we do batching only per hregion. So, if there are 20 regions on a regionserver, we would be doing 20 RPC when we can potentially batch them together in 1 call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454990#comment-13454990 ] Lars Hofhansl commented on HBASE-6591: -- What would the use of this metric generally be? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6374) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region
[ https://issues.apache.org/jira/browse/HBASE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454995#comment-13454995 ] Lars Hofhansl commented on HBASE-6374: -- So is this in trunk as well (the Get and Delete optimization)? (I know I could just look, but asking is easier :) ) [89-fb] Unify the multi-put/get/delete path so there is only one call to each RS, instead of one call per region Key: HBASE-6374 URL: https://issues.apache.org/jira/browse/HBASE-6374 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb This is a feature similar to the batch feature in trunk. We have optimisation for the put path where we batch puts by the regionserver, but for gets and deletes we do batching only per hregion. So, if there are 20 regions on a regionserver, we would be doing 20 RPC when we can potentially batch them together in 1 call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
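The batching improvement described in this issue boils down to grouping operations by the server hosting each region, so the client issues one RPC per regionserver instead of one per region. A minimal sketch of just the grouping step, using a plain map as a stand-in for the client's region-location cache (hypothetical names, not the actual 0.89-fb code):

```java
import java.util.*;

// Illustrative sketch: bucket operations by the regionserver hosting each
// region, so one multi() call per server replaces one call per region.
public class BatchByServerDemo {
    // opRegions: the region each pending op targets.
    // regionToServer: stand-in for the client's location cache.
    static Map<String, List<String>> groupOpsByServer(
            List<String> opRegions, Map<String, String> regionToServer) {
        Map<String, List<String>> perServer = new HashMap<>();
        for (String region : opRegions) {
            String server = regionToServer.get(region);
            perServer.computeIfAbsent(server, s -> new ArrayList<>()).add(region);
        }
        return perServer; // issue one batched RPC per key
    }

    public static void main(String[] args) {
        Map<String, String> loc = new HashMap<>();
        loc.put("r1", "rs1");
        loc.put("r2", "rs1");
        loc.put("r3", "rs2");
        Map<String, List<String>> grouped =
                groupOpsByServer(Arrays.asList("r1", "r2", "r3"), loc);
        // Three region-targeted ops collapse into two server-level calls.
        System.out.println(grouped.size());
    }
}
```

With 20 regions on one regionserver, this grouping turns 20 per-region calls into a single batched call to that server, which is exactly the win the issue describes for gets and deletes.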
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455011#comment-13455011 ] Hudson commented on HBASE-6769: --- Integrated in HBase-TRUNK #3327 (See [https://builds.apache.org/job/HBase-TRUNK/3327/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384377) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/FailedSanityCheckException.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) 
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455015#comment-13455015 ] Hudson commented on HBASE-6769: --- Integrated in HBase-0.94 #467 (See [https://builds.apache.org/job/HBase-0.94/467/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384378) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) 
{noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455026#comment-13455026 ] Jonathan Hsieh commented on HBASE-6765: --- +1 on v2 on review board. 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Bug Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6765: -- Issue Type: Sub-task (was: Bug) Parent: HBASE-6055 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455027#comment-13455027 ] Jean-Daniel Cryans commented on HBASE-6719: --- bq. Can we rewrite the patch this way? Yeah, I think this works. bq. One concern I have: What if the file is actually gone for some reason? In that case it seems we'd never stop retrying. If you go up in the file you'll see that after we've looked everywhere, I currently don't have a good solution for files that are missing completely. Basically my heuristic was: if I can't open or get to the file and there's another one available, I'll dump it. This indeed doesn't work if there's a transient error that lasts long enough for the retries to exhaust. Should we introduce a quarantine? [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier - Key: HBASE-6719 URL: https://issues.apache.org/jira/browse/HBASE-6719 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.94.1 Reporter: terry zhang Assignee: terry zhang Priority: Critical Fix For: 0.94.3 Attachments: 6719.txt, hbase-6719.patch Please take a look at the code below:
{code:title=ReplicationSource.java|borderStyle=solid}
protected boolean openReader(int sleepMultiplier) {
  ...
  catch (IOException ioe) {
    LOG.warn(peerClusterZnode + " Got: ", ioe);
    // TODO Need a better way to determine if a file is really gone but
    // TODO without scanning all logs dir
    if (sleepMultiplier == this.maxRetriesMultiplier) {
      LOG.warn("Waited too long for this file, considering dumping");
      // Opening the file failed more than maxRetriesMultiplier (default 10) times
      return !processEndOfFile();
    }
  }
  return true;
  ...
}

protected boolean processEndOfFile() {
  if (this.queue.size() != 0) {
    // Skipped this HLog: data loss
    this.currentPath = null;
    this.position = 0;
    return true;
  } else if (this.queueRecovered) {
    // Terminate the failover replication source thread: data loss
    this.manager.closeRecoveredQueue(this);
    LOG.info("Finished recovering the queue");
    this.running = false;
    return true;
  }
  return false;
}
{code}
Sometimes HDFS runs into a problem while the HLog file itself is actually fine, so after HDFS comes back, some data is lost and cannot be found in the slave cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
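The retry heuristic J-D describes above — keep retrying on transient errors, and only dump a log once the file is confirmed gone — can be sketched in isolation. This is a minimal, self-contained illustration, not HBase code: `LogOpener`, `openWithRetries`, and the `fileExists` check are hypothetical stand-ins for `ReplicationSource.openReader` and an explicit existence probe.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

public class RetryOpenSketch {
    // Hypothetical stand-in for opening an HLog for reading.
    interface LogOpener { void open() throws IOException; }

    /**
     * Try to open a log up to maxRetries times.
     * Returns true once opened, false only if the file is confirmed missing;
     * if retries exhaust while the file still exists, surface the error
     * instead of silently dropping the log (the data-loss case above).
     */
    static boolean openWithRetries(LogOpener opener, BooleanSupplier fileExists,
                                   int maxRetries) throws IOException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                opener.open();
                return true;                 // opened successfully
            } catch (IOException ioe) {
                if (!fileExists.getAsBoolean()) {
                    return false;            // really gone: safe to skip
                }
                // transient HDFS error: keep retrying rather than dumping
            }
        }
        throw new IOException("log still present after " + maxRetries + " attempts");
    }

    public static void main(String[] args) throws IOException {
        // Transient failure on the first attempt, then success.
        int[] calls = {0};
        boolean ok = openWithRetries(() -> {
            if (calls[0]++ == 0) throw new IOException("transient");
        }, () -> true, 10);
        System.out.println(ok);

        // File really gone: give up immediately instead of burning retries.
        boolean gone = openWithRetries(() -> { throw new IOException("gone"); },
                                       () -> false, 10);
        System.out.println(gone);
    }
}
```

The key design point is that "retries exhausted" and "file missing" take different exits, so a long transient outage can no longer be mistaken for a deleted log.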
[jira] [Commented] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader
[ https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455028#comment-13455028 ] Hudson commented on HBASE-5997: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5997 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader (Revision 1383792) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java Fix concerns raised in HBASE-5922 related to HalfStoreFileReader Key: HBASE-5997 URL: https://issues.apache.org/jira/browse/HBASE-5997 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0 Reporter: ramkrishna.s.vasudevan Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.2 Attachments: 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch, HBASE-5997_94 V3.patch, Testcase.patch.txt Pls refer to the comment https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346. Raised this issue to solve that comment. Just incase we don't forget it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6286) Upgrade maven-compiler-plugin to 2.5.1
[ https://issues.apache.org/jira/browse/HBASE-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455029#comment-13455029 ] Hudson commented on HBASE-6286: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6286 Upgrade maven-compiler-plugin to 2.5.1 (Revision 1381861) Result = SUCCESS stack : Files : * /hbase/branches/0.94/pom.xml Upgrade maven-compiler-plugin to 2.5.1 -- Key: HBASE-6286 URL: https://issues.apache.org/jira/browse/HBASE-6286 Project: HBase Issue Type: Improvement Components: build Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6286.patch time mvn -PlocalTests clean install -DskipTests With 2.5.1: |user|1m35.634s|1m31.178s|1m31.366s| |sys|0m06.540s|0m05.376s|0m05.488s| With 2.0.2 (current): |user|2m01.168s|1m54.027s|1m57.799s| |sys|0m05.896s|0m05.912s|0m06.032s| -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455030#comment-13455030 ] Hudson commented on HBASE-5922: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5997 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader (Revision 1383792) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 5922.092.txt, HBASE-5922.patch, HBASE-5922.patch, HBASE-5922.v2.patch, HBASE-5922.v3.patch, HBASE-5922.v4.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. 
java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
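The repeating `HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)` frames in the trace above are the signature of a wrapper re-entering its own method instead of its delegate. The toy sketch below reproduces that shape with made-up `Scanner` wrappers (illustrative only, not the actual HalfStoreFileReader fix):

```java
public class SelfRecursionSketch {
    // Hypothetical stand-in for an HFile scanner interface.
    interface Scanner { boolean seekBefore(byte[] key); }

    // Buggy shape: the anonymous wrapper calls its own seekBefore again,
    // so the same frame repeats until the stack overflows.
    static Scanner buggyWrapper(Scanner delegate) {
        return new Scanner() {
            public boolean seekBefore(byte[] key) {
                return this.seekBefore(key);   // re-enters itself forever
            }
        };
    }

    // Fixed shape: forward to the wrapped scanner.
    static Scanner fixedWrapper(Scanner delegate) {
        return key -> delegate.seekBefore(key);
    }

    public static void main(String[] args) {
        Scanner base = key -> true;
        System.out.println(fixedWrapper(base).seekBefore(new byte[0]));
        try {
            buggyWrapper(base).seekBefore(new byte[0]);
            System.out.println("no overflow");
        } catch (StackOverflowError e) {
            // Catching StackOverflowError is only acceptable in a demo.
            System.out.println("StackOverflowError");
        }
    }
}
```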
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455031#comment-13455031 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455032#comment-13455032 ] Hudson commented on HBASE-5631: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-5631 ADDENDUM (extra comments) (Revision 1382628) HBASE-5631 hbck should handle case where .tableinfo file is missing (Jie Huang) (Revision 1382530) Result = SUCCESS jmhsieh : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: hbase-5631-addendum.patch, hbase-5631.patch, hbase-5631-v1.patch, hbase-5631-v2.patch 0.92+ branches have a .tableinfo file which could be missing from hdfs. hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6769) HRS.multi eats NoSuchColumnFamilyException since HBASE-5021
[ https://issues.apache.org/jira/browse/HBASE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455035#comment-13455035 ] Hudson commented on HBASE-6769: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6769 HRS.multi eats NoSuchColumnFamilyException (Elliott Clark) (Revision 1384378) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java HRS.multi eats NoSuchColumnFamilyException since HBASE-5021 --- Key: HBASE-6769 URL: https://issues.apache.org/jira/browse/HBASE-6769 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.94.1 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6769-0.94-0.patch, HBASE-6769-0.94-1.patch, HBASE-6769-0.patch I think this is a pretty major usability regression, since HBASE-5021 this is what you get in the client when using a wrong family: {noformat} 2012-09-11 09:45:29,634 WARN org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: sfor3s44:10304, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:772) at 
org.apache.hadoop.hbase.client.HTable.put(HTable.java:747) {noformat} Then you have to log on the server to understand what failed. Since everything is now a multi call, even single puts in the shell fail like this. This is present since 0.94.0 Assigning to Elliott because he asked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
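The usability problem above is that a batch ("multi") call collapses every server-side failure into a bare `DoNotRetryIOException` count, so the real cause (a bad column family) never reaches the client. A self-contained sketch of the result shape the fix is after — one slot per action carrying either a value or the action's actual exception — might look like this. All names here are illustrative, not the HBase API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiResultSketch {
    // One slot per action: either a value or the exception that failed it.
    static class ActionResult {
        final Object value;
        final Exception error;
        ActionResult(Object value, Exception error) {
            this.value = value;
            this.error = error;
        }
    }

    // Toy "multi" call: keep the informative per-action cause instead of
    // eating it and reporting only an opaque failure count.
    static List<ActionResult> multi(List<String> families, Set<String> knownFamilies) {
        List<ActionResult> results = new ArrayList<>();
        for (String family : families) {
            if (!knownFamilies.contains(family)) {
                results.add(new ActionResult(null,
                    new IllegalArgumentException("Column family " + family + " does not exist")));
            } else {
                results.add(new ActionResult("ok", null));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ActionResult> rs = multi(Arrays.asList("cf1", "bogus"),
                                      new HashSet<>(Arrays.asList("cf1")));
        System.out.println(rs.get(0).value);
        System.out.println(rs.get(1).error.getMessage());
    }
}
```

With per-action errors in the batch response, the client-side retry logic can also classify sanity-check failures as non-retriable without logging on to the server.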
[jira] [Commented] (HBASE-6757) Very inefficient behaviour of scan using FilterList
[ https://issues.apache.org/jira/browse/HBASE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455034#comment-13455034 ] Hudson commented on HBASE-6757: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6757 Very inefficient behaviour of scan using FilterList (Revision 1383749) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java Very inefficient behaviour of scan using FilterList --- Key: HBASE-6757 URL: https://issues.apache.org/jira/browse/HBASE-6757 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.6 Reporter: Jerry Lam Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.2 Attachments: 6757.txt, CopyOfTestColumnPrefixFilter.java, DisplayFilter.java The behaviour of scan is very inefficient when using with FilterList. The FilterList rewrites the return code from NEXT_ROW to SKIP from a filter if Operator.MUST_PASS_ALL is used. This happens when using ColumnPrefixFilter. Even though the ColumnPrefixFilter indicates to jump to NEXT_ROW because no further match can be found, the scan continues to scan all versions of a column in that row and all columns of that row because the ReturnCode from ColumnPrefixFilter has been rewritten by the FilterList from NEXT_ROW to SKIP. This is particularly inefficient when there are many versions in a column because the check is performed on all versions of the column instead of just by checking the qualifier of the column name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
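The FilterList issue above comes down to how per-filter return codes are merged under MUST_PASS_ALL: rewriting a filter's NEXT_ROW hint to SKIP forces the scanner to keep visiting every remaining cell (and version) of the row, while preserving the strongest seek hint lets it jump straight to the next row. The toy enum below illustrates the two merge policies; the real codes live in `org.apache.hadoop.hbase.filter.Filter.ReturnCode`, and this is a simplified sketch, not the actual FilterList logic:

```java
public class FilterListSketch {
    // Simplified subset of the filter return codes.
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Buggy merge: any non-INCLUDE answer becomes SKIP, losing the
    // NEXT_ROW seek hint from filters like ColumnPrefixFilter.
    static ReturnCode mergeBuggy(ReturnCode a, ReturnCode b) {
        if (a == ReturnCode.INCLUDE && b == ReturnCode.INCLUDE) return ReturnCode.INCLUDE;
        return ReturnCode.SKIP;
    }

    // Fixed merge: under MUST_PASS_ALL, if any filter says NEXT_ROW the
    // conjunction cannot pass for the rest of the row, so keep that hint.
    static ReturnCode mergeFixed(ReturnCode a, ReturnCode b) {
        if (a == ReturnCode.NEXT_ROW || b == ReturnCode.NEXT_ROW) return ReturnCode.NEXT_ROW;
        if (a == ReturnCode.SKIP || b == ReturnCode.SKIP) return ReturnCode.SKIP;
        return ReturnCode.INCLUDE;
    }

    public static void main(String[] args) {
        // A prefix-style filter says NEXT_ROW; another filter says INCLUDE.
        System.out.println(mergeBuggy(ReturnCode.NEXT_ROW, ReturnCode.INCLUDE));
        System.out.println(mergeFixed(ReturnCode.NEXT_ROW, ReturnCode.INCLUDE));
    }
}
```

The difference is largest with many versions per column, since SKIP re-evaluates every version while NEXT_ROW bypasses them all in one seek.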
[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455036#comment-13455036 ] Hudson commented on HBASE-6432: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6432 HRegionServer doesn't properly set clusterId in conf (Revision 1381907) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6432_94.patch, HBASE-6432.patch ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to default since getMaster() since it uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455037#comment-13455037 ] Hudson commented on HBASE-5206: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6710 0.92/0.94 compatibility issues due to HBASE-5206 (Revision 1384181) Result = SUCCESS gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableReadOnly.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTableReadOnly.java Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Ted Yu Assignee: Ashutosh Jindal Fix For: 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455038#comment-13455038 ] Hudson commented on HBASE-6715: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6715 TestFromClientSide.testCacheOnWriteEvictOnClose is flaky (Revision 1381678) Result = SUCCESS jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java TestFromClientSide.testCacheOnWriteEvictOnClose is flaky Key: HBASE-6715 URL: https://issues.apache.org/jira/browse/HBASE-6715 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6715.patch Occasionally, this test fails: {noformat} expected:2049 but was:2069 Stacktrace java.lang.AssertionError: expected:2049 but was:2069 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.client.TestFromClientSide.testCacheOnWriteEvictOnClose(TestFromClientSide.java:4248) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {noformat} It could be because there is other thread still accessing the cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6734) Code duplication in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455033#comment-13455033 ] Hudson commented on HBASE-6734: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6734 Code duplication in LoadIncrementalHFiles (Richard Ding) (Revision 1382354) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java Code duplication in LoadIncrementalHFiles - Key: HBASE-6734 URL: https://issues.apache.org/jira/browse/HBASE-6734 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.1 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6734.patch This was due to the merge of two JIRAs:
{code}
if (queue.isEmpty()) {
  LOG.warn("Bulk load operation did not find any files to load in " +
      "directory " + hfofDir.toUri() + ". Does it contain files in " +
      "subdirectories that correspond to column family names?");
  return;
}
if (queue.isEmpty()) {
  LOG.warn("Bulk load operation did not find any files to load in " +
      "directory " + hfofDir.toUri() + ". Does it contain files in " +
      "subdirectories that correspond to column family names?");
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6713) Stopping META/ROOT RS may take 50mins when some region is splitting
[ https://issues.apache.org/jira/browse/HBASE-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455040#comment-13455040 ] Hudson commented on HBASE-6713: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6713 Stopping META/ROOT RS may take 50mins when some region is splitting (Chunhui) (Revision 1382163) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Stopping META/ROOT RS may take 50mins when some region is splitting --- Key: HBASE-6713 URL: https://issues.apache.org/jira/browse/HBASE-6713 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.1 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.2 Attachments: 6713.92-94, 6713v3.patch, HBASE-6713.patch, HBASE-6713v2.patch When we stop the RS carrying ROOT/META, if it is in the splitting for some region, the whole stopping process may take 50 mins. The reason is : 1.ROOT/META region is closed when stopping the regionserver 2.The Split Transaction failed updating META and it will retry 3.The retry num is 100, and the total time is about 50 mins as default; This configuration is set by HConnectionManager#setServerSideHConnectionRetries I think 50 mins is too long to acceptable, my suggested solution is closing MetaTable regions after the compact/split thread is closed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6710) 0.92/0.94 compatibility issues due to HBASE-5206
[ https://issues.apache.org/jira/browse/HBASE-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455041#comment-13455041 ] Hudson commented on HBASE-6710: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6710 0.92/0.94 compatibility issues due to HBASE-5206 (Revision 1384181) Result = SUCCESS gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableReadOnly.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTableReadOnly.java 0.92/0.94 compatibility issues due to HBASE-5206 Key: HBASE-6710 URL: https://issues.apache.org/jira/browse/HBASE-6710 Project: HBase Issue Type: Bug Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6710-v3.patch HBASE-5206 introduces some compatibility issues between {0.94,0.94.1} and {0.92.0,0.92.1}. The release notes of HBASE-5155 describes the issue (HBASE-5206 is a backport of HBASE-5155). I think we can make 0.94.2 compatible with both {0.94.0,0.94.1} and {0.92.0,0.92.1}, although one of those sets will require configuration changes. The basic problem is that there is a znode for each table zookeeper.znode.tableEnableDisable that is handled differently. 
On 0.92.0 and 0.92.1 the states for this table are: [ disabled, disabling, enabling ] or deleted if the table is enabled On 0.94.1 and 0.94.2 the states for this table are: [ disabled, disabling, enabling, enabled ] What saves us is that the location of this znode is configurable. So the basic idea is to have the 0.94.2 master write two different znodes, zookeeper.znode.tableEnableDisabled92 and zookeeper.znode.tableEnableDisabled94 where the 92 node is in 92 format, the 94 node is in 94 format. And internally, the master would only use the 94 format in order to solve the original bug HBASE-5155 solves. We can of course make one of these the same default as exists now, so we don't need to make config changes for one of 0.92 or 0.94 clients. I argue that 0.92 clients shouldn't have to make config changes for the same reason I argued above. But that is debatable. Then, I think the only question left is the question of how to bring along the {0.94.0, 0.94.1} crew. A {0.94.0, 0.94.1} client would work against a 0.94.2 cluster by just configuring zookeeper.znode.tableEnableDisable in the client to be whatever zookeeper.znode.tableEnableDisabled94 is in the cluster. A 0.94.2 client would work against both a {0.94.0, 0.94.1} and {0.92.0, 0.92.1} cluster if it had HBASE-6268 applied. About rolling upgrade from {0.94.0, 0.94.1} to 0.94.2 -- I'd have to think about that. Do the regionservers ever read the tableEnableDisabled znode? On the mailing list, Lars H suggested the following: The only input I'd have is that format we'll use going forward will not have a version attached to it. So maybe the 92 version would still be called zookeeper.znode.tableEnableDisable and the new node could have a different name zookeeper.znode.tableEnableDisableNew (or something). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455042#comment-13455042 ] Hudson commented on HBASE-6288: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6288 In hbase-daemons.sh, description of the default backup-master file path is wrong (Revision 1381219) Result = SUCCESS stack : Files : * /hbase/branches/0.94/bin/master-backup.sh * /hbase/branches/0.94/conf/hbase-env.sh In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Assignee: Benjamin Kim Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6340) HBase RPC should allow protocol extension with common interfaces.
[ https://issues.apache.org/jira/browse/HBASE-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455039#comment-13455039 ] Hudson commented on HBASE-6340: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6340 Reapply with fix for SecureRpcEngine (Revision 1383754) HBASE-6340 HBase RPC should allow protocol extension with common interfaces.: REVERT (Revision 1383537) HBASE-6340 HBase RPC should allow protocol extension with common interfaces. (Revision 1382207) HBASE-6340 HBase RPC should allow protocol extension with common interfaces. (Revision 1382206) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureRpcEngine.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/TestProtocolExtension.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java HBase RPC should allow protocol extension with common interfaces. 
- Key: HBASE-6340 URL: https://issues.apache.org/jira/browse/HBASE-6340 Project: HBase Issue Type: Bug Components: coprocessors, regionserver Affects Versions: 0.92.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.94.2 Attachments: 6340-6762-combined.txt, 6340-6762-combined-v2.txt, 6340-RPCInvocation.patch, RPCInvocation.patch HBase RPC fails if MyProtocol extends an interface, which is not a VersionedProtocol even if MyProtocol also directly extends VersionedProtocol. The reason is that rpc Invocation uses Method.getDeclaringClass(), which returns the interface class rather than the class of MyProtocol. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
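The root cause is standard JDK reflection behavior: looking a method up through a sub-interface still reports the interface that declared it. A minimal standalone sketch (the interface names below are made-up stand-ins, not the actual HBase protocol classes):

```java
import java.lang.reflect.Method;

public class DeclaringClassDemo {
    // Hypothetical stand-ins for VersionedProtocol and MyProtocol.
    interface BaseProtocol { void ping(); }
    interface MyProtocol extends BaseProtocol { void put(); }

    public static void main(String[] args) throws NoSuchMethodException {
        // Even though we look "ping" up on MyProtocol, the Method object
        // reports the super-interface that declared it.
        Method ping = MyProtocol.class.getMethod("ping");
        Method put = MyProtocol.class.getMethod("put");
        System.out.println(ping.getDeclaringClass().getSimpleName()); // BaseProtocol
        System.out.println(put.getDeclaringClass().getSimpleName());  // MyProtocol
    }
}
```

So when the RPC Invocation keys its dispatch on getDeclaringClass(), a call to an inherited method resolves to the super-interface rather than MyProtocol, which is the failure this issue describes.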
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455047#comment-13455047 ] Jonathan Hsieh commented on HBASE-6765: --- Turned into a sub-issue of the snapshots umbrella issue. We'll resolve the umbrella after we get the 3x +1's and merge into trunk. When a sub-issue is resolved, it means it was committed to the dev branch. Sound good? 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6768) HBase Rest server crashes if client tries to retrieve data size 5 MB
[ https://issues.apache.org/jira/browse/HBASE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455054#comment-13455054 ] Andrew Purtell commented on HBASE-6768: --- Are you running REST with {{-XX:OnOutOfMemoryError=kill -9 %p}} (HBASE-4769)? That is one reason why a HBase process might be dead with nothing logged and no hs_err file. Have you tried setting/increasing MaxDirectMemory, e.g. {{-XX:MaxDirectMemorySize=$LARGE_VALUE}}? HBase Rest server crashes if client tries to retrieve data size 5 MB -- Key: HBASE-6768 URL: https://issues.apache.org/jira/browse/HBASE-6768 Project: HBase Issue Type: Bug Components: rest Affects Versions: 0.90.5 Reporter: Mubarak Seyed Labels: noob I have a CF with one qualifier, data size is 5 MB, when i try to read the raw binary data as octet-stream using curl, rest server got crashed and curl throws exception as {code} curl -v -H Accept: application/octet-stream http://abcdefgh-hbase003.test1.test.com:9090/table1/row_key1/cf:qualifer1 /tmp/out * About to connect() to abcdefgh-hbase003.test1.test.com port 9090 * Trying xx.xx.xx.xxx... 
connected
* Connected to abcdefgh-hbase003.test1.test.com (xx.xxx.xx.xxx) port 9090
> GET /table1/row_key1/cf:qualifer1 HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: abcdefgh-hbase003.test1.test.com:9090
> Accept: application/octet-stream
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
< HTTP/1.1 200 OK
< Content-Length: 5129836
< X-Timestamp: 1347338813129
< Content-Type: application/octet-stream
  0 5009k    0 16272    0     0   7460      0  0:11:27  0:00:02  0:11:25 13872
transfer closed with 1148524 bytes remaining to read
 77 5009k   77 3888k    0     0  1765k      0  0:00:02  0:00:02 --:--:-- 3253k
* Closing connection #0
curl: (18) transfer closed with 1148524 bytes remaining to read
{code}
Couldn't find the exception in the REST server log, and no core dump either. This issue is constantly reproducible. I also tried with the HBase REST client (HRemoteTable) and could recreate this issue if the data size is 10 MB (even with the MIME_PROTOBUF accept header). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455057#comment-13455057 ] Gregory Chanan commented on HBASE-6591: --- We had a customer with a very slow-running application that consisted mainly of checkAndPuts. checkAndPut latency looked good. It would have been nice to just look at the metrics and be able to eliminate the possibility that their checkAndPuts simply weren't getting executed. checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
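Until such server-side metrics exist, the client-side bookkeeping mentioned above is easy to sketch. The class below is hypothetical, not part of the HBase API; it simply tallies the booleans that checkAndPut/checkAndDelete return:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side tally for checkAndPut outcomes; feed it the
// boolean that HTable.checkAndPut/checkAndDelete returns.
public class CheckAndMutateTally {
    private final AtomicLong applied = new AtomicLong();
    private final AtomicLong notApplied = new AtomicLong();

    /** Record one operation result: true means the mutation was executed. */
    public void record(boolean wasApplied) {
        (wasApplied ? applied : notApplied).incrementAndGet();
    }

    /** Fraction of operations that actually executed; 1.0 if none recorded yet. */
    public double appliedRatio() {
        long a = applied.get(), n = notApplied.get();
        return (a + n) == 0 ? 1.0 : (double) a / (a + n);
    }
}
```

A cluster-wide metric would aggregate the same two counters on the region server side instead, which is what this issue proposes.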
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455072#comment-13455072 ] Jesse Yates commented on HBASE-6765: @Jon - my bad on the label. +1 on what resolved means. 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6765: -- Component/s: snapshots 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master, snapshots Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5452) Fixes for HBase shell with protobuf-based data
[ https://issues.apache.org/jira/browse/HBASE-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455098#comment-13455098 ] Gregory Chanan commented on HBASE-5452: --- What do you think needs to be done here, Chris? Just manually go through all the shell commands and make sure nothing broke? Fixes for HBase shell with protobuf-based data -- Key: HBASE-5452 URL: https://issues.apache.org/jira/browse/HBASE-5452 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Chris Trezzo -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6765) 'Take a snapshot' interface
[ https://issues.apache.org/jira/browse/HBASE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455111#comment-13455111 ] Jesse Yates commented on HBASE-6765: I'll let the RB sit up there for another day, and then roll it into the dev branch (modulo nit fixes). 'Take a snapshot' interface --- Key: HBASE-6765 URL: https://issues.apache.org/jira/browse/HBASE-6765 Project: HBase Issue Type: Sub-task Components: client, master, snapshots Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: hbase-6765-v0.patch Add interfaces taking a snapshot. This is in hopes of cutting down on the overhead involved in reviewing snapshots. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6637) Move DaemonThreadFactory into Threads and Threads to hbase-common
[ https://issues.apache.org/jira/browse/HBASE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455119#comment-13455119 ] Jesse Yates commented on HBASE-6637: [~saint@gmail.com] - any thoughts on why hadoopqa isn't running? Should we resubmit (again)? Move DaemonThreadFactory into Threads and Threads to hbase-common - Key: HBASE-6637 URL: https://issues.apache.org/jira/browse/HBASE-6637 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 0.96.0 Attachments: hbase-6637-r1.patch, hbase-6637-v0.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HBASE-5354) Source to standalone deployment script
[ https://issues.apache.org/jira/browse/HBASE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates reopened HBASE-5354: Reopening issue, since it's apparently useful. It actually came up yesterday while testing one of the pom changes I made. Can you give it another spin @stack and commit if you like it? Source to standalone deployment script -- Key: HBASE-5354 URL: https://issues.apache.org/jira/browse/HBASE-5354 Project: HBase Issue Type: New Feature Components: build, scripts Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: bash_HBASE-5354.patch, bash_HBASE-5354-v0.patch, bash_HBASE-5354-v1.patch Automating the testing of source code in a 'real' instance can be a bit of a pain, even getting it into standalone mode. Steps you need to go through:
1) Build the project
2) Copy it to the deployment directory
3) Shut down the current cluster (if it is running)
4) Untar the tar
5) Update the configs to point to a local data cluster
6) Start up the new deployment
Yeah, it's not super difficult, but it would be nice to just have a script to make it button-push easy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6178) LoadTest tool no longer packaged after the modularization
[ https://issues.apache.org/jira/browse/HBASE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455135#comment-13455135 ] Lars Hofhansl commented on HBASE-6178: -- I'll double check the test failures and then commit. LoadTest tool no longer packaged after the modularization - Key: HBASE-6178 URL: https://issues.apache.org/jira/browse/HBASE-6178 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Jesse Yates Attachments: hbase-6178-v0.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6611) Forcing region state offline cause double assignment
[ https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455136#comment-13455136 ] Jimmy Xiang commented on HBASE-6611: Sure, I will do that to make sure existing functionality is not broken and there is no substantial performance drop. Another thing I'd like to address in this jira is that bulk assigning currently doesn't pass the offlined ZK node version to the region server as regular assignment does. I think it is needed to avoid competing assignments of the same region at the same time. Forcing region state offline cause double assignment Key: HBASE-6611 URL: https://issues.apache.org/jira/browse/HBASE-6611 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 In assigning a region, the assignment manager forces the region state offline if it is not. This could cause double assignment: for example, if the region is already assigned and in the Open state, you should not just change its state to Offline and assign it again. I think this could be the root cause for all double assignments IF the region state is reliable. After this loophole is closed, TestHBaseFsck should come up with a different way to create some assignment inconsistencies, for example, calling a region server to open a region directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455143#comment-13455143 ] Lars Hofhansl commented on HBASE-6649: -- Just failed again: https://builds.apache.org/job/PreCommit-HBASE-Build/2852//testReport/ [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455151#comment-13455151 ] Jesse Yates commented on HBASE-5456: Reviving this discussion after talking at a recent pow-wow. In short, powermock has some _interesting_ features - making it very powerful - that will really help to clean up the codebase. For instance, it can help get rid of the test-visible methods. Yes, on one hand you could subclass the class you are testing to get at the protected methods, but then you have the issue of making that class loadable as well. It can easily spiral out of control where everything is dynamically loadable, just so you can check the state of one variable. Also, this can lead to inadvertent race conditions for timing related things, where the test-exposed method could be really simple. Also, it helps you get real objects into a state that is more easily testable. Rather than rejiggering everything through a high-level interface, you can specify things succinctly and more easily when you can introspect the object. Another great use is for managing timing issues. A lot of times to test timing of things we rely on sleeps or adding latches. The former is really brittle and the latter makes the code incredibly more complicated than it needs to be, just for testing. Problems with powermock:
* complicated - yeah, it can be a bit funky, but you get used to it.
* brittle - it's doing reflection, so there are a lot of string method/object names used. That's the problem with introspection of objects and the price we pay for cleaner running code. Tests break when you change stuff though, so you know if something goes awry.
Stack raised a possible concern that he couldn't get powermock working on the current codebase. However, I volunteered to spend the time to figure that out (at least initially) and don't think it will be all that bad. Thoughts?
If people are +1, I'll work on a simple patch that adds powermock to the pom and makes a change to a test to use it. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Ted Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
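For readers unfamiliar with the mechanism: the core trick behind Whitebox-style helpers is ordinary JDK reflection. A minimal sketch under that assumption (the class and method names here are invented for illustration; this is not HBase or PowerMock code):

```java
import java.lang.reflect.Method;

public class WhiteboxSketch {
    // Example target: a class whose private method a test wants to reach
    // without widening its visibility just for testing.
    static class Region {
        private boolean sanityCheck(int n) { return n >= 0; }
    }

    /** Invoke a private method reflectively, roughly what Whitebox.invokeMethod does. */
    static Object invokePrivate(Object target, String name, Class<?>[] types, Object... args)
            throws Exception {
        Method m = target.getClass().getDeclaredMethod(name, types);
        m.setAccessible(true); // lift the private modifier for this Method object
        return m.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokePrivate(new Region(), "sanityCheck",
                new Class<?>[] { int.class }, 5)); // true
    }
}
```

The string method name is exactly the brittleness noted above: rename sanityCheck and the test fails at runtime, not at compile time.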
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Attachment: HBASE-6504-output.txt Shows output with/without gc args Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Attachment: HBASE-6504.patch Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt, HBASE-6504.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6504: - Status: Patch Available (was: Open) Fixed this for rolling-restart.sh, start-hbase.sh, and stop-hbase.sh by using head -1 to take the first line. Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Attachments: HBASE-6504-output.txt, HBASE-6504.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout: {code} $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed false Heap par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186) eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a) from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b) to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c) concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0) concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008) $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null (nothing printed) {code} And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. 
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5306) Add support for protocol buffer based RPC
[ https://issues.apache.org/jira/browse/HBASE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das resolved HBASE-5306. Resolution: Duplicate [~gchanan] Yes, it can be closed, I think. We have taken care of the issue in other jiras, as you noted. Add support for protocol buffer based RPC - Key: HBASE-5306 URL: https://issues.apache.org/jira/browse/HBASE-5306 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Devaraj Das Assignee: Devaraj Das This will help HBase to achieve wire compatibility across versions. The idea (to start with) is to leverage the recent work that has gone into the Hadoop core in this area. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455207#comment-13455207 ] Lars Hofhansl commented on HBASE-6504: -- Should this be head -n 1? head -1 works, but it is not documented that way.
[jira] [Updated] (HBASE-6282) The introspection, etc. of objects in the RPC has to be handled for PB objects
[ https://issues.apache.org/jira/browse/HBASE-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6282: --- Issue Type: Sub-task (was: Bug) Parent: HBASE-5305 The introspection, etc. of objects in the RPC has to be handled for PB objects -- Key: HBASE-6282 URL: https://issues.apache.org/jira/browse/HBASE-6282 Project: HBase Issue Type: Sub-task Components: ipc Reporter: Devaraj Das Priority: Blocker Fix For: 0.96.0 The places where the types of objects are inspected need to be updated to take PB types into consideration. I have noticed Objects.describeQuantity being used, and the private WritableRpcEngine.Server.logResponse method also needs updating (in the PB world, all information about operations/tablenames is contained in one PB argument).
[jira] [Updated] (HBASE-6414) Remove the WritableRpcEngine associated Invocation classes
[ https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6414: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-5305 Remove the WritableRpcEngine associated Invocation classes Key: HBASE-6414 URL: https://issues.apache.org/jira/browse/HBASE-6414 Project: HBase Issue Type: Sub-task Affects Versions: 0.96.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0 Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 6414-4.patch.txt, 6414-4.patch.txt, 6414-5.patch.txt, 6414-5.patch.txt, 6414-5.patch.txt, 6414-6.patch.txt, 6414-6.patch.txt, 6414-6.txt, 6414-initial.patch.txt, 6414-initial.patch.txt, 6414-v7.txt Remove the WritableRpcEngine Invocation classes once HBASE-5705 gets committed and all the protocols are rebased to use PB. Raising this jira in advance.
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455210#comment-13455210 ] Gregory Chanan commented on HBASE-6591: --- Any thoughts on the use case, Michael/Lars? Worth doing? Or push people to actually log the results in the client? If it's worthwhile, what granularity? checkAndPut executed/not metrics Key: HBASE-6591 URL: https://issues.apache.org/jira/browse/HBASE-6591 Project: HBase Issue Type: Task Components: metrics, regionserver Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 checkAndPut/checkAndDelete return true if the new put was executed, false otherwise. So clients can figure out this metric for themselves, but it would be useful to get a look at what is happening on the cluster as a whole, across all clients.
[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455211#comment-13455211 ] Michael Drzal commented on HBASE-6504: -- I always use the old-style syntax since it is more portable. From the coreutils info page: For compatibility `head' also supports an obsolete option syntax `-COUNTOPTIONS', which is recognized only if it is specified first. COUNT is a decimal number optionally followed by a size letter (`b', `k', `m') as in `-c', or `l' to mean count by lines, or other option letters (`cqv'). Scripts intended for standard hosts should use `-c COUNT' or `-n COUNT' instead. If your script must also run on hosts that support only the obsolete syntax, it is usually simpler to avoid `head', e.g., by using `sed 5q' instead of `head -5'. I can change it to head -n 1 if you would like.
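For reference, a small sketch of the first-line alternatives discussed in the head -1 vs. head -n 1 exchange above (the temp-file path and contents are illustrative stand-ins for HBaseConfTool output):

```shell
#!/bin/sh
# Illustrative stand-in for HBaseConfTool output; the path is hypothetical.
tmp=/tmp/conftool-demo.$$
printf 'false\nHeap\n' > "$tmp"

head -n 1 "$tmp"   # POSIX-specified form
head -1   "$tmp"   # obsolete option syntax; still accepted by GNU coreutils
sed 1q    "$tmp"   # prints the first line then quits; avoids head entirely

rm -f "$tmp"
```

All three print {{false}}; the coreutils note quoted above recommends the {{-n COUNT}} form, with {{sed 1q}} as the fallback for hosts that only support the obsolete syntax.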
[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics
[ https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455213#comment-13455213 ] Michael Drzal commented on HBASE-6591: -- I'm indifferent. It is really up to you. If this is a pain point for you and you feel like putting in the work to create a patch or convincing others that this is important, go for it.