[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460253#comment-13460253
 ] 

Lars Hofhansl commented on HBASE-6852:
--

Interesting. Thanks Cheng. I wonder what causes the performance problem then. 
Is it the get/putIfAbsent of the ConcurrentMap we store the metrics in?

I'd probably feel better if you set the threshold to 100 (instead of 2000) - 
you'd still reduce the time used there by 99%.

Also looking at the places where updateOnCacheHit is called... We also 
increment an AtomicLong (cacheHits) that is never read (WTF). We should 
remove that counter while we're at it (even if AtomicLongs are not the 
problem).
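A minimal sketch of the thresholded-flush idea being discussed (hypothetical code, not the actual patch; class and field names are illustrative): buffer hits in a plain per-thread counter and only touch the shared AtomicLong once every THRESHOLD hits, so the hot path avoids a CAS on almost every call.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the actual HBase patch): batch cache-hit updates
// in a per-thread counter and flush to the shared AtomicLong only every
// THRESHOLD hits, trading a small reporting delay for far fewer CAS ops.
public class BatchedCacheHitCounter {
    // Flush threshold; 100 per Lars' suggestion (the patch first used 2000).
    static final int THRESHOLD = 100;

    private final AtomicLong globalHits = new AtomicLong();
    private final ThreadLocal<long[]> localHits =
        ThreadLocal.withInitial(() -> new long[1]);

    public void updateOnCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {
            globalHits.addAndGet(local[0]); // one CAS per THRESHOLD hits
            local[0] = 0;
        }
    }

    // Drain any buffered hits (e.g. on a metrics flush) so nothing is lost.
    public long snapshot() {
        long[] local = localHits.get();
        long total = globalHits.addAndGet(local[0]);
        local[0] = 0;
        return total;
    }

    public static void main(String[] args) {
        BatchedCacheHitCounter c = new BatchedCacheHitCounter();
        for (int i = 0; i < 12345; i++) {
            c.updateOnCacheHit();
        }
        System.out.println(c.snapshot()); // prints 12345
    }
}
```

With THRESHOLD = 100 the shared counter is touched 1% as often, which matches the "reduce the time used there by 99%" estimate above; the cost is that up to THRESHOLD-1 hits per thread are invisible until the next flush.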


 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.2, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver during a full table scan 
 (sorry for the rough formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name   symbol name
 ---
 98447     13.4324  14033.jo     void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean)
   98447   100.000  14033.jo     void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean) [self]
 ---
 45814     6.2510   14033.jo     int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int)
   45814   100.000  14033.jo     int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo     boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo     boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
  [self]
 ---
 42548     5.8054   14033.jo     int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int)
   42548   100.000  14033.jo     int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo     int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo     int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460254#comment-13460254
 ] 

Lars Hofhansl commented on HBASE-6841:
--

Haven't been able to track down that test failure yet. It shouldn't happen, 
but somehow it does.
@J-D: Since this is (presumably) a long-standing condition, how do you feel 
about moving this to 0.94.3?

 Meta prefetching is slower than doing multiple meta lookups
 ---

 Key: HBASE-6841
 URL: https://issues.apache.org/jira/browse/HBASE-6841
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.2

 Attachments: 6841-0.94.txt, 6841-0.96.txt


 I got myself into a situation where I needed to truncate a massive table 
 while it was getting hits, and surprisingly the clients were not recovering. 
 What I see in the logs is that every time we prefetch .META. we set up a new 
 HConnection, because we close it on the way out. It's awfully slow.
 We should just turn it off or make it useful. jstacks coming up.
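The pattern the description complains about can be sketched in miniature (all names here are hypothetical stand-ins; the real HConnection setup involves ZooKeeper lookups and is far more expensive than a constructor call):

```java
// Illustrative sketch (hypothetical names, not the HBase client code): the
// fix the comment argues for is reusing one expensive-to-build connection
// for repeated meta lookups instead of opening and closing one per prefetch.
public class MetaLookupDemo {
    // Stand-in for an HConnection: expensive to construct.
    static class Connection {
        static int constructed = 0;
        Connection() { constructed++; }
        String lookup(String row) { return "region-for-" + row; }
    }

    private Connection cached;

    // Per-lookup connection: the behavior the bug report describes.
    String lookupSlow(String row) {
        Connection c = new Connection(); // new connection every time
        return c.lookup(row);
    }

    // Cached connection: one construction amortized over all lookups.
    String lookupFast(String row) {
        if (cached == null) {
            cached = new Connection();
        }
        return cached.lookup(row);
    }
}
```

Under load, the per-lookup variant pays the full connection-setup cost on every prefetch, which is why clients fail to recover while the table is being truncated.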



[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460258#comment-13460258
 ] 

Hudson commented on HBASE-6806:
---

Integrated in HBase-TRUNK #3363 (See 
[https://builds.apache.org/job/HBase-TRUNK/3363/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts 
(Revision 1388318)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile


 HBASE-4658 breaks backward compatibility / example scripts
 --

 Key: HBASE-6806
 URL: https://issues.apache.org/jira/browse/HBASE-6806
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Affects Versions: 0.94.0
Reporter: Lukas
 Fix For: 0.96.0

 Attachments: HBASE-6806-fix-examples.diff


 HBASE-4658 introduces the new 'attributes' argument as a non-optional 
 parameter. This is not backward compatible and also breaks the code in the 
 example section. Resolution: mark it as 'optional'.



[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460259#comment-13460259
 ] 

Hudson commented on HBASE-4658:
---

Integrated in HBase-TRUNK #3363 (See 
[https://builds.apache.org/job/HBase-TRUNK/3363/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts 
(Revision 1388318)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile


 Put attributes are not exposed via the ThriftServer
 ---

 Key: HBASE-4658
 URL: https://issues.apache.org/jira/browse/HBASE-4658
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ThriftPutAttributes1.txt


 The Put API also takes in a bunch of arbitrary attributes that an application 
 can use to associate metadata with each put operation. This is not exposed 
 via Thrift.
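For context, the idea in the description can be modeled with a minimal, self-contained sketch (plain Java, hypothetical class, not the real HBase client or Thrift API): a put-style mutation carrying arbitrary name-to-bytes metadata attributes alongside its cell data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the feature (not the actual HBase Put class):
// a mutation that carries arbitrary application metadata as name -> byte[]
// attributes, which is what the Thrift interface was missing.
public class PutWithAttributes {
    private final byte[] row;
    private final Map<String, byte[]> attributes = new HashMap<>();

    public PutWithAttributes(byte[] row) {
        this.row = row;
    }

    // Attach a metadata attribute to this mutation.
    public void setAttribute(String name, byte[] value) {
        attributes.put(name, value);
    }

    // Returns null when no attribute of that name was set.
    public byte[] getAttribute(String name) {
        return attributes.get(name);
    }

    public byte[] getRow() {
        return row;
    }
}
```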



[jira] [Commented] (HBASE-6524) Hooks for hbase tracing

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460265#comment-13460265
 ] 

stack commented on HBASE-6524:
--

Committed the doc as appendix I in the manual.  It will show up the next time 
I push the doc.  Thanks Jonathan.

 Hooks for hbase tracing
 ---

 Key: HBASE-6524
 URL: https://issues.apache.org/jira/browse/HBASE-6524
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Leavitt
Assignee: Jonathan Leavitt
 Fix For: 0.96.0

 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, 
 createTableTrace.png, hbase-6524.diff


 Includes the hooks that use the [htrace|http://www.github.com/cloudera/htrace] 
 library to add Dapper-like tracing to HBase.



[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460266#comment-13460266
 ] 

Hadoop QA commented on HBASE-6299:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546000/6299v4.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

-1 javadoc.  The javadoc tool appears to have generated 139 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2912//console

This message is automatically generated.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, 
 HBASE-6299-v3.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open-region request and starts to proceed, eventually with 
 success. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS.
 5. But since the HMaster's OpenedRegionHandler has already been triggered by 
 the region open on the previous RS, and the RegionState has already been 
 removed from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node remains, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 {code}
 

[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460269#comment-13460269
 ] 

stack commented on HBASE-4658:
--

The above comment from Hudson is in the wrong place.  The parser found the 
second HBase JIRA referenced in the commit message, which is this one rather 
than HBASE-6806.




[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460271#comment-13460271
 ] 

stack commented on HBASE-6806:
--

Hmm... it puts the commit in all issues referenced by the commit message, here 
and in HBASE-4658.




[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460275#comment-13460275
 ] 

liang xie commented on HBASE-6852:
--

Hi Cheng, for the running time, could you exclude the system resource factor? 
E.g., you may have run the original version with many physical IOs, but rerun 
the patched version without similar physical IO requests due to hitting the OS 
page cache. In other words, can the reduced running time be reproduced 
consistently, even if you run the patched version first and then rerun the 
original version? It would be better if you can issue 
echo 1 > /proc/sys/vm/drop_caches to free the page cache between each test.




[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460280#comment-13460280
 ] 

Cheng Hao commented on HBASE-6852:
--

Lars, the only place the ConcurrentMap is used in SchemaMetrics is 
tableAndFamilyToMetrics. In this patch, I pre-create an array of AtomicLongs 
for all of the possible on-cache-hit metrics, which avoids the concurrency 
issue and makes them easy to index on access.

Thanks stack and Lars for the suggestions; I will create another patch file 
instead.
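The pre-created-array idea described above can be sketched as follows (a hypothetical, self-contained illustration; the class, enum values, and layout are stand-ins, not the actual patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the idea Cheng describes (names illustrative, not
// the actual patch): pre-create one AtomicLong per possible cache-hit
// metric, indexed by (block category, compaction flag), so the hot path is
// a plain array access instead of a ConcurrentMap get/putIfAbsent.
public class OnHitCacheMetrics {
    enum BlockCategory { DATA, INDEX, BLOOM, META, ALL_CATEGORIES }

    // One slot per (category, isCompaction) pair, created up front,
    // so no lookup or allocation ever happens on the hot path.
    private final AtomicLong[] hits =
        new AtomicLong[BlockCategory.values().length * 2];

    public OnHitCacheMetrics() {
        for (int i = 0; i < hits.length; i++) {
            hits[i] = new AtomicLong();
        }
    }

    public void updateOnCacheHit(BlockCategory category, boolean isCompaction) {
        // Direct index computation: no map access, no boxing.
        hits[category.ordinal() * 2 + (isCompaction ? 1 : 0)].incrementAndGet();
    }

    public long getHits(BlockCategory category, boolean isCompaction) {
        return hits[category.ordinal() * 2 + (isCompaction ? 1 : 0)].get();
    }
}
```

Because the array is fully populated in the constructor, readers and writers never race on slot creation, which is the concurrency issue the map-based get/putIfAbsent path had to handle.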




[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460297#comment-13460297
 ] 

Cheng Hao commented on HBASE-6852:
--

Hi Liang, that's a really good suggestion. Actually, I didn't free the OS page 
cache before each launch, but I can try that later.

In my tests, the table data was about 600GB across 4 machines, so I guess the 
system cache may not impact the overall performance that much for a full table 
scan.




[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated HBASE-6852:
-

Attachment: onhitcache-trunk.patch

Changed THRESHOLD_METRICS_FLUSH from 2000 to 100, per Lars' suggestion.




[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated HBASE-6852:
-

Attachment: (was: onhitcache-trunk.patch)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460315#comment-13460315
 ] 

ramkrishna.s.vasudevan commented on HBASE-6381:
---

@Jimmy
The fix for Rajesh's comment seems valid.  I only have two questions:
- Will these changes solve HBASE-6228, or do we still need to add some 
synchronization in fixupDaughters?
- In GeneralBulkAssigner:
{code}
  while (regionInfoIterator.hasNext()) {
    HRegionInfo hri = regionInfoIterator.next();
    RegionState state = regionStates.getRegionState(hri);
    if ((!regionStates.isRegionInTransition(hri) && regionStates.isRegionAssigned(hri))
        || state.isSplit() || state.isSplitting()) {
      regionInfoIterator.remove();
{code}
This removal from regionInfoIterator may not be needed.  Anyway, SSH is handling 
this case, and as part of HBASE-6317 EnableTableHandler will handle RIT 
regions and already-assigned regions.
In CreateTable this problem should not happen.  So can we remove this piece of 
code from GeneralBulkAssigner?  What do you feel?


Other than that I am +1.  The ZKTable change can be done in a new JIRA as you 
said. 

 AssignmentManager should use the same logic for clean startup and failover
 --

 Key: HBASE-6381
 URL: https://issues.apache.org/jira/browse/HBASE-6381
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, 
 trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch


 Currently AssignmentManager handles clean startup and failover very differently.
 Different logic is mingled together, so it is hard to find out which is for which.
 We should clean it up and share the same logic so that AssignmentManager handles
 both cases the same way.  This way, the code will be much easier to understand
 and maintain.



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460316#comment-13460316
 ] 

Cheng Hao commented on HBASE-6852:
--

I didn't remove the cacheHits counters in HFileReaderV1 & V2; I hope this is a 
good starting point for designing a lower-overhead metrics framework.
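The thresholding idea discussed above (flush a per-thread count into the shared AtomicLong only every N hits, rather than contending on every cache hit) can be sketched as follows. This is an illustrative sketch, not the attached patch; the class name and the threshold of 100 are made up.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: batch per-thread cache-hit increments and flush to
// the shared AtomicLong only every THRESHOLD hits, reducing contention on
// the hot updateOnCacheHit path.
public class BatchedCacheHitCounter {
    private static final int THRESHOLD = 100;  // hypothetical flush threshold
    private final AtomicLong globalHits = new AtomicLong();
    private final ThreadLocal<long[]> localHits =
        ThreadLocal.withInitial(() -> new long[1]);

    public void updateOnCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {   // flush the batched count
            globalHits.addAndGet(local[0]);
            local[0] = 0;
        }
    }

    // Flush the calling thread's remainder and read the global counter.
    public long get() {
        long[] local = localHits.get();
        globalHits.addAndGet(local[0]);
        local[0] = 0;
        return globalHits.get();
    }

    public static void main(String[] args) {
        BatchedCacheHitCounter counter = new BatchedCacheHitCounter();
        for (int i = 0; i < 250; i++) {
            counter.updateOnCacheHit();
        }
        System.out.println(counter.get());  // 250
    }
}
```

The trade-off is accuracy: a reader can lag the true total by up to THRESHOLD-1 hits per active thread, which is why a smaller threshold (100 rather than 2000) was suggested in the discussion.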

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.2, 0.96.0

 Attachments: onhitcache-trunk.patch




[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460318#comment-13460318
 ] 

Hadoop QA commented on HBASE-6852:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12546009/onhitcache-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2913//console

This message is automatically generated.

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.2, 0.96.0

 Attachments: onhitcache-trunk.patch




[jira] [Commented] (HBASE-6491) add limit function at ClientScanner

2012-09-21 Thread ronghai.ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460324#comment-13460324
 ] 

ronghai.ma commented on HBASE-6491:
---

[~jeason]
Scenario: scanning more than one region.

 add limit function at ClientScanner
 ---

 Key: HBASE-6491
 URL: https://issues.apache.org/jira/browse/HBASE-6491
 Project: HBase
  Issue Type: New Feature
  Components: Client
Affects Versions: 0.96.0
Reporter: ronghai.ma
Assignee: ronghai.ma
  Labels: patch
 Fix For: 0.96.0

 Attachments: ClientScanner.java, HBASE-6491.patch


 Add a new method in ClientScanner to implement a function like LIMIT in MySQL.
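The requested LIMIT semantics (return at most N rows total, even when the scan crosses region boundaries, which is the multi-region scenario raised above) can be sketched independently of the actual HBase client API. The class below and its per-region iterators are illustrative only, not the patch's code.

```java
import java.util.*;

// Sketch of LIMIT semantics across multiple per-region result streams:
// return at most 'limit' rows total, even when the rows span regions.
public class LimitedScanner<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> regions;  // one iterator per region
    private Iterator<T> current = Collections.emptyIterator();
    private int remaining;

    public LimitedScanner(List<Iterator<T>> perRegion, int limit) {
        this.regions = perRegion.iterator();
        this.remaining = limit;
    }

    @Override public boolean hasNext() {
        if (remaining <= 0) return false;  // limit reached: stop the scan
        while (!current.hasNext() && regions.hasNext()) {
            current = regions.next();      // advance to the next region
        }
        return current.hasNext();
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        remaining--;
        return current.next();
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> regions = new ArrayList<>();
        regions.add(Arrays.asList(1, 2, 3).iterator());   // "region" 1
        regions.add(Arrays.asList(4, 5, 6).iterator());   // "region" 2
        LimitedScanner<Integer> scanner = new LimitedScanner<>(regions, 4);
        List<Integer> rows = new ArrayList<>();
        while (scanner.hasNext()) rows.add(scanner.next());
        System.out.println(rows);  // [1, 2, 3, 4]
    }
}
```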



[jira] [Commented] (HBASE-6524) Hooks for hbase tracing

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460326#comment-13460326
 ] 

Hudson commented on HBASE-6524:
---

Integrated in HBase-TRUNK #3364 (See 
[https://builds.apache.org/job/HBase-TRUNK/3364/])
HBASE-6524 Hooks for hbase tracing; add documentation as an appendix 
(Revision 1388337)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/docbkx/book.xml


 Hooks for hbase tracing
 ---

 Key: HBASE-6524
 URL: https://issues.apache.org/jira/browse/HBASE-6524
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Leavitt
Assignee: Jonathan Leavitt
 Fix For: 0.96.0

 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, 
 createTableTrace.png, hbase-6524.diff


 Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
 library to add dapper-like tracing to hbase.



[jira] [Commented] (HBASE-6783) Make read short circuit the default

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460357#comment-13460357
 ] 

nkeywal commented on HBASE-6783:


I checked; I couldn't reproduce the error found on hadoop-qa locally.

Committed revision 1388374.

 Make read short circuit the default
 ---

 Key: HBASE-6783
 URL: https://issues.apache.org/jira/browse/HBASE-6783
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch


 Per the mailing-list discussion, read short-circuit has little or no drawback, 
 hence it should be used by default. As a consequence, we activate it for the 
 default tests.
 It's possible to launch the tests with -Ddfs.client.read.shortcircuit=false to 
 execute them without the short-circuit; this will be used for some builds 
 on trunk.



[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-09-21 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Attachment: HBASE-6071.v4.patch

Changing debug level to info.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: Client, IPC/RPC
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, 
 HBASE-6071.v3.patch, HBASE-6071.v4.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
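A generic sketch of the requested behavior: log the type and details of every failed attempt before rethrowing, instead of dropping them silently. The helper below is illustrative, not HConnectionManager code, and logs to stdout where HBase would use commons-logging.

```java
import java.util.concurrent.Callable;

// Generic sketch of the requested behavior: log the type and message of
// every failed attempt instead of silently retrying, then rethrow the last.
public class RetryWithLogging {
    static <T> T callWithRetries(Callable<T> callable, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return callable.call();
            } catch (Exception e) {
                last = e;  // remember it; the caller sees only the final failure
                System.out.println("attempt " + attempt + " failed: "
                    + e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        throw last;  // all attempts exhausted; maxRetries is assumed >= 1
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("lease expired");
            return "ok";
        }, 5);
        System.out.println(result);  // "ok", after two logged failures
    }
}
```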



[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes

2012-09-21 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated HBASE-6488:
---

Attachment: HBASE-6488-trunk.txt

here is thine trunk. stop renaming paths!

 HBase wont run on IPv6 on OSes that use zone-indexes
 

 Key: HBASE-6488
 URL: https://issues.apache.org/jira/browse/HBASE-6488
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ryan rawson
 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt


 In IPv6, an address may have a zone index, which is specified with a percent 
 sign, e.g. ...%0.  This looks like a format string, so in any part of the code 
 that uses the hostname as a prefix to another string interpreted by 
 String.format, you end up with an exception:
 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.util.UnknownFormatConversionException: Conversion = '0'
 at java.util.Formatter.checkText(Formatter.java:2503)
 at java.util.Formatter.parse(Formatter.java:2467)
 at java.util.Formatter.format(Formatter.java:2414)
 at java.util.Formatter.format(Formatter.java:2367)
 at java.lang.String.format(String.java:2769)
 at 
 com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
 at 
 org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:680)
 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting



[jira] [Commented] (HBASE-6783) Make read short circuit the default

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460400#comment-13460400
 ] 

Hudson commented on HBASE-6783:
---

Integrated in HBase-TRUNK #3365 (See 
[https://builds.apache.org/job/HBase-TRUNK/3365/])
HBASE-6783  Make read short circuit the default (Revision 1388374)

 Result = FAILURE
nkeywal : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java


 Make read short circuit the default
 ---

 Key: HBASE-6783
 URL: https://issues.apache.org/jira/browse/HBASE-6783
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch




[jira] [Created] (HBASE-6853) IllegalArgumentException is thrown when an empty region is split.

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-6853:
-

 Summary: IllegalArgumentException is thrown when an empty region 
is split.
 Key: HBASE-6853
 URL: https://issues.apache.org/jira/browse/HBASE-6853
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.1, 0.92.1
Reporter: ramkrishna.s.vasudevan


This is w.r.t. a mail sent to the dev mailing list.

Empty region splits should be handled gracefully.  Either we should not allow 
the split to happen if we know that the region is empty, or we should allow the 
split to happen by setting the number of threads for the thread pool executor to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool =
  (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);

{code}
Here nbFiles needs to be a positive, non-zero value.
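A minimal sketch of the guard suggested above, assuming the fix is simply to clamp the pool size. The splitPool helper is illustrative, not the HBase code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the guard suggested above: Executors.newFixedThreadPool throws
// IllegalArgumentException when nThreads == 0, so clamp the pool size to at
// least 1 when the region has no store files.
public class SplitPoolSizing {
    static ExecutorService splitPool(int nbFiles) {
        int threads = Math.max(1, nbFiles);  // empty region still gets 1 thread
        return Executors.newFixedThreadPool(threads);
    }

    public static void main(String[] args) {
        try {
            Executors.newFixedThreadPool(0);  // reproduces the failure
        } catch (IllegalArgumentException e) {
            System.out.println("0 threads rejected: " + e);
        }
        ExecutorService pool = splitPool(0);  // guarded version succeeds
        pool.shutdown();
        System.out.println("guarded pool created");
    }
}
```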

 



[jira] [Updated] (HBASE-6783) Make read short circuit the default

2012-09-21 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6783:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Make read short circuit the default
 ---

 Key: HBASE-6783
 URL: https://issues.apache.org/jira/browse/HBASE-6783
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch




[jira] [Commented] (HBASE-6783) Make read short circuit the default

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460406#comment-13460406
 ] 

nkeywal commented on HBASE-6783:


I still need to make it non-default for some specific builds mentioned in the 
mails with Andrew. Will do.

 Make read short circuit the default
 ---

 Key: HBASE-6783
 URL: https://issues.apache.org/jira/browse/HBASE-6783
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch




[jira] [Assigned] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-6698:
-

Assignee: Priyadarshini

 Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
 --

 Key: HBASE-6698
 URL: https://issues.apache.org/jira/browse/HBASE-6698
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: Priyadarshini
 Fix For: 0.96.0

 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, 
 HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, 
 HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, 
 HBASE-6698_7.patch, HBASE-6698_8.patch, HBASE-6698_8.patch, 
 HBASE-6698_8.patch, HBASE-6698.patch


 Currently the checkAndPut and checkAndDelete APIs internally call 
 internalPut and internalDelete.  Maybe we can just call doMiniBatchMutation
 instead.  This will help in the future: if we have some hooks and the CP
 handles certain cases in doMiniBatchMutation, the same can be done while
 doing a put through checkAndPut or a delete through checkAndDelete.



[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-09-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460415#comment-13460415
 ] 

Harsh J commented on HBASE-6071:


Thanks for addressing my comment, Igal; this looks good to me (I am not a 
committer on HBase).

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: Client, IPC/RPC
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, 
 HBASE-6071.v3.patch, HBASE-6071.v4.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt




[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes

2012-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460416#comment-13460416
 ] 

Hadoop QA commented on HBASE-6488:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546020/HBASE-6488-trunk.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

-1 javadoc.  The javadoc tool appears to have generated 139 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2914//console

This message is automatically generated.

 HBase wont run on IPv6 on OSes that use zone-indexes
 

 Key: HBASE-6488
 URL: https://issues.apache.org/jira/browse/HBASE-6488
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ryan rawson
 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt




[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-09-21 Thread Igal Shilman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460429#comment-13460429
 ] 

Igal Shilman commented on HBASE-6071:
-

[~qwertymaniac] n/p.   

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: Client, IPC/RPC
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, 
 HBASE-6071.v3.patch, HBASE-6071.v4.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt




[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460433#comment-13460433
 ] 

Hadoop QA commented on HBASE-6071:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546017/HBASE-6071.v4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

-1 javadoc.  The javadoc tool appears to have generated 139 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMultiParallel
  org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2915//console

This message is automatically generated.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: Client, IPC/RPC
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, 
 HBASE-6071.v3.patch, HBASE-6071.v4.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6410) Move RegionServer Metrics to metrics2

2012-09-21 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460434#comment-13460434
 ] 

Alex Baranau commented on HBASE-6410:
-

Oh, sorry, forgot to share some minor notes I took when looking at changes in 
common classes:

1) TestMasterMetricsSourceFactory:

{noformat}
//This should throw an exception because there is no compat lib on the 
class path.
CompatibilitySingletonFactory.getInstance(MasterMetricsSource.class);
{noformat}

This will throw an exception anyway, because now MasterMetricsSource*Factory* 
is used.

2)   MasterMetricsSourceImpl:

{noformat}
public void getMetrics(MetricsBuilder metricsBuilder, boolean all) {
  [...]
metricsRegistry.snapshot(metricsRecordBuilder, true);
}
{noformat}

Should be metricsRegistry.snapshot(metricsRecordBuilder, *all*) ?
Same in hadoop-compat2
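
To make the snapshot-flag point concrete, here is a self-contained toy sketch 
(illustrative names only, not the real Hadoop metrics2 API) of the difference 
between hard-coding true and forwarding the caller's all flag:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy registry: snapshot(true) emits every metric; snapshot(false) emits only
// the metrics that changed since the previous snapshot.
class ToyMetricsRegistry {
    private final Map<String, Long> values = new HashMap<>();
    private final Set<String> changedSinceSnapshot = new HashSet<>();

    void set(String name, long value) {
        values.put(name, value);
        changedSinceSnapshot.add(name);
    }

    Map<String, Long> snapshot(boolean all) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> e : values.entrySet()) {
            if (all || changedSinceSnapshot.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        changedSinceSnapshot.clear();
        return out;
    }
}

class ToyMetricsSource {
    final ToyMetricsRegistry registry = new ToyMetricsRegistry();

    // The review point above: forward "all" instead of hard-coding true, so
    // incremental collectors are not forced to re-emit unchanged metrics.
    Map<String, Long> getMetrics(boolean all) {
        return registry.snapshot(all);
    }
}
```

With a hard-coded `snapshot(..., true)`, the `all == false` path would behave 
identically to `all == true`, which is the suspected bug in both compat modules.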

 Move RegionServer Metrics to metrics2
 -

 Key: HBASE-6410
 URL: https://issues.apache.org/jira/browse/HBASE-6410
 Project: HBase
  Issue Type: Sub-task
  Components: metrics
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Blocker
 Attachments: HBASE-6410-1.patch, HBASE-6410.patch


 Move RegionServer Metrics to metrics2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460446#comment-13460446
 ] 

Hudson commented on HBASE-4658:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts 
(Revision 1388318)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile


 Put attributes are not exposed via the ThriftServer
 ---

 Key: HBASE-4658
 URL: https://issues.apache.org/jira/browse/HBASE-4658
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ThriftPutAttributes1.txt


 The Put api also takes in a bunch of arbitrary attributes that an application 
 can use to associate metadata with each put operation. This is not exposed 
 via Thrift.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460445#comment-13460445
 ] 

Hudson commented on HBASE-6806:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts 
(Revision 1388318)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile


 HBASE-4658 breaks backward compatibility / example scripts
 --

 Key: HBASE-6806
 URL: https://issues.apache.org/jira/browse/HBASE-6806
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Affects Versions: 0.94.0
Reporter: Lukas
 Fix For: 0.96.0

 Attachments: HBASE-6806-fix-examples.diff


 HBASE-4658 introduces the new 'attributes' argument as a non-optional 
 parameter. This is not backward compatible and also breaks the code in the 
 example section. Resolution: mark it as 'optional'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6783) Make read short circuit the default

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460448#comment-13460448
 ] 

Hudson commented on HBASE-6783:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/])
HBASE-6783  Make read short circuit the default (Revision 1388374)

 Result = FAILURE
nkeywal : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java


 Make read short circuit the default
 ---

 Key: HBASE-6783
 URL: https://issues.apache.org/jira/browse/HBASE-6783
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch


 Per mailing-list discussion, read short circuit has little or no drawback, hence 
 should be used by default. As a consequence, we activate it in the default tests.
 It's possible to launch the tests with -Ddfs.client.read.shortcircuit=false to 
 execute them without the short circuit; this will be used for some builds 
 on trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6524) Hooks for hbase tracing

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460447#comment-13460447
 ] 

Hudson commented on HBASE-6524:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/])
HBASE-6524 Hooks for hbase tracing; add documentation as an appendix 
(Revision 1388337)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/docbkx/book.xml


 Hooks for hbase tracing
 ---

 Key: HBASE-6524
 URL: https://issues.apache.org/jira/browse/HBASE-6524
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Leavitt
Assignee: Jonathan Leavitt
 Fix For: 0.96.0

 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, 
 createTableTrace.png, hbase-6524.diff


 Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
 library to add dapper-like tracing to hbase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.

2012-09-21 Thread Priyadarshini (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyadarshini updated HBASE-6853:
-

Attachment: HBASE-6853_splitfailure.patch

 IlegalArgument Exception is thrown when an empty region is spliitted.
 -

 Key: HBASE-6853
 URL: https://issues.apache.org/jira/browse/HBASE-6853
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.1
Reporter: ramkrishna.s.vasudevan
 Attachments: HBASE-6853_splitfailure.patch


 This is with regard to a mail sent on the dev mailing list.
 An empty-region split should be handled gracefully.  Either we should not allow 
 the split to happen if we know that the region is empty, or we should allow 
 the split to happen by giving the thread pool executor at least one thread.
 {code}
 int nbFiles = hstoreFilesToSplit.size();
 ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
 builder.setNameFormat("StoreFileSplitter-%1$d");
 ThreadFactory factory = builder.build();
 ThreadPoolExecutor threadPool =
   (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
 List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
 {code}
 Here nbFiles must be a positive, non-zero value.
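
As a sketch of one fix direction (the clamp helper below is illustrative, not 
code from either attached patch): Executors.newFixedThreadPool throws 
IllegalArgumentException when its thread count is zero, so the pool size can 
be clamped to at least 1 even when the region has no store files:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: never hand newFixedThreadPool a zero thread count.
class EmptyRegionSplitSketch {
    static int poolSize(int nbFiles) {
        return Math.max(1, nbFiles);
    }

    public static void main(String[] args) {
        int nbFiles = 0; // an empty region has no store files to split
        ExecutorService threadPool = Executors.newFixedThreadPool(poolSize(nbFiles));
        List<Future<Void>> futures = new ArrayList<>(nbFiles);
        // ... one StoreFileSplitter task per store file would be submitted here ...
        threadPool.shutdown();
    }
}
```

The alternative discussed above is to skip or fail the split outright when 
hstoreFilesToSplit is empty, rather than creating a pool at all.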
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.

2012-09-21 Thread Priyadarshini (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyadarshini updated HBASE-6853:
-

Attachment: HBASE-6853_2_splitsuccess.patch

 IlegalArgument Exception is thrown when an empty region is spliitted.
 -

 Key: HBASE-6853
 URL: https://issues.apache.org/jira/browse/HBASE-6853
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.1
Reporter: ramkrishna.s.vasudevan
 Attachments: HBASE-6853_2_splitsuccess.patch, 
 HBASE-6853_splitfailure.patch


 This is with regard to a mail sent on the dev mailing list.
 An empty-region split should be handled gracefully.  Either we should not allow 
 the split to happen if we know that the region is empty, or we should allow 
 the split to happen by giving the thread pool executor at least one thread.
 {code}
 int nbFiles = hstoreFilesToSplit.size();
 ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
 builder.setNameFormat("StoreFileSplitter-%1$d");
 ThreadFactory factory = builder.build();
 ThreadPoolExecutor threadPool =
   (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
 List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
 {code}
 Here nbFiles must be a positive, non-zero value.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.

2012-09-21 Thread Priyadarshini (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460461#comment-13460461
 ] 

Priyadarshini commented on HBASE-6853:
--

The two attached patches cover two scenarios.

SCENARIO 1 (success) :
-
Make the split succeed by setting the number of threads for the 
FixedThreadPool to 1 even if the hstoreFilesToSplit size is zero.

SCENARIO 2 (failure) :
--
Make the split fail when there are no store files to split.



 IlegalArgument Exception is thrown when an empty region is spliitted.
 -

 Key: HBASE-6853
 URL: https://issues.apache.org/jira/browse/HBASE-6853
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.1
Reporter: ramkrishna.s.vasudevan
 Attachments: HBASE-6853_2_splitsuccess.patch, 
 HBASE-6853_splitfailure.patch


 This is with regard to a mail sent on the dev mailing list.
 An empty-region split should be handled gracefully.  Either we should not allow 
 the split to happen if we know that the region is empty, or we should allow 
 the split to happen by giving the thread pool executor at least one thread.
 {code}
 int nbFiles = hstoreFilesToSplit.size();
 ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
 builder.setNameFormat("StoreFileSplitter-%1$d");
 ThreadFactory factory = builder.build();
 ThreadPoolExecutor threadPool =
   (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
 List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
 {code}
 Here nbFiles must be a positive, non-zero value.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6853) IllegalArgument Exception is thrown when an empty region is spliitted.

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6853:
--

Summary: IllegalArgument Exception is thrown when an empty region is 
spliitted.  (was: IlegalArgument Exception is thrown when an empty region is 
spliitted.)

 IllegalArgument Exception is thrown when an empty region is spliitted.
 --

 Key: HBASE-6853
 URL: https://issues.apache.org/jira/browse/HBASE-6853
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.1
Reporter: ramkrishna.s.vasudevan
 Attachments: HBASE-6853_2_splitsuccess.patch, 
 HBASE-6853_splitfailure.patch


 This is with regard to a mail sent on the dev mailing list.
 An empty-region split should be handled gracefully.  Either we should not allow 
 the split to happen if we know that the region is empty, or we should allow 
 the split to happen by giving the thread pool executor at least one thread.
 {code}
 int nbFiles = hstoreFilesToSplit.size();
 ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
 builder.setNameFormat("StoreFileSplitter-%1$d");
 ThreadFactory factory = builder.build();
 ThreadPoolExecutor threadPool =
   (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
 List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
 {code}
 Here nbFiles must be a positive, non-zero value.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-6854:
-

 Summary: Deletion of SPLITTING node on split rollback should clear 
the region from RIT
 Key: HBASE-6854
 URL: https://issues.apache.org/jira/browse/HBASE-6854
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.2


If a failure happens during a split before OFFLINING_PARENT, we roll back the 
split, including deleting the znodes that were created.
On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove 
the region from RIT. We need to remove it from RIT; the SSH logic is well 
guarded in case the delete event comes from an RS-down scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6855) Add support for the REST Multi Gets to the RemoteHTable implementation

2012-09-21 Thread Erich Hochmuth (JIRA)
Erich Hochmuth created HBASE-6855:
-

 Summary: Add support for the REST Multi Gets to the RemoteHTable 
implementation
 Key: HBASE-6855
 URL: https://issues.apache.org/jira/browse/HBASE-6855
 Project: HBase
  Issue Type: Improvement
  Components: REST
Affects Versions: 0.94.1
Reporter: Erich Hochmuth
Priority: Minor


REST Multi Gets support was added in HBASE-3541. I'd like to extend this 
capability into the RemoteHTable implementation.

https://issues.apache.org/jira/browse/HBASE-3541

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file

2012-09-21 Thread LiuLei (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460513#comment-13460513
 ] 

LiuLei commented on HBASE-6798:
---

Hi stack, yes, I think we should add a new setSkipChecksum(boolean) method to 
the org.apache.hadoop.hdfs.FileSystem class: call setSkipChecksum(false) when 
reading HLog files and setSkipChecksum(true) when reading HFiles.

 HDFS always read checksum form meta file
 

 Key: HBASE-6798
 URL: https://issues.apache.org/jira/browse/HBASE-6798
 Project: HBase
  Issue Type: Bug
  Components: Performance
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei
Priority: Blocker
 Attachments: 6798.txt


 I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5.
 HBase added checksum support in the HBase block cache in the HBASE-5074 jira.
 HBase supports checksums to decrease the IOPS of HDFS, so that HDFS
 doesn't need to read the checksum from the block file's meta file.
 But in hadoop-0.20.2-cdh3u5, BlockSender still reads the metadata file 
 even if the
  hbase.regionserver.checksum.verify property is true.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail

2012-09-21 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460530#comment-13460530
 ] 

Himanshu Vashishtha commented on HBASE-6714:


Any comments/reviews?

 TestMultiSlaveReplication#testMultiSlaveReplication may fail
 

 Key: HBASE-6714
 URL: https://issues.apache.org/jira/browse/HBASE-6714
 Project: HBase
  Issue Type: Bug
  Components: Replication, test
Affects Versions: 0.92.0, 0.94.0
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
Priority: Minor
 Attachments: HBase-6714-v1.patch


 java.lang.AssertionError: expected:<1> but was:<0>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at org.junit.Assert.assertEquals(Assert.java:456)
 at 
 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.checkRow(TestMultiSlaveReplication.java:203)
 at 
 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:188)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 TestMultiSlaveReplication#testMultiSlaveReplication failed in our local 
 build, citing that row was not replicated to the second peer. This is because, 
 after inserting row, the log is rolled and we look for row2 in both 
 clusters, and then we check for the existence of row in both clusters. 
 Meanwhile, the replication thread for the second cluster was sleeping, and 
 row2 is not present in the second cluster from the very beginning. So the 
 row2 existence check succeeds, and control moves on to find row in both 
 clusters, where it fails for the second cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-09-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460532#comment-13460532
 ] 

ramkrishna.s.vasudevan commented on HBASE-6299:
---

There are some hanging tests in the hadoop QA build.
+1 on patch otherwise.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, 
 HBASE-6299-v3.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open-region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign for a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, 

[jira] [Created] (HBASE-6856) Document the LeaseException thrown in scanner next

2012-09-21 Thread Daniel Iancu (JIRA)
Daniel Iancu created HBASE-6856:
---

 Summary: Document the LeaseException thrown in scanner next
 Key: HBASE-6856
 URL: https://issues.apache.org/jira/browse/HBASE-6856
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
Reporter: Daniel Iancu


In some situations clients that fetch data from a RS get a LeaseException 
instead of the usual ScannerTimeoutException/UnknownScannerException.

This particular case should be documented in the HBase guide.

Some key points

* the source of exception is: 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)

* it happens in the context of a slow/freezing RS#next

* it can be prevented by having hbase.rpc.timeout > 
hbase.regionserver.lease.period

Harsh J investigated the issue and has some conclusions, see

http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/browser
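
As a hedged hbase-site.xml sketch of keeping hbase.rpc.timeout above 
hbase.regionserver.lease.period (the values below are examples only, not 
recommendations):

```xml
<!-- Example values only: keep hbase.rpc.timeout larger than
     hbase.regionserver.lease.period so the scanner lease does not expire
     while an RPC is still allowed to run. Tune both to your workload. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```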


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6856) Document the LeaseException thrown in scanner next

2012-09-21 Thread Daniel Iancu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Iancu updated HBASE-6856:


Description: 
In some situations clients that fetch data from a RS get a LeaseException 
instead of the usual ScannerTimeoutException/UnknownScannerException.

This particular case should be documented in the HBase guide.

Some key points

* the source of exception is: 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)

* it happens in the context of a slow/freezing RS#next

* it can be prevented by having hbase.rpc.timeout > 
hbase.regionserver.lease.period

Harsh J investigated the issue and has some conclusions, see

http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E


  was:
In some situations clients that fetch data from a RS get a LeaseException 
instead of the usual ScannerTimeoutException/UnknownScannerException.

This particular case should be documented in the HBase guide.

Some key points

* the source of exception is: 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)

* it happens in the context of a slow/freezing RS#next

* it can be prevented by having hbase.rpc.timeout > 
hbase.regionserver.lease.period

Harsh J investigated the issue and has some conclusions, see

http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/browser



 Document the LeaseException thrown in scanner next
 --

 Key: HBASE-6856
 URL: https://issues.apache.org/jira/browse/HBASE-6856
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
Reporter: Daniel Iancu
  Labels: LeaseException

 In some situations clients that fetch data from a RS get a LeaseException 
 instead of the usual ScannerTimeoutException/UnknownScannerException.
 This particular case should be documented in the HBase guide.
 Some key points
 * the source of exception is: 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
 * it happens in the context of a slow/freezing RS#next
 * it can be prevented by having hbase.rpc.timeout > 
 hbase.regionserver.lease.period
 Harsh J investigated the issue and has some conclusions, see
 http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5937) Refactor HLog into an interface.

2012-09-21 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460592#comment-13460592
 ] 

Flavio Junqueira commented on HBASE-5937:
-

Thanks for responding, Stack.

bq. Sorry for not getting to your log. What have you been having to do to get 
tests to pass? How did you fix TestMultiParallel? It is stuff to do w/ this 
refactoring?

It was our bug.

bq.  Currently Reader and Writer are Interfaces defined inside HLog. You get 
one by calling a static method on HLog. You'd like to getReader non-static, an 
invocation on a particular instance of HLog.
bq. That seems fine by me. It makes sense given what you are trying to do. It 
is less flexible than what we currently have, but it's less flexible because it 
presumes a particular implementation of HLog.

It is simpler to leave getReader and getWriter as static methods. Given that a 
reader/writer is for a concrete WAL, Ivan and I thought that it would be best 
to have these methods available as instance methods. However, it is not looking 
simple to implement, because we don't have HLog objects available in all places 
we need a reader or a writer, and the initialization of HLog objects makes it 
tricky to instantiate one only to get a reader or a writer. At this point, I'm 
tempted to leave them as static methods for now, unless anyone has a strong 
preference otherwise.
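For illustration, the two API shapes being weighed might look roughly like this (a hypothetical sketch with invented names, not the real HBase interfaces): instance-level factories tie each reader/writer to one concrete log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the actual HBase API) of the shape under
// discussion: a WAL interface whose getReader/getWriter are instance
// methods, so a reader or writer is always bound to one concrete log.
interface WAL {
    interface Writer { void append(String edit); }
    interface Reader { List<String> readAll(); }

    Writer getWriter();  // instance method: bound to this WAL
    Reader getReader();
}

// Trivial in-memory implementation, only to show the instance binding.
class InMemoryWAL implements WAL {
    private final List<String> edits = new ArrayList<>();
    public Writer getWriter() { return edits::add; }
    public Reader getReader() { return () -> new ArrayList<>(edits); }
}
```

With static factories, by contrast, the caller must pass the log's location to every call, which is what makes a single-implementation assumption creep in.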

bq. I hope you call your HLog Interface WAL!

It is fine with me to make the change.

bq. I think this work trying to make an Interface for WAL is kinda important. 
There is this BookKeeper project but the multi-WAL dev – i.e. making the 
regionserver write more than one WAL at a time (into HDFS) – could use the 
result of this effort too.

BookKeeper provides the ability to write multiple concurrent logs, but if the 
regionserver code is not prepared, then we won't be able to benefit from this 
feature. Consequently, it does sound very important to have the regionserver 
writing to more than one WAL at a time.

Currently there is one test failing consistently for me:

{noformat}
org.apache.hadoop.hbase.TestLocalHBaseCluster
{noformat}

and I believe the culprit is this:

{noformat}
WARNING! File system needs to be upgraded.  You have version null and I want 
version 7.  Run the '${HBASE_HOME}/bin/hbase migrate' script.
2012-09-21 17:27:06,075 FATAL 
[Master:0;perfectsalt-lm.barcelona.corp.yahoo.com,52906,1348241225714] 
master.HMaster(1838): Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to 
be upgraded.  You have version null and I want version 7.  Run the 
'${HBASE_HOME}/bin/hbase migrate' script.
{noformat}

Any clue of why this could be happening?

 Refactor HLog into an interface.
 

 Key: HBASE-5937
 URL: https://issues.apache.org/jira/browse/HBASE-5937
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Flavio Junqueira
Priority: Minor
 Attachments: 
 org.apache.hadoop.hbase.client.TestMultiParallel-output.txt


 What the summary says. Create HLog interface. Make current implementation use 
 it.



[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful

2012-09-21 Thread Hiroshi Ikeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460612#comment-13460612
 ] 

Hiroshi Ikeda commented on HBASE-6651:
--

Sorry, I didn't realize that PoolMap is also used in HBaseClient, where it is 
intended that the round-robin logic gives the same thread-safe object to 
different threads while sticking to the limit count of the resources (aside 
from the other factors that break the thread safety of PoolMap). Apparently the 
pooling requirements of HTablePool and HBaseClient differ.

 Thread safety of HTablePool is doubtful
 ---

 Key: HBASE-6651
 URL: https://issues.apache.org/jira/browse/HBASE-6651
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.1
Reporter: Hiroshi Ikeda
Priority: Minor
 Attachments: sample.zip, sample.zip


 There are some operations in HTablePool that access PoolMap multiple 
 times without any explicit synchronization. 
 For example, HTablePool.closeTablePool() calls PoolMap.values(), and then 
 calls PoolMap.remove(). If other threads add new instances to the pool in the 
 middle of the calls, the newly added instances might be dropped. 
 (HTablePool.closeTablePool() also has another problem: calling it from 
 multiple threads causes HTable to be accessed by multiple threads.)
 Moreover, PoolMap is not thread safe for the same reason.
 For example, PoolMap.put() calls ConcurrentMap.get() and then calls 
 ConcurrentMap.put(). If other threads add a new instance to the concurrent 
 map in the middle of the calls, the new instance might be dropped.
 And implementations of Pool have the same problems.
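The get-then-put race described above can be sketched as follows (hypothetical code with invented names, not the actual PoolMap implementation):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the pattern described above: a get() followed by
// a separate put() is check-then-act, which is not atomic even on a
// ConcurrentMap.
class PoolSketch<K, V> {
    private final ConcurrentMap<K, List<V>> map = new ConcurrentHashMap<>();

    // Unsafe: two threads can both see null for the same key, both create
    // a list, and the loser's list (and the value added to it) is dropped.
    void putUnsafe(K key, V value) {
        List<V> values = map.get(key);
        if (values == null) {
            values = new ArrayList<>();
            map.put(key, values); // may overwrite another thread's list
        }
        values.add(value);
    }

    // Safer: putIfAbsent makes the insert-if-missing step atomic, so every
    // thread converges on the same list. (The list itself must also be
    // thread-safe, hence synchronizedList.)
    void putSafe(K key, V value) {
        List<V> values = map.get(key);
        if (values == null) {
            List<V> fresh = Collections.synchronizedList(new ArrayList<V>());
            List<V> existing = map.putIfAbsent(key, fresh);
            values = (existing != null) ? existing : fresh;
        }
        values.add(value);
    }

    int size(K key) {
        List<V> values = map.get(key);
        return values == null ? 0 : values.size();
    }
}
```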



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460615#comment-13460615
 ] 

Lars Hofhansl commented on HBASE-6852:
--

Patch looks good. I'll remain sceptical about the real-life impact, though. The 
expensive part is the memory barriers. As long as we use AtomicLong (or 
volatiles, or synchronized, or a ConcurrentMap) that cost is still going to be 
paid.
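A rough, hypothetical sketch of the batching idea behind the patch (class and method names invented here; the threshold of 100 matches the value suggested earlier in the thread): each thread counts hits locally and only touches the shared AtomicLong, and thus pays for a barrier, once per threshold.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the attached patch: batch cache-hit updates
// per thread so the shared counter's memory barrier is paid rarely.
class BatchedCacheHitCounter {
    private static final int THRESHOLD = 100; // value suggested in review

    private final AtomicLong globalHits = new AtomicLong();
    private final ThreadLocal<long[]> localHits =
        ThreadLocal.withInitial(() -> new long[1]);

    void updateOnCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {
            globalHits.addAndGet(local[0]); // one barrier per THRESHOLD hits
            local[0] = 0;
        }
    }

    // May lag the true total by up to THRESHOLD - 1 per active thread.
    long approximateHits() {
        return globalHits.get();
    }
}
```

The trade-off is exactness: the global counter can lag by the unflushed local counts, which is usually fine for monitoring metrics.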

Lemme move this out to 0.94.3, so that we can performance test this a bit more.

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full 
 table scanning.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the rough formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples  %        image name  symbol name
 ---
 98447    13.4324  14033.jo    void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean)
   98447  100.000  14033.jo    void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean) [self]
 ---
 45814    6.2510   14033.jo    int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int)
   45814  100.000  14033.jo    int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int) [self]
 ---
 43523    5.9384   14033.jo    boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523  100.000  14033.jo    boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
  [self]
 ---
 42548    5.8054   14033.jo    int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int)
   42548  100.000  14033.jo    int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int) [self]
 ---
 40572    5.5358   14033.jo    int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572  100.000  14033.jo    int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6852:
-

Fix Version/s: (was: 0.94.2)
   0.94.3

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch





[jira] [Updated] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT

2012-09-21 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6854:
-

Fix Version/s: (was: 0.94.2)
   0.94.3

Unless there's a patch today, let's move it to 0.94.3.

 Deletion of SPLITTING node on split rollback should clear the region from RIT
 -

 Key: HBASE-6854
 URL: https://issues.apache.org/jira/browse/HBASE-6854
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.3


 If a failure happens in split before OFFLINING_PARENT, we tend to roll back 
 the split, including deleting the znodes created.
 On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove 
 the region from RIT. We need to remove it from RIT; in any case, the SSH 
 logic is well guarded for the case where the delete event comes from an 
 RS-down scenario.



[jira] [Commented] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover

2012-09-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460618#comment-13460618
 ] 

Jimmy Xiang commented on HBASE-6381:


@Ram, thanks a lot for the review.  As to the removal from regionInfoIterator, 
it is used locally to track unassigned regions so that we know whether the bulk 
assign has completed, so I think it is needed.

As to HBASE-6228, I added the following to the fixupDaughters in HMaster before 
actually fixing anything in order to avoid duplicated processing:

{noformat}
  if (!serverManager.isServerDead(sn)) { // Otherwise, let SSH take care of it
{noformat}

Here sn is the parent server name. So the chance of duplicated processing 
should be very minimal, right?



 AssignmentManager should use the same logic for clean startup and failover
 --

 Key: HBASE-6381
 URL: https://issues.apache.org/jira/browse/HBASE-6381
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, 
 trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch


 Currently AssignmentManager handles clean startup and failover very 
 differently.
 Different logic is mingled together, so it is hard to find out which is for 
 which.
 We should clean it up and share the same logic so that AssignmentManager 
 handles both cases the same way.  This way, the code will be much easier to 
 understand and maintain.



[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460619#comment-13460619
 ] 

Lars Hofhansl commented on HBASE-6841:
--

Which client did you jstack? There are at least two clients at work here: the 
one that attempts the truncate and the ones that still perform the requests.
I assume the stuck ones are those that are still performing the requests.

I am beginning to doubt that prefetching is an issue here. This has to do with 
how these clients connect. There should not need to be a connection setup each 
time.


 Meta prefetching is slower than doing multiple meta lookups
 ---

 Key: HBASE-6841
 URL: https://issues.apache.org/jira/browse/HBASE-6841
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.2

 Attachments: 6841-0.94.txt, 6841-0.96.txt


 I got myself into a situation where I needed to truncate a massive table 
 while it was getting hits and surprisingly the clients were not recovering. 
 What I see in the logs is that every time we prefetch .META. we set up a new 
 HConnection because we close it on the way out. It's awfully slow.
 We should just turn it off or make it useful. jstacks coming up.



[jira] [Updated] (HBASE-6651) Thread safety of HTablePool is doubtful

2012-09-21 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated HBASE-6651:
-

Attachment: sharedmap_for_hbaseclient.zip

Added a sample implementation which might be used to pool and share Connection 
instances between threads in HBaseClient. I use the name SharedMap, the old 
name of PoolMap.

In HBaseClient I think PoolMap with ThreadLocalPool leaks objects. Connection 
(extending Thread) automatically tries to remove itself from the pool at the 
end of its life, but its thread is different from the thread which created the 
instance of Connection and put it into the pool.
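The leak mechanism can be illustrated in miniature (a hypothetical demo with invented names, not the HBaseClient code): a value pooled through a ThreadLocal by one thread is invisible to every other thread, so a removal attempted from another thread is a no-op.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of the leak described above: an entry pooled via a
// ThreadLocal by one thread cannot be evicted by a different thread.
class ThreadLocalPoolLeakDemo {
    static final ThreadLocal<List<String>> POOL =
        ThreadLocal.withInitial(ArrayList::new);

    // Returns the creator thread's pool size after another thread tried
    // (and failed) to remove the entry.
    static int demonstrateLeak() {
        POOL.get().add("connection-1");          // creator thread pools it

        Thread other = new Thread(() -> {
            // A different thread sees its own (empty) list, so this remove
            // is a no-op -- mirroring a Connection thread that cannot
            // evict the entry pooled by the thread that created it.
            POOL.get().remove("connection-1");
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        return POOL.get().size();                // still 1: the entry leaked
    }
}
```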

 Thread safety of HTablePool is doubtful
 ---

 Key: HBASE-6651
 URL: https://issues.apache.org/jira/browse/HBASE-6651
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.1
Reporter: Hiroshi Ikeda
Priority: Minor
 Attachments: sample.zip, sample.zip, sharedmap_for_hbaseclient.zip


 There are some operations in HTablePool that access PoolMap multiple 
 times without any explicit synchronization. 
 For example, HTablePool.closeTablePool() calls PoolMap.values(), and then 
 calls PoolMap.remove(). If other threads add new instances to the pool in the 
 middle of the calls, the newly added instances might be dropped. 
 (HTablePool.closeTablePool() also has another problem: calling it from 
 multiple threads causes HTable to be accessed by multiple threads.)
 Moreover, PoolMap is not thread safe for the same reason.
 For example, PoolMap.put() calls ConcurrentMap.get() and then calls 
 ConcurrentMap.put(). If other threads add a new instance to the concurrent 
 map in the middle of the calls, the new instance might be dropped.
 And implementations of Pool have the same problems.



[jira] [Created] (HBASE-6857) Investigate slow operations

2012-09-21 Thread Amitanand Aiyer (JIRA)
Amitanand Aiyer created HBASE-6857:
--

 Summary: Investigate slow operations
 Key: HBASE-6857
 URL: https://issues.apache.org/jira/browse/HBASE-6857
 Project: HBase
  Issue Type: Improvement
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor


We see that occasionally regionservers have a spate of slow operations.

Need to look into what is causing this. 



[jira] [Commented] (HBASE-5937) Refactor HLog into an interface.

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460631#comment-13460631
 ] 

stack commented on HBASE-5937:
--

bq. Any clue of why this could be happening?

Somehow the test is pointed at the wrong fs?  Did you mess w/ that?  When HBase 
starts, it looks for a file named hbase.version.  If present, it reads it to 
check that the version therein matches the version the hbase software 
expects.  We used this facility whenever on-fs formats changed in a way that 
required you to run a migration step before starting the cluster.

So, version == null makes me think hbase is looking in the wrong place for the 
hbase.version file... looking in localfs rather than in hdfs where it perhaps 
wrote it on startup?

bq.  ...and the initialization of HLog objects makes it tricky to instantiate 
it only to get a reader or a writer

HLog construction is the way it is, again, because we presume only one 
implementation.  I'd suggest you look at what it would take to move the 
heavyweight stuff done in HLog to an init or start method.  No problem having 
us change how we do the HLog setup in HBase.  Perhaps it won't help much 
though, as the Reader and Writer might want some of the heavy setup done?

I'd also say that HLog is the way it is not because it was designed, but 
because it evolved this way over the years.  If you fellas want to start over, 
I'd say go for it: make a clean Interface that will work for our current hdfs 
use case and for the bkfs.  We'll shoehorn it into a 0.98 or whatever suits 
your schedule.

 Refactor HLog into an interface.
 

 Key: HBASE-5937
 URL: https://issues.apache.org/jira/browse/HBASE-5937
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Flavio Junqueira
Priority: Minor
 Attachments: 
 org.apache.hadoop.hbase.client.TestMultiParallel-output.txt


 What the summary says. Create HLog interface. Make current implementation use 
 it.



[jira] [Assigned] (HBASE-2038) Coprocessors: Region level indexing

2012-09-21 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned HBASE-2038:
-

Assignee: Jacques Nadeau

 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: New Feature
  Components: Coprocessors
Reporter: Andrew Purtell
Assignee: Jacques Nadeau
Priority: Minor

 HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as 
 a good goalpost for coprocessor environment design -- there should be enough 
 of it so that region-level indexing can be reimplemented as a coprocessor 
 without any loss of functionality. 



[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460645#comment-13460645
 ] 

stack commented on HBASE-6733:
--

[~jdcryans] Review this boss?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch


 The failure is in TestReplication.queueFailover (fails due to unreplicated 
 rows). I have come across two problems:
 1. The sleepMultiplier is not properly reset when the currentPath is changed 
 (in ReplicationSource.java).
 2. ReplicationExecutor sometimes removes files to replicate from the queue too 
 early, resulting in the corresponding edits going missing. Here the problem is 
 due to the fact that the log-file length the replication executor finds is not 
 the most up-to-date one, and hence it doesn't read anything from there; 
 ultimately, when there is a log roll, the replication queue gets a new entry, 
 and the executor drops the old entry out of the queue.



[jira] [Assigned] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors

2012-09-21 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned HBASE-3340:
-

Assignee: Jacques Nadeau  (was: Jonathan Gray)

 Eventually Consistent Secondary Indexing via Coprocessors
 -

 Key: HBASE-3340
 URL: https://issues.apache.org/jira/browse/HBASE-3340
 Project: HBase
  Issue Type: New Feature
  Components: Coprocessors
Reporter: Jonathan Gray
Assignee: Jacques Nadeau

 Secondary indexing support via coprocessors with an eventual consistency 
 guarantee.  Design to come.



[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460648#comment-13460648
 ] 

stack commented on HBASE-6758:
--

[~devaraj] What you think of Ted comment above boss?

[~jdcryans] Any comment on this patch?

 [replication] The replication-executor should make sure the file that it is 
 replicating is closed before declaring success on that file
 ---

 Key: HBASE-6758
 URL: https://issues.apache.org/jira/browse/HBASE-6758
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
 TEST-org.apache.hadoop.hbase.replication.TestReplication.xml


 I have seen cases where the replication-executor would lose data to replicate 
 since the file hasn't been closed yet. Upon closing, the new data becomes 
 visible. Before that happens the ZK node shouldn't be deleted in 
 ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
 in ReplicationSource.processEndOfFile as well (currentPath related).



[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6488:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
   0.94.2
 Assignee: ryan rawson
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 0.94 and trunk.  Thanks for the patch, Ryan. (Should have a test 
and more general escaping, but hey...)

 HBase wont run on IPv6 on OSes that use zone-indexes
 

 Key: HBASE-6488
 URL: https://issues.apache.org/jira/browse/HBASE-6488
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ryan rawson
Assignee: ryan rawson
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt


 In IPv6, an address may have a zone index, which is specified with a percent 
 sign, eg: ...%0.  This looks like a format string, so in a part of the code 
 that uses the hostname as a prefix to another string interpreted with 
 String.format, you end up with an exception:
 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.util.UnknownFormatConversionException: Conversion = '0'
 at java.util.Formatter.checkText(Formatter.java:2503)
 at java.util.Formatter.parse(Formatter.java:2467)
 at java.util.Formatter.format(Formatter.java:2414)
 at java.util.Formatter.format(Formatter.java:2367)
 at java.lang.String.format(String.java:2769)
 at 
 com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
 at 
 org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:680)
 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
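The escaping fix can be sketched as follows (illustrative only; the class, method name, and thread-name suffix below are invented, not the actual patch): a literal '%' in the hostname must be doubled to '%%' before the host is used as a format-string prefix.

```java
// Hedged sketch of the failure mode: an IPv6 zone index puts a literal
// '%' in the hostname (e.g. "fe80::1%0"), so using the host as a
// String.format prefix requires escaping '%' as "%%".
class ZoneIndexEscape {
    // Escape format metacharacters before using the host in a
    // ThreadFactoryBuilder-style name format. (Hypothetical helper.)
    static String threadNameFormat(String hostname) {
        return hostname.replace("%", "%%") + "-executor-%d";
    }
}
```

Without the replace(), the unescaped "%0" is parsed as a format conversion and formatting throws, matching the UnknownFormatConversionException in the stack trace above.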



[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split

2012-09-21 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460662#comment-13460662
 ] 

Gregory Chanan commented on HBASE-6752:
---

On timeranged reads:

if the user specified his own timestamps, couldn't the correct value to return 
be only in the WAL?

 On region server failure, serve writes and timeranged reads during the log 
 split
 

 Key: HBASE-6752
 URL: https://issues.apache.org/jira/browse/HBASE-6752
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor

 Opening for write on failure would mean:
 - Assign the region to a new regionserver. It marks the region as recovering.
   -- a specific exception is returned to the client when we cannot serve
   -- allows clients to know where they stand. The exception can include some 
 time information (failure started on: ...)
   -- allows clients to go immediately to the right regionserver, instead of 
 retrying or calling the region holding meta to get the new address
   => saves network calls, lowers the load on meta.
 - Do the split as today. Priority is given to the region server holding the 
 new regions.
   -- helps to share the load-balancing code: the split is done by a region 
 server considered available for new regions
   -- helps locality (the recovered edits are available on the region server) 
   => lowers the network usage
 - When the split is finished, we're done, as today.
 - While the split is progressing, the region server can
   -- serve writes
 --- that's useful for all applications that need to write but not read 
 immediately:
 --- whatever logs events to analyze them later
 --- OpenTSDB is a perfect example.
   -- serve reads if they have a compatible time range. For heavily used 
 tables, it could be a help, because:
 --- we can expect to have only a few minutes of data (as it's loaded)
 --- the heaviest queries often accept a few (or more) minutes of delay.
 Some what-ifs:
 1) The split fails
 => Retry until it works, as today. Just that we serve writes. We need to 
 know (as today) that the region has not recovered if we fail again.
 2) The regionserver fails during the split
 => As 1, and as of today.
 3) The regionserver fails after the split but before the state change to 
 fully available
 => New assign. More logs to split (the ones already done and the new ones).
 4) The assignment fails
 => Retry until it works, as today.



[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-21 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460664#comment-13460664
 ] 

Devaraj Das commented on HBASE-6758:


[~stack] I have already responded to Ted's comment. In summary, the problem is 
that the log-splitter couldn't complete its work soon enough, and hence the 
file wasn't moved to .oldlogs soon enough. The replicator did the maxRetries 
and gave up. So this is a different issue (and maybe solved by increasing the 
value of maxRetries in the config).

 [replication] The replication-executor should make sure the file that it is 
 replicating is closed before declaring success on that file
 ---

 Key: HBASE-6758
 URL: https://issues.apache.org/jira/browse/HBASE-6758
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
 TEST-org.apache.hadoop.hbase.replication.TestReplication.xml


 I have seen cases where the replication-executor would lose data to replicate 
 since the file hasn't been closed yet. Upon closing, the new data becomes 
 visible. Before that happens the ZK node shouldn't be deleted in 
 ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
 in ReplicationSource.processEndOfFile as well (currentPath related).



[jira] [Updated] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups

2012-09-21 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6841:
-

Fix Version/s: (was: 0.94.2)
   0.94.3

I'd like to move this to 0.94.3. Please pull back if you disagree.

 Meta prefetching is slower than doing multiple meta lookups
 ---

 Key: HBASE-6841
 URL: https://issues.apache.org/jira/browse/HBASE-6841
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.3

 Attachments: 6841-0.94.txt, 6841-0.96.txt


 I got myself into a situation where I needed to truncate a massive table 
 while it was getting hits and surprisingly the clients were not recovering. 
 What I see in the logs is that every time we prefetch .META. we set up a new 
 HConnection, because we close it on the way out. It's awfully slow.
 We should just turn it off or make it useful. jstacks coming up.



[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups

2012-09-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460674#comment-13460674
 ] 

Jean-Daniel Cryans commented on HBASE-6841:
---

bq. @J-D: Since this is (presumably) a long standing condition, how do you feel 
about moving this to 0.94.3?

I'm ok.

bq. Which client did you jstack? There're at least two clients at work here: 
the one that attempts the truncate and the ones that still perform the requests.

The clients. I truncated separately.

bq. I am beginning to doubt that prefetching is an issue here. This has to do 
with how these clients connect. There should not need to be a connection setup 
each time.

Something is broken, we agree there :)

 Meta prefetching is slower than doing multiple meta lookups
 ---

 Key: HBASE-6841
 URL: https://issues.apache.org/jira/browse/HBASE-6841
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.3

 Attachments: 6841-0.94.txt, 6841-0.96.txt


 I got myself into a situation where I needed to truncate a massive table 
 while it was getting hits and surprisingly the clients were not recovering. 
 What I see in the logs is that every time we prefetch .META. we set up a new 
 HConnection, because we close it on the way out. It's awfully slow.
 We should just turn it off or make it useful. jstacks coming up.



[jira] [Resolved] (HBASE-6856) Document the LeaseException thrown in scanner next

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6856.
--

   Resolution: Fixed
Fix Version/s: 0.96.0
 Assignee: Daniel Iancu
 Hadoop Flags: Reviewed

Committed.  Let me push the book up to the site after this commit.  Thanks for 
the mailing list distillation (and thanks Harsh for digging in on this one...)

 Document the LeaseException thrown in scanner next
 --

 Key: HBASE-6856
 URL: https://issues.apache.org/jira/browse/HBASE-6856
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
Reporter: Daniel Iancu
Assignee: Daniel Iancu
  Labels: LeaseException
 Fix For: 0.96.0


 In some situations clients that fetch data from a RS get a LeaseException 
 instead of the usual ScannerTimeoutException/UnknownScannerException.
 This particular case should be documented in the HBase guide.
 Some key points:
 * the source of the exception is: 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
 * it happens in the context of a slow/freezing RS#next
 * it can be prevented by having hbase.rpc.timeout > 
 hbase.regionserver.lease.period
 Harsh J investigated the issue and has some conclusions, see
 http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460683#comment-13460683
 ] 

Mikhail Bautin commented on HBASE-6852:
---

[~lhofhansl]: what are the other cases when metrics came up as performance 
issues?

[~chenghao_sh]: you said that your dataset size was 600GB, and the total amount 
of block cache was presumably much smaller than that, which makes me think the 
workload should have been I/O-bound. What was the CPU utilization on your test? 
What was the disk throughput?

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the poor formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name  symbol name
 ---
 98447     13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
   98447   100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
 ---
 45814     6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
   45814   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
 ---
 42548     5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
   42548   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes

2012-09-21 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460686#comment-13460686
 ] 

ryan rawson commented on HBASE-6488:


One of the problems is that we can't generally escape '%' wherever it appears, 
because most of the string output doesn't actually go through format(), so 
we'd just end up doubling percents.  Bummer!
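A small sketch (illustrative code, not the HBase patch; the hostname and thread-name suffix are made up) of both halves of the problem: a zone-index '%' reaching String.format() blows up, while blanket escaping doubles percents in output that never goes through format():

```java
// Sketch of the failure mode and the escaping trade-off described above.
public class PercentEscapeDemo {

    // Escape '%' so a value is safe to embed inside a format string.
    static String escapeForFormat(String s) {
        return s.replace("%", "%%");
    }

    public static void main(String[] args) {
        String host = "fe80::1%0"; // IPv6 address with zone index "0"

        boolean threw = false;
        try {
            // "%0" is picked up as a (broken) format specifier.
            String.format(host + "-EventThread");
        } catch (java.util.IllegalFormatException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // Escaping first makes the format() call safe:
        System.out.println(String.format(escapeForFormat(host) + "-EventThread")); // fe80::1%0-EventThread

        // ...but code that skips format() now prints the doubled percent,
        // which is the "doubling percents" problem from the comment above.
        System.out.println(escapeForFormat(host) + "-EventThread"); // fe80::1%%0-EventThread
    }
}
```

The escape only helps on the paths that actually call format(), which is why a blanket fix is hard.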

 HBase wont run on IPv6 on OSes that use zone-indexes
 

 Key: HBASE-6488
 URL: https://issues.apache.org/jira/browse/HBASE-6488
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ryan rawson
Assignee: ryan rawson
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt


 In IPv6, an address may have a zone-index, which is specified with a percent, 
 eg: ...%0.  This looks like a format string, and thus in a part of the code 
 which uses the hostname as a prefix to another string which is interpreted 
 with String.format, you end up with an exception:
 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.util.UnknownFormatConversionException: Conversion = '0'
 at java.util.Formatter.checkText(Formatter.java:2503)
 at java.util.Formatter.parse(Formatter.java:2467)
 at java.util.Formatter.format(Formatter.java:2414)
 at java.util.Formatter.format(Formatter.java:2367)
 at java.lang.String.format(String.java:2769)
 at 
 com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
 at 
 org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:680)
 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460689#comment-13460689
 ] 

Todd Lipcon commented on HBASE-6852:


I have a full-table-scan-in-isolation benchmark I've been working on. My 
benchmark currently disables metrics, so I haven't seen this, but I'll add a 
flag to enable metrics and see if I can reproduce. Since it runs in isolation, 
it's easy to run under perf stat and get cycle counts, etc., out of it. Will 
report back next week.

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the poor formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name  symbol name
 ---
 98447     13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
   98447   100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
 ---
 45814     6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
   45814   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
 ---
 42548     5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
   42548   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460695#comment-13460695
 ] 

Lars Hofhansl commented on HBASE-6852:
--

HBASE-6603 was the other one. Turns out this is the 2nd time (not the 3rd). The 
other issues I found through profiling were not metrics-related.

So I was thinking about what we should generally do about this. The idea in 
this patch (using an array indexed by metric) is a good one. Can we do that 
generally? I.e.:
# We know the metrics we wish to collect ahead of time.
# Assign an index to each of them, and collect the values in an array.
# Simply use long (not volatile, not AtomicLong, just a plain long).
# Upon update or read, we access the metrics array by index.

That would eliminate the cost of the ConcurrentMap and of the AtomicXYZ, with 
the caveat that the metrics are only approximations, which at the very least 
will make testing much harder.
Maybe we have exact and fuzzy metrics and only use the fuzzy ones on the hot 
code paths.

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the poor formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name  symbol name
 ---
 98447     13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
   98447   100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
 ---
 45814     6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
   45814   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
 ---
 42548     5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
   42548   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Commented] (HBASE-6856) Document the LeaseException thrown in scanner next

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460697#comment-13460697
 ] 

stack commented on HBASE-6856:
--

I meant to say thanks Daniel for distilling the mailing list thread and making 
this issue.

 Document the LeaseException thrown in scanner next
 --

 Key: HBASE-6856
 URL: https://issues.apache.org/jira/browse/HBASE-6856
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
Reporter: Daniel Iancu
Assignee: Daniel Iancu
  Labels: LeaseException
 Fix For: 0.96.0


 In some situations clients that fetch data from a RS get a LeaseException 
 instead of the usual ScannerTimeoutException/UnknownScannerException.
 This particular case should be documented in the HBase guide.
 Some key points:
 * the source of the exception is: 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
 * it happens in the context of a slow/freezing RS#next
 * it can be prevented by having hbase.rpc.timeout > 
 hbase.regionserver.lease.period
 Harsh J investigated the issue and has some conclusions, see
 http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460698#comment-13460698
 ] 

Todd Lipcon commented on HBASE-6852:


If using a shared array of longs, we'd get a ton of cache-line contention 
(false sharing). Whatever we do should be cache-line padded to avoid this perf 
hole.

Having a per-thread (ThreadLocal) metrics array isn't a bad way to go: no 
contention, can use non-volatile types, and can be stale-read during metrics 
snapshots by just iterating over all the threads.
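The per-thread variant can be sketched like this (illustrative names, not actual HBase code): each thread bumps its own plain array, and a snapshot sums over every thread's array, accepting slightly stale reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of ThreadLocal metrics: uncontended, non-volatile increments,
// with a stale-tolerant snapshot that iterates over all threads' arrays.
public class ThreadLocalMetrics {
    private static final int NUM_METRICS = 2;

    // Registry of each thread's counter array so snapshots can find them all.
    private final Map<Thread, long[]> perThread = new ConcurrentHashMap<>();

    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> {
        long[] counts = new long[NUM_METRICS];
        perThread.put(Thread.currentThread(), counts);
        return counts;
    });

    public void increment(int metricIndex) {
        local.get()[metricIndex]++;  // uncontended, non-volatile write
    }

    // Stale-read snapshot: sum the slot across every registered thread.
    public long snapshot(int metricIndex) {
        long sum = 0;
        for (long[] counts : perThread.values()) {
            sum += counts[metricIndex];
        }
        return sum;
    }
}
```

A real implementation would also have to handle thread death (weak references or a reaper) so the registry doesn't leak; that's omitted here.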

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the poor formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name  symbol name
 ---
 98447     13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
   98447   100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
 ---
 45814     6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
   45814   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
 ---
 42548     5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
   42548   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460699#comment-13460699
 ] 

stack commented on HBASE-6852:
--

Perhaps use the cliff-click counter (if its cost is < volatile) and not have 
to do fuzzy?
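For context: Cliff Click's high-scale-lib counter is the striped design that later shipped in JDK 8 as java.util.concurrent.atomic.LongAdder, so a modern sketch of the idea looks like this:

```java
import java.util.concurrent.atomic.LongAdder;

// Striped counter: contended increments spread across internal cells,
// and sum() folds them into a snapshot, so no single hot CAS location.
public class StripedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder cacheHits = new LongAdder();
        Runnable worker = () -> {
            for (int i = 0; i < 100_000; i++) {
                cacheHits.increment();  // threads land on different cells
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // After join() the sum is exact; while threads run it is approximate.
        System.out.println(cacheHits.sum()); // 200000
    }
}
```

The trade-off is the same one discussed above: cheap contended writes in exchange for reads that are only a moment-in-time approximation.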

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing a full 
 table scan.
 Here are the top 5 hotspots within the regionserver while full-scanning a 
 table (sorry for the poor formatting):
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples   %        image name  symbol name
 ---
 98447     13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
   98447   100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
 ---
 45814     6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
   45814   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
 ---
 43523     5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523   100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
 ---
 42548     5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
   42548   100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
 ---
 40572     5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572   100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]



[jira] [Assigned] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split

2012-09-21 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan reassigned HBASE-6752:
-

Assignee: Gregory Chanan

 On region server failure, serve writes and timeranged reads during the log 
 split
 

 Key: HBASE-6752
 URL: https://issues.apache.org/jira/browse/HBASE-6752
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Gregory Chanan
Priority: Minor

 Opening for write on failure would mean:
 - Assign the region to a new regionserver. It marks the region as recovering
   -- a specific exception is returned to the client when we cannot serve.
   -- allows clients to know where they stand. The exception can include some 
 time information (failure started on: ...)
   -- allows them to go immediately to the right regionserver, instead of 
 retrying or calling the region holding meta to get the new address
  => saves network calls, lowers the load on meta.
 - Do the split as today. Priority is given to the region server holding the
 new regions
   -- helps share the load-balancing code: the split is done by a region 
 server considered available for new regions
   -- helps locality (the recovered edits are available on the region server)
 => lowers network usage
 - When the split is finished, we're done, as of today
 - While the split is progressing, the region server can
  -- serve writes
--- that's useful for all applications that need to write but not read 
 immediately:
--- whatever logs events to analyze them later
--- opentsdb is a perfect example.
  -- serve reads if they have a compatible time range. For heavily used 
 tables, it could be a help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few (or more) minutes of delay.
 Some "what ifs":
 1) the split fails
 => Retry until it works. As today. Just that we serve writes. We need to 
 know (as today) that the region has not recovered if we fail again.
 2) the regionserver fails during the split
 => As 1, and as of today.
 3) the regionserver fails after the split but before the state change to 
 fully available.
 => New assignment. More logs to split (the ones already done and the new
 ones).
 4) the assignment fails
 => Retry until it works. As today.



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460702#comment-13460702
 ] 

Lars Hofhansl commented on HBASE-6852:
--

Oh yeah, you mentioned cliffclick... Need to look at that.

 SchemaMetrics.updateOnCacheHit costs too much while full scanning a table 
 with all of its fields
 

 Key: HBASE-6852
 URL: https://issues.apache.org/jira/browse/HBASE-6852
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.0
Reporter: Cheng Hao
Priority: Minor
  Labels: performance
 Fix For: 0.94.3, 0.96.0

 Attachments: onhitcache-trunk.patch


 The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full 
 table scanning.
 Here is the top 5 hotspots within regionserver while full scanning a table: 
 (Sorry for the less-well-format)
 CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (No unit mask) count 500
 samples  %        image name   symbol name
 ---
 98447  13.4324  14033.jo void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean)
   98447  100.000  14033.jo void 
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory,
  boolean) [self]
 ---
 45814   6.2510  14033.jo int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int)
   45814  100.000  14033.jo int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, 
 byte[], int, int) [self]
 ---
 43523   5.9384  14033.jo boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
   43523  100.000  14033.jo boolean 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
  [self]
 ---
 42548   5.8054  14033.jo int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int)
   42548  100.000  14033.jo int 
 org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, 
 byte[], int, int) [self]
 ---
 40572   5.5358  14033.jo int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
   40572  100.000  14033.jo int 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[],
  int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split

2012-09-21 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460703#comment-13460703
 ] 

Gregory Chanan commented on HBASE-6752:
---

Assigned to myself, I'm definitely up for the serving writes part, need to 
think some more about the timeranged reads.  May file separate JIRAs.

 On region server failure, serve writes and timeranged reads during the log 
 split
 

 Key: HBASE-6752
 URL: https://issues.apache.org/jira/browse/HBASE-6752
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Gregory Chanan
Priority: Minor

 Opening for write on failure would mean:
 - Assign the region to a new regionserver, marking the region as recovering.
   -- A specific exception is returned to the client when we cannot serve, 
 letting clients know where they stand. The exception can include some time 
 information (failure started on: ...).
   -- Clients can go immediately to the right regionserver, instead of 
 retrying or calling the region holding meta to get the new address
  => saves network calls and lowers the load on meta.
 - Do the split as today. Priority is given to the region server holding the 
 new regions.
   -- Helps share the load-balancing code: the split is done by a region 
 server considered available for new regions.
   -- Helps locality (the recovered edits are available on the region server) 
 => lowers network usage.
 - When the split is finished, we're done, as today.
 - While the split is progressing, the region server can:
  -- serve writes
--- useful for all applications that need to write but not read 
 immediately:
--- anything that logs events to analyze them later
--- opentsdb is a perfect example.
  -- serve reads if they have a compatible time range. For heavily used 
 tables this could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a delay of a few minutes or more.
 Some "what if" scenarios:
 1) The split fails
 => Retry until it works, as today; we just keep serving writes. We need to 
 know (as today) that the region has not recovered if we fail again.
 2) The regionserver fails during the split
 => Same as 1, and as of today.
 3) The regionserver fails after the split but before the state change to 
 fully available
 => New assign. More logs to split (the ones already done plus the new ones).
 4) The assignment fails
 => Retry until it works, as today.



[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6299:
-

Attachment: 6299v4.txt

Retry

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6299v4.txt, 6299v4.txt, HBASE-6299.patch, 
 HBASE-6299-v2.patch, HBASE-6299-v3.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS.
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; 
 retry=0
 

[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6299:
-

Status: Open  (was: Patch Available)


[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6299:
-

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460709#comment-13460709
 ] 

stack commented on HBASE-6071:
--

How much more log does this patch generate, Igal?  As I read it, it is logging 
every retry.

Maybe we should commit this even if it ups the log noise in buckets, so we get 
more conscious of the retrying that is going on, mostly silently, currently -- 
then we might do something more about it in the client?

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: Client, IPC/RPC
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, 
 HBASE-6071.v3.patch, HBASE-6071.v4.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
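A rough sketch of the suggestion, logging each attempt's exception type and details inside the catch block so earlier failures are not silently dropped. The `withRetries` helper and all names here are illustrative stand-ins, not the actual HConnectionManager code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetryLogger {
    // Run the callable up to maxRetries times, logging every failed
    // attempt with its exception type and details before retrying.
    public static <T> T withRetries(Callable<T> callable, int maxRetries)
            throws Exception {
        List<Throwable> failures = new ArrayList<>();
        for (int tries = 1; tries <= maxRetries; tries++) {
            try {
                return callable.call();
            } catch (Exception e) {
                // The log message suggested above: type and details,
                // so no attempt's exception is swallowed silently.
                System.err.println("Attempt " + tries + " of " + maxRetries
                        + " failed: " + e);
                failures.add(e);
            }
        }
        throw new Exception("All " + maxRetries
                + " attempts failed; first cause: " + failures.get(0));
    }
}
```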



[jira] [Assigned] (HBASE-6851) Race condition in TableAuthManager.updateGlobalCache()

2012-09-21 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling reassigned HBASE-6851:


Assignee: Gary Helmling

 Race condition in TableAuthManager.updateGlobalCache()
 --

 Key: HBASE-6851
 URL: https://issues.apache.org/jira/browse/HBASE-6851
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.1, 0.96.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical

 When new global permissions are assigned, there is a race condition, during 
 which further authorization checks relying on global permissions may fail.
 In TableAuthManager.updateGlobalCache(), we have:
 {code:java}
 USER_CACHE.clear();
 GROUP_CACHE.clear();
 try {
   initGlobal(conf);
 } catch (IOException e) {
   // Never happens
   LOG.error("Error occurred while updating the user cache", e);
 }
 for (Map.Entry<String,TablePermission> entry : userPerms.entries()) {
   if (AccessControlLists.isGroupPrincipal(entry.getKey())) {
 GROUP_CACHE.put(AccessControlLists.getGroupName(entry.getKey()),
 new Permission(entry.getValue().getActions()));
   } else {
 USER_CACHE.put(entry.getKey(), new 
 Permission(entry.getValue().getActions()));
   }
 }
 {code}
 If authorization checks come in following the .clear() but before 
 repopulating, they will fail.
 We should have some synchronization here to serialize multiple updates, and 
 use a COW-style rebuild and reassignment of the new maps.
 This particular issue crept in with the fix in HBASE-6157, so I'm flagging 
 for 0.94 and 0.96.
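A minimal sketch of the copy-on-write approach described above: build the replacement maps off to the side, then swap the (volatile) references in one step, so concurrent authorization checks never observe an empty cache. The class, the string-valued permissions, and the "@" group convention are simplified stand-ins, not the real TableAuthManager types:

```java
import java.util.HashMap;
import java.util.Map;

public class CowCacheSketch {
    // volatile so readers always see either the old map or the fully
    // built new one; the maps are never mutated after being published.
    private volatile Map<String, String> userCache = new HashMap<>();
    private volatile Map<String, String> groupCache = new HashMap<>();

    // synchronized serializes concurrent updates, as suggested above.
    public synchronized void updateGlobalCache(Map<String, String> perms) {
        Map<String, String> newUsers = new HashMap<>();
        Map<String, String> newGroups = new HashMap<>();
        for (Map.Entry<String, String> e : perms.entrySet()) {
            if (e.getKey().startsWith("@")) {   // stand-in group convention
                newGroups.put(e.getKey().substring(1), e.getValue());
            } else {
                newUsers.put(e.getKey(), e.getValue());
            }
        }
        // Reference assignment is atomic in Java; at no point does a
        // reader see a cleared-but-not-yet-repopulated cache.
        userCache = newUsers;
        groupCache = newGroups;
    }

    public String userPermission(String user) { return userCache.get(user); }
    public String groupPermission(String grp) { return groupCache.get(grp); }
}
```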



[jira] [Commented] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460711#comment-13460711
 ] 

stack commented on HBASE-6714:
--

I love patches that fix broken tests.

In future, rather than this:

{code}
+  if (i == NB_RETRIES - 1) {
+    fail("Waited too much time while getting the row.");
+  }
{code}

... just throw an exception rather than call fail?

Let me commit.
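For illustration, the helper could surface the timeout as a thrown exception instead of Assert.fail, roughly like this. The names and the predicate stand-in for the cluster lookup are hypothetical, not the actual test code:

```java
import java.util.function.IntPredicate;

public class CheckRowSketch {
    static final int NB_RETRIES = 3;

    // rowPresentOnAttempt stands in for checking the cluster on attempt i.
    static void checkRow(IntPredicate rowPresentOnAttempt) {
        for (int i = 0; i < NB_RETRIES; i++) {
            if (rowPresentOnAttempt.test(i)) {
                return;                       // row found, done
            }
        }
        // Throwing (rather than calling fail()) keeps the helper usable
        // outside JUnit and still fails the test with a clear stack trace.
        throw new IllegalStateException(
                "Waited too much time while getting the row.");
    }
}
```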

 TestMultiSlaveReplication#testMultiSlaveReplication may fail
 

 Key: HBASE-6714
 URL: https://issues.apache.org/jira/browse/HBASE-6714
 Project: HBase
  Issue Type: Bug
  Components: Replication, test
Affects Versions: 0.92.0, 0.94.0
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
Priority: Minor
 Attachments: HBase-6714-v1.patch


 java.lang.AssertionError: expected:1 but was:0
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at org.junit.Assert.assertEquals(Assert.java:456)
 at 
 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.checkRow(TestMultiSlaveReplication.java:203)
 at 
 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:188)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 TestMultiSlaveReplication#testMultiSlaveReplication failed in our local 
 build, citing that the row was not replicated to the second peer. This is 
 because after inserting the row, the log is rolled, and we look for "row2" 
 in both clusters before checking for the existence of the row in both 
 clusters. Meanwhile, the replication thread was sleeping for the second 
 cluster, and "row2" was never present in the second cluster to begin with. 
 So the "row2" existence check succeeds, and control moves on to look for 
 the row in both clusters, where it fails for the second cluster.



[jira] [Updated] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail

2012-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6714:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
   0.94.2
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 0.94 and to trunk.  Thanks for the patch, Himanshu.




[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460717#comment-13460717
 ] 

stack commented on HBASE-6852:
--

[~lhofhansl] I made my comment before I saw Todd's suggestion.



[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460725#comment-13460725
 ] 

Elliott Clark commented on HBASE-6852:
--

I think we should start doing more of what this patch does: collect the values 
locally and then use a single call into the metrics sources to push the 
collected metrics.  In addition, I think we should remove some of the 
lesser-used dynamic metrics, and for others stop using the time-varying rate.

For the most part, I think that will keep the cost of metrics from getting out 
of control.  However, I don't think we should stop using 
AtomicLong/AtomicInteger.  From my understanding, on most architectures the 
JVM will turn getAndIncrement into a single CPU instruction rather than a 
compare-and-swap loop, so there's very little to gain by sacrificing 
correctness.
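The "collect locally, push in one call" idea can be sketched as a counter that batches thread-local increments and only touches the shared AtomicLong every N updates. The class name and threshold below are illustrative, not the patch's actual code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class BatchedCounter {
    private static final int THRESHOLD = 100;
    private final AtomicLong shared = new AtomicLong();
    // Per-thread pending count; flushed to the shared counter only
    // every THRESHOLD increments, keeping the hot path contention-free.
    private final ThreadLocal<long[]> pending =
            ThreadLocal.withInitial(() -> new long[1]);

    public void increment() {
        long[] p = pending.get();
        if (++p[0] >= THRESHOLD) {
            shared.addAndGet(p[0]);
            p[0] = 0;
        }
    }

    // May lag the true total by up to THRESHOLD - 1 per active thread.
    public long approximateValue() {
        return shared.get();
    }
}
```

The trade-off is that the metric can under-report by a bounded amount between flushes, which is usually acceptable for monitoring counters.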



[jira] [Updated] (HBASE-6738) Too aggressive task resubmission from the distributed log manager

2012-09-21 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6738:
---

Status: Patch Available  (was: Open)

 Too aggressive task resubmission from the distributed log manager
 -

 Key: HBASE-6738
 URL: https://issues.apache.org/jira/browse/HBASE-6738
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.1, 0.96.0
 Environment: 3-node cluster test, but it can occur as well on a much 
 bigger one. It's all luck!
Reporter: nkeywal
Priority: Critical
 Attachments: 6738.v1.patch


 With default settings for hbase.splitlog.manager.timeout = 25s and 
 hbase.splitlog.max.resubmit = 3.
 On tests mentioned in HBASE-5843, I have variations around this scenario, 
 0.94 + HDFS 1.0.3:
 The regionserver in charge of the split does not answer in less than 25s, so 
 it gets interrupted but actually continues. Sometimes we exhaust the number 
 of retries, sometimes not; sometimes we are out of retries but, as the 
 interrupts were ignored, we finish nicely. In the meantime, the same single 
 task is executed in parallel by multiple nodes, increasing the probability of 
 race conditions.
 Details:
 t0: unplug a box with DN+RS
 t + x: other boxes are already connected, so their connections start to die. 
 Nevertheless, they don't consider this node as suspect.
 t + 180s: zookeeper -> master detects the node as dead. Recovery starts. It 
 can be less than 180s; sometimes it is around 150s.
 t + 180s: distributed split starts. There is only 1 task; it's immediately 
 acquired by one RS.
 t + 205s: the RS has multiple errors when splitting, because a datanode is 
 missing as well. The master decides to give the task to someone else. But 
 often the task continues in the first RS. Interrupts are often ignored, as 
 it's well stated in the code (// TODO interrupt often gets swallowed, do 
 what else?)
 {code}
2012-09-04 18:27:30,404 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to 
 stop the worker thread
 {code}
 t + 211s: two regionservers are processing the same task. They fight for the 
 leases:
 {code}
 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
 Exception: org.apache.hadoop.ipc.RemoteException:  
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch 
 on

 /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp
  owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by 
 DFSClient_hb_rs_BOX1,60020,1346775719125
 {code}
  They can fight like this for many files, until the tasks finally get 
 interrupted or finished.
  The task on the second box can be cancelled as well. In this case, the 
 task is created again for a new box.
  The master seems to stop after 3 attempts. It can as well renounce 
 splitting the files. Sometimes the tasks were not cancelled on the RS side, so 
 the split is finished despite what the master thinks and logs. In this case, 
 the assignment starts. In the other case, we've got a problem.
 {code}
 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 Skipping resubmissions of task 
 /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832
  because threshold 3 reached 
 {code}
 t + 300s: split is finished. Assignment starts.
 t + 330s: assignment is finished, regions are available again.
 There are a lot of possible subcases depending on the number of log files, 
 of region servers, and so on.
 The issues are:
 1) it's difficult, especially in HBase but not only, to interrupt a task. The 
 pattern is often
 {code}
  void f() throws IOException {
    try {
      // whatever throws InterruptedException
    } catch (InterruptedException e) {
      throw new InterruptedIOException();
    }
  }

  boolean g() {
    int nbRetry = 0;
    for (;;) {
      try {
        f();
        return true;
      } catch (IOException e) {
        nbRetry++;
        if (nbRetry > maxRetry) return false;
      }
    }
  }
 {code}
 This typically swallows the interrupt. There are other variations, but this 
 one seems to be the standard.
 Even if we fix this in HBase, we need the other layers to be interruptible as 
 well. That's not proven.
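 For contrast, a sketch of an interrupt-aware variant of the pattern above (an illustration only, not what the attached patch does): restore the thread's interrupt status when translating the exception, and stop retrying once interrupted:

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class InterruptAwareRetry {
    static final int MAX_RETRY = 3;

    // Simulated unit of work; a real implementation would do I/O here.
    static void f() throws IOException {
        try {
            Thread.sleep(1); // may throw InterruptedException
        } catch (InterruptedException e) {
            // Re-set the flag so callers up the stack can still see the interrupt.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException();
        }
    }

    static boolean g() {
        int nbRetry = 0;
        for (;;) {
            try {
                f();
                return true;
            } catch (IOException e) {
                // Stop retrying if we were interrupted, instead of swallowing it.
                if (e instanceof InterruptedIOException
                        || Thread.currentThread().isInterrupted()) {
                    return false;
                }
                nbRetry++;
                if (nbRetry > MAX_RETRY) return false;
            }
        }
    }
}
```

 This only helps within HBase itself; as the comment notes, the layers below still have to propagate the interrupt.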
 2) 25s is very aggressive, considering that we have a default timeout of 180s 
 for zookeeper. In other words, we give 180s to a regionserver before acting, 
 but when it comes to split, it's 25s only. There may be reasons for this, but 
 it seems dangerous, as during a failure the cluster is less available than 
 during normal operations. We could do stuff around this, for example:
 = 

[jira] [Updated] (HBASE-6738) Too aggressive task resubmission from the distributed log manager

2012-09-21 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6738:
---

Attachment: 6738.v1.patch


[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460726#comment-13460726
 ] 

stack commented on HBASE-6651:
--

Hiroshi, your work is more palatable as a patch rather than a zip file.  This 
might be of use to you: http://hbase.apache.org/book.html#submitting.patches  
Thanks for looking into this stuff.

 Thread safety of HTablePool is doubtful
 ---

 Key: HBASE-6651
 URL: https://issues.apache.org/jira/browse/HBASE-6651
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.1
Reporter: Hiroshi Ikeda
Priority: Minor
 Attachments: sample.zip, sample.zip, sharedmap_for_hbaseclient.zip


 There are some operations in HTablePool that access the PoolMap multiple 
 times without any explicit synchronization. 
 For example, HTablePool.closeTablePool() calls PoolMap.values() and then 
 calls PoolMap.remove(). If other threads add new instances to the pool in the 
 middle of the calls, the newly added instances might be dropped. 
 (HTablePool.closeTablePool() also has another problem: calling it from 
 multiple threads causes HTable to be accessed by multiple threads.)
 Moreover, PoolMap is not thread safe for the same reason.
 For example, PoolMap.put() calls ConcurrentMap.get() and then calls 
 ConcurrentMap.put(). If another thread adds a new instance to the concurrent 
 map in the middle of the calls, the new instance might be dropped.
 The implementations of Pool have the same problems.
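 The get-then-put race described above, and the atomic putIfAbsent alternative, can be sketched like this (simplified stand-in types; the real PoolMap pools HTable instances rather than strings):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class PoolPutSketch {
    private final ConcurrentMap<String, Queue<String>> pools = new ConcurrentHashMap<>();

    // Racy: between get() and put(), another thread may install its own queue,
    // and one of the two queues (with its contents) is silently dropped.
    public void putRacy(String key, String value) {
        Queue<String> pool = pools.get(key);
        if (pool == null) {
            pool = new ConcurrentLinkedQueue<>();
            pools.put(key, pool); // may overwrite a concurrently added queue
        }
        pool.add(value);
    }

    // Atomic: putIfAbsent guarantees every thread adds to the single queue
    // that actually won the race to be installed.
    public void putSafe(String key, String value) {
        Queue<String> pool = pools.get(key);
        if (pool == null) {
            Queue<String> fresh = new ConcurrentLinkedQueue<>();
            Queue<String> existing = pools.putIfAbsent(key, fresh);
            pool = (existing != null) ? existing : fresh;
        }
        pool.add(value);
    }

    public int size(String key) {
        Queue<String> pool = pools.get(key);
        return pool == null ? 0 : pool.size();
    }
}
```

 Both variants behave identically under a single thread; the difference only shows up when two threads create the pool for the same key concurrently.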



[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460730#comment-13460730
 ] 

nkeywal commented on HBASE-6738:


This is a low-profile patch, in which I tried to limit the impact to a 
minimum. I don't know whether something more ambitious should be done (i.e. 
cleaning up this stuff), but...

Reviews welcome. I have not tried it on a real cluster.


[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

2012-09-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460736#comment-13460736
 ] 

Todd Lipcon commented on HBASE-6852:


bq. getAndIncrement into just one cpu instruction

True, but it's a pretty expensive instruction, since it has to steal that cache 
line from whichever other core used it previously, and I believe it acts as a 
full memory barrier as well (e.g. flushing write-combining buffers).


The Cliff Click counter is effective but uses more memory. Aggregating 
stuff locally and pushing to metrics seems ideal, but if we can't do that 
easily, then keeping the metrics per-thread and then occasionally grabbing them 
would work too. Memcached's metrics work like that.
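A rough sketch of the striped/per-thread idea (illustrative only; JDK 8's LongAdder later packaged essentially this technique): each thread increments its own slot, and a reader sums the slots occasionally:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class StripedCounter {
    // One slot per stripe; contention drops because threads hash to
    // different slots (and usually different cache lines) most of the time.
    private final AtomicLongArray cells;
    private final int mask;

    public StripedCounter(int stripes) {
        // Round up to a power of two so indexing is a cheap bit mask.
        int n = Integer.highestOneBit(Math.max(1, stripes - 1)) << 1;
        cells = new AtomicLongArray(n);
        mask = n - 1;
    }

    public void increment() {
        int i = (int) Thread.currentThread().getId() & mask;
        cells.incrementAndGet(i);
    }

    // Called rarely (e.g. on a metrics push); the sum may be slightly stale.
    public long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }
}
```

A metrics push can call sum() once per reporting interval; a slightly stale value is acceptable for monitoring counters, which is the trade-off Todd describes.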




[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460737#comment-13460737
 ] 

stack commented on HBASE-6738:
--

What do you mean here: {quote}This allows to continue if the worker cannot 
actually handle it,
+  //   for any reason.{quote}

This seems like a small change, extending the timeout while also reacting 
faster if the server is actually gone.  I'm +1 on the patch.


[jira] [Commented] (HBASE-6856) Document the LeaseException thrown in scanner next

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460740#comment-13460740
 ] 

Hudson commented on HBASE-6856:
---

Integrated in HBase-TRUNK #3366 (See 
[https://builds.apache.org/job/HBase-TRUNK/3366/])
HBASE-6856 Document the LeaseException thrown in scanner next (Revision 
1388604)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/docbkx/troubleshooting.xml


 Document the LeaseException thrown in scanner next
 --

 Key: HBASE-6856
 URL: https://issues.apache.org/jira/browse/HBASE-6856
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
Reporter: Daniel Iancu
Assignee: Daniel Iancu
  Labels: LeaseException
 Fix For: 0.96.0


 In some situations clients that fetch data from a RS get a LeaseException 
 instead of the usual ScannerTimeoutException/UnknownScannerException.
 This particular case should be documented in the HBase guide.
 Some key points:
 * the source of the exception is 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
 * it happens in the context of a slow/freezing RS#next
 * it can be prevented by having hbase.rpc.timeout > 
 hbase.regionserver.lease.period
 Harsh J investigated the issue and has some conclusions, see
 http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E
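 For reference, the two settings involved are plain hbase-site.xml properties; the values below are purely illustrative, not a recommendation:

```xml
<!-- hbase-site.xml (illustrative values only) -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value> <!-- scanner lease period, in ms -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>60000</value> <!-- client RPC timeout, in ms -->
</property>
```

 The tip above is about the relation between these two values, so they need to be tuned together rather than in isolation.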



[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460742#comment-13460742
 ] 

nkeywal commented on HBASE-6752:


Seems reasonable; there are still some dark areas around the time range. Let's 
do things smoothly :-). But I think your comment is right.

Various points I had in mind:
There is another use case mentioned in HBASE-3745: in some applications, a 
common access pattern is to frequently scan tables with a time range predicate 
restricted to a fairly recent time window. For example, you may want to do an 
incremental aggregation or indexing step only on rows that have changed in the 
last hour. We do this efficiently by tracking min and max timestamps at the 
HFile level, so that old HFiles don't have to be read.
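The HFile-level min/max timestamp tracking quoted above can be sketched as follows (illustrative names, not HBase's actual classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the timestamp-pruning idea from HBASE-3745: each file records the
// min and max timestamp of its cells, so a time-range scan can skip whole files.
public class TimeRangePruning {
    public static final class FileMeta {
        public final String name;
        public final long minTs, maxTs; // tracked per file at write time
        public FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Keep only files whose [minTs, maxTs] overlaps the scan's [from, to) range.
    public static List<FileMeta> prune(List<FileMeta> files, long from, long to) {
        List<FileMeta> keep = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.maxTs >= from && f.minTs < to) {
                keep.add(f);
            }
        }
        return keep;
    }
}
```

A scan restricted to the last hour then opens only the files whose timestamp range intersects that hour, instead of every file in the store.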


bq. We do want the old edits to come in the correct order of sequence ids 
Imho yes, we should not relax any point of the HBase consistency.

bq. So, we somehow need to cheaply find the correct sequence id to use for the 
new puts. It needs to be bigger than the sequence ids for all the edits for 
that region in the log files. So maybe all that's needed here is to 
open/recover the latest log file and scan it to find the last sequence id?
I would like HBase to be resilient to log file issues (no replica, corrupted 
files, overloaded datanodes, bad luck when choosing the datanode to read 
from...) by not opening them at all during this process. Would a rough 
estimate be ok, counting the number of files/blocks to calculate the maximum 
number of ids?

bq. Picking a winner among duplicates in two files relies on using the 
sequence id of the HFile as a tie-break. And therefore, today, compactions 
always pick a dense subrange of files ordered by sequence ids. 

I wonder if we need major compactions? I was thinking that they could be 
skipped. But we need to be able to manage small compactions for sure. I 
imagine that we can have some critical cases where we stay in the intermediate 
state for a few days: (weekend + trying to fix the broken hlog on a test 
cluster + waiting for a non-critical moment to fix the production env)... 




 On region server failure, serve writes and timeranged reads during the log 
 split
 

 Key: HBASE-6752
 URL: https://issues.apache.org/jira/browse/HBASE-6752
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Gregory Chanan
Priority: Minor

 Opening for write on failure would mean:
 - Assign the region to a new regionserver. It marks the region as recovering
   -- a specific exception is returned to the client when we cannot serve
   -- allows clients to know where they stand. The exception can include some 
 time information (failure started on: ...)
   -- allows clients to go immediately to the right regionserver, instead of 
 retrying or calling the region holding meta to get the new address
  => saves network calls, lowers the load on meta.
 - Do the split as today. Priority is given to the region servers holding the 
 new regions
   -- helps share the load balancing code: the split is done by region 
 servers considered available for new regions
   -- helps locality (the recovered edits are available on the region server) 
 => lowers the network usage
 - When the split is finished, we're done as of today
 - While the split is progressing, the region server can
  -- serve writes
--- that's useful for all applications that need to write but not read 
 immediately:
--- whatever logs events to analyze them later
--- OpenTSDB is a perfect example.
  -- serve reads if they have a compatible time range. For heavily used 
 tables, it could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few (or more) minutes of delay. 
 Some what-ifs:
 1) the split fails
 => Retry until it works. As today. Just that we serve writes. We need to 
 know (as today) that the region has not recovered if we fail again.
 2) the regionserver fails during the split
 => As 1, and as of today.
 3) the regionserver fails after the split but before the state changes to 
 fully available.
 => New assign. More logs to split (the ones already done and the new ones).
 4) the assignment fails
 => Retry until it works. As today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460747#comment-13460747
 ] 

Hudson commented on HBASE-6504:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6504 Adding GC details prevents HBase from starting in 
non-distributed mode (Revision 1385027)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.94/bin/rolling-restart.sh
* /hbase/branches/0.94/bin/start-hbase.sh
* /hbase/branches/0.94/bin/stop-hbase.sh


 Adding GC details prevents HBase from starting in non-distributed mode
 --

 Key: HBASE-6504
 URL: https://issues.apache.org/jira/browse/HBASE-6504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Benoit Sigoure
Assignee: Michael Drzal
Priority: Trivial
  Labels: noob
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6504-output.txt, HBASE-6504.patch, 
 HBASE-6504-v2.patch


 The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out 
 examples of variables that could be useful, such as adding 
 {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}.  This has 
 the annoying side effect that the JVM prints a summary of memory usage when 
 it exits, and it does so on stdout:
 {code}
 $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool 
 hbase.cluster.distributed
 false
 Heap
  par new generation   total 19136K, used 4908K [0x00073a20, 
 0x00073b6c, 0x00075186)
   eden space 17024K,  28% used [0x00073a20, 0x00073a6cb0a8, 
 0x00073b2a)
   from space 2112K,   0% used [0x00073b2a, 0x00073b2a, 
 0x00073b4b)
   to   space 2112K,   0% used [0x00073b4b, 0x00073b4b, 
 0x00073b6c)
  concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 
 0x0007556c, 0x0007f5a0)
  concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 
 0x0007f6ec, 0x0008)
 $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool 
 hbase.cluster.distributed >/dev/null
 (nothing printed)
 {code}
 And this confuses {{bin/start-hbase.sh}} when it does
 {{distMode=`$bin/hbase --config $HBASE_CONF_DIR 
 org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, 
 because then the {{distMode}} variable is not just set to {{false}}, it also 
 contains all this JVM spam.
 If you don't pay enough attention and don't realize that 3 processes are 
 getting started (ZK, HM, RS) instead of just one (HM), then you end up with 
 this confusing error message:
 {{Could not start ZK at requested port of 2181.  ZK was started at port: 
 2182.  Aborting as clients (e.g. shell) will not be able to find this ZK 
 quorum.}}, which is even more puzzling because when you run {{netstat}} to 
 see who owns that port, then you won't find any rogue process other than the 
 one you just started.
 I'm wondering if the fix is not to just change the {{if [ $distMode == 
 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work 
 around this annoying JVM misfeature that pollutes stdout.
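
 A minimal sketch of that prefix-match idea (the assignment to {{distMode}} 
 below just simulates the polluted HBaseConfTool output; it is not the actual 
 script contents):

```shell
#!/bin/sh
# Simulate what start-hbase.sh captures when the JVM appends its
# heap summary to HBaseConfTool's stdout.
distMode="false
Heap
 par new generation   total 19136K, used 4908K"

# Strict equality fails: the variable holds more than just "false".
if [ "$distMode" = "false" ]; then
  echo "strict test: local mode"
else
  echo "strict test: inconclusive"
fi

# A prefix match tolerates the trailing JVM spam.
case "$distMode" in
  false*) echo "prefix test: local mode" ;;
  *)      echo "prefix test: distributed mode" ;;
esac
```

 With the polluted value above, the strict test falls through to the else 
 branch, while the case-based test still recognizes local mode.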



[jira] [Commented] (HBASE-6803) script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460748#comment-13460748
 ] 

Hudson commented on HBASE-6803:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6803 script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH 
(Revision 1387260)

 Result = SUCCESS
jxiang : 
Files : 
* /hbase/branches/0.94/bin/hbase


 script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
 

 Key: HBASE-6803
 URL: https://issues.apache.org/jira/browse/HBASE-6803
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.94.2, 0.96.0

 Attachments: trunk-6803.patch


 Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path 
 where snappy SO is.
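
 A rough sketch of what adding JAVA_LIBRARY_PATH to LD_LIBRARY_PATH in 
 {{bin/hbase}} looks like (the path and exact variable handling here are 
 illustrative, not the actual script contents):

```shell
#!/bin/sh
# Illustrative location of native libs such as libsnappy.so; the real
# script derives JAVA_LIBRARY_PATH from the HBase/Hadoop layout.
JAVA_LIBRARY_PATH="/usr/lib/hadoop/lib/native"

# Prepend it to LD_LIBRARY_PATH so the dynamic loader can resolve the
# snappy shared object when the JVM loads it via JNI.
if [ -n "$JAVA_LIBRARY_PATH" ]; then
  if [ -n "$LD_LIBRARY_PATH" ]; then
    LD_LIBRARY_PATH="$JAVA_LIBRARY_PATH:$LD_LIBRARY_PATH"
  else
    LD_LIBRARY_PATH="$JAVA_LIBRARY_PATH"
  fi
  export LD_LIBRARY_PATH
fi
echo "$LD_LIBRARY_PATH"
```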



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460750#comment-13460750
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388160)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Blocker
 Fix For: 0.92.3, 0.94.2, 0.96.0

 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
 6649-fix-io-exception-handling-1.patch, 
 6649-fix-io-exception-handling-1-trunk.patch, 
 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 
 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 
 #502 test - queueFailover [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB and 
 http://bit.ly/O79Dq7.
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.



[jira] [Commented] (HBASE-6792) Remove interface audience annotations in 0.94/0.92 introduced by HBASE-6516

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460752#comment-13460752
 ] 

Hudson commented on HBASE-6792:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6792 Remove interface audience annotations in 0.94/0.92 introduced by 
HBASE-6516 (Revision 1384947)

 Result = SUCCESS
jmhsieh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableInfoMissingException.java


 Remove interface audience annotations in 0.94/0.92 introduced by HBASE-6516
 ---

 Key: HBASE-6792
 URL: https://issues.apache.org/jira/browse/HBASE-6792
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.3, 0.94.2

 Attachments: hbase-6792.patch


 bq. An InterfaceAudience slipped into 0.94 here. It breaks 0.94 for older 
 versions of hadoop.



[jira] [Commented] (HBASE-6842) the jar used in coprocessor is not deleted in local which will exhaust the space of /tmp

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460751#comment-13460751
 ] 

Hudson commented on HBASE-6842:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6842 the jar used in coprocessor is not deleted in local which will 
exhaust the space of /tmp (Revision 1387862)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java


 the jar used in  coprocessor is not deleted in local which will exhaust  the 
 space of /tmp 
 ---

 Key: HBASE-6842
 URL: https://issues.apache.org/jira/browse/HBASE-6842
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.94.1
Reporter: Zhou wenjian
Assignee: Zhou wenjian
Priority: Critical
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6842-trunk.patch


 FileSystem fs = path.getFileSystem(HBaseConfiguration.create());
   Path dst = new Path(System.getProperty("java.io.tmpdir") +
   java.io.File.separator + "." + pathPrefix +
   "." + className + "." + System.currentTimeMillis() + ".jar");
 fs.copyToLocalFile(path, dst);
 fs.deleteOnExit(dst);
 change to 
 File tmpLocal = new File(dst.toString());
 tmpLocal.deleteOnExit();
 



[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460753#comment-13460753
 ] 

Hudson commented on HBASE-6438:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6438 Addendum checks regionAlreadyInTransitionException when 
generating region plan (Chunhui) (Revision 1387209)
HBASE-6438 RegionAlreadyInTransitionException needs to give more info to avoid 
assignment inconsistencies (Rajesh) (Revision 1385209)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 RegionAlreadyInTransitionException needs to give more info to avoid 
 assignment inconsistencies
 --

 Key: HBASE-6438
 URL: https://issues.apache.org/jira/browse/HBASE-6438
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.2, 0.96.0

 Attachments: 6438-0.92.txt, 6438.addendum, 6438-addendum.94, 
 6438-trunk_2.patch, HBASE-6438_2.patch, HBASE-6438_94_3.patch, 
 HBASE-6438_94_4.patch, HBASE-6438_94.patch, HBASE-6438-trunk_2.patch, 
 HBASE-6438_trunk.patch


 Seeing some of the recent issues in region assignment, 
 RegionAlreadyInTransitionException is one reason after which the region 
 assignment may or may not happen(in the sense we need to wait for the TM to 
 assign).
 In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on 
 master restart.
 Consider the following case, due to some reason like master restart or 
 external assign call, we try to assign a region that is already getting 
 opened in a RS.
 Now the next call to assign has already changed the state of the znode, and so 
 the current assign going on in the RS is affected and it fails.  The 
 second assignment that started also fails, getting a RAITE exception.  Finally, 
 neither assignment carries on.  The idea is to find whether any such RAITE 
 exception can be retried or not.
 Here again we have following cases like where
 - The znode is yet to transition from OFFLINE to OPENING in the RS
 - RS may be in the step of openRegion.
 - RS may be trying to transition OPENING to OPENED.
 - RS is yet to add to online regions in the RS side.
 Here, on any failure in openRegion() and updateMeta(), we move the znode to 
 FAILED_OPEN.  So in these cases getting a RAITE should be ok.  But in other 
 cases the assignment is stopped.
 The idea is to just add the current state of the region assignment in the RIT 
 map in the RS side and using that info we can determine whether the 
 assignment can be retried or not on getting an RAITE.
 Considering the current work going on in the AM, please do share whether this 
 is needed at least in the 0.92/0.94 versions.



[jira] [Commented] (HBASE-6844) upgrade 0.23 version dependency in 0.94

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460754#comment-13460754
 ] 

Hudson commented on HBASE-6844:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6844 upgrade 0.23 version dependency in 0.94 (Revision 1387856)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.94/pom.xml


 upgrade 0.23 version dependency in 0.94
 ---

 Key: HBASE-6844
 URL: https://issues.apache.org/jira/browse/HBASE-6844
 Project: HBase
  Issue Type: Bug
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.92.3, 0.94.2

 Attachments: 6844-092.txt, 6844.txt


 hadoop 0.23 has been promoted to stable. The snapshot jar no longer exists in 
 maven.
 https://repository.apache.org/content/repositories/releases/org/apache/hadoop/hadoop-common/0.23.3/


