[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch

 ZK restarted while assigning a region, new active HM re-assign it but the RS 
 warned 'already online on this server'.
 

 Key: HBASE-4124
 URL: https://issues.apache.org/jira/browse/HBASE-4124
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: fulin wang
 Attachments: HBASE-4124_Branch90V1_trial.patch, 
 HBASE-4124_Branch90V2.patch, log.txt

   Original Estimate: 0.4h
  Remaining Estimate: 0.4h

 ZK restarted while assigning a region, new active HM re-assign it but the RS 
 warned 'already online on this server'.
 Issue:
 The RS failed because of 'already online on this server' and returned; the HM 
 cannot receive the message and reports 'Regions in transition timed out'.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088146#comment-13088146
 ] 

gaojinchao commented on HBASE-4124:
---

I have finished the test. Here is the scenario:
step 1: start up the cluster.
step 2: abort the master right after it finishes calling sendRegionOpen(destination, regions).
step 3: start up the cluster again.

The above steps reproduce the issue. 
When the master fails over, the meta table still records the dead server, but the region is 
being processed by a live region server.






[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088147#comment-13088147
 ] 

gaojinchao commented on HBASE-4124:
---

Sorry, step 3 should be: start up the master again.





[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-08-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088152#comment-13088152
 ] 

stack commented on HBASE-4209:
--

Is it because there is no shutdown hook in the master and, in standalone mode, 
everything runs in the one JVM, effectively the master's?

In start-hbase.sh, if distmode is false, we ONLY start master:

{code}
if [ $distMode == 'false' ]
then
  $bin/hbase-daemon.sh start master
else
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} start zookeeper
  $bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_REGIONSERVERS} start regionserver
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_BACKUP_MASTERS} start master-backup
fi
{code}

Inside the master, it will take care of starting up all the other beasties if 
distmode == false.

 The HBase hbase-daemon.sh SIGKILLs master when stopping it
 --

 Key: HBASE-4209
 URL: https://issues.apache.org/jira/browse/HBASE-4209
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Roman Shaposhnik

 There's a bit of code in hbase-daemon.sh that causes the HBase master to be 
 SIGKILLed when stopping it rather than trying SIGTERM (like it does for other 
 daemons). When HBase is run in standalone mode (where the only daemon 
 you need to run is the master), that causes newly created tables to go missing because 
 unflushed data is thrown out. If there was no good reason to kill the master 
 with SIGKILL, perhaps we can take that special case out and rely on SIGTERM.
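
 To illustrate why the signal matters: a JVM runs its registered shutdown hooks 
 when it receives SIGTERM, but SIGKILL cannot be caught, so no hook (and no 
 flush) gets a chance to run. The following is a minimal sketch only; the class 
 GracefulStopSketch, the method installFlushHook and the thread name are 
 hypothetical and are not taken from the HBase master.

```java
/** Hypothetical sketch, not HBase code: registers cleanup work as a JVM shutdown hook. */
class GracefulStopSketch {

  /**
   * Registers the given flush action as a shutdown hook and returns the hook thread.
   * The hook runs on normal exit and on SIGTERM; it never runs on SIGKILL.
   */
  static Thread installFlushHook(Runnable flush) {
    Thread hook = new Thread(flush, "flush-on-shutdown");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  public static void main(String[] args) {
    // Assumption: the real flush would persist unflushed data; here it only logs.
    installFlushHook(() -> System.out.println("flushing before exit"));
    System.out.println("running; stop with SIGTERM to see the hook fire");
  }
}
```

 Sending SIGTERM to a process running this main method should print the flush 
 message, while SIGKILL ends the process without it.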





[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch





[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: (was: HBASE-4124_Branch90V2.patch)





[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch





[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: (was: HBASE-4124_Branch90V2.patch)





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088173#comment-13088173
 ] 

gaojinchao commented on HBASE-4124:
---

I have added a test case for opening a region.





[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache

2011-08-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088211#comment-13088211
 ] 

jirapos...@reviews.apache.org commented on HBASE-4027:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/#review1585
---


In SingleSizeCache.cacheBlock():
    CacheablePair newEntry = new CacheablePair(
        toBeCached.serialize(storedBlock), storedBlock);
The above operation splits toBeCached into two parts: the first, returned by 
serialize(), is slim and lives on-heap; storedBlock lives off-heap.


src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java
https://reviews.apache.org/r/1214/#comment3574

I think the word 'itself' in the javadoc above introduced confusion. It 
should be removed.



src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
https://reviews.apache.org/r/1214/#comment3573

As Pi explained in the Cacheable interface, serialize() offloads the majority of 
the data to an off-heap ByteBuffer. What gets returned is the skeleton that lives 
on-heap.
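
The on-heap/off-heap split described above can be sketched roughly as follows. 
This is an illustration under assumptions, not the code under review: the class 
OffHeapBlockSketch, the Skeleton type and both method signatures are 
hypothetical, and the real Cacheable.serialize() may look different.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: bulk bytes go to a direct buffer, a slim record stays on-heap. */
class OffHeapBlockSketch {

  /** Small on-heap record that remembers only how many bytes were offloaded. */
  static final class Skeleton {
    final int length;

    Skeleton(int length) {
      this.length = length;
    }
  }

  /** Copies the payload into the destination buffer and returns the on-heap skeleton. */
  static Skeleton serialize(byte[] payload, ByteBuffer destination) {
    destination.clear();
    destination.put(payload);
    destination.flip();
    return new Skeleton(payload.length);
  }

  /** Rebuilds the payload from the buffer without disturbing the buffer's position. */
  static byte[] deserialize(Skeleton skeleton, ByteBuffer source) {
    byte[] out = new byte[skeleton.length];
    source.duplicate().get(out);
    return out;
  }
}
```

Passing a buffer obtained from ByteBuffer.allocateDirect() keeps the payload 
outside the Java heap, so only the Skeleton object counts against heap usage.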


- Ted


On 2011-08-19 20:21:35, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1214/
bq.  ---
bq.  
bq.  (Updated 2011-08-19 20:21:35)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan 
Gray, and Li Pi.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Review request - I apparently can't edit tlipcon's earlier posting of my 
diff, so creating a new one.
bq.  
bq.  
bq.  This addresses bug HBase-4027.
bq.  https://issues.apache.org/jira/browse/HBase-4027
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    conf/hbase-env.sh 2d55d27 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java 2d4002c 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/CachedBlock.java 3b130d8 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java 097dc50 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java 1338453 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java 886c31d 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/Slab.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e2c6c93 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 7b7bf73 
bq.    src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestCachedBlockQueue.java 1ad2ece 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java f0a9832 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStoreLAB.java d7e43a0 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 4387170 
bq.  
bq.  Diff: https://reviews.apache.org/r/1214/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran benchmarks against it in HBase standalone mode. Wrote test cases for 
all classes, multithreaded test cases exist for the cache.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



 Enable direct byte buffers LruBlockCache
 

 Key: HBASE-4027
 URL: https://issues.apache.org/jira/browse/HBASE-4027
 Project: HBase
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Li Pi
Priority: Minor
 Attachments: 4027-v5.diff, 4027v7.diff, HBase-4027 (1).pdf, 
 HBase-4027.pdf, HBase4027v8.diff, HBase4027v9.diff, hbase-4027-v10.5.diff, 
 hbase-4027-v10.diff, hbase-4027v10.6.diff, hbase-4027v6.diff, 
 hbase4027v11.5.diff, hbase4027v11.6.diff, hbase4027v11.7.diff, 
 hbase4027v11.diff, hbase4027v12.1.diff, 

[jira] [Commented] (HBASE-4167) Potential leak of HTable instances when using HTablePool with PoolType.ThreadLocal

2011-08-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088214#comment-13088214
 ] 

Ted Yu commented on HBASE-4167:
---

+1 on patch.

 Potential leak of HTable instances when using HTablePool with 
 PoolType.ThreadLocal
 --

 Key: HBASE-4167
 URL: https://issues.apache.org/jira/browse/HBASE-4167
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4167.patch


 (Initially discussed in HBASE-4150)
 In HTablePool, when obtaining a table:
 {code}
   private HTableInterface findOrCreateTable(String tableName) {
     HTableInterface table = tables.get(tableName);
     if (table == null) {
       table = createHTable(tableName);
     }
     return table;
   }
 {code}
 In the case of {{ThreadLocalPool}}, it seems like there's an exposure here 
 between when the table is created initially and when 
 {{ThreadLocalPool.put()}} is called to set the thread local variable (on 
 {{PooledHTable.close()}}).
 Potential solution described by Karthick Sankarachary:
 For one thing, we might want to clear the tables variable when the 
 {{HTablePool}} is closed (as shown below). For another, we should override the 
 ThreadLocalPool#get method so that it removes the resource; otherwise it 
 might end up referencing an HTableInterface that has been released.
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java b/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 index 952a3aa..c198f15 100755
 --- a/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 +++ b/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 @@ -309,6 +310,7 @@ public class HTablePool implements Closeable {
      for (String tableName : tables.keySet()) {
        closeTablePool(tableName);
      }
 +    this.tables.clear();
    }
 {code}
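
 The second suggestion, a get() that removes the resource it hands out, might 
 look roughly like the sketch below. It is an illustration only: the class 
 TakeOnGetThreadLocalPool and its methods are hypothetical stand-ins and are not 
 HBase's ThreadLocalPool or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-thread pool whose get() gives up ownership. */
class TakeOnGetThreadLocalPool<R> {
  private final ThreadLocal<Map<String, R>> perThread =
      ThreadLocal.withInitial(HashMap::new);

  /** Hands the resource to the caller and forgets it, so the pool keeps no stale reference. */
  R get(String key) {
    return perThread.get().remove(key);
  }

  /** Returns a resource to the current thread's slot; yields the displaced one, if any. */
  R put(String key, R resource) {
    return perThread.get().put(key, resource);
  }

  /** Drops everything held for the current thread, for example when the pool is closed. */
  void clear() {
    perThread.get().clear();
  }
}
```

 With this shape, a resource obtained from get() is referenced only by the 
 caller until it is put() back, so the pool cannot return a handle that has 
 already been released.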





[jira] [Commented] (HBASE-4222) Make HLog more resilient to write pipeline failures

2011-08-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088219#comment-13088219
 ] 

jirapos...@reviews.apache.org commented on HBASE-4222:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1590/#review1586
---

Ship it!


TestHLog and TestLogRolling passed.

- Ted


On 2011-08-20 05:39:30, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1590/
bq.  ---
bq.  
bq.  (Updated 2011-08-20 05:39:30)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch corrects a few problems, as I see it, with the current log 
rolling process:
bq.  
bq.  1) HLog.LogSyncer.run() now handles an IOException in the inner while 
loop.  Previously any IOException would cause the LogSyncer thread to exit, 
even if the subsequent log roll succeeded.  This would mean the region server 
kept running without a LogSyncer thread
bq.  2) Log rolls triggered by IOExceptions were being skipped in the event 
that there were no entries in the log.  This would prevent the log from being 
recovered in a timely manner.
bq.  3) minor - FailedLogCloseException was never actually being thrown out of 
HLog.cleanupCurrentWriter(), resulting in inaccurate logging on RS abort
bq.  
bq.  The bigger change is the addition of a configuration property -- 
hbase.regionserver.logroll.errors.tolerated -- that is checked against a 
counter of consecutive close errors to see whether or not an abort should be 
triggered.
bq.  
bq.  Prior to this patch, we could readily trigger region server aborts by 
rolling all the data nodes in a cluster while region servers were running.  
This was equally true whether write activity was happening or not.  (In fact I 
think having concurrent write activity actually gave a better chance for the 
log to be rolled prior to all DNs in the write pipeline going down and thus the 
region server not aborting).
bq.  
bq.  With this change and hbase.regionserver.logroll.errors.tolerated=2, I can 
roll DNs at will without causing any loss of service.
bq.  
bq.  I'd appreciate some scrutiny on any log rolling subtleties or interactions 
I may be missing here.  If there are alternate/better ways to handle this in 
the DFSClient layer, I'd also appreciate any pointers to that.
bq.  
bq.  
bq.  This addresses bug HBASE-4222.
bq.  https://issues.apache.org/jira/browse/HBASE-4222
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java 8e87c83 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c301d1b 
bq.    src/main/resources/hbase-default.xml 66548ca 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 5063896 
bq.  
bq.  Diff: https://reviews.apache.org/r/1590/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added a new test for rolling data nodes under a running cluster: 
TestLogRolling.testLogRollOnPipelineRestart().
bq.  
bq.  Tested patch on a running cluster with 3 slaves, rolling data nodes with 
and without concurrent write activity.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.
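
The "errors tolerated" idea from the review summary above, a counter of 
consecutive close errors checked against a configured limit, can be sketched as 
follows. This is an assumption-laden illustration, not code from the patch: the 
class CloseErrorTolerance and its methods are hypothetical, and whether the real 
check aborts on exceeding the limit or on reaching it is not stated in the 
summary (this sketch aborts only once the limit is exceeded).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: tracks consecutive close errors against a tolerated limit. */
class CloseErrorTolerance {
  private final int tolerated;
  private final AtomicInteger consecutiveErrors = new AtomicInteger();

  /** @param tolerated value that would come from hbase.regionserver.logroll.errors.tolerated */
  CloseErrorTolerance(int tolerated) {
    this.tolerated = tolerated;
  }

  /** Records a failed close; returns true when the caller should abort. */
  boolean recordFailure() {
    return consecutiveErrors.incrementAndGet() > tolerated;
  }

  /** Records a successful log roll, which resets the streak. */
  void recordSuccess() {
    consecutiveErrors.set(0);
  }
}
```

With a limit of 2, two failures in a row are tolerated and a third triggers an 
abort, unless a successful roll in between resets the streak.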



 Make HLog more resilient to write pipeline failures
 ---

 Key: HBASE-4222
 URL: https://issues.apache.org/jira/browse/HBASE-4222
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0


 The current implementation of HLog rolling to recover from transient errors 
 in the write pipeline seems to have two problems:
 # When {{HLog.LogSyncer}} triggers an {{IOException}} during time-based sync 
 operations, it triggers a log rolling request in the corresponding catch 
 block, but only after escaping from the internal while loop.  As a result, 
 the {{LogSyncer}} thread will exit and never be restarted from what I can 
 tell, even if the log rolling was successful.
 # Log rolling requests triggered by an {{IOException}} in {{sync()}} or 
 {{append()}} never happen if no entries have yet been written to the log.  
 This means that write errors are not immediately recovered, which extends the 
 exposure to more errors occurring in the pipeline.
 In addition, it seems like we should be able to better handle transient 
 problems, like a rolling restart of DataNodes while the HBase RegionServers 
 are running.  Currently this will reliably cause RegionServer aborts during 
 log rolling: either 

[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)

2011-08-20 Thread Subbu M Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088224#comment-13088224
 ] 

Subbu M Iyer commented on HBASE-4213:
-

Yes, I will. I am on it.




 Support instant schema updates with out master's intervention (i.e with out 
 enable/disable and bulk assign/unassign)
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Created] (HBASE-4235) Attempts to reconnect to expired ZooKeeper sessions

2011-08-20 Thread Andrew Purtell (JIRA)
Attempts to reconnect to expired ZooKeeper sessions
---

 Key: HBASE-4235
 URL: https://issues.apache.org/jira/browse/HBASE-4235
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0, 0.90.5
Reporter: Andrew Purtell
Assignee: Andrew Purtell


In a couple of instances of short network outages, we have observed afterward 
zombie HBase processes attempting over and over to reconnect to expired 
ZooKeeper sessions. We believe this is due to ZOOKEEPER-1159. Opening this 
issue as reference to that.





[jira] [Commented] (HBASE-4229) Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter

2011-08-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088252#comment-13088252
 ] 

Andrew Purtell commented on HBASE-4229:
---

+1

 Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter
 

 Key: HBASE-4229
 URL: https://issues.apache.org/jira/browse/HBASE-4229
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-4229.patch


 HBase makes use of both jackson (in the region server) and jettison (in 
 HLogPrettyPrinter) for JSON encoding. Jackson seems to be better maintained, 
 so this patch standardizes by using jackson in HLogPrettyPrinter instead of 
 jettison.





[jira] [Updated] (HBASE-4230) Compaction threads need names

2011-08-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

Status: Patch Available  (was: Open)

 Compaction threads need names
 -

 Key: HBASE-4230
 URL: https://issues.apache.org/jira/browse/HBASE-4230
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
 Fix For: 0.92.0

 Attachments: HBASE-4230.patch


 The CompactSplitThread creates executors for doing compaction work, but 
 threads end up named things like pool-2-thread-1 which isn't very useful.
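
 One common way to give executor threads useful names is to pass a custom 
 ThreadFactory when the executor is created. The sketch below is illustrative 
 only: the class NamedThreadFactory and the "compaction" prefix are 
 hypothetical, and the attached patch may take a different approach.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory that names threads "<prefix>-<n>" instead of "pool-N-thread-M". */
class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger sequence = new AtomicInteger(1);

  NamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    // Each thread gets the shared prefix plus a running number.
    return new Thread(task, prefix + "-" + sequence.getAndIncrement());
  }
}
```

 An executor built with Executors.newFixedThreadPool(n, new 
 NamedThreadFactory("compaction")) would then show threads such as 
 compaction-1 in thread dumps.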





[jira] [Updated] (HBASE-4230) Compaction threads need names

2011-08-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

Attachment: HBASE-4230.patch

Perhaps like the attached?





[jira] [Commented] (HBASE-4230) Compaction threads need names

2011-08-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088263#comment-13088263
 ] 

stack commented on HBASE-4230:
--

+1





[jira] [Updated] (HBASE-4229) Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter

2011-08-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4229:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Thanks for the patch Riley.  Applied to TRUNK.





[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-08-20 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088266#comment-13088266
 ] 

Matt Corgan commented on HBASE-4218:


I lean towards byte-encoding ints whenever they're used often enough to have an 
impact on memory.  KeyValue could probably do better with some VInts.  You can 
encode 128 values in 1 byte and decode it with just one branch to check if b[0] 
< 0.  Given the number of other byte comparisons going on during reading the key, 
that doesn't seem too heavyweight (especially since many of those other byte 
comparisons are casting the byte to a positive integer before comparing).  If 
you reserved 2-4 bytes for that same number, then you may be doing even more 
work.

One problem with VInt decoders is that sometimes they do bounds checking which 
can slow things down a lot.  I think validation should be done at write time, 
and then possibly using a block-level checksum when a block is copied back into 
memory.  Then assume everything is correct.

For prefix compression, we're talking about encoding things at the block level 
where most of the ints are internal pointers that are less than the block size 
of 64k, so most ints can fit in 2 bytes.  But it's important that they be able 
to grow gracefully when block sizes grow beyond 64k or are configured to be 
bigger.  I've been using two types of encoded integers: VInt and FInt.  FInts 
are basically an optimization over VInts for cases where you have many ints 
with the same characteristics, and can therefore store their width at the block 
level rather than encoding it in every occurrence.

VInt (variable width int)
* width is not known ahead of time, so must interpret byte-by-byte
* slower because of branch on each byte, but still pretty fast
* only 2^7 values/byte, so 2 bytes can hold 16k values
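A minimal sketch of the VInt idea in Java, for illustration only.  The byte 
layout (7 payload bits per byte, low-order group first, high bit set on every 
byte except the last) and the class name VIntSketch are assumptions; this is 
not the encoder from the patch, nor Hadoop's WritableUtils format.

```java
import java.util.Arrays;

final class VIntSketch {
    // Encode a non-negative int using 7 payload bits per byte.
    // Every byte except the last has its high bit set as a continuation flag.
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        byte[] buf = new byte[5];            // 5 bytes cover all 31 payload bits
        int n = 0;
        while (value > 0x7F) {
            buf[n++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[n++] = (byte) value;
        return Arrays.copyOf(buf, n);
    }

    // Decode starting at offset.  No bounds checking or validation is done here;
    // the input is assumed to have been validated when it was written.
    static int decode(byte[] b, int offset) {
        int result = 0;
        int shift = 0;
        byte cur;
        // A Java byte is signed, so "high bit set" is the same test as "< 0".
        while ((cur = b[offset++]) < 0) {
            result |= (cur & 0x7F) << shift;
            shift += 7;
        }
        return result | (cur << shift);
    }
}
```

With this layout the values 0-127 take 1 byte and 128-16383 take 2 bytes, 
which matches the 2^7 values/byte figure above.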

FInt (fixed width int)
* width is known ahead of time and stored externally (at block level in 
PtBlockMeta in this project)
* an FInt is faster to encode/decode because of the lack of if-statements
* each byte can store 2^8 values, so 2 bytes gets you 64k values (hbase block 
size)
* a list of these numbers provides random access, which is important for 
binary searching
* if encoding the numbers 0-10,000, for example, then VInts will save you 1 
byte on the numbers 0-127 (one VInt byte holds only 2^7 values), but that is a 
small % savings.  so use FInts for lists of numbers
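A hedged sketch of the FInt idea in Java.  The names FIntSketch, widthFor, 
write and read are hypothetical, and big-endian byte order is an assumption; 
the real code keeps the width in PtBlockMeta, which is not modeled here.

```java
final class FIntSketch {
    // Smallest number of bytes that can hold every value up to maxValue.
    // In the scheme described above this width would be stored once per block.
    static int widthFor(long maxValue) {
        int width = 1;
        while (width < 8 && (maxValue >>> (8 * width)) != 0) {
            width++;
        }
        return width;
    }

    // Write value as the index-th fixed-width entry, most significant byte first.
    static void write(byte[] buf, int index, int width, long value) {
        int pos = index * width;
        for (int i = width - 1; i >= 0; i--) {
            buf[pos + i] = (byte) value;
            value >>>= 8;
        }
    }

    // Read the index-th entry.  The position is computed directly from the
    // index, which is what gives random access for binary search.
    static long read(byte[] buf, int index, int width) {
        int pos = index * width;
        long v = 0;
        for (int i = 0; i < width; i++) {
            v = (v << 8) | (buf[pos + i] & 0xFF);
        }
        return v;
    }
}
```

The loops depend only on the width, not on the data bytes, so there is no 
per-byte continuation check as in a VInt decoder.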

- 

Sidenote: I've been meaning to make a CVInt (comparable variable width int) 
that:
* sorts based on raw bytes even if different widths (good for suffixing hbase 
row/colQualifier values)
* to interpret, count the number of leading 1 bits, and that is how many 
additional bytes there are beyond the first byte
* bits beyond the first 0 bit comprise the value
* should also be faster to decode because of fewer branches
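One possible reading of the CVInt idea, sketched in Java.  Nothing here comes 
from an existing implementation: the class name CVIntSketch, the 4-byte limit 
(values below 2^28) and the rule that the encoder always picks the shortest 
width are all assumptions made for the example.

```java
final class CVIntSketch {
    // The number of leading 1 bits in the first byte is the number of extra
    // bytes.  The bits after the first 0 bit, plus the extra bytes, hold the
    // value in big-endian order.
    static byte[] encode(int value) {
        if (value < 0 || value >= (1 << 28)) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        int extra = 0;
        while (extra < 3 && value >= (1 << (7 * (extra + 1)))) {
            extra++;
        }
        byte[] out = new byte[extra + 1];
        int v = value;
        for (int i = extra; i > 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        int prefix = (0xFF << (8 - extra)) & 0xFF;   // 'extra' one bits, then zeros
        out[0] = (byte) (prefix | v);
        return out;
    }

    // Decode with a single width lookup on the first byte.  No validation is
    // done; the input is assumed to come from encode().
    static int decode(byte[] b, int offset) {
        int first = b[offset] & 0xFF;
        int extra = Integer.numberOfLeadingZeros(~first & 0xFF) - 24;
        int value = first & (0x7F >>> extra);
        for (int i = 1; i <= extra; i++) {
            value = (value << 8) | (b[offset + i] & 0xFF);
        }
        return value;
    }

    // Unsigned lexicographic comparison of raw bytes.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

Because a wider encoding always starts with a larger first byte and the 
encoder uses the shortest width, comparing the raw bytes gives the same order 
as comparing the numbers.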


 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jacek Migdal
  Labels: compression

 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
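 The prefix-compression part of the description can be illustrated with a 
 small Java sketch.  It is not the proposed HBase format: the class 
 PrefixDeltaSketch and its Entry type are hypothetical, and timestamp diffs, 
 bitfields and integer encoding are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixDeltaSketch {
    // One encoded key: how many leading bytes it shares with the previous key,
    // plus the bytes that differ.
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Keys must already be sorted, as they are inside an HFile.
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    // Rebuild each key from the previous key's prefix and the stored suffix.
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

 The stored shared length is also what a comparator could use to skip the 
 first N bytes when they are already known to be equal.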

--

[jira] [Updated] (HBASE-4230) Compaction threads need names

2011-08-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

  Resolution: Fixed
Assignee: Andrew Purtell
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk.

 Compaction threads need names
 -

 Key: HBASE-4230
 URL: https://issues.apache.org/jira/browse/HBASE-4230
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Andrew Purtell
 Fix For: 0.92.0

 Attachments: HBASE-4230.patch


 The CompactSplitThread creates executors for doing compaction work, but 
 threads end up named things like pool-2-thread-1 which isn't very useful.
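 For illustration, a minimal Java sketch of giving executor threads readable 
 names.  The class NamedThreadFactorySketch and the "compaction" prefix are 
 hypothetical; this is not the committed HBASE-4230 patch.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    // Produces names such as "compaction-1" instead of "pool-2-thread-1".
    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
        t.setDaemon(true);
        return t;
    }
}
```

 Such a factory can be passed to Executors.newFixedThreadPool(n, factory) or 
 to a ThreadPoolExecutor constructor.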

--




[jira] [Created] (HBASE-4236) Don't lock the stream while serializing the response

2011-08-20 Thread Benoit Sigoure (JIRA)
Don't lock the stream while serializing the response


 Key: HBASE-4236
 URL: https://issues.apache.org/jira/browse/HBASE-4236
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor


It is not necessary to hold the lock on the stream while the response is being 
serialized.  This unnecessarily prevents serializing responses in parallel.
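A hedged Java sketch of the idea, not the actual HBase IPC code: the response 
is serialized into a private buffer first, and the shared stream is locked 
only for the final write.  The class ResponseWriterSketch and its wire format 
(an int call id followed by a UTF string) are assumptions for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class ResponseWriterSketch {
    private final OutputStream out;   // shared by all handler threads

    ResponseWriterSketch(OutputStream out) {
        this.out = out;
    }

    void send(int callId, String payload) throws IOException {
        // Serialize without holding any lock, so several threads can do
        // this step in parallel.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        data.writeInt(callId);
        data.writeUTF(payload);
        data.flush();
        byte[] bytes = buf.toByteArray();

        // Hold the lock only while the finished bytes go onto the stream,
        // so responses are never interleaved.
        synchronized (out) {
            out.write(bytes);
            out.flush();
        }
    }
}
```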

--




[jira] [Created] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs

2011-08-20 Thread Benoit Sigoure (JIRA)
Directly remove the call being handled from the map of outstanding RPCs
---

 Key: HBASE-4237
 URL: https://issues.apache.org/jira/browse/HBASE-4237
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor


The client has to maintain a map of RPC ID to `Call' object for this RPC, for 
every outstanding RPC.  When receiving a response, the client was getting the 
`Call' out of the map (one O(log n) operation) and then removing it from the 
map (another O(log n) operation).  There is no benefit in not removing it 
directly from the map.
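A small Java sketch of the change described above.  The class 
OutstandingCallsSketch is hypothetical and uses a TreeMap with String values 
in place of the client's real map and `Call' type; it relies only on the fact 
that Map.remove returns the value that was mapped to the key.

```java
import java.util.Map;
import java.util.TreeMap;

final class OutstandingCallsSketch {
    // RPC ID -> call; a String stands in for the real Call object.
    private final Map<Integer, String> calls = new TreeMap<>();

    synchronized void register(int id, String call) {
        calls.put(id, call);
    }

    // Before: two lookups, get(id) followed by remove(id).
    // After: one lookup, because remove() hands back the removed value.
    synchronized String complete(int id) {
        return calls.remove(id);
    }

    synchronized int outstanding() {
        return calls.size();
    }
}
```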


--




[jira] [Commented] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs

2011-08-20 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088299#comment-13088299
 ] 

Benoit Sigoure commented on HBASE-4237:
---

Patch @ 
https://github.com/tsuna/hbase/commit/1f602391ee4cd3d11eaf3067208caeadf214b3a8

 Directly remove the call being handled from the map of outstanding RPCs
 ---

 Key: HBASE-4237
 URL: https://issues.apache.org/jira/browse/HBASE-4237
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor

 The client has to maintain a map of RPC ID to `Call' object for this RPC, for 
 every outstanding RPC.  When receiving a response, the client was getting the 
 `Call' out of the map (one O(log n) operation) and then removing it from the 
 map (another O(log n) operation).  There is no benefit in not removing it 
 directly from the map.

--




[jira] [Commented] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs

2011-08-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088304#comment-13088304
 ] 

Ted Yu commented on HBASE-4237:
---

+1 on patch.

 Directly remove the call being handled from the map of outstanding RPCs
 ---

 Key: HBASE-4237
 URL: https://issues.apache.org/jira/browse/HBASE-4237
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor

 The client has to maintain a map of RPC ID to `Call' object for this RPC, for 
 every outstanding RPC.  When receiving a response, the client was getting the 
 `Call' out of the map (one O(log n) operation) and then removing it from the 
 map (another O(log n) operation).  There is no benefit in not removing it 
 directly from the map.

--




[jira] [Commented] (HBASE-4236) Don't lock the stream while serializing the response

2011-08-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088305#comment-13088305
 ] 

Ted Yu commented on HBASE-4236:
---

+1 on patch.

 Don't lock the stream while serializing the response
 

 Key: HBASE-4236
 URL: https://issues.apache.org/jira/browse/HBASE-4236
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor

 It is not necessary to hold the lock on the stream while the response is 
 being serialized.  This unnecessarily prevents serializing responses in 
 parallel.

--




[jira] [Commented] (HBASE-4199) blockCache summary - backend

2011-08-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088307#comment-13088307
 ] 

Ted Yu commented on HBASE-4199:
---

Patch version 4 is in a committable state. 

Minor comments:
In BlockCache:
{code}
+  public List<BlockCacheColumnFamilySummary> 
getBlockCacheColumnFamilySummary(Configuration conf) throws IOException {
{code}
I think getBlockCacheColumnFamilySummaries might be a better name.

For HRegionInterface:
{code}
+   * Performs a BlockCache summary and returns a List of 
BlockCacheColumnFamily objects.
{code}
BlockCacheColumnFamilySummary objects are returned, not BlockCacheColumnFamily 
objects. Again, the method name should use the plural, Summaries.

Good work, Doug.

 blockCache summary - backend
 

 Key: HBASE-4199
 URL: https://issues.apache.org/jira/browse/HBASE-4199
 Project: HBase
  Issue Type: Sub-task
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: java_HBASE_4199.patch, java_HBASE_4199_v2.patch, 
 java_HBASE_4199_v3.patch, java_HBASE_4199_v4.patch


 This is the backend work for the blockCache summary.  Change to BlockCache 
 interface, Summarization in LruBlockCache, BlockCacheSummaryEntry, addition 
 to HRegionInterface, and HRegionServer.
 This will NOT include any of the web UI or anything else like that.  That is 
 for another sub-task.

--




[jira] [Commented] (HBASE-4065) TableOutputFormat ignores failure to create table instance

2011-08-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088309#comment-13088309
 ] 

Ted Yu commented on HBASE-4065:
---

In mapred/TableOutputFormat.java:
{code}
} catch(IOException e) {
  LOG.error(e);
  throw e;
}
{code}
Should we make the behavior of the old-API and new-API versions consistent?

 TableOutputFormat ignores failure to create table instance
 --

 Key: HBASE-4065
 URL: https://issues.apache.org/jira/browse/HBASE-4065
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Brock Noland
 Fix For: 0.94.0

 Attachments: HBASE-4065.1.patch


 If TableOutputFormat in the new API fails to create a table, it simply logs 
 this at ERROR level and then continues on its way. Then, the first write() to 
 the table will throw a NPE since table hasn't been set.
 Instead, it should probably rethrow the exception as a RuntimeException in 
 setConf, or do what the old-API TOF does and not create the HTable instance 
 until getRecordWriter, where it can throw an IOE.
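 The first option (rethrow from setConf) might look roughly like the Java 
 sketch below.  It does not use the real HBase classes: TableSetupSketch and 
 the TableOpener interface are hypothetical stand-ins for TableOutputFormat 
 and HTable creation, so this shows only the error-handling shape.

```java
import java.io.IOException;

final class TableSetupSketch {
    // Hypothetical stand-in for "create an HTable for this name".
    interface TableOpener {
        AutoCloseable open(String tableName) throws IOException;
    }

    private AutoCloseable table;

    // setConf cannot declare IOException, so a failure is wrapped in a
    // RuntimeException instead of being logged and ignored.  The caller
    // then fails at once rather than hitting an NPE on the first write().
    void setConf(TableOpener opener, String tableName) {
        try {
            table = opener.open(tableName);
        } catch (IOException e) {
            throw new RuntimeException("Could not open table " + tableName, e);
        }
    }

    boolean hasTable() {
        return table != null;
    }
}
```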

--