date:20110820


 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch

 ZK restarted while assigning a region, new active HM re-assign it but the RS 
 warned 'already online on this server'.
 

 Key: HBASE-4124
 URL: https://issues.apache.org/jira/browse/HBASE-4124
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: fulin wang
 Attachments: HBASE-4124_Branch90V1_trial.patch, 
 HBASE-4124_Branch90V2.patch, log.txt

   Original Estimate: 0.4h
  Remaining Estimate: 0.4h

 ZK restarted while assigning a region, new active HM re-assign it but the RS 
 warned 'already online on this server'.
 Issue:
 The RS fails because of 'already online on this server' and returns; the HM 
 never receives the message and reports 'Regions in transition timed out'.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088146#comment-13088146
 ] 

gaojinchao commented on HBASE-4124:
---

I have finished the test. The scenario is:
step 1: start up the cluster.
step 2: abort the master right after it finishes calling sendRegionOpen(destination, regions).
step 3: start up the cluster again.

The above steps reproduce the issue.
When the master fails over, the meta still records the dead server, but the 
region is being processed by a live region server.
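
As an illustration of the failure mode only (this is not taken from the attached patch), the toy model below shows a region server that answers a duplicate open request with an explicit result instead of returning silently, so the caller does not have to wait for a timeout. The class ToyRegionServer and its OpenResult enum are hypothetical names and do not mirror HBase's real RPC code.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical classes, not HBase's real region server code.
final class ToyRegionServer {
    enum OpenResult { OPENED, ALREADY_ONLINE }

    private final Set<String> onlineRegions = new HashSet<>();

    // Always returns a result, so the caller can tell a duplicate request
    // apart from a lost one instead of waiting for a timeout.
    OpenResult openRegion(String regionName) {
        if (!onlineRegions.add(regionName)) {
            return OpenResult.ALREADY_ONLINE;
        }
        return OpenResult.OPENED;
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        System.out.println(rs.openRegion("region-a")); // request sent by the old master
        System.out.println(rs.openRegion("region-a")); // re-assignment by the new master
    }
}
```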



[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088147#comment-13088147
 ] 

gaojinchao commented on HBASE-4124:
---

Sorry, step 3 should be: start up the master again.


[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-08-20 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088152#comment-13088152
 ] 

stack commented on HBASE-4209:
--

Is it because there is no shutdown hook in the master, and in standalone mode 
everything runs in the one JVM, effectively the master's?

In start-hbase.sh, if distmode is false, we ONLY start master:

{code}
if [ $distMode == 'false' ]
then
  $bin/hbase-daemon.sh start master
else
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} start zookeeper
  $bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_REGIONSERVERS} start regionserver
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_BACKUP_MASTERS} start master-backup
fi
{code}

Inside the master, it takes care of starting up all the other beasties if 
distMode == false.

 The HBase hbase-daemon.sh SIGKILLs master when stopping it
 --

 Key: HBASE-4209
 URL: https://issues.apache.org/jira/browse/HBASE-4209
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Roman Shaposhnik

 There's a bit of code in hbase-daemon.sh that causes the HBase master to be 
 SIGKILLed when stopping it rather than trying SIGTERM (like it does for other 
 daemons). When HBase is executed in standalone mode (and the only daemon 
 you need to run is the master), that causes newly created tables to go missing 
 as unflushed data is thrown out. If there was no good reason to kill the master 
 with SIGKILL, perhaps we can take that special case out and rely on SIGTERM.
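
To make the difference between the two signals concrete: a JVM runs its registered shutdown hooks on normal exit and on SIGTERM, but a SIGKILL cannot be caught, so no hook runs. The sketch below is a minimal illustration of that behavior; FlushOnShutdown is a hypothetical stand-in for a process holding unflushed edits and is not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a process holding unflushed edits; not HBase code.
final class FlushOnShutdown {
    private final List<String> unflushed = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    synchronized void write(String edit) { unflushed.add(edit); }

    // Moves every pending edit to the "durable" list.
    synchronized void flush() {
        flushed.addAll(unflushed);
        unflushed.clear();
    }

    synchronized int flushedCount() { return flushed.size(); }

    // The JVM runs shutdown hooks on normal exit and on SIGTERM, never on SIGKILL.
    void installHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush, "flush-on-shutdown"));
    }

    public static void main(String[] args) {
        FlushOnShutdown store = new FlushOnShutdown();
        store.installHook();
        store.write("create table t1");
        // Exiting here (or `kill -TERM <pid>`) triggers flush(); `kill -9 <pid>` would not.
    }
}
```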


[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch


[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: (was: HBASE-4124_Branch90V2.patch)


[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: HBASE-4124_Branch90V2.patch


[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


 [ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4124:
--

Attachment: (was: HBASE-4124_Branch90V2.patch)


[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.


[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088173#comment-13088173
 ] 

gaojinchao commented on HBASE-4124:
---

I have added a test case for opening a region.


[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache

2011-08-20 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088211#comment-13088211
 ] 

jirapos...@reviews.apache.org commented on HBASE-4027:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/#review1585
---


In SingleSizeCache.cacheBlock():
  CacheablePair newEntry = new CacheablePair(
      toBeCached.serialize(storedBlock), storedBlock);
The above operation splits toBeCached into two parts: the first is slim and 
stays on-heap, while storedBlock is off-heap.


src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java
https://reviews.apache.org/r/1214/#comment3574

I think the word 'itself' in the javadoc above introduced confusion. It 
should be removed.



src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
https://reviews.apache.org/r/1214/#comment3573

As Pi explained in the Cacheable interface, serialize() offloads the majority 
of the data to an off-heap ByteBuffer. What gets returned is the skeleton that 
lives on-heap.
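
A minimal sketch of that split, assuming a direct ByteBuffer as the off-heap store. OffHeapSplit and Skeleton are hypothetical names; the real Cacheable and CacheablePair types in the patch may look quite different.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical types for illustration; the patch's Cacheable/CacheablePair may differ.
final class OffHeapSplit {
    // Slim on-heap part: only bookkeeping, no payload bytes.
    static final class Skeleton {
        final int length;
        Skeleton(int length) { this.length = length; }
    }

    // Copies the payload into the supplied buffer and returns the on-heap skeleton.
    static Skeleton serialize(byte[] payload, ByteBuffer destination) {
        destination.clear();
        destination.put(payload);
        destination.flip();
        return new Skeleton(payload.length);
    }

    // Rebuilds the payload from the skeleton plus the buffered bytes.
    static byte[] deserialize(Skeleton skeleton, ByteBuffer source) {
        byte[] out = new byte[skeleton.length];
        source.duplicate().get(out); // duplicate() leaves the source position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64);
        Skeleton s = serialize("block-data".getBytes(StandardCharsets.UTF_8), offHeap);
        System.out.println(new String(deserialize(s, offHeap), StandardCharsets.UTF_8));
    }
}
```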


- Ted


On 2011-08-19 20:21:35, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1214/
bq.  ---
bq.  
bq.  (Updated 2011-08-19 20:21:35)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan 
Gray, and Li Pi.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Review request - I apparently can't edit tlipcon's earlier posting of my 
diff, so creating a new one.
bq.  
bq.  
bq.  This addresses bug HBase-4027.
bq.  https://issues.apache.org/jira/browse/HBase-4027
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    conf/hbase-env.sh 2d55d27 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java 2d4002c 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/CachedBlock.java 3b130d8 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java 097dc50 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java 1338453 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java 886c31d 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/Slab.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e2c6c93 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 7b7bf73 
bq.    src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestCachedBlockQueue.java 1ad2ece 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java f0a9832 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStoreLAB.java d7e43a0 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 4387170 
bq.  
bq.  Diff: https://reviews.apache.org/r/1214/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran benchmarks against it in HBase standalone mode. Wrote test cases for 
all classes, multithreaded test cases exist for the cache.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



 Enable direct byte buffers LruBlockCache
 

 Key: HBASE-4027
 URL: https://issues.apache.org/jira/browse/HBASE-4027
 Project: HBase
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Li Pi
Priority: Minor
 Attachments: 4027-v5.diff, 4027v7.diff, HBase-4027 (1).pdf, 
 HBase-4027.pdf, HBase4027v8.diff, HBase4027v9.diff, hbase-4027-v10.5.diff, 
 hbase-4027-v10.diff, hbase-4027v10.6.diff, hbase-4027v6.diff, 
 hbase4027v11.5.diff, hbase4027v11.6.diff, hbase4027v11.7.diff, 
 hbase4027v11.diff, hbase4027v12.1.diff,

[jira] [Commented] (HBASE-4167) Potential leak of HTable instances when using HTablePool with PoolType.ThreadLocal


[ 
https://issues.apache.org/jira/browse/HBASE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088214#comment-13088214
 ] 

Ted Yu commented on HBASE-4167:
---

+1 on patch.

 Potential leak of HTable instances when using HTablePool with 
 PoolType.ThreadLocal
 --

 Key: HBASE-4167
 URL: https://issues.apache.org/jira/browse/HBASE-4167
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4167.patch


 (Initially discussed in HBASE-4150)
 In HTablePool, when obtaining a table:
 {code}
   private HTableInterface findOrCreateTable(String tableName) {
     HTableInterface table = tables.get(tableName);
     if (table == null) {
       table = createHTable(tableName);
     }
     return table;
   }
 {code}
 In the case of {{ThreadLocalPool}}, it seems like there's an exposure here 
 between when the table is created initially and when 
 {{ThreadLocalPool.put()}} is called to set the thread local variable (on 
 {{PooledHTable.close()}}).
 Potential solution described by Karthick Sankarachary:
 For one thing, we might want to clear the tables variable when the 
 {{HTablePool}} is closed (as shown below). For another, we should override 
 the ThreadLocalPool#get method so that it removes the resource; otherwise it 
 might end up referencing an HTableInterface that has been released.
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java b/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 index 952a3aa..c198f15 100755
 --- a/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 +++ b/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
 @@ -309,6 +310,7 @@ public class HTablePool implements Closeable {
      for (String tableName : tables.keySet()) {
        closeTablePool(tableName);
      }
 +    this.tables.clear();
    }
 {code}
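
 The second suggestion above (a get that removes the resource) could look roughly like the sketch below. RemovingThreadLocalPool is a hypothetical, stripped-down class written for illustration; HBase's actual ThreadLocalPool has a different and richer API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal pool; not HBase's actual ThreadLocalPool.
final class RemovingThreadLocalPool<R> {
    private final ThreadLocal<Map<String, R>> slots =
        ThreadLocal.withInitial(HashMap::new);

    // Hands the resource out and forgets it, so the pool never keeps a
    // reference to something the caller may later release.
    R get(String key) {
        return slots.get().remove(key);
    }

    // Stores the resource in the current thread's slot; returns any previous occupant.
    R put(String key, R resource) {
        return slots.get().put(key, resource);
    }

    // Drops everything held for the current thread.
    void clear() {
        slots.get().clear();
    }

    public static void main(String[] args) {
        RemovingThreadLocalPool<String> pool = new RemovingThreadLocalPool<>();
        pool.put("t1", "table-handle");
        System.out.println(pool.get("t1")); // the handle
        System.out.println(pool.get("t1")); // null: the pool no longer references it
    }
}
```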


[jira] [Commented] (HBASE-4222) Make HLog more resilient to write pipeline failures

2011-08-20 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088219#comment-13088219
]

jirapos...@reviews.apache.org commented on HBASE-4222:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1590/#review1586
---

Ship it!

TestHLog and TestLogRolling passed.

- Ted

On 2011-08-20 05:39:30, Gary Helmling wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1590/
bq. ---
bq.
bq. (Updated 2011-08-20 05:39:30)
bq.
bq.
bq. Review request for hbase.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. This patch corrects a few problems, as I see it, with the current log
rolling process:
bq.
bq. 1) HLog.LogSyncer.run() now handles an IOException in the inner while
loop. Previously any IOException would cause the LogSyncer thread to exit,
even if the subsequent log roll succeeded. This would mean the region server
kept running without a LogSyncer thread
bq. 2) Log rolls triggered by IOExceptions were being skipped in the event
that there were no entries in the log. This would prevent the log from being
recovered in a timely manner.
bq. 3) minor - FailedLogCloseException was never actually being thrown out of
HLog.cleanupCurrentWriter(), resulting in inaccurate logging on RS abort
bq.
bq. The bigger change is the addition of a configuration property --
hbase.regionserver.logroll.errors.tolerated -- that is checked against a
counter of consecutive close errors to see whether or not an abort should be
triggered.
bq.
bq. Prior to this patch, we could readily trigger region server aborts by
rolling all the data nodes in a cluster while region servers were running.
This was equally true whether write activity was happening or not. (In fact I
think having concurrent write activity actually gave a better chance for the
log to be rolled prior to all DNs in the write pipeline going down and thus the
region server not aborting).
bq.
bq. With this change and hbase.regionserver.logroll.errors.tolerated=2, I can
roll DNs at will without causing any loss of service.
bq.
bq. I'd appreciate some scrutiny on any log rolling subtleties or interactions
I may be missing here. If there are alternate/better ways to handle this in
the DFSClient layer, I'd also appreciate any pointers to that.
bq.
bq.
bq. This addresses bug HBASE-4222.
bq. https://issues.apache.org/jira/browse/HBASE-4222
bq.
bq.
bq. Diffs
bq. -
bq.
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java 8e87c83
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c301d1b
bq.    src/main/resources/hbase-default.xml 66548ca
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 5063896
bq.
bq. Diff: https://reviews.apache.org/r/1590/diff
bq.
bq.
bq. Testing
bq. ---
bq.
bq. Added a new test for rolling data nodes under a running cluster:
TestLogRolling.testLogRollOnPipelineRestart().
bq.
bq. Tested patch on a running cluster with 3 slaves, rolling data nodes with
and without concurrent write activity.
bq.
bq.
bq. Thanks,
bq.
bq. Gary
bq.
bq.
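
The review request above describes checking hbase.regionserver.logroll.errors.tolerated against a counter of consecutive close errors. A minimal sketch of such a counter follows. CloseErrorTracker and its method names are hypothetical and not taken from the patch, and whether the real comparison is ">" or ">=" is an assumption.

```java
// Hypothetical helper; names and the exact comparison are assumptions, not the patch.
final class CloseErrorTracker {
    private final int tolerated; // e.g. the value of hbase.regionserver.logroll.errors.tolerated
    private int consecutiveErrors = 0;

    CloseErrorTracker(int tolerated) {
        this.tolerated = tolerated;
    }

    // A successful close resets the streak.
    void onCloseSuccess() {
        consecutiveErrors = 0;
    }

    // Returns true once the streak exceeds the tolerated count, meaning an abort is due.
    boolean onCloseFailure() {
        consecutiveErrors++;
        return consecutiveErrors > tolerated;
    }

    public static void main(String[] args) {
        CloseErrorTracker tracker = new CloseErrorTracker(2);
        System.out.println(tracker.onCloseFailure()); // first error is tolerated
        System.out.println(tracker.onCloseFailure()); // second error is tolerated
        System.out.println(tracker.onCloseFailure()); // third consecutive error: abort
    }
}
```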

Make HLog more resilient to write pipeline failures
---

Key: HBASE-4222
URL: https://issues.apache.org/jira/browse/HBASE-4222
Project: HBase
Issue Type: Improvement
Components: wal
Reporter: Gary Helmling
Assignee: Gary Helmling
Fix For: 0.92.0

The current implementation of HLog rolling to recover from transient errors
in the write pipeline seems to have two problems:
# When {{HLog.LogSyncer}} triggers an {{IOException}} during time-based sync
operations, it triggers a log rolling request in the corresponding catch
block, but only after escaping from the internal while loop. As a result,
the {{LogSyncer}} thread will exit and never be restarted from what I can
tell, even if the log rolling was successful.
# Log rolling requests triggered by an {{IOException}} in {{sync()}} or
{{append()}} never happen if no entries have yet been written to the log.
This means that write errors are not immediately recovered, which extends the
exposure to more errors occurring in the pipeline.
In addition, it seems like we should be able to better handle transient
problems, like a rolling restart of DataNodes while the HBase RegionServers
are running. Currently this will reliably cause RegionServer aborts during
log rolling: either

[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)

2011-08-20 Thread Subbu M Iyer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088224#comment-13088224
 ] 

Subbu M Iyer commented on HBASE-4213:
-

Yes, I will. I am on it.




 Support instant schema updates with out master's intervention (i.e with out 
 enable/disable and bulk assign/unassign)
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.


[jira] [Created] (HBASE-4235) Attempts to reconnect to expired ZooKeeper sessions

Attempts to reconnect to expired ZooKeeper sessions
---

 Key: HBASE-4235
 URL: https://issues.apache.org/jira/browse/HBASE-4235
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0, 0.90.5
Reporter: Andrew Purtell
Assignee: Andrew Purtell


In a couple of instances of short network outages, we have afterward observed 
zombie HBase processes attempting over and over to reconnect to expired 
ZooKeeper sessions. We believe this is due to ZOOKEEPER-1159. Opening this 
issue as a reference to that.


[jira] [Commented] (HBASE-4229) Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter


[ 
https://issues.apache.org/jira/browse/HBASE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088252#comment-13088252
 ] 

Andrew Purtell commented on HBASE-4229:
---

+1

 Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter
 

 Key: HBASE-4229
 URL: https://issues.apache.org/jira/browse/HBASE-4229
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-4229.patch


 HBase makes use of both jackson (in the region server) and jettison (in 
 HLogPrettyPrinter) for JSON encoding. Jackson seems to be better maintained, 
 so this patch standardizes by using jackson in HLogPrettyPrinter instead of 
 jettison.


[jira] [Updated] (HBASE-4230) Compaction threads need names


 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

Status: Patch Available  (was: Open)

 Compaction threads need names
 -

 Key: HBASE-4230
 URL: https://issues.apache.org/jira/browse/HBASE-4230
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
 Fix For: 0.92.0

 Attachments: HBASE-4230.patch


 The CompactSplitThread creates executors for doing compaction work, but 
 threads end up named things like pool-2-thread-1 which isn't very useful.
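
 For reference, the standard way to get meaningful names is to hand the executor a ThreadFactory. The sketch below uses only java.util.concurrent. The class NamedThreads and the "compaction" prefix are illustrative assumptions and are not taken from the attached patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative helper; the attached patch may name threads differently.
final class NamedThreads {
    // Builds a factory whose threads are named "<prefix>-<n>", counting from 1.
    static ThreadFactory factory(final String prefix) {
        final AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, factory("compaction"));
        Callable<String> task = () -> Thread.currentThread().getName();
        System.out.println(pool.submit(task).get()); // a readable name instead of pool-N-thread-M
        pool.shutdown();
    }
}
```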


[jira] [Updated] (HBASE-4230) Compaction threads need names


 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

Attachment: HBASE-4230.patch

Perhaps like the attached?


[jira] [Commented] (HBASE-4230) Compaction threads need names

2011-08-20 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088263#comment-13088263
 ] 

stack commented on HBASE-4230:
--

+1


[jira] [Updated] (HBASE-4229) Replace Jettison JSON encoding with Jackson in HLogPrettyPrinter

2011-08-20 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4229:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Thanks for the patch Riley.  Applied to TRUNK.


[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-08-20 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088266#comment-13088266
]

Matt Corgan commented on HBASE-4218:

I lean towards byte-encoding ints whenever they're used often enough to have an
impact on memory. KeyValue could probably do better with some VInts. You can
encode 128 values in 1 byte and decode it with just one branch to check if
b[0] < 0. Given the number of other byte comparisons going on while reading the
key, that doesn't seem too heavyweight (especially since many of those other
byte comparisons are casting the byte to a positive integer before comparing).
If you reserved 2-4 bytes for that same number, then you may be doing even more
work.

One problem with VInt decoders is that sometimes they do bounds checking which
can slow things down a lot. I think validation should be done at write time,
and then possibly using a block-level checksum when a block is copied back into
memory. Then assume everything is correct.

For prefix compression, we're talking about encoding things at the block level
where most of the ints are internal pointers that are less than the block size
of 64k, so most ints can fit in 2 bytes. But it's important that they be able
to grow gracefully when block sizes grow beyond 64k or are configured to be
bigger. I've been using two types of encoded integers: VInt and FInt. FInts
are basically an optimization over VInts for cases where you have many ints
with the same characteristics, and can therefore store their width at the block
level rather than encoding it in every occurrence.

VInt (variable width int)
* width is not known ahead of time, so must interpret byte-by-byte
* slower because of branch on each byte, but still pretty fast
* only 2^7 values/byte, so 2 bytes can hold 16k values
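
A minimal sketch of such a VInt, assuming a generic 7-bits-per-byte layout with the high bit as a continuation flag. VIntSketch is a hypothetical class and is not the wire format used by HBase or Hadoop. Because Java bytes are signed, the continuation check is the single `b < 0` branch mentioned above.

```java
import java.io.ByteArrayOutputStream;

// Generic 7-bits-per-byte varint sketch; not HBase's or Hadoop's actual format.
final class VIntSketch {
    // Encodes a non-negative int, low 7 bits first; a set high bit means "more bytes follow".
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decodes starting at offset; a negative byte has its continuation bit set.
    static int decode(byte[] buf, int offset) {
        int result = 0;
        int shift = 0;
        byte b;
        while ((b = buf[offset++]) < 0) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return result | (b << shift);
    }

    public static void main(String[] args) {
        for (int v : new int[] {127, 128, 16383, 16384}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```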

FInt (fixed width int)
* width is known ahead of time and stored externally (at block level in
PtBlockMeta in this project)
* an FInt is faster to encode and decode because of the lack of if-statements
* each byte can store 2^8 values, so 2 bytes gets you 64k values (hbase block
size)
* a list of these numbers provides random access, which is important for binary
searching
* if encoding the numbers 0-10,000, for example, then VInts will save you 1
byte on the numbers 0-127, but that is a small % savings, so use FInts for
lists of numbers
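
A sketch of the FInt idea under stated assumptions: big-endian bytes, one width shared by the whole list, and the width passed in by the caller rather than read from block metadata. FIntSketch is a hypothetical class; PtBlockMeta is not modeled here.

```java
// Hypothetical fixed-width int list; the width lives outside the encoded bytes.
final class FIntSketch {
    // Smallest number of bytes (1..4) that can hold maxValue.
    static int widthFor(int maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("negative value");
        }
        int width = 1;
        while (width < 4 && (maxValue >>> (8 * width)) != 0) {
            width++;
        }
        return width;
    }

    // Writes every value big-endian using the same width.
    static byte[] encodeAll(int[] values, int width) {
        byte[] out = new byte[values.length * width];
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < width; j++) {
                out[i * width + j] = (byte) (values[i] >>> (8 * (width - 1 - j)));
            }
        }
        return out;
    }

    // Random access: the i-th value starts at i * width, so no scanning is needed.
    static int get(byte[] buf, int width, int index) {
        int result = 0;
        for (int j = 0; j < width; j++) {
            result = (result << 8) | (buf[index * width + j] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 255, 256, 10000};
        int width = widthFor(10000);
        byte[] encoded = encodeAll(offsets, width);
        System.out.println(width + " bytes each, third value = " + get(encoded, width, 2));
    }
}
```

Since every entry has the same width, a binary search over the encoded list can jump straight to any index.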

Sidenote: I've been meaning to make a CVInt (comparable variable width int)
that:
* sorts based on raw bytes even if different widths (good for suffixing hbase
row/colQualifier values)
* to interpret, count the number of leading 1 bits, and that is how many
additional bytes there are beyond the first byte
* bits beyond the first 0 bit comprise the value
* should also be faster to decode because of fewer branches
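
A rough Java sketch of how such a CVInt could look, under my reading of the
bullets above. Nothing here exists in HBase; the names are hypothetical. It
assumes non-negative ints (so at most 4 additional bytes) and that the encoder
always picks the shortest form, which is what makes raw-byte order match
numeric order.

```java
/** Illustrative only: a hypothetical comparable variable-width int. */
final class CVIntSketch {

    /** Leading 1 bits of the first byte = number of additional bytes. */
    static byte[] encode(int value) {
        if (value < 0) throw new IllegalArgumentException("non-negative only");
        int extra = 0;
        // 7 payload bits with no extra byte, plus 7 more for each extra byte.
        while (extra < 4 && (long) value >= (1L << (7 + 7 * extra))) extra++;
        byte[] out = new byte[1 + extra];
        long v = value;
        for (int i = extra; i >= 1; i--) {
            out[i] = (byte) v;   // trailing bytes are big-endian
            v >>>= 8;
        }
        int prefix = (0xFF << (8 - extra)) & 0xFF;   // `extra` one-bits, then a zero
        out[0] = (byte) (prefix | v);
        return out;
    }

    /** The width is known after looking at the first byte only. */
    static int decode(byte[] buf, int offset) {
        int first = buf[offset] & 0xFF;
        // Leading ones of `first` are the leading zeros of its complement.
        int extra = Integer.numberOfLeadingZeros(~first & 0xFF) - 24;
        long v = first & (0xFF >>> (extra + 1));     // bits after the first 0 bit
        for (int i = 1; i <= extra; i++) v = (v << 8) | (buf[offset + i] & 0xFF);
        return (int) v;
    }

    /** Unsigned lexicographic comparison of raw bytes. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

Because a wider encoding always starts with a larger first byte, compareBytes
orders encoded values the same way as the ints they represent.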

Delta Encoding of KeyValues (aka prefix compression)
-

Key: HBASE-4218
URL: https://issues.apache.org/jira/browse/HBASE-4218
Project: HBase
Issue Type: Improvement
Components: io
Reporter: Jacek Migdal
Labels: compression

A compression for keys. Keys are sorted in HFile and they are usually very
similar. Because of that, it is possible to design better compression than
general purpose algorithms.
It is an additional step designed to be used in memory. It aims to save
memory in cache as well as to speed up seeks within HFileBlocks. It should
improve performance a lot if key lengths are larger than value lengths. For
example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes)
show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
This comes with much better performance (20-80% faster decompression than
LZO). Moreover, it should allow far more efficient seeking, which should
improve performance a bit.
It seems that simple compression algorithms are good enough. Most of the
savings are due to prefix compression, int128 encoding, timestamp diffs and
bitfields to avoid duplication. That way, comparisons of compressed data can
be much faster than a byte comparator (thanks to prefix compression and
bitfields).
In order to implement it in HBase two important changes in design will be
needed:
-solidify interface to HFileBlock / HFileReader Scanner to provide seeking
and iterating; access to uncompressed buffer in HFileBlock will have bad
performance
-extend comparators to support comparison assuming that N first bytes are
equal (or some fields are equal)
Link to a discussion about something similar:
http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
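
The prefix-compression part of the description can be sketched roughly as
follows. This is only an illustration under simple assumptions (keys as plain
byte arrays, already sorted, no timestamp diffs or bitfields), not the encoder
proposed in the issue; PrefixDeltaSketch and Entry are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: prefix (delta) encoding of sorted keys. */
final class PrefixDeltaSketch {

    /** One encoded key: bytes shared with the previous key, plus the remainder. */
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) shared++;
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            // Keep the shared prefix of the previous key, then append the suffix.
            byte[] key = Arrays.copyOf(prev, e.shared + e.suffix.length);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

The shared count is also what a comparator could use to skip the first N
bytes that are already known to be equal to the previous key.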

[jira] [Updated] (HBASE-4230) Compaction threads need names


 [ 
https://issues.apache.org/jira/browse/HBASE-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4230:
--

  Resolution: Fixed
Assignee: Andrew Purtell
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk.

 Compaction threads need names
 -

 Key: HBASE-4230
 URL: https://issues.apache.org/jira/browse/HBASE-4230
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Andrew Purtell
 Fix For: 0.92.0

 Attachments: HBASE-4230.patch


 The CompactSplitThread creates executors for doing compaction work, but 
 threads end up named things like pool-2-thread-1 which isn't very useful.
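
For readers unfamiliar with the problem, a minimal sketch of one common way to
name executor threads is a custom ThreadFactory. This is not the committed
patch; the class name and the prefix used below are assumptions for
illustration.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: gives pool threads a descriptive, numbered name. */
final class NamedThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // e.g. "compaction-1" instead of the default "pool-2-thread-1"
        Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
        t.setDaemon(true);
        return t;
    }
}
```

Such a factory can be passed to Executors.newFixedThreadPool(n, factory) or to
a ThreadPoolExecutor constructor.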


[jira] [Created] (HBASE-4236) Don't lock the stream while serializing the response

2011-08-20 Thread Benoit Sigoure (JIRA)

Don't lock the stream while serializing the response


 Key: HBASE-4236
 URL: https://issues.apache.org/jira/browse/HBASE-4236
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor


It is not necessary to hold the lock on the stream while the response is being 
serialized.  This unnecessarily prevents serializing responses in parallel.
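
The idea can be sketched as below: serialize into a private buffer first, then
take the lock only for the write. This is an illustration, not the actual
patch; the class name and the wire format (call id, length, payload) are made
up.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative only: narrow the critical section to the stream write. */
final class ResponseWriterSketch {
    private final OutputStream stream;   // shared by all responding threads

    ResponseWriterSketch(OutputStream stream) {
        this.stream = stream;
    }

    void sendResponse(int callId, byte[] payload) throws IOException {
        // No lock is held here, so several threads can serialize in parallel.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        data.writeInt(callId);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
        byte[] bytes = buf.toByteArray();

        // Lock only while writing, so that responses do not interleave.
        synchronized (stream) {
            stream.write(bytes);
            stream.flush();
        }
    }
}
```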


[jira] [Created] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs

2011-08-20 Thread Benoit Sigoure (JIRA)

Directly remove the call being handled from the map of outstanding RPCs
---

 Key: HBASE-4237
 URL: https://issues.apache.org/jira/browse/HBASE-4237
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor


The client has to maintain a map of RPC ID to `Call' object for this RPC, for 
every outstanding RPC.  When receiving a response, the client was getting the 
`Call' out of the map (one O(log n) operation) and then removing it from the 
map (another O(log n) operation).  There is no benefit in not removing it 
directly from the map.
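
In Java terms the change amounts to relying on the return value of
Map.remove. A small sketch follows, assuming a sorted concurrent map (which
would match the O(log n) cost mentioned above); the class and its Call type
are hypothetical stand-ins, not the client code itself.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Illustrative only: tracking outstanding RPCs by ID. */
final class OutstandingCallsSketch {

    /** Stand-in for the client's per-RPC state. */
    static final class Call {
        final int id;

        Call(int id) {
            this.id = id;
        }
    }

    private final ConcurrentMap<Integer, Call> calls = new ConcurrentSkipListMap<>();

    void register(Call call) {
        calls.put(call.id, call);
    }

    /** Before: two map operations per response. */
    Call completeWithGetThenRemove(int id) {
        Call call = calls.get(id);
        calls.remove(id);
        return call;
    }

    /** After: remove returns the removed value (or null), so one operation is enough. */
    Call complete(int id) {
        return calls.remove(id);
    }

    int outstanding() {
        return calls.size();
    }
}
```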



[jira] [Commented] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs

2011-08-20 Thread Benoit Sigoure (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088299#comment-13088299
 ] 

Benoit Sigoure commented on HBASE-4237:
---

Patch @ 
https://github.com/tsuna/hbase/commit/1f602391ee4cd3d11eaf3067208caeadf214b3a8


[jira] [Commented] (HBASE-4237) Directly remove the call being handled from the map of outstanding RPCs


[ 
https://issues.apache.org/jira/browse/HBASE-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088304#comment-13088304
 ] 

Ted Yu commented on HBASE-4237:
---

+1 on patch.


[jira] [Commented] (HBASE-4236) Don't lock the stream while serializing the response


[ 
https://issues.apache.org/jira/browse/HBASE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088305#comment-13088305
 ] 

Ted Yu commented on HBASE-4236:
---

+1 on patch.


[jira] [Commented] (HBASE-4199) blockCache summary - backend


[ 
https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088307#comment-13088307
 ] 

Ted Yu commented on HBASE-4199:
---

Patch version 4 is in a committable state.

Minor comments:
In BlockCache:
{code}
+  public List<BlockCacheColumnFamilySummary> getBlockCacheColumnFamilySummary(Configuration conf) throws IOException {
{code}
I think getBlockCacheColumnFamilySummaries might be a better name.

For HRegionInterface:
{code}
+   * Performs a BlockCache summary and returns a List of 
BlockCacheColumnFamily objects.
{code}
The objects returned are BlockCacheColumnFamilySummary, not
BlockCacheColumnFamily. Again, the method name should use the plural,
Summaries.

Good work, Doug.

 blockCache summary - backend
 

 Key: HBASE-4199
 URL: https://issues.apache.org/jira/browse/HBASE-4199
 Project: HBase
  Issue Type: Sub-task
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: java_HBASE_4199.patch, java_HBASE_4199_v2.patch, 
 java_HBASE_4199_v3.patch, java_HBASE_4199_v4.patch


 This is the backend work for the blockCache summary.  Change to BlockCache 
 interface, Summarization in LruBlockCache, BlockCacheSummaryEntry, addition 
 to HRegionInterface, and HRegionServer.
 This will NOT include any of the web UI or anything else like that.  That is 
 for another sub-task.


[jira] [Commented] (HBASE-4065) TableOutputFormat ignores failure to create table instance