[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-01 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141971#comment-13141971
 ] 

nkeywal commented on HBASE-4724:


I can reproduce the issue 100% of the time on trunk with 
TestAdmin#testCreateBadTables. Previously I could not reproduce the error at 
all.

> TestAdmin hangs randomly in trunk
> -
>
> Key: HBASE-4724
> URL: https://issues.apache.org/jira/browse/HBASE-4724
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0
>Reporter: nkeywal
>Priority: Critical
>
> From the logs in my env:
> {noformat}
> 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
> master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
> localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
> org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
> org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
> server = 29)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
> {noformat}
> Anyway, after this the logs finish with:
> {noformat}
> 2011-11-01 15:54:35,132 INFO  
> [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
> Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
> Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
> Master:0;localhost,39664,1320187706355
> {noformat}
> It's stuck in:
> {noformat}
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
> 
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
> 
> org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
> org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
> org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
> 
> org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
> {noformat}
> So that's at least why adding a timeout won't help, and maybe why it does not 
> end at all. Adding a maximum retry count to Threads#threadDumpingIsAlive could help.
> I also wonder if the root cause of the hang is my modification of the WAL, 
> with some threads surprised to find updates that were not written to the WAL. 
> Here is the full stack dump:
> {noformat}
> Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
> nkeywal):
>   State: TIMED_WAITING
>   Blocked count: 360
>   Waited count: 359
>   Stack:
> java.lang.Object.wait(Native Method)
> org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
> org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
> Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
>   State: WAITING
>   Blocked count: 0
>   Waited count: 4
>   Waiting on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
>   Stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> Thread 271 
> (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
>   State: RUNNABLE
>   Blocked count: 2
>   Waited count: 0
>   Stack:
> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
> Thread 152 (Master:0;localhost,39664,1320187706355):
>   State: WAITING
>   Blocked count: 217
>   Waited count: 174
>   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
> 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKe

[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-11-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141962#comment-13141962
 ] 

jirapos...@reviews.apache.org commented on HBASE-4536:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2178/#review3009
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java


Hi Lars, Isn't this early-out problematic? It doesn't take into account 
min-versions. It doesn't take into account the newly introduced 
keepDeletedCells mode.


- Prakash
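
For context on the change under review below, here is a minimal usage sketch of the new column-family flag and the "raw" scan. The setter names setKeepDeletedCells and setRaw are taken from this patch and are assumptions until it lands; this is not the committed API.

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Scan;

public class KeepDeletedCellsSketch {
  // Sketch only: create a table whose family retains deleted cells, then build
  // a time-range scan (sees puts older than a later delete marker) and a raw
  // scan (sees delete markers and the deleted cells themselves).
  public static void createAndScan(HBaseAdmin admin, long tsBeforeDelete) throws Exception {
    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setKeepDeletedCells(true);           // still subject to TTL and VERSIONS

    HTableDescriptor table = new HTableDescriptor("keep_deleted_demo");
    table.addFamily(family);
    admin.createTable(table);

    Scan timeTravel = new Scan();
    timeTravel.setTimeRange(0, tsBeforeDelete); // time range ends before the delete marker

    Scan raw = new Scan();
    raw.setRaw(true);                           // returns delete markers and deleted cells
  }
}
{code}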


On 2011-10-18 21:43:38, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2178/
bq.  ---
bq.  
bq.  (Updated 2011-10-18 21:43:38)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBase timerange Gets and Scans allow doing "time travel" in HBase, i.e. 
looking at the state of the data at any point in the past, provided the data is 
still around.
bq.  This did not work for deletes, however. Deletes would always mask all puts 
in the past.
bq.  This change adds a flag that can be set on HColumnDescriptor to enable 
retention of deleted rows.
bq.  These rows are still subject to TTL and/or VERSIONS.
bq.  
bq.  This changes the following:
bq.  1. There is a new flag on HColumnDescriptor enabling that behavior.
bq.  2. Allow gets/scans with a timerange to retrieve rows hidden by a delete 
marker, if the timerange does not include the delete marker.
bq.  3. Do not unconditionally collect all deleted rows during a compaction.
bq.  4. Allow a "raw" Scan, which retrieves all delete markers and deleted rows.
bq.  
bq.  The change is small'ish, but the logic is intricate, so please review 
carefully.
bq.  
bq.  
bq.  This addresses bug HBASE-4536.
bq.  https://issues.apache.org/jira/browse/HBASE-4536
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Attributes.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1185362 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hba

[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141961#comment-13141961
 ] 

nkeywal commented on HBASE-4703:


Created HBASE-4724 to track this.

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for 
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start 
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in 
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, 
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, 
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL to false when the 'put' is big 
> or in a loop (illustrated below). Not done for tests that rely on the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on 
> tearDown, which makes it impossible to run in parallel with another test 
> using this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15 
> seconds to make it part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use the local admin, single 
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one 
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the DFS cluster with HBaseTestingUtility
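
To illustrate the 'put' optimisation mentioned in the list above, a minimal sketch using the HTable/Put API of this codebase; loadTestRows is a made-up helper for illustration, not a method from the patch:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalSkippingLoader {
  // Sketch: a test that loads many rows and does not exercise WAL recovery can
  // skip the WAL write for each put, which makes the loop noticeably faster.
  public static void loadTestRows(HTable table, int rows) throws IOException {
    for (int i = 0; i < rows; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
      put.setWriteToWAL(false);   // test data only; keep the WAL for tests that rely on it
      table.put(put);
    }
  }
}
{code}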





[jira] [Created] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-01 Thread nkeywal (Created) (JIRA)
TestAdmin hangs randomly in trunk
-

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Priority: Critical



From the logs in my env:
{noformat}
2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
server = 29)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}
Anyway, after this the logs finish with:
{noformat}
2011-11-01 15:54:35,132 INFO  
[Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
Master:0;localhost,39664,1320187706355
{noformat}
It's stuck in:
{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)

org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)

org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)

org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)

org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}
So that's at least why adding a timeout won't help, and maybe why it does not 
end at all. Adding a maximum retry count to Threads#threadDumpingIsAlive could help.
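
For illustration, a sketch of the kind of bounded retry meant here. BoundedThreadDump is a hypothetical standalone helper, not the actual Threads#threadDumpingIsAlive: it joins the thread for a fixed number of 60-second intervals, dumping stacks each time, and gives up instead of waiting forever.

{code}
// Hypothetical sketch, not the existing HBase utility: bound the number of
// 60-second waits so a stuck master thread cannot keep the test JVM alive forever.
public final class BoundedThreadDump {
  private static final long DUMP_INTERVAL_MS = 60 * 1000L;

  public static void joinWithRetries(Thread t, int maxRetries) throws InterruptedException {
    for (int i = 0; i < maxRetries && t.isAlive(); i++) {
      t.join(DUMP_INTERVAL_MS);                          // wait at most one interval
      if (t.isAlive()) {
        System.out.println("Still waiting on " + t.getName() + ", dumping stacks:");
        for (java.util.Map.Entry<Thread, StackTraceElement[]> e
            : Thread.getAllStackTraces().entrySet()) {
          System.out.println("Thread " + e.getKey().getName() + ":");
          for (StackTraceElement frame : e.getValue()) {
            System.out.println("    " + frame);
          }
        }
      }
    }
    if (t.isAlive()) {
      System.out.println("Giving up on " + t.getName() + " after " + maxRetries + " retries");
    }
  }

  private BoundedThreadDump() {
  }
}
{code}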

I also wonder if the root cause of the hang is my modification of the WAL, with 
some threads surprised to find updates that were not written to the WAL. Here 
is the full stack dump:
{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
nkeywal):
  State: TIMED_WAITING
  Blocked count: 360
  Waited count: 359
  Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
  State: WAITING
  Blocked count: 0
  Waited count: 4
  Waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)

java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 0
  Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
  State: WAITING
  Blocked count: 217
  Waited count: 174
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)

org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)

org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)

org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)

org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
java.lang.Thread.run(Thread.java:662)
Thread 165 (LruBlockCache.EvictionThread):
  State: WAITING
  Blocked count: 0
  Waited count: 1
  Waiting on 
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread@3e9d7b56
  Stack:
java.lang.

[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141957#comment-13141957
 ] 

nkeywal commented on HBASE-4703:


Yes, the joins are failing, and as the main thread does not end, the JVM does not 
end. I pulled trunk again; I still have the issue with the RPC version.

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for 
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start 
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in 
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, 
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, 
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL to false when the 'put' is big 
> or in a loop. Not done for tests that rely on the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on 
> tearDown, which makes it impossible to run in parallel with another test 
> using this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15 
> seconds to make it part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use the local admin, single 
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one 
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the DFS cluster with HBaseTestingUtility





[jira] [Commented] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141948#comment-13141948
 ] 

stack commented on HBASE-4722:
--

I committed logging-v2 so we can get more info when it fails on Jenkins, since I 
can't make it fail locally (and I'm going to bed...)

> TestGlobalMemStoreSize has started failing
> --
>
> Key: HBASE-4722
> URL: https://issues.apache.org/jira/browse/HBASE-4722
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Critical
> Attachments: logging-v2.txt, logging.txt
>
>
> I'm digging in.  It fails occasionally for me locally too.





[jira] [Updated] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4722:
-

Attachment: logging-v2.txt

A bit more logging. Since I added this, I can't make it fail locally.  It's 
seemingly a timing issue where we skip flushing.  Still digging.

> TestGlobalMemStoreSize has started failing
> --
>
> Key: HBASE-4722
> URL: https://issues.apache.org/jira/browse/HBASE-4722
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Critical
> Attachments: logging-v2.txt, logging.txt
>
>
> I'm digging in.  It fails occasionally for me locally too.





[jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

2011-11-01 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141944#comment-13141944
 ] 

Phabricator commented on HBASE-2312:


stack has commented on the revision "HBASE-2312 [jira] Possible data loss when 
RS goes into GC pause while rolling HLog".

  Hmm... how do I get the nice green border around my comment like Prakash got? 
 Do I 'Resign as reviewer'?

REVISION DETAIL
  https://reviews.facebook.net/D99


> Possible data loss when RS goes into GC pause while rolling HLog
> 
>
> Key: HBASE-2312
> URL: https://issues.apache.org/jira/browse/HBASE-2312
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.0
>Reporter: Karthik Ranganathan
>Assignee: Nicolas Spiegelberg
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: D99.1.patch, D99.2.patch, D99.3.patch
>
>
> There is a very narrow corner case where bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - the new one is not yet created, and the old 
> one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists the HLog files of RS #1 that it has to split as RS #1 is dead, and 
> starts splitting
> 4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and 
> appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/<server name> directory to 
> something else (say /hbase/.logs/<server name>-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS, so that a file 
> create fails if the directory doesn't exist. Dhruba tells me this is very 
> doable.
> 4) RS #1 comes back up and is not able to create the new HLog. It restarts 
> itself.
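
A sketch of the master-side rename in step 2 of that proposal. DeadLogDirRenamer is a hypothetical helper invented for illustration; only the FileSystem#rename call is an existing API, and the non-recursive create from step 3 is assumed to exist:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of step 2: before splitting, rename the dead RS's log directory so a
// GC-paused RS that wakes up cannot create a new HLog under the old path
// (assuming HDFS gains a create that fails when the parent directory is missing).
public final class DeadLogDirRenamer {
  public static Path renameForSplitting(FileSystem fs, Path rsLogDir) throws IOException {
    Path deadDir = new Path(rsLogDir.getParent(), rsLogDir.getName() + "-dead");
    if (!fs.rename(rsLogDir, deadDir)) {
      throw new IOException("Could not rename " + rsLogDir + " to " + deadDir);
    }
    return deadDir;   // the master splits the HLogs found under this renamed directory
  }

  private DeadLogDirRenamer() {
  }
}
{code}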





[jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

2011-11-01 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141943#comment-13141943
 ] 

Phabricator commented on HBASE-2312:


stack has commented on the revision "HBASE-2312 [jira] Possible data loss when 
RS goes into GC pause while rolling HLog".

  +1 on commit.  Minor comments that should not get in the way of this patch.  
Nicolas, attach the patch to JIRA and then submit it and let the 
patch build run its course.  Good stuff.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:208 
trivial nit -- not important: FYI, I prefer the operator at the end of the line 
rather than at the start; a hanging operator is useful as a signifier of more to 
come (did you format this with your Apache formatter?)
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:201 Next 
time, this messing w/ suffix belongs out in a method.
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:253 Should 
we throw only if "hbase.hlog.split.skip.errors" is NOT set?
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java:84
 These gymnastics belong out in a dedicated method rather than inline here in 
the init method.  Next time.

REVISION DETAIL
  https://reviews.facebook.net/D99


> Possible data loss when RS goes into GC pause while rolling HLog
> 
>
> Key: HBASE-2312
> URL: https://issues.apache.org/jira/browse/HBASE-2312
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.0
>Reporter: Karthik Ranganathan
>Assignee: Nicolas Spiegelberg
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: D99.1.patch, D99.2.patch, D99.3.patch
>
>
> There is a very narrow corner case where bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - the new one is not yet created, and the old 
> one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists the HLog files of RS #1 that it has to split as RS #1 is dead, and 
> starts splitting
> 4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and 
> appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/<server name> directory to 
> something else (say /hbase/.logs/<server name>-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS, so that a file 
> create fails if the directory doesn't exist. Dhruba tells me this is very 
> doable.
> 4) RS #1 comes back up and is not able to create the new HLog. It restarts 
> itself.





[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141941#comment-13141941
 ] 

Hadoop QA commented on HBASE-4577:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12501906/HBASE-4577_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -165 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
  org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/133//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/133//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/133//console

This message is automatically generated.

> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.
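
As an illustration of the "truncation somewhere" guess above, a toy example (not the actual RegionServer metrics code): where the byte-to-MB conversion happens changes the reported totals, which is enough to make the two metrics disagree by a few MB.

{code}
// Toy example, not the RegionServer metrics code: converting each store file to
// MB before summing truncates more than summing bytes first, so two totals
// computed differently can disagree even with no compression at all.
public class MbTruncationDemo {
  public static void main(String[] args) {
    long[] storeFileBytes = {1900000L, 1900000L};       // two ~1.9 MB files

    int perFileMbSum = 0;      // convert each file to MB, then add
    long totalBytes = 0;       // add bytes, convert once at the end
    for (long bytes : storeFileBytes) {
      perFileMbSum += (int) (bytes / 1024 / 1024);      // truncates to 1 MB per file
      totalBytes += bytes;
    }
    int mbOfTotal = (int) (totalBytes / 1024 / 1024);   // 3800000 bytes -> 3 MB

    System.out.println(perFileMbSum + " MB vs " + mbOfTotal + " MB");  // "2 MB vs 3 MB"
  }
}
{code}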





[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141939#comment-13141939
 ] 

ramkrishna.s.vasudevan commented on HBASE-4577:
---

+1 for commit

> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.





[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141937#comment-13141937
 ] 

stack commented on HBASE-4716:
--

Should the locks be accompanied by try/finally blocks so that they get cleaned up 
even in the face of exceptions?

And what if we take the alternate code path here:

{code}
 if (this.closed.get()) {
{code}

... will the locks be cleaned up?
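
The pattern being asked about, as a generic sketch with plain java.util.concurrent locks (not the HRegion bulk-load code itself): every acquired lock is released in a finally block, so the early return taken when closed is set, as well as any exception, still releases the lock.

{code}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch of the try/finally discipline: the lock is released on every
// exit path, including the early return taken when the region is already closed
// and any exception thrown by the bulk-load work.
public final class GuardedBulkLoad {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicBoolean closed = new AtomicBoolean(false);

  public boolean bulkLoad(Runnable work) {
    lock.readLock().lock();
    try {
      if (closed.get()) {
        return false;            // alternate code path: lock still released below
      }
      work.run();                // the actual bulk-load step
      return true;
    } finally {
      lock.readLock().unlock();  // cleaned up even in the face of exceptions
    }
  }
}
{code}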


> Improve locking for single column family bulk load
> --
>
> Key: HBASE-4716
> URL: https://issues.apache.org/jira/browse/HBASE-4716
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4716-v2.txt, 4716.txt
>
>
> HBASE-4552 changed the locking behavior for single column family bulk load, 
> namely we don't need to take the write lock.
> A read lock would suffice in this scenario.





[jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

2011-11-01 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141936#comment-13141936
 ] 

Phabricator commented on HBASE-2312:


khemani has accepted the revision "HBASE-2312 [jira] Possible data loss when RS 
goes into GC pause while rolling HLog".

  Looks good to me.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:201-205 In 
the original patch the -splitting wouldn't be stripped off over here.  With the 
current diff, a region-server which was earlier known to be dead will get 
another opportunity to be treated as alive. But I don't think that will happen 
and this is fine.

REVISION DETAIL
  https://reviews.facebook.net/D99


> Possible data loss when RS goes into GC pause while rolling HLog
> 
>
> Key: HBASE-2312
> URL: https://issues.apache.org/jira/browse/HBASE-2312
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.0
>Reporter: Karthik Ranganathan
>Assignee: Nicolas Spiegelberg
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: D99.1.patch, D99.2.patch, D99.3.patch
>
>
> There is a very narrow corner case where bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - the new one is not yet created, and the old 
> one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists the HLog files of RS #1 that it has to split as RS #1 is dead, and 
> starts splitting
> 4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and 
> appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/<server name> directory to 
> something else (say /hbase/.logs/<server name>-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS, so that a file 
> create fails if the directory doesn't exist. Dhruba tells me this is very 
> doable.
> 4) RS #1 comes back up and is not able to create the new HLog. It restarts 
> itself.





[jira] [Commented] (HBASE-4708) Revert safemode related pieces of hbase-4510

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141930#comment-13141930
 ] 

Hudson commented on HBASE-4708:
---

Integrated in HBase-TRUNK #2398 (See 
[https://builds.apache.org/job/HBase-TRUNK/2398/])
HBASE-4708 Revert safemode related pieces of hbase-4510

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java


> Revert safemode related pieces of hbase-4510
> 
>
> Key: HBASE-4708
> URL: https://issues.apache.org/jira/browse/HBASE-4708
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Harsh J
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4708-v2.txt, HBASE-4708.patch
>
>
> This thread in dev has us backing out the safemode related portions of 
> hbase-4510 commit: 
> http://search-hadoop.com/m/7WOjpVyG5F/Hmaster+can%2527t+start+for+the+latest+trunk+version&subj=Hmaster+can+t+start+for+the+latest+trunk+version





[jira] [Commented] (HBASE-4510) Check and workaround usage of internal HDFS APIs in HBase

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141929#comment-13141929
 ] 

Hudson commented on HBASE-4510:
---

Integrated in HBase-TRUNK #2398 (See 
[https://builds.apache.org/job/HBase-TRUNK/2398/])
HBASE-4708 Revert safemode related pieces of hbase-4510


> Check and workaround usage of internal HDFS APIs in HBase
> -
>
> Key: HBASE-4510
> URL: https://issues.apache.org/jira/browse/HBASE-4510
> Project: HBase
>  Issue Type: Task
>Affects Versions: 0.94.0
>Reporter: Harsh J
>Assignee: Harsh J
> Fix For: 0.92.0
>
> Attachments: HBASE-4510.patch
>
>
> HBase seemingly isn't compiling anymore on 0.23 after the HDFS-1620 naming 
> refactorings were carried out.
> Two solutions:
> * We use the new class names. This breaks HBase's backward compatibility with 
> older Hadoop releases (is that a concern for future releases?)
> * HBase gets its own set of constants, as the upstream ones are not marked for 
> public usage. This needs a little more maintenance on HBase's side.





[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141928#comment-13141928
 ] 

Hudson commented on HBASE-3680:
---

Integrated in HBase-TRUNK #2398 (See 
[https://builds.apache.org/job/HBase-TRUNK/2398/])
HBASE-3680 Publish more metrics about mslab; REVERT

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java


> Publish more metrics about mslab
> 
>
> Key: HBASE-3680
> URL: https://issues.apache.org/jira/browse/HBASE-3680
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
> Fix For: 0.92.0
>
> Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it 
> tends to OOME or send us into GC loops of death a lot more than it used to. 
> For example, one RS with mslab enabled and 7GB of heap died out of OOME this 
> afternoon; it had .55GB in the block cache and 2.03GB in the memstores which 
> doesn't account for much... but it could be that because of mslab a lot of 
> space was lost in those incomplete 2MB blocks and without metrics we can't 
> really tell. Compactions were running at the time of the OOME and I see block 
> cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even 
> take actions based on that (like force flushing).





[jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141927#comment-13141927
 ] 

Hudson commented on HBASE-3515:
---

Integrated in HBase-TRUNK #2398 (See 
[https://builds.apache.org/job/HBase-TRUNK/2398/])
HBASE-3515 [replication] ReplicationSource can miss a log after RS comes 
out of GC
   (fixing a missing handling of IOE in Replication)
HBASE-3515  [replication] ReplicationSource can miss a log after RS comes out 
of GC

jdcryans : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java

jdcryans : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515-v2.patch, 
> HBASE-3515.patch
>
>
> This is from Hudson build 1738: if a log is about to be rolled and the ZK 
> connection is already closed, then the replication code will fail at adding 
> the new log in ZK, but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log 
> roll.
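
A sketch of "let the exception out". ReplicationLogRegistrar and its ReplicationQueue interface are hypothetical stand-ins for the real ReplicationZookeeper calls; the point is only that the ZooKeeper failure is converted into an IOException so the roller can cancel the roll instead of rolling to a log that replication never heard about.

{code}
import java.io.IOException;
import org.apache.zookeeper.KeeperException;

// Sketch only: propagate the ZooKeeper failure so the caller can cancel the
// log roll and keep appending to the old, already-tracked log.
public final class ReplicationLogRegistrar {
  private final ReplicationQueue queue;   // hypothetical wrapper around the ZK writes

  public ReplicationLogRegistrar(ReplicationQueue queue) {
    this.queue = queue;
  }

  public void registerNewLog(String logName) throws IOException {
    try {
      queue.addLogToList(logName);        // hypothetical ZK write
    } catch (KeeperException e) {
      // Let the exception out: the roller catches IOException and cancels the roll.
      throw new IOException("Failed to add " + logName + " to the replication queue", e);
    }
  }

  /** Hypothetical interface standing in for the real ReplicationZookeeper API. */
  public interface ReplicationQueue {
    void addLogToList(String logName) throws KeeperException;
  }
}
{code}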





[jira] [Updated] (HBASE-4277) HRS.closeRegion should be able to close regions with only the encoded name

2011-11-01 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4277:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Integrated to 0.90.5. Thanks for your reviews Stack and Ted.

> HRS.closeRegion should be able to close regions with only the encoded name
> --
>
> Key: HBASE-4277
> URL: https://issues.apache.org/jira/browse/HBASE-4277
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: HBASE-4277_0.90.patch
>
>
> As suggested by Stack in HBASE-4217 creating a new issue to provide a patch 
> for 0.90.x version.
> We had some sort of an outage this morning due to a few racks losing power, 
> and some regions were left in the following state:
> ERROR: Region UNKNOWN_REGION on sv4r17s9:60020, 
> key=e32bbe1f48c9b3633c557dc0291b90a3, not on HDFS or in META but deployed on 
> sv4r17s9:60020
> That region was deleted by the master but the region server never got the 
> memo. Right now there's no way to force close it because HRS.closeRegion 
> requires an HRI and the only way to create one is to get it from .META. which 
> in our case doesn't contain a row for that region. Basically we have to wait 
> until that server is dead to get rid of the region and make hbck happy.
> The required change is to have closeRegion accept an encoded name in both HBA 
> (when the RS address is provided) and HRS, since it's able to find it anyway 
> from its list of live regions.
> bq.If a 0.90 version, we maybe should do that in another issue.





[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2011-11-01 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141914#comment-13141914
 ] 

Lars Hofhansl commented on HBASE-4583:
--

As discussed on rb (thanks Prakash and Jon), this is a bit more complicated than 
expected. Because append and increment are not idempotent, we have to produce 
serializable schedules 
(when we start with a value of 1 and add 1 in two separate increment 
operations, the end result has to be 3, regardless of how the two increment 
operations are interleaved).
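
A toy illustration of that interleaving (plain Java counters, nothing HBase-specific): both increments read the same start value before either writes, one update is lost, and the final result is 2 rather than 3.

{code}
import java.util.concurrent.CountDownLatch;

// Toy example: two "increments" that each read the current value and then write
// value + 1. The latch forces the bad interleaving where both read before either
// writes, so one update is lost and 1 incremented twice ends up as 2, not 3.
public class LostUpdateDemo {
  private static volatile long value = 1;

  public static void main(String[] args) throws InterruptedException {
    final CountDownLatch bothHaveRead = new CountDownLatch(2);
    Runnable increment = new Runnable() {
      public void run() {
        long read = value;                 // both threads read 1
        bothHaveRead.countDown();
        try {
          bothHaveRead.await();            // wait until the other thread has also read
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        value = read + 1;                  // both threads write 2
      }
    };

    Thread a = new Thread(increment);
    Thread b = new Thread(increment);
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println(value);             // prints 2; a serializable schedule gives 3
  }
}
{code}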

There are various ways to produce serializable schedules (pessimistic locking, 
optimistic locking with rechecking of preconditions, snapshot isolation, etc.), 
all of which will probably mean worse performance for both append and increment.

As said above, the current implementation syncs the WAL after the memstore is 
updated and the new values are visible to other threads, and after the locks 
are released. A failure to sync the WAL will leave uncommitted state in the 
memstore; in addition, other threads can see uncommitted state.

We can only safely make the changes available to other threads after (1) the 
WAL is committed (for these atomic, non-idempotent operations at least); at the 
same time we need to (2) retain the row lock until the rwcc is forwarded 
(otherwise two increment operations could come in and both see the same start 
value, leading to a non-serializable schedule)... So my current patch is no 
good (it leads to problem (2)).

(1) and (2) together mean that the WAL needs to be sync'ed with the row lock 
held (which would be quite a performance degradation).

Now, what we could do is use rwcc to make the changes to the CFs atomic, and 
still sync the WAL after all the locks are released (as we do now). With this 
compromise everything would be correct *unless* the syncing of the WAL fails (the 
scheme used for puts, i.e. release locks early, then forward rwcc, and roll 
back the applied values if the WAL sync fails, does not work here).


> Integrate RWCC with Append and Increment operations
> ---
>
> Key: HBASE-4583
> URL: https://issues.apache.org/jira/browse/HBASE-4583
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 4583-v2.txt, 4583-v3.txt, 4583-v4.txt, 4583.txt
>
>
> Currently Increment and Append operations do not work with RWCC and hence a 
> client could see the results of multiple such operations mixed in the same 
> Get/Scan.
> The semantics might be a bit more interesting here as upsert adds and removes 
> to and from the memstore.





[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141911#comment-13141911
 ] 

Ted Yu commented on HBASE-1744:
---

Integrated to TRUNK.

Thanks for the patch, Lars and Tim.

Thanks for the green light Stack.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new, cleaner Java 
> client.
> Thinking of ways to make a Thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, Python.
> Presumably the goal of a Thrift gateway is to be easy first.





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141909#comment-13141909
 ] 

stack commented on HBASE-4344:
--

+1 on committing v12 to TRUNK.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
> 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.





[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread gaojinchao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141904#comment-13141904
 ] 

gaojinchao commented on HBASE-4577:
---

Ok, Thanks.


> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.





[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141902#comment-13141902
 ] 

stack commented on HBASE-1744:
--

+1 on commit to TRUNK.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new, cleaner Java 
> client.
> Thinking of ways to make a Thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, Python.
> Presumably the goal of a Thrift gateway is to be easy first.





[jira] [Updated] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4577:
-

Status: Patch Available  (was: Open)

Patch looks good to me.  Submitting to the patch build. (In future, Gao, be careful 
to put spaces around operators, as in ' + ' rather than '+ '. Good stuff, Gao.)

> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.





[jira] [Commented] (HBASE-4723) Loads of NotAllMetaRegionsOnlineException traces when starting the master

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141898#comment-13141898
 ] 

stack commented on HBASE-4723:
--

+1, but change the log message on commit because it could be -ROOT- that is not yet 
online; it's not just a .META. issue.

The TestDistributedLogSplitting failure was because of 'too many open files'.  I 
added logging of the ulimit and host name to the build config so we can see which 
machine is without the right ulimit setting (because Giri said today he'd set them 
all to 16k).  Let's see.

> Loads of NotAllMetaRegionsOnlineException traces when starting the master
> -
>
> Key: HBASE-4723
> URL: https://issues.apache.org/jira/browse/HBASE-4723
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Minor
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4723.patch
>
>
> Minor annoyance, when starting a master I very often get 1 or more stack 
> traces like these:
> {quote}
> 2011-11-02 00:39:14,448 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Retrying
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (100ms)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:449)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:413)
>   at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:541)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
>   at java.lang.Thread.run(Thread.java:662)
> {quote}
> 1) it's not super clear what's going on (putting myself in a new user's head) 
> and 2) those exceptions look bad (until you see they are at INFO level).
> I'd just do a little cleanup: remove the stack trace, add a more meaningful 
> message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4577:
--

Attachment: HBASE-4577_trunk.patch

> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141882#comment-13141882
 ] 

jirapos...@reviews.apache.org commented on HBASE-4120:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1421/
---

(Updated 2011-11-02 03:32:00.146288)


Review request for hbase.


Changes
---

Fixed the test case's bugs; all tests passed in Maven.


Summary
---

This patch is for table priority alone. In this patch, not only can tables have 
different priorities, but different actions such as "get", "scan", "put" and 
"delete" can have priorities as well.


This addresses bug HBase-4120.
https://issues.apache.org/jira/browse/HBase-4120


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
 1189169 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
 1189169 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityHBaseServer.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1189169 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/allocation/test/TestForActionPriority.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/allocation/test/TestForPriorityJobQueue.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/allocation/test/TestForTablePriority.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/1421/diff


Testing
---

Tested with the test cases in TestCase_For_TablePriority_trunk_v1.patch. 
Please apply the patch from HBASE-4181 first; in some circumstances that bug will 
affect the performance of the client.


Thanks,

Jia



> isolation and allocation
> 
>
> Key: HBASE-4120
> URL: https://issues.apache.org/jira/browse/HBASE-4120
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver
>Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
>Reporter: Liu Jia
>Assignee: Liu Jia
> Fix For: 0.94.0
>
> Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
> Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
> HBase_isolation_and_allocation_user_guide.pdf, 
> Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch
>
>
> The HBase isolation and allocation tool is designed to help users manage 
> cluster resources among different applications and tables.
> When we have a large-scale HBase cluster with many applications running on 
> it, there will be lots of problems. In Taobao there is a cluster that many 
> departments use to test the performance of their applications, which are 
> based on HBase. With one cluster of 12 servers, only one application can run 
> on it exclusively at a time, and the other applications must wait until the 
> previous test has finished.
> After we add the allocation management function to the cluster, applications 
> can share the cluster and run concurrently. Also, if the test engineer wants 
> to make sure there is no interference, he/she can move other tables out of 
> the group.
> Within a group we use table priority to allocate resources: when the system 
> is busy, we can make sure high-priority tables are not affected by 
> lower-priority tables.
> Different groups can have different region server configurations; some groups 
> optimized for reading can have a large block cache size, and others optimized 
> for writing can have a large memstore size. 
> Tables and region servers can be moved easily between groups; after changing 
> the configuration, a group can be restarted alone instead of restarting the 
> whole cluster.
> git entry : https://github.com/ICT-Ope/HBase_allocation .
> We hope our work is helpful.
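To make the table-priority idea concrete, a small illustrative sketch of 
per-table priorities deciding the order in which queued requests are served. The 
class and field names below are invented for illustration and are not the 
PriorityJobQueue/PriorityFunction classes from the attached patch.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.PriorityBlockingQueue;

public class TablePrioritySketch {
  // Higher number = more important table; unknown tables get a default.
  static final Map<String, Integer> TABLE_PRIORITY = new HashMap<String, Integer>();
  static {
    TABLE_PRIORITY.put("online_orders", 10);
    TABLE_PRIORITY.put("offline_stats", 1);
  }

  static class QueuedCall implements Comparable<QueuedCall> {
    final String table;
    final String action;  // e.g. "get", "scan", "put", "delete"
    QueuedCall(String table, String action) {
      this.table = table;
      this.action = action;
    }
    int priority() {
      Integer p = TABLE_PRIORITY.get(table);
      return p == null ? 5 : p;
    }
    public int compareTo(QueuedCall other) {
      return Integer.compare(other.priority(), this.priority()); // highest first
    }
    public String toString() {
      return action + " on " + table;
    }
  }

  public static void main(String[] args) {
    PriorityBlockingQueue<QueuedCall> queue = new PriorityBlockingQueue<QueuedCall>();
    queue.add(new QueuedCall("offline_stats", "scan"));
    queue.add(new QueuedCall("online_orders", "get"));
    // When handlers are saturated, the high-priority table's call is served first.
    System.out.println(queue.poll()); // get on online_orders
  }
}
{code}

The per-action priorities mentioned in the review request would slot in the same 
way, by folding the action into the priority() computation.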

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4708) Revert safemode related pieces of hbase-4510

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141868#comment-13141868
 ] 

Hudson commented on HBASE-4708:
---

Integrated in HBase-0.92 #96 (See 
[https://builds.apache.org/job/HBase-0.92/96/])
HBASE-4708 Revert safemode related pieces of hbase-4510

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java


> Revert safemode related pieces of hbase-4510
> 
>
> Key: HBASE-4708
> URL: https://issues.apache.org/jira/browse/HBASE-4708
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Harsh J
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4708-v2.txt, HBASE-4708.patch
>
>
> This thread in dev has us backing out the safemode related portions of 
> hbase-4510 commit: 
> http://search-hadoop.com/m/7WOjpVyG5F/Hmaster+can%2527t+start+for+the+latest+trunk+version&subj=Hmaster+can+t+start+for+the+latest+trunk+version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141864#comment-13141864
 ] 

Hudson commented on HBASE-3515:
---

Integrated in HBase-0.92 #96 (See 
[https://builds.apache.org/job/HBase-0.92/96/])
HBASE-3515  [replication] ReplicationSource can miss a log after RS comes 
out of GC

jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515-v2.patch, 
> HBASE-3515.patch
>
>
> This is from Hudson build 1738, if a log is about to be rolled and the ZK 
> connection is already closed then the replication code will fail at adding 
> the new log in ZK but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be let the exception out and cancel the log 
> roll.
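A rough sketch of the "let the exception out and cancel the log roll" idea, in 
plain Java; the interface and method names here are invented for illustration and 
are not the actual HLog/replication listener API.

{code}
import java.io.IOException;

public class LogRollSketch {
  interface ReplicationListener {
    // Registers the new WAL path in ZooKeeper; may fail on connection loss.
    void preLogRoll(String newWalPath) throws IOException;
  }

  private final ReplicationListener replication;
  private String currentWal = "wal.0001";

  LogRollSketch(ReplicationListener replication) {
    this.replication = replication;
  }

  /** Rolls the WAL only if replication successfully tracked the new file. */
  public synchronized void rollWriter(String newWalPath) throws IOException {
    // If this call fails and the failure is swallowed, the roll still happens and
    // edits written to newWalPath are invisible to the ReplicationSource.
    replication.preLogRoll(newWalPath);  // letting this throw cancels the roll
    currentWal = newWalPath;             // only switch once ZK knows about the file
  }
}
{code}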

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4711) Remove jsr jar; not needed

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141867#comment-13141867
 ] 

Hudson commented on HBASE-4711:
---

Integrated in HBase-0.92 #96 (See 
[https://builds.apache.org/job/HBase-0.92/96/])
HBASE-4711 Remove jsr jar; not needed

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml


> Remove jsr jar; not needed
> --
>
> Key: HBASE-4711
> URL: https://issues.apache.org/jira/browse/HBASE-4711
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: jsr.txt
>
>
> From Kan, jsr classes are in the jersey core jar.  I tried a build with it 
> removed and all tests pass.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141865#comment-13141865
 ] 

Hudson commented on HBASE-4611:
---

Integrated in HBase-0.92 #96 (See 
[https://builds.apache.org/job/HBase-0.92/96/])
HBASE-4611 Add support for Phabricator/Differential as an alternative code 
review tool

nspiegelberg : 
Files : 
* /hbase/branches/0.92/.arcconfig
* /hbase/branches/0.92/.gitignore
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml


> Add support for Phabricator/Differential as an alternative code review tool
> ---
>
> Key: HBASE-4611
> URL: https://issues.apache.org/jira/browse/HBASE-4611
> Project: HBase
>  Issue Type: Task
>Reporter: Jonathan Gray
>Assignee: Nicolas Spiegelberg
> Fix For: 0.92.0, 0.94.0
>
> Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, 
> D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, 
> D207.1.patch, D21.1.patch, D21.1.patch
>
>
> From http://phabricator.org/ : "Phabricator is an open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4510) Check and workaround usage of internal HDFS APIs in HBase

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141866#comment-13141866
 ] 

Hudson commented on HBASE-4510:
---

Integrated in HBase-0.92 #96 (See 
[https://builds.apache.org/job/HBase-0.92/96/])
HBASE-4708 Revert safemode related pieces of hbase-4510


> Check and workaround usage of internal HDFS APIs in HBase
> -
>
> Key: HBASE-4510
> URL: https://issues.apache.org/jira/browse/HBASE-4510
> Project: HBase
>  Issue Type: Task
>Affects Versions: 0.94.0
>Reporter: Harsh J
>Assignee: Harsh J
> Fix For: 0.92.0
>
> Attachments: HBASE-4510.patch
>
>
> HBase isn't seemingly compiling anymore on 0.23 after the HDFS-1620 naming 
> refactorings were carried out.
> Two solutions:
> * We use new classnames. This breaks HBase's backward compatibility with 
> older Hadoop releases (is that a concern with future releases?)
> * HBase gets its own sets of constants as the upstream one is not marked for 
> public usage. This needs a little more maintenance on HBases' side.
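A minimal sketch of the second option (HBase keeping its own constants instead of 
referencing Hadoop's internal holder class); the two key names below are examples 
only, not a verified list of what HBase would need to copy.

{code}
public final class LocalHdfsConfigKeys {
  private LocalHdfsConfigKeys() {}

  // Duplicated locally so a rename of the upstream constants class does not break
  // compilation; the configuration key strings themselves stay stable.
  public static final String DFS_SUPPORT_APPEND_KEY = "dfs.support.append";
  public static final String DFS_SOCKET_TIMEOUT_KEY = "dfs.socket.timeout";
}
{code}

Call sites would then read the keys through these local constants without importing 
anything from the HDFS internals, which is exactly the extra maintenance the 
description mentions.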

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4723) Loads of NotAllMetaRegionsOnlineException traces when starting the master

2011-11-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141861#comment-13141861
 ] 

Hadoop QA commented on HBASE-4723:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501885/HBASE-4723.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -165 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/132//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/132//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/132//console

This message is automatically generated.

> Loads of NotAllMetaRegionsOnlineException traces when starting the master
> -
>
> Key: HBASE-4723
> URL: https://issues.apache.org/jira/browse/HBASE-4723
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Minor
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4723.patch
>
>
> Minor annoyance, when starting a master I very often get 1 or more stack 
> traces like these:
> {quote}
> 2011-11-02 00:39:14,448 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Retrying
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (100ms)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:449)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:413)
>   at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:541)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
>   at java.lang.Thread.run(Thread.java:662)
> {quote}
> 1) it's not super clear what's going on (putting myself in a new user's head) 
> and 2) those exceptions look bad (until you see they are at INFO level).
> I'd just do a little cleanup: remove the stack trace, add a more meaningful 
> message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141860#comment-13141860
 ] 

Ted Yu commented on HBASE-1744:
---

Failed tests were caused by 'Too many open files'.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc.. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141855#comment-13141855
 ] 

Hadoop QA commented on HBASE-1744:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501883/HBASE-1744.11.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -165 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 40 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/131//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/131//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/131//console

This message is automatically generated.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc.. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

2011-11-01 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141851#comment-13141851
 ] 

Todd Lipcon commented on HBASE-4717:


Nice idea - that's probably useful outside of this use case, too. Another idea 
is maintaining time-range histograms for each storefile to estimate whether 
it's worth doing a "filtration".

> More efficient age-off of old data during major compaction
> --
>
> Key: HBASE-4717
> URL: https://issues.apache.org/jira/browse/HBASE-4717
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Todd Lipcon
>
> Many applications need to implement efficient age-off of old data. We 
> currently only perform age-off during major compaction by scanning through 
> all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store 
> files contain only smaller finite ranges of time.
> - Periodically run an "age-off compaction". This compaction would scan the 
> current list of storefiles. Any store file that falls entirely out of the TTL 
> time range would be dropped. Store files completely within the time range 
> would be un-altered. Those crossing the time-range boundary could either be 
> left alone or compacted using the existing compaction code.
> I don't have a design in mind for how exactly this would be implemented, but 
> hope to generate some discussion.
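To make the proposal concrete, a sketch of the selection step with an invented 
StoreFileInfo holder; a real implementation would read the min/max timestamps from 
store file metadata and feed the decisions into the compaction machinery rather 
than returning a list.

{code}
import java.util.ArrayList;
import java.util.List;

public class AgeOffSelectionSketch {
  static class StoreFileInfo {
    final String path;
    final long minTs, maxTs;  // time range covered by the file's KVs
    StoreFileInfo(String path, long minTs, long maxTs) {
      this.path = path;
      this.minTs = minTs;
      this.maxTs = maxTs;
    }
  }

  /** Files whose entire time range is older than the TTL can be dropped outright. */
  static List<StoreFileInfo> selectExpired(List<StoreFileInfo> files, long ttlMs, long now) {
    long cutoff = now - ttlMs;
    List<StoreFileInfo> expired = new ArrayList<StoreFileInfo>();
    for (StoreFileInfo f : files) {
      if (f.maxTs < cutoff) {
        expired.add(f);            // drop the whole file, no KV-by-KV scan needed
      } else if (f.minTs >= cutoff) {
        // entirely inside the TTL window: leave untouched
      } else {
        // straddles the cutoff: leave alone, or hand to the normal compaction path
      }
    }
    return expired;
  }
}
{code}

Keeping hbase.hstore.compaction.max.size small, as suggested, is what makes the 
first branch fire often enough to matter.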

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-01 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141849#comment-13141849
 ] 

Gary Helmling commented on HBASE-4518:
--

I just tried and am still able to trigger failures locally as well:
{noformat}
Running org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.045 sec <<< 
FAILURE!

Results :

Failed tests:   
testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
Results should contain region 
test,bbb,1320200292803.20c8a68e32824178b128060aff59386d. for row 'bbb'
{noformat}

I'll dig into it tonight.
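For reference, one generic way to harden this kind of assertion against transient 
region movement is to retry the verification for a bounded time instead of 
asserting once; this is just a sketch of that pattern, not the eventual fix for 
this test.

{code}
public class RetryingAssertSketch {
  interface Check {
    void run() throws Exception;  // throws AssertionError or Exception on failure
  }

  /** Re-runs the check until it passes or the timeout expires. */
  static void eventually(long timeoutMs, long intervalMs, Check check) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        check.run();
        return;                                  // verification finally passed
      } catch (AssertionError e) {
        if (System.currentTimeMillis() > deadline) {
          throw e;                               // give up and fail the test
        }
        Thread.sleep(intervalMs);                // regions may still be in transition
      }
    }
  }
}
{code}

Wrapping the per-region verification in something like eventually(30000, 500, ...) 
would then replace the one-shot assertion.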

> TestServerCustomProtocol is flaky
> -
>
> Key: HBASE-4518
> URL: https://issues.apache.org/jira/browse/HBASE-4518
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.92.0
>Reporter: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt
>
>
> TestServerCustomProtocol has been showing some intermittent failures in 
> Jenkins due to what looks like region transitions.
> Here is the most recent failure:
> {noformat}
> Results :
> Failed tests:   
> testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
> Results should contain region 
> test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

Priority: Critical  (was: Major)

> Improve locking for single column family bulk load
> --
>
> Key: HBASE-4716
> URL: https://issues.apache.org/jira/browse/HBASE-4716
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4716-v2.txt, 4716.txt
>
>
> HBASE-4552 changed the locking behavior for single column family bulk load, 
> namely we don't need to take write lock.
> A read lock would suffice in this scenario.
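A sketch of the locking idea in the description (not the actual HRegion code): a 
bulk load that touches several column families keeps the exclusive lock so the 
multi-family load stays atomic, while a single-family load runs under the shared 
read lock.

{code}
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BulkLoadLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void bulkLoad(Set<String> families) {
    // Only the multi-family case needs exclusivity; the single-family case can
    // share the lock, which is the point of this issue.
    Lock l = families.size() > 1 ? lock.writeLock() : lock.readLock();
    l.lock();
    try {
      // ... move the HFiles for each family into place ...
    } finally {
      l.unlock();
    }
  }
}
{code}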

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

Description: 
HBASE-4552 changed the locking behavior for single column family bulk load, 
namely we don't need to take write lock.
A read lock would suffice in this scenario.

  was:
HBASE-4552 changed the locking behavior for single column family bulk load, 
namely we don't need to take write lock.
A read lock would suffice.


> Improve locking for single column family bulk load
> --
>
> Key: HBASE-4716
> URL: https://issues.apache.org/jira/browse/HBASE-4716
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4716-v2.txt, 4716.txt
>
>
> HBASE-4552 changed the locking behavior for single column family bulk load, 
> namely we don't need to take write lock.
> A read lock would suffice in this scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4518:
--

Attachment: 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt

> TestServerCustomProtocol is flaky
> -
>
> Key: HBASE-4518
> URL: https://issues.apache.org/jira/browse/HBASE-4518
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.92.0
>Reporter: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt
>
>
> TestServerCustomProtocol has been showing some intermittent failures in 
> Jenkins due to what looks like region transitions.
> Here is the most recent failure:
> {noformat}
> Results :
> Failed tests:   
> testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
> Results should contain region 
> test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4518:
--

Attachment: 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt

> TestServerCustomProtocol is flaky
> -
>
> Key: HBASE-4518
> URL: https://issues.apache.org/jira/browse/HBASE-4518
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.92.0
>Reporter: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt
>
>
> TestServerCustomProtocol has been showing some intermittent failures in 
> Jenkins due to what looks like region transitions.
> Here is the most recent failure:
> {noformat}
> Results :
> Failed tests:   
> testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
> Results should contain region 
> test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141837#comment-13141837
 ] 

Ted Yu commented on HBASE-4518:
---

I got the following when running test suite for latest TRUNK:
{code}
testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol)  
Time elapsed: 0.057 sec  <<< FAILURE!
java.lang.AssertionError: Results should contain region 
test,ccc,1320196026392.a22c262d449fa04ca4beeeb78afaf650. for row 'ccc'
  at org.junit.Assert.fail(Assert.java:91)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.verifyRegionResults(TestServerCustomProtocol.java:328)
  at 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.verifyRegionResults(TestServerCustomProtocol.java:320)
  at 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.testRowRange(TestServerCustomProtocol.java:220)
{code}

> TestServerCustomProtocol is flaky
> -
>
> Key: HBASE-4518
> URL: https://issues.apache.org/jira/browse/HBASE-4518
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.92.0
>Reporter: Gary Helmling
> Fix For: 0.92.0
>
>
> TestServerCustomProtocol has been showing some intermittent failures in 
> Jenkins due to what looks like region transitions.
> Here is the most recent failure:
> {noformat}
> Results :
> Failed tests:   
> testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
> Results should contain region 
> test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141822#comment-13141822
 ] 

Hudson commented on HBASE-4695:
---

Integrated in HBase-0.92 #95 (See 
[https://builds.apache.org/job/HBase-0.92/95/])
HBASE-4695  WAL logs get deleted before region server can fully flush

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> WAL logs get deleted before region server can fully flush
> -
>
> Key: HBASE-4695
> URL: https://issues.apache.org/jira/browse/HBASE-4695
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.90.4
>Reporter: jack levin
>Assignee: gaojinchao
>Priority: Blocker
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4695_Branch90_V2.patch, HBASE-4695_Trunk_V2.patch, 
> HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt
>
>
> To replicate the problem do the following:
> 1. Check the /hbase/.logs/ directory to see if you have WAL logs for the 
> region server you are shutting down.
> 2. Execute kill <pid> (where pid is a regionserver pid).
> 3. Watch the regionserver log as it starts flushing; you will see how many 
> regions are left to flush:
> 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 489 regions to close
> 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 116 regions to close
> 4. Check /hbase/.logs/ -- you will notice that it has disappeared.
> 5. Check the namenode logs:
> 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: 
> ugi=root ip=/10.101.1.5 cmd=delete 
> src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
> Note that if you kill -9 the RS now, and it crashes on flush, you won't have 
> any WAL logs to replay.  We need to make sure that logs are deleted or moved 
> out only when the RS has fully flushed. Otherwise it's possible to lose data.
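The safety condition implied by the description can be phrased as: a WAL file may 
be archived or deleted only when every region still open on the server has flushed 
past the last edit contained in that file. A sketch with invented names (this is 
not the attached patch):

{code}
import java.util.Map;

public class WalArchiveGuardSketch {
  /**
   * @param walMaxSeqId       highest sequence id written to the candidate WAL file
   * @param lastFlushedSeqIds per-region highest sequence id known to be durable in HFiles
   * @return true only if deleting the WAL cannot lose an unflushed edit
   */
  static boolean safeToArchive(long walMaxSeqId, Map<String, Long> lastFlushedSeqIds) {
    for (Map.Entry<String, Long> e : lastFlushedSeqIds.entrySet()) {
      if (e.getValue() < walMaxSeqId) {
        // Some region may still have edits from this WAL only in its memstore.
        return false;
      }
    }
    return true;
  }
}
{code}

This is deliberately conservative: it keeps the log around for the whole shutdown 
flush described in steps 3 and 4 above.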

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141815#comment-13141815
 ] 

Ted Yu commented on HBASE-4344:
---

I agree we should get the patch into TRUNK.

@Amit:
Can you add variable-length-for-compactions into patch v12 and submit for 
PreCommit build ?

Thanks

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
> 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to the disk.
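Below is a toy illustration of the idea: write the memstoreTS right next to each 
key-value so that reads from disk can apply the same read point as reads from the 
memstore. The actual HFile-level encoding is whatever the attached patch defines, 
not this sketch.

{code}
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableUtils;

public class MemstoreTsPersistSketch {
  static void writeCell(DataOutput out, byte[] kvBytes, long memstoreTS) throws IOException {
    out.writeInt(kvBytes.length);
    out.write(kvBytes);                        // the key-value bytes as written today
    WritableUtils.writeVLong(out, memstoreTS); // extra field: variable-length memstoreTS
  }
}
{code}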

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4723) Loads of NotAllMetaRegionsOnlineException traces when starting the master

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4723:
--

Attachment: HBASE-4723.patch

Extremely simple patch, it looks like this when running:

bq. 2011-11-02 00:50:55,981 INFO 
org.apache.hadoop.hbase.catalog.CatalogTracker: .META. still not available, 
sleeping and retrying. Reason: Timed out (100ms)
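The shape of the change, as a sketch (not the patch itself): keep retrying as 
before, but log one INFO line carrying the exception's message instead of printing 
the full stack trace on every attempt.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class QuietRetrySketch {
  private static final Log LOG = LogFactory.getLog(QuietRetrySketch.class);

  interface MetaCheck {
    void waitForMeta(long timeoutMs) throws Exception;
  }

  static void waitUntilMetaOnline(MetaCheck tracker) throws InterruptedException {
    while (true) {
      try {
        tracker.waitForMeta(100);
        return;                                  // .META. located, done
      } catch (InterruptedException ie) {
        throw ie;                                // don't swallow interrupts
      } catch (Exception e) {                    // e.g. NotAllMetaRegionsOnlineException
        LOG.info(".META. still not available, sleeping and retrying. Reason: "
            + e.getMessage());                   // one line, no stack trace
        Thread.sleep(100);
      }
    }
  }
}
{code}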

> Loads of NotAllMetaRegionsOnlineException traces when starting the master
> -
>
> Key: HBASE-4723
> URL: https://issues.apache.org/jira/browse/HBASE-4723
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Minor
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4723.patch
>
>
> Minor annoyance, when starting a master I very often get 1 or more stack 
> traces like these:
> {quote}
> 2011-11-02 00:39:14,448 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Retrying
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (100ms)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:449)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:413)
>   at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:541)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
>   at java.lang.Thread.run(Thread.java:662)
> {quote}
> 1) it's not super clear what's going on (putting myself in a new user's head) 
> and 2) those exceptions look bad (until you see they are at INFO level).
> I'd just do a little cleanup: remove the stack trace, add a more meaningful 
> message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4723) Loads of NotAllMetaRegionsOnlineException traces when starting the master

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4723:
--

Status: Patch Available  (was: Open)

> Loads of NotAllMetaRegionsOnlineException traces when starting the master
> -
>
> Key: HBASE-4723
> URL: https://issues.apache.org/jira/browse/HBASE-4723
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Minor
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4723.patch
>
>
> Minor annoyance, when starting a master I very often get 1 or more stack 
> traces like these:
> {quote}
> 2011-11-02 00:39:14,448 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Retrying
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (100ms)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:449)
>   at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:413)
>   at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:541)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
>   at java.lang.Thread.run(Thread.java:662)
> {quote}
> 1) it's not super clear what's going on (putting myself in a new user's head) 
> and 2) those exceptions look bad (until you see they are at INFO level).
> I'd just do a little cleanup: remove the stack trace, add a more meaningful 
> message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Tim Sell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sell updated HBASE-1744:


Status: Open  (was: Patch Available)

cancelling existing patch

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc.. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-11-01 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141807#comment-13141807
 ] 

Nicolas Spiegelberg commented on HBASE-4344:


What is the problem holding v12 back from commit?  I think we can check in this JIRA 
and wait to commit HBASE-4485 until the deadlock issue is solved.  In general, 
can we get as many of the HBASE-2856 JIRAs as possible committed without 
introducing any regressions?  I don't think we need to solve the HBASE-2856 issue 
completely before we can commit anything.  Forward progress without steps back is 
sufficient.  Having this large array of uncommitted patch files is very 
unwieldy for patch management.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
> 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to the disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Tim Sell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sell updated HBASE-1744:


Status: Patch Available  (was: Open)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc.. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Tim Sell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sell updated HBASE-1744:


Attachment: HBASE-1744.11.patch

Added a patch with a fix so the tests build against current trunk (also now using 
the --no-prefix option in git diff).

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
> HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
> HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
> HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc.. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates more verbose rpc than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4723) Loads of NotAllMetaRegionsOnlineException traces when starting the master

2011-11-01 Thread Jean-Daniel Cryans (Created) (JIRA)
Loads of NotAllMetaRegionsOnlineException traces when starting the master
-

 Key: HBASE-4723
 URL: https://issues.apache.org/jira/browse/HBASE-4723
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.92.0, 0.90.5


Minor annoyance, when starting a master I very often get 1 or more stack traces 
like these:

{quote}
2011-11-02 00:39:14,448 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
Retrying
org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (100ms)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:449)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:413)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:541)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
at java.lang.Thread.run(Thread.java:662)
{quote}

1) it's not super clear what's going on (putting myself in a new user's head) 
and 2) those exceptions look bad (until you see they are at INFO level).

I'd just do a little cleanup: remove the stack trace, add a more meaningful 
message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

2011-11-01 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141801#comment-13141801
 ] 

dhruba borthakur commented on HBASE-4717:
-

Is it possible to look at the blooms (which are mostly in the block cache) for two 
HFiles, estimate how much overlap there is between kvs, and then decide whether 
to compact/merge those two files?

> More efficient age-off of old data during major compaction
> --
>
> Key: HBASE-4717
> URL: https://issues.apache.org/jira/browse/HBASE-4717
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Todd Lipcon
>
> Many applications need to implement efficient age-off of old data. We 
> currently only perform age-off during major compaction by scanning through 
> all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store 
> files contain only smaller finite ranges of time.
> - Periodically run an "age-off compaction". This compaction would scan the 
> current list of storefiles. Any store file that falls entirely out of the TTL 
> time range would be dropped. Store files completely within the time range 
> would be un-altered. Those crossing the time-range boundary could either be 
> left alone or compacted using the existing compaction code.
> I don't have a design in mind for how exactly this would be implemented, but 
> hope to generate some discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141799#comment-13141799
 ] 

Ted Yu commented on HBASE-1744:
---

@Tim:
Do you want to take care of the following ?
{code}
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java:[108,28]
 unreported exception java.lang.Exception; must be caught or declared to be 
thrown
{code}

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Status: Patch Available  (was: Open)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Status: Open  (was: Patch Available)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4343) Get the TestAcidGuarantee unit test to fail consistently

2011-11-01 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141797#comment-13141797
 ] 

Nicolas Spiegelberg commented on HBASE-4343:


@Amit: can we add @Ignore to this test and commit?  Trying to get as much of 
this JIRA checked in as possible without causing any regression.
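
For reference, the change being proposed is just the JUnit annotation below; the class and method names are illustrative, not the real test:

{code}
import org.junit.Ignore;
import org.junit.Test;

// Disabling a known-bad test while keeping it in the tree; JUnit reports it
// as skipped rather than failed.
public class IgnoredTestExample {

  @Ignore("Known to fail until the underlying fix lands; re-enable then")
  @Test
  public void testAtomicity() {
    // test body unchanged
  }
}
{code}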

> Get the TestAcidGuarantee unit test to fail consistently
> 
>
> Key: HBASE-4343
> URL: https://issues.apache.org/jira/browse/HBASE-4343
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
>Priority: Minor
> Fix For: 0.89.20100924
>
> Attachments: patch-1
>
>
> We know that TestAcidGuarantee is broken in the current trunk. However, the 
> unit-test passes more often than not.
> In order to test out the solution we need to get it to fail consistently. 
> This patch may not be committed/turned in, but it is
> required to test/accept the fix to HBASE-2856.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141793#comment-13141793
 ] 

Hadoop QA commented on HBASE-1744:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501880/1744-trunk.10
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -165 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 40 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/130//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/130//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/130//console

This message is automatically generated.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Attachment: 1744-trunk.10

Previous patch testing was blocked by broken TRUNK builds.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Attachment: (was: 1744-trunk.10)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4657) Improve the efficiency of our MR jobs with a few configurations

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4657:
--

Fix Version/s: (was: 0.92.0)
   0.94.0

Bumping to 0.94, nice to have but not critical.

> Improve the efficiency of our MR jobs with a few configurations
> ---
>
> Key: HBASE-4657
> URL: https://issues.apache.org/jira/browse/HBASE-4657
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
> Fix For: 0.94.0
>
>
> This is low-hanging fruit: some of our MR jobs, like RowCounter and 
> CopyTable, don't even call setCacheBlocks on the scan object, which out of the 
> box completely screws up a running system. Another thing would be to disable 
> speculative execution.
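
A minimal sketch of those two changes in a scan-driven job setup, assuming the usual TableMapReduceUtil wiring; the mapper and job name are placeholders, and the speculative-execution keys are the old MR1 property names:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: a scan-backed MR job that avoids polluting the block cache and
// turns off speculative execution so the cluster is not scanned twice.
public class ScanJobSetup {

  static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
      // per-row work would go here
    }
  }

  public static Job createJob(Configuration conf, String table) throws Exception {
    Scan scan = new Scan();
    scan.setCaching(500);        // batch rows per RPC instead of one row per round trip
    scan.setCacheBlocks(false);  // a full scan should not churn the RS block cache

    Job job = new Job(conf, "scan-" + table);
    // Speculative tasks only re-scan the same regions and double the load.
    job.getConfiguration().setBoolean("mapred.map.tasks.speculative.execution", false);
    job.getConfiguration().setBoolean("mapred.reduce.tasks.speculative.execution", false);

    TableMapReduceUtil.initTableMapperJob(table, scan, NoOpMapper.class,
        NullWritable.class, NullWritable.class, job);
    return job;
  }
}
{code}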

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Status: Open  (was: Patch Available)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Status: Patch Available  (was: Open)

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new cleaner java 
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates more verbose rpc  than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into and 
> still be intuitive from say python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4722:
-

Priority: Critical  (was: Major)

> TestGlobalMemStoreSize has started failing
> --
>
> Key: HBASE-4722
> URL: https://issues.apache.org/jira/browse/HBASE-4722
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Critical
> Attachments: logging.txt
>
>
> I'm digging in.  It fails occasionally for me locally too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4722:
-

Attachment: logging.txt

A bit of logging to help debug what's going on.

> TestGlobalMemStoreSize has started failing
> --
>
> Key: HBASE-4722
> URL: https://issues.apache.org/jira/browse/HBASE-4722
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Attachments: logging.txt
>
>
> I'm digging in.  It fails occasionally for me locally too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141783#comment-13141783
 ] 

stack commented on HBASE-4703:
--

TestAdmin passes for me.  Do I need to run it a few times?

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for 
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start, 
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in 
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, 
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, 
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL to false when the 'put' is big 
> or in a loop. Not done for tests that rely on the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on 
> tearDown, which makes it impossible to run in parallel with another test using 
> this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15 
> seconds to make it a part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use local admin, single 
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one 
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-01 Thread stack (Created) (JIRA)
TestGlobalMemStoreSize has started failing
--

 Key: HBASE-4722
 URL: https://issues.apache.org/jira/browse/HBASE-4722
 Project: HBase
  Issue Type: Bug
Reporter: stack


I'm digging in.  It fails occasionally for me locally too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4470:
--

Priority: Critical  (was: Blocker)

Downgrading to critical after the discussion with Stack.

> ServerNotRunningException coming out of assignRootAndMeta kills the Master
> --
>
> Key: HBASE-4470
> URL: https://issues.apache.org/jira/browse/HBASE-4470
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.90.5
>
>
> I'm surprised we still have issues like that and I didn't get a hit while 
> googling so forgive me if there's already a jira about it.
> When the master starts it verifies the locations of root and meta before 
> assigning them; if the server is started but not running you'll get this:
> {quote}
> 2011-09-23 04:47:44,859 WARN 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> RemoteException connecting to RS
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
> yet
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> at $Proxy6.getProtocolVersion(Unknown Source)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
> at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
> {quote}
> I hit that 3-4 times this week while debugging something else. The worst is 
> that when you restart the master it sees that as a failover, but none of the 
> regions are assigned so it takes an eternity to get back fully online.
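
The shape of the fix being discussed is essentially "treat a not-yet-serving region server as retryable instead of fatal". A generic sketch under that assumption, with minimal stand-ins rather than the real HBase classes; the committed fix may look different:

{code}
// Sketch only: retry verifying the meta location while the RS answers
// "not running yet" instead of letting the exception kill the master.
public class RetryMetaVerification {

  interface CatalogTracker {
    boolean verifyMetaRegionLocation(long timeoutMillis) throws Exception;
  }

  static class ServerNotRunningException extends Exception {
    private static final long serialVersionUID = 1L;
  }

  static boolean verifyWithRetries(CatalogTracker tracker, int maxAttempts,
      long timeoutMillis, long pauseMillis) throws Exception {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        if (tracker.verifyMetaRegionLocation(timeoutMillis)) {
          return true;
        }
      } catch (ServerNotRunningException e) {
        // The RS process is up but not serving RPCs yet: wait instead of aborting.
      }
      Thread.sleep(pauseMillis);
    }
    return false;
  }
}
{code}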

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-01 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141776#comment-13141776
 ] 

Jean-Daniel Cryans commented on HBASE-4577:
---

Looks good to me after some cleanup.

> Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
> -
>
> Key: HBASE-4577
> URL: https://issues.apache.org/jira/browse/HBASE-4577
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4577_trial_Trunk.patch
>
>
> Minor issue while looking at the RS metrics:
> bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
> storefileSizeMB=2420, compressionRatio=1.0008
> I guess there's a truncation somewhere when it's adding the numbers up.
> FWIW there's no compression on that table.
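
To see how a truncation while "adding the numbers up" can skew one metric relative to the other, here is a toy example with made-up sizes (not the actual metrics code): if one total is accumulated in whole MB per file and the other from bytes, they diverge.

{code}
// Toy illustration: summing per-file sizes after integer division by 1MB loses
// up to almost 1MB per file, while converting the byte total once does not.
public class TruncationDemo {
  private static final long MB = 1024 * 1024;

  public static void main(String[] args) {
    long[] fileBytes = { 1500000, 1500000, 1500000, 1500000 }; // made-up store file sizes

    long perFileMbSum = 0;
    long byteTotal = 0;
    for (long b : fileBytes) {
      perFileMbSum += b / MB;   // 1500000 / 1048576 truncates to 1
      byteTotal += b;
    }
    System.out.println("sum of per-file MB = " + perFileMbSum);    // 4
    System.out.println("total bytes in MB  = " + byteTotal / MB);  // 5
  }
}
{code}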

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141777#comment-13141777
 ] 

Ted Yu commented on HBASE-3939:
---

From https://builds.apache.org/job/PreCommit-HBASE-Build/129//testReport/
{code}
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestDelayedRpc.java:[163,17]
 org.apache.hadoop.hbase.ipc.TestDelayedRpc.TestRpcImpl is not abstract and 
does not override abstract method 
getProtocolSignature(java.lang.String,long,int) in 
org.apache.hadoop.hbase.ipc.VersionedProtocol
[ERROR] 
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestDelayedRpc.java:[266,17]
 org.apache.hadoop.hbase.ipc.TestDelayedRpc.FaultyTestRpc is not abstract and 
does not override abstract method 
getProtocolSignature(java.lang.String,long,int) in 
org.apache.hadoop.hbase.ipc.VersionedProtocol
{code}
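
The compile errors just mean the test's fake protocol implementations need the new method from the crossported interface. A self-contained sketch of what that override looks like, with simplified stand-ins for the real VersionedProtocol/ProtocolSignature types (the actual fix goes on TestRpcImpl and FaultyTestRpc in TestDelayedRpc):

{code}
import java.io.IOException;

// Illustration only: VersionedProtocol and ProtocolSignature below are
// simplified stand-ins for the HBase ipc types named in the error.
public class ProtocolSignatureExample {

  interface VersionedProtocol {
    long getProtocolVersion(String protocol, long clientVersion) throws IOException;
    ProtocolSignature getProtocolSignature(String protocol, long clientVersion,
        int clientMethodsHash) throws IOException;
  }

  static class ProtocolSignature {
    final long version;
    final int[] methodHashes;   // null: no per-method hashes advertised

    ProtocolSignature(long version, int[] methodHashes) {
      this.version = version;
      this.methodHashes = methodHashes;
    }
  }

  static class TestRpcImplSketch implements VersionedProtocol {
    static final long VERSION = 1L;

    @Override
    public long getProtocolVersion(String protocol, long clientVersion) {
      return VERSION;
    }

    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion,
        int clientMethodsHash) {
      return new ProtocolSignature(VERSION, null);
    }
  }
}
{code}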

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread Jean-Daniel Cryans (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3515.
---

Resolution: Fixed

Committed to 0.92 and trunk, thanks for looking at my patch Stack!

> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515-v2.patch, 
> HBASE-3515.patch
>
>
> This is from Hudson build 1738: if a log is about to be rolled and the ZK 
> connection is already closed, then the replication code will fail to add 
> the new log in ZK, but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log 
> roll.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141772#comment-13141772
 ] 

stack commented on HBASE-4703:
--

If '(client = 28, server = 29)' is not just an issue in your environment, some rpc 
version setting got messed up (trying it here).

Are you saying that we are stuck at:

{code}
...
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
{code}

... and we never let go?  That'd be strange (it would explain why builds hang 
on jenkins sometimes though).  Isn't this joined on another thread?  What's the 
other thread doing?

That stack trace is horrid to look at.  Can you make sense of it?  What's it 
stuck on?

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for 
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start, 
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in 
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, 
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, 
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL to false when the 'put' is big 
> or in a loop. Not done for tests that rely on the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on 
> tearDown, which makes it impossible to run in parallel with another test using 
> this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15 
> seconds to make it a part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use local admin, single 
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one 
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3515:
--

Attachment: HBASE-3515-v2.patch

And the patch for trunk.

> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515-v2.patch, 
> HBASE-3515.patch
>
>
> This is from Hudson build 1738: if a log is about to be rolled and the ZK 
> connection is already closed, then the replication code will fail to add 
> the new log in ZK, but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log 
> roll.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141766#comment-13141766
 ] 

stack commented on HBASE-3515:
--

+1 on patch.

> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515.patch
>
>
> This is from Hudson build 1738: if a log is about to be rolled and the ZK 
> connection is already closed, then the replication code will fail to add 
> the new log in ZK, but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log 
> roll.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141762#comment-13141762
 ] 

nkeywal commented on HBASE-4703:


When I tried my patch, I reproduced the 15-minute timeout on trunk. Adding 
a timeout on each method does not help, but I have the traces.

First, I have this; maybe it's an issue in my env. I've just pulled the trunk.
{noformat}
2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
server = 29)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}

Anyway, after this the log finishes with:
{noformat}
2011-11-01 15:54:35,132 INFO  
[Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
Master:0;localhost,39664,1320187706355
{noformat}

It's in
{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)

org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)

org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)

org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)

org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}

So that's at least why adding a timeout won't help, and maybe why it does not 
end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help.
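
A rough sketch of what a bounded version could look like (illustrative only; the real helper delegates to ReflectionUtils.printThreadInfo, and the interval and limit here are arbitrary):

{code}
import java.util.Map;

// Sketch of a bounded variant, not the real org.apache.hadoop.hbase.util.Threads
// code: join in slices, dump stacks between slices, and give up after maxDumps
// rounds so a wedged master cannot hang the whole test run.
public class BoundedThreadDumpingJoin {

  public static boolean join(Thread t, long dumpIntervalMillis, int maxDumps)
      throws InterruptedException {
    for (int i = 0; i < maxDumps && t.isAlive(); i++) {
      t.join(dumpIntervalMillis);
      if (t.isAlive()) {
        dumpAllStacks("Still waiting on " + t.getName());
      }
    }
    return !t.isAlive();   // false: the caller should log and move on instead of blocking forever
  }

  private static void dumpAllStacks(String header) {
    System.err.println(header);
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      System.err.println("Thread " + e.getKey().getName() + ":");
      for (StackTraceElement frame : e.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }
}
{code}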

I also wonder if the root cause of the hang is my modification to the WAL, with 
some threads surprised to find updates that were not written to the WAL. Here 
is the full stack dump:

{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
nkeywal):
  State: TIMED_WAITING
  Blocked count: 360
  Waited count: 359
  Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
  State: WAITING
  Blocked count: 0
  Waited count: 4
  Waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)

java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 0
  Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
  State: WAITING
  Blocked count: 217
  Waited count: 174
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)

org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)

org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)

org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)

org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
java.lang.Thread.run(Thread.java:662)
Thread 165 (LruBlockCache.EvictionThread):
  State: WAITING
  Blocked count: 0
  Waited count: 1
  Waiti

[jira] [Updated] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

2011-11-01 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3515:
--

Attachment: HBASE-3515-v2-0.92.patch

Attaching the patch for 0.92; for trunk you just need to remove the addition of 
the IOE since it's already there.

> [replication] ReplicationSource can miss a log after RS comes out of GC
> ---
>
> Key: HBASE-3515
> URL: https://issues.apache.org/jira/browse/HBASE-3515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3515-v2-0.92.patch, HBASE-3515.patch
>
>
> This is from Hudson build 1738: if a log is about to be rolled and the ZK 
> connection is already closed, then the replication code will fail to add 
> the new log in ZK, but the log will still be rolled and it's possible that 
> some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL 
> [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] 
> regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, 
> load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
>  
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509,
>  length=0
> 2011-02-08 10:21:22,862 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] 
> wal.HLogSplitter(436):
>  Pushed=31 entries from 
> hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log 
> roll.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141756#comment-13141756
 ] 

Hudson commented on HBASE-4532:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
Fixed CHANGES file for HBASE-4532 & HBASE-4611

nspiegelberg : 
Files : 
* /hbase/trunk/CHANGES.txt


> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.94.0
>
> Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
> hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch
>
>
> The previous jira, HBASE-4469, is to avoid the top row seek operation if the 
> row-col bloom filter is enabled. 
> This jira tries to avoid the top row seek for all cases by creating a 
> dedicated bloom filter only for delete family.
> The only subtle use case is when we are interested in the top row with empty 
> column.
> For example, 
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last 
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are 
> trying to GET/SCAN a row with empty column.
> Evaluation from TestSeekOptimization:
> Previously:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
> enabled.[HBASE-4469]
> 
> After this change:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2011-11-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141757#comment-13141757
 ] 

Hadoop QA commented on HBASE-3939:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501870/3939-v5.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -165 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 4 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/129//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/129//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/129//console

This message is automatically generated.

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141754#comment-13141754
 ] 

Hudson commented on HBASE-4695:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
HBASE-4695  WAL logs get deleted before region server can fully flush

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> WAL logs get deleted before region server can fully flush
> -
>
> Key: HBASE-4695
> URL: https://issues.apache.org/jira/browse/HBASE-4695
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.90.4
>Reporter: jack levin
>Assignee: gaojinchao
>Priority: Blocker
> Fix For: 0.92.0, 0.90.5
>
> Attachments: HBASE-4695_Branch90_V2.patch, HBASE-4695_Trunk_V2.patch, 
> HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt
>
>
> To replicate the problem do the following:
> 1. check /hbase/.logs/ directory to see if you have WAL logs for the 
> region server you are shutting down.
> 2. execute kill <pid> (where pid is a regionserver pid)
> 3. Watch the regionserver log to start flushing, you will see how many 
> regions are left to flush:
> 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 489 regions to close
> 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 116 regions to close
> 4. Check /hbase/.logs/ -- you will notice that it has disappeared.
> 5. Check namenode logs:
> 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: 
> ugi=root ip=/10.101.1.5 cmd=delete 
> src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
> Note that, if you kill -9 the RS now, and it crashes on flush, you won't have 
> any WAL logs to replay.  We need to make sure that logs are deleted or moved 
> out only when the RS has fully flushed. Otherwise it's possible to lose data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4696) HRegionThriftServer' might have to indefinitely do redirtects

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141753#comment-13141753
 ] 

Hudson commented on HBASE-4696:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
HBASE-4696  HRegionThriftServer' might have to indefinitely do redirtects 
(jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java


> HRegionThriftServer' might have to indefinitely do redirtects
> -
>
> Key: HBASE-4696
> URL: https://issues.apache.org/jira/browse/HBASE-4696
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Prakash Khemani
>Assignee: Jonathan Gray
> Fix For: 0.94.0
>
> Attachments: HBASE-4696-v1.patch
>
>
> HRegionThriftServer.getRowWithColumnsTs() redirects the request to the 
> correct region server if it has landed on the wrong region-server. With this 
> approach the smart-client will never get a NotServingRegionException and it 
> will never be able to invalidate its cache. It will indefinitely send the 
> request to the wrong region server and the wrong region server will always be 
> redirecting it.
> Either redirects should be turned off and the client should react to 
> NotServingRegionExceptions.
> Or somehow a flag should be set in the response telling the client to refresh 
> its cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4627) Ability to specify a custom start/end to RegionSplitter

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141755#comment-13141755
 ] 

Hudson commented on HBASE-4627:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
Revert "HBASE-4627 Ability to specify a custom start/end to RegionSplitter"
This reverts commit r1196256.
HBASE-4627 Ability to specify a custom start/end to RegionSplitter (unintended 
commit)

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestRegionSplitter.java

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestRegionSplitter.java


> Ability to specify a custom start/end to RegionSplitter
> ---
>
> Key: HBASE-4627
> URL: https://issues.apache.org/jira/browse/HBASE-4627
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Attachments: D39.1.patch, D39.1.patch
>
>
> HBASE-4489 changed the default endKey on HexStringSplit from 7FFF... to
> ...  While this is correct, existing users of the 0.90 RegionSplitter have
> 7FFF as the end key in their schema, and the last region will not split
> properly under the new code.  We need to let the user specify a custom
> start/end key range for situations like this.  Optimally, we should also
> write the start/end key in META so we could figure this out implicitly
> instead of requiring the user to explicitly specify it.
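To illustrate what a custom range buys, here is a standalone sketch of the
hex-keyspace split arithmetic over an arbitrary [start, end] range. This is my
own illustration with 8-character hex keys, not the attached patch:

{code}
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class HexRangeSplit {
  // Evenly divides the hex keyspace [startHex, endHex] into numRegions pieces.
  static List<String> split(String startHex, String endHex, int numRegions) {
    BigInteger start = new BigInteger(startHex, 16);
    BigInteger end = new BigInteger(endHex, 16);
    BigInteger range = end.subtract(start);
    List<String> boundaries = new ArrayList<String>();
    for (int i = 1; i < numRegions; i++) {
      // i-th boundary = start + range * i / numRegions
      BigInteger b = start.add(range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions)));
      boundaries.add(String.format("%08x", b).toUpperCase());
    }
    return boundaries;
  }

  public static void main(String[] args) {
    // A table created with the old 7FFF... end key stops there; a custom range
    // lets the remaining upper half of the 8-character hex keyspace be split too.
    System.out.println(split("00000000", "7FFFFFFF", 4));
    System.out.println(split("7FFFFFFF", "FFFFFFFF", 4));
  }
}
{code}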

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141751#comment-13141751
 ] 

Hudson commented on HBASE-3680:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
HBASE-3680 Publish more metrics about mslab; REVERTED
HBASE-3680 Publish more metrics about mslab
HBASE-3680 Publish more metrics about mslab

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStoreLAB.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java


> Publish more metrics about mslab
> 
>
> Key: HBASE-3680
> URL: https://issues.apache.org/jira/browse/HBASE-3680
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
> Fix For: 0.92.0
>
> Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now, and it seems it
> tends to OOME or send us into GC loops of death a lot more than it used to.
> For example, one RS with mslab enabled and 7GB of heap died of an OOME this
> afternoon; it had .55GB in the block cache and 2.03GB in the memstores, which
> doesn't account for much... but it could be that, because of mslab, a lot of
> space was lost in those incomplete 2MB blocks, and without metrics we can't
> really tell. Compactions were running at the time of the OOME and I see block
> cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even
> take action based on that (like force flushing).
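One cheap way to expose the number asked for here (total space lost in
partially-filled 2MB MSLAB chunks) is to count the unused tail of the current
chunk every time a new chunk has to be started. A rough, hypothetical sketch of
the accounting only -- not the committed patch:

{code}
// Hypothetical accounting sketch, not the HBASE-3680 patch: tracks how many
// bytes are left unused ("wasted") at the end of each 2MB MSLAB chunk.
// Allocations larger than a chunk are assumed to be handled outside the LAB.
public class MslabWasteTracker {
  static final int CHUNK_SIZE = 2 * 1024 * 1024;

  private long wastedBytes = 0;
  private long chunksAllocated = 0;
  private int offsetInCurrentChunk = CHUNK_SIZE;  // forces a "new chunk" on first allocation

  /** Called for every allocation request of {@code size} bytes. */
  public synchronized void onAllocate(int size) {
    if (offsetInCurrentChunk + size > CHUNK_SIZE) {
      wastedBytes += CHUNK_SIZE - offsetInCurrentChunk;  // tail of the old chunk is lost
      chunksAllocated++;
      offsetInCurrentChunk = 0;
    }
    offsetInCurrentChunk += size;
  }

  // Values a RegionServerMetrics-style exporter could publish periodically.
  public synchronized long getWastedBytes() { return wastedBytes; }
  public synchronized long getChunksAllocated() { return chunksAllocated; }
}
{code}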

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141752#comment-13141752
 ] 

Hudson commented on HBASE-4611:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
Fixed CHANGES file for HBASE-4532 & HBASE-4611
HBASE-4611 Add support for Phabricator/Differential as an alternative code 
review tool

nspiegelberg : 
Files : 
* /hbase/trunk/CHANGES.txt

nspiegelberg : 
Files : 
* /hbase/trunk/.arcconfig
* /hbase/trunk/.gitignore
* /hbase/trunk/pom.xml


> Add support for Phabricator/Differential as an alternative code review tool
> ---
>
> Key: HBASE-4611
> URL: https://issues.apache.org/jira/browse/HBASE-4611
> Project: HBase
>  Issue Type: Task
>Reporter: Jonathan Gray
>Assignee: Nicolas Spiegelberg
> Fix For: 0.92.0, 0.94.0
>
> Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, 
> D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, 
> D207.1.patch, D21.1.patch, D21.1.patch
>
>
> From http://phabricator.org/ : "Phabricator is a open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3939) Some crossports of Hadoop IPC fixes

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3939:
-

Attachment: 3939-v5.txt

v5 same as v4.  Trying to trigger patch-build.

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3939) Some crossports of Hadoop IPC fixes

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3939:
-

Status: Patch Available  (was: Open)

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3939) Some crossports of Hadoop IPC fixes

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3939:
-

Status: Open  (was: Patch Available)

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4711) Remove jsr jar; not needed

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4711:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Was committed to trunk accidentally as part of:

{code}

r1195833 | stack | 2011-10-31 22:36:12 -0700 (Mon, 31 Oct 2011) | 1 line

HBASE-4703 Improvements in tests
{code}

> Remove jsr jar; not needed
> --
>
> Key: HBASE-4711
> URL: https://issues.apache.org/jira/browse/HBASE-4711
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: jsr.txt
>
>
> From Kan, jsr classes are in the jersey core jar.  I tried a build with it 
> removed and all tests pass.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4721) Configurable TTL for Delete Markers

2011-11-01 Thread Prakash Khemani (Created) (JIRA)
Configurable TTL for Delete Markers
---

 Key: HBASE-4721
 URL: https://issues.apache.org/jira/browse/HBASE-4721
 Project: HBase
  Issue Type: New Feature
Reporter: Prakash Khemani
Assignee: Prakash Khemani


There is a need to provide long TTLs for delete markers. This is useful when 
replicating hbase logs from one cluster to another. The receiving cluster 
shouldn't compact away the delete markers because the affected key-values might 
still be on the way.
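If this ends up as a per-column-family attribute, usage could look roughly like
the sketch below. The DELETE_MARKER_TTL key is purely hypothetical (this issue
has not defined a configuration name); only HColumnDescriptor.setValue() and
setTimeToLive() are existing API:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class DeleteMarkerTtlExample {
  public static void main(String[] args) {
    HTableDescriptor table = new HTableDescriptor("replicated_table");
    HColumnDescriptor family = new HColumnDescriptor("d");
    family.setTimeToLive(7 * 24 * 3600);          // normal data TTL: one week
    // Hypothetical attribute name; the real key would be defined by this JIRA.
    // The idea: keep delete markers around much longer than the data TTL so a
    // lagging replication stream cannot "resurrect" deleted key-values.
    family.setValue("DELETE_MARKER_TTL", String.valueOf(30 * 24 * 3600));
    table.addFamily(family);
    // table would then be passed to HBaseAdmin.createTable(table)
  }
}
{code}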

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141705#comment-13141705
 ] 

stack commented on HBASE-4703:
--

Want to do this in a new issue N?

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin,
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache,
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL(false) when the 'put' is big or
> in a loop (see the sketch after this list). Not done for tests that rely on
> the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on
> tearDown, which makes it impossible to run in parallel with another test
> using this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15
> seconds to make it part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use local admin, single
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility
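The WAL-skipping optimisation referenced in the list above looks like this with
the HTable/Put client API; the table, family and helper names are made up for
illustration:

{code}
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkTestLoad {
  // Loads many rows quickly in a test; skipping the WAL is acceptable here
  // because (as the item above notes) the test does not rely on log replay.
  static void load(HTable table, int rows) throws Exception {
    byte[] family = Bytes.toBytes("f");
    byte[] qualifier = Bytes.toBytes("q");
    for (int i = 0; i < rows; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(family, qualifier, Bytes.toBytes(i));
      put.setWriteToWAL(false);   // big/looped puts: don't pay for WAL sync in tests
      table.put(put);
    }
    table.flushCommits();
  }
}
{code}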

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-01 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141698#comment-13141698
 ] 

nkeywal commented on HBASE-4703:


It seems that it finally worked on the latest trunk... and unfortunately this
proves nothing.
I compared the dates in the logs: there are just 15 minutes between the two
tests around TestAdmin, which shows that it was the timeout in the pom.xml that
kicked in. I would propose adding @Test(timeout=xxx) to all tests in TestAdmin;
that could help identify what the issue is. I can do that in this JIRA or in
another one.
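For reference, the proposal is the standard JUnit 4 per-test timeout. The value
below is just a placeholder for the "xxx" above, and the body is a stand-in,
not the real TestAdmin test:

{code}
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TestAdminTimeouts {
  // Placeholder for the "xxx" above: the test fails with an exception instead
  // of hanging the whole surefire run if the admin operation never completes.
  @Test(timeout = 300000)   // 5 minutes, per test
  public void testCreateBadTables() throws Exception {
    assertTrue(doCreateBadTables());
  }

  private boolean doCreateBadTables() {
    return true;   // stand-in for the real test body in TestAdmin
  }
}
{code}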

> Improvements in tests
> -
>
> Key: HBASE-4703
> URL: https://issues.apache.org/jira/browse/HBASE-4703
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
> 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch
>
>
> Global:
>  - when possible, make the tests use the default cluster configuration for
> the number of regions (1 instead of 2 or 3). This allows a faster stop/start
> and is a step toward a shared cluster configuration.
>  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in
> HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin,
> TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache,
> TestServerCustomProtocol, TestReplicationSink)
>  - Optimize 'put' by setting setWriteToWAL(false) when the 'put' is big or
> in a loop. Not done for tests that rely on the WAL.
>  
> Local issues:
> - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on
> tearDown, which makes it impossible to run in parallel with another test
> using this directory
> - TestIdLock logs too much (9000 lines per second). Test time lowered to 15
> seconds to make it part of the small subset
> - TestMemoryBoundedLogMessageBuffer: useless System.out.println
> - io.hfile.TestReseekTo: useless System.out.println
> - TestTableInputFormat does not shut down the cluster
> - testGlobalMemStore does not shut down the cluster
> - rest.client.TestRemoteAdmin: simplified, does not use local admin, single
> test instead of two.
> - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one
> server; it should start the number of missing servers instead.
> - TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

2011-11-01 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141681#comment-13141681
 ] 

Todd Lipcon commented on HBASE-4717:


bq. It would probably be simple to add a check during compaction time of the 
time range of each file and if the max is expired, just to wipe out that file.

That's one optimization, but it only saves the read of the now-expired file. We
still have to read/rewrite all of the rest of the data periodically to do the
age-off.

The new idea above is to introduce something more like a "filtration" than a 
"compaction" -- you would only rewrite files that have a significant amount of 
data to be aged.

> More efficient age-off of old data during major compaction
> --
>
> Key: HBASE-4717
> URL: https://issues.apache.org/jira/browse/HBASE-4717
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Todd Lipcon
>
> Many applications need to implement efficient age-off of old data. We 
> currently only perform age-off during major compaction by scanning through 
> all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store 
> files contain only smaller finite ranges of time.
> - Periodically run an "age-off compaction". This compaction would scan the 
> current list of storefiles. Any store file that falls entirely out of the TTL 
> time range would be dropped. Store files completely within the time range 
> would be un-altered. Those crossing the time-range boundary could either be 
> left alone or compacted using the existing compaction code.
> I don't have a design in mind for how exactly this would be implemented, but 
> hope to generate some discussion.
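A sketch of the selection step described above, bucketing store files by their
(min, max) timestamp range. StoreFileInfo here is a hypothetical holder rather
than an HBase class, and this is an illustration of the idea, not a proposed
patch:

{code}
import java.util.ArrayList;
import java.util.List;

public class AgeOffSelection {
  // Hypothetical holder for the per-file time range; real store files expose
  // similar min/max timestamp metadata.
  static class StoreFileInfo {
    final String name;
    final long minTs, maxTs;
    StoreFileInfo(String name, long minTs, long maxTs) {
      this.name = name; this.minTs = minTs; this.maxTs = maxTs;
    }
  }

  static class Plan {
    final List<StoreFileInfo> drop = new ArrayList<StoreFileInfo>();     // entirely expired
    final List<StoreFileInfo> keep = new ArrayList<StoreFileInfo>();     // entirely live
    final List<StoreFileInfo> compact = new ArrayList<StoreFileInfo>();  // straddles the TTL boundary
  }

  static Plan plan(List<StoreFileInfo> files, long nowMs, long ttlMs) {
    long cutoff = nowMs - ttlMs;
    Plan p = new Plan();
    for (StoreFileInfo f : files) {
      if (f.maxTs < cutoff) {
        p.drop.add(f);       // whole file older than the TTL: delete without rewriting
      } else if (f.minTs >= cutoff) {
        p.keep.add(f);       // whole file within the TTL: leave untouched
      } else {
        p.compact.add(f);    // crosses the boundary: rewrite with normal compaction
      }
    }
    return p;
  }
}
{code}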

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-11-01 Thread Nicolas Spiegelberg (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg resolved HBASE-4611.


   Resolution: Fixed
Fix Version/s: 0.94.0
   0.92.0

We had the entire FB HBase team go through the steps I mentioned for using arc
today.  They were able to successfully install arc and use it to create an
example diff (hence all the activity on this diff today).  Any interested
parties, please give it a whirl and give feedback.

> Add support for Phabricator/Differential as an alternative code review tool
> ---
>
> Key: HBASE-4611
> URL: https://issues.apache.org/jira/browse/HBASE-4611
> Project: HBase
>  Issue Type: Task
>Reporter: Jonathan Gray
>Assignee: Nicolas Spiegelberg
> Fix For: 0.92.0, 0.94.0
>
> Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, 
> D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, 
> D207.1.patch, D21.1.patch, D21.1.patch
>
>
> From http://phabricator.org/ : "Phabricator is a open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2011-11-01 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141676#comment-13141676
 ] 

Matt Corgan commented on HBASE-4676:


More thorough benchmark shared here: 
https://docs.google.com/spreadsheet/ccc?key=0Ah5tKh7-sVXYdGkyWm9WenhSRzVfd2U5VC1XNF9tekE

cell format: Key[int,int,string,string]Value[VInt]

avg raw key bytes: 52
avg raw value bytes: 1
avg cells/row: ~10

The TRIE compressor gives ~6x compression when all the timestamps are the same. 
 For this table we would set the raw block size at 256KB or 1MB which gives 
encoded sizes of 39KB or 178KB.  

PREFIX compressor is doing 2x compression on this test because I don't think it 
gets too involved with qualifiers or timestamps.


Encoding MB/s are:
NONE: 1000
PREFIX: 275
TRIE: 15

That is purely CPU cost, since everything is in my Linux page cache.
TRIE encoding could be improved quite a bit, but it will likely stay slower
than the others; although, if the scanners/comparators are improved down the
road, it may speed up compaction significantly.

-
Scanning MB/s are:
NONE: 3700
PREFIX: 1800
TRIE: 800

Trie is slower for a scan through all cells because of more complex decoding.  
However, it could be much faster for things like row counting because it can 
quickly jump from one row to the next without iterating cells in between.
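For readers who haven't looked at the encoders: a PREFIX-style scheme boils
down to storing, for each key, the length of the prefix it shares with the
previous key plus the remaining bytes, while TRIE factors the shared structure
out into a tree. A toy illustration of the former (my own, unrelated to the
actual encoder implementations):

{code}
import java.util.Arrays;

public class PrefixDeltaToy {
  /** Length of the common prefix of two byte[] keys. */
  static int commonPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length), i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
  }

  public static void main(String[] args) {
    byte[] prev = "row0017/colfam/qualifierA".getBytes();
    byte[] cur  = "row0017/colfam/qualifierB".getBytes();
    int shared = commonPrefix(prev, cur);
    byte[] suffix = Arrays.copyOfRange(cur, shared, cur.length);
    // Encoded form of `cur`: (shared-prefix length, remaining bytes) instead of the full key.
    System.out.println("shared=" + shared + " suffix=" + new String(suffix));
  }
}
{code}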

> Prefix Compression - Trie data block encoding
> -
>
> Key: HBASE-4676
> URL: https://issues.apache.org/jira/browse/HBASE-4676
> Project: HBase
>  Issue Type: New Feature
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Matt Corgan
> Attachments: SeeksPerSec by blockSize.png
>
>
> The HBase data block format has room for 2 significant improvements for 
> applications that have high block cache hit ratios.  
> First, there is no prefix compression, and the current KeyValue format is 
> somewhat metadata heavy, so there can be tremendous memory bloat for many 
> common data layouts, specifically those with long keys and short values.
> Second, there is no random access to KeyValues inside data blocks.  This 
> means that every time you double the datablock size, average seek time (or 
> average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
> size is ~10x slower for random seeks than a 4KB block size, but block sizes 
> as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
> or more may be more efficient from a disk access and block-cache perspective 
> in many big-data applications, but doing so is infeasible from a random seek 
> perspective.
> The PrefixTrie block encoding format attempts to solve both of these 
> problems.  Some features:
> * trie format for row key encoding completely eliminates duplicate row keys 
> and encodes similar row keys into a standard trie structure which also saves 
> a lot of space
> * the column family is currently stored once at the beginning of each block.  
> this could easily be modified to allow multiple family names per block
> * all qualifiers in the block are stored in their own trie format which 
> caters nicely to wide rows.  duplicate qualifers between rows are eliminated. 
>  the size of this trie determines the width of the block's qualifier 
> fixed-width-int
> * the minimum timestamp is stored at the beginning of the block, and deltas 
> are calculated from that.  the maximum delta determines the width of the 
> block's timestamp fixed-width-int
> The block is structured with metadata at the beginning, then a section for 
> the row trie, then the column trie, then the timestamp deltas, and then 
> all the values.  Most work is done in the row trie, where every leaf node 
> (corresponding to a row) contains a list of offsets/references corresponding 
> to the cells in that row.  Each cell is fixed-width to enable binary 
> searching and is represented by [1 byte operationType, X bytes qualifier 
> offset, X bytes timestamp delta offset].
> If all operation types are the same for a block, there will be zero per-cell 
> overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
> So, the compression aspect is very strong, but makes a few small sacrifices 
> on VarInt size to enable faster binary searches in trie fan-out nodes.
> A more compressed but slower version might build on this by also applying 
> further (suffix, etc) compression on the trie nodes at the cost of slower 
> write speed.  Even further compression could be obtained by using all VInts 
> instead of FInts with a sacrifice on random seek speed (though not huge).
> One current drawback is the current write speed.  While programmed with good 
> constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
> programmed with the same level of op

[jira] [Updated] (HBASE-4708) Revert safemode related pieces of hbase-4510

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4708:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and branch.  Thanks for the patch Harsh.

> Revert safemode related pieces of hbase-4510
> 
>
> Key: HBASE-4708
> URL: https://issues.apache.org/jira/browse/HBASE-4708
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Harsh J
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4708-v2.txt, HBASE-4708.patch
>
>
> This thread in dev has us backing out the safemode related portions of 
> hbase-4510 commit: 
> http://search-hadoop.com/m/7WOjpVyG5F/Hmaster+can%2527t+start+for+the+latest+trunk+version&subj=Hmaster+can+t+start+for+the+latest+trunk+version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

2011-11-01 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-2312:
---

Attachment: D99.3.patch

nspiegelberg updated the revision "HBASE-2312 [jira] Possible data loss when RS 
goes into GC pause while rolling HLog".
Reviewers: JIRA, stack

  Minor touchups + reinsert @Ignore since 0.20.205 doesn't have the critical 
patches necessary.  205.1 should however

REVISION DETAIL
  https://reviews.facebook.net/D99

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
  
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
  
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java


> Possible data loss when RS goes into GC pause while rolling HLog
> 
>
> Key: HBASE-2312
> URL: https://issues.apache.org/jira/browse/HBASE-2312
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.0
>Reporter: Karthik Ranganathan
>Assignee: Nicolas Spiegelberg
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: D99.1.patch, D99.2.patch, D99.3.patch
>
>
> There is a very rare corner case where bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - it has not yet created the new one, and
> the old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists the HLog files of RS #1 that it has to split (as RS #1 is
> dead) and starts splitting
> 4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and
> appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/  directory to
> something else (say /hbase/.logs/-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS - so that a file
> create fails if the directory doesn't exist. Dhruba tells me this is very
> doable.
> 4) RS #1 comes back up and is not able to create the new hlog. It restarts
> itself.
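Step 2 of the proposed solution is a plain HDFS rename. A minimal sketch with a
made-up server-name path segment (the real change would live in the master's
log-splitting path, not in a standalone tool like this):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameDeadRsLogs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // "rs1.example.com,60020,1320187706849" is an illustrative server name only.
    Path live = new Path("/hbase/.logs/rs1.example.com,60020,1320187706849");
    Path dead = new Path(live.getParent(), live.getName() + "-dead");
    // After this rename, a GC-paused RS #1 that wakes up cannot append to the
    // old directory; with single-level mkdir semantics (step 3) its HLog create
    // would fail as well, forcing it to restart instead of losing edits.
    if (fs.exists(live) && fs.rename(live, dead)) {
      System.out.println("Renamed " + live + " to " + dead);
    }
  }
}
{code}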

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4715) Remove stale broke .rb scripts from bin dir

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141669#comment-13141669
 ] 

Hudson commented on HBASE-4715:
---

Integrated in HBase-TRUNK #2396 (See 
[https://builds.apache.org/job/HBase-TRUNK/2396/])
HBASE-4715 Remove stale broke .rb scripts from bin dir
HBASE-4715 Remove stale broke .rb scripts from bin dir

stack : 
Files : 
* /hbase/trunk/bin/set_meta_block_caching.rb

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/bin/add_table.rb
* /hbase/trunk/bin/check_meta.rb
* /hbase/trunk/bin/loadtable.rb
* /hbase/trunk/bin/rename_table.rb
* /hbase/trunk/bin/set_meta_memstore_size.rb


> Remove stale broke .rb scripts from bin dir
> ---
>
> Key: HBASE-4715
> URL: https://issues.apache.org/jira/browse/HBASE-4715
> Project: HBase
>  Issue Type: Task
>  Components: scripts
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 4715.txt
>
>
> Let's clean up the bin dir, removing scripts that have gone stale and don't
> work any more.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4714) Don't ship w/ icms enabled by default

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141670#comment-13141670
 ] 

Hudson commented on HBASE-4714:
---

Integrated in HBase-TRUNK #2396 (See 
[https://builds.apache.org/job/HBase-TRUNK/2396/])
HBASE-4714 Don't ship w/ icms enabled by default; REDO
HBASE-4714 Don't ship w/ icms enabled by default; REVERT -- I overcommitted by 
mistake
HBASE-4714 Don't ship w/ icms enabled by default

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/conf/hbase-env.sh
* /hbase/trunk/src/docbkx/performance.xml

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/conf/hbase-env.sh
* /hbase/trunk/src/docbkx/performance.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HMerge.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestFSTableDescriptors.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestFSUtils.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/conf/hbase-env.sh
* /hbase/trunk/src/docbkx/performance.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HMerge.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestFSTableDescriptors.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestFSUtils.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java


> Don't ship w/ icms enabled by default
> -
>
> Key: HBASE-4714
> URL: https://issues.apache.org/jira/browse/HBASE-4714
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: icms-v2.txt, icms.txt
>
>
> "Incremental CMS (iCMS) was written for a specific use case - 1 or 2 hardware
> threads where concurrent activity by CMS would look like a STW (if
> only 1 hardware thread) or a high tax on the cpu cycles (2 hardware
> threads).   It has a higher overhead and also is less efficient in terms
> of identifying garbage.  The latter is because iCMS spreads out the
> concurrent work so that objects that it has identified as live earlier
> may actually be dead when the dead objects are swept up.  It's
> worth testing with regular CMS instead of iCMS."
> From recent hotspot mailing list message.
> Rare is the case where we run on systems with 1 or 2 hardware threads (other
> than fellows' laptops, and there the above likely doesn't matter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4710) HBaseRPC$UnknownProtocolException should abort any client retries in HConnectionManager

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141671#comment-13141671
 ] 

Hudson commented on HBASE-4710:
---

Integrated in HBase-TRUNK #2396 (See 
[https://builds.apache.org/job/HBase-TRUNK/2396/])
HBASE-4710  UnknownProtocolException should abort client retries as a 
DoNotRetryIOException

garyh : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java


> HBaseRPC$UnknownProtocolException should abort any client retries in 
> HConnectionManager
> ---
>
> Key: HBASE-4710
> URL: https://issues.apache.org/jira/browse/HBASE-4710
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4710.patch
>
>
> While {{HBaseRPC$UnknownProtocolException}} currently extends 
> {{DoNotRetryIOException}}, it's still allowing retries of client RPCs when 
> encountered in {{HConnectionManager.getRegionServerWithRetries()}}.  It turns 
> out that {{UnknownProtocolException}} is missing a public constructor taking 
> a single {{String}} argument, which is required when unwrapping an 
> {{IOException}} from a {{RemoteException}} in 
> {{RemoteExceptionHandler.decodeRemoteException()}}.
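The missing piece described above is just a String-only constructor, so that
the reflection-based unwrapping of the RemoteException can instantiate the
class. A sketch of the shape of the fix (not the committed patch; the real
UnknownProtocolException is a nested class of HBaseRPC):

{code}
import org.apache.hadoop.hbase.DoNotRetryIOException;

// Sketch only: decodeRemoteException() instantiates the wrapped class
// reflectively via a (String) constructor, so without one the exception
// effectively decays into a retryable IOException on the client side.
public class UnknownProtocolException extends DoNotRetryIOException {
  public UnknownProtocolException() {
    super();
  }

  public UnknownProtocolException(String message) {
    super(message);   // the constructor the unwrapping code needs
  }
}
{code}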

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-11-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141659#comment-13141659
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.  > 
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java,
 line 192
bq.  > 
bq.  >
bq.  > Debug logging should go to LOG not AUDITLOG
bq.  
bq.  Gary Helmling wrote:
bq.  The idea was that all authorization decisions should be separated into 
audit log.  Here we're allowing access, so AUDITLOG seemed to make sense.  I 
agree that this still needs to be cleaned up a lot.  Maybe all audit logging 
should be done up in requirePermission() with authorization result?  At the 
very least we need a consistent format and consistent logging levels for 
messages (trace, right?).
bq.  
bq.  Andrew Purtell wrote:
bq.  > Maybe all audit logging should be done up in requirePermission() 
with authorization result?
bq.  
bq.  Sounds good.
bq.  
bq.  > At the very least we need a consistent format and consistent logging 
levels for messages (trace, right?).
bq.  
bq.  I'd argue for TRACE
bq.  
bq.  Gary Helmling wrote:
bq.  Reworked the audit logging to happen in requirePermission(), so we get 
a single log message per auth check indicating success or failure, with a more 
consistent format.  Result is logged to AUDITLOG at trace level.
bq.  
bq.  Michael Stack wrote:
bq.  Is there TRACE level in our commons interface?  I believe it just maps 
to DEBUG?
bq.  
bq.  Gary Helmling wrote:
bq.  Commons-logging source for 1.1.1 claims that with log4j >= 1.2.12, 
trace level is supported.  Prior to that it's mapped to debug.

Oh.  We need TRACE bad.  We have 1.2.16 log4j.  Have you seen TRACE logs Gary?  
If so, that'd make me happy.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---


On 2011-11-01 21:18:27, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2041/
bq.  ---
bq.  
bq.  (Updated 2011-11-01 21:18:27)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch implements access control list based authorization of HBase 
operations.  The patch depends on the currently posted patch for HBASE-2742 
(secure RPC engine).
bq.  
bq.  Key parts of the implementation are:
bq.  
bq.  * AccessControlLists - encapsulates storage of permission grants in a 
metadata table ("_acl_").  This differs from previous implementation where the 
".META." table was used to store permissions.
bq.  
bq.  * AccessController - 
bq.- implements MasterObserver and RegionObserver, performing authorization 
checks in each of the preXXX() hooks.  If authorization fails, an 
AccessDeniedException is thrown.
bq.- implements AccessControllerProtocol as a coprocessor endpoint to 
provide RPC methods for granting, revoking and listing permissions.
bq.  
bq.  * ZKPermissionWatcher (and TableAuthManager) - synchronizes ACL entries 
and updates throughout the cluster nodes using ZK.  ACL entries are stored in 
per-table znodes as /hbase/acl/tablename.
bq.  
bq.  * Additional ruby shell scripts providing the "grant", "revoke" and 
"user_permission" commands
bq.  
bq.  * Support for a new OWNER attribute in HTableDescriptor.  I could separate 
out this change into a new JIRA for discussion, but I don't see it as currently 
useful outside of security.  Alternately, I could handle the OWNER attribute 
completely in AccessController without changing HTD, but that would make 
interaction via hbase shell a bit uglier.
bq.  
bq.  
bq.  This addresses bug HBASE-3025.
bq.  https://issues.apache.org/jira/browse/HBASE-3025
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlFilter.java
 PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
 PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
 PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessControllerProtocol.java
 PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/Permission.java 
PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
 PRE-CREATION 
bq.
security/src/main/java/org/apache/hadoop/hba

[jira] [Updated] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

2011-11-01 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-2312:
---

Attachment: D99.2.patch

nspiegelberg updated the revision "HBASE-2312 [jira] Possible data loss when RS 
goes into GC pause while rolling HLog".
Reviewers: JIRA, stack

  Addressed Stack & Ted's change requests.  Integrated some fixes that Prakash 
found.

REVISION DETAIL
  https://reviews.facebook.net/D99

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
  
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
  
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java


> Possible data loss when RS goes into GC pause while rolling HLog
> 
>
> Key: HBASE-2312
> URL: https://issues.apache.org/jira/browse/HBASE-2312
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.0
>Reporter: Karthik Ranganathan
>Assignee: Nicolas Spiegelberg
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: D99.1.patch, D99.2.patch
>
>
> There is a very rare corner case where bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - it has not yet created the new one, and
> the old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists the HLog files of RS #1 that it has to split (as RS #1 is
> dead) and starts splitting
> 4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and
> appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/  directory to
> something else (say /hbase/.logs/-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS - so that a file
> create fails if the directory doesn't exist. Dhruba tells me this is very
> doable.
> 4) RS #1 comes back up and is not able to create the new hlog. It restarts
> itself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141648#comment-13141648
 ] 

Ted Yu commented on HBASE-1744:
---

@Tim:
HadoopQA won't recognize patch v9 which requires -p1 at time of application.

I regenerated patch v10.
Once test result comes back clean, I can integrate.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> This mutateRows, etc. is a little confusing compared to the new, cleaner java
> client.
> Thinking of ways to make a thrift client that is just as elegant. Something
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates a more verbose rpc than if the columns in TPut were just
> map>, but that is harder to fit timestamps into and
> still be intuitive from, say, python.
> Presumably the goal of a thrift gateway is to be easy first.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



