[jira] [Commented] (HBASE-4271) Clean up coprocessor's handlings of table operations

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100121#comment-13100121
 ] 

Hudson commented on HBASE-4271:
---

Integrated in HBase-TRUNK #2187 (See 
[https://builds.apache.org/job/HBase-TRUNK/2187/])
HBASE-4271  Clean up coprocessor handling of table operations

garyh : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java


> Clean up coprocessor's handlings of table operations
> 
>
> Key: HBASE-4271
> URL: https://issues.apache.org/jira/browse/HBASE-4271
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0
>
> Attachments: HBASE-4271_final.patch
>
>
> A couple of fixes we can make w.r.t. the coprocessor's handling of table operations:
> 1. Honor a MasterObserver's request to bypass the default action.
> 2. Fix up the preCreateTable function signature to take HRegionInfo as a
> parameter instead.
> 3. Invoke postEnableTable and similar methods only after the operations are done.
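For the bypass point above, a minimal MasterObserver sketch (assuming the 0.92 ObserverContext/BaseMasterObserver API; the class name is hypothetical) would look roughly like:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;

// Hypothetical observer: asks the master to skip its default delete-table
// handling; this issue is about the master actually honoring that request.
public class BlockTableDeleteObserver extends BaseMasterObserver {
  @Override
  public void preDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      byte[] tableName) throws IOException {
    ctx.bypass(); // request that the default action (the delete) be skipped
  }
}
{code}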

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread ramkrishna.s.vasudevan (JIRA)
If from Admin we try to unassign a region forcefully, though a valid region 
name is given the master is not able to identify the region to unassign.


 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


The problem is the following:
Get the exact region name from the UI and call
HBaseAdmin.unassign(regionname, true).
Here true means the forceful option.
As part of the unassign API:
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, ServerName> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null)
      throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
synchronized (this.regions) {
  this.regions.remove(hri);
  for (Set<HRegionInfo> regions : this.servers.values()) {
    regions.remove(hri);
  }
}
{code}
the region is also removed from the online-region maps. Hence, when the master
later tries to identify the region:
{code}
  if (!regions.containsKey(region)) {
debugLog(region, "Attempted to unassign region " +
  region.getRegionNameAsString() + " but it is not " +
  "currently assigned anywhere");
return;
  }
{code}
it cannot find the region. The problem exists in trunk and 0.90.x as well.
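One illustrative direction for a fix (a sketch only, with simplified, hypothetical stand-ins for AssignmentManager's internal maps; not necessarily what the committed patch does) is to clear only the in-transition state and leave the online-region bookkeeping alone:

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.HRegionInfo;

// Sketch only: clear the RIT entry for the region without dropping it from
// the online-region maps, so a subsequent unassign() can still find it.
final class ClearRitSketch {
  // hypothetical, simplified stand-ins for AssignmentManager's internal maps
  final Map<String, Object> regionsInTransition = new HashMap<String, Object>();
  final Map<HRegionInfo, Object> regions = new HashMap<HRegionInfo, Object>();

  void clearRegionFromTransition(HRegionInfo hri) {
    synchronized (this.regionsInTransition) {
      this.regionsInTransition.remove(hri.getEncodedName());
    }
    // Deliberately do NOT remove hri from this.regions (or the per-server
    // sets) here; unassign() relies on that bookkeeping to locate the region.
  }
}
{code}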

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4341) HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

2011-09-08 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-4341:


Attachment: HBASE-4341-Branch.patch

> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> 
>
> Key: HBASE-4341
> URL: https://issues.apache.org/jira/browse/HBASE-4341
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.5
>
> Attachments: HBASE-4341-Branch.patch
>
>
> This is the reason why https://builds.apache.org/job/hbase-0.90/282 failed.
> In that run, one test case timed out and caused the whole test process to be
> killed.
> [logs]
> Here are the related logs (from
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping.
> This is the method HRS#closeAllRegions:
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
> closeUserRegions(abort);
> -
> if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
> if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, so walking this map may not see
> all entries if some are added during the traversal. Once a region is missed,
> it can never be closed, and the regionserver will not stop normally. The
> following logs then occurred:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
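A minimal sketch of the loop-until-empty idea (simplified stand-in types; illustrative only, not the attached HBASE-4341-Branch.patch):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: because ConcurrentHashMap iteration is weakly consistent, close user
// regions in a loop and re-check the map until nothing new has appeared.
final class CloseAllRegionsSketch {
  // hypothetical, simplified stand-in for HRegionServer#onlineRegions
  final Map<String, Object> onlineRegions = new ConcurrentHashMap<String, Object>();

  void closeAllUserRegions(boolean abort) {
    boolean closedSomething;
    do {
      closedSomething = false;
      for (String encodedName : onlineRegions.keySet()) {
        // the real code would schedule/perform closeRegion(...) here
        onlineRegions.remove(encodedName);
        closedSomething = true;
      }
    } while (closedSomething || !onlineRegions.isEmpty());
  }
}
{code}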

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4195) Possible inconsistency in a memstore read after a reseek, possible performance improvement

2011-09-08 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100139#comment-13100139
 ] 

nkeywal commented on HBASE-4195:


@stack: yes I am ok with all your points. Thanks! Some details below:

bq. Are seek and reseek the same now? Or it seems like they have a bunch of 
common code... can we factor it out to common method if so?

The initialization of kvTail & snapshotTail differs; after that it's the same
code. There are only 6 lines of code, but I agree, it would be cleaner if
shared in a private method (this would also simplify the improvement on peek).


bq. We're fixing a bug where we may miss a Put if a flush comes in in meantime 
because we won't have a running Iterator on new KVSet (but maybe this is not 
such a big deal - perhaps - because its unlikely the new Put will be within the 
purview of the current read point?

That's what I expect. Note that between the 3 implementations:
- the initial one: it was impossible because we were just using the iterator
without going back to the list.
- the one currently in trunk: possible because we're restarting from the very
beginning of the list.
- the proposed one (in the middle): we're not restarting from the beginning
but from an intermediate point of the list.

So we're not in the same situation as we were 2 years ago, but I expect 
(without having done a full analysis) that the readpoint will hide this.

The best of the best, in terms of performance and similarity to the initial
implementation, would be to get the sub-skiplist implicitly pointed to by the
iterator, but there is nothing in the Java API to do that today: it would
require implementing a specific skip list.

> Possible inconsistency in a memstore read after a reseek, possible 
> performance improvement
> --
>
> Key: HBASE-4195
> URL: https://issues.apache.org/jira/browse/HBASE-4195
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 20110824_4195_MemStore.patch, 
> 20110824_4195_TestHRegion.patch
>
>
> This follows the discussion around HBASE-3855 and the random errors (20%
> failure on trunk) on the unit test
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting.
> I saw some points related to numIterReseek, used in
> MemStoreScanner#getNext (line 690):
> {noformat}679 protected KeyValue getNext(Iterator it) {
> 680 KeyValue ret = null;
> 681 long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
> 682 //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + 
> readPoint);
> 683
> 684 while (ret == null && it.hasNext()) {
> 685   KeyValue v = it.next();
> 686   if (v.getMemstoreTS() <= readPoint) {
> 687 // keep it.
> 688 ret = v;
> 689   }
> 690   numIterReseek--;
> 691   if (numIterReseek == 0) {
> 692 break;
> 693}
> 694 }
> 695 return ret;
> 696   }{noformat}
> This function is called by seek, reseek, and next. numIterReseek is only
> useful for reseek.
> There are some issues. I am not totally sure they are the root cause of the
> test case error, but they could partly explain its randomness, and one point
> is definitely a bug.
> 1) In getNext, numIterReseek is decreased, then compared to zero. The seek
> function sets numIterReseek to zero before calling getNext. It means that the
> value will actually be negative, hence the test will always fail and the
> loop will continue. This is the expected behaviour, but it's quite subtle.
> 2) In "reseek", numIterReseek is not reset between the loops on the two
> iterators. If numIterReseek equals zero after the loop on the first one, the
> loop on the second one will never call seek, as numIterReseek will be
> negative.
> 3) Still in "reseek", the test to call "seek" is (kvsetNextRow == null &&
> numIterReseek == 0). In other words, if kvsetNextRow is not null when
> numIterReseek equals zero, numIterReseek will go negative at the next
> iteration and seek will never be called.
> 4) You can have side effects if reseek ends with numIterReseek > 0: the
> following calls to the "next" function will decrease numIterReseek to zero,
> and getNext will break instead of continuing the loop. As a result, later
> calls to next() may return null or not depending on how the default value for
> numIterReseek is configured.
> To check whether the issue comes from point 4, you can set numIterReseek to
> zero before returning in reseek:
> {nofor

[jira] [Commented] (HBASE-4313) Refactor TestHBaseFsck to make adding individual hbck tests easier

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100160#comment-13100160
 ] 

jirapos...@reviews.apache.org commented on HBASE-4313:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1748/
---

Review request for hbase and Todd Lipcon.


Summary
---

Trunk version of HBASE-4313.

commit 2d0e127fbe13be7d1fda0a7dff91c0717dbb13a0
Author: Jonathan Hsieh 
Date:   Thu Aug 25 00:10:27 2011 -0700

HBASE-4313 Refactor TestHBaseFsck to make adding hbck tests easier

- Cleanup table creation
- Add proper MiniCluster shutdown
- Separate each corruption into a separate test


This addresses bug HBASE-4313.
https://issues.apache.org/jira/browse/HBASE-4313


Diffs
-

  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 7ff8eb5 

Diff: https://reviews.apache.org/r/1748/diff


Testing
---

This unit test class passes.


Thanks,

jmhsieh



> Refactor TestHBaseFsck to make adding individual hbck tests easier
> --
>
> Key: HBASE-4313
> URL: https://issues.apache.org/jira/browse/HBASE-4313
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, 
> 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, 
> hbase-4313-trunk.patch
>
>
> The current TestHBaseFsck has one test case that tests multiple things in the
> same table. This refactor essentially preserves what is tested but isolates
> each error type so that errors do not bleed over from table to table.
> This will also make it easier to write other simple-to-read tests for other
> hbck-detectable errors.
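The refactor's general shape might look like the following JUnit sketch (illustrative only, not the attached patch; the HBaseTestingUtility calls in the comments are the usual MiniCluster lifecycle pattern):

{code}
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

// Sketch: shared cluster lifecycle, per-test table setup/teardown, and one
// corruption scenario per test method so failures do not bleed across tests.
public class TestHBaseFsckSketch {
  @BeforeClass
  public static void startCluster() throws Exception {
    // e.g. TEST_UTIL.startMiniCluster(3);
  }

  @AfterClass
  public static void stopCluster() throws Exception {
    // e.g. TEST_UTIL.shutdownMiniCluster();  // the "proper MiniCluster shutdown"
  }

  @Before
  public void setupTable() throws Exception {
    // create a fresh table used only by this test
  }

  @After
  public void cleanupTable() throws Exception {
    // drop the table so one test's corruption cannot affect the next
  }

  @Test
  public void testCleanClusterHasNoErrors() throws Exception { }

  @Test
  public void testSingleCorruptionIsDetected() throws Exception { }
}
{code}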

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4340) Hbase can't balance.

2011-09-08 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4340:
--

Attachment: HBASE-4340_branch90.patch

> Hbase can't balance.
> 
>
> Key: HBASE-4340
> URL: https://issues.apache.org/jira/browse/HBASE-4340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Assignee: gaojinchao
> Fix For: 0.90.5
>
> Attachments: HBASE-4340_branch90.patch
>
>
> Version: 0.90.4
> Cluster : 40 boxes
> As seen in the logs below, the balancer could not run because of a dead RS.
> I dug deeper and found two issues:
> 1. The shutdown handler didn't clear numProcessing when it hit certain
> exceptions. Whatever the exception, we should either clear the flag or shut
> down the master.
> 2. "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is
> inaccurate. The dead server should be "158-1-130-10,20020,1315068597979".
> //master logs:
> 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:18:00,543 DEBUG org.apache

[jira] [Commented] (HBASE-4340) Hbase can't balance.

2011-09-08 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100207#comment-13100207
 ] 

gaojinchao commented on HBASE-4340:
---

I have made a patch. Please review.

> Hbase can't balance.
> 
>
> Key: HBASE-4340
> URL: https://issues.apache.org/jira/browse/HBASE-4340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Assignee: gaojinchao
> Fix For: 0.90.5
>
> Attachments: HBASE-4340_branch90.patch
>
>
> Version: 0.90.4
> Cluster : 40 boxes
> As seen in the logs below, the balancer could not run because of a dead RS.
> I dug deeper and found two issues:
> 1. The shutdown handler didn't clear numProcessing when it hit certain
> exceptions. Whatever the exception, we should either clear the flag or shut
> down the master.
> 2. "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is
> inaccurate. The dead server should be "158-1-130-10,20020,1315068597979".
> //master logs:
> 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,131497

[jira] [Updated] (HBASE-4340) Hbase can't balance.

2011-09-08 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4340:
--

Status: Patch Available  (was: Open)

> Hbase can't balance.
> 
>
> Key: HBASE-4340
> URL: https://issues.apache.org/jira/browse/HBASE-4340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Assignee: gaojinchao
> Fix For: 0.90.5
>
> Attachments: HBASE-4340_branch90.patch
>
>
> Version: 0.90.4
> Cluster : 40 boxes
> As seen in the logs below, the balancer could not run because of a dead RS.
> I dug deeper and found two issues:
> 1. The shutdown handler didn't clear numProcessing when it hit certain
> exceptions. Whatever the exception, we should either clear the flag or shut
> down the master.
> 2. "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is
> inaccurate. The dead server should be "158-1-130-10,20020,1315068597979".
> //master logs:
> 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:18:00,543 DEBUG org.apache.

[jira] [Commented] (HBASE-4304) requestsPerSecond counter stuck at 0

2011-09-08 Thread subramanian raghunathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100252#comment-13100252
 ] 

subramanian raghunathan commented on HBASE-4304:


I want to provide my analysis and background, since this is related to
HBASE-3807, which I fixed earlier.

1) Previously the problem was present only in the region server.  

As part of defect

HBASE-3807: Fix units in RS UI metrics

I had changed HServerLoad to also use the region server metrics, making it
consistent with {color:green}requestPerSecond{color}.

{code}
return new HServerLoad(requestCount.get(),(int)metrics.getRequests(),
  (int)(memory.getUsed() / 1024 / 1024),
  (int) (memory.getMax() / 1024 / 1024), regionLoads);  
{code}

The requests-per-second value is derived from
{color:green}(int)metrics.getRequests(){color}.

2) But what I missed was that the region server metrics itself was not
functioning properly.

I dug in and found the following.

1) The HRegionServer.run() method is the main source of the request value. It
runs by default every three seconds:

{code}
if ((now - lastMsg) >= msgInterval) {
{code}

The request count is pushed into the metrics here, as part of the doMetrics
method:

{code}
this.metrics.incrementRequests(this.requestCount.get())
{code}

which updates the metrics rate through the method:
{code}
   public synchronized void inc(final int incr) {
value += incr;
  }
{code}  

{color:green}This gives the requests received between the previous run and the
current run.{color}

But the requests-per-second value is only populated when the following piece
of the MetricsRate class gets executed:

{code}
 
  private synchronized void intervalHeartBeat() {
long now = System.currentTimeMillis();
long diff = (now-ts)/1000;
if (diff == 0) diff = 1; // sigh this is crap.
this.prevRate = (float)value / diff;
this.value = 0;
this.ts = now;
  }
  
@Override
  public synchronized void pushMetric(final MetricsRecord mr) {
intervalHeartBeat();
try {
  mr.setMetric(getName(), getPreviousIntervalValue());
} catch (Exception e) {
  LOG.info("pushMetric failed for " + getName() + "\n" +
  StringUtils.stringifyException(e));
}
  }
{code}

{color:red}pushMetric won't be invoked by default since metrics are not
enabled by default (hbase.class=org.apache.hadoop.metrics.spi.NullContext).
Please correct me if I am wrong here.{color}

So, effectively, the value is always displayed as zero.
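For reference, switching the metrics context away from NullContext in hadoop-metrics.properties is what makes pushMetric run periodically; a minimal sketch (the period value is just an example):

{noformat}
# hadoop-metrics.properties (sketch). The default NullContext never calls
# pushMetric, so the rate is never computed; NullContextWithUpdateThread keeps
# metrics in-process but invokes pushMetric on the configured period.
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=10
{noformat}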

> requestsPerSecond counter stuck at 0
> 
>
> Key: HBASE-4304
> URL: https://issues.apache.org/jira/browse/HBASE-4304
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
>
> Running trunk @ r1163343, all of the requestsPerSecond counters are showing 0 
> both in the master UI and in the RS UI. The writeRequestsCount metric is 
> properly updating in the RS UI.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4304) requestsPerSecond counter stuck at 0

2011-09-08 Thread subramanian raghunathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100261#comment-13100261
 ] 

subramanian raghunathan commented on HBASE-4304:


What I feel the corrective action could be:

{code}
  protected void metrics() {
this.metrics.regions.set(this.onlineRegions.size());
this.metrics.incrementRequests(this.requestCount.get());
//TODO:compute the request per second here  
{code} 

{color:green}//TODO:compute the request per second here {color}

So the requests-per-second value stays fixed until the next run, where it is
recalculated; the counter is reset to zero as part of:
{code}
 void tryRegionServerReport()
  throws IOException {
HServerLoad hsl = buildServerLoad();
// Why we do this?
this.requestCount.set(0); 
{code}

{color:red}But this contradicts the context-based metrics update when a
context is enabled.
There could be a race in the computation of the value:
the value being reset to zero (this.value = 0;) right after the
computation.{color}

{code}
private synchronized void intervalHeartBeat() {
long now = System.currentTimeMillis();
long diff = (now-ts)/1000;
if (diff == 0) diff = 1; // sigh this is crap.
this.prevRate = (float)value / diff;
this.value = 0;
this.ts = now;
  }
{code}

> requestsPerSecond counter stuck at 0
> 
>
> Key: HBASE-4304
> URL: https://issues.apache.org/jira/browse/HBASE-4304
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
>
> Running trunk @ r1163343, all of the requestsPerSecond counters are showing 0 
> both in the master UI and in the RS UI. The writeRequestsCount metric is 
> properly updating in the RS UI.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4304) requestsPerSecond counter stuck at 0

2011-09-08 Thread subramanian raghunathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100263#comment-13100263
 ] 

subramanian raghunathan commented on HBASE-4304:


@LiPi are you working on this patch?

> requestsPerSecond counter stuck at 0
> 
>
> Key: HBASE-4304
> URL: https://issues.apache.org/jira/browse/HBASE-4304
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
>
> Running trunk @ r1163343, all of the requestsPerSecond counters are showing 0 
> both in the master UI and in the RS UI. The writeRequestsCount metric is 
> properly updating in the RS UI.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4247) Add isAborted method to the Abortable interface

2011-09-08 Thread Akash Ashok (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100320#comment-13100320
 ] 

Akash Ashok commented on HBASE-4247:


In HConnectionManager.java under the subclass HConnectionManagerImplementation

{code}
 @Override
public void abort(final String msg, Throwable t) {
  if (t instanceof KeeperException.SessionExpiredException) {
try {
  LOG.info("This client just lost it's session with ZooKeeper, trying" +
  " to reconnect.");
  resetZooKeeperTrackers();
  LOG.info("Reconnected successfully. This disconnect could have been" +
  " caused by a network partition or a long-running GC pause," +
  " either way it's recommended that you verify your environment.");
  return;
} catch (ZooKeeperConnectionException e) {
  LOG.error("Could not reconnect to ZooKeeper after session" +
  " expiration, aborting");
  t = e;
}
  }
  if (t != null) LOG.fatal(msg, t);
  else LOG.fatal(msg);
  this.closed = true;
}
{code}

If we call close() here instead, tests fail (one of them being
TestMergeTools). I was wondering if some change should be made here. Somehow I
feel it's not right to set this.closed here; or is this a special case that
can be left untouched?

Thanks
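For context, the change this issue actually proposes is small; a sketch of the interface with the new method (assuming the current abort(String, Throwable) signature):

{code}
// Sketch of the proposed Abortable with the new query method; implementers
// such as the connection class above would return whether abort() has run.
public interface Abortable {
  void abort(String why, Throwable e);

  // proposed addition in HBASE-4247: lets callers (and tests) check the state
  boolean isAborted();
}
{code}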

> Add isAborted method to the Abortable interface
> ---
>
> Key: HBASE-4247
> URL: https://issues.apache.org/jira/browse/HBASE-4247
> Project: HBase
>  Issue Type: Task
>Reporter: Akash Ashok
>Assignee: Akash Ashok
>Priority: Minor
> Fix For: 0.94.0
>
>
> Add a new method isAborted() to the Abortable interface 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4340) Hbase can't balance.

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100341#comment-13100341
 ] 

Ted Yu commented on HBASE-4340:
---

The NPE happened on this line in MetaReader.java:
{code}
  final long startCode = 
Bytes.toLong(data.getValue(HConstants.CATALOG_FAMILY,
  HConstants.STARTCODE_QUALIFIER));
{code}
The patch looks reasonable since there is no action taken if hris is null.

Have you tested the patch on a cluster, Jinchao ?
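For reference, a null guard of the kind the patch presumably adds might look like this (a sketch only; the helper class and its semantics are hypothetical):

{code}
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: skip catalog rows whose startcode column is missing instead of
// letting Bytes.toLong throw an NPE on a null value.
final class StartCodeSketch {
  static Long readStartCode(Result data) {
    byte[] value =
        data.getValue(HConstants.CATALOG_FAMILY, HConstants.STARTCODE_QUALIFIER);
    return (value == null) ? null : Bytes.toLong(value);
  }
}
{code}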

> Hbase can't balance.
> 
>
> Key: HBASE-4340
> URL: https://issues.apache.org/jira/browse/HBASE-4340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Assignee: gaojinchao
> Fix For: 0.90.5
>
> Attachments: HBASE-4340_branch90.patch
>
>
> Version: 0.90.4
> Cluster : 40 boxes
> As seen in the logs below, the balancer could not run because of a dead RS.
> I dug deeper and found two issues:
> 1. The shutdown handler didn't clear numProcessing when it hit certain
> exceptions. Whatever the exception, we should either clear the flag or shut
> down the master.
> 2. "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is
> inaccurate. The dead server should be "158-1-130-10,20020,1315068597979".
> //master logs:
> 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMas

[jira] [Commented] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-09-08 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100347#comment-13100347
 ] 

Jan Lukavsky commented on HBASE-4297:
-

Hi Stack,

I've tested the patch against cdh3u1 and it works fine for us. I haven't seen
any negative side effects so far.

> TableMapReduceUtil overwrites user supplied options
> ---
>
> Key: HBASE-4297
> URL: https://issues.apache.org/jira/browse/HBASE-4297
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Jan Lukavsky
> Attachments: HBASE-4297.patch
>
>
> Job configuration is overwritten by hbase-default and hbase-site in 
> TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior 
> in the following code:
> {noformat}
> Configuration conf = HBaseConfiguration.create();
> // change keyvalue size
> conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
> Job job = new Job(conf, ...);
> TableMapReduceUtil.initTableMapperJob(...);
> // the job doesn't have the option changed, uses it from hbase-site or 
> hbase-default
> job.submit();
> {noformat}
> Although in this case it could be fixed by moving the set() after
> initTableMapperJob(), in the case where the user wants to change an option
> using GenericOptionsParser and -D this is impossible, making that cool
> feature useless.
> In the 0.20.x era this code behaved as expected. The solution should be that
> we don't overwrite the options, but only set them if they are missing.
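One non-clobbering direction, sketched below purely for illustration (the class and method names are hypothetical, and this is not the attached HBASE-4297.patch): start from the HBase defaults and copy the user's job settings over them, so values such as hbase.client.keyvalue.maxsize set before job creation survive.

{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: merge hbase-default/hbase-site underneath the user's job
// configuration instead of overwriting it.
final class ConfMergeSketch {
  static Configuration mergeForJob(Configuration userJobConf) {
    Configuration merged = HBaseConfiguration.create(); // hbase-default + hbase-site
    for (Map.Entry<String, String> e : userJobConf) {
      merged.set(e.getKey(), e.getValue()); // user-supplied options win
    }
    return merged;
  }
}
{code}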

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100354#comment-13100354
 ] 

Lars Hofhansl commented on HBASE-2195:
--

The version in WALEdit is final, though, and there is no constructor setting it.
The fact that it is not static looks like an oversight.

I disagree with the latest patch. The version in HLogKey should be a class 
version. That is the whole point of VersionedWritable.

Having an instance version only makes sense if the write would behave 
differently based on version.

I prefer patch v11, where HLogKey does not extend VersionedWritable. Just my 
$0.02.
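The "class version" pattern being argued for here is, roughly, the standard VersionedWritable usage; an illustrative sketch (not HLogKey itself):

{code}
import org.apache.hadoop.io.VersionedWritable;

// Sketch: the version is a constant of the class, returned by getVersion(),
// rather than a per-instance field; bump it when the wire format changes.
abstract class ClassVersionedKey extends VersionedWritable {
  private static final byte VERSION = 1;

  @Override
  public byte getVersion() {
    return VERSION;
  }
}
{code}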


> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4341) HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100356#comment-13100356
 ] 

Ted Yu commented on HBASE-4341:
---

The patch is reasonable.

> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> 
>
> Key: HBASE-4341
> URL: https://issues.apache.org/jira/browse/HBASE-4341
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.5
>
> Attachments: HBASE-4341-Branch.patch
>
>
> This is the reason why https://builds.apache.org/job/hbase-0.90/282 failed.
> In that run, one test case timed out and caused the whole test process to be
> killed.
> [logs]
> Here are the related logs (from
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping.
> This is the method HRS#closeAllRegions:
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
> closeUserRegions(abort);
> -
> if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
> if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, so walking this map may not see
> all entries if some are added during the traversal. Once a region is missed,
> it can never be closed, and the regionserver will not stop normally. The
> following logs then occurred:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100359#comment-13100359
 ] 

Ted Yu commented on HBASE-2195:
---

The special handling in readFields() already makes HLogKey deviate from a
single class version. Meaning that when this.clusterId carries
HConstants.DEFAULT_CLUSTER_ID, there is a chance that the HLogKey wasn't
generated in the local cluster.
I think that's why we call it DEFAULT_CLUSTER_ID instead of LOCAL_CLUSTER_ID.

We can create our own VersionedWritable interface (maybe with a different name) 
which HLogKey can implement.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100367#comment-13100367
 ] 

ramkrishna.s.vasudevan commented on HBASE-4015:
---

@J-D

bq.You could also try doing a worst case cold startup by killing -9 all HBase 
components at the same time (more or less) and then restarting them all (also 
after data was added). Finally you could try setting a super low timeout 
setting, like 5 seconds, to trigger RIT timeouts by the hundreds.

I conducted the tests again, particularly with a 5-second timeout: killed the
cluster, started it again, randomly killed RSs, and also invoked the balancer
command.
I was able to get back all the regions (4003 regions) across the 3 RSs.
The hbck result was also positive:
{noformat}
* The number of timed out regions  938
* The number of timed out regions  270
* The number of timed out regions  673
* The number of timed out regions  269
* The number of timed out regions  941
* The number of timed out regions  942
* The number of timed out regions  941
{noformat}

{noformat}
Summary:
  -ROOT- is okay.
Number of regions: 1
Deployed on:  HOST-10-18-52-253,60020,1315480076091
  .META. is okay.
Number of regions: 1
Deployed on:  HOST-10-18-52-253,60020,1315480076091
  testram2 is okay.
Number of regions: 4001
Deployed on:  HOST-10-18-52-108,60020,1315480229321 
HOST-10-18-52-253,60020,1315480076091
0 inconsistencies detected.
Status: OK
{noformat}

> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, 
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does it's own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that sometimes can take minutes to 
> process (meanwhile a region that timed out opening might have opened, then 
> what happens is it will be reassigned by the TimeoutMonitor generating the 
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100373#comment-13100373
 ] 

Lars Hofhansl commented on HBASE-2195:
--

The entries in a log file can have different versions. The class is either
(implicit) version 0 in 0.90.x or version -1 in 0.92. It should not be a
member variable.

We have the class version so that we can decide how to tag the write side of
things and then decide at read time how to behave. Every versioned class has a
static version (or a final that is never set by a constructor, which amounts
to the same behavior).

Let's please settle on v11 of the patch.
We can have a general discussion about versioning outside of this patch.

(If anything it shows that everything that is written to a file needs to be 
versioned from the beginning).


> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100386#comment-13100386
 ] 

ramkrishna.s.vasudevan commented on HBASE-4153:
---

Please find the analysis for the following state transitions.

This is how I tried to simulate the scenarios:
Create some 7 or 8 regions.
Using HBaseAdmin, call unassign(regionname, false) and assign(regionname,
false) in parallel.
See what happens when both operations run concurrently.

Correct me if I am wrong. Please provide your suggestions.

1) CloseClose -> No problem
2) CloseOpen 
Here we depend on the timeout
 Assume the closing is in partial state
 -> After setting the node to CLOSED state 
Here the closing is done successfully but the problem is to open we 
need to
wait for the timeout monitor to deduce that the region is in RIT as the 
inmemory
state is put to OFFLINE once RegionAlreadyInTransitionExceptionHappens
 -> Before setting the node to CLOSED state 
Here the problem is that closing is not done properly and also open 
also fails
putting the inmemory state to OFFLINE
The closing itself fails because when we try to assign the region it 
forcefully
moves the znode to OFFLINE. so close is not able to move from CLOSING 
to CLOSED
May be if we get an RegionAlreadyInTransition just dont update the memory state 
to OFFLINE.
Either the previous open should be successful or even if it fails the 
PENDING_OPEN state 
timeout transition will any way happen

3) Open Open
This is causing problem.
The thing here is assume one open region is in progress.
The next open region just fails and adds in memory state to OFFLINE.
Now the first open region gets completed and moves it to OPENED.
In handling of OPENED state
{code}
  if (regionState == null ||
  (!regionState.isPendingOpen() && !regionState.isOpening())) {
LOG.warn("Received OPENED for region " +
prettyPrintedRegionName +
" from server " + data.getOrigin() + " but region was in " +
" the state " + regionState + " and not " +
"in expected PENDING_OPEN or OPENING states");
return;
{code}
we have the above code.  Hence the region can never be added to the master's 
online list.
This scenario is what the HBASE-4015 patch handles when a race happens between 
forcing the node to OFFLINE and an OPENING transition that has already happened:
{code}
+  // If we are reassigning the node do not force in-memory state to 
OFFLINE.
+  // Based on the znode state we will decide if to change
+  // in-memory state to OFFLINE or not. It will
+  // be done before setting the znode to OFFLINE state.
+  if (!hijackAndPreempt) {
+LOG.debug("Forcing OFFLINE; was=" + state);
+state.update(RegionState.State.OFFLINE);
+  }
{code}
4) Open/Close
This will not be a separate case in my testing.  Once we call unassign() on a 
region it will anyway call assign() once closing is successful.  Hence it ends 
up as one of the three cases above.
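
To illustrate the suggestion in case 2 (do not force the in-memory state to 
OFFLINE when a RegionAlreadyInTransitionException comes back), here is a small 
self-contained sketch; all names are simplified stand-ins for the 
AssignmentManager logic, not the actual patch:

{code}
// Simplified stand-ins, not real HBase classes.
class RegionAlreadyInTransitionException extends Exception { }

enum SketchState { OFFLINE, PENDING_OPEN, OPENING, OPEN }

class AssignSketch {
  private SketchState inMemoryState = SketchState.OFFLINE;

  // Called when the master asks a region server to open a region.
  void requestOpen(boolean serverAlreadyWorkingOnRegion) {
    try {
      sendOpenRpc(serverAlreadyWorkingOnRegion);
      inMemoryState = SketchState.PENDING_OPEN;
    } catch (RegionAlreadyInTransitionException e) {
      // The region server is already opening/closing this region: leave the
      // in-memory state untouched. Either the in-flight open completes, or the
      // PENDING_OPEN timeout in the TimeoutMonitor retries the assignment later.
    } catch (Exception other) {
      // Any other failure: fall back to OFFLINE so the region can be reassigned.
      inMemoryState = SketchState.OFFLINE;
    }
  }

  private void sendOpenRpc(boolean alreadyWorking) throws RegionAlreadyInTransitionException {
    if (alreadyWorking) {
      throw new RegionAlreadyInTransitionException();
    }
    // Otherwise the open request was accepted by the region server.
  }
}
{code}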


> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4057) Implement HBase version of "show processlist"

2011-09-08 Thread Riley Patterson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riley Patterson updated HBASE-4057:
---

Attachment: HBASE-4057.patch

Includes an extended TaskMonitor with consideration for high-frequency, 
short-duration changes, a new MonitoredRPCHandler class, integration with the 
RPC, and filtered exposure both as an HTML table and as JSON via jamon.

Will also put this on the review board, as it's a rather large patch.

> Implement HBase version of "show processlist"
> -
>
> Key: HBASE-4057
> URL: https://issues.apache.org/jira/browse/HBASE-4057
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Riley Patterson
> Attachments: HBASE-4057.patch
>
>
> One of the features that our DBAs use for MySQL analysis is "show 
> processlist", which gives application-level stats about the RPC threads.  
> Right now, we use jstack but that is very core-developer-centric.  We need to 
> create a similar tool that DBA/Ops/AppDevs can use.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100391#comment-13100391
 ] 

stack commented on HBASE-2195:
--

bq. Let's please settle on v11 of the patch. We can have a general discussion 
about versioning outside of this patch.

I'm +1 on doing this

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4057) Implement HBase version of "show processlist"

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100392#comment-13100392
 ] 

jirapos...@reviews.apache.org commented on HBASE-4057:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1750/
---

Review request for hbase.


Summary
---

Includes an extended TaskMonitor with consideration for high-frequency, 
short-duration changes, a new MonitoredRPCHandler class, integration with the 
RPC, and filtered exposure both as an HTML table and as JSON via jamon.


This addresses bug HBASE-4057.
https://issues.apache.org/jira/browse/HBASE-4057


Diffs
-

  /src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandler.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/master/MasterStatusServlet.java 
1166510 
  /src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java 1166510 
  /src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java 1166510 
  /src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon 1166510 
  /src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 1166510 
  /src/main/jamon/org/apache/hbase/tmpl/master/MasterStatusTmpl.jamon 1166510 
  /src/main/jamon/org/apache/hbase/tmpl/common/TaskMonitorTmpl.jamon 1166510 
  
/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTask.java 1166510 
  /src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTaskImpl.java 
1166510 
  /src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java 1166510 
  /src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java 
1166510 
  /src/main/resources/hbase-webapps/static/hbase.css 1166510 
  /src/test/java/org/apache/hadoop/hbase/monitoring/TestTaskMonitor.java 
1166510 

Diff: https://reviews.apache.org/r/1750/diff


Testing
---

All unit tests passed. All exposure works as expected. Extensive load testing 
has been done on FB's internal branch.


Thanks,

Riley



> Implement HBase version of "show processlist"
> -
>
> Key: HBASE-4057
> URL: https://issues.apache.org/jira/browse/HBASE-4057
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Riley Patterson
> Attachments: HBASE-4057.patch
>
>
> One of the features that our DBAs use for MySQL analysis is "show 
> processlist", which gives application-level stats about the RPC threads.  
> Right now, we use jstack but that is very core-developer-centric.  We need to 
> create a similar tool that DBA/Ops/AppDevs can use.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4341) HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100393#comment-13100393
 ] 

stack commented on HBASE-4341:
--

I agree.

> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> 
>
> Key: HBASE-4341
> URL: https://issues.apache.org/jira/browse/HBASE-4341
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.5
>
> Attachments: HBASE-4341-Branch.patch
>
>
> This is the reason why "https://builds.apache.org/job/hbase-0.90/282" 
> failed. In this test, one case timed out and caused the whole test 
> process to be killed.
> [logs]
> Here are the related logs (from 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping. 
> This is the "HRS#closeAllRegions" method:
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
> closeUserRegions(abort);
> -
> if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
> if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, so walking down this map may not return 
> all the entries if some are being added during the traversal. Once a 
> region is missed, it can't be closed anymore, and this regionserver will not 
> be stopped normally. Then the following logs occurred:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
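
The weakly consistent iteration described above is easy to reproduce outside of 
HBase; a minimal, self-contained sketch (plain Java, not HRegionServer code) is 
below. One possible safer pattern, shown at the end, is to keep re-scanning 
until the map is observed empty.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal illustration of ConcurrentHashMap's weakly consistent iteration: an entry
// added while another thread iterates may or may not be seen, which is why a single
// pass over onlineRegions can miss a region that is still being opened.
public class WeakIterationSketch {
  public static void main(String[] args) throws InterruptedException {
    final ConcurrentMap<String, String> onlineRegions = new ConcurrentHashMap<String, String>();
    onlineRegions.put("region-1", "open");

    Thread opener = new Thread(new Runnable() {
      public void run() {
        onlineRegions.put("region-2", "open");  // a region opening during shutdown
      }
    });
    opener.start();

    // Single pass, as in a naive closeAllRegions(): may or may not see region-2.
    for (String region : onlineRegions.keySet()) {
      System.out.println("single pass closing " + region);
    }
    opener.join();

    // One possible safer pattern: keep looping until the map is observed empty.
    while (!onlineRegions.isEmpty()) {
      for (String region : onlineRegions.keySet()) {
        System.out.println("retry pass closing " + region);
        onlineRegions.remove(region);
      }
    }
  }
}
{code}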

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100395#comment-13100395
 ] 

ramkrishna.s.vasudevan commented on HBASE-4351:
---

The impact could be that the region is removed from the regions map in the 
master, and until a restart the master remains unaware of the region.
Kindly correct me if I am wrong.

> If from Admin we try to unassign a region forcefully, though a valid region 
> name is given the master is not able to identify the region to unassign.
> 
>
> Key: HBASE-4351
> URL: https://issues.apache.org/jira/browse/HBASE-4351
> Project: HBase
>  Issue Type: Bug
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> The following is the problem
> Get the exact region name from UI and call
> HBaseAdmin.unassign(regionname, true).
> Here true is forceful option.
> As part of unassign api
> {code}
>   public void unassign(final byte [] regionName, final boolean force)
>   throws IOException {
> Pair pair =
>   MetaReader.getRegion(this.catalogTracker, regionName);
> if (pair == null) throw new 
> UnknownRegionException(Bytes.toStringBinary(regionName));
> HRegionInfo hri = pair.getFirst();
> if (force) this.assignmentManager.clearRegionFromTransition(hri);
> this.assignmentManager.unassign(hri, force);
>   }
> {code}
> As part of clearRegionFromTransition()
> {code}
> synchronized (this.regions) {
>   this.regions.remove(hri);
>   for (Set regions : this.servers.values()) {
> regions.remove(hri);
>   }
> }
> {code}
> the region is also removed.  Hence when the master tries to identify the 
> region
> {code}
>   if (!regions.containsKey(region)) {
> debugLog(region, "Attempted to unassign region " +
>   region.getRegionNameAsString() + " but it is not " +
>   "currently assigned anywhere");
> return;
>   }
> {code}
> It is not able to identify the region.  It exists in trunk and 0.90.x also.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4341) HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4341.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Applied to branch and trunk.  Thank you for the patch Jieshan.

> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> 
>
> Key: HBASE-4341
> URL: https://issues.apache.org/jira/browse/HBASE-4341
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.5
>
> Attachments: HBASE-4341-Branch.patch
>
>
> This is the reason why "https://builds.apache.org/job/hbase-0.90/282" 
> failed. In this test, one case timed out and caused the whole test 
> process to be killed.
> [logs]
> Here are the related logs (from 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping. 
> This is the "HRS#closeAllRegions" method:
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
> closeUserRegions(abort);
> -
> if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
> if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, so walking down this map may not return 
> all the entries if some are being added during the traversal. Once a 
> region is missed, it can't be closed anymore, and this regionserver will not 
> be stopped normally. Then the following logs occurred:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-4243:


Assignee: Roman Shaposhnik

> HADOOP_HOME should be auto-detected
> ---
>
> Key: HBASE-4243
> URL: https://issues.apache.org/jira/browse/HBASE-4243
> Project: HBase
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4243.patch.txt
>
>
> Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
> the HADOOP_HOME setting if it is not given explicitly. Something along the 
> lines of:
> {noformat}
> # check for hadoop in the path
> 141   HADOOP_IN_PATH=`which hadoop 2>/dev/null`
> 142   if [ -f ${HADOOP_IN_PATH} ]; then
> 143 HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
> 144   fi
> 145   # HADOOP_HOME env variable overrides hadoop in the path
> 146   HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
> 147   if [ "$HADOOP_HOME" == "" ]; then
> 148 echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or 
> hadoop must be in the path";
> 149 exit 4;
> 150   fi
> {noformat}
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4243:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Release Note: Use HADOOP_HOME if set.
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks Roman.  We can display version in shell via another 
JIRA.

> HADOOP_HOME should be auto-detected
> ---
>
> Key: HBASE-4243
> URL: https://issues.apache.org/jira/browse/HBASE-4243
> Project: HBase
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4243.patch.txt
>
>
> Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
> the HADOOP_HOME setting if it is not given explicitly. Something along the 
> lines of:
> {noformat}
> # check for hadoop in the path
> 141   HADOOP_IN_PATH=`which hadoop 2>/dev/null`
> 142   if [ -f ${HADOOP_IN_PATH} ]; then
> 143 HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
> 144   fi
> 145   # HADOOP_HOME env variable overrides hadoop in the path
> 146   HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
> 147   if [ "$HADOOP_HOME" == "" ]; then
> 148 echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or 
> hadoop must be in the path";
> 149 exit 4;
> 150   fi
> {noformat}
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4057) Implement HBase version of "show processlist"

2011-09-08 Thread Riley Patterson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100401#comment-13100401
 ] 

Riley Patterson commented on HBASE-4057:


Also @Andrew, currently, the only user-facing exposure is through jamon. 
However, the TaskMonitor's getTasks() method is a very straightforward internal 
exposure mechanism, and it would be relatively trivial to expose show 
processlist to the shell or any other user interface.

My internship ends on Friday (tomorrow), so I don't have time to do this while 
I'm at Facebook, but if you think shell exposure is important, I can implement 
it independently after my internship ends without too much trouble. However, I 
would really like to get this core functionality in before I leave Facebook.
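
For the record, a rough sketch of what such a shell/CLI-style exposure on top of 
TaskMonitor might look like; the accessor names used on MonitoredTask below 
(getState, getDescription, getStatus) are assumptions for illustration, not a 
confirmed API:

{code}
import org.apache.hadoop.hbase.monitoring.MonitoredTask;
import org.apache.hadoop.hbase.monitoring.TaskMonitor;

// Hypothetical "show processlist"-style dump built on TaskMonitor.getTasks();
// accessor names on MonitoredTask are assumed for illustration.
public class ProcessListDump {
  public static void main(String[] args) {
    for (MonitoredTask task : TaskMonitor.get().getTasks()) {
      System.out.printf("%-12s %-40s %s%n",
          task.getState(),        // e.g. RUNNING, WAITING, COMPLETE
          task.getDescription(),  // what the handler/task is doing
          task.getStatus());      // latest status message
    }
  }
}
{code}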

> Implement HBase version of "show processlist"
> -
>
> Key: HBASE-4057
> URL: https://issues.apache.org/jira/browse/HBASE-4057
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Riley Patterson
> Attachments: HBASE-4057.patch
>
>
> One of the features that our DBAs use for MySQL analysis is "show 
> processlist", which gives application-level stats about the RPC threads.  
> Right now, we use jstack but that is very core-developer-centric.  We need to 
> create a similar tool that DBA/Ops/AppDevs can use.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100403#comment-13100403
 ] 

stack commented on HBASE-4351:
--

That looks like a silly mistake.  Good one Ram.

> If from Admin we try to unassign a region forcefully, though a valid region 
> name is given the master is not able to identify the region to unassign.
> 
>
> Key: HBASE-4351
> URL: https://issues.apache.org/jira/browse/HBASE-4351
> Project: HBase
>  Issue Type: Bug
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> The following is the problem
> Get the exact region name from UI and call
> HBaseAdmin.unassign(regionname, true).
> Here true is forceful option.
> As part of unassign api
> {code}
>   public void unassign(final byte [] regionName, final boolean force)
>   throws IOException {
> Pair pair =
>   MetaReader.getRegion(this.catalogTracker, regionName);
> if (pair == null) throw new 
> UnknownRegionException(Bytes.toStringBinary(regionName));
> HRegionInfo hri = pair.getFirst();
> if (force) this.assignmentManager.clearRegionFromTransition(hri);
> this.assignmentManager.unassign(hri, force);
>   }
> {code}
> As part of clearRegionFromTransition()
> {code}
> synchronized (this.regions) {
>   this.regions.remove(hri);
>   for (Set regions : this.servers.values()) {
> regions.remove(hri);
>   }
> }
> {code}
> the region is also removed.  Hence when the master tries to identify the 
> region
> {code}
>   if (!regions.containsKey(region)) {
> debugLog(region, "Attempted to unassign region " +
>   region.getRegionNameAsString() + " but it is not " +
>   "currently assigned anywhere");
> return;
>   }
> {code}
> It is not able to identify the region.  It exists in trunk and 0.90.x also.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4297:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thank you for the patch Jan.

> TableMapReduceUtil overwrites user supplied options
> ---
>
> Key: HBASE-4297
> URL: https://issues.apache.org/jira/browse/HBASE-4297
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Jan Lukavsky
> Attachments: HBASE-4297.patch
>
>
> Job configuration is overwritten by hbase-default and hbase-site in 
> TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior 
> in the following code:
> {noformat}
> Configuration conf = HBaseConfiguration.create();
> // change keyvalue size
> conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
> Job job = new Job(conf, ...);
> TableMapReduceUtil.initTableMapperJob(...);
> // the job doesn't have the option changed, uses it from hbase-site or 
> hbase-default
> job.submit();
> {noformat}
> Although in this case it could be fixed by moving the set() after 
> initTableMapperJob(), in case where user wants to change some option using 
> GenericOptionsParser and -D this is impossible, making this cool feature 
> useless.
> In the 0.20.x era this code behaved as expected. The solution of this problem 
> should be that we don't overwrite the options, but just read them if they are 
> missing.
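
One minimal way to express the "read them only if they are missing" idea, 
assuming Configuration.setIfUnset is available in the Hadoop version in use 
(hypothetical helper, not the committed change):

{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical helper: copy hbase-default/hbase-site values into the job
// configuration only where the user has not already supplied a value
// (for example via -D on the command line).
public class ConfigMergeSketch {
  public static void addHbaseDefaultsIfMissing(Configuration jobConf) {
    Configuration hbaseDefaults = HBaseConfiguration.create();
    for (Map.Entry<String, String> entry : hbaseDefaults) {
      jobConf.setIfUnset(entry.getKey(), entry.getValue());
    }
  }
}
{code}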

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2011-09-08 Thread Akash Ashok (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100437#comment-13100437
 ] 

Akash Ashok commented on HBASE-4224:


Thanks Stack. I'll start working on it then.

> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: stack
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver.  To let go of logs, we need to have all edits emptied from 
> memory, but currently flush is only by table or region.  We need to be able to 
> flush the regionserver.  Need to add this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4224) Need a flush by regionserver rather than by table option

2011-09-08 Thread Akash Ashok (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash Ashok reassigned HBASE-4224:
--

Assignee: Akash Ashok

> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: stack
>Assignee: Akash Ashok
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver.  To let go of logs, we need to have all edits emptied from 
> memory, but currently flush is only by table or region.  We need to be able to 
> flush the regionserver.  Need to add this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100438#comment-13100438
 ] 

Jean-Daniel Cryans commented on HBASE-4015:
---

Perfect!

> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, 
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does its own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that sometimes can take minutes to 
> process (meanwhile a region that timed out opening might have opened, then 
> what happens is it will be reassigned by the TimeoutMonitor generating the 
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100440#comment-13100440
 ] 

Ted Yu commented on HBASE-4153:
---

Thanks for your analysis Ramkrishna.
The race condition in case 2 above should be handled. Your suggestion for case 
2 is reasonable.

> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4015:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Applied to TRUNK.  We should consider doing a version of this on branch.  J-D 
points out it changes the HRegionInterface.  Maybe if we put the change at the 
end of the interface in branch we'll be able to do rolling restarts up to 
0.90.5.  I'll open a new issue to look into this.

Thanks for persevering with the patch Ram.

> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, 
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does its own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that sometimes can take minutes to 
> process (meanwhile a region that timed out opening might have opened, then 
> what happens is it will be reassigned by the TimeoutMonitor generating the 
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-08 Thread stack (JIRA)
Apply version of hbase-4015 to branch
-

 Key: HBASE-4352
 URL: https://issues.apache.org/jira/browse/HBASE-4352
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.5


Consider adding a version of hbase-4015 to 0.90.  It changes HRegionInterface, 
so we would need to move the change to the end of the interface and then test 
that it doesn't break rolling restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4117) Slow Query Log

2011-09-08 Thread Riley Patterson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riley Patterson updated HBASE-4117:
---

Attachment: HBASE-4117-doc.txt

A basic, user-manual type of document describing the feature, its use, its 
configuration, and how to interpret its output.

> Slow Query Log
> --
>
> Key: HBASE-4117
> URL: https://issues.apache.org/jira/browse/HBASE-4117
> Project: HBase
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Riley Patterson
>Assignee: Riley Patterson
>Priority: Minor
>  Labels: client, ipc
> Fix For: 0.92.0
>
> Attachments: HBASE-4117-doc.txt, HBASE-4117-v2.patch, 
> HBASE-4117-v3.patch, HBASE-4117.patch
>
>
> Produce log messages for slow queries. The RPC server will decide what is 
> slow based on a configurable "warn response time" parameter. Queries 
> designated as slow will then output a "response too slow" message followed by 
> a fingerprint of the query, and a summary limited in size by another 
> configurable parameter (to limit log spamming).
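
To make the thresholding concrete, a generic sketch is below; the configuration 
key mentioned in the comment and the log format are assumptions, not the actual 
patch:

{code}
// Generic illustration of a "warn response time" check; the key name and the log
// format are assumptions, not the actual HBASE-4117 code.
public class SlowQueryLogSketch {
  private final long warnResponseTimeMs;  // e.g. conf.getInt("hbase.ipc.warn.response.time", 10000) -- assumed key
  private final int maxSummaryLength;     // cap so a single slow call cannot spam the log

  public SlowQueryLogSketch(long warnResponseTimeMs, int maxSummaryLength) {
    this.warnResponseTimeMs = warnResponseTimeMs;
    this.maxSummaryLength = maxSummaryLength;
  }

  public void maybeLogSlowCall(String fingerprint, String summary, long startMs) {
    long elapsedMs = System.currentTimeMillis() - startMs;
    if (elapsedMs >= warnResponseTimeMs) {
      String clipped = summary.length() > maxSummaryLength
          ? summary.substring(0, maxSummaryLength) + "..."
          : summary;
      System.err.println("(responseTooSlow): took " + elapsedMs + "ms, call="
          + fingerprint + ", details=" + clipped);
    }
  }
}
{code}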

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100465#comment-13100465
 ] 

Lars Hofhansl commented on HBASE-2195:
--

Based on a suggestion by Ted I tested the following scenario (with patch v11): 
start with hbase-0.90.x, upgrade and restart with trunk (with 4301 applied), 
and set up master-master replication with another trunk cluster. Make sure 
replication works fine (and it does).

In terms of versioning...
When we did versioning for a project I used to work on a long time ago, we 
maintained a class version *and* an instance version. We might need both.
The class should know which version it is, but it is also useful in many 
scenarios to ask the instance what specific version it is on.
I.e. we need an interface with two methods: one to get the class version, one 
to get the instance version.
I think this is for a different patch, though.


> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4353) Create VersionedWritable interface that allows a class version and an instance version

2011-09-08 Thread Ted Yu (JIRA)
Create VersionedWritable interface that allows a class version and an instance 
version
--

 Key: HBASE-4353
 URL: https://issues.apache.org/jira/browse/HBASE-4353
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Yu


Here is comment from Lars (HBASE-2195):

When we did versioning for a project I used to work on a long time ago, we had 
maintained a class version *and* an instance version. We might need both.
The class should know which version it is, but also it is useful in many 
scenarios to ask the instance what specific version it is on.

I.e. we need an interface with two methods: One to get the class version, one 
to get the instance version.

This would apply to HLogKey which is able to read entries without explicit 
version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100472#comment-13100472
 ] 

Ted Yu commented on HBASE-2195:
---

I created HBASE-4353 for the new VersionedWritable interface.
The remaining work can be done there.

Thanks for the perseverance Lars.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-4243:
--


Reverted.  When I tried it I got this (on macosx):

{noformat}
h45:clean_trunk Stack$ cd ../hbase
h45:hbase Stack$ ./bin/start-hbase.sh 
readlink: illegal option -- f
usage: readlink [-n] [file ...]
master running as process 39080. Stop it first.
{noformat}

> HADOOP_HOME should be auto-detected
> ---
>
> Key: HBASE-4243
> URL: https://issues.apache.org/jira/browse/HBASE-4243
> Project: HBase
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4243.patch.txt
>
>
> Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
> the HADOOP_HOME setting if it is not given explicitly. Something along the 
> lines of:
> {noformat}
> # check for hadoop in the path
> 141   HADOOP_IN_PATH=`which hadoop 2>/dev/null`
> 142   if [ -f ${HADOOP_IN_PATH} ]; then
> 143 HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
> 144   fi
> 145   # HADOOP_HOME env variable overrides hadoop in the path
> 146   HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
> 147   if [ "$HADOOP_HOME" == "" ]; then
> 148 echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or 
> hadoop must be in the path";
> 149 exit 4;
> 150   fi
> {noformat}
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100489#comment-13100489
 ] 

stack commented on HBASE-2195:
--

Are we good to go then?

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100492#comment-13100492
 ] 

Ted Yu commented on HBASE-2195:
---

Yes.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4057) Implement HBase version of "show processlist"

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100493#comment-13100493
 ] 

jirapos...@reviews.apache.org commented on HBASE-4057:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1750/#review1817
---

Ship it!


Took a quick look at the patch -- looks good -- then I tried it.  It's lovely.  
+1 on commit.

- Michael


On 2011-09-08 15:35:10, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1750/
bq.  ---
bq.  
bq.  (Updated 2011-09-08 15:35:10)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Includes an extended TaskMonitor with consideration for high-frequency, 
short-duration changes, a new MonitoredRPCHandler class, integration with the 
RPC, and filtered exposure both as an HTML table and as JSON via jamon.
bq.  
bq.  
bq.  This addresses bug HBASE-4057.
bq.  https://issues.apache.org/jira/browse/HBASE-4057
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandler.java 
PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/master/MasterStatusServlet.java 
1166510 
bq./src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java 1166510 
bq./src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java 
1166510 
bq./src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon 
1166510 
bq./src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 1166510 
bq./src/main/jamon/org/apache/hbase/tmpl/master/MasterStatusTmpl.jamon 
1166510 
bq./src/main/jamon/org/apache/hbase/tmpl/common/TaskMonitorTmpl.jamon 
1166510 
bq.
/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java 
PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTask.java 
1166510 
bq./src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTaskImpl.java 
1166510 
bq./src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java 
1166510 
bq./src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java 
1166510 
bq./src/main/resources/hbase-webapps/static/hbase.css 1166510 
bq./src/test/java/org/apache/hadoop/hbase/monitoring/TestTaskMonitor.java 
1166510 
bq.  
bq.  Diff: https://reviews.apache.org/r/1750/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  All unit tests passed. All exposure works as expected. Extensive load 
testing has been done on FB's internal branch.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Riley
bq.  
bq.



> Implement HBase version of "show processlist"
> -
>
> Key: HBASE-4057
> URL: https://issues.apache.org/jira/browse/HBASE-4057
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Riley Patterson
> Attachments: HBASE-4057.patch
>
>
> One of the features that our DBAs use for MySQL analysis is "show 
> processlist", which gives application-level stats about the RPC threads.  
> Right now, we use jstack but that is very core-developer-centric.  We need to 
> create a similar tool that DBA/Ops/AppDevs can use.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100501#comment-13100501
 ] 

Jean-Daniel Cryans commented on HBASE-2195:
---

+1

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100507#comment-13100507
 ] 

stack commented on HBASE-2195:
--

Where is v11?  Is it what's up on rb?

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100510#comment-13100510
 ] 

Jean-Daniel Cryans commented on HBASE-4351:
---

A force unassign is a different "type" of unassignment. If not forcing, we 
kindly ask the RS to close the region. If forcing, then we explicitly wipe out 
the master's knowledge of that region... which can still be fixed later by a 
run of hbck -fix. I think the current behavior is correct, and if there's 
anything to fix it would be to skip calling 
this.assignmentManager.unassign(hri, force) altogether when force=true.

> If from Admin we try to unassign a region forcefully, though a valid region 
> name is given the master is not able to identify the region to unassign.
> 
>
> Key: HBASE-4351
> URL: https://issues.apache.org/jira/browse/HBASE-4351
> Project: HBase
>  Issue Type: Bug
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> The following is the problem
> Get the exact region name from UI and call
> HBaseAdmin.unassign(regionname, true).
> Here true is forceful option.
> As part of unassign api
> {code}
>   public void unassign(final byte [] regionName, final boolean force)
>   throws IOException {
> Pair pair =
>   MetaReader.getRegion(this.catalogTracker, regionName);
> if (pair == null) throw new 
> UnknownRegionException(Bytes.toStringBinary(regionName));
> HRegionInfo hri = pair.getFirst();
> if (force) this.assignmentManager.clearRegionFromTransition(hri);
> this.assignmentManager.unassign(hri, force);
>   }
> {code}
> As part of clearRegionFromTransition()
> {code}
> synchronized (this.regions) {
>   this.regions.remove(hri);
>   for (Set regions : this.servers.values()) {
> regions.remove(hri);
>   }
> }
> {code}
> the region is also removed.  Hence when the master tries to identify the 
> region
> {code}
>   if (!regions.containsKey(region)) {
> debugLog(region, "Attempted to unassign region " +
>   region.getRegionNameAsString() + " but it is not " +
>   "currently assigned anywhere");
> return;
>   }
> {code}
> It is not able to identify the region.  It exists in trunk and 0.90.x also.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100511#comment-13100511
 ] 

Ted Yu commented on HBASE-2195:
---

Yes.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4353) Create VersionedWritable interface that allows a class version and an instance version

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100513#comment-13100513
 ] 

Lars Hofhansl commented on HBASE-4353:
--

I guess we could have an interface with getVersion() and getInstanceVersion(), 
or maybe getClassVersion() and getInstanceVersion().
Then an abstract implementation of that interface (getVersion or getClassVersion 
would be the abstract method), pretty close to what VersionedWritable does. 
That class would also implement Writable.

In the HLogKey case, it would not subclass the abstract class, but just 
implement the interface.

If that's the route we want to take... we need to come up with good names for 
the interface and the abstract class.
And should existing versioned things (HRegion, WALEdit, etc.) be refactored 
into this?

(Just brainstorming here :) )
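
A rough sketch of the shape being brainstormed here, with deliberately 
placeholder names (none of these types exist yet):

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Placeholder names only -- this interface/abstract class pair does not exist yet.
interface ClassAndInstanceVersioned {
  int getClassVersion();     // the version the class writes today
  int getInstanceVersion();  // the version this particular instance was read as
}

// Abstract helper close to what VersionedWritable does, but also remembering
// which version the deserialized instance actually carried.
abstract class AbstractVersionedWritable implements Writable, ClassAndInstanceVersioned {
  private int instanceVersion = -1;  // unknown until readFields() runs

  public abstract int getClassVersion();

  public int getInstanceVersion() {
    return instanceVersion;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(getClassVersion());  // subclasses call super.write() first, then write their fields
  }

  public void readFields(DataInput in) throws IOException {
    instanceVersion = in.readInt();   // subclasses call super.readFields() first
  }
}

// A class like HLogKey, which must also read entries carrying no explicit version,
// would implement ClassAndInstanceVersioned directly rather than extend the helper.
{code}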

> Create VersionedWritable interface that allows a class version and an 
> instance version
> --
>
> Key: HBASE-4353
> URL: https://issues.apache.org/jira/browse/HBASE-4353
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Yu
>
> Here is comment from Lars (HBASE-2195):
> When we did versioning for a project I used to work on a long time ago, we 
> had maintained a class version *and* an instance version. We might need both.
> The class should know which version it is, but also it is useful in many 
> scenarios to ask the instance what specific version it is on.
> I.e. we need an interface with two methods: One to get the class version, one 
> to get the instance version.
> This would apply to HLogKey which is able to read entries without explicit 
> version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100514#comment-13100514
 ] 

Lars Hofhansl commented on HBASE-2195:
--

Yep, the one on rb.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4260) Expose a command to manually trigger an HLog roll

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100520#comment-13100520
 ] 

Hudson commented on HBASE-4260:
---

Integrated in HBase-TRUNK #2188 (See 
[https://builds.apache.org/job/HBase-TRUNK/2188/])
HBASE-4260 Expose a command to manually trigger an HLog roll

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogUtils.java


> Expose a command to manually trigger an HLog roll
> -
>
> Key: HBASE-4260
> URL: https://issues.apache.org/jira/browse/HBASE-4260
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, shell
>Reporter: Gary Helmling
>Assignee: ramkrishna.s.vasudevan
>  Labels: noob
> Fix For: 0.92.0
>
> Attachments: 4260-v2.patch, HBASE-4260.patch
>
>
> HBASE-4222 added a version of HLog.rollWriter() that allows "forcing" a log 
> roll when requested.  It would be useful to expose this as an 
> HRegionInterface RPC method and provide a corresponding shell command to 
> allow explicit log rolling when desired.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100519#comment-13100519
 ] 

Hudson commented on HBASE-4297:
---

Integrated in HBase-TRUNK #2188 (See 
[https://builds.apache.org/job/HBase-TRUNK/2188/])
HBASE-4297 TableMapReduceUtil overwrites user supplied options

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java


> TableMapReduceUtil overwrites user supplied options
> ---
>
> Key: HBASE-4297
> URL: https://issues.apache.org/jira/browse/HBASE-4297
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Jan Lukavsky
> Attachments: HBASE-4297.patch
>
>
> Job configuration is overwritten by hbase-default and hbase-site in 
> TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior 
> in the following code:
> {noformat}
> Configuration conf = HBaseConfiguration.create();
> // change keyvalue size
> conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
> Job job = new Job(conf, ...);
> TableMapReduceUtil.initTableMapperJob(...);
> // the job doesn't have the option changed, uses it from hbase-site or 
> hbase-default
> job.submit();
> {noformat}
> Although in this case it could be fixed by moving the set() after 
> initTableMapperJob(), in case where user wants to change some option using 
> GenericOptionsParser and -D this is impossible, making this cool feature 
> useless.
> In the 0.20.x era this code behaved as expected. The solution of this problem 
> should be that we don't overwrite the options, but just read them if they are 
> missing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100518#comment-13100518
 ] 

Hudson commented on HBASE-4243:
---

Integrated in HBase-TRUNK #2188 (See 
[https://builds.apache.org/job/HBase-TRUNK/2188/])
HBASE-4243 HADOOP_HOME should be auto-detected

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/bin/hbase


> HADOOP_HOME should be auto-detected
> ---
>
> Key: HBASE-4243
> URL: https://issues.apache.org/jira/browse/HBASE-4243
> Project: HBase
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4243.patch.txt
>
>
> Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
> the HADOOP_HOME setting if it is not given explicitly. Something along the 
> lines of:
> {noformat}
> # check for hadoop in the path
> HADOOP_IN_PATH=`which hadoop 2>/dev/null`
> if [ -f ${HADOOP_IN_PATH} ]; then
>   HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
> fi
> # HADOOP_HOME env variable overrides hadoop in the path
> HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
> if [ "$HADOOP_HOME" == "" ]; then
>   echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
>   exit 4;
> fi
> {noformat}
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4341) HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100517#comment-13100517
 ] 

Hudson commented on HBASE-4341:
---

Integrated in HBase-TRUNK #2188 (See 
[https://builds.apache.org/job/HBase-TRUNK/2188/])
HBASE-4341 HRS#closeAllRegions should take care of HRS#onlineRegions's weak 
consistency

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> 
>
> Key: HBASE-4341
> URL: https://issues.apache.org/jira/browse/HBASE-4341
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.5
>
> Attachments: HBASE-4341-Branch.patch
>
>
> This is the reason why "https://builds.apache.org/job/hbase-0.90/282" failed. 
> In this test, one case timed out and caused the whole test process to be killed.
> [logs]
> Here are the related logs (from 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping. 
> This is the "HRS#closeAllRegions" method:
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
> closeUserRegions(abort);
> -
> if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
> if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, so walking this map may not return 
> all the entries if some are added during the traversal. Once a region is 
> missed, it can't be closed anymore, and this regionserver will not be stopped 
> normally. Then the following logs occurred:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100530#comment-13100530
 ] 

ramkrishna.s.vasudevan commented on HBASE-4351:
---

@J-D
Sorry, I may be wrong. I was going through the javadoc and it reads:
{noformat}
   * @param force If true, force unassign (Will remove region from
   * regions-in-transition too if present).
{noformat}
and because after 
{noformat}
if (force) this.assignmentManager.clearRegionFromTransition(hri);
{noformat}
we were still calling unassign(), so this made me think that something was wrong 
here.
Thanks for correcting me. So can we resolve this issue as invalid?



> If from Admin we try to unassign a region forcefully, though a valid region 
> name is given the master is not able to identify the region to unassign.
> 
>
> Key: HBASE-4351
> URL: https://issues.apache.org/jira/browse/HBASE-4351
> Project: HBase
>  Issue Type: Bug
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> The following is the problem
> Get the exact region name from UI and call
> HBaseAdmin.unassign(regionname, true).
> Here true is forceful option.
> As part of unassign api
> {code}
>   public void unassign(final byte [] regionName, final boolean force)
>   throws IOException {
> Pair pair =
>   MetaReader.getRegion(this.catalogTracker, regionName);
> if (pair == null) throw new 
> UnknownRegionException(Bytes.toStringBinary(regionName));
> HRegionInfo hri = pair.getFirst();
> if (force) this.assignmentManager.clearRegionFromTransition(hri);
> this.assignmentManager.unassign(hri, force);
>   }
> {code}
> As part of clearRegionFromTransition()
> {code}
> synchronized (this.regions) {
>   this.regions.remove(hri);
>   for (Set regions : this.servers.values()) {
> regions.remove(hri);
>   }
> }
> {code}
> the region is also removed.  Hence when the master tries to identify the 
> region
> {code}
>   if (!regions.containsKey(region)) {
> debugLog(region, "Attempted to unassign region " +
>   region.getRegionNameAsString() + " but it is not " +
>   "currently assigned anywhere");
> return;
>   }
> {code}
> It is not able to identify the region.  It exists in trunk and 0.90.x also.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4353) Create VersionedWritable interface that allows a class version and an instance version

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100532#comment-13100532
 ] 

Ted Yu commented on HBASE-4353:
---

I think DoubleVersionedWritable may not be a good name.
We can name the interface WritableWithVersions. Since it lives in the 
org.apache.hadoop.hbase namespace, it is easy to distinguish from 
VersionedWritable.

If HLogKey only implements this interface, there is no hurry in creating the 
abstract class.
In fact, I am not sure the abstract class should assume the version is a byte.

> Create VersionedWritable interface that allows a class version and an 
> instance version
> --
>
> Key: HBASE-4353
> URL: https://issues.apache.org/jira/browse/HBASE-4353
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Yu
>
> Here is comment from Lars (HBASE-2195):
> When we did versioning for a project I used to work on a long time ago, we 
> had maintained a class version *and* an instance version. We might need both.
> The class should know which version it is, but also it is useful in many 
> scenarios to ask the instance what specific version it is on.
> I.e. we need an interface with two methods: One to get the class version, one 
> to get the instance version.
> This would apply to HLogKey which is able to read entries without explicit 
> version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-08 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100541#comment-13100541
 ] 

Jean-Daniel Cryans commented on HBASE-4351:
---

Well there's definitely something we need to do. For example the shell help 
says:

bq.  Pass 'true' to force the unassignment ('force' will clear all in-memory 
state in master before the reassign). 

This doesn't make sense, since once you wipe out the in-memory state you no 
longer know where to send the unassign (you just wiped that). Maybe we could do 
something better since we're not going to send a close... like I mentioned, I 
have to run hbck -fix to reassign the region after using this; maybe we could 
call assign in the unassign method when force=true?

I know it sounds dumb, but in the end that's what happens anyway when you 
unassign with force=false: when the master receives the notification that the 
region is closed, it reassigns it.
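
To make the "call assign when force=true" idea concrete, here is a rough sketch 
against the HMaster#unassign code quoted in the description below. Illustration 
only, not a reviewed fix; it assumes AssignmentManager exposes an 
assign(HRegionInfo, boolean) overload:
{code}
// Hypothetical sketch: when force=true, clear the in-memory state and then
// immediately re-assign, instead of calling unassign() on a region the master
// no longer knows about.
public void unassign(final byte [] regionName, final boolean force)
throws IOException {
  Pair pair = MetaReader.getRegion(this.catalogTracker, regionName);
  if (pair == null) {
    throw new UnknownRegionException(Bytes.toStringBinary(regionName));
  }
  HRegionInfo hri = (HRegionInfo) pair.getFirst();
  if (force) {
    this.assignmentManager.clearRegionFromTransition(hri);
    // assumes an assign(HRegionInfo, boolean) overload exists on AssignmentManager
    this.assignmentManager.assign(hri, true);
  } else {
    this.assignmentManager.unassign(hri, force);
  }
}
{code}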

> If from Admin we try to unassign a region forcefully, though a valid region 
> name is given the master is not able to identify the region to unassign.
> 
>
> Key: HBASE-4351
> URL: https://issues.apache.org/jira/browse/HBASE-4351
> Project: HBase
>  Issue Type: Bug
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> The following is the problem
> Get the exact region name from UI and call
> HBaseAdmin.unassign(regionname, true).
> Here true is forceful option.
> As part of unassign api
> {code}
>   public void unassign(final byte [] regionName, final boolean force)
>   throws IOException {
> Pair pair =
>   MetaReader.getRegion(this.catalogTracker, regionName);
> if (pair == null) throw new 
> UnknownRegionException(Bytes.toStringBinary(regionName));
> HRegionInfo hri = pair.getFirst();
> if (force) this.assignmentManager.clearRegionFromTransition(hri);
> this.assignmentManager.unassign(hri, force);
>   }
> {code}
> As part of clearRegionFromTransition()
> {code}
> synchronized (this.regions) {
>   this.regions.remove(hri);
>   for (Set regions : this.servers.values()) {
> regions.remove(hri);
>   }
> }
> {code}
> the region is also removed.  Hence when the master tries to identify the 
> region
> {code}
>   if (!regions.containsKey(region)) {
> debugLog(region, "Attempted to unassign region " +
>   region.getRegionNameAsString() + " but it is not " +
>   "currently assigned anywhere");
> return;
>   }
> {code}
> It is not able to identify the region.  It exists in trunk and 0.90.x also.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4353) Create VersionedWritable interface that allows a class version and an instance version

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100546#comment-13100546
 ] 

Lars Hofhansl commented on HBASE-4353:
--

Version could be a vint.
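
If the version were a vint, the abstract helper's serialization would just use 
WritableUtils (sketch only; it assumes the placeholder version fields from the 
earlier sketch are widened to int):
{code}
// Sketch: version written as a vint instead of a fixed byte.
public void write(DataOutput out) throws IOException {
  WritableUtils.writeVInt(out, getClassVersion());
}

public void readFields(DataInput in) throws IOException {
  this.instanceVersion = WritableUtils.readVInt(in);
}
{code}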

> Create VersionedWritable interface that allows a class version and an 
> instance version
> --
>
> Key: HBASE-4353
> URL: https://issues.apache.org/jira/browse/HBASE-4353
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Yu
>
> Here is comment from Lars (HBASE-2195):
> When we did versioning for a project I used to work on a long time ago, we 
> had maintained a class version *and* an instance version. We might need both.
> The class should know which version it is, but also it is useful in many 
> scenarios to ask the instance what specific version it is on.
> I.e. we need an interface with two methods: One to get the class version, one 
> to get the instance version.
> This would apply to HLogKey which is able to read entries without explicit 
> version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4304) requestsPerSecond counter stuck at 0

2011-09-08 Thread Li Pi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100552#comment-13100552
 ] 

Li Pi commented on HBASE-4304:
--

Yup. It's on my back burner, but I do intend to complete it.


> requestsPerSecond counter stuck at 0
> 
>
> Key: HBASE-4304
> URL: https://issues.apache.org/jira/browse/HBASE-4304
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
>
> Running trunk @ r1163343, all of the requestsPerSecond counters are showing 0 
> both in the master UI and in the RS UI. The writeRequestsCount metric is 
> properly updating in the RS UI.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4354) track region history

2011-09-08 Thread Ming Ma (JIRA)
track region history


 Key: HBASE-4354
 URL: https://issues.apache.org/jira/browse/HBASE-4354
 Project: HBase
  Issue Type: New Feature
  Components: master, metrics, regionserver
Reporter: Ming Ma
Assignee: Ming Ma


For debugging and analysis purposes it will be useful to understand a region's 
lifecycle: how it is created (from which parent region, for example), how it is 
split, assigned, etc. Some of this info is in the logs, the hbase .META. table, 
zookeeper, and metrics. Certain history data is lost; for example, the state is 
removed from zookeeper /hbase/unassigned once the region is assigned; also, the 
.META. table has a max version of 10 and thus only tracks the last 10 RS 
assignments of a given region. It would be nice to put this in a central place. 
It can provide:

1. How applications use hbase. For example, an application might create a large 
number of regions in a short period of time and drop the table later.
2. How HBase internally manages regions, such as how regions are split, 
assigned, taken offline, etc.

Things to track:
1. How the region is created, and its parent region in the case of a split.
2. The region transition process, such as region state changes and region 
server changes.


One idea is to put such transition history data in zookeeper. One issue is that 
it could blow up zookeeper memory if we have a large number of regions and the 
cluster runs for a long time. I would like to get your feedback on different 
approaches to address the issue. One assumption is that region assignment 
doesn't happen with high frequency, and thus the overhead introduced won't have 
much impact on system performance.


Approach 1:

Zookeeper knows the history of how /hbase/unassigned is modified; if we can get 
zookeeper's logs (Bookkeeper?) somehow, we know the history of region 
transitions.

Approach 2:

1.  HBase logs extra region transition data to zookeeper. It could be one 
zookeeper node per transition.
2.  Have a separate thread on the Master to move data from zookeeper and 
append it to HDFS. That will keep the zookeeper size in check (a rough sketch 
of this step follows below).
3.  Have some tool or web UI to show the history of a given region by 
looking at zookeeper and HDFS.
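
A very rough sketch of step 2 of Approach 2 as a master-side chore. All names, 
znode paths, record formats and the exact ZKUtil helpers used here are 
assumptions for illustration, and imports are omitted; this is the shape of the 
idea, not a design:
{code}
// Hypothetical archiver: drains per-transition znodes into a history file on HDFS.
class RegionHistoryArchiver extends Chore {
  private static final Log LOG = LogFactory.getLog(RegionHistoryArchiver.class);
  private static final String HISTORY_ZNODE = "/hbase/regionhistory"; // made-up parent znode

  private final ZooKeeperWatcher zkw;
  private final FSDataOutputStream historyOut;  // e.g. an append-only file under the hbase root dir

  RegionHistoryArchiver(ZooKeeperWatcher zkw, FSDataOutputStream historyOut,
      int periodMs, Stoppable stopper) {
    super("RegionHistoryArchiver", periodMs, stopper);
    this.zkw = zkw;
    this.historyOut = historyOut;
  }

  @Override
  protected void chore() {
    try {
      // assumption of this sketch: one child znode per recorded transition
      List<String> events = ZKUtil.listChildrenNoWatch(zkw, HISTORY_ZNODE);
      if (events == null) return;
      for (String event : events) {
        String path = HISTORY_ZNODE + "/" + event;
        byte[] data = ZKUtil.getDataAndWatch(zkw, path);  // a real version would avoid the watch
        if (data != null) {
          historyOut.write(data);
          historyOut.write('\n');
        }
        ZKUtil.deleteNode(zkw, path);  // this is what keeps the zookeeper size in check
      }
      historyOut.flush();
    } catch (Exception e) {
      LOG.warn("Failed archiving region transition history", e);
    }
  }
}
{code}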


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100573#comment-13100573
 ] 

stack commented on HBASE-4015:
--

Sorry.  I bungled the commit.  Ram pointed out that the latest attached here is 
not the right patch to apply.  I should have gotten the patch from RB.  I had 
to make two attempts at fixup.  My third application hopefully is correct (I'm 
sure Ram will let me know if it is not).  Thanks for staying on top of this Ram.

> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, 
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does it's own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that sometimes can take minutes to 
> process (meanwhile a region that timed out opening might have opened, then 
> what happens is it will be reassigned by the TimeoutMonitor generating the 
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut

2011-09-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4347:
-

Attachment: 4347.txt

These are the two things I have roughly in mind:
1. A simple OperationWithAttributes class (and an Attributes interface) that all 
Operations that have attributes extend (Get/Scan/Put/Delete).
2. A Mutation class, extended by Put and Delete. All shared attributes are 
moved up into Mutation (as protected), and shared methods are moved up.

While doing this I noticed a bunch of differences between Put and Delete:
o Put.toMap does not include the timestamp.
o Put also has no setter for the timestamp.
o Delete has no numFamilies method.
o Delete has no heapSize method.
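
A rough outline of that hierarchy (sketch only; the members and methods shown are 
illustrative, not the final patch):
{code}
// Sketch of the proposed shape.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

interface Attributes {
  void setAttribute(String name, byte[] value);
  byte[] getAttribute(String name);
}

// One copy of the attribute handling, shared by Get/Scan/Put/Delete.
abstract class OperationWithAttributes extends Operation implements Attributes {
  private Map<String, byte[]> attributes = new HashMap<String, byte[]>();
  public void setAttribute(String name, byte[] value) { attributes.put(name, value); }
  public byte[] getAttribute(String name) { return attributes.get(name); }
}

// Shared Put/Delete state pulled up as protected members, plus shared methods.
abstract class Mutation extends OperationWithAttributes {
  protected byte[] row;
  protected long ts = HConstants.LATEST_TIMESTAMP;
  protected Map<byte[], List<KeyValue>> familyMap =
      new TreeMap<byte[], List<KeyValue>>(Bytes.BYTES_COMPARATOR);

  public int numFamilies() { return familyMap.size(); }
  public long getTimeStamp() { return ts; }
}
{code}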


> Remove duplicated code from Put, Delete, Get, Scan, MultiPut
> 
>
> Key: HBASE-4347
> URL: https://issues.apache.org/jira/browse/HBASE-4347
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 4347.txt
>
>
> This came from discussion with Stack w.r.t. HBASE-2195.
> There is currently a lot of duplicated code especially between Put and 
> Delete, and also between all Operations.
> For example all of Put/Delete/Get/Scan have attributes with exactly the same 
> code in all classes.
> Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
> One way to do this is to introduce "OperationWithAttributes" which extends 
> Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
> In addition Put and Delete could extends from Mutation (which itself would 
> extend OperationWithAttributes).
> If a static inheritance hierarchy is not desired here, we can use delegation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100608#comment-13100608
 ] 

Lars Hofhansl commented on HBASE-4347:
--

HBASE-2105 should be checked in first, because it adds new shared members 
between Put and Delete that should be moved into Mutation.

> Remove duplicated code from Put, Delete, Get, Scan, MultiPut
> 
>
> Key: HBASE-4347
> URL: https://issues.apache.org/jira/browse/HBASE-4347
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 4347.txt
>
>
> This came from discussion with Stack w.r.t. HBASE-2195.
> There is currently a lot of duplicated code especially between Put and 
> Delete, and also between all Operations.
> For example all of Put/Delete/Get/Scan have attributes with exactly the same 
> code in all classes.
> Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
> One way to do this is to introduce "OperationWithAttributes" which extends 
> Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
> In addition Put and Delete could extends from Mutation (which itself would 
> extend OperationWithAttributes).
> If a static inheritance hierarchy is not desired here, we can use delegation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100609#comment-13100609
 ] 

Lars Hofhansl commented on HBASE-4347:
--

HBASE-2195 that is.

> Remove duplicated code from Put, Delete, Get, Scan, MultiPut
> 
>
> Key: HBASE-4347
> URL: https://issues.apache.org/jira/browse/HBASE-4347
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 4347.txt
>
>
> This came from discussion with Stack w.r.t. HBASE-2195.
> There is currently a lot of duplicated code especially between Put and 
> Delete, and also between all Operations.
> For example all of Put/Delete/Get/Scan have attributes with exactly the same 
> code in all classes.
> Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
> One way to do this is to introduce "OperationWithAttributes" which extends 
> Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
> In addition Put and Delete could extends from Mutation (which itself would 
> extend OperationWithAttributes).
> If a static inheritance hierarchy is not desired here, we can use delegation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100628#comment-13100628
 ] 

jirapos...@reviews.apache.org commented on HBASE-4007:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1747/
---

(Updated 2011-09-08 19:34:22.528153)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Implemented Ted's feedback

Tested on a cluster: when a regionserver (splitlog-worker) dies, the 
notification reaches the SplitLogManager.


Summary
---

1/ resubmit all tasks owned by a dead splitlog-worker
2/ prevent accumulation of /hbase/splitlog/RESCAN nodes


This addresses bug HBASE-4007.
https://issues.apache.org/jira/browse/HBASE-4007


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 54b6d45 
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 9a71fdf 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 61e5c65 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 9a88855 

Diff: https://reviews.apache.org/r/1747/diff


Testing
---

1/ resubmit all tasks owned by a dead splitlog-worker - only unit tested. will 
do cluster testing.
2/ prevent accumulation of /hbase/splitlog/RESCAN nodes - tested and deployed 
in production.


Thanks,

Prakash



> distributed log splitting can get indefinitely stuck
> 
>
> Key: HBASE-4007
> URL: https://issues.apache.org/jira/browse/HBASE-4007
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch
>
>
> After the configured number of retries SplitLogManager is not going to 
> resubmit log-split tasks. In this situation even if the splitLogWorker that 
> owns the task dies the task will not get resubmitted.
> When a regionserver goes away then all the split-log tasks that it owned 
> should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4330:
-

Attachment: hbase-4330.txt

Fixed evictor resource starvation. Removed the spinlock.

The spinlock, with enough threads, was starving the eviction thread of cycles. 
This caused the tests to run extremely slowly, giving the appearance of a hang.

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4330:
-

Assignee: Li Pi  (was: Todd Lipcon)
  Status: Patch Available  (was: Open)

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100646#comment-13100646
 ] 

Hudson commented on HBASE-4243:
---

Integrated in HBase-TRUNK #2189 (See 
[https://builds.apache.org/job/HBase-TRUNK/2189/])
HBASE-4243  HADOOP_HOME should be auto-detected (Roman Shaposhnik) -- 
revert.. non-portable shell change

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/bin/hbase


> HADOOP_HOME should be auto-detected
> ---
>
> Key: HBASE-4243
> URL: https://issues.apache.org/jira/browse/HBASE-4243
> Project: HBase
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4243.patch.txt
>
>
> Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
> the HADOOP_HOME setting if it is not given explicitly. Something along the 
> lines of:
> {noformat}
> # check for hadoop in the path
> HADOOP_IN_PATH=`which hadoop 2>/dev/null`
> if [ -f ${HADOOP_IN_PATH} ]; then
>   HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
> fi
> # HADOOP_HOME env variable overrides hadoop in the path
> HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
> if [ "$HADOOP_HOME" == "" ]; then
>   echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
>   exit 4;
> fi
> {noformat}
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100645#comment-13100645
 ] 

Hudson commented on HBASE-4015:
---

Integrated in HBase-TRUNK #2189 (See 
[https://builds.apache.org/job/HBase-TRUNK/2189/])
HBASE-4015 Refactor the TimeoutMonitor to make it less racy
HBASE-4015 Refactor the TimeoutMonitor to make it less racy
HBASE-4015 Refactor the TimeoutMonitor to make it less racy

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java


> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, 
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does it's own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that sometimes can take minutes to 
> process (meanwhile a region that timed out opening might have opened, then 
> what happens is it will be reassigned by the TimeoutMonitor generating the 
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100681#comment-13100681
 ] 

jirapos...@reviews.apache.org commented on HBASE-4007:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1747/#review1819
---

Ship it!



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java


I know this code has been there, but it follows a similar pattern where an 
enum would be more appropriate.


- Ted


On 2011-09-08 19:34:22, Prakash Khemani wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1747/
bq.  ---
bq.  
bq.  (Updated 2011-09-08 19:34:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1/ resubmit all tasks owned by a dead splitlog-worker
bq.  2/ prevent accumulation of /hbase/splitlog/RESCAN nodes
bq.  
bq.  
bq.  This addresses bug HBASE-4007.
bq.  https://issues.apache.org/jira/browse/HBASE-4007
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
54b6d45 
bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 
9a71fdf 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 61e5c65 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
9a88855 
bq.  
bq.  Diff: https://reviews.apache.org/r/1747/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  1/ resubmit all tasks owned by a dead splitlog-worker - only unit tested. 
will do cluster testing.
bq.  2/ prevent accumulation of /hbase/splitlog/RESCAN nodes - tested and 
deployed in production.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prakash
bq.  
bq.



> distributed log splitting can get indefinitely stuck
> 
>
> Key: HBASE-4007
> URL: https://issues.apache.org/jira/browse/HBASE-4007
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch
>
>
> After the configured number of retries SplitLogManager is not going to 
> resubmit log-split tasks. In this situation even if the splitLogWorker that 
> owns the task dies the task will not get resubmitted.
> When a regionserver goes away then all the split-log tasks that it owned 
> should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4355) TestHTablePool failure

2011-09-08 Thread Ming Ma (JIRA)
TestHTablePool failure
--

 Key: HBASE-4355
 URL: https://issues.apache.org/jira/browse/HBASE-4355
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma


This unit test has been failing on my machine with the following error.

testTableWithStringName(org.apache.hadoop.hbase.client.TestHTablePool$TestHTableThreadLocalPool):
 Cluster already running at 
/hbase-core-trunk/target/test-data/2e41efb9-7b96-4ab3-abec-c58f467b220c/af01017e-ee3c-46fc-b908-078a3a4e8b52/bfd8e9b4-66da-4322-96bd-6db4564d8f41/d9a97e3d-8ffb-4945-a71e-d059e3bc7274/6cdf0b73-b9a0-45f4-856d-53cd02ecebce/34c41612-9311-4199-9902-cf30a9cb7b9d/33e7bfd5-2519-4349-9a44-d05000e00526/dbc60fd9-756d-4263-9ed1-bbff69ec7a80/0e1bde7e-c966-4c3e-a01c-50ded9cb166b/415e8d51-46f2-4d50-879a-870298a9e1f8/fb165bb9-7d6c-4cf8-970a-e281b9818e97

It looks like TestHTablePool uses the nested classes TestHTableReusablePool and 
TestHTableThreadLocalPool. Both classes could be instantiated by the junit 
framework in a multi-threaded fashion. Both classes call 
HBaseTestingUtility.startMiniCluster, and HBaseTestingUtility.isRunningCluster 
throws this exception.


Is my understanding of the junit framework correct? I don't know why others 
haven't hit this error.
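
One possible guard, sketched below (illustration only, not a committed fix): 
start the mini cluster once at the enclosing-class level so both nested pool 
test classes share it instead of each calling startMiniCluster() themselves.
{code}
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestHTablePool {
  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setUpBeforeClass() throws Exception {
    TEST_UTIL.startMiniCluster(1);   // one shared mini cluster for the whole class
  }

  @AfterClass
  public static void tearDownAfterClass() throws Exception {
    TEST_UTIL.shutdownMiniCluster();
  }

  // ... nested TestHTableReusablePool / TestHTableThreadLocalPool would use
  // TEST_UTIL here rather than starting their own cluster ...
}
{code}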

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4356) Wide rows can cause OOME when compacting

2011-09-08 Thread Nate Putnam (JIRA)
Wide rows can cause OOME when compacting


 Key: HBASE-4356
 URL: https://issues.apache.org/jira/browse/HBASE-4356
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Nate Putnam


The scanner used for compaction doesn't limit the number of columns retrieved 
when doing a compaction. If a row exists with tens of millions of columns, it 
can fill all available memory and crash the region server. 

It would be better if the scanner could page through the columns of wide rows 
when performing a compaction. 
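
As a point of comparison, client-side scans can already page through wide rows 
with Scan.setBatch(); a compaction-side fix would need an equivalent internal 
limit. A small client-side sketch of what "paging through the columns" means 
(the table setup and process() call are placeholders):
{code}
// With setBatch(n), each Result returned by the scanner holds at most n
// KeyValues of the row, so the whole row is never materialized at once.
Scan scan = new Scan();
scan.setBatch(1000);                             // at most 1000 columns per Result
ResultScanner scanner = table.getScanner(scan);  // 'table' is an HTable opened elsewhere
try {
  for (Result partial : scanner) {
    // each 'partial' may be one slice of a wider row; process incrementally
    process(partial);                            // process(...) is a placeholder
  }
} finally {
  scanner.close();
}
{code}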

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4356) Wide rows can cause OOME when compacting

2011-09-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100723#comment-13100723
 ] 

Todd Lipcon commented on HBASE-4356:


Dup of HBASE-3421?

> Wide rows can cause OOME when compacting
> 
>
> Key: HBASE-4356
> URL: https://issues.apache.org/jira/browse/HBASE-4356
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Nate Putnam
>
> The scanner used for compaction doesn't limit the number of columns retrieved 
> when doing a compaction. If a row exists with tens of millions of columns, it 
> can fill all available memory and crash the region server. 
> It would be better if the scanner could page through the columns of wide rows 
> when performing a compaction. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4357) Region in transition - in closing state

2011-09-08 Thread Ming Ma (JIRA)
Region in transition - in closing state
---

 Key: HBASE-4357
 URL: https://issues.apache.org/jira/browse/HBASE-4357
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma


Got the following during testing:

1. On a given machine, kill the RS process, then kill the HMaster process.
2. Start the RS first via "bin/hbase-daemon.sh --config ./conf start 
regionserver". Then start the HMaster via "bin/hbase-daemon.sh --config ./conf 
start master".

One region of a table stayed in closing state.

According to zookeeper,
794a6ff17a4de0dd0a19b984ba18eea9 
miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
 state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
server=sea-esxi-0,6,1315428682281 

According to the .META. table, the region has been reassigned from sea-esxi-0 
to sea-esxi-4.

miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
 sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100738#comment-13100738
 ] 

Ted Yu commented on HBASE-2195:
---

Integrated v11 to TRUNK.

Thanks for the patch Lars.
Thanks for the review Michael and J-D.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HlogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4105) Stargate does not support Content-Type: application/json and Content-Encoding: gzip in parallel

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100736#comment-13100736
 ] 

Hudson commented on HBASE-4105:
---

Integrated in HBase-TRUNK #2190 (See 
[https://builds.apache.org/job/HBase-TRUNK/2190/])
HBASE-4105 HBASE-4015-Making the timeout monitor less racy; third attempt

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java


> Stargate does not support Content-Type: application/json and 
> Content-Encoding: gzip in parallel
> ---
>
> Key: HBASE-4105
> URL: https://issues.apache.org/jira/browse/HBASE-4105
> Project: HBase
>  Issue Type: Bug
>  Components: rest
>Affects Versions: 0.90.1
> Environment: Server: jetty/6.1.26
> REST: 0.0.2 
> OS: Linux 2.6.32-bpo.5-amd64 amd64
> Jersey: 1.4
> JVM: Sun Microsystems Inc. 1.6.0_22-17.1-b03
>Reporter: Jean-Pierre Koenig
>Assignee: Andrew Purtell
>  Labels: gzip, json, rest
> Fix For: 0.90.4, 0.94.0
>
> Attachments: HBASE-4105.patch
>
>
> When:
> curl -H "Accept: application/json" http://localhost:3000/version -v
> Response is:
> About to connect() to localhost port 3000 (#0)
> Trying 127.0.0.1... connected
> Connected to localhost (127.0.0.1) port 3000 (#0)
> > GET /version HTTP/1.1
> > User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 
> > OpenSSL/0.9.8r zlib/1.2.3
> > Host: localhost:3000
> > Accept: application/json
> > 
> < HTTP/1.1 200 OK
> < Cache-Control: no-cache
> < Content-Type: application/json
> < Transfer-Encoding: chunked
> <
> Connection #0 to host localhost left intact
> Closing connection #0 {"Server":"jetty/6.1.26","REST":"0.0.2","OS":"Linux 
> 2.6.32-bpo.5-amd64 amd64","Jersey":"1.4","JVM":"Sun Microsystems Inc. 
> 1.6.0_22-17.1-b03"}
> but with compression:
> curl -H "Accept: application/json" http://localhost:3000/version -v 
> --compressed
> Response is:
> About to connect() to localhost port 3000 (#0)
> Trying 127.0.0.1 ... connected
> Connected to localhost (127.0.0.1) port 3000 (#0)
> > GET /version HTTP/1.1
> > User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 
> > OpenSSL/0.9.8r zlib/1.2.3
> > Host: localhost:3000
> > Accept-Encoding: deflate, gzip
> > Accept: application/json
> > 
> < HTTP/1.1 200 OK
> < Cache-Control: no-cache
> < Content-Type: application/json
> < Content-Encoding: gzip
> < Transfer-Encoding: chunked
> <
> Connection #0 to host localhost left intact
> Closing connection #0
> and the stargate server throws the following exception:
> 11/07/14 11:21:44 ERROR mortbay.log: /version
> java.lang.ClassCastException: org.mortbay.jetty.HttpConnection$Output cannot 
> be cast to org.apache.hadoop.hbase.rest.filter.GZIPResponseStream
> at org.apache.hadoop.hbase.rest.filter.GzipFilter.doFilter(GzipFilter.java:54)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> This is not reproducible with content type text/plain and gzip

[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100737#comment-13100737
 ] 

Hudson commented on HBASE-4015:
---

Integrated in HBase-TRUNK #2190 (See 
[https://builds.apache.org/job/HBase-TRUNK/2190/])
HBASE-4105 HBASE-4015-Making the timeout monitor less racy; third attempt
HBASE-4015 Refactor the TimeoutMonitor to make it less racy -- REVERT.  Ram 
says I have applied wrong patch.  Redoing.  A SECOND TIME
svn rm --force src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java
svn rm --force src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java
HBASE-4015 Refactor the TimeoutMonitor to make it less racy -- reapply with 
HBASE-4015_reprepared_trunk_2.patch
HBASE-4015 Refactor the TimeoutMonitor to make it less racy -- REVERT.  Ram 
says I have applied wrong patch.  Redoing.

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java


> Refactor the TimeoutMonitor to make it less racy
> 
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.90.3
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
>Priori

[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100741#comment-13100741
 ] 

jirapos...@reviews.apache.org commented on HBASE-4007:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1747/
---

(Updated 2011-09-08 22:09:32.715934)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

implemented Ted's feedback.


Summary
---

1/ resubmit all tasks owned by a dead splitlog-worker
2/ prevent accumulation of /hbase/splitlog/RESCAN nodes


This addresses bug HBASE-4007.
https://issues.apache.org/jira/browse/HBASE-4007


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 9a71fdf 
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 54b6d45 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 61e5c65 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 9a88855 

Diff: https://reviews.apache.org/r/1747/diff


Testing
---

1/ resubmit all tasks owned by a dead splitlog-worker - only unit tested. will 
do cluster testing.
2/ prevent accumulation of /hbase/splitlog/RESCAN nodes - tested and deployed 
in production.


Thanks,

Prakash



> distributed log splitting can get indefinitely stuck
> 
>
> Key: HBASE-4007
> URL: https://issues.apache.org/jira/browse/HBASE-4007
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch
>
>
> After the configured number of retries SplitLogManager is not going to 
> resubmit log-split tasks. In this situation even if the splitLogWorker that 
> owns the task dies the task will not get resubmitted.
> When a regionserver goes away then all the split-log tasks that it owned 
> should be resubmitted by the SplitLogMaster.
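
For readers following along, here is a minimal, hypothetical Java sketch of the resubmission idea only (class, field, and path names are invented; this is not the actual SplitLogManager code): tasks remember their owning worker, and when a worker dies every task it owned goes back to the unassigned pool regardless of how many retries it has already used.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: task path -> owning worker (null = unassigned).
public class SplitTaskResubmitSketch {
  private final Map<String, String> taskOwners = new HashMap<String, String>();

  public void assign(String taskPath, String worker) {
    taskOwners.put(taskPath, worker);
  }

  // On worker death, force-resubmit every task it owned,
  // ignoring any per-task retry limit.
  public void handleDeadWorker(String deadWorker) {
    for (Map.Entry<String, String> e : taskOwners.entrySet()) {
      if (deadWorker.equals(e.getValue())) {
        e.setValue(null); // back to the unassigned pool
      }
    }
  }

  public static void main(String[] args) {
    SplitTaskResubmitSketch m = new SplitTaskResubmitSketch();
    m.assign("/hbase/splitlog/hlog.1", "worker-a");
    m.assign("/hbase/splitlog/hlog.2", "worker-b");
    m.handleDeadWorker("worker-a");
    System.out.println(m.taskOwners); // hlog.1 has no owner again
  }
}
{code}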

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100740#comment-13100740
 ] 

Ted Yu commented on HBASE-4330:
---

Please fix the following:
{code}
[INFO] Compilation failure

/home/hadoop/hbase/src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java:[47,33]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown

/home/hadoop/hbase/src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java:[67,33]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
{code}
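
The usual ways to clear an "unreported exception InterruptedException" compile error are to declare the exception on the test method or to catch it and restore the interrupt flag. A small illustrative JUnit sketch (hypothetical class and method names, not the actual TestSlab code):

{code}
import org.junit.Test;

// Two common fixes for "unreported exception java.lang.InterruptedException".
public class InterruptedExceptionFixSketch {

  // Option 1: declare the checked exception on the test method.
  @Test
  public void testBlockingCall() throws InterruptedException {
    Thread.sleep(10); // stands in for whatever blocking call the test makes
  }

  // Option 2: catch it and restore the interrupt status.
  @Test
  public void testBlockingCallCaught() {
    try {
      Thread.sleep(10);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(ie);
    }
  }
}
{code}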

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4057) Implement HBase version of "show processlist"

2011-09-08 Thread Riley Patterson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riley Patterson updated HBASE-4057:
---

Attachment: HBASE-4057-doc.docx

A basic, user-manual style document describing the feature, its use, its 
configuration, and how to interpret its output.

> Implement HBase version of "show processlist"
> -
>
> Key: HBASE-4057
> URL: https://issues.apache.org/jira/browse/HBASE-4057
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Riley Patterson
> Attachments: HBASE-4057-doc.docx, HBASE-4057.patch
>
>
> One of the features that our DBAs use for MySQL analysis is "show 
> processlist", which gives application-level stats about the RPC threads.  
> Right now, we use jstack but that is very core-developer-centric.  We need to 
> create a similar tool that DBA/Ops/AppDevs can use.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100745#comment-13100745
 ] 

jirapos...@reviews.apache.org commented on HBASE-4007:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1747/#review1824
---

Ship it!


- Ted


On 2011-09-08 22:09:32, Prakash Khemani wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1747/
bq.  ---
bq.  
bq.  (Updated 2011-09-08 22:09:32)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1/ resubmit all tasks owned by a dead splitlog-worker
bq.  2/ prevent accumulation of /hbase/splitlog/RESCAN nodes
bq.  
bq.  
bq.  This addresses bug HBASE-4007.
bq.  https://issues.apache.org/jira/browse/HBASE-4007
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 
9a71fdf 
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
54b6d45 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 61e5c65 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
9a88855 
bq.  
bq.  Diff: https://reviews.apache.org/r/1747/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  1/ resubmit all tasks owned by a dead splitlog-worker - only unit tested. 
will do cluster testing.
bq.  2/ prevent accumulation of /hbase/splitlog/RESCAN nodes - tested and 
deployed in production.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prakash
bq.  
bq.



> distributed log splitting can get indefinitely stuck
> 
>
> Key: HBASE-4007
> URL: https://issues.apache.org/jira/browse/HBASE-4007
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch
>
>
> After the configured number of retries SplitLogManager is not going to 
> resubmit log-split tasks. In this situation even if the splitLogWorker that 
> owns the task dies the task will not get resubmitted.
> When a regionserver goes away then all the split-log tasks that it owned 
> should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client

2011-09-08 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100749#comment-13100749
 ] 

Gary Helmling commented on HBASE-4331:
--

Lars,

Looks good.  My only comment would be to rename the new TestRegionObserver 
class to something like TestRegionObserverBypass, since that's what it's 
actually testing and we already have TestRegionObserverInterface.

> Bypassing default actions in prePut fails sometimes with HTable client
> --
>
> Key: HBASE-4331
> URL: https://issues.apache.org/jira/browse/HBASE-4331
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4331-v2.txt, 4331-v3.txt, 4331.txt
>
>
> While testing some other scenario I found calling 
> CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are 
> bypassed that way. By extension a single bypassed put will also fail.
> The problem is that the puts are removed from the batch in a way that does 
> not align them with the result-status, and in addition the result is never 
> marked as success.
> A possible fix is to just mark bypassed puts as SUCCESS and filter them in 
> the following logic.
> (I also contemplated a new BYPASSED OperationStatusCode, but that turned out 
> to be not necessary).
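
A minimal sketch of the alignment idea described above, using an invented status enum rather than the real OperationStatusCode (this is not the actual HRegion batch code): bypassed operations keep their slot in the batch and are simply marked SUCCESS, so result indices never drift from the operations they describe.

{code}
public class BypassAlignmentSketch {
  enum Status { SUCCESS, FAILURE }

  public static void main(String[] args) {
    // Pretend a prePut coprocessor hook bypassed the last two operations.
    boolean[] bypassed = {false, true, true};
    Status[] results = new Status[bypassed.length];

    for (int i = 0; i < bypassed.length; i++) {
      if (bypassed[i]) {
        // Keep the slot and mark it SUCCESS instead of removing the put,
        // so results[i] stays aligned with the i-th operation of the batch.
        results[i] = Status.SUCCESS;
        continue; // filtered out of the default write path
      }
      // The default write path would apply the put here and set its status.
      results[i] = Status.SUCCESS;
    }

    for (int i = 0; i < results.length; i++) {
      System.out.println("op " + i + " -> " + results[i]);
    }
  }
}
{code}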

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100750#comment-13100750
 ] 

Lars Hofhansl commented on HBASE-2195:
--

Thanks for the thorough review Ted. And thanks to Stack and J-D for your review.

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HLogKey 
> and stop replicating when it goes back to the original cluster.
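
To make the cycle-breaking rule concrete, here is a minimal, self-contained sketch in plain Java (method and class names are invented; this is not the actual ReplicationSource code): each edit carries the UUID of the cluster where it originated, and it is never shipped to a peer whose cluster id matches that origin.

{code}
import java.util.UUID;

public class CyclicReplicationSketch {

  // Ship an edit only if the target peer is not the cluster it came from.
  static boolean shouldReplicate(UUID editOriginClusterId, UUID peerClusterId) {
    return !editOriginClusterId.equals(peerClusterId);
  }

  public static void main(String[] args) {
    UUID clusterA = UUID.randomUUID();
    UUID clusterB = UUID.randomUUID();
    // An edit that originated on A is shipped to B, but never back to A.
    System.out.println(shouldReplicate(clusterA, clusterB)); // true
    System.out.println(shouldReplicate(clusterA, clusterA)); // false
  }
}
{code}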

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100755#comment-13100755
 ] 

Lars Hofhansl commented on HBASE-2195:
--

Where should this be documented?

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HLogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100758#comment-13100758
 ] 

Ted Yu commented on HBASE-2195:
---

http://hbase.apache.org/replication.html

> Support cyclic replication
> --
>
> Key: HBASE-2195
> URL: https://issues.apache.org/jira/browse/HBASE-2195
> Project: HBase
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
> Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
> 2195-v5.txt, 2195-v6.txt, 2195.txt
>
>
> We need to support cyclic replication by using the cluster id of each HLogKey 
> and stop replicating when it goes back to the original cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4330:
-

Attachment: hbase-4330v3.txt

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100765#comment-13100765
 ] 

Li Pi commented on HBASE-4330:
--

Done.

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100769#comment-13100769
 ] 

Ted Yu commented on HBASE-4330:
---

Patch v3 contains way too many changes.
Can you rebase and produce a cleaner patch?

Thanks

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4356) Wide rows can cause OOME when compacting

2011-09-08 Thread Nate Putnam (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100772#comment-13100772
 ] 

Nate Putnam commented on HBASE-4356:


Yep. Looks like the same issue. I'll close this one. 

> Wide rows can cause OOME when compacting
> 
>
> Key: HBASE-4356
> URL: https://issues.apache.org/jira/browse/HBASE-4356
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Nate Putnam
>
> The scanner used for compaction doesn't limit the number of columns retrieved 
> when doing a compaction. If a row has tens of millions of columns it can 
> fill all available memory and crash the region server. 
> It would be better if the scanner could page through the columns of wide rows 
> when performing a compaction. 
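
For illustration only, the paging idea expressed with the public client API (the compaction scanner is internal and does not take a client Scan; the table name below is hypothetical and the snippet assumes a running cluster with the HBase client on the classpath): Scan.setBatch() caps how many columns of a row are materialized per Result.

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class WideRowPagingSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // hypothetical table
    Scan scan = new Scan();
    scan.setBatch(1000);  // at most 1000 columns of a row per Result
    scan.setCaching(1);   // keep only one batch in memory at a time
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result partialRow : scanner) {
        // each Result holds at most 1000 KeyValues of a (possibly very wide) row
        System.out.println(partialRow.size());
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}
{code}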

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4356) Wide rows can cause OOME when compacting

2011-09-08 Thread Nate Putnam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Putnam resolved HBASE-4356.


Resolution: Duplicate

Dup of HBASE-3421

> Wide rows can cause OOME when compacting
> 
>
> Key: HBASE-4356
> URL: https://issues.apache.org/jira/browse/HBASE-4356
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Nate Putnam
>
> The scanner used for compaction doesn't limit the number of columns retrieved 
> when doing a compaction. If a row has tens of millions of columns it can 
> fill all available memory and crash the region server. 
> It would be better if the scanner could page through the columns of wide rows 
> when performing a compaction. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100773#comment-13100773
 ] 

Li Pi commented on HBASE-4330:
--

Woah, not sure what happened there. Fixing.

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Li Pi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4330:
-

Attachment: hbase-4330v4.txt

rebased.

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt, 
> hbase-4330v4.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4355) TestHTablePool failure

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100780#comment-13100780
 ] 

Ted Yu commented on HBASE-4355:
---

I wasn't able to reproduce on my laptop:
{code}
Darwin tyu.ciq.com 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 
2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 i386
{code}

> TestHTablePool failure
> --
>
> Key: HBASE-4355
> URL: https://issues.apache.org/jira/browse/HBASE-4355
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> This unit test has been failing on my machine with the following error.
> testTableWithStringName(org.apache.hadoop.hbase.client.TestHTablePool$TestHTableThreadLocalPool):
>  Cluster already running at 
> /hbase-core-trunk/target/test-data/2e41efb9-7b96-4ab3-abec-c58f467b220c/af01017e-ee3c-46fc-b908-078a3a4e8b52/bfd8e9b4-66da-4322-96bd-6db4564d8f41/d9a97e3d-8ffb-4945-a71e-d059e3bc7274/6cdf0b73-b9a0-45f4-856d-53cd02ecebce/34c41612-9311-4199-9902-cf30a9cb7b9d/33e7bfd5-2519-4349-9a44-d05000e00526/dbc60fd9-756d-4263-9ed1-bbff69ec7a80/0e1bde7e-c966-4c3e-a01c-50ded9cb166b/415e8d51-46f2-4d50-879a-870298a9e1f8/fb165bb9-7d6c-4cf8-970a-e281b9818e97
> It looks like TestHTablePool uses nested classes TestHTableReusablePool and 
> TestHTableThreadLocalPool. Both classes could be instantiated by the junit 
> framework in a multi-threaded fashion. Both classes call 
> HBaseTestingUtility.startMiniCluster, and 
> HBaseTestingUtility.isRunningCluster throws this exception.
> Is my understanding of the junit framework correct? I don't know why others 
> haven't hit this error.
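
One possible way to avoid the double start, sketched under the assumption that the nested test classes could share a single mini cluster (the holder class and its method names are invented; only HBaseTestingUtility's own startMiniCluster/shutdownMiniCluster are real API):

{code}
import org.apache.hadoop.hbase.HBaseTestingUtility;

// Hypothetical shared holder: only the first caller actually starts the cluster.
public class SharedMiniClusterSketch {
  private static HBaseTestingUtility util;

  // Call from each nested class's @BeforeClass.
  public static synchronized HBaseTestingUtility getUtil() throws Exception {
    if (util == null) {
      util = new HBaseTestingUtility();
      util.startMiniCluster();
    }
    return util;
  }

  // Call once all suites are done.
  public static synchronized void shutdown() throws Exception {
    if (util != null) {
      util.shutdownMiniCluster();
      util = null;
    }
  }
}
{code}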

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100783#comment-13100783
 ] 

Lars Hofhansl commented on HBASE-4331:
--

Thanks Gary. I will attach a new patch in a few minutes. Would you prefer a 
review on review board?

> Bypassing default actions in prePut fails sometimes with HTable client
> --
>
> Key: HBASE-4331
> URL: https://issues.apache.org/jira/browse/HBASE-4331
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4331-v2.txt, 4331-v3.txt, 4331.txt
>
>
> While testing some other scenario I found calling 
> CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are 
> bypassed that way. By extension a single bypassed put will also fail.
> The problem is that the puts are removed from the batch in a way that does 
> not align them with the result-status, and in addition the result is never 
> marked as success.
> A possible fix is to just mark bypassed puts as SUCCESS and filter them in 
> the following logic.
> (I also contemplated a new BYPASSED OperationStatusCode, but that turned out 
> to be not necessary).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client

2011-09-08 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100784#comment-13100784
 ] 

Gary Helmling commented on HBASE-4331:
--

No, another patch here is fine.  I'm +1 with that change, assuming tests pass.

> Bypassing default actions in prePut fails sometimes with HTable client
> --
>
> Key: HBASE-4331
> URL: https://issues.apache.org/jira/browse/HBASE-4331
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4331-v2.txt, 4331-v3.txt, 4331.txt
>
>
> While testing some other scenario I found calling 
> CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are 
> bypassed that way. By extension a single bypassed put will also fail.
> The problem is that the puts are removed from the batch in a way that does 
> not align them with the result-status, and in addition the result is never 
> marked as success.
> A possible fix is to just mark bypassed puts as SUCCESS and filter them in 
> the following logic.
> (I also contemplated a new BYPASSED OperationStatusCode, but that turned out 
> to be not necessary).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4330) Fix races in slab cache

2011-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100789#comment-13100789
 ] 

Ted Yu commented on HBASE-4330:
---

{code}
Running org.apache.hadoop.hbase.io.hfile.slab.TestSlabCache

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[ERROR] BUILD ERROR
[INFO] 
[INFO] Failure or timeout
[INFO] 
[INFO] For more information, run Maven with the -e switch
[INFO] 
[INFO] Total time: 15 minutes 5 seconds
{code}
Here is the jstack: http://pastebin.com/vDCBMyrq
Here is the OS:
{code}
Linux us01.ciq.com 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 
x86_64 x86_64 x86_64 GNU/Linux
{code}

> Fix races in slab cache
> ---
>
> Key: HBASE-4330
> URL: https://issues.apache.org/jira/browse/HBASE-4330
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Li Pi
> Fix For: 0.92.0
>
> Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt, 
> hbase-4330v4.txt
>
>
> A few races are still lingering in the slab cache. Here are some tests and 
> proposed fixes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100798#comment-13100798
 ] 

jirapos...@reviews.apache.org commented on HBASE-4014:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/969/#review1805
---


Thanks, Eugene.  Almost there I think!  Just a couple comments on the tests.


src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java


This should default to false.



src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java


Does this need to be a separate thread?  Can the contents of the run() 
method just be inline in testExceptionFromCoprocessorWhenCreatingTable()?



src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java


Do we need this test?  If we're already doing the same tests in 
TestMasterObserver, it doesn't seem like it.  Has anything been added to this 
method that we need?



src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorException.java


Name should be something like testExceptionDuringPut?


- Gary


On 2011-09-06 19:08:59, Eugene Koontz wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/969/
bq.  ---
bq.  
bq.  (Updated 2011-09-06 19:08:59)
bq.  
bq.  
bq.  Review request for hbase, Gary Helmling and Mingjie Lai.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  https://issues.apache.org/jira/browse/HBASE-4014 Coprocessors: Flag the 
presence of coprocessors in logged exceptions
bq.  
bq.  The general gist here is to wrap each of 
{Master,RegionServer}CoprocessorHost's coprocessor call inside a 
bq.  
bq.  "try { ... } catch (Throwable e) { handleCoprocessorThrowable(e) }"
bq.  
bq.  block. 
bq.  
bq.  handleCoprocessorThrowable() is responsible for either passing 'e' along 
to the client (if 'e' is an IOException) or, otherwise, aborting the service 
(Regionserver or Master).
bq.  
bq.  The abort message contains a list of the loaded coprocessors for crash 
analysis.
bq.  
bq.  
bq.  This addresses bug HBASE-4014.
bq.  https://issues.apache.org/jira/browse/HBASE-4014
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 
4e492e1 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3f60653 
bq.src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 
aa930f5 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
8ff6e62 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 
5796413 
bq.src/main/resources/hbase-default.xml 2c8f44b 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorException.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/969/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  patch includes two tests:
bq.  
bq.  TestMasterCoprocessorException.java
bq.  TestRegionServerCoprocessorException.java
bq.  
bq.  both tests pass in my build environment.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Eugene
bq.  
bq.
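
A minimal sketch of the pattern described in the quoted review summary, with an invented Abortable interface standing in for the real service and an invented coprocessor name (this is not the actual CoprocessorHost code): IOExceptions are rethrown to the client, anything else aborts the hosting Master or RegionServer, and the abort message names the loaded coprocessors.

{code}
import java.io.IOException;

public class CoprocessorThrowableSketch {

  interface Abortable { void abort(String why, Throwable e); }

  private final Abortable service;
  private final String loadedCoprocessors;

  CoprocessorThrowableSketch(Abortable service, String loadedCoprocessors) {
    this.service = service;
    this.loadedCoprocessors = loadedCoprocessors;
  }

  void handleCoprocessorThrowable(Throwable e) throws IOException {
    if (e instanceof IOException) {
      throw (IOException) e; // surfaced to the client
    }
    // Anything else is treated as fatal for the hosting service.
    service.abort("Coprocessor failure; loaded coprocessors: " + loadedCoprocessors, e);
  }

  public static void main(String[] args) {
    CoprocessorThrowableSketch host = new CoprocessorThrowableSketch(
        new Abortable() {
          public void abort(String why, Throwable e) {
            System.out.println("ABORT: " + why);
          }
        },
        "org.example.MyObserver"); // hypothetical coprocessor class name
    try {
      host.handleCoprocessorThrowable(new IllegalStateException("boom"));
    } catch (IOException ioe) {
      // an IOException would instead be passed back to the client
    }
  }
}
{code}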



> Coprocessors: Flag the presence of coprocessors in logged exceptions
> 
>
> Key: HBASE-4014
> URL: https://issues.apache.org/jira/browse/HBASE-4014
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Andrew Purtell
>Assignee: Eugene Koontz
> Fix For: 0.92.0
>
> Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, 
> HBASE-4014.patch, HBASE-4014.patch
>
>
> For some initial triage of bug reports for core versus for deployments with 
> loaded coprocessors, we need something like the Linux kernel's taint flag, 
> and list of linked in modules that show up in the output of every OOPS, to 
> appear above or below exceptions that appear in the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client

2011-09-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100806#comment-13100806
 ] 

Lars Hofhansl commented on HBASE-4331:
--

I sync'd the latest trunk and now there seems to be additional checking on 
column families (a put with a non-existent CF fails before it even gets to the 
coprocessor, I assume that's desired).

But also with the latest changes my own test fails now.
So it'll be a bit until I track that down.


> Bypassing default actions in prePut fails sometimes with HTable client
> --
>
> Key: HBASE-4331
> URL: https://issues.apache.org/jira/browse/HBASE-4331
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4331-v2.txt, 4331-v3.txt, 4331.txt
>
>
> While testing some other scenario I found calling 
> CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are 
> bypassed that way. By extension a single bypassed put will also fail.
> The problem is that the puts are removed from the batch in a way that does 
> not align them with the result-status, and in addition the result is never 
> marked as success.
> A possible fix is to just mark bypassed puts as SUCCESS and filter them in 
> the following logic.
> (I also contemplated a new BYPASSED OperationStatusCode, but that turned out 
> to be not necessary).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4358) Batch Table Alter Operations

2011-09-08 Thread Riley Patterson (JIRA)
Batch Table Alter Operations


 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor


Currently, the RPC provides no way of asking for several table alterations at 
once, and the master has no way of batch handling alter requests. Thus, when 
the user requests several changes at the same time (i.e. add these I columns, 
delete these J columns, and modify these K columns), each region is brought 
down (I+J+K) times so that it can reflect the new schema. Additionally, 
multiple writes are made to META, and multiple RPC calls must be made.

This patch provides batching for these operations, both at the RPC level and 
within the Master's TableEventHandlers. This involves a bit of reorganization 
in the TableEventHandler class hierarchy, and a new TableEventHandler, 
TableMultiFamilyHandler. The net effect ends up being the difference seen here:

Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME 
=> 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds

After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME 
=> 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds

Regions are only brought down once, and the duration is cut to roughly 1/N of what it was (N being the number of alterations batched).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



