date:20120713


[ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413535#comment-13413535
 ] 

stack commented on HBASE-4050:
--

I suppose you don't need to set test size annotation on below because 
annotations are not a dependency when this is built:

{code}
+public class ReplicationMetricsSourceFactoryTest {
{code}

Does BaseMetricsSource not implement MetricsSource?

{code}
+public class BaseMetricsSourceImpl implements BaseMetricsSource, MetricsSource {
{code}

These need to be this accessible:

{code}
+  public ConcurrentMap<String, MetricMutableGaugeLong>
+  gauges = new ConcurrentHashMap<String, MetricMutableGaugeLong>();
+  public ConcurrentMap<String, MetricMutableCounterLong> counters =
+  new ConcurrentHashMap<String, MetricMutableCounterLong>();
+
+  protected String metricsContext;
+  protected String metricsName;
+  protected String metricsDescription;
{code}

(I see above twice)

The stuff below where we have a static boolean and in constructor we test 
something already created could be a PITA in minihbase setups?  Does it have to 
be static?  Aren't we slinging singletons here anyways?  (The singletons are ok 
in minihbasecontext too?):

{code}
+if (!hasInited) {
+  //Not too worried about mutli-threaded here as all it does is spam the logs.
+  hasInited = true;
+  DefaultMetricsSystem.initialize(HBASE_METRICS_SYSTEM_NAME);
+}
{code}

'hasInited' is name of a method that tests 'inited' variable... suggest 
changing its name.
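
For illustration only, the once-only guard discussed above could be written with an AtomicBoolean instead of a plain static boolean, so that concurrent constructors run the initialization exactly once. This is a minimal sketch, not the code from the patch: the class name MetricsInitGuard, the field name defaultMetricsInited and the Runnable standing in for the DefaultMetricsSystem.initialize(...) call are all assumptions, chosen so the example runs without Hadoop on the classpath.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper; not part of the HBASE-4050 patch. */
class MetricsInitGuard {
  // Hypothetical name for the flag; 'hasInited' reads like a method name.
  private static final AtomicBoolean defaultMetricsInited = new AtomicBoolean(false);

  /**
   * Runs initAction at most once per JVM. In the real class the action would
   * be the metrics-system initialization call quoted from the patch.
   * Returns true only for the caller that actually ran it.
   */
  static boolean initOnce(Runnable initAction) {
    // compareAndSet flips false -> true for exactly one caller.
    if (defaultMetricsInited.compareAndSet(false, true)) {
      initAction.run();
      return true;
    }
    return false;
  }
}
```

The flag is still static, so several daemons in one JVM (a mini cluster, say) would share it; the sketch only removes the race and does not settle whether the guard should be static at all.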

What about that jmx mess registering metrics in tests?  The exception saying 
metrics already registered because we have more than one daemon in the one jvm. 
 We still have that issue here?

You wanted to complete this: +/** BaseClass for */

Another class has no class comments though has the comment delimiters.

Do we have to have metrics2 package?  Can this new stuff be in the metrics 
package?

I thought I saw a patch where you'd renamed the properties file to what LarsG 
suggested?

You seem to have made it so we do not need to have a metrics2 in hbase... that's 
great... but in the properties file I see:

{code}
+# See package.html for org.apache.hadoop.metrics2 for details
+
+*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
{code}

Is that just old stuff?

Good stuff Elliott.  I'd be up for committing this and then doing other stuff 
in other issues.



 Update HBase metrics framework to metrics2 framework
 

 Key: HBASE-4050
 URL: https://issues.apache.org/jira/browse/HBASE-4050
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Affects Versions: 0.90.4
 Environment: Java 6
Reporter: Eric Yang
Assignee: Alex Baranau
Priority: Critical
 Fix For: 0.96.0

 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, 
 HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, 
 HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, 
 HBASE-4050-7.patch, HBASE-4050.patch


 Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
 and it might get removed in future Hadoop release.  Hence, HBase needs to 
 revise the dependency of MetricsContext to use Metrics2 framework.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize


 [ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6380:
-

Status: Patch Available  (was: Open)

 bulkload should update the store.storeSize
 --

 Key: HBASE-6380
 URL: https://issues.apache.org/jira/browse/HBASE-6380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0, 0.96.0
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Critical
 Attachments: 6380-trunk.txt, 6380-trunk.txt, hbase-6380_0_94_0.patch


 After bulkloading some HFiles into the table, we found that force-split did 
 not work because the MidKey was NULL. Force-split worked normally only after 
 we restarted the HBase service.


[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize


 [ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6380:
-

Status: Open  (was: Patch Available)


[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize


 [ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6380:
-

Attachment: 6380-trunk.txt

Retry


[jira] [Updated] (HBASE-6370) Add compression codec test at HMaster when createTable/modifyColumn/modifyTable


 [ 
https://issues.apache.org/jira/browse/HBASE-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6370:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks for the patch ShiXing.

 Add compression codec test at HMaster when 
 createTable/modifyColumn/modifyTable
 ---

 Key: HBASE-6370
 URL: https://issues.apache.org/jira/browse/HBASE-6370
 Project: HBase
  Issue Type: Improvement
Reporter: ShiXing
Assignee: ShiXing
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6370v3.txt, HBASE-6370-trunk-V1.patch, 
 HBASE-6370-trunk-V2.patch, runAllTests.out


 We deployed a cluster in which none of the regionservers supports a 
 compression codec such as LZO, but the cluster user/client does not know 
 this and specifies the family's compression codec with 
 HColumnDescriptor.setCompressionType(Compression.Algorithm.LZO);
 Because HBaseAdmin's createTable is async, the client waits forever for all 
 the regions of the table to come online, and does not know why the regions 
 are not online until the HBase administrator finds the problem.
 Indeed, all of the regions are being assigned by the master, but the 
 regionserver's openHRegion always fails.
 In my opinion, we can assume that every node in the cluster has the same 
 environment, meaning that if a lib is deployed on the master, it should also 
 be deployed on the regionservers. Of course this is only an assumption; in a 
 real deployment, the HBase DBA may deploy the lib on only the regionservers 
 or only the master.
 So I think this failure can be detected earlier, before the master creates 
 the CreateTableHandler thread, and we can quickly tell the client that this 
 compression codec type is not supported.
 I will upload the patch later.
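
 As a rough illustration of the fail-fast idea described above (not the 
 attached patch), the master could validate the requested codecs before it 
 schedules any table-creation work. Everything here is hypothetical: the 
 CodecPrecheck class, its Codec enum and the usableOnMaster set are stand-ins 
 so the sketch runs without HBase on the classpath.

```java
import java.io.IOException;
import java.util.Set;

/** Hypothetical sketch; names are illustrative, not HBase API. */
class CodecPrecheck {
  /** Stand-in for the compression algorithms a column family may request. */
  enum Codec { NONE, GZ, LZO, SNAPPY }

  /**
   * Rejects a table definition up front if it asks for a codec that the
   * master itself cannot load. Assumes master and regionservers have the
   * same libs deployed, as the description above does.
   */
  static void checkCodecs(Set<Codec> requested, Set<Codec> usableOnMaster)
      throws IOException {
    for (Codec c : requested) {
      if (!usableOnMaster.contains(c)) {
        // Surface the problem to the client instead of waiting on regions.
        throw new IOException("Compression codec " + c + " is not supported");
      }
    }
  }
}
```

 Because the check throws synchronously, the client would see the error from 
 the createTable call itself rather than waiting for regions that can never 
 open.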


[jira] [Updated] (HBASE-6338) Cache Method in RPC handler

2012-07-13 Thread binlijin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6338:


Attachment: HBASE-6338-94-2.patch
HBASE-6338-92-2.patch
HBASE-6338-90-2.patch

 Cache Method in RPC handler
 ---

 Key: HBASE-6338
 URL: https://issues.apache.org/jira/browse/HBASE-6338
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
 Attachments: HBASE-6338-90-2.patch, HBASE-6338-90.patch, 
 HBASE-6338-92-2.patch, HBASE-6338-92.patch, HBASE-6338-94-2.patch, 
 HBASE-6338-94.patch, HBASE-6338-trunk-2.patch, HBASE-6338-trunk.patch


 Every call in the RPC handler creates a Method; caching the Method gives a 
 small improvement.
 I tested with 0.90: Class.getMethod(String name, Class... parameterTypes) 
 costs 4780 ns on average; with the cache it costs 2620 ns.
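
 A minimal sketch of what such a cache might look like, assuming a 
 ConcurrentHashMap keyed on class name, method name and parameter types. The 
 MethodCache class and its key format are hypothetical and are not taken from 
 the attached patches.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of caching reflective lookups; not the attached patch. */
class MethodCache {
  private static final ConcurrentMap<String, Method> CACHE =
      new ConcurrentHashMap<String, Method>();

  static Method getMethod(Class<?> clazz, String name, Class<?>... parameterTypes)
      throws NoSuchMethodException {
    // Key on class, method name and parameter types so overloads stay distinct.
    String key = clazz.getName() + "#" + name + Arrays.toString(parameterTypes);
    Method m = CACHE.get(key);
    if (m == null) {
      m = clazz.getMethod(name, parameterTypes);
      // A racing thread may have stored an equivalent Method; keep the first.
      Method prev = CACHE.putIfAbsent(key, m);
      if (prev != null) {
        m = prev;
      }
    }
    return m;
  }
}
```

 The key uses the class name only, so two classes with the same name from 
 different class loaders would collide; a real implementation would have to 
 decide whether that matters for the RPC protocols involved.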


[jira] [Updated] (HBASE-6338) Cache Method in RPC handler

2012-07-13 Thread binlijin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6338:


Attachment: HBASE-6338-trunk-2.patch


[jira] [Commented] (HBASE-6387) Cache DNS lookups in HServerAddress


[ 
https://issues.apache.org/jira/browse/HBASE-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413552#comment-13413552
 ] 

stack commented on HBASE-6387:
--

HServerAddress is deprecated in trunk and has been replaced. On deserialization 
it was doing a DNS lookup.  So this is 0.89fb only, Mikhail?

 Cache DNS lookups in HServerAddress
 ---

 Key: HBASE-6387
 URL: https://issues.apache.org/jira/browse/HBASE-6387
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin

 We have noticed that we rely on DNS lookups in some critical paths by using 
 HServerAddress, and Java only seems to be caching DNS data for 30 seconds by 
 default. Also, if DNS is down, Java's negative cache of DNS will ensure that 
 many successive attempts fail. However, we cannot just increase 
 networkaddress.cache.ttl to a large value, because e.g. namenode failover may 
 require resolving the same DNS name differently. Therefore I propose that we 
 add a DNS lookup cache in HServerAddress.
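
 One way to picture the proposal is a small time-bounded cache placed in 
 front of the resolver, so that entries expire on a schedule the application 
 controls rather than the JVM-wide setting. This is only a sketch under 
 assumptions: the DnsCache class, the injected resolver function and the 
 explicit nowMs parameter are hypothetical, and nothing here is taken from 
 HServerAddress itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical TTL cache in front of a resolver; not the HServerAddress change. */
class DnsCache {
  private static final class Entry {
    final String address;
    final long expiresAtMs;

    Entry(String address, long expiresAtMs) {
      this.address = address;
      this.expiresAtMs = expiresAtMs;
    }
  }

  private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<String, Entry>();
  // In production this could wrap InetAddress.getByName(host); injected here
  // so the sketch can run without touching the network.
  private final Function<String, String> resolver;
  private final long ttlMs;

  DnsCache(Function<String, String> resolver, long ttlMs) {
    this.resolver = resolver;
    this.ttlMs = ttlMs;
  }

  /** Returns the cached address, re-resolving once the entry is ttlMs old. */
  String resolve(String host, long nowMs) {
    Entry e = cache.get(host);
    if (e == null || nowMs >= e.expiresAtMs) {
      // If the resolver throws, nothing is stored, so failures are not cached.
      e = new Entry(resolver.apply(host), nowMs + ttlMs);
      cache.put(host, e);
    }
    return e.address;
  }
}
```

 A bounded TTL keeps the namenode-failover case workable, since a changed 
 address is picked up after at most ttlMs.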


[jira] [Commented] (HBASE-6391) Master restart when enabling table will lead to region assignned twice

2012-07-13 Thread zhou wenjian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413550#comment-13413550
 ] 

zhou wenjian commented on HBASE-6391:
-

that is different, i think

 Master restart when enabling table will lead to region assignned twice
 --

 Key: HBASE-6391
 URL: https://issues.apache.org/jira/browse/HBASE-6391
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: zhou wenjian
 Fix For: 0.94.1


 The scenario can be reproduced as follows.
 Enable a table; some regions are online on a regionserver, and some are still 
 being processed.
 Then restart the master.
 When the master fails over:
 // Region is being served and on an active server
 // add only if region not in disabled and enabling table
 if (false == checkIfRegionBelongsToDisabled(regionInfo)
   && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
   regions.put(regionInfo, regionLocation);
   addToServers(regionLocation, regionInfo);
 }
 the opened regions are not added to the regions map in the master,
 and in the subsequent recoverTableInEnablingState the regions are assigned 
 again.
 That leaves the cluster inconsistent.


[jira] [Commented] (HBASE-6391) Master restart when enabling table will lead to region assignned twice


[ 
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413545#comment-13413545
 ] 

stack commented on HBASE-6391:
--

Is this the same as HBASE-6317, "Master clean start up and Partially enabled 
tables make region assignment inconsistent"?


[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent


[ 
https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413558#comment-13413558
 ] 

stack commented on HBASE-6272:
--

High level Jimmy, how should we proceed with this patch?  If we apply it, I 
think it means that any fixes on stuff like hbase-6060 will be for trunk only; 
they won't be backportable, at least not w/o a bunch of work.  Maybe that's 
fine.  Raising the question.

 In-memory region state is inconsistent
 --

 Key: HBASE-6272
 URL: https://issues.apache.org/jira/browse/HBASE-6272
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 AssignmentManger stores region state related information in several places: 
 regionsInTransition, regions (region info to server name map), and servers 
 (server name to region info set map).  However the access to these places is 
 not coordinated properly.  It leads to inconsistent in-memory region state 
 information.  Sometimes, some region could even be offline, and not in 
 transition.


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-13 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413559#comment-13413559
 ] 

nkeywal commented on HBASE-6389:


We could remove the timeout? That would make things a little simpler.
Or we could keep it as an error case, and throw an exception if the timeout is 
reached. The intent would be to stop the master.

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 the default of 1) can help prevent assignment of all regions to one (or a 
 small number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master will proceed immediately after the timeout 
 has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not 
 been reached.
 Reading the current conditions of waitForRegionServers() clarifies it:
 {code:title=ServerManager.java (trunk rev:1360470)}
 ..
   /**
    * Wait for the region servers to report in.
    * We will wait until one of this condition is met:
    *  - the master is stopped
    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
    *    region servers is reached
    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
    *   there have been no new region server in for
    *      'hbase.master.wait.on.regionservers.interval' time
    *
    * @throws InterruptedException
    */
   public void waitForRegionServers(MonitoredTask status)
   throws InterruptedException {
 ..
 ..
     while (
       !this.master.isStopped() &&
         slept < timeout &&
         count < maxToStart &&
         (lastCountChange+interval > now || count < minToStart)
       ){
 ..
 {code}
 So with the current conditions, the wait will end as soon as the timeout is 
 reached, even if fewer RSes than required have checked in with the Master, 
 and the master will proceed with region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, 
 especially now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
    * Wait for the region servers to report in.
    * We will wait until one of this condition is met:
    *  - the master is stopped
    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
    *    region servers is reached
    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
    *   there have been no new region server in for
    *      'hbase.master.wait.on.regionservers.interval' time AND
    *   the 'hbase.master.wait.on.regionservers.timeout' is reached
    *
    * @throws InterruptedException
    */
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
     int minToStart = this.master.getConfiguration().
     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
     int maxToStart = this.master.getConfiguration().
     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
     if (maxToStart < minToStart) {
       maxToStart = minToStart;
     }
 ..
 ..
     while (
       !this.master.isStopped() &&
         count < maxToStart &&
         (lastCountChange+interval > now || timeout > slept || count < minToStart)
       ){
 ..
 {code}
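
 To make the difference between the two loop conditions concrete, they can be 
 restated as pure functions and evaluated for a case where the timeout has 
 passed but too few region servers have reported in. This is a sketch that 
 reads the conditions in line with the two Javadoc bullet lists above; the 
 WaitConditions class and its method names are hypothetical and do not exist 
 in ServerManager.

```java
/** Hypothetical restatement of the two loop conditions as pure functions. */
class WaitConditions {
  /** Condition in trunk: reaching the timeout alone ends the wait. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout only counts once minToStart is met. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }
}
```

 With one region server checked in, minToStart set to 3 and the timeout 
 already exceeded, the trunk condition stops waiting while the proposed one 
 keeps waiting until the third server arrives.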


[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework

[
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413569#comment-13413569
]

Elliott Clark commented on HBASE-4050:
--

bq.I suppose you don't need to set test size annotation on below because
annotations are not a dependency when this is built:

Correct. The hbase-hadoop-compat module has no hadoop dependency. In addition
hbase-hadoop1-compat and hbase-hadoop2-compat currently only have unit tests,
so they have the second test pass completely turned off.

bq.Does BaseMetricsSource not implement MetricsSource?
It does. I guess it's just a little too explicit. I'll fix it in the patch
first thing tomorrow morning.

bq.These need to be this accessible:
Kind of but not 100%; I'm open to either way. In hadoop 1 metrics are pretty
hard to test. Opening the maps up will make testing any classes that extend
BaseMetricsSourceImpl easier. Those classes that add functionality will need
those maps to be public for testing. However with that said this patch doesn't
have those classes in it, so if you prefer I could make them protected and
change that when needed.

bq.The stuff below where we have a static boolean and in constructor we test
something already created could be a PITA in minihbase setups? Does it have to
be static? Aren't we slinging singletons here anyways? (The singletons are ok
in minihbasecontext too?):

We are currently slinging a singleton. However when we add in more than just
replication metrics we'll have more than one BaseMetricsSourceImpl. The
DefaultMetricsSystem.initialize call can be done multiple times as long as it's
inited with the same string, however it complains quite loudly in logs.

bq.'hasInited' is name of a method that tests 'inited' variable... suggest
changing its name.
Sure. Something like defaultMetricsInited

bq.What about that jmx mess registering metrics in tests? The exception saying
metrics already registered because we have more than one daemon in the one jvm.
We still have that issue here?

We'll still have that. A little bit less spam but not completely gone.
Basically when all metrics are moved to metrics2 we'll see 4 or 5 log messages
(one per dupe of ReplicationMetricsSource et al.) rather than the massive
amount we see now.
Maybe on test we should silence the junit messages from those classes?
Probably a good issue to file for the metrics clean up.

bq.Do we have to have metrics2 package? Can this new stuff be in the metrics
package?
Nope. Earlier you were asking to remove it. So everything is in the metrics
namespace. That should make things a little nicer if we go the DI route,
that's being discussed on the mailing list, and someone wants to go back to the
old hadoop metrics.

bq.I thought I saw a patch where you'd renamed the properties file to what
LarsG suggested?
Nope just replied that we could. That file needs some examples and other love
(ganglia examples and examples for regionserver/rest). Seems like a good issue
for me to file after this.

I'll clean up the two javadocs tomorrow morning.


[jira] [Commented] (HBASE-6391) Master restart when enabling table will lead to region assignned twice

2012-07-13 Thread zhou wenjian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413571#comment-13413571
 ] 

zhou wenjian commented on HBASE-6391:
-

In HBASE-6317, rajeshbabu comments:
As per the current code, two scenarios may cause inconsistent assignment.
1) In EnableTableHandler we don't assign regions if they are present in the 
regions map.
final List<HRegionInfo> onlineRegions = 
this.assignmentManager.getRegionsOfTable(tableName);
regionsInMeta.removeAll(onlineRegions);
But when a table is being enabled during master startup, we do not add its 
regions to the regions map in rebuildUserRegions, even if the regions are in 
transition to online servers.
if (false == checkIfRegionBelongsToDisabled(regionInfo) && false == 
checkIfRegionsBelongsToEnabling(regionInfo)) {
  synchronized (this.regions) {
    regions.put(regionInfo, regionLocation);
    addToServers(regionLocation, regionInfo);
  }
}
So we will call assign for all the regions even if they are in transition or 
already assigned to online servers, which may cause double assignment.
2) If all the tables are in ENABLING, we may treat it as a clean cluster 
startup (because the regions map is empty) and again call assignment for all 
the regions (which may again cause double assignment).


If we remove the check for RegionsBelongsToEnabling, the first scenario will 
not happen again.
For the other scenario we only need to worry about one case:
all tables are enabling, and none of the regions' locations are registered in 
the meta.



[jira] [Commented] (HBASE-6391) Master restart when enabling table will lead to region assignned twice

2012-07-13 Thread zhou wenjian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413576#comment-13413576
 ] 

zhou wenjian commented on HBASE-6391:
-

in my opinion, we could treat the case as failover rather than clean start.

 


[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize

[
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413583#comment-13413583
]

Hadoop QA commented on HBASE-6380:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12536342/6380-trunk.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSide
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2381//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2381//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2381//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2381//console

This message is automatically generated.

bulkload should update the store.storeSize
--

Key: HBASE-6380
URL: https://issues.apache.org/jira/browse/HBASE-6380
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.94.0, 0.96.0
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Critical
Attachments: 6380-trunk.txt, 6380-trunk.txt, hbase-6380_0_94_0.patch

After bulk loading some HFiles into a table, we found that a forced split did
not work because the midkey was null. The forced split worked normally only
after we restarted the HBase service.

[jira] [Commented] (HBASE-6370) Add compression codec test at HMaster when createTable/modifyColumn/modifyTable


[ 
https://issues.apache.org/jira/browse/HBASE-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413587#comment-13413587
 ] 

Hudson commented on HBASE-6370:
---

Integrated in HBase-TRUNK #3124 (See 
[https://builds.apache.org/job/HBase-TRUNK/3124/])
HBASE-6370 Add compression codec test at HMaster when 
createTable/modifyColumn/modifyTable (Revision 1361058)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Add compression codec test at HMaster when 
 createTable/modifyColumn/modifyTable
 ---

 Key: HBASE-6370
 URL: https://issues.apache.org/jira/browse/HBASE-6370
 Project: HBase
  Issue Type: Improvement
Reporter: ShiXing
Assignee: ShiXing
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6370v3.txt, HBASE-6370-trunk-V1.patch, 
 HBASE-6370-trunk-V2.patch, runAllTests.out


 We deployed a cluster in which none of the regionservers supports a 
 compression codec such as LZO, but the cluster user/client does not know 
 this and specifies the family's compression codec with 
 HColumnDescriptor.setCompressionType(Compression.Algorithm.LZO);
 Because HBaseAdmin's createTable is asynchronous, the client waits forever 
 for all the regions of the table to come online, and does not know why the 
 regions are not online until the HBase administrator finds the problem.
 In fact, the master keeps assigning all of the regions, but the 
 regionserver's openHRegion always fails.
 In my opinion, we can assume that the environment is the same across the 
 cluster: if a library is deployed on the master, it should also be deployed 
 on the regionservers. Of course this is only an assumption; in a real 
 deployment the HBase DBA may deploy the library only on the regionservers 
 or only on the master.
 So I think this failure can be detected earlier, before the master creates 
 the CreateTableHandler thread, so that we can quickly tell the client that 
 the compression codec type is not supported.
 I will upload the patch later.
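
 A minimal sketch of the early check proposed above, assuming the master can 
 tell which codecs it is able to load. Every name here (CodecPrecheck, Codec, 
 UnsupportedCodecException, checkBeforeCreate) is hypothetical and is not an 
 HBase class; the real patch may look quite different.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in types for illustration; not HBase's real classes.
final class CodecPrecheck {
  enum Codec { NONE, GZ, LZO, SNAPPY }

  /** Thrown synchronously so the client sees the cause instead of waiting. */
  static final class UnsupportedCodecException extends RuntimeException {
    UnsupportedCodecException(String msg) { super(msg); }
  }

  // Codecs this process can load (assumed to mirror the regionservers).
  private final Set<Codec> loadable = EnumSet.noneOf(Codec.class);

  CodecPrecheck(Set<Codec> loadable) {
    this.loadable.addAll(loadable);
  }

  /** Validates every family codec before any table-creation work is scheduled. */
  void checkBeforeCreate(String table, Iterable<Codec> familyCodecs) {
    for (Codec c : familyCodecs) {
      if (!loadable.contains(c)) {
        throw new UnsupportedCodecException(
            "Table " + table + ": compression codec " + c + " is not available");
      }
    }
  }
}
```

 The point of the sketch is only the ordering: the check runs and can fail 
 before the asynchronous part of table creation starts.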

[jira] [Commented] (HBASE-6391) Master restart when enabling table will lead to region assignned twice

2012-07-13 Thread rajeshbabu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413588#comment-13413588
]

rajeshbabu commented on HBASE-6391:
---

I feel this is the same as HBASE-6317, and we are trying to address these
concerns there.
To answer your questions:
bq.may anyone tell me why not to add region in enabling state to regions in
master
Consider a case where I had disabled a table and then tried to ENABLE it
again, but the master restarted in the middle. If we now add the regions to
the this.regions map, the EnableTableHandler will see that the regions are
present in this.regions and won't call assign, so those regions will remain
closed on the RS.
bq.in my opinion, we could treat the case as failover rather than clean start.
In HBASE-6317 we treat it as a failover only.
{code}
// Store all the ENABLING-state table names and the corresponding online
// servers' regions.
// This may be needed to avoid calling assign twice for the regions of the
// ENABLING table that could have been assigned through processRIT.
Map<String, List<HRegionInfo>> enablingTables =
    new HashMap<String, List<HRegionInfo>>(1);
{code}
The patch available in HBASE-6317 tries to avoid double assignment by keeping
a map of the enabling tables' regions, so that regions already assigned by
processRIT are not assigned again.
Also, even if round-robin assignment is set to true, on master restart we go
with single assignment if we find partially enabled tables. Please review
the patch over in HBASE-6317 and let us know if you have more open points.
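
The bookkeeping described above could be sketched roughly like this. It is an
illustration only, not the HBASE-6317 patch: regions and tables are plain
strings, and EnablingTableTracker and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: names are hypothetical, regions are plain strings.
final class EnablingTableTracker {
  // table name -> regions of that ENABLING table already handled during failover
  private final Map<String, Set<String>> alreadyAssigned =
      new HashMap<String, Set<String>>();

  /** Remembers a region that was assigned while processing regions in transition. */
  void recordAssignedDuringFailover(String table, String region) {
    Set<String> regions = alreadyAssigned.get(table);
    if (regions == null) {
      regions = new HashSet<String>();
      alreadyAssigned.put(table, regions);
    }
    regions.add(region);
  }

  /** Returns the regions that recovery of the ENABLING table still has to assign. */
  List<String> regionsStillToAssign(String table, List<String> allRegions) {
    Set<String> done = alreadyAssigned.get(table);
    List<String> todo = new ArrayList<String>();
    for (String region : allRegions) {
      if (done == null || !done.contains(region)) {
        todo.add(region);
      }
    }
    return todo;
  }
}
```

With this shape, recovery of an ENABLING table only assigns what the failover
path has not already handled, which is the double-assignment guard in a
nutshell.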

Master restart when enabling table will lead to region assignned twice
--

Key: HBASE-6391
URL: https://issues.apache.org/jira/browse/HBASE-6391
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0
Reporter: zhou wenjian
Fix For: 0.94.1

The scenario can be reproduced as follows.
Enable a table; some regions are online on a regionserver and some are still
being processed.
Then restart the master.
On master failover:
// Region is being served and on an active server
// add only if region not in disabled and enabling table
if (false == checkIfRegionBelongsToDisabled(regionInfo)
    && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
  regions.put(regionInfo, regionLocation);
  addToServers(regionLocation, regionInfo);
}
The opened region is not added to the regions map in the master, and in the
following recoverTableInEnablingState the region is assigned again.
That leaves the cluster inconsistent.

[jira] [Commented] (HBASE-4364) Filters applied to columns not in the selected column list are ignored

[
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413590#comment-13413590
]

Hadoop QA commented on HBASE-4364:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12536340/hbase-4364_trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 10 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestSplitLogManager
org.apache.hadoop.hbase.catalog.TestMetaReaderEditor

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2382//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2382//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2382//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2382//console

This message is automatically generated.

Filters applied to columns not in the selected column list are ignored
--

Key: HBASE-4364
URL: https://issues.apache.org/jira/browse/HBASE-4364
Project: HBase
Issue Type: Bug
Components: filters
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
Attachments: hbase-4364_trunk.patch

For a scan, if you select some set of columns using addColumns(), and then
apply a SingleColumnValueFilter that restricts the results based on some
other columns which aren't selected, then those filter conditions are ignored.

[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize

2012-07-13 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413592#comment-13413592
 ] 

Jie Huang commented on HBASE-6380:
--

Re-ran those 2 test cases locally (on a 64-bit Linux server); both passed.
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.hbase.client.TestFromClientSide
Tests run: 56, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 172.105 sec

Results :

Tests run: 56, Failures: 0, Errors: 0, Skipped: 3

---
 T E S T S
---
Running org.apache.hadoop.hbase.master.TestSplitLogManager
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.548 sec

Results :

Tests run: 12, Failures: 0, Errors: 0, Skipped: 0


{noformat}

[jira] [Commented] (HBASE-6370) Add compression codec test at HMaster when createTable/modifyColumn/modifyTable


[ 
https://issues.apache.org/jira/browse/HBASE-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413675#comment-13413675
 ] 

Hudson commented on HBASE-6370:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #92 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/92/])
HBASE-6370 Add compression codec test at HMaster when 
createTable/modifyColumn/modifyTable (Revision 1361058)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


[jira] [Commented] (HBASE-5533) Add more metrics to HBase


[ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413673#comment-13413673
 ] 

Hudson commented on HBASE-5533:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #92 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/92/])
HBASE-6377. HBASE-5533 metrics miss all operations submitted via MultiAction

Committed 6377-trunk-remove-get-put-delete-histograms.patch (Revision 1361026)

 Result = FAILURE
apurtell : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java


 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, 
 HBASE-5533-v7-0.92.patch, TimingOverhead.java, hbase-5533-0.92.patch, 
 hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, 
 histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files
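
 As a rough illustration of the "histogram of recent latencies" idea, here is 
 a small sketch that keeps the most recent samples and answers nearest-rank 
 percentile queries. LatencyWindow and its methods are hypothetical names; 
 this is not the implementation attached to this issue.

```java
import java.util.Arrays;

// Hypothetical helper: a fixed-size window of recent latency samples.
final class LatencyWindow {
  private final long[] samples;
  private long seen; // total number of samples recorded so far

  LatencyWindow(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    samples = new long[capacity];
  }

  /** Records one latency, overwriting the oldest sample once the window is full. */
  void record(long latencyMs) {
    samples[(int) (seen % samples.length)] = latencyMs;
    seen++;
  }

  /** Nearest-rank percentile (0 < p <= 100) over the samples currently held. */
  long percentile(double p) {
    int n = (int) Math.min(seen, (long) samples.length);
    if (n == 0) {
      throw new IllegalStateException("no samples recorded");
    }
    long[] sorted = Arrays.copyOf(samples, n);
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * n / 100.0);
    return sorted[Math.min(Math.max(rank, 1), n) - 1];
  }
}
```

 A statement such as "90% of reads completed in under 100ms" then corresponds 
 to percentile(90) returning a value below 100.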

[jira] [Commented] (HBASE-6377) HBASE-5533 metrics miss all operations submitted via MultiAction


[ 
https://issues.apache.org/jira/browse/HBASE-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413674#comment-13413674
 ] 

Hudson commented on HBASE-6377:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #92 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/92/])
HBASE-6377. HBASE-5533 metrics miss all operations submitted via MultiAction

Committed 6377-trunk-remove-get-put-delete-histograms.patch (Revision 1361026)

 Result = FAILURE
apurtell : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java


 HBASE-5533 metrics miss all operations submitted via MultiAction
 

 Key: HBASE-6377
 URL: https://issues.apache.org/jira/browse/HBASE-6377
 Project: HBase
  Issue Type: Bug
  Components: metrics, regionserver
Affects Versions: 0.96.0, 0.94.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.96.0, 0.94.1

 Attachments: 6377-0.94-remove-get-put-delete-histograms.patch, 
 6377-0.94.patch, 6377-trunk-remove-get-put-delete-histograms.patch, 
 6377-trunk-simple.patch, 6377.patch


 A client application (LoadTestTool) calls put() on HTables. Internally to the 
 HBase client those puts are batched into MultiActions. The total number of 
 put operations shown in the RegionServer's put metrics histogram never 
 increases from 0 even though millions of such operations are made. Needless 
 to say, the latency of those operations is not measured either. The value of 
 the HBASE-5533 metrics is suspect given that the client batches put and 
 delete ops like this.
 I had a fix in progress but HBASE-6284 messed it up. Before, MultiAction 
 processing in HRegionServer would distinguish between puts and deletes and 
 dispatch them separately, so it was easy to account for their time. Now 
 both puts and deletes are submitted together in a batch as mutations.
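
 To make the accounting gap concrete, here is a sketch of counting operations 
 per type inside a mixed batch. It is only an illustration under assumed 
 names (BatchOpCounters, OpType, recordBatch are hypothetical), not the fix 
 that was committed for this issue.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters; HBase's real metrics classes are not used here.
final class BatchOpCounters {
  enum OpType { PUT, DELETE }

  final AtomicLong puts = new AtomicLong();
  final AtomicLong deletes = new AtomicLong();

  /** Walks a mixed batch of mutations and counts each operation by its type. */
  void recordBatch(List<OpType> mutations) {
    for (OpType op : mutations) {
      if (op == OpType.PUT) {
        puts.incrementAndGet();
      } else {
        deletes.incrementAndGet();
      }
    }
  }
}
```

 Counts can be recovered this way, but per-operation latency cannot, because 
 the batch is timed as a whole.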

[jira] [Commented] (HBASE-6384) hbck should group together those sidelined regions need to be bulk loaded later


[ 
https://issues.apache.org/jira/browse/HBASE-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413676#comment-13413676
 ] 

Hudson commented on HBASE-6384:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #92 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/92/])
HBASE-6384 hbck should group together those sidelined regions need to be 
bulk loaded later (Revision 1361034)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


 hbck should group together those sidelined regions need to be bulk loaded 
 later
 ---

 Key: HBASE-6384
 URL: https://issues.apache.org/jira/browse/HBASE-6384
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 6384-trunk.patch


 Currently, hbck sidelines some regions to break up big overlap groups and 
 avoid possible compaction and region split.  These sidelined regions should be
 bulk loaded back later.  Information about these regions is in the output.
 It would be much easier to group them together under the same sideline rootdir,
 for example, /hbase/.hbck/to_be_loaded/.  Then, even if we lose the output
 file, we still know which regions to load back.

[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize


 [ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6380:
-

   Resolution: Fixed
Fix Version/s: 0.94.1
   0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed 0.94 and trunk.  Thanks for the patch Jie.

 bulkload should update the store.storeSize
 --

 Key: HBASE-6380
 URL: https://issues.apache.org/jira/browse/HBASE-6380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0, 0.96.0
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: 6380-trunk.txt, 6380-trunk.txt, hbase-6380_0_94_0.patch


 After bulk loading some HFiles into a table, we found that a forced split 
 did not work because the midkey was null. The forced split worked normally 
 only after we restarted the HBase service. 

[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent


[ 
https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413752#comment-13413752
 ] 

stack commented on HBASE-6272:
--

@Ram What do you think?  You think we should commit this to 0.96 and build 
fixes like 6060 on top of this or Maryann's issue on OFFLINE?  Or you want to 
hold off?  At the moment I'm thinking that fixes for 6060 will be big changes, 
not easily backported.

@Jimmy I added a review over on rb.  It's looking good.

 In-memory region state is inconsistent
 --

 Key: HBASE-6272
 URL: https://issues.apache.org/jira/browse/HBASE-6272
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 AssignmentManager stores region state information in several places: 
 regionsInTransition, regions (region info to server name map), and servers 
 (server name to region info set map).  However, access to these places is 
 not coordinated properly, which leads to inconsistent in-memory region state 
 information.  Sometimes a region can even be offline and not in 
 transition.
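
 One way to picture coordinated access is to update all three structures 
 under a single lock, as in the sketch below. This is an assumption-laden 
 illustration, not the patch under review: RegionStateBook and its methods 
 are hypothetical names, and regions and servers are plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every update touches all three structures under one lock.
final class RegionStateBook {
  private final Set<String> regionsInTransition = new HashSet<String>();
  private final Map<String, String> regions = new HashMap<String, String>();
  private final Map<String, Set<String>> servers = new HashMap<String, Set<String>>();

  /** Takes the region offline and marks it as in transition in one step. */
  synchronized void startTransition(String region) {
    String oldServer = regions.remove(region);
    if (oldServer != null) {
      servers.get(oldServer).remove(region);
    }
    regionsInTransition.add(region);
  }

  /** Ends the transition and records the hosting server in both maps. */
  synchronized void markOnline(String region, String server) {
    regionsInTransition.remove(region);
    String oldServer = regions.put(region, server);
    if (oldServer != null) {
      servers.get(oldServer).remove(region);
    }
    Set<String> hosted = servers.get(server);
    if (hosted == null) {
      hosted = new HashSet<String>();
      servers.put(server, hosted);
    }
    hosted.add(region);
  }

  /** True when the region is either online or in transition, but not both. */
  synchronized boolean isAccountedFor(String region) {
    String server = regions.get(region);
    boolean online = server != null && servers.get(server).contains(region);
    return online ^ regionsInTransition.contains(region);
  }
}
```

 Because no caller can observe a half-applied update, a known region is 
 never seen as both offline and out of transition.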

[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.


[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413772#comment-13413772
 ] 

stack commented on HBASE-6299:
--

bq. Is it possible that we can do something in an earlier stage to prevent 
double assignment? like in forceRegionStateToOffline()?

Yes.  Let's try.  I was going to try and write up a reproduction of the bugs you 
describe above in a harness so we can play with them in isolation rather than 
have to blow up someone's world.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But since the HMaster's OpenedRegionHandler has already been triggered by 
 the region open on the previous RS, and the RegionState has already been 
 removed from regionsInTransition, HMaster treats the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt as invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of

[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize


[ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413785#comment-13413785
 ] 

Hudson commented on HBASE-6380:
---

Integrated in HBase-TRUNK #3125 (See 
[https://builds.apache.org/job/HBase-TRUNK/3125/])
HBASE-6380 bulkload should update the store.storeSize (Revision 1361203)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize


[ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413801#comment-13413801
 ] 

Hudson commented on HBASE-6380:
---

Integrated in HBase-0.94 #315 (See 
[https://builds.apache.org/job/HBase-0.94/315/])
HBASE-6380 bulkload should update the store.storeSize (Revision 1361204)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


[jira] [Updated] (HBASE-4050) Update HBase metrics framework to metrics2 framework


 [ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-4050:
-

Attachment: HBASE-4050-8.patch

Addressed stack's comments.

 Update HBase metrics framework to metrics2 framework
 

 Key: HBASE-4050
 URL: https://issues.apache.org/jira/browse/HBASE-4050
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Affects Versions: 0.90.4
 Environment: Java 6
Reporter: Eric Yang
Assignee: Alex Baranau
Priority: Critical
 Fix For: 0.96.0

 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, 
 HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, 
 HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, 
 HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch


 Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
 and it might get removed in future Hadoop release.  Hence, HBase needs to 
 revise the dependency of MetricsContext to use Metrics2 framework.

[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent


[ 
https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413856#comment-13413856
 ] 

Jimmy Xiang commented on HBASE-6272:


@Stack, thanks a lot for the review. I will respond on RB.
I will backport this patch to 0.92 and 0.94 after it is applied to trunk.

[jira] [Updated] (HBASE-5376) Add more logging to triage HBASE-5312: Closed parent region present in Hlog.lastSeqWritten

2012-07-13 Thread Jonathan Hsieh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5376:
--

Affects Version/s: 0.90.7
Fix Version/s: (was: 0.90.7)

 Add more logging to triage HBASE-5312: Closed parent region present in 
 Hlog.lastSeqWritten
 --

 Key: HBASE-5376
 URL: https://issues.apache.org/jira/browse/HBASE-5376
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.90.7
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: hbase-5376.txt


 It is hard to find out what exactly caused HBASE-5312.  Some more logging 
 would help shed some light on it.


[jira] [Commented] (HBASE-6390) append() and increment() may result in inconsistent result on retries.

2012-07-13 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413870#comment-13413870
]

Andrew Purtell commented on HBASE-6390:
---

So what you are looking for here is a way for a user to, perhaps optionally,
make idempotent requests out of Append and Increment, correct?

Let me volunteer a couple of strawmen:

1) Could overload the timestamp of the Append and Increment requests. If the
request is out of date relative to another request already applied, throw
back a DoNotRetryException (or just a DNRE for that op if submitted as a
MultiAction). This is roughly how ZooKeeper handles this class of distributed
synchronization issue. Timestamp becomes a global sequence number. Not a
logical sequence number so clocks must be closely synchronized. Each memstore
would track the (server side) time of the most recent in-place update mutation.
Could go further and keep a soft cache of in-place update times by row or even
KV for use by append/increment/ICV. If more specific information gets evicted
from the cache due to pressure, then falling back to the per-memstore global
timestamp would still ensure correctness, at the cost of potentially more
resubmission work for the client/app.

2) A more generic option could be:

* Extend the API where the user can set an optional cookie (a long).

* Keep a ring buffer of recent cookies up on the server.

* Check the buffer first if a request with given cookie has already been
applied and throw an exception back to the client if so.

Wouldn't guarantee correctness outside of some time bound. Also I worry about
state management on the server. How large would that buffer need to be to
capture all cookies submitted within ~(2 * time bound)?
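
To make the two strawmen concrete, here is a minimal sketch in plain Java. Nothing in it is HBase API: the class names LastUpdateGuard and RecentCookies, their methods, and the fixed buffer capacity are all hypothetical, and a real server would throw an exception back to the client where this sketch returns false.

```java
import java.util.HashSet;
import java.util.Set;

/** Strawman 1 (hypothetical): reject a request that is not newer than the last one applied. */
final class LastUpdateGuard {
  private long lastApplied = Long.MIN_VALUE;

  /** Returns true if the request may be applied, false if it is out of date. */
  synchronized boolean tryApply(long requestTimestamp) {
    if (requestTimestamp <= lastApplied) {
      return false; // a server would answer with a do-not-retry style error here
    }
    lastApplied = requestTimestamp;
    return true;
  }
}

/** Strawman 2 (hypothetical): fixed-size ring buffer of recently applied cookies. */
final class RecentCookies {
  private final long[] ring;                          // cookies in arrival order
  private final Set<Long> present = new HashSet<>();  // fast membership check
  private int next = 0;                               // slot to overwrite next
  private int size = 0;                               // number of slots in use

  RecentCookies(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.ring = new long[capacity];
  }

  /** Returns true and records the cookie if it is not in the buffer, false otherwise. */
  synchronized boolean markApplied(long cookie) {
    if (present.contains(cookie)) {
      return false; // duplicate within the window: the retry must not be applied again
    }
    if (size == ring.length) {
      present.remove(ring[next]); // buffer full: forget the oldest cookie
    } else {
      size++;
    }
    ring[next] = cookie;
    present.add(cookie);
    next = (next + 1) % ring.length;
    return true;
  }
}
```

The second class also shows the limitation noted above: once a cookie has been evicted, a late retry carrying it is accepted again, so correctness only holds while the buffer covers the retry window.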

append() and increment() may result in inconsistent result on retries.
--

Key: HBASE-6390
URL: https://issues.apache.org/jira/browse/HBASE-6390
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Ashutosh Jindal

The append() and increment() APIs can give inconsistent results in the
following scenarios:
1- If the client does not receive the response within the specified time,
it retries. The first increment/append call has already been applied, so the
retry applies the operation a second time.
2- If the sync() to the WAL fails, we get an IOException. On that exception
a retry is done, which again results in the increment/append being applied
a second time.
We may need some sort of rollback for the second problem.
For the first one we need to see how to handle this.

[jira] [Created] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions

Jimmy Xiang created HBASE-6392:
--

 Summary: UnknownRegionException blocks hbck from sideline big 
overlap regions
 Key: HBASE-6392
 URL: https://issues.apache.org/jira/browse/HBASE-6392
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang


Before sidelining a big overlap region, hbck first tries to close it and take 
it offline.  However, this sometimes throws NotServingRegionException or 
UnknownRegionException.
It could be because the region is not open/assigned at all, or because of some 
other issue.
We should figure out why and fix it.

By the way, it would be better to print in the log the command line to bulk 
load back the sidelined regions, if any. 


[jira] [Commented] (HBASE-6384) hbck should group together those sidelined regions need to be bulk loaded later


[ 
https://issues.apache.org/jira/browse/HBASE-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413871#comment-13413871
 ] 

Jimmy Xiang commented on HBASE-6384:


@Jon, as to the actual bulk load command line, it is a good idea.  It will be 
addressed in HBASE-6392.

 hbck should group together those sidelined regions need to be bulk loaded 
 later
 ---

 Key: HBASE-6384
 URL: https://issues.apache.org/jira/browse/HBASE-6384
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 6384-trunk.patch


 Currently, hbck sidelines some regions to break up big overlap groups and avoid 
 possible compaction and region split.  These sidelined regions should be
 bulk loaded back later.  Information about these regions is in the output.
 It would be much easier to group them together under the same sideline rootdir,
 for example, /hbase/.hbck/to_be_loaded/.  That way, even if we lose the output
 file, we still know which regions to load back.


[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework

[
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413872#comment-13413872
]

Hadoop QA commented on HBASE-4050:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12536404/HBASE-4050-8.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 10 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.catalog.TestMetaReaderEditor

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2383//console

This message is automatically generated.

Update HBase metrics framework to metrics2 framework

[jira] [Commented] (HBASE-6378) the javadoc of setEnabledTable maybe not describe accurately

2012-07-13 Thread David S. Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413883#comment-13413883
 ] 

David S. Wang commented on HBASE-6378:
--

From the patch:

+   * Sets the ENABLED state in the cache and Creates or force updates an node 
to
+   * the ENABLED state for the specified table.

I'd modify the above to be:

+   * Sets the ENABLED state in the cache and creates or force updates a node to
+   * ENABLED state for the specified table.

 the javadoc of  setEnabledTable maybe not describe accurately 
 --

 Key: HBASE-6378
 URL: https://issues.apache.org/jira/browse/HBASE-6378
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: zhou wenjian
 Fix For: 0.94.2

 Attachments: 6378.patch


   /**
* Sets the ENABLED state in the cache and deletes the zookeeper node. Fails
* silently if the node is not in enabled in zookeeper
* 
* @param tableName
* @throws KeeperException
*/
   public void setEnabledTable(final String tableName) throws KeeperException {
 setTableState(tableName, TableState.ENABLED);
   }
 When setEnabledTable is called, it updates the cache and the ZooKeeper 
 node rather than deleting the zk node.


[jira] [Commented] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader

2012-07-13 Thread Anoop Sam John (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413884#comment-13413884
]

Anoop Sam John commented on HBASE-5997:
---

bq. On the second item when we do the compare, are the offsets to where the key
bytes start or to where the key starts (with its length preamble)? For sure, we
are comparing the row portions of keys?

The offset will be to the key (with its length preamble). KeyComparator will be
used, and we can see there how the rowLength is taken into account. We compare
the full key (row key, then CF, qualifier, ...).

Fix concerns raised in HBASE-5922 related to HalfStoreFileReader

Key: HBASE-5997
URL: https://issues.apache.org/jira/browse/HBASE-5997
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
Reporter: ramkrishna.s.vasudevan
Assignee: Anoop Sam John
Fix For: 0.94.2

Attachments: HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch,
Testcase.patch.txt

Please refer to the comment
https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346.
This issue was raised to address that comment, just so we don't forget it.

[jira] [Updated] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader

2012-07-13 Thread Anoop Sam John (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-5997:
--

Attachment: HBASE-5997_94 V3.patch

Patch addressing Stack's comment

 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader
 

 Key: HBASE-5997
 URL: https://issues.apache.org/jira/browse/HBASE-5997
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
Reporter: ramkrishna.s.vasudevan
Assignee: Anoop Sam John
 Fix For: 0.94.2

 Attachments: HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch, 
 HBASE-5997_94 V3.patch, Testcase.patch.txt


 Please refer to the comment
 https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346.
 This issue was raised to address that comment, just so we don't forget it.


[jira] [Updated] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-07-13 Thread Benjamin Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6288:


Attachment: HBASE-6288-trunk.patch
HBASE-6288-94.patch
HBASE-6288-92-1.patch
HBASE-6288-92.patch

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim
 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, 
 HBASE-6288-94.patch, HBASE-6288-trunk.patch


 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 It says the default backup-masters file path is under HADOOP_CONF_DIR, but 
 shouldn't this be HBASE_CONF_DIR?
 Adding the following lines to conf/hbase-env.sh would also be helpful:
 {code}
 # File naming hosts on which backup HMaster will run.
 # $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}


[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-07-13 Thread Benjamin Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413921#comment-13413921
 ] 

Benjamin Kim commented on HBASE-6288:
-

It took a while because I was away on vacation. Here are the patches =)

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim
 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, 
 HBASE-6288-94.patch, HBASE-6288-trunk.patch


 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 It says the default backup-masters file path is under HADOOP_CONF_DIR, but 
 shouldn't this be HBASE_CONF_DIR?
 Adding the following lines to conf/hbase-env.sh would also be helpful:
 {code}
 # File naming hosts on which backup HMaster will run.
 # $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}


[jira] [Created] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-13 Thread Marcelo Vanzin (JIRA)

Marcelo Vanzin created HBASE-6393:
-

 Summary: Decouple audit event creation from storage in 
AccessController
 Key: HBASE-6393
 URL: https://issues.apache.org/jira/browse/HBASE-6393
 Project: HBase
  Issue Type: Brainstorming
  Components: security
Reporter: Marcelo Vanzin


Currently, AccessController takes care of both generating audit events (by 
performing access checks) and storing them (by creating a log message and 
writing it to the AUDITLOG logger).

This makes the logging system the only way to catch audit events. It means that 
if someone wants to do something fancier (like writing these records to a 
database somewhere), they need to hack through the logging system, and parse 
the messages generated by AccessController, which is not optimal.

The attached patch decouples generation and storage by introducing a new 
interface, used by AccessController, to log the audit events. The current, 
log-based storage is kept in place so that current users won't be affected by 
the change.

I'm filing this as an RFC at this point, so the patch is not totally clean; 
it's on top of HBase 0.92 (which is easier for me to test) and doesn't have any 
unit tests, for starters. But the changes should be very similar on trunk - I 
don't remember changes in this particular area of the code between those 
versions.
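
The shape of the decoupling could look roughly like the sketch below. It is plain Java and not the code in the attached patch: the names AuditEvent, AuditSink, LogLineSink, InMemorySink and AccessChecker, the event fields, and the message format are all made up for illustration. The point is only that the access check produces an event object and hands it to whatever sinks are configured, with the log-based sink being one implementation among others.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical audit event; the fields are illustrative only. */
final class AuditEvent {
  final String user;
  final String action;
  final boolean allowed;

  AuditEvent(String user, String action, boolean allowed) {
    this.user = user;
    this.action = action;
    this.allowed = allowed;
  }
}

/** Hypothetical storage interface that the access checks would write to. */
interface AuditSink {
  void record(AuditEvent event);
}

/** Stand-in for the existing log-based storage: formats one line per event. */
final class LogLineSink implements AuditSink {
  final List<String> lines = new ArrayList<>();

  @Override
  public void record(AuditEvent e) {
    lines.add((e.allowed ? "allowed" : "denied") + " user=" + e.user + " action=" + e.action);
  }
}

/** Alternative storage that keeps structured events, e.g. for writing to a database. */
final class InMemorySink implements AuditSink {
  final List<AuditEvent> events = new ArrayList<>();

  @Override
  public void record(AuditEvent e) {
    events.add(e);
  }
}

/** Generates events from access checks without knowing how they are stored. */
final class AccessChecker {
  private final List<AuditSink> sinks;

  AccessChecker(List<AuditSink> sinks) {
    this.sinks = new ArrayList<>(sinks);
  }

  /** The permitted flag stands in for the real permission lookup. */
  boolean check(String user, String action, boolean permitted) {
    AuditEvent event = new AuditEvent(user, action, permitted);
    for (AuditSink sink : sinks) {
      sink.record(event);
    }
    return permitted;
  }
}
```

With this split, a consumer that wants structured records implements the sink interface directly and never has to parse log messages.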



[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HBASE-6393:
--

Attachment: accesslogger-v1.patch

Current version of my code, tested with a custom implementation of the new 
AccessLogger interface.

 Decouple audit event creation from storage in AccessController
 --

 Key: HBASE-6393
 URL: https://issues.apache.org/jira/browse/HBASE-6393
 Project: HBase
  Issue Type: Brainstorming
  Components: security
Reporter: Marcelo Vanzin
 Attachments: accesslogger-v1.patch


 Currently, AccessController takes care of both generating audit events (by 
 performing access checks) and storing them (by creating a log message and 
 writing it to the AUDITLOG logger).
 This makes the logging system the only way to catch audit events. It means 
 that if someone wants to do something fancier (like writing these records to 
 a database somewhere), they need to hack through the logging system, and 
 parse the messages generated by AccessController, which is not optimal.
 The attached patch decouples generation and storage by introducing a new 
 interface, used by AccessController, to log the audit events. The current, 
 log-based storage is kept in place so that current users won't be affected by 
 the change.
 I'm filing this as an RFC at this point, so the patch is not totally clean; 
 it's on top of HBase 0.92 (which is easier for me to test) and doesn't have 
 any unit tests, for starters. But the changes should be very similar on trunk 
 - I don't remember changes in this particular area of the code between those 
 versions.


[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework


[ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413945#comment-13413945
 ] 

Elliott Clark commented on HBASE-4050:
--

The test failure looks unrelated.  Works on my machine.

 Update HBase metrics framework to metrics2 framework
 

 Key: HBASE-4050
 URL: https://issues.apache.org/jira/browse/HBASE-4050
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Affects Versions: 0.90.4
 Environment: Java 6
Reporter: Eric Yang
Assignee: Alex Baranau
Priority: Critical
 Fix For: 0.96.0

 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, 
 HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, 
 HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, 
 HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch


 Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
 and it might get removed in future Hadoop release.  Hence, HBase needs to 
 revise the dependency of MetricsContext to use Metrics2 framework.


[jira] [Assigned] (HBASE-4050) Update HBase metrics framework to metrics2 framework


 [ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HBASE-4050:


Assignee: Elliott Clark  (was: Alex Baranau)

 Update HBase metrics framework to metrics2 framework
 

 Key: HBASE-4050
 URL: https://issues.apache.org/jira/browse/HBASE-4050
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Affects Versions: 0.90.4
 Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
 Fix For: 0.96.0

 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, 
 HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, 
 HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, 
 HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch


 Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
 and it might get removed in future Hadoop release.  Hence, HBase needs to 
 revise the dependency of MetricsContext to use Metrics2 framework.


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-13 Thread Aditya Kishore (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414031#comment-13414031
 ] 

Aditya Kishore commented on HBASE-6389:
---

I like the idea of treating the timeout as an error case. If we do decide on 
that, two things need to be taken care of.

# The current default timeout of 4.5 sec may not be appropriate and may require 
upward revision (to the tune of a few minutes), and
# The master would need to shut down the cluster, including the other standby 
masters; otherwise each standby master may continue after the previous one has 
given up. In the worst case, if 'minToStart' RSes somehow join the last 
master, the cluster may be left with no standby master.

For this JIRA, I would like to revert to the Master's original behavior (until 
0.92) of waiting for 'minToStart' RSes.

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 the default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master proceeds immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not 
 been reached.
 Reading the current conditions of waitForRegionServers() clarifies it:
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 /**
  * Wait for the region servers to report in.
  * We will wait until one of this condition is met:
  *  - the master is stopped
  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
  *    region servers is reached
  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
  *    there have been no new region server in for
  *    'hbase.master.wait.on.regionservers.interval' time
  *
  * @throws InterruptedException
  */
 public void waitForRegionServers(MonitoredTask status)
 throws InterruptedException {
 ..
   while (
     !this.master.isStopped() &&
       slept < timeout &&
       count < maxToStart &&
       (lastCountChange+interval > now || count < minToStart)
     ){
 
 {code}
 So with the current conditions, the wait ends as soon as the timeout is 
 reached, even if fewer RSes than required have checked in with the Master, and 
 the master proceeds with region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, especially 
 now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
* Wait for the region servers to report in.
* We will wait until one of this condition is met:
*  - the master is stopped
*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
*region servers is reached
*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
*   there have been no new region server in for
*  'hbase.master.wait.on.regionservers.interval' time AND
*   the 'hbase.master.wait.on.regionservers.timeout' is reached
*
* @throws InterruptedException
*/
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
 int minToStart = this.master.getConfiguration().
   getInt("hbase.master.wait.on.regionservers.mintostart", 1);
 int maxToStart = this.master.getConfiguration().
   getInt("hbase.master.wait.on.regionservers.maxtostart", 
     Integer.MAX_VALUE);
 if (maxToStart < minToStart) {
   maxToStart = minToStart;
 }
 ..
 ..
 while (
   !this.master.isStopped() &&
     count < maxToStart

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-13 Thread Aditya Kishore (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk.patch

The test failures were the result of masked errors in the test code, which this 
change brought out.

There were two such errors.

# The function 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster() was 
overriding the values of 'mintostart' and 'maxtostart' with a single value, even 
if the caller had set them explicitly.
# org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing did 
not set these values even though it kills one RS during master initialization.

The attached patch fixes these two.

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 the default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master proceeds immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not 
 been reached.
 Reading the current conditions of waitForRegionServers() clarifies it:
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 /**
  * Wait for the region servers to report in.
  * We will wait until one of this condition is met:
  *  - the master is stopped
  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
  *    region servers is reached
  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
  *    there have been no new region server in for
  *    'hbase.master.wait.on.regionservers.interval' time
  *
  * @throws InterruptedException
  */
 public void waitForRegionServers(MonitoredTask status)
 throws InterruptedException {
 ..
   while (
     !this.master.isStopped() &&
       slept < timeout &&
       count < maxToStart &&
       (lastCountChange+interval > now || count < minToStart)
     ){
 
 {code}
 So with the current conditions, the wait ends as soon as the timeout is 
 reached, even if fewer RSes than required have checked in with the Master, and 
 the master proceeds with region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, especially 
 now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
* Wait for the region servers to report in.
* We will wait until one of this condition is met:
*  - the master is stopped
*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
*region servers is reached
*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
*   there have been no new region server in for
*  'hbase.master.wait.on.regionservers.interval' time AND
*   the 'hbase.master.wait.on.regionservers.timeout' is reached
*
* @throws InterruptedException
*/
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
 int minToStart = this.master.getConfiguration().
   getInt("hbase.master.wait.on.regionservers.mintostart", 1);
 int maxToStart = this.master.getConfiguration().
   getInt("hbase.master.wait.on.regionservers.maxtostart", 
     Integer.MAX_VALUE);
 if (maxToStart < minToStart) {
   maxToStart = minToStart;
 }
 ..
 ..
 while (
   !this.master.isStopped() &&
     count < maxToStart &&
     (lastCountChange+interval > now || timeout > slept || count < 
       minToStart)
   ){
 ..
 {code}
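
The difference between the two loop conditions is easier to see when each is written as a pure function of its inputs. The sketch below is standalone Java, not ServerManager code: the class WaitConditions and its method names are hypothetical, and the parameters simply mirror the local variables named in the snippets above.

```java
/** Hypothetical helper comparing the current and proposed wait conditions. */
final class WaitConditions {
  private WaitConditions() {}

  /** Current condition: the loop exits as soon as the timeout has elapsed. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout only matters once minToStart has been reached. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }
}
```

For example, with the timeout already elapsed and only one of three required region servers checked in, the current condition stops waiting while the proposed one keeps waiting.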


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

[
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414106#comment-13414106
]

Hadoop QA commented on HBASE-6389:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12536453/HBASE-6389_trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestHLogRecordReader

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2384//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2384//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2384//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2384//console

This message is automatically generated.

Modify the conditions to ensure that Master waits for sufficient number of
Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.1

Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch

Continuing from HBASE-6375.
It seems I was mistaken in my assumption that changing the value of
hbase.master.wait.on.regionservers.mintostart to a sufficient number (from
default of 1) can help prevent assignment of all regions to one (or a small
number of) region server(s).
While this was the case in 0.90.x and 0.92.x, the behavior has changed in
0.94.0 onwards to address HBASE-4993.
From 0.94.0 onwards, Master will proceed immediately after the timeout has
lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been
reached.
Reading the current conditions of waitForRegionServers() makes this clear:
{code:title=ServerManager.java (trunk rev:1360470)}

/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
throws InterruptedException {
..
  while (
    !this.master.isStopped() &&
      slept < timeout &&
      count < maxToStart &&
      (lastCountChange+interval > now || count < minToStart)
    ){

{code}
So with the current conditions, the wait ends as soon as the timeout is
reached, even if fewer than the required number of RSes have checked in with
the Master, and the master proceeds with region assignment among these RSes alone.
As mentioned in
-[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
and I concur, this could have a disastrous effect in a large cluster, especially
now that MSLAB is turned on.
To enforce the required quorum as specified by
hbase.master.wait.on.regionservers.mintostart irrespective of the timeout,
these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
/**
* Wait for the region servers to report in.
* We will wait until one of this condition is met:
* - the master is stopped
* - the 'hbase.master.wait.on.regionservers.maxtostart' number of
*region servers is reached
* - the

[jira] [Assigned] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions


 [ 
https://issues.apache.org/jira/browse/HBASE-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-6392:
--

Assignee: Jimmy Xiang

 UnknownRegionException blocks hbck from sideline big overlap regions
 

 Key: HBASE-6392
 URL: https://issues.apache.org/jira/browse/HBASE-6392
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 Before sidelining a big overlap region, hbck first tries to close it and take 
 it offline.  However, this sometimes throws NotServingRegion or 
 UnknownRegionException.
 It could be because the region is not open/assigned at all, or some other 
 issue.
 We should figure out why and fix it.
 By the way, it's better to print out in the log the command line to bulk load 
 back sidelined regions, if any. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover


 [ 
https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-6381:
--

Assignee: Jimmy Xiang

 AssignmentManager should use the same logic for clean startup and failover
 --

 Key: HBASE-6381
 URL: https://issues.apache.org/jira/browse/HBASE-6381
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 Currently AssignmentManager handles clean startup and failover very 
 differently.
 Different logic is mingled together so it is hard to find out which is for 
 which.
 We should clean it up and share the same logic so that AssignmentManager 
 handles
 both cases the same way.  This way, the code will be much easier to understand 
 and
 maintain.


[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize


[ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414154#comment-13414154
 ] 

Hudson commented on HBASE-6380:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #93 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/93/])
HBASE-6380 bulkload should update the store.storeSize (Revision 1361203)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 bulkload should update the store.storeSize
 --

 Key: HBASE-6380
 URL: https://issues.apache.org/jira/browse/HBASE-6380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0, 0.96.0
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: 6380-trunk.txt, 6380-trunk.txt, hbase-6380_0_94_0.patch


 After bulkloading some HFiles into the table, we found that force-split did 
 not work because the MidKey was NULL. Force-split worked normally only after 
 we restarted the HBase service. 


[jira] [Created] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException

Jimmy Xiang created HBASE-6394:
--

 Summary: verifyrep MR job map tasks throws NullPointerException 
 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch

{noformat}
2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task
{noformat}



[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

Attachment: 6394-trunk.patch

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

Status: Patch Available  (was: Open)

The log is from a previous version of HBase. So it is a little bit off with 
trunk.

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414161#comment-13414161
 ] 

Zhihong Ted Yu commented on HBASE-6394:
---

{code}
+replicatedScanner.close();
{code}
I was expecting 'replicatedScanner = null' following the above call.
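
The close-then-null pattern suggested here can be sketched as follows. This is only an illustration under stated assumptions: FakeScanner and Verifier are hypothetical stand-ins rather than the VerifyReplication classes, and the null check is an assumed guard for the case where cleanup runs before any scanner was opened. The patch itself is not shown in this thread.

```java
import java.io.Closeable;

public class ScannerCleanupSketch {

    /** Hypothetical stand-in for a scanner; counts how often close() runs. */
    static class FakeScanner implements Closeable {
        int closeCalls = 0;

        @Override
        public void close() {
            closeCalls++;
        }
    }

    /** Hypothetical stand-in for the mapper that owns the scanner. */
    static class Verifier {
        FakeScanner replicatedScanner; // stays null if map() never ran

        void map() {
            // Open the scanner lazily on the first mapped row.
            if (replicatedScanner == null) {
                replicatedScanner = new FakeScanner();
            }
        }

        void cleanup() {
            // Guard against a scanner that was never opened (assumed NPE cause),
            // then drop the reference so a second cleanup is a no-op.
            if (replicatedScanner != null) {
                replicatedScanner.close();
                replicatedScanner = null;
            }
        }
    }

    public static void main(String[] args) {
        Verifier empty = new Verifier();
        empty.cleanup(); // no rows mapped: must not throw

        Verifier used = new Verifier();
        used.map();
        FakeScanner scanner = used.replicatedScanner;
        used.cleanup();
        used.cleanup(); // second call finds null and does nothing
        System.out.println("close calls: " + scanner.closeCalls);
    }
}
```

Setting the field to null after close() makes cleanup idempotent, so a repeated call cannot close the same scanner twice.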

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Created] (HBASE-6395) TestFSSchedulerApp should be in scheduler.fair package

Zhihong Ted Yu created HBASE-6395:
-

 Summary: TestFSSchedulerApp should be in scheduler.fair package
 Key: HBASE-6395
 URL: https://issues.apache.org/jira/browse/HBASE-6395
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Ted Yu


MAPREDUCE-3451 added Fair Scheduler to MRv2

TestFSSchedulerApp was added under 
src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair but 
its package was declared to be 
org.apache.hadoop.yarn.server.resourcemanager.scheduler


[jira] [Resolved] (HBASE-6395) TestFSSchedulerApp should be in scheduler.fair package


 [ 
https://issues.apache.org/jira/browse/HBASE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu resolved HBASE-6395.
---

Resolution: Won't Fix

This should have been a MAPREDUCE JIRA.

 TestFSSchedulerApp should be in scheduler.fair package
 --

 Key: HBASE-6395
 URL: https://issues.apache.org/jira/browse/HBASE-6395
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Ted Yu

 MAPREDUCE-3451 added Fair Scheduler to MRv2
 TestFSSchedulerApp was added under 
 src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair 
 but its package was declared to be 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler


[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

Status: Open  (was: Patch Available)

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch, 6394-trunk_v2.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

Attachment: 6394-trunk_v2.patch

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch, 6394-trunk_v2.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414185#comment-13414185
 ] 

Lars Hofhansl commented on HBASE-6389:
--

+1 on last patch.
If there are no objections I'll commit this to 0.94 and 0.96.

Let's discuss the failure after timeout idea in a different jira.

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, Master will proceed immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been 
 reached.
 Reading the current conditions of waitForRegionServers() makes this clear:
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 /**
  * Wait for the region servers to report in.
  * We will wait until one of this condition is met:
  *  - the master is stopped
  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
  *    region servers is reached
  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
  *    there have been no new region server in for
  *    'hbase.master.wait.on.regionservers.interval' time
  *
  * @throws InterruptedException
  */
 public void waitForRegionServers(MonitoredTask status)
 throws InterruptedException {
 ..
   while (
     !this.master.isStopped() &&
       slept < timeout &&
       count < maxToStart &&
       (lastCountChange+interval > now || count < minToStart)
     ){
 
 {code}
 So with the current conditions, the wait ends as soon as the timeout is 
 reached, even if fewer than the required number of RSes have checked in with 
 the Master, and the master proceeds with region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
 and I concur, this could have a disastrous effect in a large cluster, especially 
 now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
    * Wait for the region servers to report in.
    * We will wait until one of this condition is met:
    *  - the master is stopped
    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
    *    region servers is reached
    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
    *    there have been no new region server in for
    *    'hbase.master.wait.on.regionservers.interval' time AND
    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
    *
    * @throws InterruptedException
    */
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
     int minToStart = this.master.getConfiguration().
       getInt("hbase.master.wait.on.regionservers.mintostart", 1);
     int maxToStart = this.master.getConfiguration().
       getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
     if (maxToStart < minToStart) {
       maxToStart = minToStart;
     }
 ..
 ..
     while (
       !this.master.isStopped() &&
         count < maxToStart &&
         (lastCountChange+interval > now || timeout > slept || count < minToStart)
       ){
 ..
 {code}


[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

Status: Patch Available  (was: Open)

Addressed Ted's comment.

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch, 6394-trunk_v2.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Updated] (HBASE-6391) Master restart when enabling table will lead to region assignned twice


 [ 
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6391:
-

Fix Version/s: (was: 0.94.1)
   0.94.2

I think this could be closed to DUP as well.
Moving to 0.94.2 for now.

 Master restart when enabling table will lead to region assignned twice
 --

 Key: HBASE-6391
 URL: https://issues.apache.org/jira/browse/HBASE-6391
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: zhou wenjian
 Fix For: 0.94.2


 The scenario can be reproduced as follows.
 Enable a table; some regions are already online on a regionserver while 
 others are still being processed.
 Then restart the master.
 When the master fails over:
 // Region is being served and on an active server
 // add only if region not in disabled and enabling table
 if (false == checkIfRegionBelongsToDisabled(regionInfo)
     && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
   regions.put(regionInfo, regionLocation);
   addToServers(regionLocation, regionInfo);
 }
 the opened regions are not added to the regions map in the master,
 and in the subsequent recoverTableInEnablingState the regions are assigned 
 again.
 That leaves the cluster in an inconsistent state.
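
The sequence described above can be modelled with plain collections. This is a rough sketch, not AssignmentManager code: the class and method names are hypothetical, regions are represented by strings, and the recovery step is assumed to assign every region of the enabling table that the master does not track.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnablingFailoverSketch {

    /** Returns the regions that get assigned again although a server already has them open. */
    static List<String> reassignedWhileOpen(Map<String, String> openRegions,
                                            List<String> enablingTableRegions) {
        // Step 1, failover rebuild: an open region is tracked only if its table
        // is not in the enabling state, mirroring the quoted condition.
        Map<String, String> masterRegions = new HashMap<>();
        for (Map.Entry<String, String> e : openRegions.entrySet()) {
            if (!enablingTableRegions.contains(e.getKey())) {
                masterRegions.put(e.getKey(), e.getValue());
            }
        }

        // Step 2, recovery of the enabling table (assumed behavior): every region
        // of the table that the master does not track is assigned.
        List<String> doubled = new ArrayList<>();
        for (String region : enablingTableRegions) {
            boolean assignedByRecovery = !masterRegions.containsKey(region);
            if (assignedByRecovery && openRegions.containsKey(region)) {
                doubled.add(region); // already open on a server, assigned a second time
            }
        }
        return doubled;
    }

    public static void main(String[] args) {
        // Hypothetical cluster state at master restart: region "t1,a" of the
        // enabling table is already open, "t1,b" is still being processed.
        Map<String, String> openRegions = new HashMap<>();
        openRegions.put("t1,a", "rs1");
        openRegions.put("t2,a", "rs2");
        List<String> enablingTableRegions = Arrays.asList("t1,a", "t1,b");

        // prints "assigned twice: [t1,a]"
        System.out.println("assigned twice: "
            + reassignedWhileOpen(openRegions, enablingTableRegions));
    }
}
```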


[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414189#comment-13414189
 ] 

Zhihong Ted Yu commented on HBASE-6394:
---

+1 on patch v2.

 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 6394-trunk.patch, 6394-trunk_v2.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}


[jira] [Comment Edited] (HBASE-6391) Master restart when enabling table will lead to region assignned twice

[
https://issues.apache.org/jira/browse/HBASE-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414186#comment-13414186
]

Lars Hofhansl edited comment on HBASE-6391 at 7/14/12 12:24 AM:

I think this could be closed as DUP as well.
Moving to 0.94.2 for now.

was (Author: lhofhansl):
I think this could be closed to DUP as well.
Moving to 0.94.2 for now.

Master restart when enabling table will lead to region assignned twice
--

Key: HBASE-6391
URL: https://issues.apache.org/jira/browse/HBASE-6391
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0
Reporter: zhou wenjian
Fix For: 0.94.2

[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414201#comment-13414201
 ] 

Lars Hofhansl commented on HBASE-6389:
--

Ran TestHLogRecordReader locally. Passes fine (I did not expect that to be 
related to this patch).


 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, Master will proceed immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been 
 reached.
 Reading the current conditions of waitForRegionServers() clarifies it:
 {code:title=ServerManager.java (trunk rev:1360470)}
 /**
  * Wait for the region servers to report in.
  * We will wait until one of this condition is met:
  *  - the master is stopped
  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
  *    region servers is reached
  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
  *    there have been no new region server in for
  *    'hbase.master.wait.on.regionservers.interval' time
  *
  * @throws InterruptedException
  */
 public void waitForRegionServers(MonitoredTask status)
 throws InterruptedException {
 ..
   while (
     !this.master.isStopped() &&
       slept < timeout &&
       count < maxToStart &&
       (lastCountChange+interval > now || count < minToStart)
     ){
 ..
 {code}
 So with the current conditions, the wait ends as soon as the timeout is 
 reached, even if fewer RSes than mintostart have checked in with the Master, 
 and the master proceeds with region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, 
 especially now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
    * Wait for the region servers to report in.
    * We will wait until one of this condition is met:
    *  - the master is stopped
    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
    *    region servers is reached
    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
    *    there have been no new region server in for
    *    'hbase.master.wait.on.regionservers.interval' time AND
    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
    *
    * @throws InterruptedException
    */
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
     int minToStart = this.master.getConfiguration().
         getInt("hbase.master.wait.on.regionservers.mintostart", 1);
     int maxToStart = this.master.getConfiguration().
         getInt("hbase.master.wait.on.regionservers.maxtostart",
             Integer.MAX_VALUE);
     if (maxToStart < minToStart) {
       maxToStart = minToStart;
     }
 ..
 ..
     while (
       !this.master.isStopped() &&
         count < maxToStart &&
         (lastCountChange+interval > now || timeout > slept || count < minToStart)
       ){
 ..
 {code}
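
To see the difference between the two loop conditions in isolation, here is a minimal standalone sketch. It is not the ServerManager code: the class name, method names, and sample values are made up for illustration, and the two predicates are copied from the conditions quoted in this description.

```java
/** Hypothetical, simplified model of the two wait-loop conditions discussed above. */
public class WaitConditionSketch {

  /** Condition quoted from trunk: reaching the timeout alone ends the wait. */
  public static boolean keepWaitingBefore(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the wait cannot end while count is below minToStart. */
  public static boolean keepWaitingAfter(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Illustrative values only: the timeout has lapsed and just 1 of the
    // 3 required region servers has checked in.
    long slept = 5000, timeout = 4500, interval = 1500, lastCountChange = 0, now = 5000;
    int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;

    System.out.println("keep waiting (before): " + keepWaitingBefore(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    System.out.println("keep waiting (after):  " + keepWaitingAfter(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
  }
}
```

With these sample values the first predicate stops waiting once the timeout passes, while the second keeps waiting because count is still below minToStart.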


[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException

[
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414211#comment-13414211
]

Hadoop QA commented on HBASE-6394:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12536479/6394-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2385//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2385//console

This message is automatically generated.

verifyrep MR job map tasks throws NullPointerException
---

Key: HBASE-6394
URL: https://issues.apache.org/jira/browse/HBASE-6394
Project: HBase
Issue Type: Bug
Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Attachments: 6394-trunk.patch, 6394-trunk_v2.patch

{noformat}
2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.NullPointerException
    at org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{noformat}
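
A trace like this, with the NullPointerException raised inside cleanup(), is what one would expect if cleanup() dereferences a field that is only set in map() and the task never received a row. That is an assumption about the cause, not something the trace itself proves. A minimal sketch of a null-safe cleanup under that assumption follows. The Verifier class here is a hypothetical stand-in, not the VerifyReplication code or the attached patch.

```java
/** Hypothetical illustration of a null-safe cleanup for a lazily opened resource. */
public class NullSafeCleanupSketch {

  /** Stand-in for a mapper that opens a resource on the first map() call. */
  public static class Verifier {
    private AutoCloseable scanner;  // stays null if map() is never called
    public int closed = 0;          // counts close() calls, for demonstration only

    public void map() {
      if (scanner == null) {
        scanner = () -> closed++;   // pretend to open a scanner
      }
    }

    public void cleanup() throws Exception {
      // Guard against the case where map() never ran for this task.
      if (scanner != null) {
        scanner.close();
        scanner = null;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Verifier empty = new Verifier();
    empty.cleanup();                // no rows mapped: must not throw

    Verifier used = new Verifier();
    used.map();
    used.cleanup();
    System.out.println("closed=" + used.closed);
  }
}
```

Setting the field back to null after closing also makes a second cleanup() call harmless.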

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6389:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96



[jira] [Commented] (HBASE-6380) bulkload should update the store.storeSize


[ 
https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414242#comment-13414242
 ] 

Hudson commented on HBASE-6380:
---

Integrated in HBase-0.94-security #41 (See 
[https://builds.apache.org/job/HBase-0.94-security/41/])
HBASE-6380 bulkload should update the store.storeSize (Revision 1361204)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 bulkload should update the store.storeSize
 --

 Key: HBASE-6380
 URL: https://issues.apache.org/jira/browse/HBASE-6380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0, 0.96.0
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: 6380-trunk.txt, 6380-trunk.txt, hbase-6380_0_94_0.patch


 After bulkloading some HFiles into the Table, we found the force-split didn't 
 work because of the MidKey == NULL. Only if we re-booted the HBase service, 
 the force-split can work normally. 


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414241#comment-13414241
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-0.94-security #41 (See 
[https://builds.apache.org/job/HBase-0.94-security/41/])
HBASE-6389 Modify the conditions to ensure that Master waits for sufficient 
number of Region Servers before starting region assignments (Aditya Kishore) 
(Revision 1361458)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java



[jira] [Commented] (HBASE-6384) hbck should group together those sidelined regions need to be bulk loaded later


[ 
https://issues.apache.org/jira/browse/HBASE-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414243#comment-13414243
 ] 

Hudson commented on HBASE-6384:
---

Integrated in HBase-0.94-security #41 (See 
[https://builds.apache.org/job/HBase-0.94-security/41/])
HBASE-6384 hbck should group together those sidelined regions need to be 
bulk loaded later (Revision 1361036)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


 hbck should group together those sidelined regions need to be bulk loaded 
 later
 ---

 Key: HBASE-6384
 URL: https://issues.apache.org/jira/browse/HBASE-6384
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 6384-trunk.patch


 Currently, hbck sidelines some regions to break big overlap groups to avoid 
 possible compaction and region split.  These sidelined regions should be
 bulk loaded back later.  Information about these regions is in the output.
 It will be much easier to group them together under the same sideline rootdir,
 for example, /hbase/.hbck/to_be_loaded/.  If so, even we lose the output
 file, we still know what regions to load back.


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414250#comment-13414250
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-0.94 #316 (See 
[https://builds.apache.org/job/HBase-0.94/316/])
HBASE-6389 Modify the conditions to ensure that Master waits for sufficient 
number of Region Servers before starting region assignments (Aditya Kishore) 
(Revision 1361458)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java



[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414265#comment-13414265
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-TRUNK #3126 (See 
[https://builds.apache.org/job/HBase-TRUNK/3126/])
HBASE-6389 Modify the conditions to ensure that Master waits for sufficient 
number of Region Servers before starting region assignments (Aditya Kishore) 
(Revision 1361456)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java



[jira] [Updated] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


 [ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6394:
---

   Resolution: Fixed
Fix Version/s: 0.94.1
   0.96.0
   0.92.2
   Status: Resolved  (was: Patch Available)

Integrated to 0.92, 0.94 and 0.96. Thanks Ted for the review.



[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException

[
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414274#comment-13414274
]

Hadoop QA commented on HBASE-6394:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12536484/6394-trunk_v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2386//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2386//console

This message is automatically generated.

verifyrep MR job map tasks throws NullPointerException
---

Attachments: 6394-trunk.patch, 6394-trunk_v2.patch

[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414282#comment-13414282
 ] 

Hudson commented on HBASE-6394:
---

Integrated in HBase-0.94 #318 (See 
[https://builds.apache.org/job/HBase-0.94/318/])
HBASE-6394 verifyrep MR job map tasks throws NullPointerException (Revision 
1361470)

 Result = ABORTED
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java


 verifyrep MR job map tasks throws NullPointerException 
 ---

 Key: HBASE-6394
 URL: https://issues.apache.org/jira/browse/HBASE-6394
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6394-trunk.patch, 6394-trunk_v2.patch


 {noformat}
 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {noformat}
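
 The trace shows only that the NullPointerException is raised inside Verifier.cleanup(); the root cause is not stated in this message. One common way for this to happen is that cleanup() dereferences a field that is initialized lazily in map(), which stays null for a task that never processes a row. The sketch below illustrates a null guard under that assumption. LazyMapper and its resource field are hypothetical names and are not part of the HBase or Hadoop API.

```java
import java.io.Closeable;
import java.io.IOException;

class CleanupGuardSketch {
    // Hypothetical stand-in for a mapper that opens a resource lazily in map().
    static class LazyMapper {
        Closeable resource;      // stays null if map() is never called
        boolean cleaned = false;

        void map() {
            if (resource == null) {
                resource = () -> { };  // placeholder for opening a real resource
            }
        }

        void cleanup() throws IOException {
            // Guard: only close what was actually opened.
            if (resource != null) {
                resource.close();
            }
            cleaned = true;
        }
    }

    public static void main(String[] args) throws IOException {
        LazyMapper mapper = new LazyMapper();
        mapper.cleanup();  // map() never ran, yet cleanup() must not throw
        System.out.println("cleanup completed: " + mapper.cleaned);
    }
}
```

 Without the null check, the cleanup() call in main would fail in the same way as the trace above whenever map() had not run first.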


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments


[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414283#comment-13414283
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #94 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/94/])
HBASE-6389 Modify the conditions to ensure that Master waits for sufficient 
number of Region Servers before starting region assignments (Aditya Kishore) 
(Revision 1361456)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java


 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master proceeds as soon as the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not 
 been reached.
 The current conditions in waitForRegionServers() make this clear:
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 /**
  * Wait for the region servers to report in.
  * We will wait until one of this condition is met:
  *  - the master is stopped
  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
  *    region servers is reached
  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
  *    there have been no new region server in for
  *    'hbase.master.wait.on.regionservers.interval' time
  *
  * @throws InterruptedException
  */
 public void waitForRegionServers(MonitoredTask status)
 throws InterruptedException {
 ..
 ..
   while (
     !this.master.isStopped() &&
       slept < timeout &&
       count < maxToStart &&
       (lastCountChange+interval > now || count < minToStart)
     ){
 
 {code}
 So with the current conditions, the wait ends as soon as the timeout is 
 reached, even if fewer RSes than required have checked in with the Master, 
 and the Master then proceeds with region assignment among those RSes alone.
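
 The effect of the timeout can be checked in isolation with a small sketch. The first predicate mirrors the loop condition quoted above. The second is an assumption: it is derived from the revised javadoc proposed in this issue (the timeout must also be reached before the wait may end), not copied from the patch. The class and method names are hypothetical.

```java
class WaitConditionSketch {
    /** Loop condition as quoted from ServerManager: true means keep waiting. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Assumed stricter variant: the timeout alone can no longer end the wait. */
    static boolean keepWaitingStrict(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now
                || slept < timeout
                || count < minToStart);
    }

    public static void main(String[] args) {
        // One of three required region servers has checked in; timeout just elapsed.
        boolean current = keepWaitingCurrent(false, 4500, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 4500);
        boolean strict = keepWaitingStrict(false, 4500, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 4500);
        System.out.println("current keeps waiting: " + current);
        System.out.println("strict keeps waiting: " + strict);
    }
}
```

 With these inputs the current condition stops waiting although only one of the three required servers is present, while the stricter variant keeps waiting until count reaches minToStart.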
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, 
 especially now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
    * Wait for the region servers to report in.
    * We will wait until one of this condition is met:
    *  - the master is stopped
    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
    *    region servers is reached
    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
    *    there have been no new region server in for
    *    'hbase.master.wait.on.regionservers.interval' time AND
    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
    *
    * @throws InterruptedException
    */
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
 int minToStart = this.master.getConfiguration().
     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
 int maxToStart = this.master.getConfiguration().
     getInt("hbase.master.wait.on.regionservers.maxtostart",
     Integer.MAX_VALUE);
 if (maxToStart < minToStart) {
   maxToStart = minToStart;
 }
 ..
 ..
 while (
   !this.master.isStopped() &&
     count < maxToStart &&
     (lastCountChange+interval > now || timeout

[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414284#comment-13414284
 ] 

Hudson commented on HBASE-6394:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #94 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/94/])
HBASE-6394 verifyrep MR job map tasks throws NullPointerException (Revision 
1361469)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java




[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414288#comment-13414288
 ] 

Hudson commented on HBASE-6394:
---

Integrated in HBase-0.94-security #42 (See 
[https://builds.apache.org/job/HBase-0.94-security/42/])
HBASE-6394 verifyrep MR job map tasks throws NullPointerException (Revision 
1361470)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java




[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException


[ 
https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414289#comment-13414289
 ] 

Hudson commented on HBASE-6394:
---

Integrated in HBase-TRUNK #3127 (See 
[https://builds.apache.org/job/HBase-TRUNK/3127/])
HBASE-6394 verifyrep MR job map tasks throws NullPointerException (Revision 
1361469)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java




[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException