date:20111227


 [ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5073:
--

Fix Version/s: (was: 0.90.5)
   0.90.6
 Hadoop Flags: Reviewed

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5073.patch


 HBaseAdmin apis like tableExists(), flush, split, closeRegion uses catalog 
 tracker.  Every time Root node tracker and meta node tracker are started and 
 a listener is registered.  But after the operations are performed the 
 listeners are not getting removed. Hence if the admin apis are consistently 
 used then it may lead to memory leak.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin


[ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176231#comment-13176231
 ] 

Zhihong Yu commented on HBASE-5073:
---

Integrated to 0.90 branch

Thanks for the patch, Ramkrishna.

Thanks for the review, Stack and Lars

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5073.patch


 HBaseAdmin apis like tableExists(), flush, split, closeRegion uses catalog 
 tracker.  Every time Root node tracker and meta node tracker are started and 
 a listener is registered.  But after the operations are performed the 
 listeners are not getting removed. Hence if the admin apis are consistently 
 used then it may lead to memory leak.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server


 [ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4720:
--

Attachment: (was: HBASE-4720.trunk.v2.patch)

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, 
 HBASE-4720.v1.patch, HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


 [ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5094:
--

Priority: Critical  (was: Major)

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical

 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4009) Script to patch holes in .META. table

2011-12-27 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176243#comment-13176243
 ] 

Todd Lipcon commented on HBASE-4009:


I don't htink we should be committing any more rb scripts like this to the 
source code - they just get people into trouble, fall out of date, etc. Instead 
this should be an hbck feature that automatically patches holes in a table if 
requested

 Script to patch holes in .META. table
 -

 Key: HBASE-4009
 URL: https://issues.apache.org/jira/browse/HBASE-4009
 Project: HBase
  Issue Type: New Feature
  Components: shell
 Environment: CDH3U0
Reporter: Lars George
Priority: Trivial
 Attachments: add_empty_region.rb, patch_meta.rb


 I need a script to patch holes in the .META. table, which was corrupted by 
 earlier issue on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3274) Replace all config properties references in code with string constants


[ 
https://issues.apache.org/jira/browse/HBASE-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176276#comment-13176276
 ] 

Lars Hofhansl commented on HBASE-3274:
--

Big +1.

I have actually reviewed some recent patches that add config names as string. 
Sorry I did not point that out as defect.
Also many people turn to hbase-default.xml, but many (new and old) configs are 
not in there.

 Replace all config properties references in code with string constants
 --

 Key: HBASE-3274
 URL: https://issues.apache.org/jira/browse/HBASE-3274
 Project: HBase
  Issue Type: Improvement
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
   Original Estimate: 168h
  Remaining Estimate: 168h

 See HBASE-2721 for details. We have fixed the default values in HBASE-3272 
 but we should also follow Hadoop to remove all hardcoded strings that refer 
 to configuration properties and move them to HConstants. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3924) Improve Shell's CLI help


[ 
https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176281#comment-13176281
 ] 

Lars Hofhansl commented on HBASE-3924:
--

It's still a bit confusing I think.

How about something like?

{code}
Usage: HBase shell [OPTIONS] [SCRIPT [ARGUMENTS]]

 --format=OPTIONFormatter for outputting results.
Valid options are: console, html.
(Default: console)

 -d | --debug   Set DEBUG log levels.
 -h | --helpThis help.
{code}


 Improve Shell's CLI help
 

 Key: HBASE-3924
 URL: https://issues.apache.org/jira/browse/HBASE-3924
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-3924.patch


 In the hirb.rb source we have
 {noformat}
 # so they don't go through to irb.  Output shell 'usage' if user types 
 '--help'
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  formatFormatter for outputting results: console | html.
 Default: console
  -d | --debug  Set DEBUG log levels.
 HERE
 found = []
 format = 'console'
 script2run = nil
 log_level = org.apache.log4j.Level::ERROR
 for arg in ARGV
  if arg =~ /^--format=(.+)/i
format = $1
if format =~ /^html$/i
  raise NoMethodError.new(Not yet implemented)
elsif format =~ /^console$/i
  # This is default
else
  raise ArgumentError.new(Unsupported format  + arg)
end
found.push(arg)
  elsif arg == '-h' || arg == '--help'
puts cmdline_help
exit
  elsif arg == '-d' || arg == '--debug'
log_level = org.apache.log4j.Level::DEBUG
$fullBackTrace = true
puts Setting DEBUG log level...
  else
# Presume it a script. Save it off for running later below
# after we've set up some environment.
script2run = arg
found.push(arg)
# Presume that any other args are meant for the script.
break
  end
 end
 {noformat}
 We should enhance the help printed when using -h/--help to look like this?
 {noformat}
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  --format={console|html}Formatter for outputting results.
 Default: console
  -d | --debug  Set DEBUG log levels.
  -h | --help   This help.
  script-filename [script-options]
 HERE
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5097) Coprocessor RegionObserver implementation without preScannerOpen and postScannerOpen Impl is throwing NPE and so failing the system initialization.

2011-12-27 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176287#comment-13176287
 ] 

Andrew Purtell commented on HBASE-5097:
---

Wouldn't hurt. 

 Coprocessor RegionObserver implementation without preScannerOpen and 
 postScannerOpen Impl is throwing NPE and so failing the system initialization.
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5097) Coprocessor RegionObserver implementation without preScannerOpen and postScannerOpen Impl is throwing NPE and so failing the system initialization.


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176296#comment-13176296
 ] 

Lars Hofhansl commented on HBASE-5097:
--

Not a big fan of catching NPEs. We can add a null check somewhere in the code, 
but catching NPEs is bad design (IMHO).

@Ram: How did you actually do this? You need to implement RegionObserver, so 
you cannot compile unless you provide an implementation of postScannerOpen. Or 
did you class not even implement RegionObserver?


 Coprocessor RegionObserver implementation without preScannerOpen and 
 postScannerOpen Impl is throwing NPE and so failing the system initialization.
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5098) Update hbase-default.xml to reflect the state of HConstants once its the center.

2011-12-27 Thread Harsh J (Created) (JIRA)

Update hbase-default.xml to reflect the state of HConstants once its the center.


 Key: HBASE-5098
 URL: https://issues.apache.org/jira/browse/HBASE-5098
 Project: HBase
  Issue Type: Sub-task
Reporter: Harsh J
Priority: Trivial


Once the parent task is done, we should be easily able to add to and update 
hbase-default.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3924) Improve Shell's CLI help

2011-12-27 Thread Harsh J (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176299#comment-13176299
 ] 

Harsh J commented on HBASE-3924:


+1 on that. It does look much more readable.

Nit: I'd ditch the 'HBase' though, seems unnecessary. Or maybe use 'hbase' to 
avoid confusion about what it means there.

 Improve Shell's CLI help
 

 Key: HBASE-3924
 URL: https://issues.apache.org/jira/browse/HBASE-3924
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-3924.patch


 In the hirb.rb source we have
 {noformat}
 # so they don't go through to irb.  Output shell 'usage' if user types 
 '--help'
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  formatFormatter for outputting results: console | html.
 Default: console
  -d | --debug  Set DEBUG log levels.
 HERE
 found = []
 format = 'console'
 script2run = nil
 log_level = org.apache.log4j.Level::ERROR
 for arg in ARGV
  if arg =~ /^--format=(.+)/i
format = $1
if format =~ /^html$/i
  raise NoMethodError.new(Not yet implemented)
elsif format =~ /^console$/i
  # This is default
else
  raise ArgumentError.new(Unsupported format  + arg)
end
found.push(arg)
  elsif arg == '-h' || arg == '--help'
puts cmdline_help
exit
  elsif arg == '-d' || arg == '--debug'
log_level = org.apache.log4j.Level::DEBUG
$fullBackTrace = true
puts Setting DEBUG log level...
  else
# Presume it a script. Save it off for running later below
# after we've set up some environment.
script2run = arg
found.push(arg)
# Presume that any other args are meant for the script.
break
  end
 end
 {noformat}
 We should enhance the help printed when using -h/--help to look like this?
 {noformat}
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  --format={console|html}Formatter for outputting results.
 Default: console
  -d | --debug  Set DEBUG log levels.
  -h | --help   This help.
  script-filename [script-options]
 HERE
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5061) StoreFileLocalityChecker


[ 
https://issues.apache.org/jira/browse/HBASE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176311#comment-13176311
 ] 

Lars Hofhansl commented on HBASE-5061:
--

From looking at the code, this would produce data locality with respect to 
certain host (if I understand this correctly).
Wouldn't we want to report locality from the viewpoint the respective 
regionserver(s)?

I.e. regions for a table might be on many regionservers, but for each the data 
might be local to the regionserver.


 StoreFileLocalityChecker
 

 Key: HBASE-5061
 URL: https://issues.apache.org/jira/browse/HBASE-5061
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Attachments: StoreFileLocalityChecker.java


 org.apache.hadoop.hbase.HFileLocalityChecker [options]
 A tool to report the number of local and nonlocal HFile blocks, and the ratio 
 of as a percentage.
 Where options are:
 |-f file|Analyze a store file|
 |-r region|Analyze all store files for the region|
 |-t table|Analyze all store files for regions of the table served by the 
 local regionserver|
 |-h host|Consider host local, defaults to the local host|
 |-v|Verbose operation|

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5061) StoreFileLocalityChecker

2011-12-27 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176322#comment-13176322
 ] 

Andrew Purtell commented on HBASE-5061:
---

bq. From looking at the code, this would produce data locality with respect to 
certain host (if I understand this correctly).

Yes this is the intent, to report locality to a given regionserver host, by 
default the local host as determined by a HBase style reverse lookup, assuming 
ops would run it either on an ops box (or the master) and supply the desired 
local host name via the '-h' option, or local to the RS.

I was thinking one use case could be to iterate over each cluster node and 
trigger major compactions for region(s) depending on the output of this tool. 

Of you are looking for a whole cluster report?


 StoreFileLocalityChecker
 

 Key: HBASE-5061
 URL: https://issues.apache.org/jira/browse/HBASE-5061
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Attachments: StoreFileLocalityChecker.java


 org.apache.hadoop.hbase.HFileLocalityChecker [options]
 A tool to report the number of local and nonlocal HFile blocks, and the ratio 
 of as a percentage.
 Where options are:
 |-f file|Analyze a store file|
 |-r region|Analyze all store files for the region|
 |-t table|Analyze all store files for regions of the table served by the 
 local regionserver|
 |-h host|Consider host local, defaults to the local host|
 |-v|Verbose operation|

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5061) StoreFileLocalityChecker

2011-12-27 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176326#comment-13176326
 ] 

Andrew Purtell commented on HBASE-5061:
---

So, currently this tool is meant to check the locality of files or regions to a 
given single host, however it could be modified such that if the '-h' option is 
supplied then the report looks only at the given host, otherwise it will 
enumerate the live hosts in ClusterStatus and return results for all. While at 
it, a new option '-j' for reporting in JSON.

Is this more what you have in mind Lars?

 StoreFileLocalityChecker
 

 Key: HBASE-5061
 URL: https://issues.apache.org/jira/browse/HBASE-5061
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Attachments: StoreFileLocalityChecker.java


 org.apache.hadoop.hbase.HFileLocalityChecker [options]
 A tool to report the number of local and nonlocal HFile blocks, and the ratio 
 of as a percentage.
 Where options are:
 |-f file|Analyze a store file|
 |-r region|Analyze all store files for the region|
 |-t table|Analyze all store files for regions of the table served by the 
 local regionserver|
 |-h host|Consider host local, defaults to the local host|
 |-v|Verbose operation|

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5061) StoreFileLocalityChecker


[ 
https://issues.apache.org/jira/browse/HBASE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176332#comment-13176332
 ] 

Lars Hofhansl commented on HBASE-5061:
--

That's what I had envisioned for the -t option at least.
Would be nice to have a report on all regions of a table.
But maybe that would not be useful as the number of region servers grows...?

+1 on JSON output.


 StoreFileLocalityChecker
 

 Key: HBASE-5061
 URL: https://issues.apache.org/jira/browse/HBASE-5061
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Attachments: StoreFileLocalityChecker.java


 org.apache.hadoop.hbase.HFileLocalityChecker [options]
 A tool to report the number of local and nonlocal HFile blocks, and the ratio 
 of as a percentage.
 Where options are:
 |-f file|Analyze a store file|
 |-r region|Analyze all store files for the region|
 |-t table|Analyze all store files for regions of the table served by the 
 local regionserver|
 |-h host|Consider host local, defaults to the local host|
 |-v|Verbose operation|

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3924) Improve Shell's CLI help

2011-12-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176334#comment-13176334
 ] 

Lars Hofhansl commented on HBASE-3924:
--

+1 on ditching HBase and just say  shell.
And should it say SCRIPT or SCRIPTFILE?

 Improve Shell's CLI help
 

 Key: HBASE-3924
 URL: https://issues.apache.org/jira/browse/HBASE-3924
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-3924.patch


 In the hirb.rb source we have
 {noformat}
 # so they don't go through to irb.  Output shell 'usage' if user types 
 '--help'
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  formatFormatter for outputting results: console | html.
 Default: console
  -d | --debug  Set DEBUG log levels.
 HERE
 found = []
 format = 'console'
 script2run = nil
 log_level = org.apache.log4j.Level::ERROR
 for arg in ARGV
  if arg =~ /^--format=(.+)/i
format = $1
if format =~ /^html$/i
  raise NoMethodError.new(Not yet implemented)
elsif format =~ /^console$/i
  # This is default
else
  raise ArgumentError.new(Unsupported format  + arg)
end
found.push(arg)
  elsif arg == '-h' || arg == '--help'
puts cmdline_help
exit
  elsif arg == '-d' || arg == '--debug'
log_level = org.apache.log4j.Level::DEBUG
$fullBackTrace = true
puts Setting DEBUG log level...
  else
# Presume it a script. Save it off for running later below
# after we've set up some environment.
script2run = arg
found.push(arg)
# Presume that any other args are meant for the script.
break
  end
 end
 {noformat}
 We should enhance the help printed when using -h/--help to look like this?
 {noformat}
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  --format={console|html}Formatter for outputting results.
 Default: console
  -d | --debug  Set DEBUG log levels.
  -h | --help   This help.
  script-filename [script-options]
 HERE
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3924) Improve Shell's CLI help

2011-12-27 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176353#comment-13176353
 ] 

stack commented on HBASE-3924:
--

+1 on patch.  SCRIPTFILE sounds better, yeah, and *HBase* shell is superfluous. 
 I don't think formatter other than console works but thats for another issue.

 Improve Shell's CLI help
 

 Key: HBASE-3924
 URL: https://issues.apache.org/jira/browse/HBASE-3924
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-3924.patch


 In the hirb.rb source we have
 {noformat}
 # so they don't go through to irb.  Output shell 'usage' if user types 
 '--help'
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  formatFormatter for outputting results: console | html.
 Default: console
  -d | --debug  Set DEBUG log levels.
 HERE
 found = []
 format = 'console'
 script2run = nil
 log_level = org.apache.log4j.Level::ERROR
 for arg in ARGV
  if arg =~ /^--format=(.+)/i
format = $1
if format =~ /^html$/i
  raise NoMethodError.new(Not yet implemented)
elsif format =~ /^console$/i
  # This is default
else
  raise ArgumentError.new(Unsupported format  + arg)
end
found.push(arg)
  elsif arg == '-h' || arg == '--help'
puts cmdline_help
exit
  elsif arg == '-d' || arg == '--debug'
log_level = org.apache.log4j.Level::DEBUG
$fullBackTrace = true
puts Setting DEBUG log level...
  else
# Presume it a script. Save it off for running later below
# after we've set up some environment.
script2run = arg
found.push(arg)
# Presume that any other args are meant for the script.
break
  end
 end
 {noformat}
 We should enhance the help printed when using -h/--help to look like this?
 {noformat}
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  --format={console|html}Formatter for outputting results.
 Default: console
  -d | --debug  Set DEBUG log levels.
  -h | --help   This help.
  script-filename [script-options]
 HERE
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5093) wiki update for HBase/Scala

2011-12-27 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176355#comment-13176355
 ] 

stack commented on HBASE-5093:
--

Do you have a 'wiki' name Joe?  I'll add you to the list of contribs (Ours and 
the hadoop wikis have been spammed bad for years and a recent change has it 
that folks need to ask for perms to edit).  Thanks for doing this ska la doc.

 wiki update for HBase/Scala
 ---

 Key: HBASE-5093
 URL: https://issues.apache.org/jira/browse/HBASE-5093
 Project: HBase
  Issue Type: Improvement
Reporter: Joe Stein

 I tried to edit the wiki but it says immutable page
 would be helpful/nice for folks to know how to get sbt working with Scala
 the following is what I did to get it working, not sure why could not edit 
 the wiki figure i open a JIRA so someone with access could update this
 {code}
 resolvers += Apache HBase at 
 https://repository.apache.org/content/repositories/releases;
 resolvers += Thrift at http://people.apache.org/~rawson/repo/;
 libraryDependencies ++= Seq(
 org.apache.hadoop % hadoop-core % 0.20.2,
 org.apache.hbase % hbase % 0.90.4
 )
 {code}
 or let me access it and I can do it, np

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-27 Thread Jimmy Xiang (Created) (JIRA)

ZK event thread waiting for root region while server shutdown handler waiting 
for event thread to finish distributed log splitting to recover the region 
sever the root region is on


 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang


A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
SpliLogManager
installed the tasks asynchronously, then started to wait for them to complete.

The task znodes were not created actually.  The requests were just queued.
At this time, the zookeeper connection expired.  HMaster tried to recover the 
expired ZK session.
During the recovery, a new zookeeper connection was created.  However, this 
master became the
new master again.  It tried to assign root and meta.

Because the dead RS got the old root region, the master needs to wait for the 
log splitting to complete.
This waiting holds the zookeeper event thread.  So the async create split task 
is never retried since
there is only one event thread, which is waiting for the root region assigned.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-27 Thread Jimmy Xiang (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jimmy Xiang updated HBASE-5099:
---

Attachment: ZK-event-thread-waiting-for-root.png
distributed-log-splitting-hangs.png

ZK event thread waiting for root region while server shutdown handler waiting
for event thread to finish distributed log splitting to recover the region
sever the root region is on

Key: HBASE-5099
URL: https://issues.apache.org/jira/browse/HBASE-5099
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Attachments: ZK-event-thread-waiting-for-root.png,
distributed-log-splitting-hangs.png

A RS died. The ServerShutdownHandler kicked in and started the logspliting.
SpliLogManager
installed the tasks asynchronously, then started to wait for them to complete.
The task znodes were not created actually. The requests were just queued.
At this time, the zookeeper connection expired. HMaster tried to recover the
expired ZK session.
During the recovery, a new zookeeper connection was created. However, this
master became the
new master again. It tried to assign root and meta.
Because the dead RS got the old root region, the master needs to wait for the
log splitting to complete.
This waiting holds the zookeeper event thread. So the async create split
task is never retried since
there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


 [ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5094:
--

Attachment: 5094.patch

Patch for trunk

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: 5094.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5097) Coprocessor RegionObserver implementation without preScannerOpen and postScannerOpen Impl is throwing NPE and so failing the system initialization.

2011-12-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176402#comment-13176402
 ] 

ramkrishna.s.vasudevan commented on HBASE-5097:
---

@Lars

I implemented RegionObserver and not BaseRegionObserver.  In BaseRegionObserver 
we return a regionscanner so it will not cause any problem.  But if we 
implement RegionObserver then default will be null value.  That is the problem. 
 

 Coprocessor RegionObserver implementation without preScannerOpen and 
 postScannerOpen Impl is throwing NPE and so failing the system initialization.
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


 [ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5094:
--

Status: Patch Available  (was: Open)

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: 5094.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread ramkrishna.s.vasudevan (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176406#comment-13176406
]

Zhihong Yu commented on HBASE-5099:
---

bq. Another one is to abort the master
The above sounds better. How about introducing a timeout for:
{code}
this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
{code}
after which master aborts ?

ZK event thread waiting for root region while server shutdown handler waiting
for event thread to finish distributed log splitting to recover the region
sever the root region is on

[jira] [Resolved] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin


 [ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-5073.
---

Resolution: Fixed

Committed to branch hence resolving.

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5073.patch


 HBaseAdmin apis like tableExists(), flush, split, closeRegion uses catalog 
 tracker.  Every time Root node tracker and meta node tracker are started and 
 a listener is registered.  But after the operations are performed the 
 listeners are not getting removed. Hence if the admin apis are consistently 
 used then it may lead to memory leak.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5009:
--

Status: Open  (was: Patch Available)

Cancelling and resubmitting as the tests hanged

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5009:
--

Fix Version/s: 0.90.6
   0.92.1
Affects Version/s: (was: 0.90.6)
   Status: Patch Available  (was: Open)

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread Jimmy Xiang (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176430#comment-13176430
]

Jimmy Xiang commented on HBASE-5099:

This is good. If introducing a timeout, I prefer to do it for
tryRecoveringExpiredZKSession().
The reason for that is, other than waitForAssignment, there are several other
places which have the waiting logic as well,
such as bulkAssign(), waitForRoot(),
this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus), etc.

ZK event thread waiting for root region while server shutdown handler waiting
for event thread to finish distributed log splitting to recover the region
sever the root region is on

[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further


[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176433#comment-13176433
 ] 

Zhihong Yu commented on HBASE-5009:
---

Pressing 'Submit' again wouldn't run the tests.
This is because Hadoop QA remembers the attachment Id of the patch for which 
test suite was executed.

Please attach the same copy of patch for TRUNK again.

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further


 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5009:
--

Attachment: 5009.txt

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: 5009.txt, HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further


 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5009:
--

Status: Open  (was: Patch Available)

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: 5009.txt, HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further


 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5009:
--

Status: Patch Available  (was: Open)

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: 5009.txt, HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5097) Coprocessor RegionObserver implementation without preScannerOpen and postScannerOpen Impl is throwing NPE and so failing the system initialization.


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176440#comment-13176440
 ] 

Lars Hofhansl commented on HBASE-5097:
--

Hmmm. There is no default implementation for an interface. Are you referring to 
the default implementation generated for you by eclipse?
I just don't think this is a bug.




 Coprocessor RegionObserver implementation without preScannerOpen and 
 postScannerOpen Impl is throwing NPE and so failing the system initialization.
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread Jimmy Xiang (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176447#comment-13176447
]

Jimmy Xiang commented on HBASE-5099:

tryRecoveringExpiredZKSession() is only called by abortNow(), which is called
by abort(), which is called by the
eventThread. I was thinking to put this whole method in another thread with
executor service and time it out after a certain time, for example, 5 minutes,
then fails the recovery and let it abort.

This way, we don't have to adding timeout for all the methods. The regular
master startup is not impacted which calls assignRootAndMeta() too.

However, if we know most likely just waitForAssignment() takes a long time, we
can add timeout to this method only. But I am not so sure.

ZK event thread waiting for root region while server shutdown handler waiting
for event thread to finish distributed log splitting to recover the region
sever the root region is on

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176454#comment-13176454
]

Zhihong Yu commented on HBASE-5099:
---

Timing out tryRecoveringExpiredZKSession() is good.
We can introduce a separate thread to carry the current logic of
tryRecoveringExpiredZKSession() and monitor this thread in eventThread.

ZK event thread waiting for root region while server shutdown handler waiting
for event thread to finish distributed log splitting to recover the region
sever the root region is on

[jira] [Commented] (HBASE-5097) Coprocessor RegionObserver implementation without preScannerOpen and postScannerOpen Impl is throwing NPE and so failing the system initialization.


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176456#comment-13176456
 ] 

ramkrishna.s.vasudevan commented on HBASE-5097:
---

@Lars..

Ok Lars..I too accept it..

 Coprocessor RegionObserver implementation without preScannerOpen and 
 postScannerOpen Impl is throwing NPE and so failing the system initialization.
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176458#comment-13176458
 ] 

Zhihong Yu commented on HBASE-5094:
---

+1 on patch.

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: 5094.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4529) Make WAL Pluggable

2011-12-27 Thread dhruba borthakur (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176459#comment-13176459
]

dhruba borthakur commented on HBASE-4529:
-

This will be a great feature!

HDFS provides io-fencing for HBase log. When a regionserver dies, the master
renames the log directory so that the oldregionserver cannot continue to write
any new transactions to the transaction log. (HDFS provides an api that fails a
file-creation-request if an intermediate path component is non-existant). It
will be nice to get the pluggable-wal API support this type of operation.

Make WAL Pluggable
---

Key: HBASE-4529
URL: https://issues.apache.org/jira/browse/HBASE-4529
Project: HBase
Issue Type: Improvement
Components: regionserver, wal
Reporter: Akash Ashok
Assignee: Akash Ashok
Labels: regionserver, wal

Make WAL a pluggable, configurable component, thus making it easier to write
to different filesystems (including multiple filesystems).
From Stack:
Pluggable WAL component would need to check that the split can deal w/
multiple logs written by the one server concurrently (sort by sequence
edit id after sorting on all the rest that makes up a wal log key).
From Jesse Yates:
It would be nice to be able to tie pluggable WAL component into a service
that logs directly to
disk, rather than go through HDFS giving some potentially awesome speedup at
the cost of having to write a logging service that handles replication, etc.
From Karthik Tunga:
Along with the log replaying part, logic is also needed for log roll.
This, I think, is easier compared to the merging of the logs. Any edits less
than the last sequence number on the file system can be removed from all
the WALs.

[jira] [Created] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-27 Thread chunhui shen (Created) (JIRA)

Rollback of split would cause closed region to opened 
--

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


If master sending close region to rs and region's split transaction 
concurrently happen,
it may cause closed region to opened. 

See the detailed code in SplitTransaction#createDaughters
{code}
ListStoreFile hstoreFilesToSplit = null;
try{
  hstoreFilesToSplit = this.parent.close(false);
  if (hstoreFilesToSplit == null) {
// The region was closed by a concurrent thread.  We can't continue
// with the split, instead we must just abandon the split.  If we
// reopen or split this could cause problems because the region has
// probably already been moved to a different server, or is in the
// process of moving to a different server.
throw new IOException(Failed to close region: already closed by  +
  another thread);
  }
} finally {
  this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
}
{code}

when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
this.parent.initialize();

Although this region is not onlined in the regionserver, it may bring some 
potential problem.

For example, in our environment, the closed parent region is rolled back 
sucessfully , and then starting compaction and split again.

The parent region is f892dd6107b6b4130199582abc78e9c1

master log
{code}
2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
 src=dw87.kgb.sqa.cm4,60020,1324827866085, 
dest=dw80.kgb.sqa.cm4,60020,1324827865780
2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 (offlining)
2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, 
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling new unassigned node: 
/hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
(region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
 server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSING, 
server=dw87.kgb.sqa.cm4,60020,1324827866085, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, 
server=dw87.kgb.sqa.cm4,60020,1324827866085, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,349 DEBUG 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
event for f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 state=CLOSED, ts=1324830285347
2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Found an existing plan for 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, 
load=(requests=0, regions=1, usedHeap=0, maxHeap=0)
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.;

[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-27 Thread chunhui shen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176465#comment-13176465
 ] 

chunhui shen commented on HBASE-5100:
-

When this.parent.close（false） returns null(It means region has already been 
closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen

 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for

[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-27 Thread Zhihong Yu (Issue Comment Edited) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176467#comment-13176467
 ] 

Zhihong Yu commented on HBASE-5009:
---

Patch for 0.90 passes tests:
{code}
Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:18:08.131s
{code}
Integrated to 0.90 branch.

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: 5009.txt, HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException(Splitdir already exits?  
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException(Failed create of  + 
 splitdir);
   }
 {code}
 Correct me if am wrong? If it is an issue can we change the behaviour of 
 throwing exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (HBASE-5100) Rollback of split would cause closed region to opened


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176465#comment-13176465
 ] 

Zhihong Yu edited comment on HBASE-5100 at 12/28/11 6:01 AM:
-

When this.parent.close(false) returns null(It means region has already been 
closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION

  was (Author: zjushch):
When this.parent.close（false） returns null(It means region has already been 
closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION
  
 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen

 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1

[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176473#comment-13176473
 ] 

Zhihong Yu commented on HBASE-5100:
---

@Chunhui:
Do you have a patch ?

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen

 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780,

[jira] [Updated] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-27 Thread chunhui shen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5100:


Attachment: hbase-5100.patch

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, 
 load=(requests=0, regions=1,

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.