[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243052#comment-13243052
 ] 

Lars Hofhansl commented on HBASE-5682:
--

v2 passes all tests locally.

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this, HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable when the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243057#comment-13243057
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand 
more on what is happening there.  

 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243057#comment-13243057
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-5689 at 3/31/12 6:53 AM:


@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand 
more on what is happening there.  
{Edit}Good test case.  Yes able to reproduce the problem :).. Just trying to 
understand more on what is happening there.  

  was (Author: ram_krish):
@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand 
more on what is happening there.  
  
 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5689:


Attachment: HBASE-5689.patch

In the patch,
I use the region's MaximumEditLogSeqNum in the RecoveredEdits file as the file 
name.
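
A minimal sketch of that idea (illustrative only, not the attached patch; the helper name below is made up):
{code}
// During log splitting, name the recovered.edits file after the HIGHEST
// sequence id written into it, rather than the lowest one. Then the name of
// files.higher(edits) in HRegion#replayRecoveredEditsIfAny really is an upper
// bound for the current file, and the "maxSeqId <= minSeqId" skip stays safe.
Path recoveredEditsFileNamedByMaxSeqId(Path regionEditsDir, long maxSeqIdInFile) {
  // e.g. <region>/recovered.edits/0000000000000000006
  return new Path(regionEditsDir, String.format("%019d", maxSeqIdInFile));
}
{code}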

 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243070#comment-13243070
 ] 

chunhui shen commented on HBASE-5689:
-

rather than the current MinimumEditLogSeqNum as the file name.


 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Created) (JIRA)
compression unavailable
---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.0


HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

honghua zhu updated HBASE-5690:
---

Attachment: Store.patch

 compression unavailable
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.0

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated HBASE-5663:
-

Attachment: HBASE-5663.patch

I attached a patch file.

Could someone please review and refactor it?
I think this is a little dirty code but I couldn't find a better way.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-31 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243080#comment-13243080
 ] 

xufeng commented on HBASE-5673:
---

@Stack @Ted
I analyzed the problem with my patch.
This is the result:
I wrapped all exceptions in IOException. This IOException cannot be handled in 
CatalogTracker#getCachedConnection(ServerName sn) (private HRegionInterface), 
so the master will abort and the test cases will fail.


I will submit the patch together with the test results.
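
A generic illustration of that failure mode (this is not HBase code; only the CatalogTracker reference in the comments comes from the analysis above):
{code}
import java.io.IOException;
import java.net.SocketTimeoutException;

public class WrappedExceptionDemo {
  public static void main(String[] args) {
    try {
      // Wrapping the original error in a plain IOException hides its type.
      throw new IOException(new OutOfMemoryError("unable to create new native thread"));
    } catch (SocketTimeoutException e) {
      System.out.println("retriable branch: never reached for the wrapped error");
    } catch (IOException e) {
      // A caller that dispatches on concrete exception types, like
      // CatalogTracker#getCachedConnection, falls through to its generic
      // branch here and may treat the failure as fatal (master abort).
      System.out.println("generic branch: " + e.getCause());
    }
  }
}
{code}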

 The OOM problem of IPC client call  cause all handle block
 --

 Key: HBASE-5673
 URL: https://issues.apache.org/jira/browse/HBASE-5673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch


 If HBaseClient meets an unable to create new native thread exception, the call 
 will never complete because it is lost in the calls queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243085#comment-13243085
 ] 

ramkrishna.s.vasudevan commented on HBASE-5536:
---

bq.So, Hbase/Other dependant users need need end up in reflections for every 
version compatibility
I think you meant need 'not'

 Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no 
 longer work on older versions
 --

 Key: HBASE-5536
 URL: https://issues.apache.org/jira/browse/HBASE-5536
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Blocker
 Fix For: 0.96.0


 Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 
 should be fine?  See 
 http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243087#comment-13243087
 ] 

ramkrishna.s.vasudevan commented on HBASE-5635:
---

bq.Do we need a timeout for the newly added loop ?
If we have a timeout and the zk connection takes a little longer than that 
timeout, how do we keep the worker thread alive?  That is the reason an 
infinite loop was tried.

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.patch


 During the hlog split operation, if all the zookeepers are down, then the 
 paths will be returned as null and the splitworker thread will exit.
 Now this regionserver will not be able to acquire any other tasks since the 
 splitworker thread has exited.
 Please find the attached code for more details:
 {code}
 private List<String> getTaskList() {
   for (int i = 0; i < zkretries; i++) {
     try {
       return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
           this.watcher.splitLogZNode));
     } catch (KeeperException e) {
       LOG.warn("Could not get children of znode " +
           this.watcher.splitLogZNode, e);
       try {
         Thread.sleep(1000);
       } catch (InterruptedException e1) {
         LOG.warn("Interrupted while trying to get task list ...", e1);
         Thread.currentThread().interrupt();
         return null;
       }
     }
   }
 {code}
 in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions

2012-03-31 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243092#comment-13243092
 ] 

Uma Maheswara Rao G commented on HBASE-5536:


Yes, Ram. typo :-).

Here is what I meant:
We should not access any non-exposed APIs from HDFS, at least from this version on. 
We should raise the requirement with HDFS about the usage of and need for those APIs 
in Hbase, and get access through some public interface. Then, even though 
HDFS may change its internal APIs, the public interface would remain the same, so 
Hbase/other dependent users need not end up with reflection for every 
compatibility change.
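
A generic sketch of that idea (the interface and method below are hypothetical, not an actual HDFS or HBase API):
{code}
// Depend on a small, public contract instead of HDFS-internal classes.
public interface WalSyncer {
  // Stable signature that HBase-side code would program against.
  void sync() throws java.io.IOException;
}
// If HDFS reshuffles its internals behind such a contract, dependent projects
// keep compiling and need no per-version reflection tricks.
{code}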

 Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no 
 longer work on older versions
 --

 Key: HBASE-5536
 URL: https://issues.apache.org/jira/browse/HBASE-5536
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Blocker
 Fix For: 0.96.0


 Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 
 should be fine?  See 
 http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243106#comment-13243106
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

@Chunhui
Thanks for the patch.
I would like to share my analysis of this:
- Suppose the current seq for RS1 is 4.
When the first row kv KV(r1-v1) is inserted, the current seq is 5.
On moving the region to RS2 the store gets flushed, and when RS2 opens the 
region the next seq it can use will be 6.
Now we make the next kv entry KV(r2-v1); the current seq for this entry is 
6. Now when the region is moved again to RS1, another store file is created by 
RS2.
Now when RS1 opens the region, the seq number it can use will be 7.

We now add an entry KV(r3-v1) again in RS1, so it will have 7 in it (in the WAL).

Kill RS2 first.  This will create a recovered.edits file with file name 06.

Kill RS1.  This will create a recovered.edits file with file name 5.

Now when the region is finally opened in a new RS, it will have the 2 store 
files and the max seq id from them will be 6.  The recovered.edits will 
also give 6 as the highest seq:
{code}
  if (maxSeqId <= minSeqId) {
    String msg = "Maximum possible sequenceid for this log is " + maxSeqId
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
    continue;
{code}
Correct me if my analysis is wrong.
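
Plugging those numbers into the skip check quoted earlier (illustrative arithmetic only, following the walkthrough above):
{code}
// Store files already carry edits up to seq id 6, so minSeqId = 6.
// recovered.edits file "5" (from RS1's hlog) still holds the unflushed edit 7;
// recovered.edits file "06" (from RS2's hlog) holds edit 6.
long minSeqId = 6;
long maxSeqId = Math.abs(Long.parseLong("06")); // name of files.higher(edits)
boolean skipWholeFile = (maxSeqId <= minSeqId); // 6 <= 6 -> true
// => file "5" is skipped entirely and the edit with seq id 7 (KV r3) is lost.
{code}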



 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243109#comment-13243109
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

I just want to understand whether removing the 'continue' in
{code}
  if (maxSeqId <= minSeqId) {
    String msg = "Maximum possible sequenceid for this log is " + maxSeqId
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
    continue;
{code}
should solve the problem.  Inside replayRecoveredEdits we have 
this code,
{code}
  // Now, figure if we should skip this edit.
  if (key.getLogSeqNum() <= currentEditSeqId) {
    skippedEdits++;
    continue;
  }
{code}
which will skip unnecessary edits anyway.  I tried commenting out the 
'continue' in the code and the test case that you gave passed.  That is my 
thought; your comments are welcome.

 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

honghua zhu updated HBASE-5690:
---

Fix Version/s: (was: 0.94.0)
   0.94.1

 compression unavailable
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.1

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243141#comment-13243141
 ] 

Zhihong Yu commented on HBASE-5682:
---

+1 on patch.

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this, HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable when the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression unavailable

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243142#comment-13243142
 ] 

Zhihong Yu commented on HBASE-5690:
---

+1 on patch.

HBASE-5605 fixed the same problem in trunk.

 compression unavailable
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.1

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5690:
--

Hadoop Flags: Reviewed
 Summary: compression does not work in Store.java of 0.94  (was: 
compression unavailable)

 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.1

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5690:
-

Assignee: honghua zhu

 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Assignee: honghua zhu
Priority: Critical
 Fix For: 0.94.1

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243147#comment-13243147
 ] 

Zhihong Yu commented on HBASE-5635:
---

Then we should log a message every X minutes saying what we are waiting for, so that 
the user doesn't have to use jstack.
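
A minimal sketch of that suggestion (based on the getTaskList() code quoted below; this is not the committed patch, and the retry-forever shape follows the earlier discussion on this issue):
{code}
private List<String> getTaskList() throws InterruptedException {
  long lastLog = 0L;
  while (true) { // retry until ZK comes back, instead of giving up after zkretries
    try {
      return ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
          this.watcher.splitLogZNode);
    } catch (KeeperException e) {
      long now = System.currentTimeMillis();
      if (now - lastLog > 60 * 1000L) { // log once a minute so jstack isn't needed
        LOG.warn("Still waiting for children of znode " +
            this.watcher.splitLogZNode, e);
        lastLog = now;
      }
      Thread.sleep(1000);
    }
  }
}
{code}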

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.patch


 During the hlog split operation, if all the zookeepers are down, then the 
 paths will be returned as null and the splitworker thread will exit.
 Now this regionserver will not be able to acquire any other tasks since the 
 splitworker thread has exited.
 Please find the attached code for more details:
 {code}
 private List<String> getTaskList() {
   for (int i = 0; i < zkretries; i++) {
     try {
       return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
           this.watcher.splitLogZNode));
     } catch (KeeperException e) {
       LOG.warn("Could not get children of znode " +
           this.watcher.splitLogZNode, e);
       try {
         Thread.sleep(1000);
       } catch (InterruptedException e1) {
         LOG.warn("Interrupted while trying to get task list ...", e1);
         Thread.currentThread().interrupt();
         return null;
       }
     }
   }
 {code}
 in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243150#comment-13243150
 ] 

Zhihong Yu commented on HBASE-5663:
---

We could check whether the Exception was caused by NoSuchMethodException.
Since the retry is in the MapRunner ctor, the above may not be needed.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243057#comment-13243057
 ] 

Zhihong Yu edited comment on HBASE-5689 at 3/31/12 2:00 PM:


@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just trying to 
understand more on what is happening there.  

  was (Author: ram_krish):
@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand 
more on what is happening there.  
{Edit}Good test case.  Yes able to reproduce the problem :).. Just trying to 
understand more on what is happening there.  
  
 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243154#comment-13243154
 ] 

Zhihong Yu commented on HBASE-5689:
---

From the above analysis, it looks like this bug was caused by the optimization 
made in HBASE-4797.

 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Summary: Skipping RecoveredEdits may cause data loss  (was: Skip 
RecoveredEdits may cause data loss)

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Affects Version/s: 0.94.0

 Skip RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243172#comment-13243172
 ] 

Takuya Ueshin commented on HBASE-5663:
--

No, no MapReduce job using the current MultithreadedTableMapper would work.

When the Mapper starts, it creates some threads, an 'original' Mapper object 
that should be called by each thread, and a MapperContext object used by the 
original mapper.
But because the constructor call of MapperContext through reflection is wrong, 
it throws the NoSuchMethodException.

As I reported in HBASE-5636, the trunk version's test case 
TestMulitthreadedTableMapper has a bug, so please test with the new version of 
the test attached there.
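
A small, generic illustration of that failure mode (not the HBase code; String is only a stand-in for Mapper$Context):
{code}
import java.lang.reflect.Constructor;

public class ConstructorLookupDemo {
  public static void main(String[] args) {
    try {
      // Looking up a constructor signature the class does not declare fails
      // with NoSuchMethodException -- the same error MultithreadedTableMapper
      // hits when its reflective Mapper$Context.<init> lookup is wrong.
      Constructor<String> c = String.class.getConstructor(int.class);
      System.out.println(c);
    } catch (NoSuchMethodException e) {
      System.out.println("No such constructor: " + e.getMessage());
    }
  }
}
{code}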

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5636) TestTableMapReduce doesn't work properly.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243180#comment-13243180
 ] 

Zhihong Yu commented on HBASE-5636:
---

Without the patch from HBASE-5663, I saw the following in the test output:
{code}
2012-03-31 07:55:46,743 DEBUG [main] mapreduce.TableInputFormatBase(194): 
getSplits: split -> 24 -> 192.168.0.17:yyy,
java.io.IOException: java.lang.NoSuchMethodException: 
org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
 org.apache.hadoop.mapred.TaskAttemptID, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, 
org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
 org.apache.hadoop.hbase.mapreduce.TableSplit)
  at 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
  at 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
{code}

 TestTableMapReduce doesn't work properly.
 -

 Key: HBASE-5636
 URL: https://issues.apache.org/jira/browse/HBASE-5636
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.1, 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5636-v2.patch, HBASE-5636.patch


 No map function is called because there is no test data put before the test 
 starts.
 The following three tests are in the same situation:
 - org.apache.hadoop.hbase.mapred.TestTableMapReduce
 - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243182#comment-13243182
 ] 

Zhihong Yu commented on HBASE-5663:
---

So the comment in the patch about Hadoop versions can be removed.
See the error I got when using Hadoop 1.0:
https://issues.apache.org/jira/browse/HBASE-5636?focusedCommentId=13243180page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243180

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Attachment: 5663+5636.txt

Combined patch for HBASE-5663 and HBASE-5636

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243190#comment-13243190
 ] 

Zhihong Yu commented on HBASE-5663:
---

I ran the following tests based on combined patch and they passed:
TestMultithreadedTableMapper, TestColumnSeeking, TestTableMapReduce

Looks like Hadoop QA is not working. I am running the test suite on Linux.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243191#comment-13243191
 ] 

Takuya Ueshin commented on HBASE-5663:
--

Oh, I misread what you wrote.
Thank you for your support.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243204#comment-13243204
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

@Ted
What do you think, Ted? Chunhui's patch also makes sense, but either way the idea is 
not to skip any recovered.edits file, so I felt removing the check would be 
enough.


 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region, in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;

   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5689:
--

Priority: Critical  (was: Major)

Making it critical as it is data loss related.

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region, in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;

   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243205#comment-13243205
 ] 

Zhihong Yu commented on HBASE-5663:
---

{code}
Results :

Tests run: 541, Failures: 0, Errors: 0, Skipped: 0
...
Results :

Tests run: 615, Failures: 0, Errors: 0, Skipped: 2

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 35:45.595s
{code}

I will integrate Monday if there is no objection.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Fix For: 0.94.0, 0.96.0

 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5663:
-

Assignee: Takuya Ueshin

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Fix For: 0.94.0, 0.96.0

 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Fix Version/s: 0.96.0
   0.94.0

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
 Fix For: 0.94.0, 0.96.0

 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5636) TestTableMapReduce doesn't work properly.

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5636:
-

Assignee: Takuya Ueshin

 TestTableMapReduce doesn't work properly.
 -

 Key: HBASE-5636
 URL: https://issues.apache.org/jira/browse/HBASE-5636
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.1, 0.94.0
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Attachments: HBASE-5636-v2.patch, HBASE-5636.patch


 No map function is called because there is no test data put before the test 
 starts.
 The following three tests are in the same situation:
 - org.apache.hadoop.hbase.mapred.TestTableMapReduce
 - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Attachment: 5689-simplified.txt

Removing the check should be enough.

With the attached patch, TestHRegion#testDataCorrectnessReplayingRecoveredEdits 
passes.
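
For readers following along, here is a rough sketch of the simplification being 
discussed (illustrative only, not the attached 5689-simplified.txt; the class name 
and the EditsReplayer callback are made up for the example): every existing, 
non-empty recovered-edits file is handed to the replayer, and no file is skipped 
based on the name of a neighbouring file, leaving the per-edit sequence-id checks 
to ignore edits that were already flushed.
{code}
import java.io.IOException;
import java.util.NavigableSet;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplayAllRecoveredEdits {

  /** Stand-in for HRegion#replayRecoveredEdits(...); made up for this sketch. */
  interface EditsReplayer {
    long replay(Path edits, long minSeqId) throws IOException;
  }

  /**
   * Illustrative only: replay every existing, non-empty recovered-edits file
   * instead of skipping the rest of the sorted set once one file looks
   * skippable. Edits whose sequence id is not greater than minSeqId are
   * expected to be ignored inside the replayer.
   */
  static long replayAll(FileSystem fs, NavigableSet<Path> files,
      long minSeqId, EditsReplayer replayer) throws IOException {
    long seqId = minSeqId;
    for (Path edits : files) {
      if (edits == null || !fs.exists(edits)) {
        continue;                                 // nothing on disk for this entry
      }
      if (fs.getFileStatus(edits).getLen() <= 0) {
        continue;                                 // empty file, nothing to replay
      }
      seqId = Math.max(seqId, replayer.replay(edits, minSeqId));
    }
    return seqId;
  }
}
{code}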

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region, in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;

   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243222#comment-13243222
 ] 

Lars Hofhansl commented on HBASE-5682:
--

Thanks Ted.
The last question is: should we do this for all HConnections (not just for 
unmanaged ones)? It means that HConnection would be able to recover from loss of 
the ZK connection, and the abort() method would only clear out the ZK trackers 
and never close or abort the connection. I'd be in favor of that.

@Ram and @Jieshan: Since this would be a more robust version of HBASE-5153, could 
you have a look at this?
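
To illustrate the idea, a purely conceptual sketch (not the actual 
HConnectionImplementation code or the attached patch; the class, field, and method 
names are placeholders): abort() only drops the ZK-side state, and the connection 
rebuilds it lazily instead of becoming permanently unusable.
{code}
/**
 * Conceptual sketch only; the real HConnectionImplementation is far more
 * involved and these names are placeholders.
 */
public class ZkRecoveringConnection {

  private Object zkTrackers;                 // stands in for the ZK watcher/trackers
  private volatile boolean closed = false;   // never flipped by abort() in this sketch

  /** Called when the ZK session is lost or expires. */
  public synchronized void abort(String why, Throwable cause) {
    // Drop only the ZK-side state instead of poisoning the whole connection.
    zkTrackers = null;
  }

  /** Lazily rebuilds the ZK state the next time a caller needs it. */
  synchronized Object getZkTrackers() {
    if (zkTrackers == null) {
      zkTrackers = new Object();             // placeholder for reconnecting to ZK
    }
    return zkTrackers;
  }

  public boolean isClosed() {
    return closed;                           // a regular close() would still set this
  }
}
{code}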


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5689:
-

Fix Version/s: 0.94.0

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region, in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;

   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5690:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Assignee: honghua zhu
Priority: Critical
 Fix For: 0.94.0

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-3134:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Teruyoshi Zenmyo
Priority: Minor
  Labels: replication
 Fix For: 0.94.0

 Attachments: 3134-v2.txt, 3134-v3.txt, 3134-v4.txt, 3134.txt, 
 HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and when want to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5097:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we don't have an implementation for postScannerOpen, the RegionScanner is null 
 and so it throws a NullPointerException: 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect a blocker. Please feel free to change the priority if I am 
 wrong. Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong. I am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5656:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 LoadIncrementalHFiles createTable should detect and set compression algorithm
 -

 Key: HBASE-5656
 URL: https://issues.apache.org/jira/browse/HBASE-5656
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5656-0.92.patch, HBASE-5656-0.92.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 LoadIncrementalHFiles doesn't set compression when creating the table.
 This can be detected from the files within each family dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243227#comment-13243227
 ] 

Zhihong Yu commented on HBASE-5682:
---

Application to other HConnection makes sense.

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243229#comment-13243229
 ] 

Zhihong Yu commented on HBASE-5690:
---

Integrated to 0.94 branch.

Thanks for the patch Honghua.

 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Assignee: honghua zhu
Priority: Critical
 Fix For: 0.94.0

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu resolved HBASE-5690.
---

Resolution: Fixed

 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Assignee: honghua zhu
Priority: Critical
 Fix For: 0.94.0

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243236#comment-13243236
 ] 

Matteo Bertozzi commented on HBASE-5666:


Diving into the code, I've also noticed that there's a 
ZKUtil.waitForBaseZNode() that already has the retry logic (see the sketch after 
this list), but:
 - It takes a Configuration object
 - The timeout is set internally to 1ms
 - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java
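
For reference, a minimal retry loop of the kind being discussed, written against 
the plain ZooKeeper client API (illustrative only; this is not the existing 
ZKUtil.waitForBaseZNode(), and the retry count and sleep interval would normally 
come from the Configuration):
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class BaseZNodeWait {

  /** Poll until the base znode exists or the retries are exhausted. */
  static boolean waitForBaseZNode(ZooKeeper zk, String baseZNode,
      int maxRetries, long sleepMs) throws InterruptedException {
    for (int i = 0; i < maxRetries; i++) {
      try {
        if (zk.exists(baseZNode, false) != null) {
          return true;                 // base znode is there, the RS can proceed
        }
      } catch (KeeperException e) {
        // transient ZK error: fall through and retry
      }
      Thread.sleep(sleepMs);
    }
    return false;                      // caller decides whether to abort
  }
}
{code}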


 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243239#comment-13243239
 ] 

Zhihong Yu commented on HBASE-5666:
---

Refactoring ZKUtil.waitForBaseZNode() so that it can be used by region server 
and the test would be good.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243246#comment-13243246
 ] 

Hudson commented on HBASE-5690:
---

Integrated in HBase-0.94 #72 (See 
[https://builds.apache.org/job/HBase-0.94/72/])
HBASE-5690 compression does not work in Store.java of 0.94 (Honghua Zhu) 
(Revision 1307851)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 compression does not work in Store.java of 0.94
 ---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Assignee: honghua zhu
Priority: Critical
 Fix For: 0.94.0

 Attachments: Store.patch


 HBASE-5442 The store.createWriterInTmp method missing compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: zk-exists-refactor-v0.patch

don't know... I've tried to refactor the method to get something useful and 
shared...
The problem is that checkExists(), called by checkIfBaseNodeAvailable(), uses a 
ZooKeeperWatcher and calls exists() on a RecoverableZooKeeper object, while 
waitForBaseZNode() has a plain ZooKeeper handle... 
so the checkExists(ZooKeeperWatcher) implementation relies on the fact that 
RecoverableZooKeeper.exists() is implemented as RZK.getZooKeeper().exists(), 
which I don't like...
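
Illustratively, one way to share the check would be an overload that takes the 
raw ZooKeeper handle, with the watcher-based variant delegating to it. These are 
hypothetical additions for discussion, not existing ZKUtil API:
{code}
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Point-in-time existence check against a plain ZooKeeper handle (no watcher set).
public static int checkExists(ZooKeeper zk, String znode)
    throws KeeperException, InterruptedException {
  Stat stat = zk.exists(znode, false);
  return stat != null ? stat.getVersion() : -1;
}

// The ZooKeeperWatcher-based variant could then delegate, instead of relying on
// RecoverableZooKeeper.exists() happening to call getZooKeeper().exists().
public static int checkExists(ZooKeeperWatcher zkw, String znode)
    throws KeeperException, InterruptedException {
  return checkExists(zkw.getRecoverableZooKeeper().getZooKeeper(), znode);
}
{code}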

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5670) Have Mutation implement the Row interface.

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5670:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 Have Mutation implement the Row interface.
 --

 Key: HBASE-5670
 URL: https://issues.apache.org/jira/browse/HBASE-5670
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Fix For: 0.94.0, 0.96.0

 Attachments: 5670-0.94.txt, 5670-trunk.txt, 5670-trunk.txt


 In HBASE-4347 I factored some code from Put/Delete/Append into Mutation.
 In a discussion with a co-worker I noticed that Put/Delete/Append still 
 implement the Row interface, but Mutation does not.
 In a trivial change I would like to move that interface up to Mutation, along 
 with changing HTable.batch(List<Row>) to HTable.batch(List<? extends Row>) 
 (HConnection.processBatch takes List<? extends Row> already anyway), so that 
 HTable.batch can be used with a list of Mutations.
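
 A small usage sketch of what this enables; the table, family, and row names are 
 placeholders, and it assumes the widened batch() signature described above:
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Mutation;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 Configuration conf = HBaseConfiguration.create();
 HTable table = new HTable(conf, "t1");

 // One typed list of mixed writes becomes possible once Mutation is a Row
 // and batch() accepts List<? extends Row>.
 List<Mutation> mutations = new ArrayList<Mutation>();
 Put put = new Put(Bytes.toBytes("row1"));
 put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
 mutations.add(put);
 mutations.add(new Delete(Bytes.toBytes("row2")));

 table.batch(mutations);
 {code}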

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5084) Allow different HTable instances to share one ExecutorService

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5084:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 Allow different HTable instances to share one ExecutorService
 -

 Key: HBASE-5084
 URL: https://issues.apache.org/jira/browse/HBASE-5084
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5084-0.94.txt, 5084-trunk.txt


 This came out of the Lily 1.1.1 release:
 Use a shared ExecutorService for all HTable instances, leading to better (or 
 actual) thread reuse.
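
 A usage sketch of the shared pool; it assumes the HTable constructor taking an 
 ExecutorService that this change adds, and the table names are placeholders:
 {code}
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.util.Bytes;

 Configuration conf = HBaseConfiguration.create();
 // One pool shared across all HTable instances instead of a pool per table.
 ExecutorService sharedPool = Executors.newFixedThreadPool(10);

 HTable t1 = new HTable(conf, Bytes.toBytes("table1"), sharedPool);
 HTable t2 = new HTable(conf, Bytes.toBytes("table2"), sharedPool);
 // puts/gets on t1 and t2 now reuse the same worker threads
 t1.close();
 t2.close();
 sharedPool.shutdown();
 {code}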

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4398:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 -

 Key: HBASE-4398
 URL: https://issues.apache.org/jira/browse/HBASE-4398
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Fix For: 0.92.2, 0.94.0

 Attachments: HBASE-4398.patch


 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 We can reproduce the problem by the following instructions:
 {noformat}
 - Add HRegionPartitioner.class to the 4th argument of
 TableMapReduceUtil#initTableReducerJob()
 at line around 133
 in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java
 - Change or remove hbase.zookeeper.property.clientPort property
 in hbase-site.xml ( for example, changed to 12345 ).
 - run testMultiRegionTable()
 {noformat}
 Then I got error messages as following:
 {noformat}
 2011-09-12 22:28:51,020 DEBUG [Thread-832] zookeeper.ZKUtil(93): hconnection 
 opening connection to ZooKeeper with ensemble (localhost:12345)
 2011-09-12 22:28:51,022 INFO  [Thread-832] 
 zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
 43200@imac.local
 2011-09-12 22:28:51,123 WARN  [Thread-832] 
 zookeeper.RecoverableZooKeeper(161): Possibly transient ZooKeeper exception: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/master
 2011-09-12 22:28:51,123 INFO  [Thread-832] 
 zookeeper.RecoverableZooKeeper(173): The 1 times to retry ZooKeeper after 
 sleeping 1000 ms
  =
 2011-09-12 22:29:02,418 ERROR [Thread-832] mapreduce.HRegionPartitioner(125): 
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2e54e48d
  closed
 2011-09-12 22:29:02,422 WARN  [Thread-832] mapred.LocalJobRunner$Job(256): 
 job_local_0001
 java.lang.NullPointerException
at 
 org.apache.hadoop.hbase.mapreduce.HRegionPartitioner.setConf(HRegionPartitioner.java:128)
at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 {noformat}
 I think HTable should connect to ZooKeeper at port 21818, as configured on the 
 client side, instead of port 12345 from hbase-site.xml.
 It might be caused by HBaseConfiguration.addHbaseResources(conf); in 
 HRegionPartitioner#setConf(Configuration).
 And this might mean that all client-side configurations that are also set in 
 hbase-site.xml get overwritten because of this problem.
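
 A hypothetical sketch of a setConf() that preserves client-side overrides 
 (illustrative only, not the committed fix; the surrounding class and conf field 
 are assumed):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;

 // HBaseConfiguration.create(conf) loads the hbase-*.xml resources first and
 // then merges the passed-in configuration on top, so values already set by the
 // client (e.g. hbase.zookeeper.property.clientPort) win over hbase-site.xml.
 @Override
 public void setConf(Configuration configuration) {
   this.conf = HBaseConfiguration.create(configuration);
 }
 {code}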

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5669) AggregationClient fails validation for open stoprow scan

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5669:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 AggregationClient fails validation for open stoprow scan
 

 Key: HBASE-5669
 URL: https://issues.apache.org/jira/browse/HBASE-5669
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.1
 Environment: n/a
Reporter: Brian Rogers
Assignee: Mubarak Seyed
Priority: Minor
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5669.trunk.v1.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 AggregationClient.validateParameters throws an exception when the Scan has a 
 valid startrow but an unset endrow.
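
 For reference, a scan of this shape triggers the failure; the table and family 
 names are placeholders, and the AggregationClient call is just one example of 
 how such a scan gets passed in:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
 import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
 import org.apache.hadoop.hbase.util.Bytes;

 Configuration conf = HBaseConfiguration.create();
 // Start row set, stop row left unset: this should mean "scan to the end of
 // the table", but validateParameters() currently rejects it.
 Scan scan = new Scan();
 scan.setStartRow(Bytes.toBytes("row-0100"));
 scan.addFamily(Bytes.toBytes("f"));

 AggregationClient aggClient = new AggregationClient(conf);
 long count = aggClient.rowCount(Bytes.toBytes("t1"), new LongColumnInterpreter(), scan);
 {code}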

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243296#comment-13243296
 ] 

Zhihong Yu commented on HBASE-5666:
---

Patch makes sense.
Can you integrate it into HRegionServer ?

Thanks

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243302#comment-13243302
 ] 

Matteo Bertozzi commented on HBASE-5666:


I was thinking to patch ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and 
HConnectionImplementation.checkIfBaseNodeAvailable() instead of only the 
HRegionServer...

what do you think?


 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243310#comment-13243310
 ] 

Zhihong Yu commented on HBASE-5666:
---

HConnectionImplementation.checkIfBaseNodeAvailable() doesn't take Abortable.
We can limit the scope of change in this JIRA.

Is that Okay ?

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Created) (JIRA)
Importtsv stops the webservice from which it is evoked
--

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor


I was trying to run importtsv from a servlet. Every time after the completion of 
the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243312#comment-13243312
 ] 

stack commented on HBASE-5682:
--

This is a perversion.

If we pass in a connection from outside, then down in the guts we do special 
handling that makes the connection and zookeeper handling reconnect.  It's like 
we should be passing an interface made at a higher level of abstraction, and 
then the implementation would do this fixup when the connection breaks.

With that out of the way, do whatever you need to make it work.  Patch looks 
fine.  How did you test?  Would it be hard to make a unit test of it?  A unit 
test codifying this perversion would be good, since it will be brittle, being 
not what's expected.

I'm against changing the behavior of the default case in 0.92/0.94.  I'm 
interested in problems you see in hbase-5153, or issues you have w/ the 
implementation there, that being the 0.96 client.
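
Purely for illustration, the kind of higher-level interface being suggested 
might look like the sketch below; the name and methods are hypothetical:
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

// Hypothetical abstraction: callers depend on this, and the implementation is
// free to transparently re-establish the connection or ZK session when it breaks.
public interface RecoverableClusterConnection {
  /** Returns a usable HConnection, re-creating it if the old one went bad. */
  HConnection getConnection() throws IOException;

  /** Returns a live ZooKeeper watcher, re-creating the session if it expired. */
  ZooKeeperWatcher getZooKeeperWatcher() throws IOException;
}
{code}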

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243314#comment-13243314
 ] 

stack commented on HBASE-5691:
--

It calls System.exit at the end: "System.exit(job.waitForCompletion(true) ? 
0 : 1);".  How are you invoking it?  Do you call main?  Why not call 
createSubmittableJob(conf, otherArgs), pass the returned Job to 
job.waitForCompletion(true), and show error or success depending on what is 
returned?  Is it even a good idea to call this from a servlet?
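
For example, something along these lines, assuming createSubmittableJob is 
callable as suggested; the column spec, table name, and input path are 
placeholders:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.mapreduce.Job;

// Build and run the job directly instead of calling ImportTsv.main(), whose
// System.exit() takes the servlet container down with it.
Configuration conf = HBaseConfiguration.create();
conf.set("importtsv.columns", "HBASE_ROW_KEY,f:q");
String[] otherArgs = new String[] { "myTable", "/input/data.tsv" };

Job job = ImportTsv.createSubmittableJob(conf, otherArgs);
boolean success = job.waitForCompletion(true);
// Report success or failure to the caller instead of exiting the JVM.
{code}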

 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243315#comment-13243315
 ] 

stack commented on HBASE-5691:
--

Oh, please close this issue if the above suggestion works for you.

 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all.txt

Here's a patch that always attempts reconnecting to ZK when a ZK connection is 
needed.
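
Not the attached patch itself, just an illustrative sketch of the 
recheck-before-use idea; the field and method names are hypothetical:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

// Hypothetical sketch: the connection lazily re-creates its ZooKeeper watcher
// (and the trackers that depend on it) if the previous session was torn down
// after a connection loss, instead of staying permanently unusable.
private volatile ZooKeeperWatcher zooKeeper;  // null once the old session is abandoned

private synchronized ZooKeeperWatcher ensureZooKeeperWatcher(Configuration conf)
    throws IOException {
  if (this.zooKeeper == null) {
    this.zooKeeper = new ZooKeeperWatcher(conf, "hconnection", null /* Abortable */);
    // the master address and root region trackers would be restarted here as well
  }
  return this.zooKeeper;
}
{code}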

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243589#comment-13243589
 ] 

Lars Hofhansl commented on HBASE-5682:
--

Perversion is a hard word. :) It is just rechecking before each use whether the 
trackers are still usable. The timeout is handled through the HConnection's 
abort().

The testing I've done:
# ZK down, HBase down, start a client. Then start ZK, then HBase.
# ZK up, HBase down, start client. Then start HBase
# both ZK and HBase up, start client, kill HBase, restart HBase
# both ZK and HBase up, start client, kill ZK and HBase restart

The client just creates a new HTable and then tries to get some rows in a loop.
In all cases the client should successfully be able to reconnect when both ZK 
and HBase are up.

The problem I have seen in 0.94/0.92 without this patch, even with managed 
connections, is that after the HConnection times out it is unusable, and even 
getting a new HTable does not fix the problem since behind the scenes the same 
HConnection is retrieved.

Will think about an automated test. Do you like the version better that always 
does the recheck (and hence all the conditionals for managed connections go away)?
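
The manual test client is roughly the following shape; the table and row names 
are placeholders and the error handling is deliberately crude:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReconnectTestClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t1");
    while (true) {
      try {
        // Bounce ZK/HBase underneath this loop; the gets should succeed again
        // once both are back up.
        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println("got: " + r);
      } catch (Exception e) {
        System.out.println("request failed, retrying: " + e);
      }
      Thread.sleep(1000);
    }
  }
}
{code}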


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

debarshi basak resolved HBASE-5691.
---

Resolution: Fixed

 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243591#comment-13243591
 ] 

debarshi basak commented on HBASE-5691:
---

Thanks alot.It worked.

 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243593#comment-13243593
 ] 

Jonathan Hsieh commented on HBASE-5680:
---

The master starts on top of hadoop 0.23.x in these cases:
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23



 Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 
 --

 Key: HBASE-5680
 URL: https://issues.apache.org/jira/browse/HBASE-5680
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Kristam Subba Swathi

 HMaster is not able to start because of the following error:
 
 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   ... 7 more
 There is a change in the FSConstants

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-5691.


Resolution: Not A Problem

 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread Todd Lipcon (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HBASE-5691:



 Importtsv stops the webservice from which it is evoked
 --

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor

 I was trying to run importtsv from a servlet. Every time after the completion 
 of the job, the tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1

2012-03-31 Thread Jonathan Hsieh (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243593#comment-13243593
 ] 

Jonathan Hsieh edited comment on HBASE-5680 at 3/31/12 11:24 PM:
-

The master runs on top of hadoop 0.23.x in these cases:
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23



  was (Author: jmhsieh):
The master starts on top of hadoop 0.23.x
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23


  
 Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 
 --

 Key: HBASE-5680
 URL: https://issues.apache.org/jira/browse/HBASE-5680
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Kristam Subba Swathi

 HMaster is not able to start because of the following error:
 
 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   ... 7 more
 There is a change in the FSConstants

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243605#comment-13243605
 ] 

Lars Hofhansl commented on HBASE-5682:
--

The more I look at it, the more I like the patch that changes the behavior in 
all cases.
It's simple and low risk: just recheck the ZK trackers before they are needed.


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5688:
-

Fix Version/s: 0.96.0
 Assignee: stack
 Release Note: 
Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.
   Status: Patch Available  (was: Open)

Trying against hadoopqa.  Let me put it up on review board too, because I would 
appreciate some review.
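
For illustration, the magic-prefix handling amounts to something like the sketch 
below; the helper names here are hypothetical (the real ones live in the new 
ProtobufUtil class):
{code}
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

// 'PBUF' marks the znode content as a protobuf message rather than the old
// ServerName.getVersionedBytes() format.
private static final byte[] PB_MAGIC = Bytes.toBytes("PBUF");

static byte[] prependMagic(byte[] serialized) {
  return Bytes.add(PB_MAGIC, serialized);
}

static byte[] stripMagic(byte[] znodeContent) {
  if (znodeContent.length >= PB_MAGIC.length) {
    boolean hasMagic = true;
    for (int i = 0; i < PB_MAGIC.length; i++) {
      if (znodeContent[i] != PB_MAGIC[i]) { hasMagic = false; break; }
    }
    if (hasMagic) {
      return Arrays.copyOfRange(znodeContent, PB_MAGIC.length, znodeContent.length);
    }
  }
  return znodeContent;  // not pb content: hand it back as-is for the old format
}
{code}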

 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5688.txt, 5688v4.txt


 Move the root-region-server znode content from the versioned bytes that 
 ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5688:
-

Attachment: 5688v4.txt

Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add the root-region-server znode and another
  to remove it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class).
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.

 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.96.0

 Attachments: 5688.txt, 5688v4.txt


 Move the root-region-server znode content from the versioned bytes that 
 ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243616#comment-13243616
 ] 

stack edited comment on HBASE-5688 at 4/1/12 12:18 AM:
---

Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.
{code}
D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to removed it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.
{code}

  was (Author: stack):
Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to removed it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.
  
 Convert zk root-region-server znode content to pb
 -

 Key: 

[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243618#comment-13243618
 ] 

jirapos...@reviews.apache.org commented on HBASE-5688:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4600/
---

Review request for hbase.


Summary
---

Changes the content of the root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to removed it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new


This addresses bug hbase-5688.
https://issues.apache.org/jira/browse/hbase-5688


Diffs
-

  src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java c90864a 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java b2a5463 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 64def15 
  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 9c215b4 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 2f05005 
  src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 
33e4e71 
  src/main/protobuf/ZooKeeper.proto PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 533b2bf 
  
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java 
fe37156 
  src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java 2132036 
  src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/4600/diff


Testing
---


Thanks,

Michael



 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5688.txt, 5688v4.txt


 Move the root-region-server znode content from the versioned bytes that 
 ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243619#comment-13243619
 ] 

stack commented on HBASE-5682:
--

bq. The problem I have seen in 0.94/0.92 without this patch even with managed 
connections is that after HConnection times out, it is unusable and even 
getting a new HTable does not fix the problem since behind the scenes the same 
HConnection is retrieved.

Didn't we add a check for whether the connection is bad?

bq. Will think about an automated test. Do you like the version better that 
always does the recheck (and hence all the conditional for managed go away)?

How does this work in trunk?  In trunk the work has been done so we don't 
really keep a zk session open any more.  For the sake of making tests run 
smoother, we do keep the zk session alive, hold it open for 5 minutes, and let 
it go if unused.

I'm +1 on making our stuff more resilient.  Reusing a dud hconnection, either 
because the connection is dead or the zk session died, is hard to figure out.

How will this change a user's perception of how this stuff is used?  If your 
answer is that it helps in the extreme case where the connection goes dead, and 
that's the only change a user perceives, then let's commit.  But we should 
include a test?  If you describe one, I can try to help write it.

You think this should go into 0.92?


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243620#comment-13243620
 ] 

stack commented on HBASE-5682:
--

I looked at the 'all' patch.  Looks good to me.  Am interested in how it 
changes API usage (if at all).

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243625#comment-13243625
 ] 

chunhui shen commented on HBASE-5689:
-

@Ram
The analysis is correct.
However, I think your solution does not use the optimization made in HBASE-4797.
In my HBASE-5689.patch, I just change the file name of the edit log to its 
MaximumEditLogSeqNum.
In this issue, with the patch we will not skip the RecoveredEdits file, because we 
shouldn't. In other cases, however, we still skip: for example, if we first put a lot 
of data to RS1, then move the region to RS2 and put a lot of data again, and both RS1 
and RS2 die, we would still skip the edit log file from RS1.

bq.Chunhui's patch also makes sense but any way the idea is not to skip any 
recovered.edits file.
So that is not quite right: my approach keeps skipping recovered.edits files, except 
in cases such as this issue.
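
A hedged sketch of that naming idea (not the attached patch; minSeqIdInStores is a 
made-up name for the sequence id already persisted by the region's stores): if each 
recovered.edits file is named after the maximum sequence id it contains, the skip 
check stays safe, because a file is only skipped when every edit in it is already 
persisted.
{code}
// edits.getName() is the file's maximum edit-log sequence id under this scheme
long maxSeqIdInFile = Math.abs(Long.parseLong(edits.getName()));
if (maxSeqIdInFile <= minSeqIdInStores) {
  // nothing in this file can be newer than what the stores already hold
  LOG.debug("Max sequenceid in " + edits + " is " + maxSeqIdInFile
      + " <= " + minSeqIdInStores + ", skipping");
  continue;
}
{code}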

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyze the above scenario from the code:
 1.the edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4.however, RecoveredEdits file f2 will be skipped when initializing the region
 HRegion#replayRecoveredEditsIfAny
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;
   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243637#comment-13243637
 ] 

Lars Hofhansl commented on HBASE-5682:
--

I am not envisioning any API changes, just that the HConnection would no longer 
be ripped out from under any HTables when there is a ZK connection loss.

I ran all tests again, and TestReplication and TestZookeeper have some failures 
that are related. Looking.
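
For reference, a hedged usage sketch of the pattern this protects (the table names 
are made up): the HTables below implicitly share the per-Configuration HConnection 
from HBASE-4805, and the point of this change is that a temporary ZK outage should 
not leave that shared connection permanently dead.
{code}
Configuration conf = HBaseConfiguration.create();
HTable t1 = new HTable(conf, "table1");   // creates/reuses the shared HConnection
HTable t2 = new HTable(conf, "table2");   // reuses the same HConnection
try {
  t1.put(new Put(Bytes.toBytes("row")).add(
      Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v")));
} finally {
  t1.close();
  t2.close();
}
{code}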

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5692) Add real action time for HLogPrettyPrinter

2012-03-31 Thread Xing Shi (Created) (JIRA)
Add real action time for HLogPrettyPrinter
--

 Key: HBASE-5692
 URL: https://issues.apache.org/jira/browse/HBASE-5692
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Xing Shi
Priority: Minor


Currently the HLogPrettyPrinter prints the log without the real op time, only the KV timestamp

{quote}
Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5
  Action:
row: r
column: f3:q
at time: Thu Jan 01 08:02:03 CST 1970
{quote}
Maybe we also need to show the real op time, like this
{quote}
Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: 
Sun Apr 01 10:42:53 CST 2012
  Action:
row: r
column: f3:q
timestamp: Thu Jan 01 08:02:03 CST 1970
{quote}
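
A hedged sketch of where that time could come from (not the attached patch; fs, path 
and conf are assumed to be in scope): the write time lives on the HLogKey of each WAL 
entry, while the timestamp printed today comes from each KeyValue.
{code}
HLog.Reader reader = HLog.getReader(fs, path, conf);
try {
  HLog.Entry entry;
  while ((entry = reader.next()) != null) {
    long writeTime = entry.getKey().getWriteTime();   // real op time of the append
    System.out.println("Sequence " + entry.getKey().getLogSeqNum()
        + " at time: " + new java.util.Date(writeTime));
    for (KeyValue kv : entry.getEdit().getKeyValues()) {
      System.out.println("  timestamp: " + new java.util.Date(kv.getTimestamp()));
    }
  }
} finally {
  reader.close();
}
{code}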


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5213) hbase master stop does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243638#comment-13243638
 ] 

Jonathan Hsieh commented on HBASE-5213:
---

Committed to 0.92 and 0.90.  Thanks Greg.

 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing hbase master stop produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by hbase master stop.
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app makes use of the client API 
 HBaseAdmin.shutdown() 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
  -- this isn't too surprising since I think hbase master stop just calls 
 this API.
 It seems like HBASE-1448 addresses this; perhaps there was a regression?
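
A hedged sketch of the client-side call referenced above: per this report it shuts 
down the active master and the region servers but leaves backup masters running.
{code}
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
try {
  admin.shutdown();   // same code path "hbase master stop" ends up on
} finally {
  admin.close();
}
{code}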

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5213) hbase master stop does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243639#comment-13243639
 ] 

Jonathan Hsieh commented on HBASE-5213:
---

Committed to 0.92 and 0.90.  Thanks Greg.

 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing hbase master stop produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by hbase master stop.
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app makes use of the client API 
 HBaseAdmin.shutdown() 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
  -- this isn't too surprising since I think hbase master stop just calls 
 this API.
 It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5213) hbase master stop does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5213:
--

Fix Version/s: 0.92.2
   0.90.7

 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing hbase master stop produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by hbase master stop.
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app makes use of the client API 
 HBaseAdmin.shutdown() 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
  -- this isn't too surprising since I think hbase master stop just calls 
 this API.
 It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5213) hbase master stop does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5213:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing hbase master stop produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by hbase master stop.
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app makes use of the client API 
 HBaseAdmin.shutdown() 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
  -- this isn't too surprising since I think hbase master stop just calls 
 this API.
 It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5692) Add real action time for HLogPrettyPrinter

2012-03-31 Thread Xing Shi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Shi updated HBASE-5692:


Attachment: HBASE-5692.patch

 Add real action time for HLogPrettyPrinter
 --

 Key: HBASE-5692
 URL: https://issues.apache.org/jira/browse/HBASE-5692
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Xing Shi
Priority: Minor
 Attachments: HBASE-5692.patch


 Currently the HLogPrettyPrinter prints the log without the real op time, only the KV timestamp
 {quote}
 Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5
   Action:
 row: r
 column: f3:q
 at time: Thu Jan 01 08:02:03 CST 1970
 {quote}
 Maybe we also need to show the real op time, like this
 {quote}
 Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: 
 Sun Apr 01 10:42:53 CST 2012
   Action:
 row: r
 column: f3:q
 timestamp: Thu Jan 01 08:02:03 CST 1970
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-31 Thread Otis Gospodnetic (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243641#comment-13243641
 ] 

Otis Gospodnetic commented on HBASE-4348:
-

Fix Version/s is set to None.  Is this for 0.96?

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2186) hbase master should publish more stats

2012-03-31 Thread Otis Gospodnetic (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243642#comment-13243642
 ] 

Otis Gospodnetic commented on HBASE-2186:
-

Just saw some other JIRA issue (don't recall the number) that mentioned master 
having more data so I wonder - is this issue still relevant?  Hasn't been 
touched in over 2 years.  Thanks.

 hbase master should publish more stats
 --

 Key: HBASE-2186
 URL: https://issues.apache.org/jira/browse/HBASE-2186
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: ryan rawson

 hbase master only publishes cluster.requests to ganglia. we should also 
 publish regionserver count and other interesting metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243648#comment-13243648
 ] 

Zhihong Yu commented on HBASE-5689:
---

Some minor comments about Chunhui's patch:
{code}
+  LOG.debug("Rename " + wap.p + " to " + dst);
{code}
The log message should begin with "Renamed ".
{code}
+private final Map<byte[], Long> regionMaximumEditLogSeqNum = Collections
+.synchronizedMap(new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
{code}
Is a TreeMap needed above? We're just remembering the mapping, right?
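
On the TreeMap question, a hedged aside: byte[] does not override equals()/hashCode(), 
so a plain HashMap keyed by byte[] would compare keys by reference; a comparator-based 
map like the one in the patch is the usual way to key by region-name bytes. A tiny 
illustration (names and values made up):
{code}
Map<byte[], Long> seqNums = Collections.synchronizedMap(
    new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
seqNums.put(Bytes.toBytes("regionA"), 42L);
// lookup succeeds even though this is a different byte[] instance:
Long maxSeq = seqNums.get(Bytes.toBytes("regionA"));   // 42
{code}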

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Let's see the following scenario:
 1.Region is on the server A
 2.put KV(r1-v1) to the region
 3.move region from server A to server B
 4.put KV(r2-v2) to the region
 5.move region from server B to server A
 6.put KV(r3-v3) to the region
 7.kill -9 server B and start it
 8.kill -9 server A and start it 
 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third 
 KV(r3-v3) is lost.
 Let's analyze the above scenario from the code:
 1.the edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4.however, RecoveredEdits file f2 will be skipped when initializing the region
 HRegion#replayRecoveredEditsIfAny
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;
   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4348) Add metrics for regions in transition

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4348:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Fix For: 0.96.0

 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all-v2.txt

Found the problem.
The ClusterId could remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (existing problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.
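
A minimal, generic illustration of why those fields want volatile (a sketch, not the 
patch itself): a reference that one thread resets on session expiry while other 
threads read it outside a synchronized block needs volatile for visibility.
{code}
class ConnectionState {
  private volatile ZooKeeperWatcher zooKeeper;  // may be nulled by the ZK event thread
  private volatile String clusterId;            // re-read after reconnect

  ZooKeeperWatcher current() {
    return zooKeeper;          // volatile read: guaranteed to see the reset
  }

  synchronized void reset() {  // called on session expiry
    zooKeeper = null;          // forces the next caller to rebuild the trackers
    clusterId = null;
  }
}
{code}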

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Priority: Critical  (was: Major)

Upped to critical. Without this the HBase client is pretty much useless in an 
AppServer setting where the client can outlive the HBase cluster and the ZK ensemble.
(Testing within the Salesforce AppServer is how I noticed the problem 
initially.)


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243662#comment-13243662
 ] 

Lars Hofhansl edited comment on HBASE-5682 at 4/1/12 5:49 AM:
--

Found the problem.
The ClusterId could remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (existing problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.

  was (Author: lhofhansl):
Found the problem.
The ClusterId could be remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (exiting problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.
  
 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243665#comment-13243665
 ] 

Lars Hofhansl commented on HBASE-5682:
--

bq. You think this should go into 0.92?
Probably. I guess most folks have clients that they restart frequently, use 
thrift, or use asynchbase. But in its current form, using the standard HBase client 
in an app server is very error prone if the HBase/ZK cluster is ever serviced 
without bringing the app server down in lock step.

bq. Didn't we add a check for if the connection is bad?
Yeah with hbase-5153 but in 0.90 only. At some point we decided the fix there 
wasn't good and Ram patched it up for 0.90.
This should subsume HBASE-5153. I'm happy to even put this in 0.90, but that's 
up to Ram.

bq. I'm interested in problems you see in hbase-5153 or issues you have w/ the 
implementation there that being the 0.96 client.

What I saw in 0.96 is that the client was blocked for a very long time (gave up 
after a few minutes), even though I had set all timeouts to low values. This is 
also deadly in an app server setting. Might be a simple fix there, didn't dig 
deeper.
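
For reference, a hedged example of the client-side timeout knobs being referred to 
(values are illustrative only, not recommendations):
{code}
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 2000);          // per-RPC timeout, ms
conf.setInt("hbase.client.retries.number", 3);   // retries before giving up
conf.setInt("hbase.client.pause", 200);          // backoff between retries, ms
conf.setInt("zookeeper.session.timeout", 10000); // client ZK session timeout, ms
conf.setInt("zookeeper.recovery.retry", 1);      // ZK operation retry count
{code}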


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira