[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6550:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


 ReplicationSink replicates the WALEdits in the local cluster. It uses the native 
 HBase client to insert the mutations. Sometimes it takes a while to process 
 a batch (possibly due to region splitting, a GC pause, etc.) and it goes 
 through the retry phase. 
 This has two repercussions:
 a) The regionserver handler serving the request (until now, a 
 priority handler) is blocked for this period.
 b) The caller may time out and will retry anyway, but the handler 
 serving the ReplicationSink request is still working.
 Refactor ReplicationSink to have the following features:
 a) Make it more configurable (its own retry limit, 
 connection timeout, etc.)
 b) Add fail-fast behavior so that it bails out if the caller has timed out 
 or any exception occurs while processing the mutation batch.
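
 To make points (a) and (b) concrete, here is a minimal sketch of a retry 
 loop with its own attempt limit and an overall deadline. It is an 
 illustration only and is not taken from any of the attached patches; the 
 class BoundedRetry and its parameters (maxAttempts, deadlineMillis, 
 pauseMillis) are hypothetical names, not HBase API or configuration keys.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: a retry loop with its own attempt
// limit and an overall deadline, so the handler stops working once the
// caller has most likely given up.
final class BoundedRetry {
    static <T> T call(Callable<T> op, int maxAttempts, long deadlineMillis, long pauseMillis)
            throws Exception {
        long start = System.nanoTime();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis >= deadlineMillis) {
                break; // fail fast: the deadline has passed, do not start another attempt
            }
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and retry if attempts remain
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw new Exception("gave up after retries or deadline", last);
    }
}
```

 The deadline would presumably be set somewhat below the caller's RPC 
 timeout, so that the handler is released before the caller retries.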

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6550:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch




[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6550:
-

Fix Version/s: 0.94.2
   0.96.0

Let's get this into 0.94 as well.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-15 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Status: Patch Available  (was: Open)

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Attachment: HBase-6550.patch

OK, I removed the bailout behavior. Attached is a patch. Replication tests 
pass; I also smoke-tested it on a real cluster.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Attachment: HBase-6550-v3.patch

closing connection and thread pool in separate try-catch blocks.
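
For readers without the patch at hand, a minimal sketch of what closing two 
resources in separate try-catch blocks looks like. This is not the code from 
HBase-6550-v3.patch; the class SinkShutdown and its method are hypothetical. 
The point is that a failure while closing the connection does not prevent 
the thread pool from being shut down.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;

// Hypothetical illustration, not the patch itself.
final class SinkShutdown {
    // Returns how many of the two shutdown steps failed.
    static int stop(Closeable connection, ExecutorService pool) {
        int failures = 0;
        try {
            connection.close();
        } catch (IOException e) {
            failures++; // in real code this would be logged; keep going
        }
        try {
            pool.shutdown(); // still runs even if close() threw above
        } catch (RuntimeException e) {
            failures++;
        }
        return failures;
    }
}
```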

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Attachment: HBase-6550-v4.patch

Sorry, I missed the clone in the last patch.
Included other comments. Thank you all for the feedback.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Attachment: HBase-6550-v1.patch

Attached is a patch to incorporate the suggestions mentioned in the description.

Testing: Jenkins is green; ran replication for a few days (intermittently 
running a YCSB write load on the master), in tandem with HBase-6165.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: HBase-6550-v1.patch






[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6550:
-

Attachment: 6550-havealook.txt

So you are guarding against the client (i.e. the ReplicationSource) going away. I 
see. Although I would not expect that to be a common problem once the 
timeouts here are short enough.

Just because it is easier to make a patch than to describe what I mean, I made 
one.

I am *not* saying you have to do it this way, just showing what I mean. Have a look 
(the Executor probably needs tweaking and the DaemonThreadFactory should go 
into a common class, but you get the gist).
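
The contents of 6550-havealook.txt are not reproduced in this thread. As a 
rough sketch of the general idea only (an Executor backed by daemon threads, 
with the handler waiting on the batch for a bounded time), something like 
the following could be used. The names DaemonThreadFactory and TimedBatch 
here are assumptions for illustration, written against java.util.concurrent 
only, and may differ from what the attachment actually does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: creates named daemon threads so the pool never blocks JVM exit.
final class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + counter.incrementAndGet());
        t.setDaemon(true);
        return t;
    }
}

// Hypothetical: runs a batch on the pool and waits at most timeoutMillis for it.
final class TimedBatch {
    static <T> T run(ExecutorService pool, Callable<T> batch, long timeoutMillis)
            throws Exception {
        Future<T> future = pool.submit(batch);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the handler is freed now
            throw e;
        }
    }
}
```

A pool for this would be built with, for example, 
Executors.newFixedThreadPool(n, new DaemonThreadFactory("sink-")).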

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch

