[jira] [Created] (HBASE-7632) fail to create ReplicationSource if ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist but fails in createAndWatch due to client/shell is don

2013-01-20 Thread Feng Honghua (JIRA)
Feng Honghua created HBASE-7632:
---

 Summary: fail to create ReplicationSource if 
ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist 
but fails in createAndWatch due to client/shell is done creating it then, now 
throws exception and results in addPeer fail
 Key: HBASE-7632
 URL: https://issues.apache.org/jira/browse/HBASE-7632
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.4, 0.94.3, 0.94.2
Reporter: Feng Honghua


ReplicationSource creation fails if ReplicationPeer.startStateTracker calls 
checkExists(peerStateNode), finds that the node does not exist, and then fails 
in createAndWatch because the client/shell has created the node in the meantime. 
The resulting exception makes addPeer fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7632) fail to create ReplicationSource if ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist but fails in createAndWatch due to client/shell is d

2013-01-20 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558517#comment-13558517
 ] 

Feng Honghua commented on HBASE-7632:
-

The root cause: when the client/shell runs add_peer, it first creates the 
peerNode and then creates the peerStateNode. The RS receives the peerNode change 
event and starts addPeer after the client/shell has created the peerNode but 
before it has created the peerStateNode. RS's connectToPeer then watches the 
peerStateNode and creates it if it finds that it does not exist. Because the 
client/shell and the RS are different processes, the RS's check can succeed 
(the node is not there yet) while its create fails, because the client/shell 
has created the peerStateNode in between. No exclusion mechanism is available 
to make the RS's check and create atomic.
This bug can be avoided by merging the peer state into the peerNode: when the RS 
runs addPeer for a newly added peerNode, it reads the peer state from the 
peerNode itself, so the check-then-create problem above does not arise.
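The check-then-create race can be reproduced without a real ZooKeeper ensemble. The sketch below is a hypothetical single-JVM simulation: FakeZk, PeerStateRace and the path /peers/1/peer-state are illustrative names, not HBase or ZooKeeper classes. It contrasts the racy check-then-create with a create that treats "node already exists" as success, which is one common mitigation; the comment itself proposes a different fix (merging the peer state into the peerNode).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a ZooKeeper namespace (NOT the ZooKeeper API):
// create() fails if the node already exists, like a create that hits NodeExists.
final class FakeZk {
    static final class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists: " + path); }
    }

    private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

    boolean exists(String path) { return nodes.containsKey(path); }

    void create(String path, byte[] data) throws NodeExistsException {
        // putIfAbsent is atomic, so exactly one creator wins.
        if (nodes.putIfAbsent(path, data) != null) {
            throw new NodeExistsException(path);
        }
    }
}

final class PeerStateRace {
    static final byte[] ENABLED = "ENABLED".getBytes();

    // Creates the node, treating "already exists" as success.
    static void createIgnoringExists(FakeZk zk, String path) {
        try {
            zk.create(path, ENABLED);
        } catch (FakeZk.NodeExistsException alreadyThere) {
            // Another process created it first; the node exists either way.
        }
    }

    // Racy RS pattern: check, then create. `interleave` runs in the window between
    // the two steps to model the client/shell creating the node at that moment.
    static void checkThenCreate(FakeZk zk, String path, Runnable interleave)
            throws FakeZk.NodeExistsException {
        if (!zk.exists(path)) {
            interleave.run();
            zk.create(path, ENABLED); // throws if the client/shell got there first
        }
    }

    // Tolerant RS pattern: the same interleaving happens first, but the create
    // ignores NodeExists, so the outcome does not depend on who wins the race.
    static void tolerantCreate(FakeZk zk, String path, Runnable interleave) {
        interleave.run();
        createIgnoringExists(zk, path);
    }

    public static void main(String[] args) {
        String path = "/peers/1/peer-state"; // hypothetical path
        FakeZk racy = new FakeZk();
        try {
            checkThenCreate(racy, path, () -> createIgnoringExists(racy, path));
            System.out.println("check-then-create: ok");
        } catch (FakeZk.NodeExistsException e) {
            System.out.println("check-then-create failed: " + e.getMessage());
        }

        FakeZk tolerant = new FakeZk();
        tolerantCreate(tolerant, path, () -> createIgnoringExists(tolerant, path));
        System.out.println("tolerant create, node exists: " + tolerant.exists(path));
    }
}
```

Making the RS's create idempotent removes the dependency on timing; merging the peer state into the peerNode, as proposed in the comment, removes the second node altogether.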

 fail to create ReplicationSource if ReplicationPeer.startStateTracker 
 checkExists(peerStateNode) and find not exist but fails in createAndWatch due 
 to client/shell is done creating it then, now throws exception and results in 
 addPeer fail
 --

 Key: HBASE-7632
 URL: https://issues.apache.org/jira/browse/HBASE-7632
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2, 0.94.3, 0.94.4
Reporter: Feng Honghua
   Original Estimate: 48h
  Remaining Estimate: 48h



[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

2012-12-09 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527752#comment-13527752
 ] 

Feng Honghua commented on HBASE-7280:
-

Thanks, Jean-Daniel.

But even if REPLICATION_SCOPE routing is implemented, I don't think it is as 
flexible as adding a per-peer table/CF configuration. Let me know if I have 
misunderstood how REPLICATION_SCOPE would be used as routing information: edits 
in the master cluster are shipped to all peer clusters whose peer_id is less 
than or equal to the REPLICATION_SCOPE. But what if a newly added peer wants to 
replicate a table/CF with REPLICATION_SCOPE=A and another table/CF with 
REPLICATION_SCOPE=E, but not the tables/CFs with REPLICATION_SCOPE=B/C/D 
(A < B < C < D < E here)? Interpreting REPLICATION_SCOPE as a bit array, with 
each bit standing for a peer_id, has a similar problem: at the very least, we 
would need to change REPLICATION_SCOPE whenever the original value cannot 
satisfy the replication requirements of a later-added peer.
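To make the limitation concrete, here is a small Java model of the threshold interpretation described above. It assumes, hypothetically, that peer ids and scopes are small integers with A..E mapped to 1..5; ScopeThresholdRouting is an illustrative name, not HBase code. It searches for a peer id that would receive exactly scopes A and E and finds none, because any id low enough to receive A also receives B, C and D.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the threshold interpretation: an edit whose table/CF has
// REPLICATION_SCOPE = scope is shipped to every peer with peerId <= scope.
final class ScopeThresholdRouting {
    static boolean shipsTo(int peerId, int scope) {
        return peerId <= scope;
    }

    // Which of the given scope values a peer with this id would receive.
    static Set<Integer> receivedScopes(int peerId, int[] scopes) {
        Set<Integer> received = new TreeSet<>();
        for (int scope : scopes) {
            if (shipsTo(peerId, scope)) {
                received.add(scope);
            }
        }
        return received;
    }

    // Peer ids in [1, maxPeerId] that would receive exactly the wanted scopes.
    static List<Integer> idsReceivingExactly(Set<Integer> wanted, int[] scopes, int maxPeerId) {
        List<Integer> ids = new ArrayList<>();
        for (int peerId = 1; peerId <= maxPeerId; peerId++) {
            if (receivedScopes(peerId, scopes).equals(wanted)) {
                ids.add(peerId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] scopes = {1, 2, 3, 4, 5}; // A=1 < B=2 < C=3 < D=4 < E=5
        Set<Integer> onlyAandE = new TreeSet<>(Arrays.asList(1, 5));
        System.out.println(idsReceivingExactly(onlyAandE, scopes, 10)); // prints []
    }
}
```

A contiguous suffix such as {D, E} is still reachable (peer id 4), which is why the threshold scheme only works when every peer's wishes happen to line up with the scope ordering.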

REPLICATION_SCOPE isn't a rescue here because in many cases the master cluster 
doesn't know, when it creates tables/CFs, exactly which peer clusters will want 
to replicate which of them. By contrast, each peer cluster knows exactly which 
tables/CFs to replicate from the master cluster when it adds itself as a peer. 
By introducing a table/CF list configuration when adding a peer, we don't have 
to figure out in advance which (and how many) peers may replicate a table/CF 
when creating it in the master cluster, and we don't need to change 
REPLICATION_SCOPE later on. ReplicationSourceManager just listens on the peer 
ZK nodes and adds a new ReplicationSource for each new peer with its configured 
table/CF list; that source reads, filters and ships edits of the configured 
tables/CFs to the corresponding peer.

ReplicationSource also needs to listen on its peer's ZK node for table/CF 
configuration changes, which determine which edits are shipped to the peer 
from then on.
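A minimal sketch of the per-peer table/CF filtering proposed above, assuming a configuration of the form table -> set of CFs, where an empty set means all CFs of that table and no configuration at all means every edit is shipped (the default behaviour described in this thread). PeerTableCfFilter, WalEdit and onConfigChanged are hypothetical names; each edit is simplified to a single table/CF pair, and the ZooKeeper watcher that would call onConfigChanged is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-peer table/CF filtering in a ReplicationSource.
// WalEdit and PeerTableCfFilter are illustrative types, not HBase classes.
final class PeerTableCfFilter {
    static final class WalEdit {
        final String table;
        final String family;
        WalEdit(String table, String family) { this.table = table; this.family = family; }
        @Override public String toString() { return table + ":" + family; }
    }

    // table -> allowed CFs; an empty set means every CF of that table.
    // null means no table/CF list is configured, so every edit is shipped.
    private volatile Map<String, Set<String>> tableCfs;

    PeerTableCfFilter(Map<String, Set<String>> tableCfs) { this.tableCfs = tableCfs; }

    // Would be called by a watcher on the peer's ZK node when its configuration
    // changes; batches filtered afterwards use the new configuration.
    void onConfigChanged(Map<String, Set<String>> newTableCfs) { this.tableCfs = newTableCfs; }

    boolean accepts(WalEdit edit) {
        Map<String, Set<String>> cfg = tableCfs; // read the volatile field once per check
        if (cfg == null) return true;
        Set<String> cfs = cfg.get(edit.table);
        return cfs != null && (cfs.isEmpty() || cfs.contains(edit.family));
    }

    // Keeps only the edits this peer is configured to receive.
    List<WalEdit> filter(List<WalEdit> batch) {
        List<WalEdit> toShip = new ArrayList<>();
        for (WalEdit edit : batch) {
            if (accepts(edit)) toShip.add(edit);
        }
        return toShip;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cfg = new HashMap<>();
        cfg.put("t1", Collections.<String>emptySet()); // all CFs of t1
        cfg.put("t2", Collections.singleton("cf1"));   // only t2:cf1
        PeerTableCfFilter filter = new PeerTableCfFilter(cfg);
        List<WalEdit> batch = Arrays.asList(
            new WalEdit("t1", "a"), new WalEdit("t2", "cf1"),
            new WalEdit("t2", "cf2"), new WalEdit("t3", "a"));
        System.out.println(filter.filter(batch)); // prints [t1:a, t2:cf1]

        filter.onConfigChanged(null); // no list configured: ship everything
        System.out.println(filter.filter(batch).size()); // prints 4
    }
}
```

The volatile field lets a watcher thread swap in a new configuration without locking; because accepts reads it once, a single decision never mixes the old and new configurations.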

Any opinion?

 TableNotFoundException thrown in peer cluster will incur endless retry for 
 shipEdits, which in turn block following normal replication
 --

 Key: HBASE-7280
 URL: https://issues.apache.org/jira/browse/HBASE-7280
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
 Fix For: 0.94.4

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h



[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

2012-12-05 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511021#comment-13511021
 ] 

Feng Honghua commented on HBASE-7280:
-

I can understand the intention of the current design. A master cluster may have 
multiple tables with REPLICATION_SCOPE=1, but not every peer cluster wants to 
replicate all of them, and the current design prevents replicating only 
selected tables. In our scenario, I expect the peer cluster (sink) to skip edits 
whose table doesn't exist in the peer cluster and apply only edits for tables 
that do exist there (the ones we really want to replicate). I made a minor 
change in ReplicationSink.java that simply skips edits for tables that don't 
exist in the peer cluster, and the behavior is what we want. Although this 
change doesn't save the unnecessary network bandwidth, it at least doesn't 
block normal replication.
The current per-cluster replication granularity seems too coarse for many 
real-world scenarios. In my opinion, adding a table or column-family list 
configuration for a peer when adding it is more reasonable.
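The sink-side workaround described above could look roughly like the following. This is a hypothetical sketch, not the actual ReplicationSink.java change: SinkEdit, applicable and the tableExists predicate are illustrative, and a real sink would check table existence against the local cluster. As the comment notes, the skipped edits have still crossed the network; only the blocking is avoided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the sink-side workaround: group incoming edits by table and
// keep only tables that exist locally, instead of failing the whole batch with
// TableNotFoundException. SinkEdit and the tableExists predicate are illustrative.
final class SinkSkipMissingTables {
    static final class SinkEdit {
        final String table;
        final String row;
        SinkEdit(String table, String row) { this.table = table; this.row = row; }
        @Override public String toString() { return row; }
    }

    // Returns the edits to apply, grouped by table; tables for which tableExists is
    // false are dropped. tableExists is queried once per distinct table, not per edit.
    static Map<String, List<SinkEdit>> applicable(List<SinkEdit> edits, Predicate<String> tableExists) {
        Map<String, List<SinkEdit>> byTable = new LinkedHashMap<>();
        for (SinkEdit edit : edits) {
            byTable.computeIfAbsent(edit.table, t -> new ArrayList<>()).add(edit);
        }
        byTable.keySet().removeIf(table -> !tableExists.test(table));
        return byTable;
    }

    public static void main(String[] args) {
        Set<String> localTables = new HashSet<>(Collections.singletonList("t1"));
        List<SinkEdit> shipped = Arrays.asList(
            new SinkEdit("t1", "r1"), new SinkEdit("t2", "r2"), new SinkEdit("t1", "r3"));
        System.out.println(applicable(shipped, localTables::contains)); // prints {t1=[r1, r3]}
    }
}
```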

 TableNotFoundException thrown in peer cluster will incur endless retry for 
 shipEdits, which in turn block following normal replication
 --

 Key: HBASE-7280
 URL: https://issues.apache.org/jira/browse/HBASE-7280
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
 Fix For: 0.94.4

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h



[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

2012-12-05 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511076#comment-13511076
 ] 

Feng Honghua commented on HBASE-7280:
-

Yes, that's what I hope for from finer-grained cluster replication. With such a 
design, by default (without any table/CF configuration) a peer receives all 
edits from the master cluster. In real-world scenarios we may have a master 
cluster and a backup cluster that needs a full copy of the master cluster and 
therefore receives all edits, while at the same time there may be some 
experimental/downstream clusters that need only a certain table, or even only 
some CFs of a table, from the master cluster. Making the table/CF list 
configurable per peer enables such scenarios.

ReplicationSource needs to parse the peer's table/CF configuration on creation 
and filter edits while reading the HLog files, to determine which edits need to 
be shipped to the corresponding peer. It looks like no further change is needed 
on the peer side (ReplicationSink), right?
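One way a ReplicationSource could parse a peer's table/CF configuration on creation is sketched below. The string format (tables separated by ';', each optionally followed by ':' and a comma-separated CF list, e.g. "t1; t2:cf1,cf2") and the class name PeerTableCfParser are assumptions for illustration only. A null result stands for "no configuration", in which case the peer receives all edits.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical parser for a peer's table/CF configuration. The string format is an
// assumption: tables separated by ';', each optionally followed by ':' and a
// comma-separated CF list, e.g. "t1; t2:cf1,cf2".
final class PeerTableCfParser {
    // Returns table -> CFs (empty set = all CFs of that table), or null when no
    // configuration is given, meaning the peer receives all edits.
    static Map<String, Set<String>> parse(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return null;
        }
        Map<String, Set<String>> tableCfs = new LinkedHashMap<>();
        for (String part : spec.split(";")) {
            String item = part.trim();
            if (item.isEmpty()) continue;
            int colon = item.indexOf(':');
            String table = (colon < 0 ? item : item.substring(0, colon)).trim();
            if (table.isEmpty()) continue;
            Set<String> cfs = tableCfs.computeIfAbsent(table, t -> new LinkedHashSet<>());
            if (colon >= 0) {
                for (String cf : item.substring(colon + 1).split(",")) {
                    if (!cf.trim().isEmpty()) cfs.add(cf.trim());
                }
            }
        }
        return tableCfs;
    }

    public static void main(String[] args) {
        System.out.println(parse("t1; t2:cf1,cf2")); // prints {t1=[], t2=[cf1, cf2]}
        System.out.println(parse(null));             // prints null
    }
}
```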

Yes, my current change in ReplicationSink doesn't avoid shipping unnecessary 
edits to peers, but it's enough to unblock us. A better place for this is 
ReplicationSource, where unnecessary edits can be filtered out before they are 
shipped to the peer cluster, by checking for each edit whether its table exists 
in the peer cluster.

 TableNotFoundException thrown in peer cluster will incur endless retry for 
 shipEdits, which in turn block following normal replication
 --

 Key: HBASE-7280
 URL: https://issues.apache.org/jira/browse/HBASE-7280
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
 Fix For: 0.94.4

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h



[jira] [Created] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

2012-12-04 Thread Feng Honghua (JIRA)
Feng Honghua created HBASE-7280:
---

 Summary: TableNotFoundException thrown in peer cluster will incur 
endless retry for shipEdits, which in turn block following normal replication
 Key: HBASE-7280
 URL: https://issues.apache.org/jira/browse/HBASE-7280
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
 Fix For: 0.94.4


In cluster replication, suppose the master cluster has two tables whose column 
families are declared with replication scope = 1, and a peer cluster is added 
that has only one table with the same name as in the master cluster. The 
ReplicationSource for this peer (a thread in the master cluster) ships edits 
(logs) for both tables to the peer. The peer fails to apply the edits with a 
TableNotFoundException, and this exception is also returned to the original 
shipper (the ReplicationSource in the master cluster). The shipper then falls 
into an endless retry of the failed edits without proceeding to read the 
remaining (newer) log files and ship the following edits, which may include the 
normal, expected edits for the registered table. The symptom is that the 
TableNotFoundException causes endless retries and blocks normal table 
replication.



[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-7226:


Status: Patch Available  (was: Open)

 HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
 ---

 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2

   Original Estimate: 10m
  Remaining Estimate: 10m



[jira] [Created] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)
Feng Honghua created HBASE-7226:
---

 Summary: HRegion.checkAndMutate uses incorrect comparison result 
for <, <=, > and >=
 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2


In HRegion.checkAndMutate, incorrect comparison results are used for <, <=, > 
and >=, as shown below:

  switch (compareOp) {
  case LESS:
    matches = compareResult <= 0;  // should be '<' here
    break;
  case LESS_OR_EQUAL:
    matches = compareResult < 0;  // should be '<=' here
    break;
  case EQUAL:
    matches = compareResult == 0;
    break;
  case NOT_EQUAL:
    matches = compareResult != 0;
    break;
  case GREATER_OR_EQUAL:
    matches = compareResult > 0;  // should be '>=' here
    break;
  case GREATER:
    matches = compareResult >= 0;  // should be '>' here
    break;
  }
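For reference, here is a standalone Java sketch of the comparison mapping with the fixes indicated in the inline comments of the snippet applied. CheckAndMutateMatch and its local CompareOp enum are hypothetical stand-ins mirroring the six constant names in the snippet, not the HBase classes. compareResult is assumed to be a compareTo-style value (negative, zero or positive); the sketch does not model which operand HRegion compares against which.

```java
// Hypothetical standalone version of the corrected mapping; CompareOp is a local enum
// mirroring the constant names in the snippet, not the HBase class itself.
final class CheckAndMutateMatch {
    enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // compareResult is assumed to be a compareTo-style value: negative, zero, or positive.
    static boolean matches(CompareOp compareOp, int compareResult) {
        switch (compareOp) {
            case LESS:             return compareResult < 0;
            case LESS_OR_EQUAL:    return compareResult <= 0;
            case EQUAL:            return compareResult == 0;
            case NOT_EQUAL:        return compareResult != 0;
            case GREATER_OR_EQUAL: return compareResult >= 0;
            case GREATER:          return compareResult > 0;
            default:               throw new IllegalArgumentException("Unknown op: " + compareOp);
        }
    }

    public static void main(String[] args) {
        // Print the truth table for compareResult = -1, 0, 1.
        for (CompareOp op : CompareOp.values()) {
            StringBuilder row = new StringBuilder(op.name());
            for (int r = -1; r <= 1; r++) {
                row.append(' ').append(matches(op, r));
            }
            System.out.println(row);
        }
    }
}
```

Returning directly from each case also removes the need for the mutable matches variable and the break statements, which is where a copy-paste slip like this one tends to hide.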



[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-7226:


Attachment: checkAndMutate.patch

 HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
 ---

 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2

 Attachments: checkAndMutate.patch

   Original Estimate: 10m
  Remaining Estimate: 10m



[jira] [Commented] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505236#comment-13505236
 ] 

Feng Honghua commented on HBASE-7226:
-

The same typo bug exists in trunk as well.

 HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
 ---

 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2

 Attachments: checkAndMutate.patch

   Original Estimate: 10m
  Remaining Estimate: 10m



[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-7226:


Attachment: (was: checkAndMutate.patch)

 HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
 ---

 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2

 Attachments: HRegion_HBASE_7226_0.94.2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m



[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=

2012-11-27 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-7226:


Attachment: HRegion_HBASE_7226_0.94.2.patch

 HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
 ---

 Key: HBASE-7226
 URL: https://issues.apache.org/jira/browse/HBASE-7226
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.2
 Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
 Fix For: 0.94.2

 Attachments: HRegion_HBASE_7226_0.94.2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m


