[jira] [Created] (HBASE-7632) fail to create ReplicationSource if ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist but fails in createAndWatch due to client/shell is don
Feng Honghua created HBASE-7632:
-----------------------------------

Summary: fail to create ReplicationSource if ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist but fails in createAndWatch due to client/shell is done creating it then, now throws exception and results in addPeer fail
Key: HBASE-7632
URL: https://issues.apache.org/jira/browse/HBASE-7632
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.94.4, 0.94.3, 0.94.2
Reporter: Feng Honghua

Creating the ReplicationSource fails when ReplicationPeer.startStateTracker calls checkExists(peerStateNode), finds that the node does not exist, and then fails in createAndWatch because the client/shell has finished creating the node in the meantime. The resulting exception makes addPeer fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7632) fail to create ReplicationSource if ReplicationPeer.startStateTracker checkExists(peerStateNode) and find not exist but fails in createAndWatch due to client/shell is d
[ https://issues.apache.org/jira/browse/HBASE-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558517#comment-13558517 ]

Feng Honghua commented on HBASE-7632:
-------------------------------------

The root cause is the order in which nodes are created during add_peer. The client/shell first creates the peerNode and then creates the peerStateNode. The RS receives the peerNode change event and begins addPeer in the window after the client/shell has created the peerNode but before it has created the peerStateNode. The RS's connectToPeer listens on the peerStateNode and creates it if it finds that the node does not exist. Because the client/shell and the RS are two different processes, the RS's check can succeed (node not found) while its create fails, because the client/shell has created the peerStateNode in between. No exclusion mechanism is available to make the RS's check-and-create atomic.

This bug can be avoided by merging the peerState info into the peerNode: when the RS runs addPeer for a newly added peerNode, it can read the peerState info from the peerNode itself, so the check-and-create problem above does not arise.

Key: HBASE-7632
Original Estimate: 48h
Remaining Estimate: 48h
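The check-then-create race described in this comment can be illustrated without ZooKeeper. The sketch below is not HBase or ZooKeeper code: `PeerStateStore`, its methods, and the `/peers/1/peer-state` path used in the usage note are hypothetical stand-ins backed by an in-memory map. It shows one possible mitigation, different from the merge of peerState into peerNode that the comment proposes: attempt the create unconditionally and treat "node already exists" as success, since the caller only needs the node to exist.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a ZooKeeper-like store (hypothetical, for illustration only). */
final class PeerStateStore {
    /** Thrown when create() targets a path that already exists. */
    static final class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) {
            super("node already exists: " + path);
        }
    }

    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    /** Atomic create: fails if another caller created the node first. */
    void create(String path, String data) {
        if (nodes.putIfAbsent(path, data) != null) {
            throw new NodeExistsException(path);
        }
    }

    String read(String path) {
        return nodes.get(path);
    }

    /**
     * Ensures the node exists and returns its data. Losing the creation race
     * is not an error: the data written by the winner is returned instead.
     */
    String ensureExists(String path, String defaultData) {
        try {
            create(path, defaultData);
        } catch (NodeExistsException alreadyCreated) {
            // Another process (for example the shell) created the node first.
        }
        return read(path);
    }
}
```

With this shape, a separate existence check before the create is unnecessary: if the shell creates `/peers/1/peer-state` between the check and the create, `ensureExists` still returns the shell's value instead of throwing.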
[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
[ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527752#comment-13527752 ]

Feng Honghua commented on HBASE-7280:
-------------------------------------

Thanks, Jean-Daniel.

But even once REPLICATION_SCOPE is implemented, I don't think it is as flexible as adding per-peer table/CF configuration. Let me know if I misunderstand how REPLICATION_SCOPE is used as routing information: edits in the master cluster are shipped to all peer clusters whose peer_id is less than or equal to the REPLICATION_SCOPE. But what if a newly added peer wants to replicate one table/CF with REPLICATION_SCOPE=A and another table/CF with REPLICATION_SCOPE=E, but does not want the tables/CFs with REPLICATION_SCOPE=B/C/D (A < B < C < D < E here)?

Interpreting REPLICATION_SCOPE as a bit array and treating each bit as a peer_id has a similar problem. (At the least, we would need to change REPLICATION_SCOPE whenever the original value cannot satisfy a later-added peer's replication requirement.)

REPLICATION_SCOPE is not a rescue here because in many cases the master cluster does not know, at the time it creates tables/CFs, exactly which peer clusters will want to replicate which tables/CFs from it. In contrast, each peer cluster knows exactly which tables/CFs it wants to replicate from the master cluster when it adds itself as a peer.

By introducing a table/CF list configuration when adding a peer, we no longer need to figure out in advance which (or how many) peers can replicate a table/CF when creating it in the master cluster, and we don't need to change REPLICATION_SCOPE later on. ReplicationSourceManager just listens on the peer ZK nodes and adds a new ReplicationSource for the new peer with its configured table/CF list; that ReplicationSource reads, filters and ships the edits of the configured tables/CFs to the corresponding peer. ReplicationSource also needs to listen on its peer ZK node for table/CF configuration changes, which in turn determine which edits are shipped to the peer from then on.

Any opinion?

Key: HBASE-7280
Original Estimate: 0.5h
Remaining Estimate: 0.5h
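The per-peer filtering proposed in this comment could look roughly like the sketch below. It is an illustration only, not HBase code: `PeerTableCfFilter`, `Edit`, and the map-based configuration are hypothetical names and shapes. The assumed semantics are that a missing or empty configuration means "ship everything", and that an empty family set for a table means "all column families of that table".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical source-side filter driven by a per-peer table/CF configuration. */
final class PeerTableCfFilter {
    /** One replicated change, reduced to the fields needed for routing. */
    static final class Edit {
        final String table;
        final String family;

        Edit(String table, String family) {
            this.table = table;
            this.family = family;
        }
    }

    /**
     * Returns the edits that should be shipped to a peer.
     *
     * @param tableCfs table name mapped to the wanted column families;
     *                 null or empty means the peer wants every edit
     */
    static List<Edit> filter(List<Edit> edits, Map<String, Set<String>> tableCfs) {
        if (tableCfs == null || tableCfs.isEmpty()) {
            return new ArrayList<>(edits); // no configuration: ship everything
        }
        List<Edit> shipped = new ArrayList<>();
        for (Edit edit : edits) {
            Set<String> families = tableCfs.get(edit.table);
            if (families == null) {
                continue; // table not configured for this peer
            }
            if (families.isEmpty() || families.contains(edit.family)) {
                shipped.add(edit);
            }
        }
        return shipped;
    }
}
```

Because the configuration belongs to the peer rather than to the table, adding a peer with different needs changes only that peer's map and leaves REPLICATION_SCOPE on the master cluster's tables untouched.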
[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
[ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511021#comment-13511021 ]

Feng Honghua commented on HBASE-7280:
-------------------------------------

I can understand the motivation behind the current design. A master cluster may have multiple tables with REPLICATION_SCOPE=1, but not all peer clusters want to replicate all of these tables, and the current design does not allow replicating only selected table(s).

In our scenario, I expect the peer cluster (sink) to skip the edits whose table does not exist in the peer cluster and to apply only the edits whose table(s) exist there (the ones we really want to replicate). I made a minor change in ReplicationSink.java that simply skips edits for tables that do not exist in the peer cluster, and the behavior is what we want. Although this change does not save the wasted network bandwidth, at least it no longer blocks normal replication.

The current per-cluster granularity of replication seems too coarse for many real-world scenarios. In my opinion, letting a peer be configured with a table or column-family list when it is added is more reasonable.
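The sink-side workaround described in this comment amounts to partitioning incoming edits by whether their table exists locally, instead of failing the whole batch. The sketch below illustrates that idea only. It is not the actual change to ReplicationSink.java, and `SinkEditSkipper`, `Result`, and the representation of an edit by its table name are assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sink-side handling: skip edits for tables the peer does not have. */
final class SinkEditSkipper {
    /** Outcome of processing one shipped batch. */
    static final class Result {
        final List<String> applied = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /**
     * @param editTables     table name of each incoming edit, in shipping order
     * @param existingTables tables that exist in the peer (sink) cluster
     */
    static Result apply(List<String> editTables, Set<String> existingTables) {
        Result result = new Result();
        for (String table : editTables) {
            if (existingTables.contains(table)) {
                result.applied.add(table); // a real sink would write the edit here
            } else {
                result.skipped.add(table); // no exception, so the shipper can move on
            }
        }
        return result;
    }
}
```

As the comment notes, this still spends network bandwidth on edits that are then dropped; it only keeps the shipper from retrying forever.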
[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
[ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511076#comment-13511076 ]

Feng Honghua commented on HBASE-7280:
-------------------------------------

Yes, that's what I hope for from finer-grained cluster replication. In such a design, a peer receives all edits from the master cluster by default (that is, without any table/CF configuration). In a real-world scenario we may have a master cluster and a backup cluster that needs a full copy of the master cluster and therefore receives all edits, while at the same time some experimental or downstream clusters need only a certain table, or even only some CFs of a table, from the master cluster. Making the peer's table/CF list configurable enables such scenarios.

ReplicationSource needs to parse the peer's table/CF configuration on creation and filter the edits while reading the HLog files to determine which edits need to be shipped to the corresponding peer. It looks like no further change is needed on the peer side (ReplicationSink), right?

Yes, my current change in ReplicationSink doesn't avoid shipping the unnecessary edits to peers, but it is enough to unblock us. A wiser treatment would be in ReplicationSource, where we can filter out unnecessary edits before shipping them to the peer cluster by checking, for each edit, whether its table exists in the peer cluster.
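Parsing a peer's table/CF configuration on creation, with "no configuration means all edits" as the default, could be sketched as follows. The comment does not define a configuration format, so the `table:cf1,cf2;otherTable` string syntax used here is purely an assumption, as are the `PeerTableCfSpec` class and its `parse` method.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical parser for a per-peer table/CF specification string. */
final class PeerTableCfSpec {
    /**
     * Parses entries separated by ';', each either "table" or "table:cf1,cf2".
     * An empty result means no restriction (the peer receives all edits);
     * an empty family set means all column families of that table.
     */
    static Map<String, Set<String>> parse(String spec) {
        Map<String, Set<String>> tableCfs = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return tableCfs; // default: no table/CF configuration
        }
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators
            }
            String[] parts = trimmed.split(":", 2);
            Set<String> families = new LinkedHashSet<>();
            if (parts.length == 2) {
                for (String family : parts[1].split(",")) {
                    String name = family.trim();
                    if (!name.isEmpty()) {
                        families.add(name);
                    }
                }
            }
            tableCfs.put(parts[0].trim(), families);
        }
        return tableCfs;
    }
}
```

A backup cluster would register with no specification and receive everything, while an experimental cluster might register with something like `events:raw` to receive a single column family.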
[jira] [Created] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
Feng Honghua created HBASE-7280:
-----------------------------------

Summary: TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
Key: HBASE-7280
URL: https://issues.apache.org/jira/browse/HBASE-7280
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
Fix For: 0.94.4

In cluster replication, suppose the master cluster has 2 tables whose column families are declared with replication scope = 1, and we add a peer cluster that has only 1 table with the same name as in the master cluster. The ReplicationSource (a thread in the master cluster) for this peer ships the edits (logs) for both tables to the peer. The peer fails to apply the edits because of TableNotFoundException, and this exception is also returned to the original shipper (the ReplicationSource in the master cluster). The shipper then falls into an endless retry of shipping the failed edits, without proceeding to read the remaining (newer) log files or to ship the following edits (which may be normal, expected edits for the registered table).

The symptom is that the TableNotFoundException causes endless retries and blocks normal table replication.
[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-7226:
--------------------------------
Status: Patch Available (was: Open)

Key: HBASE-7226
Original Estimate: 10m
Remaining Estimate: 10m
[jira] [Created] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
Feng Honghua created HBASE-7226:
-----------------------------------

Summary: HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
Key: HBASE-7226
URL: https://issues.apache.org/jira/browse/HBASE-7226
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.94.2
Environment: 0.94.2
Reporter: Feng Honghua
Priority: Minor
Fix For: 0.94.2

in HRegion.checkAndMutate, incorrect comparison results are used for <, <=, > and >=, as below:

    switch (compareOp) {
    case LESS:
      matches = compareResult <= 0;  // should be '<' here
      break;
    case LESS_OR_EQUAL:
      matches = compareResult < 0;   // should be '<=' here
      break;
    case EQUAL:
      matches = compareResult == 0;
      break;
    case NOT_EQUAL:
      matches = compareResult != 0;
      break;
    case GREATER_OR_EQUAL:
      matches = compareResult > 0;   // should be '>=' here
      break;
    case GREATER:
      matches = compareResult >= 0;  // should be '>' here
      break;
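For reference, the corrections requested in the report can be written as a small standalone method. This is a sketch, not the attached patch: `CompareOpMatcher` and its nested enum are hypothetical names that mirror the operators quoted above, and `compareResult` is taken as given because the report does not state which operand order produced it.

```java
/** Standalone illustration of the operator mapping the report asks for. */
final class CompareOpMatcher {
    enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    /**
     * @param compareResult negative, zero, or positive result of a comparison,
     *                      following the usual compareTo convention
     */
    static boolean matches(CompareOp compareOp, int compareResult) {
        switch (compareOp) {
            case LESS:
                return compareResult < 0;
            case LESS_OR_EQUAL:
                return compareResult <= 0;
            case EQUAL:
                return compareResult == 0;
            case NOT_EQUAL:
                return compareResult != 0;
            case GREATER_OR_EQUAL:
                return compareResult >= 0;
            case GREATER:
                return compareResult > 0;
            default:
                throw new IllegalArgumentException("Unknown operator " + compareOp);
        }
    }
}
```

The boundary case `compareResult == 0` is where the reported code and this mapping differ: here LESS and GREATER reject it, while LESS_OR_EQUAL and GREATER_OR_EQUAL accept it.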
[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-7226:
--------------------------------
Attachment: checkAndMutate.patch
[jira] [Commented] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505236#comment-13505236 ]

Feng Honghua commented on HBASE-7226:
-------------------------------------

The same typo bug exists in trunk as well.
[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-7226:
--------------------------------
Attachment: (was: checkAndMutate.patch)
[jira] [Updated] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-7226:
--------------------------------
Attachment: HRegion_HBASE_7226_0.94.2.patch