Re: Write retry issues with ZooKeeperClient
The current retry logic is transparent to the type of the operation: it sits down the stack, where there is no understanding of the operation type. If we want to differentiate between idempotent and non-idempotent operations, we may need to insert that logic into the retry loop. The other option is to handle retries at a higher level, where we have operational context. Both seem a little messy.

It looks like we can take care of the ephemeral node issue as explained above. For ledger nodes and metadata updates, we need to enumerate the possible failures and the ramifications of getting NodeExists; then we can figure out how to handle them.

On Tue, Jun 27, 2017 at 4:08 PM, Sijie Guo wrote:
> +1 I agree with not retrying non-idempotent writes.
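A minimal sketch of the first option above -- teaching the low-level retry loop which operations are safe to repeat -- assuming a hypothetical operation interface (none of these names are from the actual ZooKeeperClient code):

    import org.apache.zookeeper.KeeperException;

    // Hypothetical sketch: tag each ZK operation so the low-level retry
    // loop knows whether a blind retry is safe.
    interface ZkOperation<T> {
        T execute() throws KeeperException, InterruptedException;
        boolean isIdempotent(); // reads and unconditional creates/deletes: true
    }

    final class RetryingExecutor {
        private final int maxRetries;

        RetryingExecutor(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        <T> T run(ZkOperation<T> op) throws KeeperException, InterruptedException {
            for (int attempt = 0; ; attempt++) {
                try {
                    return op.execute();
                } catch (KeeperException.ConnectionLossException e) {
                    // ConnectionLoss is ambiguous: the write may have been applied.
                    // Retry blindly only when the operation is idempotent;
                    // otherwise propagate and let the caller disambiguate.
                    if (!op.isIdempotent() || attempt >= maxRetries) {
                        throw e;
                    }
                }
            }
        }
    }

The alternative (retrying at the higher level) would keep ZooKeeperClient simple but spread ConnectionLoss handling across every metadata call site, which is the trade-off being weighed above.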
Re: Write retry issues with ZooKeeperClient
On Tue, Jun 27, 2017 at 4:27 PM, Sam Just wrote:
> Should EXPIRED be considered a recoverable error for retry purposes?
> Retrying in that case would mean that operations which might have been
> submitted under the assumption that ephemeral nodes were still present
> would be retried after the ephemeral nodes disappeared. Don't all users
> have special handling for EXPIRED anyway?

If a request fails with EXPIRED, that means the request was never attempted, so it is a recoverable error to retry. It also means the session that the request used has expired, and all ephemeral nodes associated with that session will be deleted, so it is a safe situation for retries.
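A small sketch of the classification being drawn here, under the assumption stated above that a session-expired failure means the request never reached the server (hypothetical helper, not BookKeeper code):

    import org.apache.zookeeper.KeeperException;

    final class RetrySafety {
        // SESSIONEXPIRED: the request was never attempted and the session's
        // ephemeral nodes are already gone, so resubmitting on a fresh
        // session cannot duplicate the original write.
        // CONNECTIONLOSS: the write may or may not have been applied, so
        // only idempotent operations are safe to retry blindly.
        static boolean safeToRetry(KeeperException e, boolean idempotent) {
            switch (e.code()) {
                case SESSIONEXPIRED:
                    return true;
                case CONNECTIONLOSS:
                    return idempotent;
                default:
                    return false; // a definite server-side answer; don't retry
            }
        }
    }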
Re: Write retry issues with ZooKeeperClient
Should EXPIRED be considered a recoverable error for retry purposes? Retrying in that case would mean that operations which might have been submitted under the assumption that ephemeral nodes were still present would be retried after those ephemeral nodes disappeared. Don't all users have special handling for EXPIRED anyway?

-Sam

On Tue, Jun 27, 2017 at 4:08 PM, Sijie Guo wrote:
> +1 I agree with not retrying non-idempotent writes.
Re: Write retry issues with ZooKeeperClient
On Tue, Jun 27, 2017 at 2:36 PM, Sam Just wrote:
> That's the part I'm unsure of. I know it has handling for BadVersion; I
> don't know whether it's still correct if you get a BadVersion even though
> your write actually succeeded. I need to look into that more. Also, I'm
> not at all suggesting we handle this in the retry loop; I'm suggesting
> that for non-idempotent writes, we should not retry in ZooKeeperClient
> and instead propagate the ConnectionLoss error and let the caller deal
> with it.

+1 I agree with not retrying non-idempotent writes.
Re: Write retry issues with ZooKeeperClient
On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo wrote:
> for an ephemeral znode, you don't have to add an extra payload.
>
> in the retry logic,
>
> - catch the NodeExists exception
> - call exists to check the znode; you can get the Stat
> (https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner())
> of this znode
> - compare Stat#getEphemeralOwner with the client's current session id; if
> they match, the node was created by this session, otherwise it was
> created by another session

Ah, I'll do that then, much easier.

> I don't think it can be addressed by a version vector; there is no public
> *history* of who changed which version.
> But the question is why you need to address #writeLedgerMetadata inside
> the ZooKeeperClient retry loop. There is already logic in
> #writeLedgerMetadata to resolve conflicts due to BadVersion: the
> LedgerHandle will re-read the ledger metadata when it encounters version
> conflicts.

That's the part I'm unsure of. I know it has handling for BadVersion; I don't know whether it's still correct if you get a BadVersion even though your write actually succeeded. I need to look into that more. Also, I'm not at all suggesting we handle this in the retry loop; I'm suggesting that for non-idempotent writes, we should not retry in ZooKeeperClient and instead propagate the ConnectionLoss error and let the caller deal with it.

-Sam
Re: Write retry issues with ZooKeeperClient
On Tue, Jun 27, 2017 at 10:18 AM, Sam Just wrote:
> Users like the above are pretty easy to fix. One option would be to
> simply include with the write a payload containing a nonce for the
> client. Upon a ConnectionLoss event, we read the node and determine
> whether we "won". I think ledger creation probably falls into the same
> category; the metadata could include an identifier for the creator.

For an ephemeral znode, you don't have to add an extra payload. In the retry logic:

- catch the NodeExists exception
- call exists to check the znode; you can get the Stat (https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()) of this znode
- compare Stat#getEphemeralOwner with the client's current session id; if they match, the node was created by this session, otherwise it was created by another session (see the sketch after this message)

> I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata. It
> relies on setData success/BadVersion atomicity to ensure consistent rmw.
> It's certainly solvable by including a version vector, but I think
> there's probably a simpler way to do it. I'm not sure under what
> circumstances there even *can* be racing writes to the metadata -- is the
> only case that matters concurrent updates between the writer and a single
> replication worker?

I don't think it can be addressed by a version vector; there is no public *history* of who changed which version. But the question is why you need to address #writeLedgerMetadata inside the ZooKeeperClient retry loop. There is already logic in #writeLedgerMetadata to resolve conflicts due to BadVersion: the LedgerHandle will re-read the ledger metadata when it encounters version conflicts.
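A sketch of the ephemeral-owner check described above (the class and method names are hypothetical; the ZooKeeper calls are the real client API):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    final class EphemeralLock {
        // After a retried ephemeral create surfaces NodeExists, compare the
        // node's ephemeral owner with our own session id to distinguish
        // "our earlier attempt already created it" from "another session
        // owns it".
        static boolean acquiredByUs(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // The node may vanish between create and exists; treat that
                // as not acquired. getEphemeralOwner() returns the owning
                // session id (0 for a persistent node).
                Stat stat = zk.exists(path, false);
                return stat != null && stat.getEphemeralOwner() == zk.getSessionId();
            }
        }
    }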
Re: Write retry issues with ZooKeeperClient
JV: What do you mean by "May not be perfect for negative testing"?

I don't think there's anything inevitable about this particular class of behavior. ZK could have chosen to avoid this problem entirely by doing duplicate op detection server-side with a per-session transaction log.

Since it doesn't, we'll need to solve it ourselves. ZkLedgerUnderreplicationManager relies on either getting success or NodeExists on the ephemeral lock node to determine whether a particular ReplicationWorker is responsible for replicating a particular ledger. I haven't reproduced this one, but it seems to me that two workers could both try to replicate the same ledger and *both get NodeExists* on the lock node. This would leave the ledger locked for replication until whichever one actually wrote the node restarts.

Users like the above are pretty easy to fix. One option would be to simply include with the write a payload containing a nonce for the client. Upon a ConnectionLoss event, we read the node and determine whether we "won" (a sketch follows below). I think ledger creation probably falls into the same category; the metadata could include an identifier for the creator.

I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata. It relies on setData success/BadVersion atomicity to ensure a consistent read-modify-write. It's certainly solvable by including a version vector, but I think there's probably a simpler way to do it. I'm not sure under what circumstances there even *can* be racing writes to the metadata -- is the only case that matters concurrent updates between the writer and a single replication worker?

The outline is probably:
1) Add a way to do retries automatically on *unconditional* creates and deletes (there are some users that don't care).
2) Audit and modify existing users to either use the methods introduced in 1 or handle the ConnectionLoss events explicitly.
3) Switch ZooKeeperClient to not retry conditional writes, instead propagating the ConnectionLoss event; that should make its semantics match the vanilla ZK client.

-Sam
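To make the nonce idea concrete, a sketch for persistent znodes, where there is no ephemeral owner to check (hypothetical helper, not existing BookKeeper code):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.UUID;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    final class NoncedCreate {
        // Write a per-attempt nonce as the payload. If a ConnectionLoss-
        // driven retry surfaces NodeExists, reading the payload back tells
        // us whether our earlier attempt "won" or another client beat us.
        static boolean createdByUs(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            byte[] nonce = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
            try {
                zk.create(path, nonce, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // Same nonce => our own successful-but-unacknowledged write;
                // different nonce => someone else created the node first.
                byte[] existing = zk.getData(path, false, null);
                return Arrays.equals(existing, nonce);
            }
        }
    }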
Re: Write retry issues with ZooKeeperClient
I agree that in the current callback model we only propagate an error code to the outer client, so we lose the information about the detailed cause.

But I think we also tried to map some zk error codes to bk error codes:

NoNode -> NoLedger
NodeExists -> LedgerExists
··· (other codes)

Also, we tried to hide the zk metadata implementation behind interfaces, so some of the errors should be handled by the zk ledger manager implementation before propagating to the client.

Sijie

On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <enrico.olive...@diennea.com> wrote:
> Many times I wanted to have some "exception cause" on BKException,
> especially for ZK issues. The way we use only int error codes hides the
> root causes of the error.
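Roughly the shape of that mapping, sketched with placeholder BK-side constants (the real values live in BookKeeper's BKException codes; these are illustrative only):

    import org.apache.zookeeper.KeeperException;

    final class ZkToBkCodes {
        static final int NO_SUCH_LEDGER = -101; // placeholder, not the real code
        static final int LEDGER_EXISTS  = -102; // placeholder, not the real code
        static final int ZK_ERROR       = -103; // placeholder catch-all

        static int toBkErrorCode(KeeperException.Code zkCode) {
            switch (zkCode) {
                case NONODE:
                    return NO_SUCH_LEDGER; // NoNode -> NoLedger
                case NODEEXISTS:
                    return LEDGER_EXISTS;  // NodeExists -> LedgerExists
                default:
                    return ZK_ERROR;       // other codes: generic metadata error
            }
        }
    }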
Re: Write retry issues with ZooKeeperClient
I agree. For create ledger, it is easy to handle: the zk ledger manager can catch such an exception and recreate with a new ledger id. For create ledger adv, the ledger id is supplied by the application, so the application is responsible for handling the LedgerExists exception.

Thoughts?

On Jun 27, 2017 6:45 AM, "Venkateswara Rao Jujjuri" wrote:
> This is nothing different from any network-based system, like NFS. So we
> need to have some kind of logic on the client side to make a reasonable
> assumption. May not be perfect for negative testing.
>
> JV
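A sketch of that split, with hypothetical types standing in for the ledger manager API (not actual BookKeeper signatures):

    import java.util.concurrent.ThreadLocalRandom;

    final class LedgerCreation {
        interface MetadataStore {
            void createLedgerMetadata(long ledgerId, byte[] metadata)
                    throws LedgerExistsException;
        }

        static final class LedgerExistsException extends Exception {}

        // Plain create: the manager owns id selection, so a LedgerExists
        // (including a spurious one from a retried create) is absorbed by
        // picking a fresh id and trying again.
        static long createLedger(MetadataStore store, byte[] metadata) {
            while (true) {
                long ledgerId = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
                try {
                    store.createLedgerMetadata(ledgerId, metadata);
                    return ledgerId;
                } catch (LedgerExistsException e) {
                    // collided with ourselves or another creator; retry with a new id
                }
            }
        }

        // Adv create: the application chose the id, so only it can decide
        // what LedgerExists means here; propagate.
        static void createLedgerAdv(MetadataStore store, long ledgerId, byte[] metadata)
                throws LedgerExistsException {
            store.createLedgerMetadata(ledgerId, metadata);
        }
    }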
Re: Write retry issues with ZooKeeperClient
On Tue, 27/06/2017 at 13.45, Venkateswara Rao Jujjuri wrote:
> This is nothing different from any network-based system, like NFS. So we
> need to have some kind of logic on the client side to make a reasonable
> assumption. May not be perfect for negative testing.

Many times I have wanted some "exception cause" on BKException, especially for ZK issues. The way we use only int error codes hides the root causes of the error. The BookKeeper client writes them to the log, but the "cause" cannot be reported to higher-level logs, and sometimes this is annoying. In the future I would like to add more detail to errors.

-- Enrico
Re: Write retry issues with ZooKeeperClient
Hi Sam,

Let's assume there is no such retry logic. How would you expect to handle this situation?

Can your application try to create a new ledger or catch the NodeExists exception?

- Sijie

On Mon, Jun 26, 2017 at 5:49 PM, Sam Just wrote:
> Hmm, Curator seems to have essentially the same problem:
> https://issues.apache.org/jira/browse/CURATOR-268
> I'm not sure there's a good way to solve this transparently -- the right
> answer is probably to plumb the ConnectionLoss event through
> ZooKeeperClient for interested callers, audit the metadata users where we
> depend on atomicity, and update each one to handle it appropriately.
> -Sam
Re: Write retry issues with ZooKeeperClient
Hmm, Curator seems to have essentially the same problem:
https://issues.apache.org/jira/browse/CURATOR-268
I'm not sure there's a good way to solve this transparently -- the right answer is probably to plumb the ConnectionLoss event through ZooKeeperClient for interested callers, audit the metadata users where we depend on atomicity, and update each one to handle it appropriately.
-Sam

On Mon, Jun 26, 2017 at 4:34 PM, Sam Just wrote:
> BookKeeper has a wrapper class for the ZooKeeper client called
> ZooKeeperClient. Its purpose appears to be to transparently perform
> retries in the case that ZooKeeper returns ConnectionLoss on an operation
> due to a Disconnect event.
>
> The trouble is that it's possible that a write which received a
> ConnectionLoss error as a return value actually succeeded. Once
> ZooKeeperClient retries, it'll get back NodeExists and propagate that
> error to the caller, even though the write succeeded and the node did not
> in fact already exist.
>
> It seems as though the same issue would hold for setData and delete calls
> which use the version argument -- you could get a spurious BadVersion.
>
> I believe I've reproduced the variant with a spurious NodeExists. It
> manifested as a spurious BKLedgerExistException when running against a
> 3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> storage to force Disconnect events by injecting write delays. This seems
> to make sense, as AbstractZkLedgerManager.createLedgerMetadata uses
> ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> appears to depend on the create's atomicity to avoid two writers
> overwriting each other's nodes.
>
> AbstractZkLedgerManager.writeLedger would seem to have the same problem,
> with its dependence on using setData with the version argument to avoid
> lost updates.
>
> Am I missing something in this analysis? It seems to me that this
> behavior could be genuinely problematic during periods of spotty
> connectivity to the ZooKeeper cluster.
>
> Thanks!
> -Sam
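For reference, the failure sequence in the original report, condensed into a sketch of a naive retrying wrapper (standing in for ZooKeeperClient; not its actual code):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    final class NaiveRetryDemo {
        // If the first create is applied server-side but its reply is lost
        // (surfacing as ConnectionLoss), the blind retry observes our own
        // earlier write and throws NodeExists -- even though no other
        // client ever created the node.
        static void createWithNaiveRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                    return;
                } catch (KeeperException.ConnectionLossException e) {
                    // Ambiguous outcome: the create may already have been
                    // applied. Looping here is what converts a successful-
                    // but-unacknowledged write into a spurious NodeExists.
                } // any other KeeperException (e.g. NodeExistsException)
                  // propagates to the caller
            }
        }
    }

The analogous spurious BadVersion arises when a setData(path, data, expectedVersion) is applied, bumps the version, and is then retried with the now-stale expectedVersion.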