[ https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155779#comment-17155779 ]
ASF subversion and git services commented on GEODE-8338:
--------------------------------------------------------
Commit 25bb3b53fdb31a28bde5376bb105ee0ed2414c9a in geode's branch
refs/heads/develop from Sarah Abbey
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=25bb3b5 ]
GEODE-8338: change redis commands to not be repeated when a server dies (#5351)
The redis functions are no longer HA.
The product can still safely retry the function in some cases,
but if a server dies the client will see a redis error containing
"memberDeparted".
In that case the client app can check whether the redis operation should be
done again, or whether it already happened even though a server died.
Co-authored-by: Sarah Abbey <[email protected]>
Co-authored-by: Darrel Schneider <[email protected]>
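
The client-side check described in the commit message could look roughly like
the sketch below. It is only an illustration under stated assumptions:
RedisErrorException, runWithCheck, and the "ERR memberDeparted" text are
hypothetical stand-ins, not part of Geode or of any Redis client library. The
only detail taken from the commit message is that the error contains
"memberDeparted".

```java
import java.util.function.BooleanSupplier;

public class MemberDepartedHandling {
    /** Hypothetical stand-in for the error type a Redis client library would throw. */
    static class RedisErrorException extends RuntimeException {
        RedisErrorException(String message) {
            super(message);
        }
    }

    /** True if the error text contains the marker named in the commit message. */
    static boolean isMemberDeparted(RuntimeException e) {
        String message = e.getMessage();
        return message != null && message.contains("memberDeparted");
    }

    /**
     * Runs the operation once. On a memberDeparted error it asks the caller-supplied
     * check whether the effect is already visible and re-runs only if it is not.
     * Returns the number of times the operation was attempted.
     */
    static int runWithCheck(Runnable operation, BooleanSupplier alreadyApplied) {
        try {
            operation.run();
            return 1;
        } catch (RedisErrorException e) {
            if (!isMemberDeparted(e)) {
                throw e; // unrelated error: leave it to the caller
            }
            if (alreadyApplied.getAsBoolean()) {
                return 1; // the write landed before the server died
            }
            operation.run(); // a second failure propagates to the caller
            return 2;
        }
    }

    public static void main(String[] args) {
        // Simulated APPEND: the write lands, but the reply is lost once.
        StringBuilder value = new StringBuilder("log:");
        boolean[] failOnce = {true};
        Runnable append = () -> {
            value.append("entry1;");
            if (failOnce[0]) {
                failOnce[0] = false;
                throw new RedisErrorException("ERR memberDeparted"); // assumed error text
            }
        };
        int attempts = runWithCheck(append, () -> value.toString().endsWith("entry1;"));
        System.out.println(attempts + " attempt(s), value = " + value);
    }
}
```

How a real application decides that an operation "already happened" depends on
the command; for APPEND it might compare the stored value or its length with
what it expected before the call.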
> Redis commands may be repeated when server dies
> -----------------------------------------------
>
> Key: GEODE-8338
> URL: https://issues.apache.org/jira/browse/GEODE-8338
> Project: Geode
> Issue Type: Bug
> Components: redis
> Reporter: Sarah Abbey
> Priority: Major
>
> Since we have one redundant copy of the data, and since we modify the data
> using a function, I think we may have a data corruption issue with
> non-idempotent operations. With an operation like APPEND, the following can
> happen:
> 0) the executor is called on a non-primary redis server,
> 1) the primary is modified (by sending a function execution to it),
> 2) the secondary is modified (by sending a geode delta to it),
> 3) the primary server fails (before the function executing on it
> completes),
> 4) the non-primary redis server sees the function fail and, because it is
> marked as HA, retries it. This time it sends it to the secondary, which is the
> new primary, but the operation was already done on the secondary, so this
> retry ends up doing the operation twice.
> This may be okay for certain ops (like SADD) that are idempotent (but even
> they could cause extra key events in the future), but for ops like APPEND we
> end up appending twice.
> This will only happen when a server executing a function dies and our
> function service retries the function on another server because it is marked
> HA. The easy way to fix this is to change our function to not be HA. This is
> a one-line change.
> Note that our clients can already see exceptions/errors if the server they
> are connected to dies. When that happens the operation they requested may
> have happened, and if they have multiple geode redis servers running it may
> have been stored and still be in memory. So clients will need some logic to
> decide if they should redo such an operation or not (because it is already
> done).
> *Note:* Making the function non-HA just gives the client one more case in
> which they need to handle a server crash: the crashed server can now be one
> they were not connected to but that was involved in performing the operation
> they requested.
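
The failure sequence in the quoted description (steps 0 to 4) can be reproduced
with a small in-memory simulation. This is a sketch only: Server,
MemberDepartedException, appendOnPrimary, and execute are hypothetical names
that model the described behavior and are not Geode classes. The "ha" flag
stands in for whether the function is marked HA.

```java
public class HaRetryDemo {
    /** One simulated server holding a single string value. */
    static class Server {
        final StringBuilder value = new StringBuilder();
    }

    /** Hypothetical stand-in for the failure seen when the executing server dies. */
    static class MemberDepartedException extends RuntimeException {
        MemberDepartedException(String message) {
            super(message);
        }
    }

    /**
     * Applies APPEND on the primary (step 1), replicates it to the secondary if
     * there is one (step 2), then optionally "crashes" before replying (step 3).
     */
    static void appendOnPrimary(Server primary, Server secondary, String suffix,
                                boolean crashBeforeReply) {
        primary.value.append(suffix);
        if (secondary != null) {
            secondary.value.append(suffix);
        }
        if (crashBeforeReply) {
            throw new MemberDepartedException("memberDeparted");
        }
    }

    /** Steps 0 and 4: the calling server runs the function, with or without HA retry. */
    static void execute(Server primary, Server secondary, String suffix, boolean ha) {
        try {
            appendOnPrimary(primary, secondary, suffix, true);
        } catch (MemberDepartedException e) {
            if (!ha) {
                throw e; // non-HA: surface the error instead of retrying
            }
            // HA: the secondary is now the primary and the function runs again
            appendOnPrimary(secondary, null, suffix, false);
        }
    }

    public static void main(String[] args) {
        Server p1 = new Server();
        Server s1 = new Server();
        execute(p1, s1, "x", true);
        System.out.println("HA retry: " + s1.value); // the suffix was appended twice

        Server p2 = new Server();
        Server s2 = new Server();
        try {
            execute(p2, s2, "x", false);
        } catch (MemberDepartedException e) {
            System.out.println("non-HA: " + s2.value + " (client sees " + e.getMessage() + ")");
        }
    }
}
```

In the non-HA run the surviving copy holds the suffix exactly once, and the
decision to redo the operation moves to the client, as the note above says.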