[
https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153107#comment-17153107
]
ASF GitHub Bot commented on GEODE-8338:
---------------------------------------
sabbeyPivotal opened a new pull request #5351:
URL: https://github.com/apache/geode/pull/5351
Since we have one redundant copy of the data, and since we modify the data
using a function, I think we may have a data corruption issue with
non-idempotent operations. With an operation like APPEND, the following
sequence can occur:
0) the executor is called on a non-primary redis server,
1) it modifies the primary (by sending a function exec to it),
2) the secondary is modified (by a geode delta sent to it),
3) the primary server fails (before the function executing on it
completes),
4) the non-primary redis server sees the function fail and, because it is
marked as HA, retries it. This time it sends it to the secondary, which is
now the new primary. The operation was already applied on the secondary, so
this retry ends up doing the operation twice.
This may be okay for certain ops (like SADD) that are idempotent (but even
they could cause extra key events in the future), but for ops like APPEND we
end up appending twice.
This will only happen when a server executing a function dies and our
function service retries the function on another server because it is marked
HA. The easy way to fix this is to change our function to not be HA. This is
a one-line change.
Note that our clients can already see exceptions/errors if the server they
are connected to dies. When that happens, the operation they requested may
already have been applied, and if they have multiple geode redis servers
running, the result may have been stored and still be in memory. So clients
will need some logic to decide whether they should redo such an operation or
not (because it is already done).
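
As one possible shape for that client logic, here is a minimal sketch for
APPEND. It assumes the client recorded the value's byte length (for example
via STRLEN) before sending the APPEND, can read it again after the error, and
that no other writer touched the key in between. The class and method names
are hypothetical, and other operations would need their own checks.

```java
import java.nio.charset.StandardCharsets;

public class AppendRetryCheck {
    /**
     * Decides whether an APPEND that ended in an error should be sent again.
     * Assumption: lengthBefore and lengthAfterError are byte lengths read by
     * the client, and no other client modified the key in between.
     */
    static boolean shouldRetryAppend(long lengthBefore, long lengthAfterError, String suffix) {
        long suffixBytes = suffix.getBytes(StandardCharsets.UTF_8).length;
        if (lengthAfterError == lengthBefore + suffixBytes) {
            return false; // the APPEND was already applied, do not redo it
        }
        if (lengthAfterError == lengthBefore) {
            return true; // the APPEND was not applied, safe to redo
        }
        // Any other length means the single-writer assumption does not hold.
        throw new IllegalStateException("unexpected length: " + lengthAfterError);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetryAppend(3, 6, "bar")); // false
        System.out.println(shouldRetryAppend(3, 3, "bar")); // true
    }
}
```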
Note: Making the function non-HA should just give the client one more case
in which they need to handle a server crash. The crash can now be that of a
server they were not connected to but that was involved in performing the
operation they requested.
> Redis commands may be repeated when server dies
> -----------------------------------------------
>
> Key: GEODE-8338
> URL: https://issues.apache.org/jira/browse/GEODE-8338
> Project: Geode
> Issue Type: Bug
> Components: redis
> Reporter: Sarah Abbey
> Priority: Major
>