beobal commented on code in PR #3502:
URL: https://github.com/apache/cassandra/pull/3502#discussion_r1743697811
##########
src/java/org/apache/cassandra/tcm/log/LogReader.java:
##########
@@ -60,6 +60,11 @@ public interface LogReader
* case we return all subsequent entries in the log.
*/
default LogState getLogState(Epoch startEpoch)
+ {
+ return getLogState(startEpoch, true);
+ }
+
+ default LogState getLogState(Epoch startEpoch, boolean allowSnapshots)
Review Comment:
note: this is also not strictly related to the issue at hand, but was
spotted during testing. When constructing a `LogState` for inclusion in a
success or failure response to a `Commit` we never want to include metadata
snapshots, only entries (see `LocalLog::getCommittedEntries`)
##########
src/java/org/apache/cassandra/tcm/log/LogReader.java:
##########
@@ -69,11 +74,13 @@ default LogState getLogState(Epoch startEpoch)
// If there is at most 1 snapshot with an epoch > startEpoch, we
prefer to skip that snapshot and just build a
// list of consecutive entries
- if (snapshotEpochs.size() <= 1)
+ if (snapshotEpochs.size() <= 1 || !allowSnapshots)
{
entries = getEntries(startEpoch);
if (entries.isContinuous())
return new LogState(null, entries.immutable());
+ else if (!allowSnapshots)
+ throw new IllegalStateException("Can't construct a
continious log since " + startEpoch + " and we don't allow snapshots");
Review Comment:
typo: `continious`
##########
src/java/org/apache/cassandra/tcm/ClusterMetadataService.java:
##########
@@ -844,11 +844,23 @@ private Pair<State, Processor> delegateInternal()
@Override
public Commit.Result commit(Entry.Id entryId, Transformation
transform, Epoch lastKnown, Retry.Deadline retryPolicy)
{
- Pair<State, Processor> delegate = delegateInternal();
- Commit.Result result = delegate.right.commit(entryId, transform,
lastKnown, retryPolicy);
- if (delegate.left == LOCAL || delegate.left == RESET)
- replicator.send(result, null);
- return result;
+ while (!retryPolicy.reachedMax())
Review Comment:
The underlying processor or the choice of which processor to delegate to is
the issue here so I don't see how that can be pushed down into the delegated-to
processor.
Using the `retryPolicy` to manage retries seems like the right thing to do
as it will maintain the state of the retries if a `NotCMSException` is
encountered. e.g.s
* node is a CMS member, so delegate to a local processor
* local processor makes several attempts to commit, but is unable to due to
competing commits
* CMS membership changes as a result of one of those competing commits, node
is no longer a member
* on the next retry, local processor throws `NotCMSException`
* which is caught in `SwitchableProcessor`, which itself now retries and
obtains a new, remote delegate
* committing via that remote delegate shouldn't reset the retry policy (i.e.
if it had a `maxTries` the total attempts across both delegated processors
should count toward it).
##########
src/java/org/apache/cassandra/tcm/log/LocalLog.java:
##########
@@ -894,9 +894,9 @@ private LogListener snapshotListener()
if ((entry.epoch.getEpoch() %
DatabaseDescriptor.getMetadataSnapshotFrequency()) == 0)
{
- List<InetAddressAndPort> list = new
ArrayList<>(ClusterMetadata.current().fullCMSMembers());
- list.sort(comparing(i -> i.addressBytes[i.addressBytes.length
- 1]));
- if
(list.get(0).equals(FBUtilities.getBroadcastAddressAndPort()))
+ List<NodeId> list = new
ArrayList<>(metadata.success().metadata.fullCMSMemberIds());
Review Comment:
note: this is unrelated to the issue being addressed here, but removes the
chance of non-determinism where nodes share the same the last address byte
##########
src/java/org/apache/cassandra/tcm/ClusterMetadataService.java:
##########
@@ -844,11 +844,23 @@ private Pair<State, Processor> delegateInternal()
@Override
public Commit.Result commit(Entry.Id entryId, Transformation
transform, Epoch lastKnown, Retry.Deadline retryPolicy)
{
- Pair<State, Processor> delegate = delegateInternal();
- Commit.Result result = delegate.right.commit(entryId, transform,
lastKnown, retryPolicy);
- if (delegate.left == LOCAL || delegate.left == RESET)
- replicator.send(result, null);
- return result;
+ while (!retryPolicy.reachedMax())
+ {
+ try
+ {
+ Pair<State, Processor> delegate = delegateInternal();
+ Commit.Result result = delegate.right.commit(entryId,
transform, lastKnown, retryPolicy);
+ ClusterMetadataService.State state = delegate.left;
+ if (state == LOCAL || state == RESET)
+ replicator.send(result, null);
+ return result;
+ }
+ catch (NotCMSException e)
+ {
+ retryPolicy.maybeSleep();
+ }
+ }
+ return Commit.Result.failed(ExceptionCode.SERVER_ERROR, "Could not
commit " + transform.kind() + " at epoch " + lastKnown);
Review Comment:
It seems we're rather inconsistent in this class and have a mix of both
concatenation and format when throwing exceptions. The only real benefit of
format is readability, which I don't think is terrible anyway so +0 ¯\_(ツ)_/¯
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]