[ https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182563#comment-16182563 ]
Francesco Mari commented on OAK-6678:
-------------------------------------
[~dulceanu], thanks for the updated patch. I'm not convinced that this is the
best solution. The timeout for reading the head state must not come from the
standby: a standby that specifies excessively high timeouts in its requests
could cause an unintended denial of service on the primary. I think it is
preferable to modify the {{DefaultStandbyHeadReader}} to something like this:
{noformat}
class DefaultStandbyHeadReader implements StandbyHeadReader {

    private final FileStore store;

    private final long timeout;

    DefaultStandbyHeadReader(FileStore store, long timeout) {
        this.store = store;
        this.timeout = timeout;
    }

    @Override
    public String readHeadRecordId() {
        RecordId persistedHead = readPersistedHeadWithRetry(store, timeout);
        return persistedHead != null ? persistedHead.toString() : null;
    }

}
{noformat}
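To make the retry idea concrete, here is a minimal, self-contained sketch of what a helper like {{readPersistedHeadWithRetry}} might do. It is not the code from the patch: the class {{HeadRetry}}, the method {{readWithRetry}}, the generic {{Supplier}} standing in for the {{FileStore}}, and the doubling wait between attempts are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a retry helper; not taken from the patch.
final class HeadRetry {

    private HeadRetry() {
    }

    // Polls 'source' until it yields a non-null value or 'timeoutMillis' has
    // elapsed. Returns null on timeout or interruption. The wait between
    // attempts starts at 1 ms and doubles each time (an assumed policy).
    static <T> T readWithRetry(Supplier<T> source, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long backoffMillis = 1;
        while (true) {
            T value = source.get();
            if (value != null) {
                return value;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return null; // give up: the caller can answer "not found"
            }
            try {
                Thread.sleep(Math.min(backoffMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            backoffMillis *= 2;
        }
    }
}
```

The important property is that the upper bound on the wait is fixed by the caller on the primary, never by a value carried in the standby's request.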
The timeout should be part of the configuration of the {{StandbyServer}}, which
propagates it to the {{DefaultStandbyHeadReader}} when the server pipeline is
created. If a persisted head state is not found, the primary should return a
timely response to the client. That is, the {{GetHeadRequestHandler}} should be
modified to do the following.
{noformat}
String id = reader.readHeadRecordId();

if (id == null) {
    ctx.writeAndFlush(new NotFoundGetHeadResponse(msg.getClientId(), id));
    return;
}

ctx.writeAndFlush(new GetHeadResponse(msg.getClientId(), id));
{noformat}
This way, if the timeout in {{DefaultStandbyHeadReader}} is substantially less
than the read timeout used by the client, we should be able to gracefully
handle the absence of a persisted head state both on the server and the client.
For example, the timeout in {{DefaultStandbyHeadReader}} could be about 1s,
while the default value of the read timeout on the client is 60s. The 1s on the
server, given the implementation in the patch, translates to eight consecutive
attempts at reading the head state IIUC.
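To illustrate why the server-side timeout should stay well below the client's read timeout, here is a rough sketch that simulates a single 'get head' round trip without Netty or Oak. Everything in it is hypothetical ({{HeadExchangeSketch}}, {{Outcome}}, {{exchange}}, {{delayed}}); it only models the three outcomes a client can observe.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical model of one 'get head' exchange; not actual Oak code.
final class HeadExchangeSketch {

    enum Outcome { HEAD, NOT_FOUND, READ_TIMEOUT }

    private HeadExchangeSketch() {
    }

    // 'serverRead' plays the role of the head reader on the primary: it
    // returns a record ID, or null if no persisted head was found in time.
    // The client waits for the reply for at most 'clientReadTimeoutMillis'.
    static Outcome exchange(Supplier<String> serverRead, long clientReadTimeoutMillis) {
        CompletableFuture<Optional<String>> reply =
                CompletableFuture.supplyAsync(() -> Optional.ofNullable(serverRead.get()));
        try {
            Optional<String> head = reply.get(clientReadTimeoutMillis, TimeUnit.MILLISECONDS);
            return head.isPresent() ? Outcome.HEAD : Outcome.NOT_FOUND;
        } catch (TimeoutException e) {
            return Outcome.READ_TIMEOUT; // the server took longer than the client waits
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.READ_TIMEOUT;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Builds a server-side read that takes 'millis' and then yields 'value'.
    static Supplier<String> delayed(String value, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }
}
```

When the server gives up quickly, the client sees an explicit {{NOT_FOUND}} and can retry on its next sync cycle. Only a server that waits longer than the client's read timeout produces {{READ_TIMEOUT}}.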
What do you think about that?
> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
> Key: OAK-6678
> URL: https://issues.apache.org/jira/browse/OAK-6678
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar, tarmk-standby
> Reporter: Andrei Dulceanu
> Assignee: Andrei Dulceanu
> Labels: cold-standby, resilience
> Fix For: 1.8, 1.7.9
>
> Attachments: OAK-6678-02.patch, OAK-6678.patch
>
>
> With changes for OAK-6653 in place,
> {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes
> {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT) Time elapsed: 96.82 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT) Time elapsed: 95.254 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242 Binding was successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53 Client /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42 Parsed 'get head' message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for client Bar
> 14:09:08.363 WARN [primary-1] ExceptionHandler.java:31 Exception caught on the server
> java.lang.NullPointerException: null
>     at org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>     at org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}