[
https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182633#comment-16182633
]
Francesco Mari commented on OAK-6678:
-------------------------------------
bq. I agree, but one question comes to mind: what do we do with the existing OSGi
setting {{standby.readtimeout}}? Do we split it into two settings, one for the
server, defaulting to 6s (let's say), to be used while fetching the persisted
head state and one for the client, to be used while waiting for a response from
the server?
I think we can just pick a good default and hide it behind a system property
for the time being. As you noted, anything between 5s and 10s should be a
reasonable choice.
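
A minimal sketch of what "a default hidden behind a system property" could look like. The class name, the property name {{example.standby.server.timeout}} and the 8s default are assumptions made for illustration only; they are not taken from the patch or from Oak itself.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Oak.
final class HeadReadTimeout {

    // Assumed property name, chosen for this example only.
    static final String PROPERTY = "example.standby.server.timeout";

    // Assumed default, picked from the 5s-10s range discussed above.
    static final long DEFAULT_MILLIS = TimeUnit.SECONDS.toMillis(8);

    private HeadReadTimeout() {
    }

    // Returns the server-side timeout in milliseconds.
    static long millis() {
        // Long.getLong falls back to the default when the property is
        // unset or cannot be parsed as a long.
        long value = Long.getLong(PROPERTY, DEFAULT_MILLIS);
        // Treat zero and negative values as "not configured".
        return value > 0 ? value : DEFAULT_MILLIS;
    }
}
```

With something like this, the value could be overridden at startup via {{-Dexample.standby.server.timeout=6000}} without adding a second OSGi setting.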
bq. I think the DoS on the primary can't actually happen due to a high timeout
used in the client.
Maybe calling it a denial of service might be a bit exaggerated, but I don't
think it's healthy for a client to decide the timeout that the server should
use to manage its internal data structures.
bq. Now let's suppose that we re-use the default timeout from the client,
namely 60s. If the primary is unable to persist its first state in 60s, then I
think something really fishy is happening on the instance, don't you agree?
I agree that if the server is not able to read its head state in 60s something
is really going awry. But so far the read timeout has always been interpreted
as the time the standby has to wait for a response to come back. Adding
different semantics to this property on the server feels wrong to me. As
stated before, let's try with a sensible default and a system property first.
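
To make the server-side timeout concrete, here is a rough sketch of waiting for the persisted head with a bound that the server controls. It assumes the head is exposed through a supplier that returns null until a first state has been persisted; {{PersistedHeadWaiter}} and its parameters are hypothetical names, and this is not necessarily how the patch does it.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not part of Oak.
final class PersistedHeadWaiter {

    private PersistedHeadWaiter() {
    }

    // Polls the supplier until it yields a non-null value or the timeout
    // elapses. Returns null on timeout or if the thread is interrupted.
    static <T> T await(Supplier<T> head, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            T value = head.get();
            if (value != null) {
                return value;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return null;
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMillis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

The caller would still have to decide what to send back to the standby when the result is null; the point is only that the bound is chosen by the server, not by the client's read timeout.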
> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
> Key: OAK-6678
> URL: https://issues.apache.org/jira/browse/OAK-6678
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar, tarmk-standby
> Reporter: Andrei Dulceanu
> Assignee: Andrei Dulceanu
> Labels: cold-standby, resilience
> Fix For: 1.8, 1.7.9
>
> Attachments: OAK-6678-02.patch, OAK-6678.patch
>
>
> With changes for OAK-6653 in place,
> {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes
> {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT) Time elapsed: 96.82 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT) Time elapsed: 95.254 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242 Binding was successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53 Client /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42 Parsed 'get head' message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for client Bar
> 14:09:08.363 WARN [primary-1] ExceptionHandler.java:31 Exception caught on the server
> java.lang.NullPointerException: null
> at org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> at org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}