[
https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182583#comment-16182583
]
Andrei Dulceanu commented on OAK-6678:
--------------------------------------
bq. What do you think about that?
[~frm], I agree with all your proposals, except the part below:
{quote}
The 1s on the server, given the implementation in the patch, translates to
eight consecutive attempts at reading the head state IIUC.
{quote}
I think a better default for the server timeout is something above 5s (e.g.
6s), since the flush thread kicks in once every 5s. That way we really exercise
the "read persisted head with retry" mechanism, by waiting for the flush thread
to actually write the latest content to segments.
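To make the retry idea above concrete, here is a minimal sketch, not the code from the patch. {{RetryingHeadReader}}, {{readWithRetry}} and the poll interval are hypothetical names and parameters used only for illustration; the supplier stands in for whatever reads the persisted head and is assumed to return null while nothing has been flushed yet.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Oak): polls a reader until it yields a value or a timeout elapses. */
final class RetryingHeadReader {

    private RetryingHeadReader() {
    }

    /**
     * Calls reader repeatedly until it returns a non-null value or timeoutMillis has passed.
     * With a timeout above the flush interval (e.g. 6s vs. 5s), at least one flush
     * should happen while we are waiting.
     */
    static <T> Optional<T> readWithRetry(Supplier<T> reader, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = reader.get();
            if (value != null) {
                return Optional.of(value);
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                // Timed out: report "no head yet" instead of throwing.
                return Optional.empty();
            }
            try {
                Thread.sleep(Math.min(pollMillis, Math.max(1, remainingMillis)));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

The caller decides what an empty result means; the sketch only shows why the timeout needs to be longer than the flush interval for the retries to be useful.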
{quote}
The timeout should be part of the configuration of the StandbyServer
{quote}
I agree, but one question comes to mind: what do we do with the existing OSGi
setting {{standby.readtimeout}}? Do we split it into two settings: one for the
server, defaulting to (say) 6s, used while fetching the persisted head state,
and one for the client, used while waiting for a response from the server?
On a different note, one important gain from treating the absence of a
persisted head state on the primary as non-erroneous behaviour is that we will
be able to start two instances (primary and standby) and let them sync over
multiple cycles, without worrying about what has been flushed on the primary.
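A rough sketch of what "non-erroneous" could look like from the standby's point of view, under the assumption that a missing head is modelled as an empty {{Optional}}. {{StandbySyncCycle}}, {{runCycle}} and {{Outcome}} are hypothetical names and do not reflect the actual client code.

```java
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical illustration (not Oak code): one sync cycle that tolerates a missing head. */
final class StandbySyncCycle {

    enum Outcome { SYNCED, SKIPPED_NO_HEAD }

    private StandbySyncCycle() {
    }

    /**
     * Asks the primary for its persisted head. If there is none yet, the cycle is
     * skipped without raising an error, so a later cycle can pick up the head
     * once the primary has flushed.
     */
    static Outcome runCycle(Supplier<Optional<String>> primaryHead, Consumer<String> syncTo) {
        Optional<String> head = primaryHead.get();
        if (!head.isPresent()) {
            return Outcome.SKIPPED_NO_HEAD;
        }
        syncTo.accept(head.get());
        return Outcome.SYNCED;
    }
}
```

With this shape, starting both instances at the same time simply produces one or more skipped cycles followed by a successful one.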
> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
> Key: OAK-6678
> URL: https://issues.apache.org/jira/browse/OAK-6678
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar, tarmk-standby
> Reporter: Andrei Dulceanu
> Assignee: Andrei Dulceanu
> Labels: cold-standby, resilience
> Fix For: 1.8, 1.7.9
>
> Attachments: OAK-6678-02.patch, OAK-6678.patch
>
>
> With changes for OAK-6653 in place,
> {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes
> {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT) Time elapsed: 96.82 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT) Time elapsed: 95.254 sec <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242 Binding was successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53 Client /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42 Parsed 'get head' message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for client Bar
> 14:09:08.363 WARN [primary-1] ExceptionHandler.java:31 Exception caught on the server
> java.lang.NullPointerException: null
>     at org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>     at org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}