[ 
https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182583#comment-16182583
 ] 

Andrei Dulceanu commented on OAK-6678:
--------------------------------------

bq. What do you think about that?
[~frm], I agree with all your proposals, except the part below:

{quote}
The 1s on the server, given the implementation in the patch, translates to 
eight consecutive attempts at reading the head state IIUC.
{quote}
I think a better default for the server timeout is something above 5s (e.g. 
6s), since the flush thread kicks in once every 5s. This way we really use the 
"read persisted head with retry" mechanism, by waiting for the flush thread to 
actually write the latest content to segments.
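
For illustration, the "read persisted head with retry" idea could be sketched 
roughly as below. This is not the code from the patch: the class name, the 
method name and the polling interval are assumptions made for the example, and 
a plain {{Supplier}} stands in for whatever reads the persisted head.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Oak: polls a reader until it returns a
// non-null value or the timeout elapses.
public class PersistedHeadRetry {

    /**
     * Calls reader.get() repeatedly, sleeping intervalMs between attempts.
     * Returns the first non-null value, or null if the timeout elapses
     * before anything has been persisted.
     */
    public static <T> T readWithRetry(Supplier<T> reader, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            T value = reader.get();
            if (value != null) {
                return value;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                // Nothing persisted in time; the caller decides how to report this.
                return null;
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(intervalMs, remainingMs));
        }
    }
}
```

With a timeout of 6s and a flush every 5s, at least one flush should fall 
inside the retry window, which is the point of choosing a value above 5s.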

{quote}
The timeout should be part of the configuration of the StandbyServer
{quote}
I agree, but one question comes to mind: what do we do with the existing OSGi 
setting {{standby.readtimeout}}? Do we split it into two settings: one for the 
server, defaulting to 6s (let's say), to be used while fetching the persisted 
head state, and one for the client, to be used while waiting for a response 
from the server?
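
To make the question concrete, a split could look roughly like the sketch 
below. Only {{standby.readtimeout}} is an existing setting; the server-side 
property name, the class and its fields are hypothetical, values are assumed to 
be in milliseconds, and the client default is passed in by the caller rather 
than assumed here.

```java
import java.util.Map;

// Hypothetical holder for the two timeouts, not an existing Oak class.
public class StandbyTimeouts {

    // Existing OSGi setting; in this sketch it stays with the client.
    public static final String CLIENT_KEY = "standby.readtimeout";
    // Hypothetical name for a new server-side setting.
    public static final String SERVER_KEY = "standby.server.headtimeout";
    // Server default suggested above: just over the 5s flush interval.
    public static final long DEFAULT_SERVER_MS = 6_000L;

    public final long clientReadTimeoutMs;
    public final long serverHeadTimeoutMs;

    public StandbyTimeouts(long clientReadTimeoutMs, long serverHeadTimeoutMs) {
        this.clientReadTimeoutMs = clientReadTimeoutMs;
        this.serverHeadTimeoutMs = serverHeadTimeoutMs;
    }

    /** Reads both timeouts from a property map, falling back to the defaults. */
    public static StandbyTimeouts from(Map<String, ?> props, long clientDefaultMs) {
        return new StandbyTimeouts(
                toLong(props.get(CLIENT_KEY), clientDefaultMs),
                toLong(props.get(SERVER_KEY), DEFAULT_SERVER_MS));
    }

    // Accepts numbers or numeric strings; anything else yields the fallback.
    private static long toLong(Object value, long fallback) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value != null) {
            try {
                return Long.parseLong(value.toString().trim());
            } catch (NumberFormatException e) {
                return fallback;
            }
        }
        return fallback;
    }
}
```

Keeping the existing key for the client would leave current configurations 
untouched, while the server picks up its own default unless overridden.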

On a different note, one important gain from treating the absence of a 
persisted head state on the primary as non-erroneous behaviour is that we will 
be able to start two instances (primary and standby) and let them sync over 
multiple cycles, without worrying about what has been flushed on the primary.

> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
>                 Key: OAK-6678
>                 URL: https://issues.apache.org/jira/browse/OAK-6678
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>              Labels: cold-standby, resilience
>             Fix For: 1.8, 1.7.9
>
>         Attachments: OAK-6678-02.patch, OAK-6678.patch
>
>
> With changes for OAK-6653 in place, 
> {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes 
> {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT)
>   Time elapsed: 96.82 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } 
> }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT)
>   Time elapsed: 95.254 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } 
> }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242            Binding was 
> successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request 
> from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53  Client 
> /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42       Parsed 'get head' 
> message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get 
> head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for 
> client Bar
> 14:09:08.363 WARN  [primary-1] ExceptionHandler.java:31     Exception caught 
> on the server
> java.lang.NullPointerException: null
>       at 
> org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32)
>  ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>       at 
> org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45)
>  ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
