[ 
https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180703#comment-16180703
 ] 

Andrei Dulceanu commented on OAK-6678:
--------------------------------------

bq. About the head state, if a persisted head state doesn't yet exist on the 
primary, the standby instance should not interpret this as an error condition.
Agreed, that is covered by the first bullet point.

bq. Instead, the standby should handle the situation by gracefully aborting the 
current synchronization. The standby will try again at the next 
synchronization. This implies that a "null" head state response should be sent 
from the primary to the standby in case a persisted head state doesn't exist on 
the primary.
I partially agree with this. I share the same viewpoint, but I think it's 
important to keep the "read persisted head with retry" logic. This will allow 
us to simplify our ITs (no need to employ a scheduling mechanism for a standby 
sync) at no additional cost. To make the wait time during retries predictable, 
we can transparently configure {{DefaultStandbyHeadReader}} to use 
{{readTimeoutMs}} for this. 
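
To illustrate what I mean by "read persisted head with retry", here is a rough 
sketch, not the patch itself. {{RetryingHeadReader}}, the {{Supplier}} standing 
in for the persisted head lookup and {{pollIntervalMs}} are hypothetical names 
made up for this example; only {{readTimeoutMs}} comes from the discussion 
above. The reader polls until a persisted head shows up or the timeout expires, 
and returns null in the latter case, so that the standby could abort the 
current synchronization and try again next time.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch; this is NOT the actual DefaultStandbyHeadReader. */
final class RetryingHeadReader {

    // Assumption: returns null as long as no head state has been persisted.
    private final Supplier<String> persistedHead;
    private final long readTimeoutMs;
    private final long pollIntervalMs;

    RetryingHeadReader(Supplier<String> persistedHead, long readTimeoutMs, long pollIntervalMs) {
        this.persistedHead = persistedHead;
        this.readTimeoutMs = readTimeoutMs;
        this.pollIntervalMs = pollIntervalMs;
    }

    /**
     * Returns the persisted head record id, or null if none became
     * available within readTimeoutMs (or the thread was interrupted).
     */
    String readHeadRecordId() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(readTimeoutMs);
        while (true) {
            String head = persistedHead.get();
            if (head != null) {
                return head;
            }
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                // Timed out: report "no persisted head yet" instead of failing.
                return null;
            }
            try {
                // Never sleep past the deadline, and always sleep at least 1 ms.
                Thread.sleep(Math.max(1, Math.min(pollIntervalMs, remainingMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

With something like this, the total wait is bounded by {{readTimeoutMs}}, and 
a null result is a regular outcome rather than an error condition.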

bq. About reading segments, the retry logic should not be necessary if we 
guarantee that the head state is always a persisted one.
Totally agree on this one. I even wanted to remove the code for this, but 
forgot about it :)

[~frm], let me know what you think about the comments above. After that, I will 
attach an improved version of the patch for review.

> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
>                 Key: OAK-6678
>                 URL: https://issues.apache.org/jira/browse/OAK-6678
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>              Labels: cold-standby, resilience
>             Fix For: 1.8, 1.7.9
>
>         Attachments: OAK-6678.patch
>
>
> With changes for OAK-6653 in place, 
> {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes 
> {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT)
>   Time elapsed: 96.82 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } 
> }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT)
>   Time elapsed: 95.254 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } 
> }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242            Binding was 
> successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request 
> from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53  Client 
> /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42       Parsed 'get head' 
> message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get 
> head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for 
> client Bar
> 14:09:08.363 WARN  [primary-1] ExceptionHandler.java:31     Exception caught 
> on the server
> java.lang.NullPointerException: null
>       at 
> org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32)
>  ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>       at 
> org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45)
>  ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
