[ https://issues.apache.org/jira/browse/OAK-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothee Maret updated OAK-5034:
--------------------------------
    Attachment: OAK-5034.patch

Attaching a patch to set the max retry period to 20 seconds (same as 
{{oak-tarmk-standby}}).

[~frm] could you have a look at the patch?
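A retry loop that is bounded by a total retry period, instead of a fixed number of tries, can be sketched as follows. This is a hypothetical illustration, not the code in {{FileStoreUtil#readSegmentWithRetry}} or in the attached patch: the class {{RetrySketch}}, the method {{readWithRetry}} and its parameters are assumptions, and a "read" is modelled as a {{Supplier}} that returns an empty {{Optional}} when the segment is not available yet.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a deadline-bounded retry helper.
 * NOT the actual FileStoreUtil#readSegmentWithRetry implementation;
 * names and signatures are assumptions for illustration only.
 */
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Calls {@code attempt} until it returns a non-empty result or until
     * {@code maxRetryMillis} have elapsed, sleeping {@code delayMillis}
     * between tries. Returns an empty Optional if every try failed.
     */
    public static <T> Optional<T> readWithRetry(Supplier<Optional<T>> attempt,
                                                long maxRetryMillis,
                                                long delayMillis) {
        long deadline = System.nanoTime() + maxRetryMillis * 1_000_000L;
        while (true) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // retry window exhausted
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a segment that only becomes readable on the third try.
        int[] calls = {0};
        Optional<String> segment = readWithRetry(
                () -> ++calls[0] >= 3 ? Optional.of("segment") : Optional.<String>empty(),
                20_000L, 125L);
        // prints "segment after 3 tries"
        System.out.println(segment.orElse("missing") + " after " + calls[0] + " tries");
    }
}
```

Bounding the loop by elapsed time decouples the retry window from the delay between tries, so shortening the delay (as in commit {{1765838}}) would no longer shrink the window.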

> FileStoreUtil#readSegmentWithRetry max retry delay is too short to be functional
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5034
>                 URL: https://issues.apache.org/jira/browse/OAK-5034
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: Segment Tar 0.0.16
>            Reporter: Timothee Maret
>            Assignee: Timothee Maret
>         Attachments: OAK-5034.patch
>
>
> The commit {{1765838}} introduced the {{FileStoreUtil#readSegmentWithRetry}}
> util and reduced the period between two tries from 2 s to 0.125 s, while the
> total number of tries did not change. The overall retry window therefore
> shrank by a factor of 16, which does not give the server enough time to find
> references and segments, thus causing exceptions such as
> {code}
> 29.10.2016 05:07:37.242 *ERROR* [sling-default-2-Registered Service.605] org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync Failed synchronizing state.
> java.lang.IllegalStateException: Unable to read references of segment 5168c878-3a3f-49d0-aea9-b8b57d5d867f from primary
>         at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.readReferences(StandbyClientSyncExecution.java:196)
>         at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.copySegmentHierarchyFromPrimary(StandbyClientSyncExecution.java:130)
>         at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.compareAgainstBaseState(StandbyClientSyncExecution.java:94)
>         at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.execute(StandbyClientSyncExecution.java:74)
>         at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync.run(StandbyClientSync.java:143)
>         at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:118)
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> which make the client fail to synchronize and ultimately cause the IT tests
> to fail.
> IIUC, the total period covered by the retries should be longer than one
> TarMK flush cycle (5 s).
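With the number of tries fixed, the total retry window scales linearly with the delay between tries, so going from 2 s to 0.125 s divides it by 16. The issue does not state the number of tries, so the back-of-the-envelope sketch below only computes how many delays are needed to span a given window. The class {{RetryWindow}} and its methods are hypothetical; the 5 s flush cycle, the 2 s and 0.125 s delays and the 20 s period are the figures quoted in this issue.

```java
/**
 * Hypothetical back-of-the-envelope check of the retry windows quoted in
 * OAK-5034. The number of tries is not given in the issue, so this only
 * computes how many delays are needed to cover a target window.
 */
public final class RetryWindow {

    private RetryWindow() {}

    /** Number of delays of {@code delayMillis} needed to cover {@code windowMillis}. */
    static long delaysToCover(long windowMillis, long delayMillis) {
        // Ceiling division for positive operands.
        return (windowMillis + delayMillis - 1) / delayMillis;
    }

    /** Factor by which the total waiting time shrinks when only the delay changes. */
    static double shrinkFactor(long oldDelayMillis, long newDelayMillis) {
        return (double) oldDelayMillis / newDelayMillis;
    }

    public static void main(String[] args) {
        long flushCycleMillis = 5_000L; // TarMK flush cycle quoted in the issue
        long oldDelayMillis = 2_000L;   // delay between tries before commit 1765838
        long newDelayMillis = 125L;     // delay between tries after commit 1765838

        // prints 16.0, then 3, 40 and 160
        System.out.println("shrink factor: " + shrinkFactor(oldDelayMillis, newDelayMillis));
        System.out.println("delays to cover one flush at 2 s: "
                + delaysToCover(flushCycleMillis, oldDelayMillis));
        System.out.println("delays to cover one flush at 0.125 s: "
                + delaysToCover(flushCycleMillis, newDelayMillis));
        System.out.println("delays to cover 20 s at 0.125 s: "
                + delaysToCover(20_000L, newDelayMillis));
    }
}
```

At 0.125 s per delay, covering a single 5 s flush cycle already takes 40 delays, and covering the proposed 20 s period takes 160.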



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
