[ https://issues.apache.org/jira/browse/SLING-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920816#comment-13920816 ]
Stefan Egli commented on SLING-3432:
------------------------------------

Besides trying harder to avoid a pseudo-network-partition due to overload/slow-repository, another option is to be more resilient simply by increasing the heartbeatTimeout. Hence increasing the default timeout from 60sec to 120sec, while leaving the interval at 30sec.

> pseudo network partition causes job deserialization issue in a cluster (when reading while job is being reassigned)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-3432
>                 URL: https://issues.apache.org/jira/browse/SLING-3432
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Event 3.3.4
>            Reporter: Stefan Egli
>
> There is a race condition between two instances in a cluster (eg oak or crx):
> Instance 1 is writing a job with a binary property, instance 2 is reading the job (likely triggered by discovery sending it a topologychangedevent). It looks like instance 2 is reading the job while instance 1 is still in the process of writing the job, or at least the binary.
> Resulting in the following exception:
> 04.03.2014 06:55:39.667 *WARN* [Apache Sling Job Background Loader] org.apache.sling.event.impl.jobs.JobManagerImpl Unable to read job from /var/eventing/jobs/assigned/e4337f8f-47d2-41df-b3ab-0d40b1b2acd4/slingevent:eventadmin/2014/3/3/8/45/cq.wcm.msm.job.pageEvent_9718d7db-85b4-4930-a2ba-11a80d772970_172
> java.lang.Exception: Unable to deserialize property 'pageEvent'
>     at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:213)
>     at org.apache.sling.event.impl.jobs.JobManagerImpl.readJob(JobManagerImpl.java:538)
>     at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobInTheBackground(BackgroundLoader.java:318)
>     at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobsInTheBackground(BackgroundLoader.java:294)
>     at org.apache.sling.event.impl.jobs.BackgroundLoader.run(BackgroundLoader.java:203)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: null
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
>     at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:208)
>     ... 5 common frames omitted
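
For illustration only: the EOFException in readStreamHeader is what the JDK throws when an ObjectInputStream is opened on a stream that has no bytes yet. That would be consistent with the reader seeing the binary property before the writer has stored its content. The sketch below reproduces just that JDK behaviour in isolation. It does not use any Sling code, and the class and method names (PartialReadSketch, serialize, tryRead) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class PartialReadSketch {

    // Serialize a value the way a writer would before storing it as a binary.
    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Try to deserialize; report the outcome as a string instead of throwing.
    static String tryRead(byte[] data) {
        // The ObjectInputStream constructor reads the stream header, so an
        // empty stream already fails here, before readObject() is called.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return "ok: " + in.readObject();
        } catch (EOFException e) {
            return "EOFException";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Reader sees the property before any content was written.
        System.out.println(tryRead(new byte[0]));
        // Reader sees the property after the write has completed.
        System.out.println(tryRead(serialize("pageEvent payload")));
    }
}
```

Under these assumptions, the first call fails in the constructor, just as the trace above does, and the second call succeeds. A reader that treated an EOFException as "not fully written yet, retry later" rather than as a corrupt job would be one way to tolerate the race, though whether that suits JobManagerImpl is not something this sketch can show.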