On Tue, Jun 28, 2011 at 2:05 PM, Galder Zamarreño <gal...@redhat.com> wrote:
> Some comments below:
>
> On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
>
>> On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero <sa...@infinispan.org> wrote:
>>> 2011/6/20 Manik Surtani <ma...@jboss.org>:
>>>> Oddly enough, I don't see any other tests exhibiting this behaviour.
>>>> Let me know if you see it in more recent CI runs, and we'll
>>>> investigate in detail.
>>>
>>> In fact there are not many tests in core which verify that a full
>>> stream is received; but as I mentioned in another thread, I was seeing
>>> the following exception relatively often (it never caught my attention
>>> some months ago):
>>>
>>> Caused by: java.io.EOFException: The stream ended unexpectedly.
>>> Please check whether the source of the stream encountered any issues
>>> generating the stream.
>>> at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
>>> at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
>>> at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
>>> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
>>> ... 27 more
>>> Caused by: java.io.EOFException: Read past end of file
>>> at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>> at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>> at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>> at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
>>> at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
>>> ... 30 more
>>>
>>
>> The line "at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
>> suggests EOF was reached while reading the lead byte of an object, not
>> half-way through a partial object. This is consistent with
>> StateTransferManagerImpl.generateTransactionLog() getting a timeout
>> while trying to acquire the processing lock (at
>> StateTransferManagerImpl.java:192) and closing the stream in the
>> middle of the transaction log, rather than with a transmission error.
>> We could probably get rid of the exception in the logs by inserting
>> another delimiter here.
>
> You mean inserting a particular delimiter in case of a timeout acquiring
> the processing lock? That sounds like a good idea. Just back from
> holidays and can't remember well, but were the previous EOFs related to
> processing lock acquisition? If so, I think your idea makes even more
> sense, cos it'd fall into a possible known problem and could provide the
> receiver side with that bit more information.
>

I'm pretty sure all the EOFs we got are related to processing lock acquisition, yes.
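To make the idea concrete, here's a minimal sketch of the delimiter trick. It uses plain java.io serialization as a stand-in for our marshaller, and every name in it (DELIMITER, writeLog, readLog) is made up for illustration - it is not the actual StateTransferManagerImpl code:

import java.io.*;

public class DelimitedLogSketch {
    // Hypothetical end-of-log marker. It is written even when the sender
    // gives up early, e.g. on a timeout acquiring the processing lock.
    private static final String DELIMITER = "__END_OF_TX_LOG__";

    static void writeLog(ObjectOutputStream out, Iterable<?> entries,
                         boolean timedOut) throws IOException {
        for (Object entry : entries) {
            out.writeObject(entry);
            if (timedOut) break; // abort mid-log, as on a lock timeout
        }
        // Always terminate with the delimiter, so the receiver sees a
        // clean end of log instead of an abrupt EOF.
        out.writeObject(DELIMITER);
        out.flush();
    }

    static void readLog(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        Object entry;
        while (!DELIMITER.equals(entry = in.readObject())) {
            // apply the transaction log entry here
        }
        // No EOFException, even if the sender bailed out half-way.
    }
}

The receiver could then log "sender aborted the transaction log" at a sensible level instead of the scary EOFException.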
>>
>> Back to the original problem: if this was a stream corruption issue
>> I'd expect a lot more deserialization errors, with the length of the
>> buffer being smaller/larger than the number of bytes following it and
>> the next object deserialized from the stream finding garbage instead.
>>
>> This looks to me more like an index segment was created with size x on
>> node A and also on node B, then it was updated with size y > x on node
>> A, but only the metadata got to node B; the segment's byte array
>> remained the same.
>>
>> I don't know anything about the Lucene directory implementation yet,
>> so I have no idea if/how this could happen, and I haven't been able to
>> reproduce it on my machine. Is there a way to see the Jenkins test
>> logs?
>
> There's the console log in http://goo.gl/JFh5R but if this happens
> relatively often in the Lucene dir impl, we could create a small Jenkins
> run and pass a -Dlog4j.configuration configuring TRACE to be printed on
> the console. The testsuite is small and would not generate a lot of
> logging. There's always a chance of not encountering the issue when
> TRACE is enabled, particularly if it's a race condition, but I think
> it's worth doing. IOW, I can set it up.
>

It would be great if you could set it up. It's not happening on every run though, so is it possible to configure Jenkins to repeat the job until it gets a failure? I've also sketched, below the quoted thread, the race I have in mind and the kind of log4j setup that might help.

>>
>> Dan
>>
>>
>>> This looks like a suspicious correlation to me, as I think the
>>> reported errors are similar in nature.
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>>
>>>>
>>>> On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:
>>>>
>>>>> Hello all,
>>>>> I'm not in a position to fully debug the issue this week, but even
>>>>> though this failure happens in the Lucene Directory, it looks like
>>>>> it's reporting an issue with Infinispan core:
>>>>>
>>>>> https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/
>>>>>
>>>>> In this test we're writing to the index and then asserting on the
>>>>> expected state on both nodes; while it is successful on the same
>>>>> node as the writer, it fails with
>>>>> "java.io.IOException: Read past EOF" on the second node.
>>>>>
>>>>> This exception can mean only one thing: the value, which is a
>>>>> byte[] buffer, was not completely transferred to the second node,
>>>>> which seems quite critical as the caches are using sync.
>>>>> I can't reproduce the error locally, but it's not the first time it
>>>>> has been reported by CI: builds 60, 62, 65 for example (and more)
>>>>> show the same testcase fail in the same manner.
>>>>>
>>>>> Cheers,
>>>>> Sanne
>>>>
>>>> --
>>>> Manik Surtani
>>>> ma...@jboss.org
>>>> twitter.com/maniksurtani
>>>>
>>>> Lead, Infinispan
>>>> http://www.infinispan.org
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
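First, the race. This is only a guess at the mechanics - I don't know the Lucene directory internals, so the split into a metadata entry and a separate data entry, and all the names below, are invented for illustration; plain maps stand in for the replicated caches:

import java.io.EOFException;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentRaceSketch {
    // Stand-ins for two replicated caches: declared file sizes and the
    // actual segment bytes, stored under separate keys.
    static final Map<String, Integer> metadata =
            new ConcurrentHashMap<String, Integer>();
    static final Map<String, byte[]> data =
            new ConcurrentHashMap<String, byte[]>();

    // Writer on node A: two separate puts, so the two updates do not
    // reach node B atomically.
    static void write(String file, byte[] content) {
        data.put(file, content);            // segment bytes (size y)
        metadata.put(file, content.length); // declared size y
    }

    // Reader on node B: if only the metadata update has arrived, the
    // declared size is y but the stored array still holds x < y bytes.
    static byte[] read(String file) throws IOException {
        int declaredSize = metadata.get(file);
        byte[] bytes = data.get(file);
        if (bytes.length < declaredSize)
            throw new EOFException("Read past EOF"); // what the test sees
        return bytes;
    }
}

If this is roughly what happens, the fix would be to make the metadata and data updates atomic, or at least ordered, from the reader's point of view.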
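And for the TRACE run, something along these lines is what I'd expect the -Dlog4j.configuration file to look like - log4j 1.x properties, with the category picked from memory, so treat it as a starting point:

# Everything from org.infinispan at TRACE, straight to the console.
log4j.rootLogger=INFO, CONSOLE
log4j.logger.org.infinispan=TRACE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c{1}] (%t) %m%n

Dan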
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev