[infinispan-dev] Alert from a failing test
Hello all,

I'm not in a state to fully debug the issue this week, but even though this failure happens in the Lucene Directory, it looks like it's reporting an issue with Infinispan core:

https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/

In this test we're writing to the index and then asserting on the expected state on both nodes; while it succeeds on the same node as the writer, it fails with "java.io.IOException: Read past EOF" on the second node.

This exception can mean only one thing: the value, which is a byte[] buffer, was not completely transferred to the second node, which seems quite critical as the caches are using sync.

I can't reproduce the error locally, but it's not the first time it has been reported by CI: builds 60, 62 and 65, for example (and more), show the same testcase failing in the same manner.

Cheers,
Sanne

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Alert from a failing test
Oddly enough, I don't see any other tests exhibiting this behaviour. Let me know if you see it in more recent CI runs, and we'll investigate in detail.

On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:
> [...]

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] Alert from a failing test
2011/6/20 Manik Surtani:
> Oddly enough, I don't see any other tests exhibiting this behaviour. Let me
> know if you see it in more recent CI runs, and we'll investigate in detail.

In fact there are not many tests in core which verify that a full stream is received; but as I mentioned in another thread, I was seeing the following exception relatively often (it never caught my attention some months ago):

Caused by: java.io.EOFException: The stream ended unexpectedly. Please check whether the source of the stream encountered any issues generating the stream.
	at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
	at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
	at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
	at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
	... 27 more
Caused by: java.io.EOFException: Read past end of file
	at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
	at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
	at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
	at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
	at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
	at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
	... 30 more

This looks like a suspicious correlation to me, as I think the reported errors are similar in nature.
Cheers,
Sanne

> [...]
Re: [infinispan-dev] Alert from a failing test
On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero wrote:
> [...]

The line "at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)" suggests EOF was reached while reading the lead byte of an object, not mid-object. This is consistent with StateTransferManagerImpl.generateTransactionLog() getting a timeout while trying to acquire the processing lock (at StateTransferManagerImpl.java:192) and closing the stream in the middle of the transaction log; it is not a transmission error. We could probably get rid of the exception in the logs by inserting another delimiter here.

Back to the original problem: if this were a stream corruption issue, I'd expect many more deserialization errors, because the recorded length of the buffer would be smaller or larger than the number of bytes following it, and the next object deserialized from the stream would find garbage instead.

This looks to me more like an index segment was created with size x on both node A and node B, then updated with size y > x on node A, but only the metadata got to node B; the segment's byte array remained the same.

I don't know anything about the Lucene directory implementation yet, so I have no idea if/how this could happen, and I haven't been able to reproduce it on my machine. Is there a way to see the Jenkins test logs?

Dan
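Dan's hypothesis above can be made concrete with a small sketch. The class, field and method names below are hypothetical (this is not the actual Lucene Directory implementation): an index "file" is modelled as a declared length plus a byte[] buffer, and when only the metadata update reaches node B, a read that is within the declared length but past the stale buffer fails just like the CI report:

```java
import java.io.EOFException;
import java.util.HashMap;
import java.util.Map;

// Sketch of the hypothesized failure: metadata (declared file length) and
// the byte[] chunk are stored as separate cache entries, and only the
// metadata update reached the second node.
public class ReadPastEofSketch {
    static Map<String, byte[]> chunks = new HashMap<>();
    static Map<String, Integer> declaredLength = new HashMap<>();

    static byte readAt(String file, int pos) throws EOFException {
        byte[] buf = chunks.get(file);
        if (pos >= declaredLength.get(file) || pos >= buf.length) {
            throw new EOFException("Read past EOF");
        }
        return buf[pos];
    }

    public static void main(String[] args) {
        // Node A writes a 4-byte segment; both entries replicate.
        chunks.put("seg.cfs", new byte[] {1, 2, 3, 4});
        declaredLength.put("seg.cfs", 4);
        // Node A grows the file to 8 bytes, but (per the hypothesis) only
        // the metadata update is applied here; the byte[] stays stale.
        declaredLength.put("seg.cfs", 8);
        try {
            readAt("seg.cfs", 6); // within declared length, past the buffer
        } catch (EOFException e) {
            System.out.println(e.getMessage()); // prints "Read past EOF"
        }
    }
}
```

A real stream corruption, by contrast, would desynchronize everything after the short buffer, which is why Dan expects it to produce many different deserialization errors rather than this one clean EOF.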
Re: [infinispan-dev] Alert from a failing test
Some comments below:

On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
> [...]
>
> The line "at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
> suggests EOF has been reached while reading the lead byte of the
> object, not a partial object. This is consistent with
> StateTransferManagerImpl.generateTransactionLog() getting a timeout
> while trying to acquire the processing lock (at
> StateTransferManagerImpl.java:192) and closing the stream in the
> middle of the transaction log, not a transmission error. We could
> probably get rid of the exception in the logs by inserting another
> delimiter here.

You mean inserting a particular delimiter in case of a timeout acquiring the processing lock? That sounds like a good idea. I'm just back from holidays and can't remember well, but were the previous EOFs related to processing lock acquisition? If so, I think your idea makes even more sense, because it would fall into a possibly known problem and could provide the receiver side with that bit more information.

> Back to the original problem, if this was a stream corruption issue
> I'd expect a lot more instances of deserializing errors [...]
>
> This looks to me more like an index segment has been created with size
> x on node A and also on node B, then it was updated with size y > x on
> node A but only the metadata got to node B, the segment's byte array
> remained the same.
>
> I don't know anything about the Lucene directory implementation yet,
> so I have no idea if/how this could happen and I haven't been able to
> reproduce it on my machine. Is there a way to see the Jenkins test
> logs?

There's the console log in http://goo.gl/JFh5R but if this happens relatively often in the Lucene dir impl, we could create a small Jenkins run and pass a -Dlog4j.configuration configuring TRACE to be printed on the console.
The testsuite is small and would not generate a lot of logging. There's always a chance of not encountering the issue when TRACE is enabled, particularly if it's a race condition, but I think it's worth doing. IOW, I can set it up.
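A minimal log4j 1.x properties file along the lines Galder describes might look as follows. This is illustrative only; the file name and pattern layout are assumptions, and only the org.infinispan TRACE category is taken from the message:

```properties
# Illustrative log4j configuration: TRACE for Infinispan, printed on console.
# Passed to the run with e.g. -Dlog4j.configuration=file:./log4j-trace.properties
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c{1}] (%t) %m%n
# The category Galder proposes raising to TRACE:
log4j.logger.org.infinispan=TRACE
```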
Re: [infinispan-dev] Alert from a failing test
I hope this isn't related to [1]. Are you using STREAMING_STATE_TRANSFER with use_default_transport=true?

[1] https://issues.jboss.org/browse/JGRP-1339

On 6/28/11 1:05 PM, Galder Zamarreño wrote:
> [...]
Re: [infinispan-dev] Alert from a failing test
On Tue, Jun 28, 2011 at 2:05 PM, Galder Zamarreño wrote:
> [...]
> You mean inserting a particular delimiter in case of a timeout acquiring the
> processing lock? That sounds like a good idea. [...] If so, I think your idea
> makes even more sense cos it'd fall into a possible known problem and could
> provide the receiver side with that bit more of information.

I'm pretty sure all the EOFs we got are related to processing lock acquisition, yes.
> There's the console log in http://goo.gl/JFh5R but if this happens relatively
> often in the Lucene dir impl, we could create a small jenkins run and pass a
> -Dlog4j.configuration configuring TRACE to be printed on console. [...] IOW,
> I can set it up.

It would be great if you could set it up. It's not happening on every run though, so is it possible to configure Jenkins to repeat it until it gets a failure?
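The delimiter idea discussed above can be sketched in a few lines. This is a hypothetical wire format, not Infinispan's actual one: the writer always terminates the transaction log with an explicit end marker (including when it stops early, e.g. on a processing-lock timeout), so the reader can distinguish a deliberate stop from a truncated stream instead of failing with "Read past end of file":

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of a delimited transaction-log stream (hypothetical format).
public class DelimitedStreamSketch {
    static final int ENTRY = 1;
    static final int END_MARKER = 0;

    static byte[] writeLog(int entries) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (int i = 0; i < entries; i++) {
            out.writeByte(ENTRY);
            out.writeInt(i); // payload
        }
        // On a lock-acquisition timeout the writer would stop early; writing
        // the marker before closing lets the receiver terminate cleanly
        // rather than hit a raw EOFException on the next lead byte.
        out.writeByte(END_MARKER);
        out.flush();
        return bos.toByteArray();
    }

    static int readLog(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = 0;
        while (in.readByte() == ENTRY) { // END_MARKER exits the loop cleanly
            in.readInt();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLog(writeLog(3))); // prints 3
    }
}
```

With a format like this, an EOFException would again indicate a genuine transmission problem rather than the known timeout path.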
Re: [infinispan-dev] Alert from a failing test
On Jun 28, 2011, at 6:58 PM, Dan Berindei wrote:
> [...]
> I'm pretty sure all the EOFs we got are related to processing lock
> acquisition, yes.

https://issues.jboss.org/browse/ISPN-1216
> It would be great if you could set it up. It's not happening on every
> run though, so is it possible to configure jenkins to repeat it until
> it gets a failure?

https://infinispan.ci.cloudbees.com/job/Infinispan-lucene-master-TRACE/4/

That job's running the lucene dir