Re: [infinispan-dev] New Logging rules!!! MUST READ

2011-06-28 Thread David M. Lloyd
On 06/28/2011 06:31 AM, Galder Zamarreño wrote:
>
> On Jun 21, 2011, at 11:54 AM, Manik Surtani wrote:
>
>>
>> On 27 Apr 2011, at 14:18, Galder Zamarreño wrote:
>>
>>> Please note as well that this might require some IDE settings changes. I 
>>> had to enable annotation processing in IntelliJ to be able to compile stuff 
>>> from the IDE.
>>
>> Galder, do you have any instructions on these IDE settings?  Things work 
>> fine from the command line, but the generated logger does not exist if I 
>> build from an IDE.  I don't believe the annotation processor is being 
>>> invoked, or perhaps it is generating sources where the rest of the build 
>>> doesn't expect them.
>
> True, I don't think it's generating the log implementations which makes tests 
> fail unless you've previously built from cmd line. I'll look into it.
>
> If David's listening, he might already have some tricks from the JBoss 
> Logging development work?

Sorry, I always build on the command line.  I know that IDEA can be 
configured because I have done it... once... in the distant past... but 
that's all I remember.
-- 
- DML
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Alert from a failing test

2011-06-28 Thread Dan Berindei
On Tue, Jun 28, 2011 at 2:05 PM, Galder Zamarreño  wrote:
> Some comments below:
>
> On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
>
>> On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero  
>> wrote:
>>> 2011/6/20 Manik Surtani :
 Oddly enough, I don't see any other tests exhibiting this behaviour.  Let 
 me know if you see it in more recent CI runs, and we'll investigate in 
 detail.
>>>
>>> In fact there are not many tests in core which verify that a full stream is
>>> received; but as I mentioned in another thread, I was seeing the
>>> following exception relatively often (it never caught my attention
>>> some months ago)
>>>
>>> Caused by: java.io.EOFException: The stream ended unexpectedly.
>>> Please check whether the source of the stream encountered any issues
>>> generating the stream.
>>>        at 
>>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
>>>        at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
>>>        at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
>>>        at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
>>>        ... 27 more
>>> Caused by: java.io.EOFException: Read past end of file
>>>        at 
>>> org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>>        at 
>>> org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>>        at 
>>> org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>>        at 
>>> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>>        at 
>>> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>>        at 
>>> org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
>>>        at 
>>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
>>>        ... 30 more
>>>
>>
>> The line "at 
>> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
>> suggests EOF has been reached while reading the lead byte of the
>> object, not a partial object. This is consistent with
>> StateTransferManagerImpl.generateTransactionLog() getting a timeout
>> while trying to acquire the processing lock (at
>> StateTransferManagerImpl.java:192) and closing the stream in the
>> middle of the transaction log, not a transmission error. We could
>> probably get rid of the exception in the logs by inserting another
>> delimiter here.
>
> You mean inserting a particular delimiter in case of a timeout acquiring the 
> processing lock? That sounds like a good idea. Just back from holidays and 
> can't remember well, but were previous EOFs related to processing lock 
> acquisition? If so, I think your idea makes even more sense cos it'd fall 
> into a possible known problem and could provide the receiver side with that 
> extra bit of information.
>

I'm pretty sure all the EOFs we got are related to processing lock
acquisition, yes.
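
To make the delimiter idea concrete, here's a minimal, self-contained sketch
(plain java.io and made-up marker values, not the actual
StateTransferManagerImpl code): the writer always terminates the stream with an
explicit end marker, even when it stops early because of a lock timeout, so the
reader can tell a deliberate early stop from a genuinely truncated stream.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class DelimitedLogSketch {

    // Hypothetical marker values; the real code would use its own constants.
    private static final int ENTRY = 1;
    private static final int END_OF_LOG = 0;

    static byte[] writeLog(boolean lockTimedOut) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeInt(ENTRY);
        out.writeObject("tx-1");
        if (!lockTimedOut) {          // pretend a lock timeout stopped us here
            out.writeInt(ENTRY);
            out.writeObject("tx-2");
        }
        out.writeInt(END_OF_LOG);     // always close with an explicit delimiter
        out.close();
        return bytes.toByteArray();
    }

    static void readLog(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        while (in.readInt() == ENTRY) {
            System.out.println("applied " + in.readObject());
        }
        // Reaching this point means the sender stopped deliberately; an
        // EOFException escaping the loop would now indicate real corruption.
        System.out.println("sender finished cleanly");
        in.close();
    }

    public static void main(String[] args) throws Exception {
        readLog(writeLog(true));   // early but clean termination
        readLog(writeLog(false));  // full log
    }
}

With that in place an EOFException on the receiver would again mean what it is
supposed to mean: a stream that really did end unexpectedly.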

>>
>> Back to the original problem, if this was a stream corruption issue
>> I'd expect a lot more instances of deserializing errors because the
>> length of the buffer was smaller/larger than the number of bytes
>> following it and the next object to be deserialized from the stream
>> found garbage instead.
>>
>> This looks to me more like an index segment was created with size
>> x on node A and also on node B, then it was updated with size y > x on
>> node A but only the metadata got to node B, while the segment's byte array
>> remained the same.
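
A generic way to picture the mismatch described above (plain maps standing in
for the metadata and chunk entries; an illustration of the hypothesis, not the
actual Lucene directory code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StaleSegmentSketch {
    public static void main(String[] args) {
        // Two "caches": one for per-file metadata (declared size), one for the bytes.
        ConcurrentMap<String, Integer> metadata = new ConcurrentHashMap<String, Integer>();
        ConcurrentMap<String, byte[]> chunks = new ConcurrentHashMap<String, byte[]>();

        // Node A writes a 4-byte segment; both entries are consistent.
        chunks.put("_0.cfs", new byte[] { 1, 2, 3, 4 });
        metadata.put("_0.cfs", 4);

        // Node A grows the segment to 8 bytes, but in this hypothetical
        // interleaving only the metadata update reaches node B.
        metadata.put("_0.cfs", 8);

        // Node B reads: the declared length no longer matches the stored bytes,
        // so a reader trusting the metadata runs off the end of the data.
        int declared = metadata.get("_0.cfs");
        byte[] stored = chunks.get("_0.cfs");
        if (declared > stored.length) {
            System.out.println("declared " + declared + " bytes, only "
                    + stored.length + " stored -> read past end of file");
        }
    }
}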
>>
>> I don't know anything about the Lucene directory implementation yet,
>> so I have no idea if/how this could happen and I haven't been able to
>> reproduce it on my machine. Is there a way to see the Jenkins test
>> logs?
>
> There's the console log in http://goo.gl/JFh5R but if this happens relatively 
> often in the Lucene dir impl, we could create a small Jenkins run and pass a 
> -Dlog4j.configuration configuring TRACE to be printed on the console. The 
> testsuite is small and would not generate a lot of logging. There's always a 
> chance of not encountering the issue when TRACE is enabled, particularly if 
> it's a race condition, but I think it's worth doing. IOW, I can set it up.
>

It would be great if you could set it up. It's not happening on every
run though, so is it possible to configure Jenkins to repeat it until
it gets a failure?

>>
>> Dan
>>
>>
>>> This looks like a suspicious correlation to me, as I think the
>>> reported errors are similar in nature.
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>>

 On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:

> Hello all,
> I'm not in a state to fully debug the issue this week, but even though
>>

Re: [infinispan-dev] DataContainer performance review

2011-06-28 Thread Vladimir Blagojevic
On 11-06-28 10:06 AM, Galder Zamarreño wrote:
> Vladimir,
>
> I think it's better if you run your tests in one of the cluster or perf 
> machines cos that way everyone has access to the same base system and results 
> can be compared, particularly when changes are made. Also, you avoid local 
> apps or CPU usage affecting your test results.
>
> I agree with Sanne, put ops for LIRS don't look good in comparison with LRU. 
> Did you run some profiling?
>
Hey,

Very likely you are right and it is a better approach, but it does not 
take much to notice a trend of deteriorating BCHM performance for large 
map/cache sizes. I'm looking into doing some profiling now.

Cheers


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DataContainer performance review

2011-06-28 Thread Galder Zamarreño
Vladimir,

I think it's better if you run your tests in one of the cluster or perf 
machines cos that way everyone has access to the same base system and results 
can be compared, particularly when changes are made. Also, you avoid local apps 
or CPU usage affecting your test results. 

I agree with Sanne, put ops for LIRS don't look good in comparison with LRU. Did 
you run some profiling?

Cheers,

On Jun 27, 2011, at 10:11 PM, Vladimir Blagojevic wrote:

> Sanne & others,
> 
> I think we might be onto something. I changed the test to run for a 
> specified period of time; I used 10-minute test runs (you need to pull 
> this change into MapStressTest manually until it is integrated). I noticed 
> that as we raise the map capacity, BCHM and CacheImpl performance starts to 
> degrade, while ConcurrentHashMap and SynchronizedMap do not. See 
> results below.
> 
> 
> max capacity = 512
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 382
> Average put ops/ms 35
> Average remove ops/ms 195
> Size = 478
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 388
> Average put ops/ms 54
> Average remove ops/ms 203
> Size = 462
> Performance for container CacheImpl
> Average get ops/ms 143
> Average put ops/ms 16
> Average remove ops/ms 26
> Size = 418
> Performance for container ConcurrentHashMap
> Average get ops/ms 176
> Average put ops/ms 67
> Average remove ops/ms 74
> Size = 43451
> Performance for container SynchronizedMap
> Average get ops/ms 58
> Average put ops/ms 47
> Average remove ops/ms 60
> Size = 30996
> 
> 
> max capacity = 16384
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 118
> Average put ops/ms 7
> Average remove ops/ms 11
> Size = 16358
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 76
> Average put ops/ms 5
> Average remove ops/ms 6
> Size = 15488
> Performance for container CacheImpl
> Average get ops/ms 48
> Average put ops/ms 4
> Average remove ops/ms 16
> Size = 12275
> Performance for container ConcurrentHashMap
> Average get ops/ms 251
> Average put ops/ms 107
> Average remove ops/ms 122
> Size = 17629
> Performance for container SynchronizedMap
> Average get ops/ms 51
> Average put ops/ms 42
> Average remove ops/ms 51
> Size = 36978
> 
> 
> max capacity = 32768
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 72
> Average put ops/ms 7
> Average remove ops/ms 9
> Size = 32405
> Performance for container BoundedConcurrentHashMap
> Average get ops/ms 13
> Average put ops/ms 5
> Average remove ops/ms 2
> Size = 29214
> Performance for container CacheImpl
> Average get ops/ms 14
> Average put ops/ms 2
> Average remove ops/ms 4
> Size = 23887
> Performance for container ConcurrentHashMap
> Average get ops/ms 235
> Average put ops/ms 102
> Average remove ops/ms 115
> Size = 27823
> Performance for container SynchronizedMap
> Average get ops/ms 55
> Average put ops/ms 48
> Average remove ops/ms 53
> Size = 39650
> 
> On 11-06-26 8:44 PM, Sanne Grinovero wrote:
>> Hi Vladimir,
>> this looks very interesting, I couldn't resist starting some runs.
>> 
>> I noticed the test is quite quick to finish, so I've raised my
>> LOOP_FACTOR to 200, but it still finishes in a few minutes, which is not
>> long enough IMHO for these numbers to be really representative.
>> I've noticed that the test has a "warmup" boolean, but it's not
>> being used, while I think it should be.
>> Also, the three different operations of course need to happen all
>> together to properly "shuffle" the data, but when interpreting these
>> numbers we have to keep in mind that some operations finish before the
>> others, so part of the results for the remaining operations is measured
>> without interference from the operations that already finished. Maybe
>> it's more interesting to have the three operations run in a predictable
>> sequence, or have them all work as fast as they can for a given timebox
>> instead of "until the keys are finished"?
>> 
>> Here were my results, in case a comparison is useful. To conclude
>> something from this data, it looks to me that the put
>> operations under LIRS indeed have something wrong? Also, adding
>> more writers worsens the scenario for LIRS significantly.
>> 
>> When running the test with "doTest(map, 28, 8, 8, true, testName);"
>> (adding more put and remove operations) the synchronizedMap is
>> significantly faster than the CacheImpl.
>> 
>> Performance for container BoundedConcurrentHashMap
>> Average get ops/ms 1711
>> Average put ops/ms 63
>> Average remove ops/ms 1108
>> Size = 480
>> Performance for container BoundedConcurrentHashMap
>> Average get ops/ms 1851
>> Average put ops/ms 665
>> Average remove ops/ms 1199
>> Size = 463
>> Performance for container CacheImpl
>> Average get ops/ms 349
>> Average put ops/ms 213
>> Average remove ops/ms 250
>> Size = 459
>> Performance for container ConcurrentHashMap
>> Average get ops/ms 776
>> Average put ops/ms 611
>> Average rem

Re: [infinispan-dev] New Logging rules!!! MUST READ

2011-06-28 Thread Galder Zamarreño

On Jun 21, 2011, at 11:54 AM, Manik Surtani wrote:

> 
> On 27 Apr 2011, at 14:18, Galder Zamarreño wrote:
> 
>> Please note as well that this might require some IDE settings changes. I had 
>> to enable annotation processing in IntelliJ to be able to compile stuff from 
>> the IDE.
> 
> Galder, do you have any instructions on these IDE settings?  Things work fine 
> from the command line, but the generated logger does not exist if I build 
> from an IDE.  I don't believe the annotation processor is being invoked, or 
> perhaps it is generating sources where the rest of the build doesn't expect them.

True, I don't think it's generating the log implementations which makes tests 
fail unless you've previously built from cmd line. I'll look into it.

If David's listening, he might already have some tricks from the JBoss Logging 
development work?
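
For context, these loggers are just annotated interfaces whose concrete
implementation is generated at compile time by the JBoss Logging annotation
processor. A minimal, hypothetical example (assuming the org.jboss.logging
annotation package as used at the time; this is not Infinispan's actual Log
interface):

import static org.jboss.logging.Logger.Level.WARN;

import org.jboss.logging.BasicLogger;
import org.jboss.logging.LogMessage;
import org.jboss.logging.Message;
import org.jboss.logging.MessageLogger;

@MessageLogger(projectCode = "ISPN")
public interface SketchLog extends BasicLogger {

    // The processor turns this interface into a concrete SketchLog_$logger
    // class at build time. If the processor doesn't run (e.g. annotation
    // processing is disabled in the IDE), the generated class simply doesn't
    // exist and compilation or tests that refer to it fail.
    @LogMessage(level = WARN)
    @Message(value = "Unable to do the thing for cache %s", id = 12345)
    void unableToDoTheThing(String cacheName);
}

So the IDE fix amounts to making sure the same processor runs during the IDE
build (in IntelliJ, the annotation processing option mentioned above), or
keeping the generated sources from a command-line build on the source path.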

> 
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
> 
> Lead, Infinispan
> http://www.infinispan.org
> 
> 
> 
> 
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Alert from a failing test

2011-06-28 Thread Bela Ban
I hope this isn't related to [1]. Are you using STREAMING_STATE_TRANSFER 
with use_default_transport=true?


[1] https://issues.jboss.org/browse/JGRP-1339


On 6/28/11 1:05 PM, Galder Zamarreño wrote:
> Some comments below:
>
> On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
>
>> On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero  
>> wrote:
>>> 2011/6/20 Manik Surtani:
 Oddly enough, I don't see any other tests exhibiting this behaviour.  Let 
 me know if you see it in more recent CI runs, and we'll investigate in 
 detail.
>>>
>>> In fact there are not many tests in core which verify that a full stream is
>>> received; but as I mentioned in another thread, I was seeing the
>>> following exception relatively often (it never caught my attention
>>> some months ago)
>>>
>>> Caused by: java.io.EOFException: The stream ended unexpectedly.
>>> Please check whether the source of the stream encountered any issues
>>> generating the stream.
>>> at 
>>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
>>> at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
>>> at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
>>> at 
>>> org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
>>> ... 27 more
>>> Caused by: java.io.EOFException: Read past end of file
>>> at 
>>> org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>> at 
>>> org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>> at 
>>> org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>> at 
>>> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>> at 
>>> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>> at 
>>> org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
>>> at 
>>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
>>> ... 30 more
>>>
>>
>> The line "at 
>> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
>> suggests EOF has been reached while reading the lead byte of the
>> object, not a partial object. This is consistent with
>> StateTransferManagerImpl.generateTransactionLog() getting a timeout
>> while trying to acquire the processing lock (at
>> StateTransferManagerImpl.java:192) and closing the stream in the
>> middle of the transaction log, not a transmission error. We could
>> probably get rid of the exception in the logs by inserting another
>> delimiter here.
>
> You mean inserting a particular delimiter in case of a timeout acquiring the 
> processing lock? That sounds like a good idea. Just back from holidays and 
> can't remember well, but were previous EOFs related to processing lock 
> acquisition? If so, I think your idea makes even more sense cos it'd fall 
> into a possible known problem and could provide the receiver side with that 
> extra bit of information.
>
>>
>> Back to the original problem, if this was a stream corruption issue
>> I'd expect a lot more instances of deserializing errors because the
>> length of the buffer was smaller/larger than the number of bytes
>> following it and the next object to be deserialized from the stream
>> found garbage instead.
>>
>> This looks to me more like an index segment has been created with size
>> x on node A and also on node B, then it was updated with size y>  x on
>> node A but only the metadata got to node B, the segment's byte array
>> remained the same.
>>
>> I don't know anything about the Lucene directory implementation yet,
>> so I have no idea if/how this could happen and I haven't been able to
>> reproduce it on my machine. Is there a way to see the Jenkins test
>> logs?
>
> There's the console log in http://goo.gl/JFh5R but if this happens relatively 
> often in the Lucene dir impl, we could create a small Jenkins run and pass a 
> -Dlog4j.configuration configuring TRACE to be printed on the console. The 
> testsuite is small and would not generate a lot of logging. There's always a 
> chance of not encountering the issue when TRACE is enabled, particularly if 
> it's a race condition, but I think it's worth doing. IOW, I can set it up.
>
>>
>> Dan
>>
>>
>>> This looks like a suspicious correlation to me, as I think the
>>> reported errors are similar in nature.
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>>

 On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:

> Hello all,
> I'm not in a state to fully debug the issue this week, but even though
> this failure happens in the Lucene Directory, it looks like it's
> reporting an is

Re: [infinispan-dev] Alert from a failing test

2011-06-28 Thread Galder Zamarreño
Some comments below:

On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:

> On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero  
> wrote:
>> 2011/6/20 Manik Surtani :
>>> Oddly enough, I don't see any other tests exhibiting this behaviour.  Let 
>>> me know if you see it in more recent CI runs, and we'll investigate in 
>>> detail.
>> 
>> In fact there are not many tests in core which verify that a full stream is
>> received; but as I mentioned in another thread, I was seeing the
>> following exception relatively often (it never caught my attention
>> some months ago)
>> 
>> Caused by: java.io.EOFException: The stream ended unexpectedly.
>> Please check whether the source of the stream encountered any issues
>> generating the stream.
>>at 
>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
>>at 
>> org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
>>at 
>> org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
>>at 
>> org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
>>... 27 more
>> Caused by: java.io.EOFException: Read past end of file
>>at 
>> org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>at 
>> org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>at 
>> org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>at 
>> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>at 
>> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>at 
>> org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
>>at 
>> org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
>>... 30 more
>> 
> 
> The line "at 
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
> suggests EOF has been reached while reading the lead byte of the
> object, not a partial object. This is consistent with
> StateTransferManagerImpl.generateTransactionLog() getting a timeout
> while trying to acquire the processing lock (at
> StateTransferManagerImpl.java:192) and closing the stream in the
> middle of the transaction log, not a transmission error. We could
> probably get rid of the exception in the logs by inserting another
> delimiter here.

You mean inserting a particular delimiter in case of a timeout acquiring the 
processing lock? That sounds like a good idea. Just back from holidays and 
can't remember well, but were previous EOFs related to processing lock 
acquisition? If so, I think your idea makes even more sense cos it'd fall into 
a possible known problem and could provide the receiver side with that extra 
bit of information.

> 
> Back to the original problem, if this was a stream corruption issue
> I'd expect a lot more instances of deserializing errors because the
> length of the buffer was smaller/larger than the number of bytes
> following it and the next object to be deserialized from the stream
> found garbage instead.
> 
> This looks to me more like an index segment has been created with size
> x on node A and also on node B, then it was updated with size y > x on
> node A but only the metadata got to node B, the segment's byte array
> remained the same.
> 
> I don't know anything about the Lucene directory implementation yet,
> so I have no idea if/how this could happen and I haven't been able to
> reproduce it on my machine. Is there a way to see the Jenkins test
> logs?

There's the console log in http://goo.gl/JFh5R but if this happens relatively 
often in the Lucene dir impl, we could create a small Jenkins run and pass a 
-Dlog4j.configuration configuring TRACE to be printed on the console. The testsuite 
is small and would not generate a lot of logging. There's always a chance of not 
encountering the issue when TRACE is enabled, particularly if it's a race 
condition, but I think it's worth doing. IOW, I can set it up.
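
For reference, the TRACE-on-console setup being described amounts to something
like the following (a rough sketch in programmatic form, assuming the testsuite
picks up log4j 1.x as the -Dlog4j.configuration flag suggests; the file that
flag points at would express the same thing as properties or XML):

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class TraceToConsole {
    public static void main(String[] args) {
        // Print everything straight to the console...
        ConsoleAppender console = new ConsoleAppender(
                new PatternLayout("%d{HH:mm:ss,SSS} %-5p [%c{1}] %m%n"));
        Logger.getRootLogger().addAppender(console);
        // ...and turn the Infinispan categories up to TRACE.
        Logger.getLogger("org.infinispan").setLevel(Level.TRACE);
    }
}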

> 
> Dan
> 
> 
>> This looks like a suspicious correlation to me, as I think the
>> reported errors are similar in nature.
>> 
>> Cheers,
>> Sanne
>> 
>> 
>> 
>>> 
>>> On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:
>>> 
 Hello all,
 I'm not in a state to fully debug the issue this week, but even though
 this failure happens in the Lucene Directory, it looks like it's
 reporting an issue with Infinispan core:
 
 https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/
 
 In this test we're writing to the inde