[infinispan-dev] JBoss Libra

2012-01-31 Thread Galder Zamarreño
Just saw this: https://github.com/wolfc/jboss-libra

We should investigate the possibility of adding this to Infinispan and provide 
memory size based eviction, WDYT?

The performance impact would need to be measured too.

EhCache has apparently done something similar but, from what I heard, it's full 
of hacks to work on different platforms...

Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Bela Ban
IMO, measuring object size using java.lang.instrument is not a good 
idea: first of all, it's probably very slow and the time to do so is 
linear in the number of live objects. Second, this probably takes into 
account only objects in the cache, but not the cache structures used by 
Infinispan, JGroups and so on...

The approach I've recommended before is to trigger an eviction policy 
based on free/available memory. This can easily be fetched from the JVM 
via JMX...
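
As a minimal sketch of that idea (the threshold value and the eviction hook are 
hypothetical, not existing Infinispan API), the check could boil down to something like:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    // Hypothetical sketch: decide whether to evict based on current heap usage via the MemoryMXBean.
    public class FreeMemoryEvictionTrigger {

       private static final double USAGE_THRESHOLD = 0.85; // assumed: start evicting above 85% heap usage

       public boolean shouldEvict() {
          MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
          // Note: "used" also counts garbage that has not been collected yet (a caveat raised later in this thread).
          return (double) heap.getUsed() / heap.getMax() > USAGE_THRESHOLD;
       }
    }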

On 1/31/12 10:25 AM, Galder Zamarreño wrote:
 Just saw this: https://github.com/wolfc/jboss-libra

 We should investigate the possibility of adding this to Infinispan and 
 provide memory size based eviction, WDYT?

 The performance impact would need to be measured too.

 EhCache has apparently done something similar but from what I heard, it's 
 full of hacks to work on different platforms...


-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] again: no physical address

2012-01-31 Thread Manik Surtani
I have sporadically seen this before when running some perf tests as well … 
curious to know what's up.

On 30 Jan 2012, at 17:45, Sanne Grinovero wrote:

 Hi Bela,
 this is the same error we were having in Boston when preparing the
 Infinispan nodes for some of the demos. So I didn't see it for a long
 time, but today it returned especially to add a special twist to my
 performance tests.
 
 Dan,
 when this happened it looked like I had a deadlock: the benchmark is
 not making any more progress, it looks like they are all waiting for
 answers. JConsole didn't detect a deadlock, and unfortunately I don't
 have any more logs than this from either JGroups or Infinispan (since it
 was supposed to be a performance test!).
 
 I'm attaching a threaddump in case it interests you, but I hope not:
 this is a DIST test with 12 nodes (in the same VM from this dump). I
 didn't have time to inspect it myself as I have to run, and I think
 the interesting news here is the "no physical address" warnings
 
 ideas?
 
 [org.jboss.logging] Logging Provider: org.jboss.logging.Log4jLoggerProvider
 [org.jgroups.protocols.UDP] sanne-55119: no physical address for
 sanne-53650, dropping message
 [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
 sanne-53650 timed out (after 3000 ms), retrying
 [org.jgroups.protocols.pbcast.GMS] sanne-55119 already present;
 returning existing view [sanne-53650|5] [sanne-53650, sanne-49978,
 sanne-27401, sanne-4741, sanne-29196, sanne-55119]
 [org.jgroups.protocols.UDP] sanne-39563: no physical address for
 sanne-53650, dropping message
 [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-39563) sent to
 sanne-53650 timed out (after 3000 ms), retrying
 [org.jgroups.protocols.pbcast.GMS] sanne-39563 already present;
 returning existing view [sanne-53650|6] [sanne-53650, sanne-49978,
 sanne-27401, sanne-4741, sanne-29196, sanne-55119, sanne-39563]
 [org.jgroups.protocols.UDP] sanne-18071: no physical address for
 sanne-39563, dropping message
 [org.jgroups.protocols.UDP] sanne-18071: no physical address for
 sanne-55119, dropping message
 threadDump.txt
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Don't forget to update the XSD when adding configuration

2012-01-31 Thread Sanne Grinovero
Was something already changed in the configuration format since 5.1.0.Final?

On 31 January 2012 10:28, Manik Surtani ma...@jboss.org wrote:
 Please consider this when both developing as well as reviewing code.  This is 
 very important.

 On 30 Jan 2012, at 20:17, Pete Muir wrote:

 This is not done automatically, you'll need to do it yourself. Make sure to 
 add docs too.

 Please also remember to update src/test/resources/configs/all.xml with your 
 new elements or attributes. A test validates this file against the schema.
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Lead, Infinispan
 http://www.infinispan.org




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Mircea Markus

On 31 Jan 2012, at 09:34, Bela Ban wrote:

 IMO, measuring object size using java.lang.instrument is not a good 
 idea: first of all, it's probably very slow and the time to do so is 
 linear to the number of live objects.
+1
I remember prototyping this for JBossCache and the harness and that's what I 
found as well. 
 Second, this probably takes into 
 account only objects in the cache, but not the cache structures used by 
 Infinispan, JGroups and so on...
 
 The approach I've recommended before is to trigger an eviction policy 
 based on free/available memory. This can easily be fetched from the JVM 
 via JMX...
..or keep the in-memory data in serialized form (byte[]) - that can be counted 
- and add an empirical factor (TBD) for the harness/ISPN structure that holds 
the data. AFAIK Coherence does this.
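
A rough sketch of that accounting (the overhead factor and the method names are 
made up for illustration):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch: track the size of the serialized values (byte[]) and apply an
    // empirical factor for the container/entry structures wrapped around them.
    public class SerializedSizeAccounting {

       private static final double OVERHEAD_FACTOR = 1.25; // assumed, to be determined empirically
       private final AtomicLong serializedBytes = new AtomicLong();

       public void onStore(byte[] serializedValue) {
          serializedBytes.addAndGet(serializedValue.length);
       }

       public void onRemove(byte[] serializedValue) {
          serializedBytes.addAndGet(-serializedValue.length);
       }

       public long estimatedBytesInMemory() {
          return (long) (serializedBytes.get() * OVERHEAD_FACTOR);
       }
    }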
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Manik Surtani

On 31 Jan 2012, at 11:06, Mircea Markus wrote:

 ..or keep the in-memory data in serialized form (byte[]) - that can be 
 counted - and add an empirical factor (TBD) for the harness/ISPN structure 
 that holds the data. AFAIK Coherence does this.

I believe this is our best bet.  We already recommend storing serialised forms 
when using DIST so that's easy.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Sanne Grinovero
MarshalledValue already caches the size when serialized, in order to
properly dimension output buffers. This value can be reused.

On 31 January 2012 11:09, Manik Surtani ma...@jboss.org wrote:

 On 31 Jan 2012, at 11:06, Mircea Markus wrote:

 ..or keep the in-memory data in serialized form (byte[]) - that can be
 counted - and add an empirical factor (TBD) for the harness/ISPN structure
 that holds the data. AFAIK Coherence does this.


 I believe this is our best bet.  We already recommend storing serialised
 forms when using DIST so that's easy.

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Lead, Infinispan
 http://www.infinispan.org




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Don't forget to update the XSD when adding configuration

2012-01-31 Thread Pete Muir
No, there were just some little bugs.

On 31 Jan 2012, at 10:30, Sanne Grinovero wrote:

 Was something already changed in the configuration format since 5.1.0.Final?
 
 On 31 January 2012 10:28, Manik Surtani ma...@jboss.org wrote:
 Please consider this when both developing as well as reviewing code.  This 
 is very important.
 
 On 30 Jan 2012, at 20:17, Pete Muir wrote:
 
 This is not done automatically, you'll need to do it yourself. Make sure to 
 add docs too.
 
 Please also remember to update src/test/resources/configs/all.xml with your 
 new elements or attributes. A test validates this file against the schema.
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani
 
 Lead, Infinispan
 http://www.infinispan.org
 
 
 
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Bela Ban


On 1/31/12 11:32 AM, Tristan Tarrant wrote:
 On 01/31/2012 10:34 AM, Bela Ban wrote:
 The approach I've recommended before is to trigger an eviction policy
 based on free/available memory. This can easily be fetched from the JVM
 via JMX...
 And maybe you're just close to a large GC and you're evicting for no reason.

IIRC you can look at the size of the young and old generation via JMX, 
and there you can see how much memory has accumulated.
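
For reference, a small sketch of reading per-generation usage through the standard 
platform MXBeans; the pool names depend on the collector in use (e.g. "PS Old Gen" 
vs "G1 Old Gen"):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    // Sketch: dump the usage of each memory pool (young/survivor/old) exposed by the JVM.
    public class GenerationUsageDump {
       public static void main(String[] args) {
          for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
             MemoryUsage usage = pool.getUsage();
             System.out.println(pool.getName() + ": used=" + usage.getUsed() + " max=" + usage.getMax());
          }
       }
    }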

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Tristan Tarrant
On 01/31/2012 03:32 PM, Bela Ban wrote:
 IIRC you can look at the size of the young and old generation via JMX,
 and there you can see how much memory has accumulated.
Does that work when using G1 ?

Tristan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Bela Ban
I don't know, but I don't see why not.

If you look at java.lang.{Memory,GarbageCollector,MemoryPool}, there are 
a lot of values you can look at, including the details on eden and old.

You can even register for notifications when free memory drops below a 
certain threshold, but I've never tried this out and I heard some impls 
don't provide this...
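
A rough sketch of the notification approach, assuming the memory pools support 
usage thresholds (as noted above, not all implementations do); the eviction hook 
is a hypothetical placeholder:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryNotificationInfo;
    import java.lang.management.MemoryPoolMXBean;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;

    // Sketch: set a usage threshold on the pools that support it and react to the
    // MEMORY_THRESHOLD_EXCEEDED notifications emitted by the platform MemoryMXBean.
    public class MemoryThresholdRegistration {

       public void register() {
          for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
             if (pool.isUsageThresholdSupported() && pool.getUsage().getMax() > 0) {
                pool.setUsageThreshold((long) (pool.getUsage().getMax() * 0.85)); // assumed threshold
             }
          }
          NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
          emitter.addNotificationListener(new NotificationListener() {
             public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                   // hypothetical hook: ask the data container to evict some entries here
                }
             }
          }, null, null);
       }
    }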


On 1/31/12 3:32 PM, Tristan Tarrant wrote:
 On 01/31/2012 03:32 PM, Bela Ban wrote:
 IIRC you can look at the size of the young and old generation via JMX,
 and there you can see how much memory has accumulated.
 Does that work when using G1 ?

 Tristan
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


[infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2

2012-01-31 Thread Sanne Grinovero
I think this is an important feature to have soon;

My understanding of it:

The feature would be off by default, with newly discovered nodes being
added/removed as usual. With a JMX-operable switch, one can then disable
automatic rehashing:

If a remote node is joining the JGroups view, but rehash is off: it
will be added to a to-be-installed view, but this won't be installed
until rehash is enabled again. This gives time to add more changes
before starting the rehash, and would help a lot to start larger
clusters.

If the [self] node is booting and joining a cluster with manual rehash
off, the start process and any getCache() invocation should block and
wait for it to be enabled. This would of course need to override the
usually low timeouts.

When a node is suspected it's a bit of a different story, as we need to
make sure no data is lost. The principle is the same, but maybe we
should have two flags: one which is a soft request to avoid rehashes
of less than N members (and refuse N >= numOwners?), and one which just
disables it and doesn't care: data might be in a cachestore, data might
not be important. Which reminds me, we should also consider a JMX
command to flush the container to the CacheLoader.
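
To make the switch part concrete, here is a rough sketch using a plain standard
MBean; every name in it is illustrative only, nothing is existing Infinispan API
(interface and class would of course live in separate files):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Hypothetical sketch of the proposed JMX-operable rehash switch.
    public interface RehashControlMBean {
       void enableRehash();        // install the pending view and start rehashing
       void disableRehash();       // queue joiners into a to-be-installed view instead
       boolean isRehashEnabled();
    }

    public class RehashControl implements RehashControlMBean {

       private volatile boolean rehashEnabled = true;

       public void enableRehash()  { rehashEnabled = true; }
       public void disableRehash() { rehashEnabled = false; }
       public boolean isRehashEnabled() { return rehashEnabled; }

       public static void register(RehashControl control) throws Exception {
          MBeanServer server = ManagementFactory.getPlatformMBeanServer();
          server.registerMBean(control, new ObjectName("org.infinispan:type=RehashControl")); // illustrative name
       }
    }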

--Sanne
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2

2012-01-31 Thread Bela Ban
This is essentially what I suggested at the Lisbon meeting, right ?

I think Dan had a design wiki on this somewhere...


On 1/31/12 4:53 PM, Sanne Grinovero wrote:
 I think this is an important feature to have soon;

 My understanding of it:

 We default with the feature off, and newly discovered nodes are
 added/removed as usual. With a JMX operatable switch, one can disable
 this:

 If a remote node is joining the JGroups view, but rehash is off: it
 will be added to a to-be-installed view, but this won't be installed
 until rehash is enabled again. This gives time to add more changes
 before starting the rehash, and would help a lot to start larger
 clusters.

 If the [self] node is booting and joining a cluster with manual rehash
 off, the start process and any getCache() invocation should block and
 wait for it to be enabled. This would need of course to override the
 usually low timeouts.

 When a node is suspected it's a bit a different story as we need to
 make sure no data is lost. The principle is the same, but maybe we
 should have two flags: one which is a soft request to avoid rehashes
 of less than N members (and refuse N=numOwners ?), one which is just
 disable it and don't care: data might be in a cachestore, data might
 not be important. Which reminds me, we should consider as well a JMX
 command to flush the container to the CacheLoader.

 --Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


[infinispan-dev] Infinispan 5.1.1.CR1 is out

2012-01-31 Thread Galder Zamarreño
Infinispan 5.1.1.CR1 is out, read all about it in http://goo.gl/EtCeT

Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2

2012-01-31 Thread Sanne Grinovero
On 31 January 2012 16:06, Bela Ban b...@redhat.com wrote:
 This is essentially what I suggested at the Lisbon meeting, right ?

Yes!

 I think Dan had a design wiki on this somewhere...

Just raising it here as it was moved to 6.0, while I think it deserves
a dedicated thread to think it through properly. If it's not hard, I think
it should be done sooner.
But while I started the thread to wake up the brilliant minds, I can't
volunteer to make it happen.

Sanne



 On 1/31/12 4:53 PM, Sanne Grinovero wrote:
 I think this is an important feature to have soon;

 My understanding of it:

 We default with the feature off, and newly discovered nodes are
 added/removed as usual. With a JMX operatable switch, one can disable
 this:

 If a remote node is joining the JGroups view, but rehash is off: it
 will be added to a to-be-installed view, but this won't be installed
 until rehash is enabled again. This gives time to add more changes
 before starting the rehash, and would help a lot to start larger
 clusters.

 If the [self] node is booting and joining a cluster with manual rehash
 off, the start process and any getCache() invocation should block and
 wait for it to be enabled. This would need of course to override the
 usually low timeouts.

 When a node is suspected it's a bit a different story as we need to
 make sure no data is lost. The principle is the same, but maybe we
 should have two flags: one which is a soft request to avoid rehashes
 of less than N members (and refuse N=numOwners ?), one which is just
 disable it and don't care: data might be in a cachestore, data might
 not be important. Which reminds me, we should consider as well a JMX
 command to flush the container to the CacheLoader.

 --Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Bela Ban
 Lead JGroups (http://www.jgroups.org)
 JBoss / Red Hat
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Adding a Combiner

2012-01-31 Thread Vladimir Blagojevic

Response from Brent Douglas:

Hi Vladimir,

I'm not sure this is the same thing being discussed; however, if it is not, 
I had intended to request this anyhow.  When I looked into using 
Infinispan's map/reduce facility, this is the task I came up with:


http://pastebin.com/7GGjVnVt

I would prefer to specify it as:

http://pastebin.com/HTSq3g66

I'm pretty sure this is not the intended use case, but it distributes the 
creation of my reports, which is what I want. It's not really a big deal 
for me as I can get around this limitation by creating a wrapper class 
such as in the first example, but it would be nice if I did not have 
to. Is this a reasonable request?


Also, and this is probably just wrong, when I use this I bundle up the 
task and execute it via JMS. Would it be reasonable to make Collator 
extend Serializable?


Brent


On 12-01-30 11:47 AM, Vladimir Blagojevic wrote:

Guys,

I was looking at this again recently and I still do not understand how 
a combiner could have a different interface than a Reducer! Hadoop forces a 
user to implement the combiner as a Reducer (see 
http://developer.yahoo.com/hadoop/tutorial/module4.html#functionality 
and 
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setCombinerClass%28java.lang.Class%29). 
In addition, the original paper does not mention any change of types.


What we have admittedly done wrong is to apply the Reducer on each individual 
Mapper's output without checking whether the reduce function is both /commutative/ 
and /associative/! This can lead to problems: 
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/


So yes, I am all for adding a Combiner (it should do the optional 
per-mapper reduce that we currently do automatically), but I do not see why we 
have to change the interface!
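
For illustration, a word-count style reduce function that is commutative and 
associative: summing partial counts in any order or grouping gives the same total, 
which is exactly what makes it safe to run per mapper as a combiner. (This only 
mirrors the shape of the Reducer contract rather than implementing the actual interface.)

    import java.io.Serializable;
    import java.util.Iterator;

    // Sketch: a reduce function that is safe to reuse as a combiner, because
    // addition is both commutative and associative.
    public class WordCountReduce implements Serializable {

       public Integer reduce(String word, Iterator<Integer> partialCounts) {
          int total = 0;
          while (partialCounts.hasNext()) {
             total += partialCounts.next();
          }
          return total;
       }
    }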



Regards,
Vladimir




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] JBoss Libra

2012-01-31 Thread Dan Berindei
On Tue, Jan 31, 2012 at 4:49 PM, Bela Ban b...@redhat.com wrote:
 I don't know, but I don't see why not.

 If you look at java.lang.{Memory,GarbageCollector,MemoryPool}, there are
 a lot of values you can look at, including the details on eden and old.


I just looked at G1 with JConsole, and it does provide memory usage
information in (java.lang:type=MemoryPool, name={G1 Young Gen, G1
Survivor, G1 Old Gen}).Usage. However, the locations are not standard,
so we'd have to keep a bunch of rules about where to look for each
garbage collector.

Plus I'm not sure what looking at each generation separately would buy
us over looking only at Runtime.freeMemory().

 You can even register for notifications when free memory drops below a
 certain threshold, but I've never tried this out and I heard some impls
 don't provide this...


The biggest problem I think with looking at the amount of used/free
memory is that used memory includes dead objects. So you'll get a free
memory threshold notification and start evicting entries, but you
won't know how much memory you've freed (or even whether you needed to
evict any entries in the first place) without triggering an extremely
expensive full GC.

I second Mircea's idea of using the serialized data size, that's the
only reliable data we've got.

Cheers
Dan



 On 1/31/12 3:32 PM, Tristan Tarrant wrote:
 On 01/31/2012 03:32 PM, Bela Ban wrote:
 IIRC you can look at the size of the young and old generation via JMX,
 and there you can see how much memory has accumulated.
 Does that work when using G1 ?

 Tristan
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Bela Ban
 Lead JGroups (http://www.jgroups.org)
 JBoss / Red Hat
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] again: no physical address

2012-01-31 Thread Dan Berindei
Hi Bela

I guess it's pretty clear now... In Sanne's thread dump the main
thread is blocked in a cache.put() call after the cluster has
supposedly already formed:

org.infinispan.benchmark.Transactional.main() prio=10
tid=0x7ff4045de000 nid=0x7c92 in Object.wait()
[0x7ff40919d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x0007f61997d0 (a
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator)
at 
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372)
...
at 
org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169)
...
at org.infinispan.CacheSupport.put(CacheSupport.java:52)
at org.infinispan.benchmark.Transactional.start(Transactional.java:110)
at org.infinispan.benchmark.Transactional.main(Transactional.java:70)

State transfer was disabled, so during the cluster startup the nodes
only had to communicate with the coordinator and not between them. The
put command had to get the old value from another node, so it needed
the physical address and had to block until PING retrieved it.

Does PING use RSVP or does it wait for the normal STABLE timeout for
retransmission? Note that everything is blocked at this point; we
won't send another message in the entire cluster until we get the
physical address.

I'm sure you've already considered it before, but why not make the
physical addresses a part of the view installation message? This
should ensure that every node can communicate with every other node by
the time the view is installed.


I'm also not sure what to make of these lines:

 [org.jgroups.protocols.UDP] sanne-55119: no physical address for
 sanne-53650, dropping message
 [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
 sanne-53650 timed out (after 3000 ms), retrying

It appears that sanne-55119 knows the logical name of sanne-53650, and
the fact that it's coordinator, but not its physical address.
Shouldn't all of this information have arrived at the same time?


Cheers
Dan


On Tue, Jan 31, 2012 at 4:31 PM, Bela Ban b...@redhat.com wrote:
 This happens every now and then, when multiple nodes join at the same
 time, on the same host and PING has a small num_initial_mbrs.

 Since 2.8, the identity of a member is not an IP address:port anymore,
 but a UUID. The UUID has to be mapped to an IP address (and port), and
 every member maintains a table of UUIDs/IP addresses. This table is
 populated at startup, but the shipping of the IP address/UUID
 association is unreliable (in the case of UDP), so packets do get
 dropped when there are traffic spikes, like concurrent startup, or when
 the high CPU usage slows down things.

 If we need to send a unicast message to P, and the table doesn't have a
 mapping for P, PING multicasts a discovery request, and drops the
 message. Every member responds with the IP address of P, which is then
 added to the table. The next time the message is sent (through
 retransmission), P's IP address will be available, and the unicast send
 should succeed.

 Of course, if the multicast or unicast response is dropped too, we'll
 run this protocol again... and again ... and again, until we finally
 have a valid IP address for P.


 On 1/31/12 11:29 AM, Manik Surtani wrote:
 I have sporadically seen this before when running some perf tests as well … 
 curious to know what's up.

 On 30 Jan 2012, at 17:45, Sanne Grinovero wrote:

 Hi Bela,
 this is the same error we where having in Boston when preparing the
 Infinispan nodes for some of the demos. So I didn't see it for a long
 time, but today it returned especially to add a special twist to my
 performance tests.

 Dan,
 when this happened it looked like I had a deadlock: the benchmark is
 not making any more progress, it looks like they are all waiting for
 answers. JConsole didn't detect a deadlock, and unfortunately I'm not
 having more logs than this from nor JGroups nor Infinispan (since it
 was supposed to be a performance test!).

 I'm attaching a threaddump in case it interests you, but I hope not:
 this is a DIST test with 12 nodes (in the same VM from this dump). I
 didn't have time to inspect it myself as I have to run, and I think
 the interesting news here is with the no physical address

 ideas?

 [org.jboss.logging] Logging Provider: org.jboss.logging.Log4jLoggerProvider
 [org.jgroups.protocols.UDP] sanne-55119: no physical address for
 sanne-53650, dropping message
 [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
 sanne-53650 timed out (after 3000 ms), retrying
 [org.jgroups.protocols.pbcast.GMS] sanne-55119 already present;
 returning existing view [sanne-53650|5] [sanne-53650, sanne-49978,
 sanne-27401, sanne-4741, sanne-29196, sanne-55119]
 [org.jgroups.protocols.UDP] 

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-31 Thread Dan Berindei
It's true, but then JGroups' GroupRequest does exactly the same thing...

socket.send() takes some time too; I thought sending the requests in
parallel would mean calling socket.send() on a separate thread for
each recipient.

Cheers
Dan


On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani ma...@jboss.org wrote:
 Doesn't setBlockForResults(false) mean that we're not waiting on a response, 
 and can proceed to the next message to the next recipient?

 On 27 Jan 2012, at 16:34, Dan Berindei wrote:

 Manik, Bela, I think we send the requests sequentially as well. In
 ReplicationTask.call:

                for (Address a : targets) {
                   NotifyingFuture<Object> f = sendMessageWithFuture(constructMessage(buf, a), opts);
                   futureCollator.watchFuture(f, a);
                }


 In MessageDispatcher.sendMessageWithFuture:

        UnicastRequest<T> req=new UnicastRequest<T>(msg, corr, dest, options);
        req.setBlockForResults(false);
        req.execute();


 Did we use to send each request on a separate thread?


 Cheers
 Dan


 On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban b...@redhat.com wrote:
 yes.

 On 1/27/12 12:13 PM, Manik Surtani wrote:

 On 25 Jan 2012, at 09:42, Bela Ban wrote:

 No, parallel unicasts will be faster, as an anycast to A,B,C sends the
 unicasts sequentially

 Is this still the case in JG 3.x?


 --
 Bela Ban
 Lead JGroups (http://www.jgroups.org)
 JBoss / Red Hat
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Lead, Infinispan
 http://www.infinispan.org




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] again: no physical address

2012-01-31 Thread Bela Ban


On 1/31/12 10:55 PM, Dan Berindei wrote:
 Hi Bela

 I guess it's pretty clear now... In Sanne's thread dump the main
 thread is blocked in a cache.put() call after the cluster has
 supposedly already formed:

 org.infinispan.benchmark.Transactional.main() prio=10
 tid=0x7ff4045de000 nid=0x7c92 in Object.wait()
 [0x7ff40919d000]
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on 0x0007f61997d0  (a
 org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator)
  at 
 org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372)
  ...
  at 
 org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169)
  ...
  at org.infinispan.CacheSupport.put(CacheSupport.java:52)
  at 
 org.infinispan.benchmark.Transactional.start(Transactional.java:110)
  at org.infinispan.benchmark.Transactional.main(Transactional.java:70)

 State transfer was disabled, so during the cluster startup the nodes
 only had to communicate with the coordinator and not between them. The
 put command had to get the old value from another node, so it needed
 the physical address and had to block until PING would retrieve it.


That's not the way it works; at startup of F, it sends its IP address 
with the discovery request. Everybody returns its IP address with the 
discovery response, so even though we have F only talking to A (the 
coordinator) initially, F will also know the IP addresses of A,B,C,D and E.



 Does PING use RSVP


No: (1) I don't want a dependency of Discovery on RSVP and (2) the 
discovery is unreliable; discovery requests or responses can get dropped.


 or does it wait for the normal STABLE timeout for retransmission?


  Note that everything is blocked at this point, we
 won't send another message in the entire cluster until we got the physical 
 address.


As I said, this is an exceptional case, probably caused by Sanne 
starting 12 channels inside the same JVM, at the same time, therefore 
causing a traffic spike, which results in dropped discovery requests or 
responses.

After that, when F wants to talk to C, it asks the cluster for C's IP 
address, and that should be a few ms at most.


 I'm sure you've already considered it before, but why not make the
 physical addresses a part of the view installation message? This
 should ensure that every node can communicate with every other node by
 the time the view is installed.


There's a few reasons:

- I don't want to make GMS dependent on logical addresses. GMS is 
completely independent and shouldn't know about physical addresses
- At the time GMS kicks in, it's already too late. Remember, F needs to 
send a unicast JOIN request to A, but at this point it doesn't yet know 
A's address
- MERGE{2,3} also use discovery to detect sub-partitions to be merged, 
so discovery needs to be a separate piece of functionality
- A View is already big as it is, and I've managed to reduce its size 
even more, but adding physical addresses would blow up the size of View 
even more, especially in large clusters


 I'm also not sure what to make of these lines:

 [org.jgroups.protocols.UDP] sanne-55119: no physical address for
 sanne-53650, dropping message
 [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
 sanne-53650 timed out (after 3000 ms), retrying

 It appears that sanne-55119 knows the logical name of sanne-53650, and
 the fact that it's coordinator, but not its physical address.
 Shouldn't all of this information have arrived at the same time?

Hmm, correct. However, the logical names are kept in (a static) 
UUID.cache and the IP addresses in TP.logical_addr_cache.

I suggest doing the following when this happens (can you reproduce this?):
- Before: set enable_diagnostics=true in UDP
- probe.sh op=UDP.printLogicalAddressCache // you can replace probe.sh 
with java -jar jgroups.jar org.jgroups.tests.Probe

Here you can dump the logical caches, to see whether this information is 
absent.

You could also enable tracing for PING:
probe.sh op=PING.setLevel[trace]

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev