Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-16 Thread Sanne Grinovero
On 16 April 2013 09:51, Bela Ban b...@redhat.com wrote:


 On 4/15/13 8:27 PM, Dan Berindei wrote:
 Sorry for missing your message, Ales!

 Anyway, good news, we found out why the test was taking so long: the
 Message instance passed to dispatcher.cast() already had a destination
 address set, and JGroups only sent the message to that address, even
 though the dispatcher was waiting for a reply from the local node as well.


 I modified the code to throw an exception when this happens, as this is
 illegal. In 4.0 [1], I'll remove the Message object as a parameter; users
 will have to pass a buffer and a RequestOptions object instead.

Thanks!
Sanne


 [1] https://issues.jboss.org/browse/JGRP-1620


 --
 Bela Ban, JGroups lead (http://www.jgroups.org)
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Ales Justin
@Bela -- I can help you set up the whole CD env. ;-)

On Apr 15, 2013, at 7:14 AM, Bela Ban b...@redhat.com wrote:

 
 
 On 4/13/13 1:42 PM, Sanne Grinovero wrote:
 On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:
 
 
 On 4/13/13 2:02 AM, Sanne Grinovero wrote:
 
 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617
 
 
 Almost no information attached to the case :-( If it wasn't you, Sanne,
 I'd outright reject the case ...
 
 I wouldn't blame you, and I am sorry for the lack of details: as I said,
 it was very late, but I still preferred to share the observations we had
 made so far.
 
 
 OK, but I'll need more info if you want me to look into this.
 
 
 From all the experiments we made - and some good logs I'll clean up for
 sharing - it's clear that the thread is not woken up even though the ACK
 was already received.
 
 How do you know the ack has been received ? Logs ? Debugging ?
 
 And of course I wouldn't expect this to fail in a simple test as it
 wouldn't have escaped you ;-) or at least you would have had earlier
 reports.
 
 There are lots of complex moving parts in this scenario: from a Muxed
 JGroups Channel, and the Application Server responsible for
 initializing the stack with some added magic from CapeDwarf itself:
 it's not clear to me what configuration is exactly being used, for one.
 
 
 I see.
 
 
 Without a testcase we might not be 100% sure but it seems likely to be
 an unexpected behaviour in JGroups, at least under some very specific
 setup.
 
 I'm glad to help track down more details of what could trigger
 this, but I'm not too eager to write a full unit test for it, as it
 involves a lot of other components, and by mocking my own components
 out I could still reproduce it: it's not Hibernate Search, so I'll
 need help from the field experts.
 
 
 It would at least be nice to have some logs. I understand that even 
 though the test involves many components, it is short lived, so logs 
 (even at TRACE) shouldn't be too big.
 
 
 
 If you add more information to JGRP-1617, I'll take a look. This would
 be a critical bug in JGroups *if* you can prove that the
 MessageDispatcher always runs into the timeout (I don't think you can
 though !).
 
 Considering the easy workaround, and that this definitely needs
 something special in the configuration, I wouldn't consider it too
 critical? For as far as we know now, it's entirely possible the
 configuration being used is illegal. But this is exactly where I need
 your help ;-)
 
 
 I'm here to help. But I need more information.
 
 
 -- 
 Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Sat, Apr 13, 2013 at 3:02 AM, Sanne Grinovero sa...@hibernate.orgwrote:

 that's right, as suggested by Emmanuel I plan to separate the JGroups
 Sync/Async options from the worker.execution property so you can play
 with the two independently.
 I think the JGroups option's default could depend on the backend - if
 not otherwise specified, and if we all agree it doesn't make it too
 confusing.

 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617

 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.
 The same cost is paid again when the second node is hit the first
 time, after that index write operations block the writer for 1ms (not
 investigated further on potential throughput).

 Not being sure about the options of depending on a newer JGroups
 release or the complexity of a fix, I'll implement a workaround in
 HSearch in the scope of HSEARCH-1296.

 As a lesson learned, I think we need to polish some of our TRACE level
 messages to include the cache name:



Sanne, we already push the cache name in the NDC if trace is enabled for
the entry point of the thread. So for your purpose, I think enabling
trace for org.infinispan.interceptors.InvocationContextInterceptor and
including %x in your pattern layout should work.
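
For reference, a minimal sketch of those two settings in a plain
log4j.properties (hypothetical appender name; the AS logging subsystem
would need the equivalent in its own configuration):

    # TRACE on the interceptor that pushes the cache name into the NDC
    log4j.logger.org.infinispan.interceptors.InvocationContextInterceptor=TRACE
    # %x prints the NDC contents (i.e. the cache name) on every line
    log4j.rootLogger=INFO, CONSOLE
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
    log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] (%t) [%x] %m%n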



  to resolve this we had not just
  many threads and components but also 4 of them were using JGroups
  (interleaving messages of all sorts) and 9 different caches were
  involved for each simple write operation in CD: made it interesting to
  figure out what was going on! Also I'm wondering how hard it would be to
  have a log parser which converts my 10GB of text log from today into a
  graphical sequence diagram.
  Big thanks to Mircea who helped me figure this out.

 Sanne

 On 12 April 2013 21:10, Ales Justin ales.jus...@gmail.com wrote:
  I think we need more fine-grained config for this new JGroups sync
 feature.
 
  I added this to our cache config
 
  <property name="hibernate.search.default.worker.execution">async</property>
 
  and it broke our tests.
 
  Where previous (old / non JGroups sync) behavior worked.
 
  It of course also works without this async config,
  but in that case we don't need the sync / ACK JGroups message.
  (we didn't have one before and it worked ;-)
 
  -Ales
 
  On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org
 wrote:
 
  There is a blackhole indexing backend, which pipes all indexing
  requests > /dev/null
 
  Set this as an Infinispan Query configuration property:
 
 default.worker.backend = blackhole
 
  Of course that means that the index will not be updated: you might
  need to adapt your test to tolerate that, but the point is not
  functional testing but to verify how much the SYNC option on the
  JGroups backend is actually slowing you down. I suspect the
  performance penalty is not in the network but in the fact you're now
  waiting for the index operations, while in async you were not waiting
  for them to be flushed.
 
  If you can identify which part is slow, then we can help you with
  better configuration options.
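
 For instance, in an Infinispan 5.x cache configuration this could look
 roughly like the following sketch (the cache name is a placeholder, and
 the exact schema depends on the Infinispan version in use):

    <namedCache name="my-indexed-cache">
       <indexing enabled="true" indexLocalOnly="false">
          <properties>
             <!-- pipe all index work to the no-op backend for benchmarking -->
             <property name="default.worker.backend" value="blackhole"/>
          </properties>
       </indexing>
    </namedCache>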
 
 
  On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
  What do you mean?
 
  On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org
 wrote:
 
  You could try the new sync version but setting the blackhole backend
 on the
  master node to remove the indexing overhead from the picture.
 
  On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org
 wrote:
 
  Are you sure that the async version actually had applied all writes
 to the
  index in the measured interval?
 
  On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 
  Although this change fixes query lookup,
  it adds horrible performance:
 
  Running CapeDwarf cluster QueryTest:
 
  with HSEARCH-1296
 
  21:00:27,188 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
  (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service
 Avro
  SerializationProvider v1.0 being used for index
  'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
  21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
  JBAS018224: Unregister web context: /capedwarf-tests
 
  50sec
 
  old 4.2.0.Final HS
 
  21:08:19,988 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
  (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service
 Avro
  SerializationProvider v1.0 being used for index
  'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
  21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
  JBAS018224: 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:
 
 
  On 4/13/13 2:02 AM, Sanne Grinovero wrote:
 
  @All, the performance problem seemed to be caused by a problem in
  JGroups, which I've logged here:
  https://issues.jboss.org/browse/JGRP-1617
 
 
  Almost no information attached to the case :-( If it wasn't you, Sanne,
  I'd outright reject the case ...

 I wouldn't blame you, and I am sorry for the lack of details: as I said,
 it was very late, but I still preferred to share the observations we had
 made so far.

 From all the experiments we made - and some good logs I'll clean up for
 sharing - it's clear that the thread is not woken up even though the ACK
 was already received.
 And of course I wouldn't expect this to fail in a simple test as it
 wouldn't have escaped you ;-) or at least you would have had earlier
 reports.

 There are lots of complex moving parts in this scenario: from a Muxed
 JGroups Channel, and the Application Server responsible for
 initializing the stack with some added magic from CapeDwarf itself:
 it's not clear to me what configuration is exactly being used, for
 one.


Does CD also change the JGroups configuration? I thought it only tweaks the
Infinispan cache configuration on deployment, and the JGroups channel is
already started by the time the CD application is deployed.



 Without a testcase we might not be 100% sure but it seems likely to be
 an unexpected behaviour in JGroups, at least under some very specific
 setup.


I'm glad to help track down more details of what could trigger
 this, but I'm not too eager to write a full unit test for it, as it
 involves a lot of other components, and by mocking my own components
 out I could still reproduce it: it's not Hibernate Search, so I'll
 need help from the field experts.

 Also I suspect a test would need to depend on many more components: does
 JGroups have an easy way to manage dependencies nowadays?

 some more inline:

 
  The MessageDispatcher will *not* wait until the timeout kicks in, it'll
  return as soon as it has acks from all members of the target set. This
  works and is covered with a bunch of unit tests, so a regression would
  have been caught immediately.

 I don't doubt the vanilla scenario, but this is what happens in the
 more complex case of the CapeDwarf setup.


My first guess would be that the MuxRpcDispatcher on the second node hasn't
started yet by the time you call castMessage on the first node. It could be
that your workaround just delayed the message a little bit, until the
MuxRpcDispatcher on the other node actually started (because the JChannel
is already started on both nodes, but as long as the MuxRpcDispatcher isn't
started on the 2nd node it won't send any responses back).



 
  I attached a test program to JGRP-1617 which shows that this feature
  works correctly.
 
  Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
  thread pool), the unicast protocol will have to retransmit the ack until
  it has been received. Depending on the unicast protocol you use, this
  will be immediate (UNICAST, UNICAST3), or based on a stability interval
  (UNICAST2).

 Right, it's totally possible this is a stack configuration problem in the
 AS.
 I wouldn't be the best person to ask though, as I don't even understand the
 configuration format.


You can get the actual JGroups configuration with
channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't expect
you to find any surprises there: they should use pretty much the JGroups
defaults.
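
E.g., a tiny sketch (JGroups 3.x; using the method named above, and
assuming you can reach the JChannel from the transport):

    // Dump the effective stack so it can be diffed against the expected defaults.
    static String dumpStack(org.jgroups.JChannel channel) {
        return channel.getProtocolStack().printProtocolSpecAsXml();
    }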

By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is
6s, so even if the message was lost it should take < 30s for the message to
be resent.




  For the record, the first operation was indeed triggering some lazy
  initialization of indexes, which in turn would trigger a Lucene
  Directory being started, triggering 3 Cache starts which in turn would
  trigger 6 state transfer processes: so indeed the first operation
  would not be exactly cheap performance wise, still this would
  complete in about 120 milliseconds.
 
  This sounds very low for the work you describe above. I don't think 6
  state transfers can be completed in 120ms, unless they're async (but
  then that means they're not done when you return). Also, cache starts
  (wrt JGroups) will definitely take more than a few seconds if you're the
  first cluster node...

 It's a unit test: the caches are initially empty and networking is
 loopback;
 on the second round some ~6 elements are in the cache, no larger than
 ~10 character strings.
 Should be reasonable?


Yes, I think it's reasonable, if the JChannel was already started before
the CD application was deployed. Starting the first JChannel would take at
least 3s, which is the default PING.timeout.



  Not being sure about the options of depending on a newer JGroups
  release or the complexity of a fix, I'll implement a workaround 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Sanne Grinovero
I've attached the logs to the JIRA.

Some replies inline:

On 15 April 2013 11:04, Dan Berindei dan.berin...@gmail.com wrote:



 On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero sa...@infinispan.org
 wrote:

 On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:
 
 
  On 4/13/13 2:02 AM, Sanne Grinovero wrote:
 
  @All, the performance problem seemed to be caused by a problem in
  JGroups, which I've logged here:
  https://issues.jboss.org/browse/JGRP-1617
 
 
  Almost no information attached to the case :-( If it wasn't you, Sanne,
  I'd outright reject the case ...

 I wouldn't blame you, and I am sorry for the lack of details: as I said,
 it was very late, but I still preferred to share the observations we had
 made so far.

 From all the experiments we made - and some good logs I'll clean up for
 sharing - it's clear that the thread is not woken up even though the ACK
 was already received.
 And of course I wouldn't expect this to fail in a simple test as it
 wouldn't have escaped you ;-) or at least you would have had earlier
 reports.

 There are lots of complex moving parts in this scenario: from a Muxed
 JGroups Channel, and the Application Server responsible for
 initializing the stack with some added magic from CapeDwarf itself:
 it's not clear to me what configuration is exactly being used, for
 one.


 Does CD also change the JGroups configuration? I thought it only tweaks the
 Infinispan cache configuration on deployment, and the JGroups channel is
 already started by the time the CD application is deployed.

CD uses a custom AS build and a custom AS configuration, so anything
could be different.
On top of that, some things are reconfigured programmatically by it.


 Without a testcase we might not be 100% sure but it seems likely to be
 an unexpected behaviour in JGroups, at least under some very specific
 setup.


 I'm glad to help track down more details of what could trigger
 this, but I'm not too eager to write a full unit test for it, as it
 involves a lot of other components, and by mocking my own components
 out I could still reproduce it: it's not Hibernate Search, so I'll
 need help from the field experts.

 Also I suspect a test would need to depend on many more components: does
 JGroups have an easy way to manage dependencies nowadays?

 some more inline:

 
  The MessageDispatcher will *not* wait until the timeout kicks in, it'll
  return as soon as it has acks from all members of the target set. This
  works and is covered with a bunch of unit tests, so a regression would
  have been caught immediately.

 I don't doubt the vanilla scenario, but this is what happens in the
 more complex case of the CapeDwarf setup.


 My first guess would be that the MuxRpcDispatcher on the second node hasn't
 started yet by the time you call castMessage on the first node. It could be
 that your workaround just delayed the message a little bit, until the
 MuxRpcDispatcher on the other node actually started (because the JChannel is
 already started on both nodes, but as long as the MuxRpcDispatcher isn't
 started on the 2nd node it won't send any responses back).

Before the point at which Search uses the dispatcher, many more
operations happened successfully and with reasonable timing: in
particular, some transactions on Infinispan stored entries quickly and
without trouble.

Besides, if such a race condition were possible, I would consider
it a critical bug.


  I attached a test program to JGRP-1617 which shows that this feature
  works correctly.
 
  Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
  thread pool), the unicast protocol will have to retransmit the ack until
  it has been received. Depending on the unicast protocol you use, this
  will be immediate (UNICAST, UNICAST3), or based on a stability interval
  (UNICAST2).

  Right, it's totally possible this is a stack configuration problem in the
  AS.
  I wouldn't be the best person to ask though, as I don't even understand the
  configuration format.


 You can get the actual JGroups configuration with
 channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't expect
 you to find any surprises there: they should use pretty much the JGroups
 defaults.

Nice tip. I'll add this as a logging option.


  By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is
  6s, so even if the message was lost it should take < 30s for the message to
  be resent.

The delay is actually ~10 seconds per RPC, so still < 30s.
The reason the overall test takes 60 seconds is that there are 6
operations being performed.


  For the record, the first operation was indeed triggering some lazy
  initialization of indexes, which in turn would trigger a Lucene
  Directory being started, triggering 3 Cache starts which in turn would
  trigger 6 state transfer processes: so indeed the first operation
  would not be exactly cheap performance wise, still this would
  complete in about 120 milliseconds.
 
  This sounds very low for the work 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Ales Justin
 Does CD also change the JGroups configuration? I thought it only tweaks the
 Infinispan cache configuration on deployment, and the JGroups channel is
 already started by the time the CD application is deployed.
 
 CD uses a custom AS build and a custom AS configuration, so anything
 could be different.
 On top of that, some things are reconfigured programmatically by it.

We don't change the JGroups config at all -- we just use whatever is in AS.
(we atm use 7.2.0.Final tag)

Also, caches are mostly the same as their templates in standalone-capedwarf.xml.
We just tweak some indexing properties programmatically,
as there are things we cannot set via string props; e.g. ClassLoader instance.

-Ales


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Mon, Apr 15, 2013 at 1:30 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 I've attached the logs the the JIRA.

 Some replies inline:

 On 15 April 2013 11:04, Dan Berindei dan.berin...@gmail.com wrote:
 
 
 
  On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero sa...@infinispan.org
  wrote:
 
  On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:
  
  
   On 4/13/13 2:02 AM, Sanne Grinovero wrote:
  
   @All, the performance problem seemed to be caused by a problem in
   JGroups, which I've logged here:
   https://issues.jboss.org/browse/JGRP-1617
  
  
   Almost no information attached to the case :-( If it wasn't you,
 Sanne,
   I'd outright reject the case ...
 
  I wouldn't blame you, and I am sorry for the lack of details: as I said,
  it was very late, but I still preferred to share the observations we had
  made so far.
 
  From all the experiments we made - and some good logs I'll clean up for
  sharing - it's clear that the thread is not woken up even though the ACK
  was already received.
  And of course I wouldn't expect this to fail in a simple test as it
  wouldn't have escaped you ;-) or at least you would have had earlier
  reports.
 
  There are lots of complex moving parts in this scenario: from a Muxed
  JGroups Channel, and the Application Server responsible for
  initializing the stack with some added magic from CapeDwarf itself:
  it's not clear to me what configuration is exactly being used, for
  one.
 
 
  Does CD also change the JGroups configuration? I thought it only tweaks
 the
  Infinispan cache configuration on deployment, and the JGroups channel is
  already started by the time the CD application is deployed.

 CD uses a custom AS build and a custom AS configuration, so anything
 could be different.
 On top of that, some things are reconfigured programmatically by it.


Ales already cleared this up: CD doesn't change the JGroups config at all.


  Without a testcase we might not be 100% sure but it seems likely to be
  an unexpected behaviour in JGroups, at least under some very specific
  setup.
 
 
  I'm glad to help track down more details of what could trigger
  this, but I'm not too eager to write a full unit test for it, as it
  involves a lot of other components, and by mocking my own components
  out I could still reproduce it: it's not Hibernate Search, so I'll
  need help from the field experts.
 
  Also I suspect a test would need to depend on many more components: does
  JGroups have an easy way to manage dependencies nowadays?
 
  some more inline:
 
  
   The MessageDispatcher will *not* wait until the timeout kicks in,
 it'll
   return as soon as it has acks from all members of the target set. This
   works and is covered with a bunch of unit tests, so a regression would
   have been caught immediately.
 
  I don't doubt the vanilla scenario, but this is what happens in the
  more complex case of the CapeDwarf setup.
 
 
  My first guess would be that the MuxRpcDispatcher on the second node
 hasn't
  started yet by the time you call castMessage on the first node. It could
 be
  that your workaround just delayed the message a little bit, until the
  MuxRpcDispatcher on the other node actually started (because the
 JChannel is
  already started on both nodes, but as long as the MuxRpcDispatcher isn't
  started on the 2nd node it won't send any responses back).

 Before the point at which Search uses the dispatcher, many more
 operations happened successfully and with reasonable timing: in
 particular, some transactions on Infinispan stored entries quickly and
 without trouble.

 Besides, if such a race condition were possible, I would consider
 it a critical bug.


I looked at the muxer code and I think they actually take care of this
already: MuxUpHandler returns a NoMuxHandler response when it can't find an
appropriate MuxedRpcDispatcher, and MessageDispatcher counts that response
against the number of expected responses. So it must be something else...




   I attached a test program to JGRP-1617 which shows that this feature
   works correctly.
  
   Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
   thread pool), the unicast protocol will have to retransmit the ack
 until
   it has been received. Depending on the unicast protocol you use, this
   will be immediate (UNICAST, UNICAST3), or based on a stability
 interval
   (UNICAST2).
 
  Right, it's totally possible this is a stack configuration problem in the
  AS.
  I wouldn't be the best person to ask though, as I don't even understand the
  configuration format.
 
 
  You can get the actual JGroups configuration with
  channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't
 expect
  you to find any surprises there: they should use pretty much the JGroups
  defaults.

 Nice tip. I'll add this as a logging option.

 
  By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is
  6s, so even if the message was lost it should take < 30s for the message
  to be resent.

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Ales Justin
 Looking at your workaround, I think you actually set the response mode to 
 GET_NONE (because that's the default value in RequestOptions), so you're back 
 to sending an asynchronous request.

That was my question as well:

 Shouldn't this synchronous flag still be used?
 
 https://github.com/Sanne/hibernate-search/blob/077f29c245d2d6e960cd6ab59ff58752320d5658/hibernate-search-engine/src/main/java/org/hibernate/search/backend/impl/jgroups/DispatcherMessageSender.java#L57
 
 e.g.
 RequestOptions options;
 if ( synchronous ) {
     int size = dispatcher.getChannel().getView().getMembers().size();
     options = RequestOptions.SYNC();
     options.setRspFilter( new WaitAllFilter( size ) );
 } else {
     options = RequestOptions.ASYNC();
 }
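
(WaitAllFilter itself is not shown in the thread; a hypothetical sketch of
such a filter, written against the JGroups 3.x RspFilter interface, might
look like this:)

    import org.jgroups.Address;
    import org.jgroups.blocks.RspFilter;

    // Hypothetical WaitAllFilter: accept every response, and keep the caller
    // blocked until `expected` responses have arrived.
    public class WaitAllFilter implements RspFilter {

        private final int expected;
        private int received;

        public WaitAllFilter(int expected) {
            this.expected = expected;
        }

        @Override
        public synchronized boolean isAcceptable(Object response, Address sender) {
            received++;       // count each reply, whatever its content
            return true;      // never reject a response
        }

        @Override
        public synchronized boolean needMoreResponses() {
            return received < expected;   // keep waiting until everyone replied
        }
    }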


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
Sorry for missing your message, Ales!

Anyway, good news, we found out why the test was taking so long: the
Message instance passed to dispatcher.cast() already had a destination
address set, and JGroups only sent the message to that address, even though
the dispatcher was waiting for a reply from the local node as well.

Sanne has verified that setting the destination in the message to null
fixes things, and I have been able to verify this by modifying Bela's test.
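
In code the difference is roughly this (a sketch against the JGroups 3.x
MessageDispatcher API, which dispatcher.cast() presumably delegates to;
`payload` is a placeholder):

    // Broken: a Message with a destination set is effectively a unicast,
    // so the cast only reached that single member.
    //   msg.setDest(someAddress);

    // Fixed: leave the destination null so the message is broadcast, and
    // pass null as the target collection to mean "all current members".
    Message msg = new Message(null, null, payload);
    RspList<Object> rsps = dispatcher.castMessage(null, msg, RequestOptions.SYNC());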

Sanne, a few notes:
1) The cast() call doesn't throw a TimeoutException if one of the targets
didn't reply - if you want to throw an exception, you need to check
wasReceived() on each element of the responses list.
2) For ChannelMessageSender as well, channel.send() may throw a
TimeoutException or not - depending on the value of
RSVP.throw_exception_on_timeout. Because of this and the potential conflict
with Infinispan on RSVP.ack_on_delivery, I would strongly recommend using
the DispatcherMessageSender all the time.
3) Because Infinispan calls channel.setDiscardOwnMessages(false) in
5.3.0.Alpha1, and channel.setDiscardOwnMessages(true) in all previous
versions, whether the local node receives a broadcast message depends on
the Infinispan version running on the same channel. If you don't actually
need the local node to process the message, you should use
options.setExclusionList(dispatcher.getChannel().getAddress()) to make it
obvious. If you do need the local node to process the message, you may need
to process the message yourself when channel.getDiscardOwnMessages()
returns true.
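
A rough sketch of 1) and 3), under the same JGroups 3.x API assumptions
(method and variable names are placeholders):

    // 1) cast() won't throw on a missing reply: inspect each Rsp yourself.
    static void castAndCheck(MessageDispatcher dispatcher, Message msg) throws Exception {
        RspList<Object> rsps = dispatcher.castMessage(null, msg, RequestOptions.SYNC());
        for (Rsp<Object> rsp : rsps.values()) {
            if (!rsp.wasReceived()) {
                throw new java.util.concurrent.TimeoutException("no reply from " + rsp.getSender());
            }
        }
    }

    // 3) make the exclusion of the local node explicit:
    RequestOptions options = RequestOptions.SYNC();
    options.setExclusionList(dispatcher.getChannel().getAddress());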




On Mon, Apr 15, 2013 at 3:44 PM, Ales Justin ales.jus...@gmail.com wrote:

 Looking at your workaround, I think you actually set the response mode to
 GET_NONE (because that's the default value in RequestOptions), so you're
 back to sending an asynchronous request.


 That was my question as well:

 Shouldn't this synchronous flag still be used?


 https://github.com/Sanne/hibernate-search/blob/077f29c245d2d6e960cd6ab59ff58752320d5658/hibernate-search-engine/src/main/java/org/hibernate/search/backend/impl/jgroups/DispatcherMessageSender.java#L57

 e.g.
 RequestOptions options;
 if ( synchronous ) {
     int size = dispatcher.getChannel().getView().getMembers().size();
     options = RequestOptions.SYNC();
     options.setRspFilter( new WaitAllFilter( size ) );
 } else {
     options = RequestOptions.ASYNC();
 }





___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-14 Thread Bela Ban


On 4/13/13 1:42 PM, Sanne Grinovero wrote:
 On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:


 On 4/13/13 2:02 AM, Sanne Grinovero wrote:

 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617


 Almost no information attached to the case :-( If it wasn't you, Sanne,
 I'd outright reject the case ...

 I wouldn't blame you, and I am sorry for the lack of details: as I said,
 it was very late, but I still preferred to share the observations we had
 made so far.


OK, but I'll need more info if you want me to look into this.


From all the experiments we made - and some good logs I'll clean up for
 sharing - it's clear that the thread is not woken up even though the ACK
 was already received.

How do you know the ack has been received ? Logs ? Debugging ?

 And of course I wouldn't expect this to fail in a simple test as it
 wouldn't have escaped you ;-) or at least you would have had earlier
 reports.

 There are lots of complex moving parts in this scenario: from a Muxed
 JGroups Channel, and the Application Server responsible for
 initializing the stack with some added magic from CapeDwarf itself:
 it's not clear to me what configuration is exactly being used, for one.


I see.


 Without a testcase we might not be 100% sure but it seems likely to be
 an unexpected behaviour in JGroups, at least under some very specific
 setup.

 I'm glad to help track down more details of what could trigger
 this, but I'm not too eager to write a full unit test for it, as it
 involves a lot of other components, and by mocking my own components
 out I could still reproduce it: it's not Hibernate Search, so I'll
 need help from the field experts.


It would at least be nice to have some logs. I understand that even 
though the test involves many components, it is short lived, so logs 
(even at TRACE) shouldn't be too big.



 If you add more information to JGRP-1617, I'll take a look. This would
 be a critical bug in JGroups *if* you can prove that the
 MessageDispatcher always runs into the timeout (I don't think you can
 though !).

 Considering the easy workaround, and that this definitely needs
 something special in the configuration, I wouldn't consider it too
 critical? For as far as we know now, it's entirely possible the
 configuration being used is illegal. But this is exactly where I need
 your help ;-)


I'm here to help. But I need more information.


-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Ales Justin
Hmmm, did you try our QueryTest with this fix?

With HS update (your jgroupsWorkaround branch), my current run:

Running org.jboss.test.capedwarf.cluster.test.QueryTest
Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 14.287 sec <<< FAILURE!

Results :

Failed tests:
  deleteAndQueryInA(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null
  deleteAndQueryInA_2(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null

-Ales

On Apr 13, 2013, at 2:02 AM, Sanne Grinovero sa...@hibernate.org wrote:

 that's right, as suggested by Emmanuel I plan to separate the JGroups
 Sync/Async options from the worker.execution property so you can play
 with the two independently.
 I think the JGroups option's default could depend on the backend - if
 not otherwise specified, and if we all agree it doesn't make it too
 confusing.
 
 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617
 
 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.
 The same cost is paid again when the second node is hit the first
 time, after that index write operations block the writer for 1ms (not
 investigated further on potential throughput).
 
  Not being sure about the options of depending on a newer JGroups
  release or the complexity of a fix, I'll implement a workaround in
  HSearch in the scope of HSEARCH-1296.
 
  As a lesson learned, I think we need to polish some of our TRACE level
  messages to include the cache name: to resolve this we had not just
  many threads and components but also 4 of them were using JGroups
  (interleaving messages of all sorts) and 9 different caches were
  involved for each simple write operation in CD: made it interesting to
  figure out what was going on! Also I'm wondering how hard it would be to
  have a log parser which converts my 10GB of text log from today into a
  graphical sequence diagram.
  Big thanks to Mircea who helped me figure this out.
 
 Sanne
 
 On 12 April 2013 21:10, Ales Justin ales.jus...@gmail.com wrote:
 I think we need more fine-grained config for this new JGroups sync feature.
 
 I added this to our cache config
 
  <property name="hibernate.search.default.worker.execution">async</property>
 
 and it broke our tests.
 
 Where previous (old / non JGroups sync) behavior worked.
 
  It of course also works without this async config,
  but in that case we don't need the sync / ACK JGroups message.
  (we didn't have one before and it worked ;-)
 
 -Ales
 
 On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org wrote:
 
  There is a blackhole indexing backend, which pipes all indexing
  requests > /dev/null
 
 Set this as an Infinispan Query configuration property:
 
   default.worker.backend = blackhole
 
 Of course that means that the index will not be updated: you might
 need to adapt your test to tolerate that, but the point is not
 functional testing but to verify how much the SYNC option on the
 JGroups backend is actually slowing you down. I suspect the
  performance penalty is not in the network but in the fact you're now
  waiting for the index operations, while in async you were not waiting
  for them to be flushed.
 
 If you can identify which part is slow, then we can help you with
 better configuration options.
 
 
 On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
 What do you mean?
 
 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:
 
 You could try the new sync version but setting the blackhole backend on the
 master node to remove the indexing overhead from the picture.
 
 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:
 
 Are you sure that the async version actually had applied all writes to the
 index in the measured interval?
 
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 
 Although this change fixes query lookup,
 it adds horrible performance:
 
 Running CapeDwarf cluster QueryTest:
 
 with HSEARCH-1296
 
 21:00:27,188 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests
 
 50sec
 
 old 4.2.0.Final HS
 
 21:08:19,988 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Ales Justin
Shouldn't this synchronous flag still be used?

https://github.com/Sanne/hibernate-search/blob/077f29c245d2d6e960cd6ab59ff58752320d5658/hibernate-search-engine/src/main/java/org/hibernate/search/backend/impl/jgroups/DispatcherMessageSender.java#L57

e.g.
RequestOptions options;
if ( synchronous ) {
    int size = dispatcher.getChannel().getView().getMembers().size();
    options = RequestOptions.SYNC();
    options.setRspFilter( new WaitAllFilter( size ) );
} else {
    options = RequestOptions.ASYNC();
}

-Ales

On Apr 13, 2013, at 11:25 AM, Ales Justin ales.jus...@gmail.com wrote:

 Hmmm, did you try our QueryTest with this fix?
 
 With HS update (your jgroupsWorkaround branch), my current run:
 
 Running org.jboss.test.capedwarf.cluster.test.QueryTest
 Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 14.287 sec <<< FAILURE!
 
 Results :
 
 Failed tests:
   deleteAndQueryInA(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null
   deleteAndQueryInA_2(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null
 
 -Ales
 
 On Apr 13, 2013, at 2:02 AM, Sanne Grinovero sa...@hibernate.org wrote:
 
 that's right, as suggested by Emmanuel I plan to separate the JGroups
 Sync/Async options from the worker.execution property so you can play
 with the two independently.
 I think the JGroups option's default could depend on the backend - if
 not otherwise specified, and if we all agree it doesn't make it too
 confusing.
 
 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617
 
 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.
 The same cost is paid again when the second node is hit the first
 time, after that index write operations block the writer for 1ms (not
 investigated further on potential throughput).
 
  Not being sure about the options of depending on a newer JGroups
  release or the complexity of a fix, I'll implement a workaround in
  HSearch in the scope of HSEARCH-1296.
 
  As a lesson learned, I think we need to polish some of our TRACE level
  messages to include the cache name: to resolve this we had not just
  many threads and components but also 4 of them were using JGroups
  (interleaving messages of all sorts) and 9 different caches were
  involved for each simple write operation in CD: made it interesting to
  figure out what was going on! Also I'm wondering how hard it would be to
  have a log parser which converts my 10GB of text log from today into a
  graphical sequence diagram.
  Big thanks to Mircea who helped me figure this out.
 
 Sanne
 
 On 12 April 2013 21:10, Ales Justin ales.jus...@gmail.com wrote:
 I think we need more fine-grained config for this new JGroups sync feature.
 
 I added this to our cache config
 
   <property name="hibernate.search.default.worker.execution">async</property>
 
 and it broke our tests.
 
 Where previous (old / non JGroups sync) behavior worked.
 
 It of course also works without this async config,
 but in that case we don't need the sync / ACK JGroups message.
 (we didn't have one before and it worked ;-)
 
 -Ales
 
 On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org wrote:
 
 There is a blackhole indexing backend, which pipes all indexing
 requests > /dev/null
 
 Set this as an Infinispan Query configuration property:
 
  default.worker.backend = blackhole
 
 Of course that means that the index will not be updated: you might
 need to adapt your test to tolerate that, but the point is not
 functional testing but to verify how much the SYNC option on the
 JGroups backend is actually slowing you down. I suspect the
 performance penalty is not in the network but in the fact you're now
 waiting for the index operations, while in async you were not waiting
 for them to be flushed.
 
 If you can identify which part is slow, then we can help you with
 better configuration options.
 
 
 On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
 What do you mean?
 
 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:
 
 You could try the new sync version but setting the blackhole backend on 
 the
 master node to remove the indexing overhead from the picture.
 
 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:
 
 Are you sure that the async version actually had applied all writes to 
 the
 index in the measured interval?
 
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 
 Although this change fixes query lookup,
 it adds horrible performance:
 
 Running CapeDwarf cluster QueryTest:
 
 with 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Bela Ban


On 4/13/13 2:02 AM, Sanne Grinovero wrote:

 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617


Almost no information attached to the case :-( If it wasn't you, Sanne, 
I'd outright reject the case ...

The MessageDispatcher will *not* wait until the timeout kicks in, it'll 
return as soon as it has acks from all members of the target set. This 
works and is covered with a bunch of unit tests, so a regression would 
have been caught immediately.

I attached a test program to JGRP-1617 which shows that this feature 
works correctly.

Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB 
thread pool), the unicast protocol will have to retransmit the ack until 
it has been received. Depending on the unicast protocol you use, this 
will be immediate (UNICAST, UNICAST3), or based on a stability interval 
(UNICAST2).


 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.

This sounds very low for the work you describe above. I don't think 6 
state transfers can be completed in 120ms, unless they're async (but 
then that means they're not done when you return). Also, cache starts 
(wrt JGroups) will definitely take more than a few seconds if you're the 
first cluster node...


 Not being sure about the options of depending on a newer JGroups
 release or the complexity of a fix, I'll implement a workaround in
 HSearch in the scope of HSEARCH-1296.


If you add more information to JGRP-1617, I'll take a look. This would 
be a critical bug in JGroups *if* you can prove that the 
MessageDispatcher always runs into the timeout (I don't think you can 
though !).


 As a lesson learned, I think we need to polish some of our TRACE level
 messages to include the cache name: to resolve this we had not just
 many threads and components but also 4 of them were using JGroups
 (interleaving messages of all sorts) and 9 different caches were
 involved for each simple write operation in CD: made it interesting to
 figure out what was going on!

Yes, that would help. In JGroups, I usually log the cluster address of 
the thread that's writing to the log, so I can differentiate between 
different clusters on the same host.

 Also I'm wondering how hard it would be to
 have a log parser which converts my 10GB of text log from today in a
 graphical sequence diagram.


Yes, something like Wireshark's follow-TCP-stream feature would be very helpful!

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Sanne Grinovero
As I told you, we had not finished, but after 10h of debugging it was
night and we were still in the office; the ugly experiment you
see was working fine, so I opted to send you a preview of the branch
in case I would not get up early in the morning and you would
need something which worked ;)

Don't know why it fails for you, it passed our tests. Will look again
and polish the code.

On 13 April 2013 10:41, Ales Justin ales.jus...@gmail.com wrote:
 Shouldn't this synchronous flag still be used?

 https://github.com/Sanne/hibernate-search/blob/077f29c245d2d6e960cd6ab59ff58752320d5658/hibernate-search-engine/src/main/java/org/hibernate/search/backend/impl/jgroups/DispatcherMessageSender.java#L57

 e.g.
 RequestOptions options;
 if ( synchronous ) {
     int size = dispatcher.getChannel().getView().getMembers().size();
     options = RequestOptions.SYNC();
     options.setRspFilter( new WaitAllFilter( size ) );
 } else {
     options = RequestOptions.ASYNC();
 }

 -Ales

 On Apr 13, 2013, at 11:25 AM, Ales Justin ales.jus...@gmail.com wrote:

 Hmmm, did you try our QueryTest with this fix?

 With HS update (your jgroupsWorkaround branch), my current run:

 Running org.jboss.test.capedwarf.cluster.test.QueryTest
  Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 14.287 sec <<< FAILURE!

 Results :

  Failed tests:
    deleteAndQueryInA(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null
    deleteAndQueryInA_2(org.jboss.test.capedwarf.cluster.test.QueryTest): Should not be here: null

 -Ales

 On Apr 13, 2013, at 2:02 AM, Sanne Grinovero sa...@hibernate.org wrote:

 that's right, as suggested by Emmanuel I plan to separate the JGroups
 Sync/Async options from the worker.execution property so you can play
 with the two independently.
 I think the JGroups option's default could depend on the backend - if
 not otherwise specified, and if we all agree it doesn't make it too
 confusing.

 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617

 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.
 The same cost is paid again when the second node is hit the first
 time, after that index write operations block the writer for 1ms (not
 investigated further on potential throughput).

  Not being sure about the options of depending on a newer JGroups
  release or the complexity of a fix, I'll implement a workaround in
  HSearch in the scope of HSEARCH-1296.

  As a lesson learned, I think we need to polish some of our TRACE level
  messages to include the cache name: to resolve this we had not just
  many threads and components but also 4 of them were using JGroups
  (interleaving messages of all sorts) and 9 different caches were
  involved for each simple write operation in CD: made it interesting to
  figure out what was going on! Also I'm wondering how hard it would be to
  have a log parser which converts my 10GB of text log from today into a
  graphical sequence diagram.
  Big thanks to Mircea who helped me figure this out.

 Sanne

 On 12 April 2013 21:10, Ales Justin ales.jus...@gmail.com wrote:

 I think we need more fine-grained config for this new JGroups sync feature.

 I added this to our cache config

   <property name="hibernate.search.default.worker.execution">async</property>

 and it broke our tests.

 Where previous (old / non JGroups sync) behavior worked.

 It of course also works without this async config,
 but in that case we don't need the sync / ACK JGroups message.
 (we didn't have one before and it worked ;-)

 -Ales

 On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org wrote:

 There is a blackhole indexing backend, which pipes all indexing
 requests > /dev/null

 Set this as an Infinispan Query configuration property:

  default.worker.backend = blackhole

 Of course that means that the index will not be updated: you might
 need to adapt your test to tolerate that, but the point is not
 functional testing but to verify how much the SYNC option on the
 JGroups backend is actually slowing you down. I suspect the
 performance penalty is not in the network but in the fact you're now
 waiting for the index operations, while in async you were not waiting
 for them to be flushed.

 If you can identify which part is slow, then we can help you with
 better configuration options.


 On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:

 What do you mean?

 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:

 You could try the new sync version but setting the blackhole backend on the
 master node to remove the indexing overhead from the picture.

 On 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Sanne Grinovero
On 13 April 2013 11:20, Bela Ban b...@redhat.com wrote:


 On 4/13/13 2:02 AM, Sanne Grinovero wrote:

 @All, the performance problem seemed to be caused by a problem in
 JGroups, which I've logged here:
 https://issues.jboss.org/browse/JGRP-1617


 Almost no information attached to the case :-( If it wasn't you, Sanne,
 I'd outright reject the case ...

I wouldn't blame you, and I am sorry for the lack of details: as I said,
it was very late, but I still preferred to share the observations we had
made so far.

From all the experiments we made - and some good logs I'll clean up for
sharing - it's clear that the thread is not woken up even though the ACK
was already received.
And of course I wouldn't expect this to fail in a simple test as it
wouldn't have escaped you ;-) or at least you would have had earlier
reports.

There are lots of complex moving parts in this scenario: from a Muxed
JGroups Channel, and the Application Server responsible for
initializing the stack with some added magic from CapeDwarf itself:
it's not clear to me what configuration is exactly being used, for
one.

Without a testcase we might not be 100% sure but it seems likely to be
an unexpected behaviour in JGroups, at least under some very specific
setup.

I'm glad to help track down more details of what could trigger
this, but I'm not too eager to write a full unit test for it, as it
involves a lot of other components, and by mocking my own components
out I could still reproduce it: it's not Hibernate Search, so I'll
need help from the field experts.

Also I suspect a test would need to depend on many more components: does
JGroups have an easy way to manage dependencies nowadays?

some more inline:


 The MessageDispatcher will *not* wait until the timeout kicks in, it'll
 return as soon as it has acks from all members of the target set. This
 works and is covered with a bunch of unit tests, so a regression would
 have been caught immediately.

I don't doubt the vanilla scenario, but this is what happens in the
more complex case of the CapeDwarf setup.


 I attached a test program to JGRP-1617 which shows that this feature
 works correctly.

 Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
 thread pool), the unicast protocol will have to retransmit the ack until
 it has been received. Depending on the unicast protocol you use, this
 will be immediate (UNICAST, UNICAST3), or based on a stability interval
 (UNICAST2).

Right, it's totally possible this is a stack configuration problem in the AS.
I wouldn't be the best person to ask though, as I don't even understand the
configuration format.


 For the record, the first operation was indeed triggering some lazy
 initialization of indexes, which in turn would trigger a Lucene
 Directory being started, triggering 3 Cache starts which in turn would
 trigger 6 state transfer processes: so indeed the first operation
 would not be exactly cheap performance wise, still this would
 complete in about 120 milliseconds.

 This sounds very low for the work you describe above. I don't think 6
 state transfers can be completed in 120ms, unless they're async (but
 then that means they're not done when you return). Also, cache starts
 (wrt JGroups) will definitely take more than a few seconds if you're the
 first cluster node...

It's a unit test: the caches are initially empty and networking is loopback;
on the second round some ~6 elements are in the cache, no larger than
~10 character strings.
Should be reasonable?

 Not being sure about the options of depending on a newer JGroups
 release or the complexity of a fix, I'll implement a workaround in
 HSearch in the scope of HSEARCH-1296.


 If you add more information to JGRP-1617, I'll take a look. This would
 be a critical bug in JGroups *if* you can prove that the
 MessageDispatcher always runs into the timeout (I don't think you can
 though !).

Considering the easy workaround, and that this definitely needs
something special in the configuration, I wouldn't consider it too
critical? For as far as we know now, it's entirely possible the
configuration being used is illegal. But this is exactly where I need
your help ;-)


 As a lesson learned, I think we need to polish some of our TRACE level
 messages to include the cache name: to resolve this we had not just
 many threads and components but also 4 of them were using JGroups
 (interleaving messages of all sorts) and 9 different caches were
 involved for each simple write operation in CD: made it interesting to
 figure out what was going on!

 Yes, that would help. In JGroups, I usually log the cluster address of
 the thread that's writing to the log, so I can differentiate between
 different clusters on the same host.

Right, thread names and message correlation ids have been extremely useful;
it's mostly the many Infinispan log statements which don't mention the cache
name that should be improved, as often the same user thread would invoke a
sequence of operations on different caches, so same thread 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-13 Thread Ales Justin
 There are lots of complex moving parts in this scenario: from a Muxed
 JGroups Channel, and the Application Server responsible for
 initializing the stack with some added magic from CapeDwarf itself:
 it's not clear to me what configuration is exactly being used, for
 one.


Some quick info on how we configure things.

In CapeDwarf we have 10 custom caches, 5 of them are indexable.
And 3 Lucene Infinispan directory caches.

We use AS config as template for these caches,
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/build/src/main/resources/standalone/configuration/standalone-capedwarf.xml
but we sometimes need to adjust the config at deployment time.
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/deployment/CapedwarfCacheProcessor.java

Config adjusters:
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/DatastoreConfigurationCallback.java
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/DatastoreVersionsConfigurationCallback.java
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/BasicConfigurationCallback.java
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/IndexableConfigurationCallback.java

We generate unique mux id per app:
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/MuxIdGenerator.java

Where each indexing cache gets a proper unique number:
* 
https://github.com/capedwarf/capedwarf-jboss-as/blob/master/extension/src/main/java/org/jboss/as/capedwarf/services/IndexableConfigurationCallback.java#L72

HTH

-Ales

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-12 Thread Ales Justin
I think we need more fine-grained config for this new JGroups sync feature.

I added this to our cache config

<property name="hibernate.search.default.worker.execution">async</property>

and it broke our tests.

Whereas the previous (old / non-JGroups-sync) behavior worked.

It of course also works without this async config,
but in that case we don't need the sync / ACK JGroups message
(we didn't have one before and it worked ;-)

-Ales

On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org wrote:

 There is a blackhole indexing backend, which pipes all indexing
 requests > /dev/null
 
 Set this as an Infinispan Query configuration property:
 
default.worker.backend = blackhole
 
 Of course that means that the index will not be updated: you might
 need to adapt your test to tolerate that, but the point is not
 functional testing but to verify how much the SYNC option on the
 JGroups backend is actually slowing you down. I suspect the
 performance penalty is not in the network but in the fact you're now
 waiting for the index operations, while in async you were not waiting
 for them to be flushed.
 
 If you can identify which part is slow, then we can help you with
 better configuration options.
 
 
 On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
 What do you mean?
 
 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:
 
 You could try the new sync version but setting the blackhole backend on the
 master node to remove the indexing overhead from the picture.
 
 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:
 
 Are you sure that the async version actually had applied all writes to the
 index in the measured interval?
 
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 
 Although this change fixes query lookup,
 it adds horrible performance:
 
 Running CapeDwarf cluster QueryTest:
 
 with HSEARCH-1296
 
 21:00:27,188 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests
 
 50sec
 
 old 4.2.0.Final HS
 
 21:08:19,988 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests
 
 841ms
 
 ---
 
 I added
 
 <property name="enable_bundling">true</property>
 
 to AS jgroups transport config, but no improvement.
 
 Any (other) idea?
 
 -Ales
 
 
 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-12 Thread Sanne Grinovero
that's right, as suggested by Emmanuel I plan to separate the JGroups
Sync/Async options from the worker.execution property so you can play
with the two independently.
I think the JGroups option's default could depend on the backend - if
not otherwise specified, and if we all agree it doesn't make it too
confusing.
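
As a sketch of what that separation could look like (property names below
are assumptions for illustration, not the final ones):

  # does the caller block until the backend has applied the index works?
  hibernate.search.default.worker.execution = sync

  # independently: does the JGroups backend wait for the master's ACK?
  # (illustrative name; the final property may differ)
  hibernate.search.default.worker.jgroups.block_waiting_ack = false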

@All, the performance problem seemed to be caused by a problem in
JGroups, which I've logged here:
https://issues.jboss.org/browse/JGRP-1617

For the record, the first operation was indeed triggering some lazy
initialization of indexes, which in turn would trigger a Lucene
Directory being started, triggering 3 cache starts which in turn would
trigger 6 state transfer processes: so indeed the first operation
was not exactly cheap performance-wise, yet it would still
complete in about 120 milliseconds.
The same cost is paid again when the second node is hit for the first
time; after that, index write operations block the writer for ~1ms (not
investigated further for potential throughput).

Not being sure about the options of depending on a newer JGroups
release or the complexity of a fix, I'll implement a workaround in
HSearch in the scope of HSEARCH-1296.

As a lesson learned, I think we need to polish some of our TRACE level
messages to include the cache name: to resolve this we had not just
many threads and components but also 4 of them were using JGroups
(interleaving messages of all sorts) and 9 different caches were
involved for each simple write operation in CD: made it interesting to
figure out what was going on! Also I'm wondering how hard it would be to
have a log parser which converts my 10GB of text log from today into a
graphical sequence diagram.
Big thanks to Mircea who helped me figuring this out.

Sanne

On 12 April 2013 21:10, Ales Justin ales.jus...@gmail.com wrote:
 I think we need more fine-grained config for this new JGroups sync feature.

 I added this to our cache config

 <property name="hibernate.search.default.worker.execution">async</property>

 and it broke our tests.

 Whereas the previous (old / non-JGroups-sync) behavior worked.

 It of course also works without this async config,
 but in that case we don't need the sync / ACK JGroups message
 (we didn't have one before and it worked ;-)

 -Ales

 On Apr 11, 2013, at 11:51 PM, Sanne Grinovero sa...@hibernate.org wrote:

 There is a blackhole indexing backend, which pipes all indexing
 requests > /dev/null

 Set this as an Infinispan Query configuration property:

default.worker.backend = blackhole

 Of course that means that the index will not be updated: you might
 need to adapt your test to tolerate that, but the point is not
 functional testing but to verify how much the SYNC option on the
 JGroups backend is actually slowing you down. I suspect the
 performance penalty is not in the network but in the fact you're now
 waiting for the index operations, while in async you were not waiting
 for them to be flushed.

 If you can identify which part is slow, then we can help you with
 better configuration options.


 On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
 What do you mean?

 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:

 You could try the new sync version but setting the blackhole backend on the
 master node to remove the indexing overhead from the picture.

 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:

 Are you sure that the async version actually had applied all writes to the
 index in the measured interval?

 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:

 Although this change fixes query lookup,
 it adds horrible performance:

 Running CapeDwarf cluster QueryTest:

 with HSEARCH-1296

 21:00:27,188 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 50sec

 old 4.2.0.Final HS

 21:08:19,988 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 841ms

 ---

 I added

 <property name="enable_bundling">true</property>

 to AS jgroups transport config, but no improvement.

 Any (other) idea?

 -Ales


 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev


 ___
 hibernate-dev mailing list
 

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-11 Thread Sanne Grinovero
Are you sure that the async version actually had applied all writes to the
index in the measured interval?
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:

 Although this change fixes query lookup,
 it adds horrible performance:

 Running CapeDwarf cluster QueryTest:

 with HSEARCH-1296

 21:00:27,188 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 50sec

 old 4.2.0.Final HS

 21:08:19,988 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 841ms

 ---

 I added

 <property name="enable_bundling">true</property>

 to AS jgroups transport config, but no improvement.

 Any (other) idea?

 -Ales


 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-11 Thread Sanne Grinovero
You could try the new sync version but setting the blackhole backend on the
master node to remove the indexing overhead from the picture.
 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:

 Are you sure that the async version actually had applied all writes to the
 index in the measured interval?
  On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:

 Although this change fixes query lookup,
 it adds horrible performance:

 Running CapeDwarf cluster QueryTest:

 with HSEARCH-1296

 21:00:27,188 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 50sec

 old 4.2.0.Final HS

 21:08:19,988 INFO
  [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 841ms

 ---

 I added

 <property name="enable_bundling">true</property>

 to AS jgroups transport config, but no improvement.

 Any (other) idea?

 -Ales


 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-11 Thread Ales Justin
No, not in those 800ms, hence the failing test.

But if I add a 2sec sleep between the delete and the query,
the test passes.

Which is still 25x better. :)
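
For what it's worth, a bounded polling loop tends to be more robust in such
tests than a fixed sleep; a minimal hypothetical sketch (the countQuery
callable stands in for the actual CapeDwarf query):

  import java.util.concurrent.Callable;

  public class IndexWait {
      // Poll until the query sees the expected result count,
      // instead of hard-coding a 2-second sleep.
      public static void awaitCount(Callable<Integer> countQuery,
                                    int expected, long timeoutMs) throws Exception {
          long deadline = System.currentTimeMillis() + timeoutMs;
          while (System.currentTimeMillis() < deadline) {
              if (countQuery.call().intValue() == expected) {
                  return; // the index has caught up
              }
              Thread.sleep(50); // brief back-off between polls
          }
          throw new AssertionError("expected " + expected
                  + " results within " + timeoutMs + " ms");
      }
  }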

On Apr 11, 2013, at 21:39, Sanne Grinovero sa...@hibernate.org wrote:

 Are you sure that the async version actually had applied all writes to the 
 index in the measured interval? 
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 Although this change fixes query lookup,
 it adds horrible performance:
 
 Running CapeDwarf cluster QueryTest:
 
 with HSEARCH-1296
 
 21:00:27,188 INFO  
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager] 
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro 
 SerializationProvider v1.0 being used for index 
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49) 
 JBAS018224: Unregister web context: /capedwarf-tests
 
 50sec
 
 old 4.2.0.Final HS
 
 21:08:19,988 INFO  
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager] 
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro 
 SerializationProvider v1.0 being used for index 
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49) 
 JBAS018224: Unregister web context: /capedwarf-tests
 
 841ms
 
 ---
 
 I added
 
 <property name="enable_bundling">true</property>
 
 to AS jgroups transport config, but no improvement.
 
 Any (other) idea?
 
 -Ales
 
 
 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-11 Thread Ales Justin
What do you mean?

On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:

 You could try the new sync version but setting the blackhole backend on the 
 master node to remove the indexing overhead from the picture. 
 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:
 Are you sure that the async version actually had applied all writes to the 
 index in the measured interval? 
 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:
 Although this change fixes query lookup,
 it adds horrible performance:
 
 Running CapeDwarf cluster QueryTest:
 
 with HSEARCH-1296
 
 21:00:27,188 INFO  
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager] 
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro 
 SerializationProvider v1.0 being used for index 
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49) 
 JBAS018224: Unregister web context: /capedwarf-tests
 
 50sec
 
 old 4.2.0.Final HS
 
 21:08:19,988 INFO  
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager] 
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro 
 SerializationProvider v1.0 being used for index 
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49) 
 JBAS018224: Unregister web context: /capedwarf-tests
 
 841ms
 
 ---
 
 I added
 
 <property name="enable_bundling">true</property>
 
 to AS jgroups transport config, but no improvement.
 
 Any (other) idea?
 
 -Ales
 
 
 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-11 Thread Sanne Grinovero
There is a blackhole indexing backend, which pipes all indexing
requests > /dev/null

Set this as an Infinispan Query configuration property:

default.worker.backend = blackhole

Of course that means that the index will not be updated: you might
need to adapt your test to tolerate that, but the point is not
functional testing but to verify how much the SYNC option on the
JGroups backend is actually slowing you down. I suspect the
performance penalty is not in the network but in the fact you're now
waiting for the index operations, while in async you were not waiting
for them to be flushed.

If you can identify which part is slow, then we can help you with
better configuration options.
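
As a concrete way to run that experiment, the property can be passed
through the cache's indexing configuration; a minimal sketch in the
Infinispan 5.x XML schema (illustrative, and to be reverted after
measuring, since the index will NOT be updated):

  <indexing enabled="true" indexLocalOnly="true">
     <properties>
        <!-- measurement only: discards all index work -->
        <property name="default.worker.backend" value="blackhole"/>
     </properties>
  </indexing>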


On 11 April 2013 20:47, Ales Justin ales.jus...@gmail.com wrote:
 What do you mean?

 On Apr 11, 2013, at 21:41, Sanne Grinovero sa...@hibernate.org wrote:

 You could try the new sync version but setting the blackhole backend on the
 master node to remove the indexing overhead from the picture.

 On Apr 11, 2013 8:39 PM, Sanne Grinovero sa...@hibernate.org wrote:

 Are you sure that the async version actually had applied all writes to the
 index in the measured interval?

 On Apr 11, 2013 8:13 PM, Ales Justin ales.jus...@gmail.com wrote:

 Although this change fixes query lookup,
 it adds horrible performance:

 Running CapeDwarf cluster QueryTest:

 with HSEARCH-1296

 21:00:27,188 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 50sec

 old 4.2.0.Final HS

 21:08:19,988 INFO
 [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
 (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service Avro
 SerializationProvider v1.0 being used for index
 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
 21:08:20,829 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
 JBAS018224: Unregister web context: /capedwarf-tests

 841ms

 ---

 I added

 <property name="enable_bundling">true</property>

 to AS jgroups transport config, but no improvement.

 Any (other) idea?

 -Ales


 ___
 hibernate-dev mailing list
 hibernate-...@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev