Hello, I'm forwarding this email to Emmanuel and the Hibernate Search dev list, as I believe we should join the discussion. Could we keep both dev lists (jbosscache-...@lists.jboss.org, hibernate-dev@lists.jboss.org) on CC?
Sanne

2009/4/29 Manik Surtani <ma...@jboss.org>:
>
> On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:
>
>> Hello,
>>
>> I have been working on a Lucene Directory provider based on JBoss Cache; my starting point was an implementation Manik had already written, which pretty much worked with a few minor tweaks. Our use case was to cluster a Lucene index being used with Hibernate Search in our application, with the requirements that searching needed to be fast, there was no shared file system, and it was important that the index was consistent across the cluster within a relatively short time frame.
>>
>> Manik's code used a token node in the cache to implement the distributed lock. During my testing I set up multiple cache copies with multiple threads reading/writing to each cache copy. I was finding that a lot of transactions to acquire or release this lock were timing out; not understanding JBC well, I modified the distributed lock to use the JGroups DistributedLockManager. This worked quite well, however the time taken to acquire/release the lock (~100 ms for both) dwarfed the time to process the index update, lowering throughput. Even using Hibernate Search with an async worker thread, there was still a lot of contention for the single lock, which seemed to limit the scalability of the solution. I think part of the problem was that our use of HB Search generates a lot of small units of work (remove index entry, add index entry), and each of these UOWs acquires a new IndexWriter and a new write lock on the underlying Lucene Directory implementation.
>>
>> Out of curiosity, I created an alternative implementation based on the Hibernate Search JMS clustering strategy. Inside JBoss Cache I created a queue node, and each slave node in the cluster creates a separate queue underneath where indexing work is written:
>>
>> /queue/slave1/[work0, work1, work2 ....]
>>       /slave2
>>       /slave3
>>
>> etc
>>
>> In each cluster member a background thread runs continuously; when it wakes up, it decides whether it is the master node (currently it checks whether it is the view coordinator, but I'm considering changing this to use a longer-lived distributed lock). If it is the master, it merges the tasks from each slave queue and updates the JBCDirectory in one go; it can safely do this with only local VM locking. This approach means that all the slave nodes can write to their own queue without needing a global lock that any other slave or the master would be using. On the master, multiple updates can be performed in the context of a single Lucene index writer. With a cache loader configured, work that is written into the slave queue is persistent, so it can survive the master node crashing, with automatic failover to a new master, meaning that eventually all updates should be applied to the index. Each work element in the queue is timestamped so that the master can process them in order (this requires time synchronisation across the cluster). For our workload, the master/slave pattern seems to improve the throughput of the system.
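For illustration, here is a rough sketch of the slave-enqueue / master-drain pattern described above, written against the JBoss Cache 3.x Cache/Node/Fqn API and a plain Lucene IndexWriter. The IndexWork interface, the QueueDirectoryUpdater class and the /queue/<slaveId> layout are hypothetical names chosen for this sketch; master election, transaction demarcation and the cache loader are deliberately left out:

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.UUID;

    import org.apache.lucene.index.IndexWriter;
    import org.jboss.cache.Cache;
    import org.jboss.cache.Fqn;
    import org.jboss.cache.Node;

    // One unit of indexing work, e.g. "remove entry X" or "add entry Y" (hypothetical type).
    interface IndexWork extends Serializable {
        long getTimestamp();                       // used by the master to order work across slaves
        void apply(IndexWriter writer) throws IOException;
    }

    // Hypothetical helper wrapping the /queue/<slaveId> layout described above.
    class QueueDirectoryUpdater {

        private final Cache<String, IndexWork> cache;
        private final IndexWriter writer;          // writer over the JBC-backed Directory
        private final String slaveId;              // e.g. "slave1"

        QueueDirectoryUpdater(Cache<String, IndexWork> cache, IndexWriter writer, String slaveId) {
            this.cache = cache;
            this.writer = writer;
            this.slaveId = slaveId;
        }

        // Slave side: append work under this node's own queue; no global lock involved.
        void enqueue(IndexWork work) {
            cache.put(Fqn.fromElements("queue", slaveId), UUID.randomUUID().toString(), work);
        }

        // Master side: merge every slave queue, order by timestamp, apply in one IndexWriter session.
        void drainAndIndex() throws IOException {
            Node<String, IndexWork> queueRoot = cache.getNode(Fqn.fromElements("queue"));
            if (queueRoot == null) {
                return;
            }
            List<IndexWork> pending = new ArrayList<IndexWork>();
            for (Node<String, IndexWork> slaveQueue : queueRoot.getChildren()) {
                pending.addAll(slaveQueue.getData().values());
                slaveQueue.clearData();            // master removes the work it has consumed
            }
            Collections.sort(pending, new Comparator<IndexWork>() {
                public int compare(IndexWork a, IndexWork b) {
                    return a.getTimestamp() < b.getTimestamp() ? -1
                         : a.getTimestamp() > b.getTimestamp() ? 1 : 0;
                }
            });
            for (IndexWork work : pending) {
                work.apply(writer);                // single writer, single local write lock
            }
            writer.commit();
        }
    }

The point of the layout is that each slave only ever touches its own /queue/<slaveId> node, so slaves do not contend with each other or with the master, while the master applies the merged, timestamp-ordered batch under a single, locally locked IndexWriter.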
>>
>> Currently I'm refining the code, and I have a few JBoss Cache questions which I hope you can help me with:
>>
>> 1) I have noticed that under high load I get LockTimeoutExceptions writing to /queue/slave0 when the lock owner is a transaction working on /queue/slave1, i.e. the same lock seems to be used for two unrelated nodes in the cache. I'm assuming this is a result of the lock striping algorithm; if you could give me some insight into how this works, that would be very helpful. Bumping up the cache concurrency level from 500 to 2000 seemed to reduce this problem, however I'm not sure whether it just reduces the probability of a random event or whether there is some level that will be sufficient to eliminate the issue.
>
> It could well be the lock striping at work. As of JBoss Cache 3.1.0 you can disable lock striping and have one lock per node. While this is expensive when you have a lot of nodes (you end up with a lot of locks), it may help you a lot if you have a finite number of nodes.
>
>> 2) Is there a reason to use separate nodes for each slave queue? Will it help with locking, or can each slave safely insert to the same parent node in separate transactions without interfering with or blocking each other? If I can reduce it to a single queue, I think that would be a more elegant solution. I am setting lockParentForChildInsertRemove to false for the queue nodes.
>
> It depends. Are the work objects attributes in /queue/slaveN? Remember that the granularity for all locks is the node itself, so if all slaves write to a single node, they will all compete for the same lock.
>
>> 3) Similarly, is there any reason why the master should/shouldn't take responsibility for removing work nodes that have been processed?
>
> Not quite sure I understand your design - so this distributes the work objects, and each cluster member maintains indexes locally? If so, you need to know when all members have processed the work objects before removing these.
>
>> Thanks in advance for your help. I hope to make this solution general-purpose enough to be able to contribute it back to the Hibernate Search and JBC teams.
>
> Thanks for offering to contribute. :-) One other thing that may be of interest is that I just launched Infinispan [1] [2] - a new data grid product. You could implement a directory provider on Infinispan too - it is a lot more efficient than JBC at many things, including concurrency. Also, Infinispan's lock granularity is per key/value pair, so a single distributed cache would be all you need for the work objects. Another thing that could help is the eager locking we have on the roadmap [3], which may make a more traditional approach of locking + writing indexes to the cache more feasible. I'd encourage you to check it out.
>
> [1] http://www.infinispan.org
> [2] http://infinispan.blogspot.com/2009/04/infinispan-start-of-new-era-in-open.html
> [3] https://jira.jboss.org/jira/browse/ISPN-48
>
> --
> Manik Surtani
> ma...@jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-...@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
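Coming back to the lock striping question in point 1 above: the general idea behind lock striping is that every node (Fqn) is mapped onto one of a fixed number of locks via its hash code, so two unrelated nodes such as /queue/slave0 and /queue/slave1 can happen to share a lock. A minimal illustrative sketch of that idea (my own simplification, not JBoss Cache's actual implementation):

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustration only: the general shape of lock striping, not JBC's real code.
    class StripedLockContainer {

        private final Lock[] stripes;

        StripedLockContainer(int concurrencyLevel) {
            stripes = new Lock[concurrencyLevel];
            for (int i = 0; i < stripes.length; i++) {
                stripes[i] = new ReentrantLock();
            }
        }

        // Every Fqn shares a lock with all other Fqns that hash to the same stripe.
        Lock getLock(Object fqn) {
            int index = (fqn.hashCode() & Integer.MAX_VALUE) % stripes.length;
            return stripes[index];
        }
    }

On this model, raising the concurrency level from 500 to 2000 only shrinks the probability that two hot nodes land on the same stripe; it cannot eliminate it, which is why disabling lock striping (one lock per node, as Manik describes for 3.1.0) is the deterministic fix when the number of nodes is bounded.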