Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
On Apr 28, 2008, at 15:03, Sanne Grinovero wrote: Hello, Nice, I'm going to work on this as soon as my previous patch gets applied. I've some more very minor issues to add to the new patch if you agree: service thread is started by: timer.scheduleAtFixedRate( task, period, period ); I would think the first operation should start immediately, not after period. If you look at the code, the first operation is done synchronously in start() we can't strat HSearch until we have a working Directory ready. Also I would use schedule( task, long, long) instead of scheduleAtFixedRate; the "atFixedRate" version means it will begin sooner if it started too late, we don't require this (correct?). So I propose: timer.schedule( task, 0, period ); Well I kinda like scheduleAtFixedRate for the copy tasks, it sounds more admin friendly to me. Why do you like schedule better? Also the timer creates another thread, running concurrently to itself and skipping task if the previous is still working. What would I miss if I just let the Timer do the job, without needing the Executor around? (one less thread and less code to check current activity as the timer supports that check for us). In the current scheme, the copy is run every n minutes unless the previous tasks takes more than n minutes. In the latter, the task is just skipped. If you don't, then the tasks will pile up and the copy will run over and over (while the tasks try to catch up) Using the executor it would be nice to share the same executor among several Directory Providers, to have better resource bounds (serialize the copy for different instances of FSSlave/Master Providers). Would a static Executor instance be ugly? Yes it would be ugly and would probably complicate hot redeployment. What would be the benefit really? I don't like very much the idea of one copy blocking all other copies. Plus there are some deadlocks issues between directories that HSearch deal with internally that might interfere as well if you share the same resource pool. Threads are cheap. regards, Sanne 2008/4/28 Emmanuel Bernard <[EMAIL PROTECTED]>: One JIRA / patch is good Self-conflict is not always bad "When fight begins within himself, a man's worth something." - Sir Frederick Browning :) On Apr 27, 2008, at 08:03, Sanne Grinovero wrote: I would like to address A and B first; may I open a single JIRA for both? both issues are related to the same files and I would like to avoid having to produce two different patches, potentially conflicting with myself. ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
Hello, Nice, I'm going to work on this as soon as my previous patch gets applied. I've some more very minor issues to add to the new patch if you agree: service thread is started by: timer.scheduleAtFixedRate( task, period, period ); I would think the first operation should start immediately, not after period. Also I would use schedule( task, long, long) instead of scheduleAtFixedRate; the "atFixedRate" version means it will begin sooner if it started too late, we don't require this (correct?). So I propose: timer.schedule( task, 0, period ); Also the timer creates another thread, running concurrently to itself and skipping task if the previous is still working. What would I miss if I just let the Timer do the job, without needing the Executor around? (one less thread and less code to check current activity as the timer supports that check for us). Using the executor it would be nice to share the same executor among several Directory Providers, to have better resource bounds (serialize the copy for different instances of FSSlave/Master Providers). Would a static Executor instance be ugly? regards, Sanne 2008/4/28 Emmanuel Bernard <[EMAIL PROTECTED]>: > One JIRA / patch is good > Self-conflict is not always bad "When fight begins within himself, a man's > worth something." - Sir Frederick Browning :) > > > > On Apr 27, 2008, at 08:03, Sanne Grinovero wrote: > > > > I would like to address A and B first; may I > > open a single JIRA for both? both issues are related to the same files > > and I would > > like to avoid having to produce two different patches, potentially > conflicting > > with myself. > > > > ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
One JIRA / patch is good Self-conflict is not always bad "When fight begins within himself, a man's worth something." - Sir Frederick Browning :) On Apr 27, 2008, at 08:03, Sanne Grinovero wrote: I would like to address A and B first; may I open a single JIRA for both? both issues are related to the same files and I would like to avoid having to produce two different patches, potentially conflicting with myself. ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
Hi Hardy, A -- nice I'm getting the chance to teach something to you gurus.. yes this is really required, locking is fine, but there are two problems there; a) visibility: JVM makes no guarantee another thread will see the changes made to a variable by another thread; it could see it "later on" and could also never see the updated value. Also when several variables are updated, the other thread could see them updated in different order (so returning another Dir.Provider than intended by other state variables). Only "final" instance variables are to be considered safe. b) variable update atomicity: when updating a reference you are changing several byte variables, another thread could read some bytes of a new value and some of an older version. This depends on architecture and JVM implementation, but should really be avoided. Just remember that all communication between threads must always be synchronized, or even better use some of the really good java.util.concurrent. That's why I am recommending the AtomicReference, this class has been made exactly to solve this problem, and on the architectures where it is unneeded it is just compiled to the fastest but SAFE implementation; the same considerations are true for any java type, even primitives: one line of code in java doesn't mean it's an atomic op (sorry for repeating the obvious). For a very good example on this problem: http://en.wikipedia.org/wiki/Double_checked_locking_pattern D -- how stupid, I'm sorry I didn't think about that! thanks, I suppose D is ok then. As C and D are just little improvements, but A and B look like a bugs needing a fix with some urgency, I would like to address A and B first; may I open a single JIRA for both? both issues are related to the same files and I would like to avoid having to produce two different patches, potentially conflicting with myself. regards, Sanne 2008/4/27 Hardy Ferentschik <[EMAIL PROTECTED]>: > Hi Sanne, > > > > > A) ... but all comunication beetween the client thread calling > "getDirectory()" and the CopyDirectory > > > > which defines the current directory is unprotected; also volatile isn't > enough > > to fix it, IMHO an AtomicReference would be more appropriate. > > > > Is this really required? Is synchronisation not ensured via > SearchFactoryImplementor.getLockableDirectoryProviders()? > > > > B) ... Also I would like to address the use of finalize() to stop the > Timer, > > > > there is a TODO about it already; > > > > Fair point. Initially a non daemon thread was used. Once we switched to a > deamon thread we > could probably have got rid of the call to timer.canel(). > > > > > > C) FileHelper not listening to Thread.interrupt() requests > > As FileHelper.synchronize is used in the mentioned Timers to copy > > potentially big files, > > it would be nice to check for Thread.isInterrupted() and also catch > > ClosedByInterruptException in file-copy operations so to abort all I/O > > operations and > > not just the current one to abort faster on JVM shutdown. > > (It doesn't look like the copy safety was guaranteed anyway). > > > > Sounds reasonable. > > > > > > D) Master and Slave share configuration properties. > > Wouldn't it be natural to have two different property keys to manage > > the shared index path? I imagine the typical scenario would be to have the > two > > configured on different servers and copy through some NFS, but it looks > like > > we are forcing people to use exactly the same path on both machines as the > two > > classes read the same props. > > > > The properties are different. Yes, the master and the slave both define: > hibernate.search.default.sourceBase and hibernate.search.default.indexBase, > but remember > that these are in two different configuration files. Master and slave are > two seperate machines > or at the very least two different JVMs. Nothing stops you to have > different mount points. I guess the > docuemtnation might suggest this. > > You can definitely create a Jira issue for B and D. Not sure about A > though. > Maybe someone else can have another look at the code. > > --Hardy > > ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
Hi Sanne, A) ... but all comunication beetween the client thread calling "getDirectory()" and the CopyDirectory which defines the current directory is unprotected; also volatile isn't enough to fix it, IMHO an AtomicReference would be more appropriate. Is this really required? Is synchronisation not ensured via SearchFactoryImplementor.getLockableDirectoryProviders()? B) ... Also I would like to address the use of finalize() to stop the Timer, there is a TODO about it already; Fair point. Initially a non daemon thread was used. Once we switched to a deamon thread we could probably have got rid of the call to timer.canel(). C) FileHelper not listening to Thread.interrupt() requests As FileHelper.synchronize is used in the mentioned Timers to copy potentially big files, it would be nice to check for Thread.isInterrupted() and also catch ClosedByInterruptException in file-copy operations so to abort all I/O operations and not just the current one to abort faster on JVM shutdown. (It doesn't look like the copy safety was guaranteed anyway). Sounds reasonable. D) Master and Slave share configuration properties. Wouldn't it be natural to have two different property keys to manage the shared index path? I imagine the typical scenario would be to have the two configured on different servers and copy through some NFS, but it looks like we are forcing people to use exactly the same path on both machines as the two classes read the same props. The properties are different. Yes, the master and the slave both define: hibernate.search.default.sourceBase and hibernate.search.default.indexBase, but remember that these are in two different configuration files. Master and slave are two seperate machines or at the very least two different JVMs. Nothing stops you to have different mount points. I guess the docuemtnation might suggest this. You can definitely create a Jira issue for B and D. Not sure about A though. Maybe someone else can have another look at the code. --Hardy ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
[hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal
Hello most esteemed developers, I have been scrutinizing the FSMasterDirectoryProvider and FSSlaveDirectoryProvider classes in Hibernate Search and have found some minor glitches and space for some improvement (IMHO); I would like to hear your opinion about it, and if you agree I'll open a JIRA and provide a patch myself ASAP. A) Variable visibility both FSMasterD.P. and FSSlaveD.P. have some little concurrency problems: there is a "volatile boolean inProgress" that fixes the communication beetween the "CopyDirectory" Runnable and the "TriggerTask" Timer, but all comunication beetween the client thread calling "getDirectory()" and the CopyDirectory which defines the current directory is unprotected; also volatile isn't enough to fix it, IMHO an AtomicReference would be more appropriate. The locking schema looks great! I've had a hard time understanding the flow and did some tests about the behaviour of Lucene in case of concurrent index-deletion and forcing the most dangerous situations I could think of, I couldn't find anything failing. Actually in the current situation the lock works ok but the wrong Directory could be returned because of the visibility problems; the lock avoids most problems but could hurt scalability (as we could return locked resources even whenever an unlocked dir was available). B) The finalize / stop issue Also I would like to address the use of finalize() to stop the Timer, there is a TODO about it already; IMHO the best solution is to remove the finalize without changing other code: the Timer is declared as daemon thread already, so it's going to stop automatically on JVM exit. The current finalize idea could harm the GC, as in the finalize we are referencing the timer and instruct the timer to do something: the timer has a reference to so this would revive the reference to and cancel the garbage-collection; actually I don't think this finalizer is ever being called, as the timer also has a reference to this (being an inner class) and is running in another thread which is certainly alive (as our purpose was to stop it and we we still didn't get to that point). So we could fix this issues just removing code, and making some instance variables (such as the Timer itself) unnecessary. Also this means you won't need a stop() to be added to DirectoryProvider interface, at least not for FSMasterD.P. and FSSlaveD.P. ( which are the only known implementations needing it, so we could also remove the TODO in the DirectoryProvider. C) FileHelper not listening to Thread.interrupt() requests As FileHelper.synchronize is used in the mentioned Timers to copy potentially big files, it would be nice to check for Thread.isInterrupted() and also catch ClosedByInterruptException in file-copy operations so to abort all I/O operations and not just the current one to abort faster on JVM shutdown. (It doesn't look like the copy safety was guaranteed anyway). D) Master and Slave share configuration properties. Wouldn't it be natural to have two different property keys to manage the shared index path? I imagine the typical scenario would be to have the two configured on different servers and copy through some NFS, but it looks like we are forcing people to use exactly the same path on both machines as the two classes read the same props. kind regards, Sanne ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search master/slave
Max Rydahl Andersen wrote: > Recently I've been working on Hibernate Search clustering. > Hibernate Search will have the ability to work in a master / slave mode. > > Every node has a local copy of the index and can read it (ie query it). > When an update is required (actually when a transaction scoped list of > updates are required) by a node, a message is sent to a master node > responsible to update the master index. > On a regular basis, each node acquires a copy of the master index in > local to stay up to day asynchronously. > > My first implementation involves JMS, but one could imagine using: > - JBoss Messaging standalone in headless mode (when it's done) > - a servlet controller receiving http requests > - whatever you want > > Thoughts? Is this clustering at the level of Hibernate Search or is it something at the lucence index level ? Hibernate Search Would it not make sense to be a "index tech" level concern or ? No because the index has no notion of transaction and batch of operations. However I plan sometime to create a JBossCache Directory (index container). The scalability is limited though since Lucene needs an index based locking system. -- -- Max Rydahl Andersen callto://max.rydahl.andersen Hibernate [EMAIL PROTECTED] http://hibernate.org JBoss a division of Red Hat [EMAIL PROTECTED] ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] Hibernate Search master/slave
Recently I've been working on Hibernate Search clustering. Hibernate Search will have the ability to work in a master / slave mode. Every node has a local copy of the index and can read it (ie query it). When an update is required (actually when a transaction scoped list of updates are required) by a node, a message is sent to a master node responsible to update the master index. On a regular basis, each node acquires a copy of the master index in local to stay up to day asynchronously. My first implementation involves JMS, but one could imagine using: - JBoss Messaging standalone in headless mode (when it's done) - a servlet controller receiving http requests - whatever you want Thoughts? Is this clustering at the level of Hibernate Search or is it something at the lucence index level ? Would it not make sense to be a "index tech" level concern or ? -- -- Max Rydahl Andersen callto://max.rydahl.andersen Hibernate [EMAIL PROTECTED] http://hibernate.org JBoss a division of Red Hat [EMAIL PROTECTED] ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
[hibernate-dev] Hibernate Search master/slave
Recently I've been working on Hibernate Search clustering. Hibernate Search will have the ability to work in a master / slave mode. Every node has a local copy of the index and can read it (ie query it). When an update is required (actually when a transaction scoped list of updates are required) by a node, a message is sent to a master node responsible to update the master index. On a regular basis, each node acquires a copy of the master index in local to stay up to day asynchronously. My first implementation involves JMS, but one could imagine using: - JBoss Messaging standalone in headless mode (when it's done) - a servlet controller receiving http requests - whatever you want Thoughts? The project moves slowly from a library to a library + infrastructure project. The focus will stay on the library side, but I am think of having some helpers to build the server side (ie build a ready to deploy EAR or WAR) ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev