Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: So what/how should we document all of this?
...
: I've got more info on this.

Mark: most of what you wrote is above my head, but since you fixed a grammar error in my updated example solrconfig.xml comment w/o making any content changes, I'm assuming you feel what I put there is sufficient.

Most of your comments feel like they should be raised over in Lucene-Java land, at a minimum in documentation (added to the AvailableLockFactories page perhaps) or possibly in some code changes (should we change the default LockFactory depending on Java version?). I'll leave that up to you, since (as I mentioned) I didn't understand half of it.

: Checking for OverlappingFileLockException *should* actually work when
: using Java 1.6. Java 1.6 started using a *system wide* thread safe check
: for this.
:
: Previous to Java 1.6, checks for this *were* limited to an instance of
: FileChannel - the FileChannel maintained its own personal lock list. So
: you have to use the same Channel to even have any hope of seeing an
: OverlappingFileLockException. Even then though, it's not properly thread
: safe. They did not sync across checking if the lock exists and acquiring
: the lock - they separately sync each action - leaving room to acquire the
: lock twice from two different threads like I was seeing.
:
: Interestingly, Java 1.6 has a back compat mode you can turn on that
: doesn't use the system wide lock list, and they have fixed this thread
: safety issue in that impl - there is a sync across checking
: and getting the lock so that it is properly thread safe - but not in
: Java 1.4, 1.5.
:
: Looking at GCC - uh ... I don't think you want to use GCC - they don't
: appear to use a lock list and check for this at all :)
:
: But the point is, this is fixable on Java 6 if we check for
: OverlappingFileLockException - it *should* work across webapps, and it
: is actually thread safe, unlike Java 1.4, 1.5.
:
: Another interesting fact:
:
: On Windows, if you attempt to lock the same file with different channel
: instances pre Java 1.6 - the code will deadlock.
:
: --
: - Mark
:
: http://www.lucidimagination.com

-Hoss
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Chris Hostetter wrote:

: At a minimum, shouldn't NativeFSLock.obtain() be checking for
: OverlappingFileLockException and treating that as a failure to acquire the
: lock?
...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
...
: File locks are held on behalf of the entire Java virtual machine.
: * They are not suitable for controlling access to a file by multiple
: * threads within the same virtual machine.

...Grrr

So where does that leave us?

Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs.

I'm worried that we should recommend people avoid native altogether, because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior.

So what/how should we document all of this?

-Hoss

I've got more info on this.

Checking for OverlappingFileLockException *should* actually work when using Java 1.6. Java 1.6 started using a *system wide* thread safe check for this.

Previous to Java 1.6, checks for this *were* limited to an instance of FileChannel - the FileChannel maintained its own personal lock list. So you have to use the same Channel to even have any hope of seeing an OverlappingFileLockException. Even then though, it's not properly thread safe. They did not sync across checking if the lock exists and acquiring the lock - they separately sync each action - leaving room to acquire the lock twice from two different threads like I was seeing.

Interestingly, Java 1.6 has a back compat mode you can turn on that doesn't use the system wide lock list, and they have fixed this thread safety issue in that impl - there is a sync across checking and getting the lock so that it is properly thread safe - but not in Java 1.4, 1.5.

Looking at GCC - uh ... I don't think you want to use GCC - they don't appear to use a lock list and check for this at all :)

But the point is, this is fixable on Java 6 if we check for OverlappingFileLockException - it *should* work across webapps, and it is actually thread safe, unlike Java 1.4, 1.5.

--
- Mark

http://www.lucidimagination.com
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Mark Miller wrote:

Chris Hostetter wrote:

: At a minimum, shouldn't NativeFSLock.obtain() be checking for
: OverlappingFileLockException and treating that as a failure to acquire the
: lock?
...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
...
: File locks are held on behalf of the entire Java virtual machine.
: * They are not suitable for controlling access to a file by multiple
: * threads within the same virtual machine.

...Grrr

So where does that leave us?

Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs.

I'm worried that we should recommend people avoid native altogether, because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior.

So what/how should we document all of this?

-Hoss

I've got more info on this.

Checking for OverlappingFileLockException *should* actually work when using Java 1.6. Java 1.6 started using a *system wide* thread safe check for this.

Previous to Java 1.6, checks for this *were* limited to an instance of FileChannel - the FileChannel maintained its own personal lock list. So you have to use the same Channel to even have any hope of seeing an OverlappingFileLockException. Even then though, it's not properly thread safe. They did not sync across checking if the lock exists and acquiring the lock - they separately sync each action - leaving room to acquire the lock twice from two different threads like I was seeing.

Interestingly, Java 1.6 has a back compat mode you can turn on that doesn't use the system wide lock list, and they have fixed this thread safety issue in that impl - there is a sync across checking and getting the lock so that it is properly thread safe - but not in Java 1.4, 1.5.

Looking at GCC - uh ... I don't think you want to use GCC - they don't appear to use a lock list and check for this at all :)

But the point is, this is fixable on Java 6 if we check for OverlappingFileLockException - it *should* work across webapps, and it is actually thread safe, unlike Java 1.4, 1.5.

Another interesting fact:

On Windows, if you attempt to lock the same file with different channel instances pre Java 1.6 - the code will deadlock.

--
- Mark

http://www.lucidimagination.com
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Thanks for the heads-up, this is good to know.

I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories which I recently created as a guide to help in choosing between different LockFactories.

I believe the Native LockFactory is very useful. I wouldn't consider this a bug, nor would I consider discouraging its use; people just need to be informed of the behavior and know that no LockFactory impl is good for all cases. Adding some lines to its javadoc seems appropriate.

Regards,
Sanne

2010/1/20 Chris Hostetter hossman_luc...@fucit.org:

: At a minimum, shouldn't NativeFSLock.obtain() be checking for
: OverlappingFileLockException and treating that as a failure to acquire the
: lock?
...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
...
: File locks are held on behalf of the entire Java virtual machine.
: * They are not suitable for controlling access to a file by multiple
: * threads within the same virtual machine.

...Grrr

So where does that leave us?

Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs.

I'm worried that we should recommend people avoid native altogether, because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior.

So what/how should we document all of this?

-Hoss
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: again. I don't think it matters if it's the same FileChannel or not - you
: just can't use Native Locks within the same JVM, as the lock is held by
: the JVM - they are per process - so Lucene does its own little static
: map stuff to lock within JVM (simple in memory lock tracking) and uses
: the actual Native Lock for multiple JVMs (which is all it's good for -
: process granularity). But obviously, the in memory locking doesn't work
: across webapps.

Assuming I'm understanding all of this correctly, that implies a bug in Lucene's NativeFSLockFactory when used in a multiple classloader type situation -- including any app running in a servlet container.

At a minimum, shouldn't NativeFSLock.obtain() be checking for OverlappingFileLockException and treating that as a failure to acquire the lock?

-Hoss
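To make the suggestion above concrete, here is a minimal sketch of "treat OverlappingFileLockException as a failure to acquire the lock". It is not Lucene's NativeFSLock code: the class `OverlapAsFailure` and the helpers `tryObtain` and `obtainTwice` are hypothetical names, and the sketch assumes Java 7 or later (for `java.nio.file`) with the default JVM-wide overlap check that a later message in this thread describes for Java 1.6.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverlapAsFailure {

    /** Hypothetical helper: returns the lock, or null if it could not be obtained. */
    static FileLock tryObtain(FileChannel channel) throws IOException {
        try {
            // tryLock() returns null when another process holds the lock.
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held somewhere in this JVM: report a failed
            // attempt instead of letting the exception escape.
            return null;
        }
    }

    /** Opens two channels on the same file and tries to lock through each in turn. */
    static boolean[] obtainTwice(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock a = tryObtain(first);
            FileLock b = tryObtain(second);
            boolean[] result = { a != null, b != null };
            if (b != null) {
                b.release();
            }
            if (a != null) {
                a.release();
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("write", ".lock");
        boolean[] r = obtainTwice(lockFile);
        System.out.println("first=" + r[0] + " second=" + r[1]);
        Files.deleteIfExists(lockFile);
    }
}
```

With this shape the caller only ever sees "got it" or "did not get it", whether the competing holder is another process or another channel in the same JVM.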
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Chris Hostetter wrote:

: again. I don't think it matters if it's the same FileChannel or not - you
: just can't use Native Locks within the same JVM, as the lock is held by
: the JVM - they are per process - so Lucene does its own little static
: map stuff to lock within JVM (simple in memory lock tracking) and uses
: the actual Native Lock for multiple JVMs (which is all it's good for -
: process granularity). But obviously, the in memory locking doesn't work
: across webapps.

Assuming I'm understanding all of this correctly, that implies a bug in Lucene's NativeFSLockFactory when used in a multiple classloader type situation -- including any app running in a servlet container.

At a minimum, shouldn't NativeFSLock.obtain() be checking for OverlappingFileLockException and treating that as a failure to acquire the lock?

-Hoss

Perhaps - that should make it work in more cases - but in my simple testing it's not 100% reliable.

If I start up two threads and try to get a lock (with the same channel, with different channels) with first one thread and then the other - sometimes it throws OverlappingFileLockException ... and sometimes it doesn't. From what I can tell, you certainly can't count on it.

If you pause between attempts, it does appear to always work - so it certainly would give us a lot of ground, it would seem - but if the attempts are back to back, both threads can still successfully get the lock. This behavior could be OS dependent as it's using OS level locks.

FileChannel does appear to say that this should work (though it's obviously not completely thread safe from what I can tell), but it also says:

File locks are held on behalf of the entire Java virtual machine.
* They are not suitable for controlling access to a file by multiple
* threads within the same virtual machine.

Which seems to be the case.

--
- Mark

http://www.lucidimagination.com
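The two-thread experiment described above can be reproduced with something like the following sketch. It is not Mark's actual test: `LockRace` and `race` are hypothetical names, it assumes Java 8 or later for brevity, and it uses a separate channel per thread. On a JVM with the system-wide overlap check exactly one thread should win; the "both threads get the lock" outcome reported above is what the older per-channel bookkeeping allowed.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockRace {

    /** Two threads try to lock the same file back to back; returns how many got the lock. */
    static int race(Path lockFile) throws InterruptedException {
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch attempted = new CountDownLatch(2);

        Runnable contender = () -> {
            try (FileChannel ch = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                start.await(); // let both threads go at the same moment
                FileLock lock = null;
                try {
                    lock = ch.tryLock();
                    if (lock != null) {
                        winners.incrementAndGet();
                    }
                } catch (OverlappingFileLockException e) {
                    // The other thread in this JVM already holds the lock.
                } finally {
                    attempted.countDown();
                }
                // Keep holding the lock until the other thread has tried too
                // (bounded wait so a failed thread cannot hang the demo).
                attempted.await(5, TimeUnit.SECONDS);
                if (lock != null) {
                    lock.release();
                }
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        };

        Thread t1 = new Thread(contender);
        Thread t2 = new Thread(contender);
        t1.start();
        t2.start();
        start.countDown();
        t1.join();
        t2.join();
        return winners.get();
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("race", ".lock");
        System.out.println("threads that obtained the lock: " + race(lockFile));
        Files.deleteIfExists(lockFile);
    }
}
```

Running `race` in a loop is a cheap way to check whether a given JVM and OS combination ever lets both threads through.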
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: At a minimum, shouldn't NativeFSLock.obtain() be checking for
: OverlappingFileLockException and treating that as a failure to acquire the
: lock?
...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
...
: File locks are held on behalf of the entire Java virtual machine.
: * They are not suitable for controlling access to a file by multiple
: * threads within the same virtual machine.

...Grrr

So where does that leave us?

Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs.

I'm worried that we should recommend people avoid native altogether, because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior.

So what/how should we document all of this?

-Hoss
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why don't native locks work within the same VM? ... and by work I mean why didn't he just get a lock timeout error?

Within the same VM, you need the same FileChannel for some reason. Lucene uses a static hashmap so that multiple NativeFSLockFactory instances will end up using the same FileChannel for locking. But multiple webapps obviously breaks that.

-Yonik
http://www.lucidimagination.com
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Yonik Seeley wrote:

On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why don't native locks work within the same VM? ... and by work I mean why didn't he just get a lock timeout error?

Within the same VM, you need the same FileChannel for some reason. Lucene uses a static hashmap so that multiple NativeFSLockFactory instances will end up using the same FileChannel for locking. But multiple webapps obviously breaks that.

-Yonik
http://www.lucidimagination.com

Native Locks are obtained at the JVM level - so if you try to lock the same Channel twice, since the same JVM already has the lock, it's granted again. I don't think it matters if it's the same FileChannel or not - you just can't use Native Locks within the same JVM, as the lock is held by the JVM - they are per process - so Lucene does its own little static map stuff to lock within JVM (simple in memory lock tracking) and uses the actual Native Lock for multiple JVMs (which is all it's good for - process granularity). But obviously, the in memory locking doesn't work across webapps.

--
- Mark

http://www.lucidimagination.com
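The two-level scheme described above (in-memory tracking inside the JVM, a native lock between JVMs) could look roughly like the sketch below. It illustrates the idea only and is not Lucene's NativeFSLockFactory: `TwoLevelLock`, `HELD`, `obtain` and `release` are hypothetical names, and Java 7 or later is assumed. Because `HELD` is a static field, each class loader that loads the class gets its own copy, which is the limitation noted for multiple webapps.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

public class TwoLevelLock {
    // Paths locked from this JVM. Being static, the set is really per class
    // loader, so two webapps would not see each other's entries.
    private static final Set<String> HELD = new HashSet<>();

    private final Path path;
    private final String key;
    private FileChannel channel;
    private FileLock lock;

    TwoLevelLock(Path path) {
        this.path = path;
        this.key = path.toAbsolutePath().normalize().toString();
    }

    /** Returns true only if both the in-memory claim and the native lock succeed. */
    synchronized boolean obtain() throws IOException {
        if (lock != null) {
            return true; // this instance already holds the lock
        }
        synchronized (HELD) {
            if (!HELD.add(key)) {
                return false; // someone in this JVM already holds it
            }
        }
        boolean ok = false;
        try {
            channel = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            lock = channel.tryLock(); // null when another process holds the lock
            ok = (lock != null);
        } finally {
            if (!ok) {
                // Undo the in-memory claim, then drop the channel.
                synchronized (HELD) {
                    HELD.remove(key);
                }
                if (channel != null) {
                    channel.close();
                    channel = null;
                }
            }
        }
        return ok;
    }

    synchronized void release() throws IOException {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
            channel.close();
        } finally {
            lock = null;
            channel = null;
            synchronized (HELD) {
                HELD.remove(key);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("write", ".lock");
        TwoLevelLock a = new TwoLevelLock(p);
        TwoLevelLock b = new TwoLevelLock(p);
        System.out.println("a obtains: " + a.obtain());
        System.out.println("b obtains while a holds: " + b.obtain());
        a.release();
        System.out.println("b obtains after release: " + b.obtain());
        b.release();
        Files.deleteIfExists(p);
    }
}
```

The in-memory set answers "is it held in this JVM?" without touching the OS, and the native lock is only asked about other processes.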
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Mark Miller wrote:

Yonik Seeley wrote:

On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why don't native locks work within the same VM? ... and by work I mean why didn't he just get a lock timeout error?

Within the same VM, you need the same FileChannel for some reason. Lucene uses a static hashmap so that multiple NativeFSLockFactory instances will end up using the same FileChannel for locking. But multiple webapps obviously breaks that.

-Yonik
http://www.lucidimagination.com

Native Locks are obtained at the JVM level - so if you try to lock the same Channel twice, since the same JVM already has the lock, it's granted again. I don't think it matters if it's the same FileChannel or not - you just can't use Native Locks within the same JVM, as the lock is held by the JVM - they are per process - so Lucene does its own little static map stuff to lock within JVM (simple in memory lock tracking) and uses the actual Native Lock for multiple JVMs (which is all it's good for - process granularity). But obviously, the in memory locking doesn't work across webapps.

Also, the javadocs in Lucene are wrong:

/*
 * The javadocs for FileChannel state that you should have
 * a single instance of a FileChannel (per JVM) for all
 * locking against a given file. To ensure this, we have
 * a single (static) HashSet that contains the file paths
 * of all currently locked locks. This protects against
 * possible cases where different Directory instances in
 * one JVM (each with their own NativeFSLockFactory
 * instance) have set the same lock dir and lock prefix.
 */

The javadocs for FileChannel don't say this at all - and this implies that Lucene is doing something that it is not. The javadocs say don't expect native locks to work for locking within a JVM, because they don't. And Lucene doesn't try to use the same FileChannel per JVM (it wouldn't help anyway) - Lucene simply attempts to track per JVM locks in a static map (which doesn't work per JVM when you are dealing with different classloaders).

--
- Mark

http://www.lucidimagination.com
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Ah thanks - I was going by that comment :-)

On Mon, Jan 18, 2010 at 12:07 PM, Mark Miller markrmil...@gmail.com wrote:

Mark Miller wrote:

Yonik Seeley wrote:

On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why don't native locks work within the same VM? ... and by work I mean why didn't he just get a lock timeout error?

Within the same VM, you need the same FileChannel for some reason. Lucene uses a static hashmap so that multiple NativeFSLockFactory instances will end up using the same FileChannel for locking. But multiple webapps obviously breaks that.

-Yonik
http://www.lucidimagination.com

Native Locks are obtained at the JVM level - so if you try to lock the same Channel twice, since the same JVM already has the lock, it's granted again. I don't think it matters if it's the same FileChannel or not - you just can't use Native Locks within the same JVM, as the lock is held by the JVM - they are per process - so Lucene does its own little static map stuff to lock within JVM (simple in memory lock tracking) and uses the actual Native Lock for multiple JVMs (which is all it's good for - process granularity). But obviously, the in memory locking doesn't work across webapps.

Also, the javadocs in Lucene are wrong:

/*
 * The javadocs for FileChannel state that you should have
 * a single instance of a FileChannel (per JVM) for all
 * locking against a given file. To ensure this, we have
 * a single (static) HashSet that contains the file paths
 * of all currently locked locks. This protects against
 * possible cases where different Directory instances in
 * one JVM (each with their own NativeFSLockFactory
 * instance) have set the same lock dir and lock prefix.
 */

The javadocs for FileChannel don't say this at all - and this implies that Lucene is doing something that it is not. The javadocs say don't expect native locks to work for locking within a JVM, because they don't. And Lucene doesn't try to use the same FileChannel per JVM (it wouldn't help anyway) - Lucene simply attempts to track per JVM locks in a static map (which doesn't work per JVM when you are dealing with different classloaders).

--
- Mark

http://www.lucidimagination.com
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why don't native locks work within the same VM? ... and by work I mean why didn't he just get a lock timeout error?

If the behavior of Native Locks is really that you don't get the same behavior if both clients are in the same JVM, then shouldn't the Lucene NativeFSLockFactory be doing something like wrapping a SingleInstanceLockFactory around the NativeFSLockFactory?

: #2) native lock factory fails if it's two different Solr webapps in
: the same JVM trying to lock the same thing.
...
: Should we clarify "Do not use with multiple solr webapps in the same
: JVM" or just remove it?

I'm starting to think we should remove support for native locks altogether -- if it can fail in the situation of multiple wars in the same JVM trying to use the same solr home, that implies that it can also fail if something goes wrong during a hot deploy of the solr.war ... if the shutdown of the older instance of solr.war fails for some reason, then there could be a stale lock, created in the same JVM, left over when the newer instance is brought online.

Correct?

-Hoss
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: doc: note about native locks not working for multiple webapps in same JVM

Is this in response to the OverlappingFileLockException thread started by Joe Kessel?

...

: + native = NativeFSLockFactory - uses OS native file locking.
: + Do not use with multiple solr webapps in the same JVM.

I think there's a misunderstanding about the root cause of the problem. There shouldn't be any inherent problem with using Native locks and multiple webapps -- I believe the underlying source of the exception was that he was using multiple webapps w/o realizing it -- so presumably both webapps were trying to use the same solr home dir.

-Hoss
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
On Sat, Jan 16, 2010 at 3:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: doc: note about native locks not working for multiple webapps in same JVM

Is this in response to the OverlappingFileLockException thread started by Joe Kessel?

...

: + native = NativeFSLockFactory - uses OS native file locking.
: + Do not use with multiple solr webapps in the same JVM.

I think there's a misunderstanding about the root cause of the problem. There shouldn't be any inherent problem with using Native locks and multiple webapps

Right... for stock Solr usage (i.e. as long as they don't try to lock the same thing.)
It is funny that native locks always work across different processes, but not always in the same JVM though.

-- I believe the underlying source of the exception was that he was using multiple webapps w/o realizing it -- so presumably both webapps were trying to use the same solr home dir.

Right... it's really two issues:

#1) two separate solr instances trying to use the same solr index
#2) native lock factory fails if it's two different Solr webapps in the same JVM trying to lock the same thing.

I do recall expert level stuff like people having multiple solr instances pointing to the same data directory in the past though, but I'm not sure if it was from the same JVM or not.

Should we clarify "Do not use with multiple solr webapps in the same JVM" or just remove it?

-Yonik
http://www.lucidimagination.com