Opening an issue sounds great. It kinda hijacked that thread already :) We can argue there whether FSLockFactory should or shouldn't allow locking outside the index directory. Perhaps we can put on the application the burden of generating a unique lock name, if it intends to use it outside the index directory.
Shai On Sun, Dec 26, 2010 at 12:17 PM, Uwe Schindler <[email protected]> wrote: > Hi Shai, > > > > Thanks for the insight. I still disagree about the lock file placement. If > your application needs such functionality you can always program your own > lock factory. But even if we add support for external dirs, the creation of > the lock file name and prefix should be done inside the lock factory > (currently, the lock id – the MD5 – is created in the directory instance). > This has pros for e.g. RAMDirectory, but is just unclean at the moment. > Especially the special handling of the lock id when the directory path for > the lock file differs from the index directory. > > > > About MD5: At least for Lucene 3.x we must stay with MD5, as changing the > hash algorithm would change the lock file name, so an index locked with 3.0 > would have a different lock file than an index locked with 3.x (for external > lock directory). This would lead to problems for users upgrading several > servers writing to shared indexes during the upgrades. You cannot be sure > that the same index is always locked when you have 2 different lucene > versions. This is just a warning, maybe it’s not important, but its > important. > > > > I would propose the following (for 3.x): Remove the static MessageDigest > from FSDirectory and replace it by a lazy-init one. Because for the common > case, the message digest is never used and should not be initialized. On > first use, create one (which only happens if you specify a separate lock > directory). Normal Lucene users and Solr will never hit this. We can stay > with MD5. About availability: As nitpicked yesterday with Robert, we can > still assume MD5 is always available for any JVM, because without MD5, tons > of software applications will break (just think of sessions in web app > servers). Harmony has it of course, Android. So we should simply use it (and > please don’t copy an MD5 impl to util!) and throw exception when not > available. For this seldom case (normal user have their locks in index dir), > this is fine. Everything else would break safe upgrade pathes. > > > > I think we should open an issue and discuss about the LockFactory > simplyfications (the code is currently very broken as it relies on strange > sophisticated backwards assumptions, just my opinion). > > > > Uwe > > ----- > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: [email protected] > > > > *From:* Shai Erera [mailto:[email protected]] > *Sent:* Sunday, December 26, 2010 7:43 AM > > *To:* [email protected] > *Subject:* Re: LuceneTestCase.threadCleanup incorrectly reports left > running threads > > > > Thanks for the explanation Uwe. I got confused with that and the process > name we add to the temp lock file. > > About what you say, we have a scenario where we use a FSLockFactory outside > an IndexWriter scope, but still within a search application scope. We lock a > directory so that several processes can be synced on it. Another use case is > when you manage several indexes, that should be locked at once -- the lock > file may reside external to all of them. > > So I would not want to see LockFactory suddenly assuming and enforcing the > lock file to be created in the index directory, or rely on IndexWriter > passing it something etc. I think the way LF works today, allowing you to > place the lock file wherever you want gives more freedom to people and opens > up the door for more use cases. > > I don't think though that we should rely on MD5. A simple hash function > should be enough IMO. > > Shai > > On Sun, Dec 26, 2010 at 12:41 AM, Uwe Schindler <[email protected]> wrote: > > Hi Shai, > > the md5 hash generated has nothing to do with concurrency anymore (the > concurrency thing was this NativeFSLock test method already removed). The > thing is the following: > > In early lucene versions, the lock files were put into TEMP directory. > Later > the lock factories allowed, to put the lock files into arbitrary folders. > For these both cases, the lock file name got an MD5 hash of the index > directory appended/prepended. In later Lucene versions the default for lock > files was changed to be the index folder. For backwards compatibility > reasons, with 2.9 and 3.0 you still had the possibility to instantiate a > LockFactory using a non-null path (using the ctor with a directory name). > FSLockFactory was programmed to support both cases (null directory or > explicit directory). When the lock directory is the same like the index > directory, the lock file got no hash appended. For the rare case that > somebody used a different folder (e.g. a temp directory), FSLockFactory was > falling back to the "old" behavior of adding the hash to the lock file > name. > > The magic for the md5 magic lock prefix is done if > FSDirectory#setLockFactory(). It checks for lockFactory extends > FSLockFactory and if yes then checks, that the LockFactories path name is > the same like the FSDir's or null. In that case it sets the lock prefix to > null. Otherwise the lock prefix is generated by calling the magic MD5 > creating method (Directory#getLockId()). > > In my opinion, in 3.x we should deprecate the separate path for the lock > file (Directory#getLockId()) and enforce the lockfile always to be placed > in > the index dir. LockFactory should not get a directory at all, but instead > should get the index dir on locking. For FS locks it would place the > write.lock file in the supplied folder and for other locks (like per-JVM > locks for RAMDirs) it could e.g. lookup the index dir in some map or > whatever. To place the lockfile somewhere else, you should be able to use > FileSwitchDirectory (currently not possible). > > Most tests in Lucene use the default (null lock dir in LockFactory), but > some tests for SimpleFSLockFactory & Co use the explicit directory names > and > therefore generate MD5 hashes to test the special behavior. > > For compatibility reasons we have to still use MD5 (to prevent different > lock file names after Lucene upgrade when FSDir is locked by another JVM > with older Lucene version). For 4.0 I would remove this stupidity and only > allow lock files in index directory. > > I hope I explained this stuff so everybody understand it, its really a > little bit confusing (how its implemented), but its "sophisticated > backwards" (haha). I would like to get rid of it and then we have no digest > code anymore. > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [email protected] > > > -----Original Message----- > > From: Shai Erera [mailto:[email protected]] > > Sent: Saturday, December 25, 2010 3:04 PM > > To: [email protected] > > Subject: Re: LuceneTestCase.threadCleanup incorrectly reports left > running > > threads > > > > Actually, the MD5 thingy is an attempt to generate a unique temp lock ID, > > IIRC. so this piece of code can disappear entirely now that the tests > > concurrency is better. > > > > As for the other threads that are left running, I couldn't track down yet > the > > warning from the benchmark tests, but I'd love to get rid of those false > > warnings. I thought the stack trace could at least tell us who spawned > the > > thread, but obviously it's not always clear. > > > > Shai > > > > On Saturday, December 25, 2010, Robert Muir <[email protected]> wrote: > > > On Sat, Dec 25, 2010 at 4:04 AM, Uwe Schindler <[email protected]> > > wrote: > > >> Md5 is guaranteed to be there (like utf8 as charset). This is > documented in > > crypto Api, which algorithms are available for digest. > > >> > > > > > > where is this documented? its not in the javadocs. > > > > > > anyway, we shouldn't be doing this: > > > * this algorithm might not exist on J2ME etc (still java), you need to > > > install an extra crypto add-on. > > > * we shouldnt start up an expensive PKI infrastructure on mac os X, > > > including spawning a new thread, just to hash a string. thats absurd. > > > * we pay all these costs ... for md5! its not even a good hash! > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [email protected] For > > > additional commands, e-mail: [email protected] > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [email protected] For additional > > commands, e-mail: [email protected] > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > >
