Thanks for the explanation, Uwe. I got that confused with the process name we add to the temp lock file.
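To check that I read the prefix logic right, here is a rough sketch of it as I understand it from your mail. It is not the actual FSDirectory code: the class and method names (LockPrefixSketch, lockPrefix, md5Hex, simpleHex) and the "lock-" prefix are made up for illustration, and simpleHex just uses String.hashCode() as a stand-in for a simple hash.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; none of these names exist in Lucene.
final class LockPrefixSketch {

    // Returns null when the lock file lives in the index dir (lockDir is
    // null or the same path), otherwise a prefix derived from the index path.
    static String lockPrefix(File indexDir, File lockDir) {
        try {
            String indexPath = indexDir.getCanonicalPath();
            if (lockDir == null || lockDir.getCanonicalPath().equals(indexPath)) {
                return null;
            }
            return "lock-" + md5Hex(indexPath); // "lock-" is an assumed prefix
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hex-encoded MD5 digest of the string's UTF-8 bytes.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Stand-in for a "simple hash": hex of String.hashCode(). Only 32 bits,
    // so collisions between different paths are more likely than with MD5.
    static String simpleHex(String s) {
        return Integer.toHexString(s.hashCode());
    }

    public static void main(String[] args) {
        File index = new File("index");
        System.out.println(lockPrefix(index, null));
        System.out.println(lockPrefix(index, new File("locks")));
        System.out.println("lock-" + simpleHex(index.getAbsolutePath()));
    }
}
```

Note that swapping md5Hex for simpleHex changes the resulting lock file name, which is exactly the compatibility issue you mention for a directory locked by a JVM running an older version.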
About what you say: we have a scenario where we use an FSLockFactory outside an IndexWriter scope, but still within a search application scope. We lock a directory so that several processes can be synced on it. Another use case is when you manage several indexes that should be locked at once -- the lock file may reside outside all of them.

So I would not want to see LockFactory suddenly assuming and enforcing that the lock file be created in the index directory, or relying on IndexWriter passing it something, etc. I think the way LF works today, allowing you to place the lock file wherever you want, gives people more freedom and opens the door to more use cases.

I don't think, though, that we should rely on MD5. A simple hash function should be enough IMO.

Shai

On Sun, Dec 26, 2010 at 12:41 AM, Uwe Schindler <[email protected]> wrote:

> Hi Shai,
>
> the md5 hash generated has nothing to do with concurrency anymore (the
> concurrency thing was this NativeFSLock test method already removed). The
> thing is the following:
>
> In early lucene versions, the lock files were put into TEMP directory. Later
> the lock factories allowed, to put the lock files into arbitrary folders.
> For these both cases, the lock file name got an MD5 hash of the index
> directory appended/prepended. In later Lucene versions the default for lock
> files was changed to be the index folder. For backwards compatibility
> reasons, with 2.9 and 3.0 you still had the possibility to instantiate a
> LockFactory using a non-null path (using the ctor with a directory name).
> FSLockFactory was programmed to support both cases (null directory or
> explicit directory). When the lock directory is the same like the index
> directory, the lock file got no hash appended. For the rare case that
> somebody used a different folder (e.g. a temp directory), FSLockFactory was
> falling back to the "old" behavior of adding the hash to the lock file
> name.
>
> The magic for the md5 magic lock prefix is done if
> FSDirectory#setLockFactory(). It checks for lockFactory extends
> FSLockFactory and if yes then checks, that the LockFactories path name is
> the same like the FSDir's or null. In that case it sets the lock prefix to
> null. Otherwise the lock prefix is generated by calling the magic MD5
> creating method (Directory#getLockId()).
>
> In my opinion, in 3.x we should deprecate the separate path for the lock
> file (Directory#getLockId()) and enforce the lockfile always to be placed in
> the index dir. LockFactory should not get a directory at all, but instead
> should get the index dir on locking. For FS locks it would place the
> write.lock file in the supplied folder and for other locks (like per-JVM
> locks for RAMDirs) it could e.g. lookup the index dir in some map or
> whatever. To place the lockfile somewhere else, you should be able to use
> FileSwitchDirectory (currently not possible).
>
> Most tests in Lucene use the default (null lock dir in LockFactory), but
> some tests for SimpleFSLockFactory & Co use the explicit directory names and
> therefore generate MD5 hashes to test the special behavior.
>
> For compatibility reasons we have to still use MD5 (to prevent different
> lock file names after Lucene upgrade when FSDir is locked by another JVM
> with older Lucene version). For 4.0 I would remove this stupidity and only
> allow lock files in index directory.
>
> I hope I explained this stuff so everybody understand it, its really a
> little bit confusing (how its implemented), but its "sophisticated
> backwards" (haha). I would like to get rid of it and then we have no digest
> code anymore.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -----Original Message-----
> > From: Shai Erera [mailto:[email protected]]
> > Sent: Saturday, December 25, 2010 3:04 PM
> > To: [email protected]
> > Subject: Re: LuceneTestCase.threadCleanup incorrectly reports left running
> > threads
> >
> > Actually, the MD5 thingy is an attempt to generate a unique temp lock ID,
> > IIRC. so this piece of code can disappear entirely now that the tests
> > concurrency is better.
> >
> > As for the other threads that are left running, I couldn't track down yet the
> > warning from the benchmark tests, but I'd love to get rid of those false
> > warnings. I thought the stack trace could at least tell us who spawned the
> > thread, but obviously it's not always clear.
> >
> > Shai
> >
> > On Saturday, December 25, 2010, Robert Muir <[email protected]> wrote:
> > > On Sat, Dec 25, 2010 at 4:04 AM, Uwe Schindler <[email protected]> wrote:
> > >> Md5 is guaranteed to be there (like utf8 as charset). This is documented in
> > >> crypto Api, which algorithms are available for digest.
> > >>
> > >
> > > where is this documented? its not in the javadocs.
> > >
> > > anyway, we shouldn't be doing this:
> > > * this algorithm might not exist on J2ME etc (still java), you need to
> > >   install an extra crypto add-on.
> > > * we shouldnt start up an expensive PKI infrastructure on mac os X,
> > >   including spawning a new thread, just to hash a string. thats absurd.
> > > * we pay all these costs ... for md5! its not even a good hash!
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
