Re: LuceneTestCase.threadCleanup incorrectly reports left running threads

Shai Erera Sat, 25 Dec 2010 22:43:26 -0800

Thanks for the explanation Uwe. I got confused with that and the process
name we add to the temp lock file.


About what you say, we have a scenario where we use a FSLockFactory outside
an IndexWriter scope, but still within a search application scope. We lock a
directory so that several processes can be synced on it. Another use case is
when you manage several indexes, that should be locked at once -- the lock
file may reside external to all of them.

So I would not want to see LockFactory suddenly assuming and enforcing the
lock file to be created in the index directory, or rely on IndexWriter
passing it something etc. I think the way LF works today, allowing you to
place the lock file wherever you want gives more freedom to people and opens
up the door for more use cases.

I don't think though that we should rely on MD5. A simple hash function
should be enough IMO.

Shai

On Sun, Dec 26, 2010 at 12:41 AM, Uwe Schindler <[email protected]> wrote:

> Hi Shai,
>
> the md5 hash generated has nothing to do with concurrency anymore (the
> concurrency thing was this NativeFSLock test method already removed). The
> thing is the following:
>
> In early lucene versions, the lock files were put into TEMP directory.
> Later
> the lock factories allowed, to put the lock files into arbitrary folders.
> For these both cases, the lock file name got an MD5 hash of the index
> directory appended/prepended. In later Lucene versions the default for lock
> files was changed to be the index folder. For backwards compatibility
> reasons, with 2.9 and 3.0 you still had the possibility to instantiate a
> LockFactory using a non-null path (using the ctor with a directory name).
> FSLockFactory was programmed to support both cases (null directory or
> explicit directory). When the lock directory is the same like the index
> directory, the lock file got no hash appended. For the rare case that
> somebody used a different folder (e.g. a temp directory), FSLockFactory was
> falling back to the "old" behavior of adding the hash to the lock file
> name.
>
> The magic for the md5 magic lock prefix is done if
> FSDirectory#setLockFactory(). It checks for lockFactory extends
> FSLockFactory and if yes then checks, that the LockFactories path name is
> the same like the FSDir's or null. In that case it sets the lock prefix to
> null. Otherwise the lock prefix is generated by calling the magic MD5
> creating method (Directory#getLockId()).
>
> In my opinion, in 3.x we should deprecate the separate path for the lock
> file (Directory#getLockId()) and enforce the lockfile always to be placed
> in
> the index dir. LockFactory should not get a directory at all, but instead
> should get the index dir on locking. For FS locks it would place the
> write.lock file in the supplied folder and for other locks (like per-JVM
> locks for RAMDirs) it could e.g. lookup the index dir in some map or
> whatever. To place the lockfile somewhere else, you should be able to use
> FileSwitchDirectory (currently not possible).
>
> Most tests in Lucene use the default (null lock dir in LockFactory), but
> some tests for SimpleFSLockFactory & Co use the explicit directory names
> and
> therefore generate MD5 hashes to test the special behavior.
>
> For compatibility reasons we have to still use MD5 (to prevent different
> lock file names after Lucene upgrade when FSDir is locked by another JVM
> with older Lucene version). For 4.0 I would remove this stupidity and only
> allow lock files in index directory.
>
> I hope I explained this stuff so everybody understand it, its really a
> little bit confusing (how its implemented), but its "sophisticated
> backwards" (haha). I would like to get rid of it and then we have no digest
> code anymore.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -----Original Message-----
> > From: Shai Erera [mailto:[email protected]]
> > Sent: Saturday, December 25, 2010 3:04 PM
> > To: [email protected]
> > Subject: Re: LuceneTestCase.threadCleanup incorrectly reports left
> running
> > threads
> >
> > Actually, the MD5 thingy is an attempt to generate a unique temp lock ID,
> > IIRC. so this piece of code can disappear entirely now that the tests
> > concurrency is better.
> >
> > As for the other threads that are left running, I couldn't track down yet
> the
> > warning from the benchmark tests, but I'd love to get rid of those false
> > warnings. I thought the stack trace could at least tell us who spawned
> the
> > thread, but obviously it's not always clear.
> >
> > Shai
> >
> > On Saturday, December 25, 2010, Robert Muir <[email protected]> wrote:
> > > On Sat, Dec 25, 2010 at 4:04 AM, Uwe Schindler <[email protected]>
> > wrote:
> > >> Md5 is guaranteed to be there (like utf8 as charset). This is
> documented in
> > crypto Api, which algorithms are available for digest.
> > >>
> > >
> > > where is this documented? its not in the javadocs.
> > >
> > > anyway, we shouldn't be doing this:
> > > * this algorithm might not exist on J2ME etc (still java), you need to
> > > install an extra crypto add-on.
> > > * we shouldnt start up an expensive PKI infrastructure on mac os X,
> > > including spawning a new thread, just to hash a string. thats absurd.
> > > * we pay all these costs ... for md5! its not even a good hash!
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected] For
> > > additional commands, e-mail: [email protected]
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected] For additional
> > commands, e-mail: [email protected]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: LuceneTestCase.threadCleanup incorrectly reports left running threads

Reply via email to