A few suggestions:

. Queue the docs once they are complete using something like JMS.

. Get the document producers to write to e.g. xxx.tmp and rename to
  e.g. xxx.txt at the end

. Get the document producers to write to a tmp folder and move to
  e.g. input/ when done

. Find a file, store size, sleep for a while, check size and if
  changed, skip

I've used all these at one time or another for assorted, mainly
non-Lucene, apps, and they are all workable.

--
Ian.

On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> Hi,
>
> I have an app to initially create a Lucene index, and to populate it with
> documents. I'm now working on that app to insert new documents into that
> Lucene index.
>
> In general, this new app, which is based loosely on the demo apps (e.g.,
> IndexFiles.java), is working, i.e., I can run it with a "create" parameter,
> and it creates a good/valid index from the documents, and then I can run it
> with an "insert" parameter, and it inserts new documents into the index.
>
> [As I mentioned in an earlier thread, we only have a requirement to insert
> new documents into the index, no requirements for deleting documents or
> updating documents that have already been indexed.]
>
> Ok, as I said, that works so far.
>
> However, in our case, the processes that are creating the documents that we
> are indexing are fairly long-lived, and write fairly large documents, and I'm
> worried that when an insert operation is run, some of the potential documents
> may still be being written to, and we wouldn't want the indexer to insert the
> document into the Lucene index until the document is "complete".
>
> As you know, the way that the demos such as IndexFiles work is that they call
> a method called IndexDocs(). IndexDocs() then recursively walks the
> directory tree, calling the writer to add to the index.
>
> In this loop, IndexDocs() does a few checks (isDirectory(), canRead), and I
> think that it would "pick up" (find) some documents that are still "in
> progress" (being written to, and not closed) in our case.
>
> I was wondering if anyone here has a situation similar to this (having to
> index large documents that may be "in progress/being written to"), and how
> you handle this situation?
>
> FYI, this is on Redhat Linux (and on Windows in my test environment).
>
> Thanks!
>
> Jim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
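
For what it's worth, two of the suggestions at the top of this reply (write
under a temporary name and rename at the end, and the store-size/sleep/check
test) could be sketched in Java roughly as below. This is only an
illustration, not code from the Lucene demos: the class and method names are
made up, it appends ".tmp" to the final name rather than swapping the
extension, and it assumes the java.nio.file API from Java 7 or later, which
is newer than this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; none of these names come from Lucene.
class CompletedFileSketch {

    // Producer side: write everything under a ".tmp" name, then rename
    // to the final name as the very last step.
    static Path writeThenRename(Path finalPath, String content) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file
        // system cannot do the rename atomically (e.g. across volumes).
        return Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    // Indexer side: ignore anything that still carries the temporary suffix.
    static boolean looksComplete(Path file) {
        return Files.isRegularFile(file)
                && !file.getFileName().toString().endsWith(".tmp");
    }

    // Indexer side, alternative: store size, sleep, check size again.
    // Returns false if the size changed, meaning "skip this file for now".
    static boolean sizeIsStable(Path file, long waitMillis)
            throws IOException, InterruptedException {
        long before = Files.size(file);
        Thread.sleep(waitMillis);
        return Files.size(file) == before;
    }
}
```

In an indexing loop, a check such as looksComplete(path) or
sizeIsStable(path, someDelay) would sit next to the existing isDirectory()
and canRead() tests. The size check is only a heuristic: a producer that
pauses for longer than the sleep interval would still be picked up early,
so the rename approach is the more dependable of the two when the producers
can be changed.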