Hi,

I have an app that initially creates a Lucene index and populates it with
documents.  I'm now extending that app to insert new documents into that
existing index.

In general, this new app, which is loosely based on the demo apps (e.g.,
IndexFiles.java), is working: I can run it with a "create" parameter and it
builds a valid index from the documents, and I can then run it with an
"insert" parameter and it adds new documents to that index.

[As I mentioned in an earlier thread, we only have a requirement to insert new
documents into the index; there is no requirement to delete documents or to
update documents that have already been indexed.]

Ok, as I said, that works so far.

However, in our case the processes that produce the documents we're indexing
are fairly long-lived and write fairly large files.  I'm worried that when an
insert run kicks off, some of the candidate documents may still be in the
middle of being written, and we don't want the indexer to add a document to
the Lucene index until that document is "complete".

As you know, the way demos such as IndexFiles work is that they call a method
named indexDocs(), which recursively walks the directory tree, calling the
writer to add each document to the index.

Inside that loop, indexDocs() only does a few checks (isDirectory(),
canRead()), so in our case I think it would pick up documents that are still
"in progress" (being written to and not yet closed).

I was wondering if anyone here has dealt with a similar situation (having to
index large documents that may still be in progress/being written to), and if
so, how you handle it?
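
For what it's worth, the only idea I've come up with so far is a "quiet
period" check, something like the sketch below (untested, and the 60-second
threshold is a made-up number that would need tuning to how our producer
processes actually behave):

    import java.io.File;

    class QuietPeriodCheck {
        static final long QUIET_PERIOD_MS = 60 * 1000L;   // arbitrary threshold

        // Treat a file as "complete" only if it hasn't been modified recently.
        static boolean looksComplete(File file) {
            long age = System.currentTimeMillis() - file.lastModified();
            return age > QUIET_PERIOD_MS;   // skip anything touched in the last minute
        }
    }

    // ...and then in the indexDocs() loop, something like:
    //     if (!file.isDirectory() && !QuietPeriodCheck.looksComplete(file)) {
    //         return;   // leave it for the next insert run
    //     }

The obvious weakness is that a slow writer could pause for longer than the
quiet period, so I'd rather hear how others handle this before settling on
anything.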

FYI, this is on Red Hat Linux (and on Windows in my test environment).

Thanks!

Jim

