Re: Slightly Off-topic: How to decide whether or not to add a document?

Ian Lea Tue, 04 Aug 2009 08:56:36 -0700

A few suggestions:

. Queue the docs once they are complete using something like JMS.


. Get the document producers to write to e.g. xxx.tmp and rename to
e.g. xxx.txt at the end

. Get the document producers to write to a tmp folder and move to e.g.
input/ when done

. Find a file, store size, sleep for a while, check size and if changed, skip

I've used all these at one time or another for assorted, mainly
non-lucene, apps, and they are all workable.


--
Ian.


On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> Hi,
>
> I have an app to initially create a Lucene index, and to populate it with 
> documents.  I'm now working on that app to insert new documents into that 
> Lucene index.
>
> In general, this new app, which is based loosely on the demo apps (e.g., 
> IndexFiles.java), is working, i.e., I can run it with a "create" parameter, 
> and it creates a good/valid index from the documents, and then I can run it 
> with an "insert" parameter, and it inserts new documents into the index.
>
> [As I mentioned in an earlier thread, we only have a requirement to insert 
> new documents into the index, no requirements for deleting documents or 
> updating documents that have already been indexed).
>
> Ok, as I said, that works so far.
>
> However, in our case, the processes that are creating the documents that we 
> are indexing are fairly long-lived, and write fairly large documents, and I'm 
> worried that when an insert operation is run, some of the potential documents 
> may still be being written to, and we wouldn't want the indexer to insert the 
> document into the Lucene index until the document is "complete".
>
> As you know, the way that the demos such as IndexFiles work is that they call 
> a method called IndexDocs().  IndexDocs() then recursively walks the 
> directory tree, and calling the writer to add to the index.
>
> In this loop, IndexDocs() does a few checks (isDirectory(), canRead), and I 
> think that it would "pick up" (find) some documents that are still "in 
> progress" (being written to, and not closed) in our case.
>
> I was wondering if anyone here has a situation similar to this (having to 
> index large documents that may be "in progress/being written to"), and how 
> you handle this situation?
>
> FYI, this is on Redhat Linux (and on Windows in my test environment).
>
> Thanks!
>
> Jim
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Slightly Off-topic: How to decide whether or not to add a document?

Reply via email to