Hi Ian,

Thanks for the quick response.

I forgot to mention that in our case, the "producers" are part of a commercial 
package, so we don't have a way to get them to change anything, and I think 
that rules out the first three suggestions for us.

I have considered something like the 4th suggestion (check the file size, wait 
a bit, and check the file size again).  I'm worried that the waiting would slow 
down the overall index insertion process, but unless there's something better, 
that may be our best option :(...

Thanks again,
Jim


---- Ian Lea <ian....@gmail.com> wrote: 
> A few suggestions:
> 
> . Queue the docs once they are complete using something like JMS.
> 
> . Get the document producers to write to e.g. xxx.tmp and rename to
> e.g. xxx.txt at the end
> 
> . Get the document producers to write to a tmp folder and move to e.g.
> input/ when done
> 
> . Find a file, store size, sleep for a while, check size and if changed, skip
> 
> I've used all these at one time or another for assorted, mainly
> non-lucene, apps, and they are all workable.
> 
> 
> --
> Ian.
> 
> 
> On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> > Hi,
> >
> > I have an app to initially create a Lucene index, and to populate it with 
> > documents.  I'm now working on that app to insert new documents into that 
> > Lucene index.
> >
> > In general, this new app, which is based loosely on the demo apps (e.g., 
> > IndexFiles.java), is working: I can run it with a "create" parameter and it 
> > creates a good/valid index from the documents, and I can then run it with 
> > an "insert" parameter and it inserts new documents into the index.
> >
> > (As I mentioned in an earlier thread, we only have a requirement to insert 
> > new documents into the index; there are no requirements for deleting or 
> > updating documents that have already been indexed.)
> >
> > Ok, as I said, that works so far.
> >
> > However, in our case the processes that create the documents we are 
> > indexing are fairly long-lived and write fairly large documents.  I'm 
> > worried that when an insert operation runs, some of the candidate documents 
> > may still be being written to, and we wouldn't want the indexer to add a 
> > document to the Lucene index until the document is "complete".
> >
> > As you know, the way that demos such as IndexFiles work is that they call 
> > a method called indexDocs().  indexDocs() then recursively walks the 
> > directory tree, calling the writer to add each document to the index.
> >
> > In this loop, indexDocs() does a few checks (isDirectory(), canRead()), and 
> > I think that in our case it would "pick up" (find) some documents that are 
> > still "in progress" (being written to, and not yet closed).
> >
> > I was wondering if anyone here has run into a similar situation (having to 
> > index large documents that may still be "in progress"/being written to), 
> > and if so, how you handle it?
> >
> > FYI, this is on Redhat Linux (and on Windows in my test environment).
> >
> > Thanks!
> >
> > Jim
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
