Ian,

One question about the 4th alternative:  how did you implement the sleep() 
in Java, especially so that it doesn't interfere with anything Lucene does 
internally (in case it uses threads)?

Right now, my indexer/inserter app doesn't explicitly do any threading stuff.
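
To make the question concrete, here is a minimal sketch of how I understand 
the 4th alternative.  StableFileCheck and isStable() are names I made up for 
illustration; they are not part of Lucene or the demo code, and the wait time 
is a guess.  As far as I know, Thread.sleep() only pauses the thread that 
calls it, so any background threads Lucene may start on its own should not be 
affected, but that is my assumption and not something confirmed in this thread.

```java
import java.io.File;

/** Hypothetical helper for illustration; not part of Lucene or its demos. */
final class StableFileCheck {

    private StableFileCheck() {}

    /**
     * Returns true if the file's length and last-modified time are unchanged
     * after waiting waitMillis. A false result means "skip it for now".
     */
    static boolean isStable(File file, long waitMillis) {
        if (!file.isFile() || !file.canRead()) {
            return false;
        }
        long sizeBefore = file.length();
        long modifiedBefore = file.lastModified();
        try {
            // Pauses only the calling thread; other threads keep running.
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and treat the file as not ready.
            Thread.currentThread().interrupt();
            return false;
        }
        // Re-check after the wait: any change suggests a producer is still writing.
        return file.isFile()
                && file.length() == sizeBefore
                && file.lastModified() == modifiedBefore;
    }
}
```

In the demo's indexDocs() loop I would call something like 
StableFileCheck.isStable(file, 2000) before adding a document, and leave any 
skipped file for the next insert run.  Since the wait is per file, the total 
delay grows with the number of files; recording the sizes of all candidate 
files, sleeping once, and then re-checking them all would avoid that.  Note 
that an unchanged size only suggests the file is complete; a producer that 
pauses longer than the wait would still slip through.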

Thanks,
Jim


---- oh...@cox.net wrote: 
> Hi Ian,
> 
> Thanks for the quick response.
> 
> I forgot to mention, but in our case, the "producers" are part of a commercial 
> package, so we have no way to get them changed.  I think the first 3 
> suggestions are therefore not feasible for us.
> 
> I have considered something like the 4th suggestion (check file size, 
> sleep, and check file size again).  I'm worried that it would slow down the 
> overall index insertion process, but unless there's something better, 
> that may be our best option :(...
> 
> Thanks again,
> Jim
> 
> 
> ---- Ian Lea <ian....@gmail.com> wrote: 
> > A few suggestions:
> > 
> > . Queue the docs once they are complete using something like JMS.
> > 
> > . Get the document producers to write to e.g. xxx.tmp and rename to
> > e.g. xxx.txt at the end
> > 
> > . Get the document producers to write to a tmp folder and move to e.g.
> > input/ when done
> > 
> > . Find a file, store size, sleep for a while, check size and if changed, 
> > skip
> > 
> > I've used all these at one time or another for assorted, mainly
> > non-lucene, apps, and they are all workable.
> > 
> > 
> > --
> > Ian.
> > 
> > 
> > On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> > > Hi,
> > >
> > > I have an app to initially create a Lucene index, and to populate it with 
> > > documents.  I'm now working on that app to insert new documents into that 
> > > Lucene index.
> > >
> > > In general, this new app, which is based loosely on the demo apps (e.g., 
> > > IndexFiles.java), is working, i.e., I can run it with a "create" 
> > > parameter, and it creates a good/valid index from the documents, and then 
> > > I can run it with an "insert" parameter, and it inserts new documents 
> > > into the index.
> > >
> > > > [As I mentioned in an earlier thread, we only have a requirement to 
> > > > insert new documents into the index, no requirements for deleting 
> > > > documents or updating documents that have already been indexed.]
> > >
> > > Ok, as I said, that works so far.
> > >
> > > However, in our case, the processes that are creating the documents that 
> > > we are indexing are fairly long-lived, and write fairly large documents, 
> > > and I'm worried that when an insert operation is run, some of the 
> > > potential documents may still be being written to, and we wouldn't want 
> > > the indexer to insert the document into the Lucene index until the 
> > > document is "complete".
> > >
> > > > As you know, the way that the demos such as IndexFiles work is that they 
> > > > call a method called indexDocs().  indexDocs() then recursively walks the 
> > > > directory tree and calls the writer to add each file to the index.
> > > >
> > > > In this loop, indexDocs() does a few checks (isDirectory(), canRead()), and 
> > > > I think that it would "pick up" (find) some documents that are still "in 
> > > > progress" (being written to, and not closed) in our case.
> > >
> > > I was wondering if anyone here has a situation similar to this (having to 
> > > index large documents that may be "in progress/being written to"), and 
> > > how you handle this situation?
> > >
> > > FYI, this is on Redhat Linux (and on Windows in my test environment).
> > >
> > > Thanks!
> > >
> > > Jim
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> > >