Ian,

One question about the 4th alternative: how did you implement the sleep() in Java, especially in such a way as not to interfere with any of the Lucene machinery (in case there's threading involved)?
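For context, a minimal sketch of the "store size, sleep, check size again" idea might look like the following. This is only my own illustration, not Ian's actual code; the class and method names are made up. The key point is that Thread.sleep() blocks only the calling thread, so in a single-threaded indexer it cannot disturb anything Lucene is doing:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StableFileCheck {
    // Hypothetical helper: returns true if the file's size did not
    // change across a short pause, i.e. the producer has (probably)
    // finished writing it. Thread.sleep() blocks only the calling
    // thread, so a single-threaded indexer is unaffected.
    public static boolean isStable(Path file, long waitMillis)
            throws IOException, InterruptedException {
        long before = Files.size(file);
        Thread.sleep(waitMillis);   // pause this thread only
        long after = Files.size(file);
        return before == after;     // unchanged => assume complete
    }
}
```

A file whose size changed would simply be skipped on this pass and picked up on the next run of the inserter.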
Right now, my indexer/inserter app doesn't explicitly do any threading.

Thanks,
Jim

---- [email protected] wrote:
> Hi Ian,
>
> Thanks for the quick response.
>
> I forgot to mention, but in our case, the "producers" are part of a commercial package, so we don't have a way to get them to change anything; I think the first three suggestions are therefore not feasible for us.
>
> I have considered something like the 4th suggestion (check file size, timeout, and check file size again). I'm worried that it would impact the overall index-insertion process, but unless there's something better, that may be our best option :(...
>
> Thanks again,
> Jim
>
>
> ---- Ian Lea <[email protected]> wrote:
> > A few suggestions:
> >
> > . Queue the docs once they are complete, using something like JMS.
> >
> > . Get the document producers to write to e.g. xxx.tmp and rename to e.g. xxx.txt at the end.
> >
> > . Get the document producers to write to a tmp folder and move to e.g. input/ when done.
> >
> > . Find a file, store its size, sleep for a while, check the size again, and if it changed, skip it.
> >
> > I've used all of these at one time or another for assorted, mainly non-Lucene, apps, and they are all workable.
> >
> >
> > --
> > Ian.
> >
> >
> > On Tue, Aug 4, 2009 at 4:40 PM, <[email protected]> wrote:
> > > Hi,
> > >
> > > I have an app that initially creates a Lucene index and populates it with documents. I'm now extending that app to insert new documents into the index.
> > >
> > > In general, this new app, which is based loosely on the demo apps (e.g., IndexFiles.java), is working: I can run it with a "create" parameter and it creates a good/valid index from the documents, and then I can run it with an "insert" parameter and it inserts new documents into the index.
> > >
> > > [As I mentioned in an earlier thread, we only have a requirement to insert new documents into the index; there are no requirements for deleting or updating documents that have already been indexed.]
> > >
> > > Ok, as I said, that works so far.
> > >
> > > However, in our case, the processes that create the documents we are indexing are fairly long-lived and write fairly large documents, and I'm worried that when an insert operation runs, some of the candidate documents may still be being written to. We wouldn't want the indexer to insert a document into the Lucene index until that document is "complete".
> > >
> > > As you know, the way that demos such as IndexFiles work is that they call a method named indexDocs(). indexDocs() then recursively walks the directory tree, calling the writer to add each file to the index.
> > >
> > > In this loop, indexDocs() does a few checks (isDirectory(), canRead()), and I think that in our case it would "pick up" (find) documents that are still "in progress" (being written to, and not yet closed).
> > >
> > > I was wondering if anyone here has a similar situation (having to index large documents that may be "in progress"/being written to), and how you handle it?
> > >
> > > FYI, this is on Red Hat Linux (and on Windows in my test environment).
> > >
> > > Thanks!
> > > Jim
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
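Tying the thread together, a hedged sketch of how the recursive walk in the demo's indexDocs() might be adapted to skip in-progress files. The names here are my own (this is not the Lucene demo's code), and it combines the existing isDirectory()/canRead() style checks with Ian's size-stability idea:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StableWalk {
    // Hypothetical adaptation of the demo's indexDocs() loop: walk the
    // tree, but only return regular, readable files whose size is
    // unchanged after a short pause. Anything still growing is skipped
    // and will be picked up by a later run of the inserter.
    public static List<Path> stableFiles(Path root, long waitMillis)
            throws IOException, InterruptedException {
        List<Path> ready = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
                    continue;           // mirrors the isDirectory()/canRead() checks
                }
                long before = Files.size(p);
                Thread.sleep(waitMillis);
                if (Files.size(p) == before) {
                    ready.add(p);       // size stable: assume the producer is done
                }
            }
        }
        return ready;
    }
}
```

The caller would then hand each returned path to the IndexWriter as the demo already does; files skipped on this pass are simply reconsidered on the next run.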
