Hi Ian,

OK, thanks for the additional info.
I've implemented a check on both file.lastModified() and file.length(), and it seems to work in my dev environment (Windows), so I'll have to test on a "real" system.

Thanks again,
Jim

---- Ian Lea <ian....@gmail.com> wrote:
> Jim
>
>
> The sleep is simply
>
> try { Thread.sleep(millis); }
> catch (InterruptedException ie) { }
>
> No threading issues that I'm aware of, despite the method living in
> the Thread class.
>
> But you're right about it possibly impacting performance, if you've
> got to sleep for a reasonable amount of time for each doc, if you've
> got loads of docs. You can improve it by getting a list of possible
> files + size + lastmod + whatever, sleeping, then checking them all
> again i.e. only sleep once for each pass rather than once per file.
>
> Yet another option is to forget about sleeping and check the lastmod
> timestamp and only index the doc if it was finished some time ago.
>
> And yet another ... make the producer write to /a/b/c and have a
> standalone non-lucene job that reads /a/b/c doing whatever checks you
> like, moving files to your input directory.
>
>
> That's more than enough options from me.
>
>
> --
> Ian.
>
> On Tue, Aug 4, 2009 at 5:08 PM, <oh...@cox.net> wrote:
> > Ian,
> >
> > One question about the 4th alternative: I was wondering how you
> > implemented the sleep() in Java, esp. in such a way as not to mess up any
> > of the Lucene stuff (in case there's threading)?
> >
> > Right now, my indexer/inserter app doesn't explicitly do any threading
> > stuff.
> >
> > Thanks,
> > Jim
> >
> >
> > ---- oh...@cox.net wrote:
> >> Hi Ian,
> >>
> >> Thanks for the quick response.
> >>
> >> I forgot to mention, but in our case, the "producers" are part of a
> >> commercial package, so we don't have a way to get them to change anything,
> >> so I think the 1st 3 suggestions are not feasible for us.
> >>
> >> I have considered something like the 4th suggestion (check file size,
> >> timeout, and check file size again).
> >> I'm worried that it would impact the
> >> overall index insertion process, but unless there's something better,
> >> that may be our best option :(...
> >>
> >> Thanks again,
> >> Jim
> >>
> >>
> >> ---- Ian Lea <ian....@gmail.com> wrote:
> >> > A few suggestions:
> >> >
> >> > . Queue the docs once they are complete using something like JMS.
> >> >
> >> > . Get the document producers to write to e.g. xxx.tmp and rename to
> >> > e.g. xxx.txt at the end
> >> >
> >> > . Get the document producers to write to a tmp folder and move to e.g.
> >> > input/ when done
> >> >
> >> > . Find a file, store size, sleep for a while, check size and if changed,
> >> > skip
> >> >
> >> > I've used all these at one time or another for assorted, mainly
> >> > non-lucene, apps, and they are all workable.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> >
> >> > On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> >> > > Hi,
> >> > >
> >> > > I have an app to initially create a Lucene index, and to populate it
> >> > > with documents. I'm now working on that app to insert new documents
> >> > > into that Lucene index.
> >> > >
> >> > > In general, this new app, which is based loosely on the demo apps
> >> > > (e.g., IndexFiles.java), is working, i.e., I can run it with a
> >> > > "create" parameter, and it creates a good/valid index from the
> >> > > documents, and then I can run it with an "insert" parameter, and it
> >> > > inserts new documents into the index.
> >> > >
> >> > > [As I mentioned in an earlier thread, we only have a requirement to
> >> > > insert new documents into the index, no requirements for deleting
> >> > > documents or updating documents that have already been indexed.]
> >> > >
> >> > > Ok, as I said, that works so far.
> >> > >
> >> > > However, in our case, the processes that are creating the documents
> >> > > that we are indexing are fairly long-lived, and write fairly large
> >> > > documents, and I'm worried that when an insert operation is run, some
> >> > > of the potential documents may still be being written to, and we
> >> > > wouldn't want the indexer to insert the document into the Lucene index
> >> > > until the document is "complete".
> >> > >
> >> > > As you know, the way that the demos such as IndexFiles work is that
> >> > > they call a method called indexDocs(). indexDocs() then recursively
> >> > > walks the directory tree, calling the writer to add to the index.
> >> > >
> >> > > In this loop, indexDocs() does a few checks (isDirectory(), canRead()),
> >> > > and I think that it would "pick up" (find) some documents that are
> >> > > still "in progress" (being written to, and not closed) in our case.
> >> > >
> >> > > I was wondering if anyone here has a situation similar to this (having
> >> > > to index large documents that may be "in progress/being written to"),
> >> > > and how you handle this situation?
> >> > >
> >> > > FYI, this is on Redhat Linux (and on Windows in my test environment).
> >> > >
> >> > > Thanks!
> >> > >
> >> > > Jim
> >> > >
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
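
For reference, the check discussed in this thread (compare file.length() and file.lastModified() before and after a pause, sleeping once per pass rather than once per file, as Ian suggests) might be sketched roughly as follows. This is not Jim's actual code, which the thread does not show. The class name StableFileFilter and the methods unchangedAfterPause and quietFor are hypothetical names made up for illustration, and the pause and quiet-period lengths are assumptions that would need tuning against the real producers.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): picks files that look finished. */
public class StableFileFilter {

    /** Size and last-modified time of one file at one instant. */
    static final class Snapshot {
        final long length;
        final long lastModified;

        Snapshot(File f) {
            this.length = f.length();
            this.lastModified = f.lastModified();
        }

        boolean sameAs(Snapshot other) {
            return length == other.length && lastModified == other.lastModified;
        }
    }

    /**
     * Snapshots every readable candidate, sleeps once, then returns only the
     * files whose size and timestamp did not change during the pause.
     * InterruptedException is propagated instead of being swallowed.
     */
    static List<File> unchangedAfterPause(List<File> candidates, long pauseMillis)
            throws InterruptedException {
        Map<File, Snapshot> before = new HashMap<>();
        for (File f : candidates) {
            if (f.isFile() && f.canRead()) {
                before.put(f, new Snapshot(f));
            }
        }

        Thread.sleep(pauseMillis); // one sleep per pass, not one per file

        List<File> stable = new ArrayList<>();
        for (File f : candidates) {
            Snapshot earlier = before.get(f);
            if (earlier != null && earlier.sameAs(new Snapshot(f))) {
                stable.add(f);
            }
        }
        return stable;
    }

    /**
     * No-sleep variant: treats a file as finished if it has not been
     * modified for at least quietMillis as of nowMillis.
     */
    static boolean quietFor(File f, long quietMillis, long nowMillis) {
        return f.isFile() && nowMillis - f.lastModified() >= quietMillis;
    }
}
```

An indexer could call unchangedAfterPause on the files it found in a directory walk and pass only the returned files to the IndexWriter, leaving the rest for the next run. Either check is only a heuristic: a producer that stalls for longer than the pause or quiet period will look finished, and timestamp granularity and the point at which a growing file's size becomes visible can vary by filesystem, so the Windows result would still need confirming on the Linux system.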