Hi Ian,

Ok, thanks for the additional info.

I've implemented a check for both file.lastModified() and file.length(), and it 
seems to work in my dev environment (Windows), so next I'll have to test it on a 
"real" system.

Thanks again,
Jim


---- Ian Lea <ian....@gmail.com> wrote: 
> Jim
> 
> 
> The sleep is simply
> 
>           try { Thread.sleep(millis); }
>           catch (InterruptedException ie) {
>               Thread.currentThread().interrupt(); // restore the interrupt flag
>           }
> 
> No threading issues that I'm aware of, despite the method living in
> the Thread class.
> 
> But you're right that it could hurt performance if you have to sleep
> for a reasonable amount of time per doc and you've got loads of docs.
> You can improve it by collecting a list of candidate files with their
> size, lastmod and whatever else, sleeping, then checking them all
> again, i.e. sleeping only once per pass rather than once per file.
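
A minimal sketch of that one-sleep-per-pass idea, assuming plain java.io.File
and a caller that has already collected the candidate files. The class and
method names (StableFileFilter, stableFiles) are made up for illustration and
are not part of Lucene or the demo code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class StableFileFilter {

    /** Returns the candidates whose size and lastmod did not change across one sleep. */
    static List<File> stableFiles(List<File> candidates, long waitMillis)
            throws InterruptedException {
        // First pass: record size and last-modified time for every candidate.
        Map<File, long[]> before = new LinkedHashMap<>();
        for (File f : candidates) {
            before.put(f, new long[] { f.length(), f.lastModified() });
        }

        // Sleep once for the whole batch rather than once per file.
        Thread.sleep(waitMillis);

        // Second pass: keep only files whose size and timestamp are unchanged.
        List<File> stable = new ArrayList<>();
        for (Map.Entry<File, long[]> e : before.entrySet()) {
            File f = e.getKey();
            long[] snap = e.getValue();
            if (f.isFile() && f.length() == snap[0] && f.lastModified() == snap[1]) {
                stable.add(f);
            }
        }
        return stable;
    }
}
```

Files that changed during the sleep are simply left for a later pass. A
producer that pauses for longer than waitMillis would still slip through, so
the wait has to be chosen with the producer's behaviour in mind.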
> 
> Yet another option is to forget about sleeping and check the lastmod
> timestamp and only index the doc if it was finished some time ago.
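
The no-sleep variant could look roughly like this. It is only a sketch: the
name QuietPeriodCheck and the quietMillis parameter are assumptions for
illustration, and it treats "finished" as "not modified for at least
quietMillis", which only holds if the producer writes without long pauses.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene.
final class QuietPeriodCheck {

    /** True if the file exists and was last modified at least quietMillis before nowMillis. */
    static boolean isQuiet(File f, long quietMillis, long nowMillis) {
        // File.lastModified() returns 0L if the file is missing or an I/O error occurs.
        long lastMod = f.lastModified();
        return lastMod > 0L && nowMillis - lastMod >= quietMillis;
    }
}
```

The indexer would call isQuiet(file, quietMillis, System.currentTimeMillis())
next to the existing isDirectory()/canRead() checks and skip anything that
fails; skipped files get picked up on a later run.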
> 
> And yet another ... make the producer write to /a/b/c and have a
> standalone non-lucene job that reads /a/b/c doing whatever checks you
> like, moving files to your input directory.
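
A rough sketch of such a standalone mover, assuming java.nio.file (Java 7+)
and assuming the "check" is the lastmod quiet period from the previous
option. StagingMover and moveQuietFiles are hypothetical names, the staging
and input directories are passed in by the caller, and subdirectories are
not walked.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical standalone job, independent of Lucene.
final class StagingMover {

    /** Moves regular files that have been quiet for quietMillis; returns how many were moved. */
    static int moveQuietFiles(Path staging, Path input, long quietMillis) throws IOException {
        int moved = 0;
        long now = System.currentTimeMillis();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(staging)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue; // this sketch ignores subdirectories
                }
                long lastMod = Files.getLastModifiedTime(p).toMillis();
                if (now - lastMod < quietMillis) {
                    continue; // possibly still being written, leave it for the next run
                }
                // ATOMIC_MOVE so the indexer never sees a half-moved file. It throws
                // AtomicMoveNotSupportedException if the directories are on different
                // file systems.
                Files.move(p, input.resolve(p.getFileName()), StandardCopyOption.ATOMIC_MOVE);
                moved++;
            }
        }
        return moved;
    }
}
```

The indexer then only ever reads the input directory and needs no extra
checks of its own. What happens when a file of the same name already exists
in the input directory depends on the platform, so a real job would want to
handle that case explicitly.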
> 
> 
> That's more than enough options from me.
> 
> 
> --
> Ian.
> 
> On Tue, Aug 4, 2009 at 5:08 PM, <oh...@cox.net> wrote:
> > Ian,
> >
> > One question about the 4th alternative:  I was wondering how you 
> > implemented the sleep() in Java, esp. in such a way as not to mess up any 
> > of the Lucene stuff (in case there's threading)?
> >
> > Right now, my indexer/inserter app doesn't explicitly do any threading 
> > stuff.
> >
> > Thanks,
> > Jim
> >
> >
> > ---- oh...@cox.net wrote:
> >> Hi Ian,
> >>
> >> Thanks for the quick response.
> >>
> >> I forgot to mention, but in our case, the "producers" are part of a 
> >> commercial package, so we don't have a way to get them changed, and 
> >> I think the first 3 suggestions are not feasible for us.
> >>
> >> I have considered something like the 4th suggestion (check file size, 
> >> wait, and check file size again).  I'm worried that it would slow down the 
> >> overall index insertion process, but unless there's something better, 
> >> that may be our best option :(...
> >>
> >> Thanks again,
> >> Jim
> >>
> >>
> >> ---- Ian Lea <ian....@gmail.com> wrote:
> >> > A few suggestions:
> >> >
> >> > . Queue the docs once they are complete using something like JMS.
> >> >
> >> > . Get the document producers to write to e.g. xxx.tmp and rename to
> >> > e.g. xxx.txt at the end
> >> >
> >> > . Get the document producers to write to a tmp folder and move to e.g.
> >> > input/ when done
> >> >
> >> > . Find a file, store size, sleep for a while, check size and if changed, 
> >> > skip
> >> >
> >> > I've used all these at one time or another for assorted, mainly
> >> > non-lucene, apps, and they are all workable.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> >
> >> > On Tue, Aug 4, 2009 at 4:40 PM, <oh...@cox.net> wrote:
> >> > > Hi,
> >> > >
> >> > > I have an app to initially create a Lucene index, and to populate it 
> >> > > with documents.  I'm now working on that app to insert new documents 
> >> > > into that Lucene index.
> >> > >
> >> > > In general, this new app, which is based loosely on the demo apps 
> >> > > (e.g., IndexFiles.java), is working, i.e., I can run it with a 
> >> > > "create" parameter, and it creates a good/valid index from the 
> >> > > documents, and then I can run it with an "insert" parameter, and it 
> >> > > inserts new documents into the index.
> >> > >
> >> > > [As I mentioned in an earlier thread, we only have a requirement to 
> >> > > insert new documents into the index, no requirements for deleting 
> >> > > documents or updating documents that have already been indexed.]
> >> > >
> >> > > Ok, as I said, that works so far.
> >> > >
> >> > > However, in our case, the processes that are creating the documents 
> >> > > that we are indexing are fairly long-lived, and write fairly large 
> >> > > documents, and I'm worried that when an insert operation is run, some 
> >> > > of the potential documents may still be being written to, and we 
> >> > > wouldn't want the indexer to insert the document into the Lucene index 
> >> > > until the document is "complete".
> >> > >
> >> > > As you know, the way that the demos such as IndexFiles work is that 
> >> > > they call a method called indexDocs().  indexDocs() then recursively 
> >> > > walks the directory tree, calling the writer to add each file to the 
> >> > > index.
> >> > >
> >> > > In this loop, indexDocs() does a few checks (isDirectory(), canRead()), 
> >> > > and I think that it would "pick up" (find) some documents that are 
> >> > > still "in progress" (being written to, and not closed) in our case.
> >> > >
> >> > > I was wondering if anyone here has a situation similar to this (having 
> >> > > to index large documents that may be "in progress/being written to"), 
> >> > > and how you handle this situation?
> >> > >
> >> > > FYI, this is on Redhat Linux (and on Windows in my test environment).
> >> > >
> >> > > Thanks!
> >> > >
> >> > > Jim
> >> > >
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> > >
> >> > >