Re: LazyTextExtractorField and background text extraction

Jukka Zitting Thu, 16 Jul 2009 03:29:19 -0700

Hi,

On Thu, Jul 16, 2009 at 11:51 AM, Marcel Reutegger <[email protected]> wrote:
> I'm not sure I understand that correctly. with the current design
> multiple nodes are already indexed in parallel. but the index update
> as a whole will still be blocked, waiting for *all* nodes to be
> indexed.


OK, I'm just getting up to speed with the latest state of the indexing code.

If I understand correctly, we update the search index within the
transaction, but if a text extraction task takes longer than the
configurable limit, the extracted text in that index update is replaced
with an empty string and a new background task is started to update the
index for that document once the text extraction is complete.

Would it be a problem to *always* defer text extraction to a
background task that's disconnected from the transaction? That would
make things a lot simpler at a slight loss of functionality.
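
To make the "always defer" option concrete, here is a minimal sketch. It
is not the existing indexing code: DeferredExtractionIndexer, indexNode,
indexedText and close are hypothetical names, and a plain in-memory map
stands in for the real search index. The document is indexed with an
empty text field right away, and a background task fills in the
extracted text later.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: text extraction is always done outside the transaction. */
class DeferredExtractionIndexer {
    // Single background worker; stands in for an indexing queue (assumption).
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    // In-memory stand-in for the search index: node id -> indexed full text.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    /** Called within the transaction: never waits for text extraction. */
    void indexNode(String nodeId, Callable<String> extractor) {
        index.put(nodeId, ""); // index the node now, without full text
        background.submit(() -> {
            // Runs later, disconnected from the transaction.
            index.put(nodeId, extractor.call());
            return null;
        });
    }

    String indexedText(String nodeId) {
        return index.get(nodeId);
    }

    /** Stops accepting work and waits for pending extractions to finish. */
    boolean close(long timeoutMillis) {
        background.shutdown();
        try {
            return background.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The loss of functionality shows up as the window between indexNode()
returning and the background task completing: during that time the
document cannot be found by its full text.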

Alternatively, we should probably move the extraction timeout handling
to some getExtractedText(long timeout) method that does a
wait(timeout) call on the extraction task, waiting for it to return
the extracted text as a String. If the timeout is reached, then just
an empty string is used and the rest of the extraction task is placed
in the indexing queue.
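
Roughly, the timeout variant could look like the sketch below. The
ExtractionTask class and its isDone() helper are hypothetical, and I'm
using Future.get(timeout, unit) from java.util.concurrent in place of a
raw wait(timeout) call, so treat it as an illustration of the idea
rather than a proposed patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper around a single text extraction task. */
class ExtractionTask {
    private final Future<String> result;

    ExtractionTask(ExecutorService executor, Callable<String> extractor) {
        // Extraction starts right away on the given executor.
        this.result = executor.submit(extractor);
    }

    /**
     * Waits up to timeoutMillis for the extracted text. Returns an empty
     * string if the text is not available in time; the extraction keeps
     * running, so the caller can queue a later index update.
     */
    String getExtractedText(long timeoutMillis) {
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return ""; // too slow: index without full text for now
        } catch (ExecutionException e) {
            return ""; // extraction failed: index without full text
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }

    /** True once extraction has finished, successfully or not. */
    boolean isDone() {
        return result.isDone();
    }
}
```

The index update would call getExtractedText(limit) and, if isDone() is
still false afterwards, put the task in the indexing queue.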

BR,

Jukka Zitting
