Hi,

On Thu, Jul 16, 2009 at 11:51 AM, Marcel Reutegger<[email protected]> wrote:
> I'm not sure I understand that correctly. with the current design
> multiple nodes are already indexed in parallel. but the index update
> as a whole will still be blocked, waiting for *all* nodes to be
> indexed.

OK, I'm just getting up to speed with the latest state of the indexing code. If I understand correctly, we update the search index within the transaction, but if a text extraction task takes longer than the configurable limit, the extracted text in that part of the index update is replaced with an empty string and a new background task is fired to update the index for that document once the text extraction is complete.

Would it be a problem to *always* defer text extraction to a background task that's disconnected from the transaction? That would make things a lot simpler at a slight loss of functionality.

Alternatively, we should probably move the extraction timeout handling to some getExtractedText(long timeout) method that does a wait(timeout) call on the extraction task, waiting for it to return the extracted text as a String. If the timeout is reached, an empty string is used instead and the rest of the extraction task is placed in the indexing queue.

BR,

Jukka Zitting
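
PS. To make the second alternative a bit more concrete, here is a rough, standalone sketch of what such a getExtractedText(long timeout) method could look like. It is not taken from the current indexing code: the ExtractionSketch and ExtractionTask classes, the indexingQueue field and the textFor() helper are all made up for illustration, and I'm using Future.get(timeout, unit) from java.util.concurrent in place of a raw wait(timeout) call.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; none of these names exist in the real code base.
class ExtractionSketch {

    /** Hypothetical wrapper around a text extraction job that is already running. */
    static class ExtractionTask {
        private final Future<String> future;

        ExtractionTask(Future<String> future) {
            this.future = future;
        }

        /**
         * Waits at most timeoutMillis for the extracted text.
         * Returns "" if the text is not available in time or extraction failed.
         */
        String getExtractedText(long timeoutMillis) {
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return ""; // still running; the caller may queue it for later
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return "";
            } catch (ExecutionException e) {
                return ""; // extraction failed; index the document without text
            }
        }

        boolean isDone() {
            return future.isDone();
        }
    }

    /** Hypothetical queue of tasks whose documents need re-indexing later. */
    static final Queue<ExtractionTask> indexingQueue = new ConcurrentLinkedQueue<>();

    /** Returns the text to index now, queueing the task if it is unfinished. */
    static String textFor(ExtractionTask task, long timeoutMillis) {
        String text = task.getExtractedText(timeoutMillis);
        if (!task.isDone()) {
            indexingQueue.add(task); // background indexing picks this up later
        }
        return text;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ExtractionTask slow = new ExtractionTask(pool.submit(() -> {
            Thread.sleep(5000); // simulate a slow text extractor
            return "extracted text";
        }));
        String text = textFor(slow, 100);
        System.out.println("indexed text: '" + text + "', queued: " + indexingQueue.size());
        pool.shutdownNow();
    }
}
```

The point of this shape is that the transaction only ever sees a String, and the decision to requeue is made in one place rather than being spread across the index update code.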
