[ http://issues.apache.org/jira/browse/JCR-390?page=all ]

Jukka Zitting updated JCR-390:
------------------------------

    Version: 1.0.1

> Move text extraction into a background thread
> ---------------------------------------------
>
>          Key: JCR-390
>          URL: http://issues.apache.org/jira/browse/JCR-390
>      Project: Jackrabbit
>         Type: Improvement

>   Components: indexing
>     Versions: 1.0, 1.0.1
>  Environment: all
>     Reporter: Marcel Reutegger
>     Assignee: Marcel Reutegger
>     Priority: Minor

>
> Although text extraction is not performed directly in save(), most of the 
> extraction work is still done later by a client thread. A mechanism is in 
> place that commits the deferred work in a background thread, but that 
> thread is only triggered by a timer and does not continuously write back 
> pending index changes. For regular index changes this is intentional and 
> should not be changed. Text extraction, however, should be moved completely 
> into a background thread, because indexing a large document often takes a 
> considerable amount of time.
> Outline of a possible solution:
> - all text filtering tasks are put into a work queue
> - the work queue is processed by a background thread
> - basic indexing of nt:resource without text filtering takes place
> - the background thread updates the index when text filtering has completed 
> for an nt:resource
> There should be a configuration parameter that allows text filtering to be 
> executed without the background thread. This makes it possible to keep the 
> existing behaviour of Jackrabbit, where the fulltext index is always 
> up-to-date and can be used.
> With the background process this is no longer the case.
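
The outline above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not Jackrabbit code: the class
DeferredTextIndexer, its methods, the in-memory map standing in for the
index, and the boolean "background" switch are all hypothetical names
invented for this example. It uses modern Java library classes for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: none of these names are part of the Jackrabbit API.
class DeferredTextIndexer {

    // Stand-in for the index: node id -> fulltext ("" until extraction completes).
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // Stand-in for a text filter; must not return null.
    private final Function<byte[], String> extractor;

    // Single background thread with an internal FIFO work queue,
    // or null when extraction runs in the calling (client) thread.
    private final ExecutorService worker;

    // "background" plays the role of the proposed configuration parameter.
    DeferredTextIndexer(Function<byte[], String> extractor, boolean background) {
        this.extractor = extractor;
        this.worker = background ? Executors.newSingleThreadExecutor() : null;
    }

    void addResource(String nodeId, byte[] data) {
        // Basic indexing without text filtering happens right away.
        index.put(nodeId, "");
        if (worker == null) {
            // Existing behaviour: the fulltext is up-to-date when this returns.
            index.put(nodeId, extractor.apply(data));
        } else {
            // Queue the filtering task; the background thread updates the
            // index entry once extraction has completed.
            worker.execute(() -> {
                index.put(nodeId, extractor.apply(data));
            });
        }
    }

    // Returns null for unknown nodes, "" while extraction is still pending.
    String fulltext(String nodeId) {
        return index.get(nodeId);
    }

    // Stops accepting work and waits for queued extractions to finish.
    // Returns false on timeout or interruption.
    boolean shutdownAndWait(long timeoutMillis) {
        if (worker == null) {
            return true;
        }
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With background set to false, fulltext() reflects the extracted text as soon
as addResource() returns. With background set to true, a query issued right
after addResource() may still see the empty placeholder, which is the
trade-off the description points out.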

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
