[jira] Commented: (JCR-415) Enhance indexing of binary content

Marcel Reutegger (JIRA) Mon, 18 Dec 2006 07:45:47 -0800

    [ http://issues.apache.org/jira/browse/JCR-415?page=comments#action_12459384 ]

Marcel Reutegger commented on JCR-415:
--------------------------------------


I would like to get this change into the next major release (1.3) and propose 
the following changes:

- Create a new module jackrabbit-text-extractors which will initially contain 
the jackrabbit-extractor patch provided by Jukka
- Migrate the jackrabbit-text-filters into the new extractors module
- Add jackrabbit-text-filters as dependency to jackrabbit-core
- Remove the jackrabbit-text-filters module and stop creating releases for it. 
Jackrabbit would still support existing releases of jackrabbit-text-filters, 
but the TextFilter interface will be deprecated (see Jukka's patch) and 
developers are encouraged to use the new TextExtractor interface.

Does this make sense?

> Enhance indexing of binary content
> ----------------------------------
>
>                 Key: JCR-415
>                 URL: http://issues.apache.org/jira/browse/JCR-415
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: indexing
>    Affects Versions: 1.0, 1.0.1, 0.9
>            Reporter: Marcel Reutegger
>            Priority: Minor
>         Attachments: jackrabbit-extractor-r420472.patch, 
> jackrabbit-query-r420472.patch, jackrabbit-query-r421461.patch, 
> org.apache.jackrabbit.core.query-extractor.jpg, 
> org.apache.jackrabbit.core.query.lucene-extractor.jpg, 
> org.apache.jackrabbit.extractor.jpg
>
>
> Indexing of binary content should be enhanced to either allow configuration 
> of which fields are indexed or provide better support for custom 
> NodeIndexer implementations.
> The current design has a couple of flaws that should be addressed at the same 
> time:
> - Reader instances are requested from the text filters even though the reader 
> might never be used
> - only jcr:data properties of nt:resource nodes are fulltext indexed
> - It is up to the text filter implementation to decide the Lucene field name 
> for the text representation; that responsibility should be moved to the 
> NodeIndexer. A text filter should only provide a Reader instance.
> With those changes a custom NodeIndexer can then decide if a binary property 
> has one or more representations in the index.
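
To make the first and third points above concrete, here is a minimal sketch in 
plain Java. It is not taken from the attached patches: the names Extractor, 
LazyReader, indexFields and the field name "fulltext" are all hypothetical, 
and no Jackrabbit or Lucene classes are used. It only illustrates the idea 
that the extractor returns a bare Reader, the indexer picks the field name, 
and extraction is deferred until something actually reads.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyReaderSketch {

    /** Hypothetical field name, chosen by the indexer and not by the extractor. */
    static final String FULLTEXT_FIELD = "fulltext";

    /** Hypothetical extractor: returns only a Reader and knows nothing about index fields. */
    interface Extractor {
        Reader extract(byte[] data);
    }

    /** Reader that does not call the extractor until the first read. */
    static final class LazyReader extends Reader {
        private final Supplier<Reader> source;
        private Reader delegate;

        LazyReader(Supplier<Reader> source) {
            this.source = source;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (delegate == null) {
                delegate = source.get(); // extraction happens here, on demand
            }
            return delegate.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
        }
    }

    /** Indexer side: decides which fields a binary value maps to. */
    static Map<String, Reader> indexFields(byte[] data, Extractor extractor) {
        Map<String, Reader> fields = new LinkedHashMap<>();
        fields.put(FULLTEXT_FIELD, new LazyReader(() -> extractor.extract(data)));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        Extractor extractor = data -> {
            calls.incrementAndGet();
            return new StringReader(new String(data, StandardCharsets.UTF_8));
        };

        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        Map<String, Reader> fields = indexFields(data, extractor);
        System.out.println("extractions before read: " + calls.get());

        try (Reader reader = fields.get(FULLTEXT_FIELD)) {
            System.out.println("first char: " + (char) reader.read());
        }
        System.out.println("extractions after read: " + calls.get());
    }
}
```

With this shape, an indexer that wants several representations of the same 
binary property could put more than one entry into the map, which is the 
flexibility the last sentence of the description asks for.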

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
