[ 
https://issues.apache.org/jira/browse/OAK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059566#comment-15059566
 ] 

Chetan Mehrotra commented on OAK-3576:
--------------------------------------

[~catholicon] Missed reviewing this new feature due to other on going work. So 
some late comments before we consider this as fixed

*SPI Interface*

{code}
public interface IndexFieldProvider {
    Iterable<Field> getAugmentedFields(final String path, final String 
propertyName,
                                   final NodeState document, final 
PropertyState property,
                                   final NodeState indexDefinition);
}
{code}

I think going per property is too fine grained. Instead it would be better to 
just pass in the NodeState of the root node which was found to be updated. So 
instead of that this interface should be like

{code}
    /**
     * Callback to add custom fields for given node
     *
     * @param path path of the document being indexed
     * @param node node being indexed
     * @param indexDefinition nodeState of the index definition node
     *
     * @return fields which are to be added as part of {@link 
org.apache.lucene.document.Document} being prepared
     * for node being indexed
     */
    Iterable<Field> getAugmentedFields(String path, NodeState node, NodeState 
indexDefinition);

    Set<String> getSupportedTypes();
{code}

*How callbacks are selected*

bq. All registered augmentors would get callbacks. It's upto the implementation 
to decide to quit early.

Thinking more about it now it would implementation bit tricky and each 
implementation would need to figure out how to opt out. So would be better to 
add a method {{supportedTypes}} where the implementation can provide a list of 
supported nodeTypes for which it wishes to participate. This aligns with the 
nodeType based approach used by indexing logic currently. Same rule should be 
applicable on the query side

This would impact other part of current design. For keeping things fast it 
would better to track this providers against the matching {{IndexRule}}

*IndexAugmentorFactory*

We can do away with that and instead provide default noop impls as part of 
interface itself. The tracking of multiple implementations can be simplified 
with use of {{Tracker}}. See {{ConsolidatedCacheStats}} for an example. Keep 
stuff consistent with matching {{IndexRule}} and OSGi dynamics would add some 
complication. I can take care of this part

*Misc*

* Move the SPI package to 
{{org.apache.jackrabbit.oak.plugins.index.lucene.spi.fieldprovider}} and add 
{{ConsumerType}} annotation. Also add package-info with initial version. This 
would ensure that addition of any new class or change in one othe SPI class 
does not require minor version update. 
*  Minor nit pick  - Avoid reordering of imports in commit. It causes issue in 
later merge to branches!


> Allow custom extension to augment indexed lucene documents
> ----------------------------------------------------------
>
>                 Key: OAK-3576
>                 URL: https://issues.apache.org/jira/browse/OAK-3576
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>             Fix For: 1.3.13
>
>         Attachments: OAK-3576.jsedding.patch, OAK-3576.wip.patch
>
>
> Following up on http://oak.markmail.org/thread/a53ahsgb3bowtwyq, we should 
> have an extension point in oak to allow custom code to add fields to 
> documents getting indexed in lucene. We'd also need to allow extension point 
> to add extra query terms to utilize such augmented fields.
> (cc [~teofili], [~chetanm])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to