Re: Features to be supported while enabling boost support in Lucene Full text index

Tommaso Teofili Wed, 05 Nov 2014 00:53:01 -0800

Hi Chetan,

First of all, thanks for all your great work on this.
I generally agree with you that we need to be on par with JR2 in terms of
capabilities.


Looking in more detail at the index configuration, what about the
following format:

--------------------------------
        "indexRules" : {
            "rule0" : {
                "name" : "title", /* Unscoped property */
                "boost" : 2.0
            },
            "rule1" : {
                "type" : "nt:unstructured",
                "name" : "title", /* Scoped property */
                "boost" : 1.5
            },
            "rule2" : {
                "type" : "nt:file",
                "name" : "title", /* Scoped property */
                "boost" : 2.0,
                "condition" : "@priority = 'high'"
            }
        }
--------------------------------

The rationale is to handle every rule with the same structure. What do you
think? Would it be feasible?
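
To make the idea concrete, here is a minimal Python sketch of how such a
flat rule list might be resolved to a boost value. It is not Oak code and
everything in it is an assumption for illustration: the function names
(resolve_boost, rule_matches, condition_holds, specificity), exact node type
matching without inheritance, support for only conditions of the form
@prop = 'value', the default boost of 1.0, and the precedence (a rule with a
"type" and/or "condition" wins over a less specific one).

```python
import re

# Rules mirror the "indexRules" format proposed above; the dict layout is illustrative.
INDEX_RULES = {
    "rule0": {"name": "title", "boost": 2.0},
    "rule1": {"type": "nt:unstructured", "name": "title", "boost": 1.5},
    "rule2": {"type": "nt:file", "name": "title", "boost": 2.0,
              "condition": "@priority = 'high'"},
}

# Assumption: only conditions of the form @prop = 'value' are handled here.
_CONDITION = re.compile(r"^@([\w:]+)\s*=\s*'([^']*)'$")


def condition_holds(condition, node_props):
    """Evaluate a simple equality condition against the node's properties."""
    match = _CONDITION.match(condition.strip())
    if match is None:
        raise ValueError("unsupported condition: " + condition)
    prop, expected = match.groups()
    return node_props.get(prop) == expected


def rule_matches(rule, node_type, prop_name, node_props):
    """A rule applies if its name, optional type and optional condition all match."""
    if rule["name"] != prop_name:
        return False
    if "type" in rule and rule["type"] != node_type:
        return False  # exact match only; node type inheritance is ignored
    if "condition" in rule and not condition_holds(rule["condition"], node_props):
        return False
    return True


def specificity(rule):
    """Assumed precedence: "type" and "condition" each make a rule more specific."""
    return ("type" in rule) + ("condition" in rule)


def resolve_boost(rules, node_type, prop_name, node_props, default=1.0):
    """Return the boost of the most specific matching rule, or the default."""
    candidates = [rule for rule in rules.values()
                  if rule_matches(rule, node_type, prop_name, node_props)]
    if not candidates:
        return default
    return max(candidates, key=specificity)["boost"]
```

With these assumptions, "title" on an nt:unstructured node resolves to the
scoped rule1, while "title" on any node type without a scoped rule falls
back to the unscoped rule0. Since every rule has the same shape, matching
and precedence live in one place rather than being split between
"properties" and "indexRules".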

Regards,
Tommaso


2014-11-05 5:57 GMT+01:00 Chetan Mehrotra <[email protected]>:

> Hi Team,
>
> With OAK-2178 some basic support for boosting has been added. However,
> Jackrabbit used to support much more fine-grained boosting [1]. So, for
> the boost feature to be usable in real-world scenarios, should we aim to
> implement similar support, i.e. provide
>
> 1. Conditional boosting based on some criteria
> 2. Node level boosting based on NodeType
>
> Q.1 - Should we support all or some of that? It would introduce some
> complexity, but for the feature to be useful they probably need to be
> supported.
>
> Q.2 - Config format - If we need to support all (or some) of that, we
> would need to decide on the index definition format.
>
> Configuration Format
> -----------------------------
>
> As documented in [2], the new configuration format proposed for, and
> already used with, the Lucene Property Index looks like the following:
>
> "assetIndex":
> {
>   "jcr:primaryType":"oak:QueryIndexDefinition",
>   "declaringNodeTypes":"app:Asset",
>   "includePropertyNames":["title", "type"],
>   "type":"lucene",
>   "async":"async",
>   "fulltextEnabled":false,
>   "orderedProps":["jcr:content/jcr:lastModified"],
>   "properties": {
>     "title" : { "boost" : 2.0 }
>   }
> }
>
> This works fine for a property index, where we would restrict the
> definition to a specific NodeType and specific property names.
>
> However, for a full text index, which is more generic, we would need a
> way to distinguish properties for specific nodeTypes.
>
> If we need to use the same format to capture index rules at [2], then
> one way would be to capture nodeType-scoped property definitions
> separately:
>
> --------------------
> "properties": {
>             "title" : { "boost" : 2.0 } /* Unscoped property */
>         },
>         "indexRules" : {
>             "nt:unstructured" : {
>                 "properties" :{
>                     "title" : { /* Scoped property */
>                         "boost" : 1.5
>                     }
>                 }
>             },
>             "nt:file" : {
>                 "boost" : 2.0,
>                 "condition" : "@priority = 'high'"
>             }
>         }
> -----------------------
>
> With the current design, most of the conditions can be supported,
> except the ones involving ancestors, as the Oak NodeState model does
> not allow traversing up easily.
>
> Thoughts?
>
> Chetan Mehrotra
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
> [2] http://jackrabbit.apache.org/oak/docs/query/lucene.html
>
