[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-09-19 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620381#comment-16620381
 ] 

Tommaso Teofili commented on OAK-3336:
--

I noticed right after the commit was done that I accidentally reverted a couple 
of changes.

I have already committed a fix.

> Abstract a full text index implementation to be extended by Lucene and Solr
> ---
>
> Key: OAK-3336
> URL: https://issues.apache.org/jira/browse/OAK-3336
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query, solr
> Reporter: Tommaso Teofili
> Assignee: Tommaso Teofili
> Priority: Major
> Fix For: 1.10
>
>
> The current Lucene and Solr indexes implement quite a number of features 
> according to their specific APIs, design and implementation. In the long run, 
> while differences in APIs and implementations can of course stay, the 
> difference in design can make it hard to keep those features on par.
> It would therefore be nice to move as many of the design and implementation 
> bits as possible into an abstract full text implementation, which Lucene and 
> Solr would extend according to their specifics.
> An example advantage is that index time aggregation would be implemented 
> only once, so any bugfixes and improvements in that area would be done in 
> the abstract implementation rather than in two places.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-09-19 Thread Amit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620294#comment-16620294
 ] 

Amit Jain commented on OAK-3336:


[~teofili] The commit breaks the build

{noformat}
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] /home/jenkins/jenkins-slave/workspace/Jackrabbit Oak/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/PropertyDefinition.java:[43,1] cannot find symbol
  symbol:   static DEFAULT_PROPERTY_WEIGHT
  location: class
[ERROR] /home/jenkins/jenkins-slave/workspace/Jackrabbit Oak/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/PropertyDefinition.java:[134,59] cannot find symbol
  symbol:   variable DEFAULT_PROPERTY_WEIGHT
  location: class org.apache.jackrabbit.oak.plugins.index.search.PropertyDefinition
[ERROR] /home/jenkins/jenkins-slave/workspace/Jackrabbit Oak/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextIndexEditorContext.java:[268,13] cannot find symbol
  symbol:   method root(org.apache.jackrabbit.oak.spi.state.NodeState)
  location: class org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition.Builder
[ERROR] /home/jenkins/jenkins-slave/workspace/Jackrabbit Oak/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/ReindexOperations.java:[58,17] cannot find symbol
  symbol:   method root(org.apache.jackrabbit.oak.spi.state.NodeState)
  location: variable indexDefBuilder of type org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition.Builder
{noformat}



[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-09-19 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620237#comment-16620237
 ] 

Tommaso Teofili commented on OAK-3336:
--

added and adjusted SPIs in r1841291.



[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-08-06 Thread Julian Reschke (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570206#comment-16570206
 ] 

Julian Reschke commented on OAK-3336:
-

Adjusted baseline check in [r1837519|http://svn.apache.org/r1837519]



[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-04-12 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435404#comment-16435404
 ] 

Tommaso Teofili commented on OAK-3336:
--

copied some Lucene-agnostic classes from _oak-lucene_ to _oak-search_ in 
r1828972.



[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-04-04 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426448#comment-16426448
 ] 

Vikas Saurabh commented on OAK-3336:


These would most likely call for separate issues/tasks, but it would be 
useful to record what we ([~teofili], [~tmueller] and I) discussed off-list 
in a brainstorming session:
h4. Index definitions
* can likely be common for most parts
* analyzer – probably specific to lucene??
** even if some engines (say solr or ES) allow different definitions to use 
different analyzers, the concept might not be generic enough for the oak-search module
* tika – common
* aggregates – common
* property definitions – common, except below??
** Suggestion
** Spellcheck
** Facet
** Excerpt
** Function index - probably common, as function indexes essentially just 
provide the value to be indexed

h4. Editor
* When to index – most likely common as values affecting state change should be 
independent of index provider in play
* What to index – most likely common except for following??
** Spellcheck
** Suggestion
** Facet
** Excerpt
** Custom Field provider – common
*** needs to be made independent of lucene Fields though
*** how to deprecate current SPI?
* How to index – has to be custom for each index provider
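
A minimal sketch of the Editor split above, assuming hypothetical names throughout (FulltextEditorSketch, NeutralDoc, RecordingEditor are illustrations, not existing Oak classes): the "when" and "what" decisions live in the common base class, and only "how to index" is left to each provider. The engine-neutral document stands in for whatever would replace lucene Fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types are Oak APIs.
abstract class FulltextEditorSketch {

    /** Engine-neutral document: field name to value, no lucene Field involved. */
    static final class NeutralDoc {
        final String path;
        final Map<String, Object> fields = new LinkedHashMap<>();

        NeutralDoc(String path) {
            this.path = path;
        }
    }

    private final List<String> indexedProperties;

    FulltextEditorSketch(List<String> indexedProperties) {
        this.indexedProperties = indexedProperties;
    }

    /** Common: decides "when to index" and "what to index". */
    final boolean nodeChanged(String path, Map<String, Object> properties) {
        NeutralDoc doc = new NeutralDoc(path);
        for (String name : indexedProperties) {
            Object value = properties.get(name);
            if (value != null) {
                doc.fields.put(name, value);
            }
        }
        if (doc.fields.isEmpty()) {
            return false; // nothing relevant on this node, so the engine is not called
        }
        write(doc);
        return true;
    }

    /** Custom: "how to index", implemented once per index provider. */
    protected abstract void write(NeutralDoc doc);
}

/** Stand-in provider that only records what it was asked to write. */
final class RecordingEditor extends FulltextEditorSketch {
    final List<String> written = new ArrayList<>();

    RecordingEditor(List<String> indexedProperties) {
        super(indexedProperties);
    }

    @Override
    protected void write(NeutralDoc doc) {
        written.add(doc.path + " " + doc.fields);
    }
}
```

A lucene or solr editor would replace RecordingEditor and translate NeutralDoc into its own document type inside write().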

h4. Sync indexing
should be common, as its storage is node-state based. Sync-indexed data doesn't 
go into the centrally indexed async information

h4. NRT
* should be common similar to sync indexing above
* BUT would require oak-search to use lucene (which might be debatable)

h4. CoR/CoW or counterparts
* custom, on a need basis
* most likely relevant only for lucene indexes but utilities could be useful to 
support different lucene versions (is that a goal??)

h4. Query
* Index selection – can I answer this query
** common
** this would be part of the planner, which checks the index definition to see 
if the index can answer a given query
* Cost estimation
** needs to be custom as it's highly tied to "how a given indexer indexes data" 
AND "how costly would it be to get a good fast estimate"
** some parts might be common like
*** how many unique values does a given constraint have
*** what's the worst case result count (maybe backed by node counter) in case 
concrete implementation can't get that information in a fast manner
* Custom query terms provider – can be common (similar to Custom field provider)
** needs to be made independent of Lucene Query
** how to deprecate current SPI?
* Low level query – needs to be custom
** But, maybe, we can utilize the current form of LucenePropertyIndex and 
translate the LuceneQuery AST to the underlying engine’s query
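
A rough sketch of the Query split above, assuming hypothetical names (FulltextPlannerSketch, FixedEstimatePlanner and the UNKNOWN sentinel are illustrations, not Oak's planner API): index selection is answered from the index definition in the common class, the estimate is provider specific, and the common class falls back to a worst-case count when the provider has no fast estimate.

```java
import java.util.Set;

// Hypothetical names; this is not Oak's QueryIndex or planner API.
abstract class FulltextPlannerSketch {

    /** Sentinel a provider returns when it has no fast estimate. */
    static final long UNKNOWN = -1;

    private final Set<String> indexedProperties;
    private final long worstCaseCount;

    FulltextPlannerSketch(Set<String> indexedProperties, long worstCaseCount) {
        this.indexedProperties = indexedProperties;
        this.worstCaseCount = worstCaseCount;
    }

    /** Common: index selection, decided from the index definition alone. */
    final boolean canAnswer(Set<String> constrainedProperties) {
        return !constrainedProperties.isEmpty()
                && indexedProperties.containsAll(constrainedProperties);
    }

    /** Common wrapper: worst-case fallback around the custom estimate. */
    final double cost(Set<String> constrainedProperties) {
        if (!canAnswer(constrainedProperties)) {
            return Double.POSITIVE_INFINITY; // assumption: "cannot answer" is an infinite cost
        }
        long estimate = estimateCount(constrainedProperties);
        return estimate == UNKNOWN ? worstCaseCount : estimate;
    }

    /** Custom: engine-specific estimate, or UNKNOWN if it cannot be computed quickly. */
    protected abstract long estimateCount(Set<String> constrainedProperties);
}

/** Stand-in provider whose estimate is fixed at construction time. */
final class FixedEstimatePlanner extends FulltextPlannerSketch {
    private final long estimate;

    FixedEstimatePlanner(Set<String> indexedProperties, long worstCaseCount, long estimate) {
        super(indexedProperties, worstCaseCount);
        this.estimate = estimate;
    }

    @Override
    protected long estimateCount(Set<String> constrainedProperties) {
        return estimate;
    }
}
```

The worst-case count is passed in as a plain number here; in the discussion above it would come from something like the node counter.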

h4. Text extraction + tika configuration - common
* similar to function index, this is about generating data to index
* should we allow for some implementations that might be interested in doing 
their own extractions?
** maybe with a caution that "external text extraction is out of control - so 
expectation of extraction feature parity is implicitly undefined"

h4. Tests
* Lucene tests have pretty decent coverage
* Since the idea is to abstract as much as possible, any test that’s querying 
and verifying results should be parametrized over all index providers 
(all = lucene, solr, ES, etc)
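
A sketch of what parametrizing over providers could look like, assuming hypothetical names (SearchProviderSketch, MapBackedProvider, ProviderParityCheck) and in-memory stand-ins rather than real lucene or solr indexes. It is written as plain Java so it runs without a test framework; in the actual build a JUnit parameterized test would be the more natural form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical provider contract; not an Oak interface.
interface SearchProviderSketch {
    String name();

    void index(String path, String text);

    List<String> query(String term);
}

/** In-memory stand-in; each "provider" differs only in its backing map. */
final class MapBackedProvider implements SearchProviderSketch {
    private final String name;
    private final Map<String, String> docs;

    MapBackedProvider(String name, Map<String, String> docs) {
        this.name = name;
        this.docs = docs;
    }

    public String name() {
        return name;
    }

    public void index(String path, String text) {
        docs.put(path, text);
    }

    public List<String> query(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits); // stable order so results compare across providers
        return hits;
    }
}

final class ProviderParityCheck {

    /** The "all = lucene, solr, ES, etc" list; stand-ins only in this sketch. */
    static List<SearchProviderSketch> allProviders() {
        List<SearchProviderSketch> providers = new ArrayList<>();
        providers.add(new MapBackedProvider("lucene-stand-in", new LinkedHashMap<>()));
        providers.add(new MapBackedProvider("solr-stand-in", new TreeMap<>()));
        return providers;
    }

    /** One query-and-verify test body, written once and reused per provider. */
    static void assertQueryResults(SearchProviderSketch provider) {
        provider.index("/b", "hello oak");
        provider.index("/a", "hello world");
        provider.index("/c", "unrelated");
        List<String> hits = provider.query("hello");
        if (!Arrays.asList("/a", "/b").equals(hits)) {
            throw new AssertionError(provider.name() + " returned " + hits);
        }
    }

    /** Runs the same test body against every provider and returns how many ran. */
    static int runAll() {
        int count = 0;
        for (SearchProviderSketch provider : allProviders()) {
            assertQueryResults(provider);
            count++;
        }
        return count;
    }
}
```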



[jira] [Commented] (OAK-3336) Abstract a full text index implementation to be extended by Lucene and Solr

2018-04-04 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425456#comment-16425456
 ] 

Tommaso Teofili commented on OAK-3336:
--

created (empty) _oak-search_ module in r1828335.
