[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-10 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Patch with bugfix when collapsing on numeric fields that was turned up during 
manual testing. Testcases have be updated to catch this bug as well.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-09 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Added more error handling and removed all debugging/timing code.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-08 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Patch with all updated trunk code and all tests passing.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-07 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Patch with performance improvements for the ExpandComponent. Tests still need 
to be re-worked to support the performance enhancements.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

New patch with working tests for the new ExpandComponent implementations. Test 
for expanding on Numeric fields is included.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

New patch adding testing for FAST_QUERY hint to the ExpandComponent tests.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Added implementation to the ExpandComponent to handle expanding on numeric 
fields. Test cases have not yet been updated to include numeric field expansion.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-03 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Getting much closer. The numeric collapse field tests are now passing and 
variables have been renamed for clarity.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2015-01-02 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-31 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Latest work including test cases for collapsing on numeric field.  Not all 
tests are passing yet.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-30 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-6581:
---
Attachment: renames.diff

While you are improving CollapsingQParserPlugin, I suggest doing some variable 
renames.  In my work I needed something like this code so I forked it to be 
modified, but found my first task was to do a bunch of renames so that it was 
clear what variable was for what.  The attached patch is a redacted version 
from my code and includes a tad bit of other stuff to be ignored, but see the 
change in variable names, and a getter rename or two.  As a random example, 
docId is unclear; is this a global doc ID or is it segment local?  Likewise 
for ordinals.  Arguably most of Lucene/Solr is guilty of this but this one 
source file I found hard to penetrate until I did the renames to decipher 
what's going on.

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-29 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

A patch with the current state of the work being done. Patch includes 
implementations for top level String FieldCache and String MultiDocValues. The 
patch also includes initial implementations for collapsing on an integer field. 

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But there is one disadvantage, the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less then collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affects field collapsing much more then faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-15 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages of using the MultiDocValues rather then a top 
level FieldCache. But there is one disadvantage, the lookup from docId to 
top-level ordinals is slower using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
affected also? My testing has shown that the faceting performance is affected 
much less then collapsing. 

One possible reason for this may be that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
affects field collapsing much more then faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}






 







 






  was:
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages of using the MultiDocValues rather then a top 
level FieldCache. But the lookup from docId to top-level ordinals is slower 
using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
effected also? My testing has shown that the faceting performance is effected 
much less then collapsing. 

One possible reason for this is that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
effects field collapsing much more the faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm us 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}






 







 







 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then 

[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-14 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages of using the MultiDocValues rather then a top 
level FieldCache. But the lookup from docId to top-level ordinals is slower 
using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
effected also? My testing has shown that the faceting performance is effected 
much less then collapsing. 

One possible reason for this is that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
effects field collapsing much more the faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm us 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:

fq={!collapse field=x hint=FAST_QUERY}







 







 






  was:
There were changes made to the CollapsingQParserPlugin and ExpandComponent in 
the 5x branch that were driven by changes to the Lucene Collectors API and 
DocValues API. This ticket is to review the 5x implementation and make any 
changes necessary in preparation for a 5.0 release.




 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But the lookup from docId to top-level ordinals is slower 
 using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance effected also? My testing has shown that the faceting performance 
 is effected much less then collapsing. 
 One possible reason for this is that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 effects field collapsing much more the faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm us 
 MultiDocValues, but to provide an option to use a top level FieldCache if the 
 performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 fq={!collapse field=x hint=FAST_QUERY}
  
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-14 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages of using the MultiDocValues rather then a top 
level FieldCache. But the lookup from docId to top-level ordinals is slower 
using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
effected also? My testing has shown that the faceting performance is effected 
much less then collapsing. 

One possible reason for this is that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
effects field collapsing much more the faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm us 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}






 







 






  was:
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages of using the MultiDocValues rather then a top 
level FieldCache. But the lookup from docId to top-level ordinals is slower 
using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
effected also? My testing has shown that the faceting performance is effected 
much less then collapsing. 

One possible reason for this is that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
effects field collapsing much more the faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm us 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:

fq={!collapse field=x hint=FAST_QUERY}







 







 







 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages of using the MultiDocValues rather then a top 
 level FieldCache. But the lookup from 

[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-03 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch


 There were changes made to the CollapsingQParserPlugin and ExpandComponent in 
 the 5x branch that were driven by changes to the Lucene Collectors API and 
 DocValues API. This ticket is to review the 5x implementation and make any 
 changes necessary in preparation for a 5.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-12-02 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch


 There were changes made to the CollapsingQParserPlugin and ExpandComponent in 
 the 5x branch that were driven by changes to the Lucene Collectors API and 
 DocValues API. This ticket is to review the 5x implementation and make any 
 changes necessary in preparation for a 5.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-10-02 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Fix Version/s: 5.0

 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0


 There we changes made to the CollapsingQParserPlugin and ExpandComponent in 
 the 5x branch that were driven by changes to the Lucene Collectors API and 
 DocValues API. This ticket is to review the 5x implementation and make any 
 changes necessary in preparation for a 5.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

2014-10-02 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
There were changes made to the CollapsingQParserPlugin and ExpandComponent in 
the 5x branch that were driven by changes to the Lucene Collectors API and 
DocValues API. This ticket is to review the 5x implementation and make any 
changes necessary in preparation for a 5.0 release.



  was:
There we changes made to the CollapsingQParserPlugin and ExpandComponent in the 
5x branch that were driven by changes to the Lucene Collectors API and 
DocValues API. This ticket is to review the 5x implementation and make any 
changes necessary in preparation for a 5.0 release.




 Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
 ---

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0


 There were changes made to the CollapsingQParserPlugin and ExpandComponent in 
 the 5x branch that were driven by changes to the Lucene Collectors API and 
 DocValues API. This ticket is to review the 5x implementation and make any 
 changes necessary in preparation for a 5.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org