[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857706#action_12857706 ]

Claus Schröter commented on SOLR-236:

Hi all,

I applied Martijn's latest patch to the trunk and encountered a problem with document counts: whenever I add a rows= value to the query, the "numFound" result parameter is limited to exactly the value of rows. The facet counts are also limited to this value. If I omit the rows parameter, everything is fine.

I tried to track down the problem. It seems that the SolrSearcher query is limited to the "rows" value before collapsing is done. Has anybody encountered a similar problem?

Cheers!
clausi

> Field collapsing
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java,
> field-collapse-3.patch, field-collapse-4-with-solrj.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
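For anyone trying to reproduce Claus's report, the two requests being compared could be built roughly as follows. This is only a sketch: the host, the `q` value and the field name `site` are made up for illustration, and only `rows` and `collapse.field` are taken from this thread (`facet` and `facet.field` are the usual Solr faceting parameters).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollapseRowsRepro {

    /** Joins parameters into a URL query string, encoding keys and values as UTF-8. */
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    /** Builds the collapsed, faceted request; pass rows = null to omit the parameter. */
    static String request(Integer rows) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("collapse.field", "site"); // hypothetical field name
        p.put("facet", "true");
        p.put("facet.field", "site");    // hypothetical field name
        if (rows != null) {
            p.put("rows", String.valueOf(rows));
        }
        // Assumed local Solr instance; adjust host and path to your install.
        return buildUrl("http://localhost:8983/solr/select", p);
    }
}
```

Comparing "numFound" and the facet counts in the responses to `request(null)` and `request(10)` should show whether the count is being capped at the value of rows.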
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855469#action_12855469 ]

Pierre-Luc commented on SOLR-236:

Hi all,

We have integrated the most recent patch into our 1.4 install and the Out of memory fix suggested by Peter. I am facing memory issues only when collapsing. I would like to know why the class CacheValue is static in AbstractDocumentCollapser. If I remove the static attribute of that class, the memory footprint is greatly reduced and everything works fine. My document count is around 5 million.

Any help would be greatly appreciated. Thank you.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850944#action_12850944 ]

Robert Zotter commented on SOLR-236:

@Thomas Essentially my use case involves a product listing of sorts, where there are many closely related items being sold by any number of sellers. I would like to distribute the search results across as many sellers as possible, giving each seller a fair chance to sell their products, so I was going to use field collapsing to limit the number of items displayed per seller. Ideally there would be some way to evenly distribute closely related documents (scores within some defined percentage of each other). For example, instead of:

Item 1 sold by Seller A
Item 2 sold by Seller A
Item 3 sold by Seller A
Item 4 sold by Seller B
Item 5 sold by Seller B
Item 6 sold by Seller B

Assuming all of these items are within a certain percentage of each other, it would be nice to have:

Item 1 sold by Seller A
Item 4 sold by Seller B
Item 2 sold by Seller A
Item 5 sold by Seller B

Although I do not achieve this exact behavior with this particular patch, it will at least get me closer to my goal. FYI my document count is around 6 million and I am already utilizing the document deduper.
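The ordering Robert describes can be sketched as a client-side post-processing step. This is not what the patch does; it is a minimal round-robin illustration that assumes the hits are already known to fall within the same score band. The `Item` type and the `maxPerSeller` limit are hypothetical names introduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SellerInterleave {

    /** A ranked hit; both fields are illustrative. */
    static final class Item {
        final String name;
        final String seller;

        Item(String name, String seller) {
            this.name = name;
            this.seller = seller;
        }
    }

    /**
     * Keeps at most maxPerSeller items per seller, then emits them round-robin
     * across sellers. Sellers are visited in order of first appearance, and
     * each seller's items keep their original rank order.
     */
    static List<Item> interleave(List<Item> ranked, int maxPerSeller) {
        Map<String, Deque<Item>> bySeller = new LinkedHashMap<>();
        for (Item item : ranked) {
            Deque<Item> queue = bySeller.computeIfAbsent(item.seller, k -> new ArrayDeque<>());
            if (queue.size() < maxPerSeller) {
                queue.addLast(item); // items beyond the limit are dropped
            }
        }
        List<Item> out = new ArrayList<>();
        boolean added = true;
        while (added) { // stop after a full pass that emitted nothing
            added = false;
            for (Deque<Item> queue : bySeller.values()) {
                Item next = queue.pollFirst();
                if (next != null) {
                    out.add(next);
                    added = true;
                }
            }
        }
        return out;
    }
}
```

With the six items from the example and `maxPerSeller = 2`, this yields Item 1, Item 4, Item 2, Item 5, which is the order asked for above.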
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850934#action_12850934 ]

Thomas Heigl commented on SOLR-236:

@Robert: What is your use case for field collapsing? I think under "normal" conditions (collapsing on a field with reasonably many unique values) you can go with the slightly older patch and the OOM fixes. I compared the performance of the newest patch for the trunk with the 1.4 release patched as described above and didn't notice much difference under these conditions.

I will most likely go with the trunk, however, as I have millions of documents with millions of unique values on the collapse field and need every bit of performance I can get.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850930#action_12850930 ]

Robert Zotter commented on SOLR-236:

@Thomas. Thanks for the input. Do you think it's best to go with a clean version of 1.4 or the latest from trunk? Basically I'm asking whether you think trunk is stable enough for a production environment.

Thanks
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850925#action_12850925 ]

Thomas Heigl commented on SOLR-236:

@Robert: I just tried the field collapsing patch with a clean version of the 1.4 release. The only recent patch that seems to be applicable without manually resolving conflicts is [2009-12-08|https://issues.apache.org/jira/secure/attachment/12427386/field-collapse-5.patch]. In addition to the patch you should also add the three individual files uploaded by Peter Karich to deal with the worst memory issues.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850921#action_12850921 ]

Thomas Heigl commented on SOLR-236:

@Martijn: There is a small problem with the latest patch file. Both TortoiseSVN and patch complain that the file is malformed because there is an "empty" patch for FieldCollapseResponse.java around line 2199. Simply removing lines 2195-2199 does the trick. Apart from that, the patch works perfectly for me.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848978#action_12848978 ]

Robert Zotter commented on SOLR-236:

What are the required steps to get this patch working with a clean 1.4? Is it even compatible? I've read in the above comments that the 12/12 field-collapse-5.patch will apply correctly but has horrible memory bugs. Have there been any updates on this? Recommendations, anyone?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841752#action_12841752 ]

Peter Karich commented on SOLR-236:

> Shouldn't the float array in DocSetScoreCollector be changed to a Map?

Hmm, maybe I expressed myself a bit oddly: I already changed all of this to a Map (a SortedMap) ... I started this change in DocSetScoreCollector and changed all the other occurrences of the float array (otherwise I would have had to copy the entire map).

> > I think the compare method should NOT be called if no docs are in the scores array ... ?
> I would expect that every docId has a score.

Yes, me too. So I expect there is a bug somewhere. But as I said, this breaks only one test (collapse with faceting before). It could even be a bug in the test case, though.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841558#action_12841558 ]

Martijn van Groningen commented on SOLR-236:

Shouldn't the float array in DocSetScoreCollector be changed to a Map? Because that is actually being cached and requires the most memory. The float array in the NonAdjacentDocumentCollapser.PredefinedScorer isn't being cached. Though changing this to a Map can be an improvement.

bq. I think the compare method should NOT be called if no docs are in the scores array ... ?

I would expect that every docId has a score.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich commented on SOLR-236:

Regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap. The change was nearly trivial. (I cannot provide a patch easily because we are using an older patch, although I could post the 3 changed files.)

I used a TreeMap instead of a HashMap because the advance method in the class NonAdjacentDocumentCollapser.PredefinedScorer needs the tailMap method:

{noformat}
public int advance(int target) throws IOException {
  // now we need a treemap method:
  iter = scores.tailMap(target).entrySet().iterator();
  if (iter.hasNext())
    return target;
  else
    return NO_MORE_DOCS;
}
{noformat}

Then, I think, I discovered a bug/inconsistent behaviour: if I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, the scores array is created à la new float[maxDocs] in the old version, but it is never filled with any values. With the TreeMap, Float value1 = values.get(doc1); therefore returns null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the TreeMap is 0!). I work around this via

{noformat}
if (value1 == null)
  value1 = 0f;
if (value2 == null)
  value2 = 0f;
{noformat}

although I think the compare method should NOT be called if no docs are in the scores array ... ?
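For readers following the thread, here is a minimal, self-contained sketch of the idea described above: scores kept in a TreeMap keyed by doc id instead of a float array sized by the index. The class SparseScores and its methods are hypothetical names, not code from any SOLR-236 patch. Two things are assumptions rather than the patch's behaviour: advance returns the first stored doc id at or after the target (via ceilingKey) rather than the target itself, and NO_MORE_DOCS is set to Integer.MAX_VALUE to mirror Lucene's constant.

```java
import java.util.TreeMap;

/** Hypothetical sparse score store; not a class from the SOLR-236 patch. */
final class SparseScores {
    // Assumption: mirrors the value of Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // doc id -> score, holding entries only for documents that matched
    private final TreeMap<Integer, Float> scores = new TreeMap<>();

    void put(int docId, float score) {
        scores.put(docId, score);
    }

    /** Score for a doc, falling back to 0 when no score was recorded. */
    float scoreOrZero(int docId) {
        Float value = scores.get(docId);
        return value == null ? 0f : value;
    }

    /** First stored doc id >= target, or NO_MORE_DOCS if there is none. */
    int advance(int target) {
        Integer next = scores.ceilingKey(target);
        return next == null ? NO_MORE_DOCS : next;
    }

    /** Orders two docs by score, treating missing scores as 0. */
    int compare(int doc1, int doc2) {
        return Float.compare(scoreOrZero(doc1), scoreOrZero(doc2));
    }
}
```

The null fallback in scoreOrZero plays the same role as the workaround in the comment; whether a missing score should be tolerated at all is exactly the open question raised there.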
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840648#action_12840648 ]

Martijn van Groningen commented on SOLR-236:

The numFound attribute holds the total number of documents found for the specified query, including the documents beyond the first result page. The numFound of the first query is lower than that of the second query because the collapse.threshold is higher in the second query. With collapse.threshold=2, only documents with the same collapse field value that appear more than twice will be omitted from the result. This results in fewer documents being collapsed.
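The effect of the threshold on numFound can be illustrated with a small simulation. This is only a sketch under the assumption that non-adjacent collapsing keeps at most collapse.threshold documents per collapse field value and that numFound counts what remains; the class and method names are hypothetical and the code does not come from the patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of numFound after collapsing; not patch code. */
final class CollapseCountSketch {
    /**
     * Counts the documents that remain when at most `threshold`
     * documents per collapse field value are kept.
     */
    static int numFoundAfterCollapse(List<String> fieldValues, int threshold) {
        Map<String, Integer> kept = new HashMap<>();
        int numFound = 0;
        for (String value : fieldValues) {
            int seen = kept.getOrDefault(value, 0);
            if (seen < threshold) {
                // still below the threshold: this document stays in the result
                kept.put(value, seen + 1);
                numFound++;
            }
        }
        return numFound;
    }

    public static void main(String[] args) {
        // collapse field values of six matching documents
        List<String> values = Arrays.asList("a", "a", "a", "b", "b", "c");
        System.out.println(numFoundAfterCollapse(values, 1)); // 3
        System.out.println(numFoundAfterCollapse(values, 2)); // 5
    }
}
```

Under this model a higher threshold can only raise numFound, which matches the direction of the numbers reported in the thread (11 with threshold 1, 22 with threshold 2).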
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840470#action_12840470 ]

Yao Ge commented on SOLR-236:

I just applied the latest patch to trunk and I don't quite understand how the "numFound" in the response list is computed. With rows=10&collapse.threshold=1, I got numFound=11; with rows=10&collapse.threshold=2, I got numFound=22. In both cases the actual number of docs in the list is 10. Why is the numFound reported this way?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839657#action_12839657 ]

Martijn van Groningen commented on SOLR-236:

That makes sense. I initially made it an array to maintain the document order for the scores, but this order is already in the openbitset. I think a Map is a good idea.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839545#action_12839545 ]

Leon Messerschmidt commented on SOLR-236:

The OutOfMemory problem affects both field-collapse-5.patch on Solr 1.4 and SOLR-236.patch on the trunk.

The root cause of the problem is that DocSetScoreCollector creates a float array whose size is the largest doc id that matches the query. If you have a large index (we have several million documents) and a document with a very large id is matched, you may end up with a huge array (in our case several hundred MB). Only a really small subset of the array is being used at any given time (especially if you're matching just a few documents with big doc ids).

The implementation could instead use a sparse array or a map to keep track of scores.
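A rough sketch of the trade-off described above, assuming 4 bytes per float slot and ignoring array and object headers. The class and method names are hypothetical and are not taken from DocSetScoreCollector.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of dense and sparse score storage; not patch code. */
final class ScoreStorageSketch {
    /** Payload bytes of a float[] indexed by doc id: one 4-byte slot per id. */
    static long denseArrayBytes(int maxMatchingDocId) {
        return 4L * ((long) maxMatchingDocId + 1);
    }

    /** Collects scores only for the docs that matched: one entry per match. */
    static Map<Integer, Float> sparseScores(int[] docIds, float[] scores) {
        Map<Integer, Float> byDoc = new HashMap<>();
        for (int i = 0; i < docIds.length; i++) {
            byDoc.put(docIds[i], scores[i]);
        }
        return byDoc;
    }
}
```

With two matches at doc ids 3 and 99,999,999, the dense array needs 400,000,000 bytes of slots while the map holds two entries. Each boxed map entry costs far more than 4 bytes, so a map only wins when the matches are sparse relative to the largest doc id.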
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836919#action_12836919 ]

Peter Steevensz commented on SOLR-236:

I applied this patch to the nightly build of Feb 22 and it compiles without any problem. I can start Solr and it runs fine. But when I add the Field Collapse in the solrconfig.xml I cannot start Solr anymore.

After adding this line to my solrconfig.xml:

I get this error when I run Solr:

2010-02-22 22:24:30.722::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
  at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
  at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
  at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
  at org.mortbay.jetty.Server.doStart(Server.java:210)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.mortbay.start.Main.invokeMain(Main.java:183)
  at org.mortbay.start.Main.start(Main.java:497)
  at org.mortbay.start.Main.main(Main.java:115)

(I am using CentOS with Java 1.6 build 16.)

Any help is greatly appreciated!!
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich commented on SOLR-236:

The latest patch from 1st Feb 2010 compiles against solr-2010-02-13 from the nightly build but does not work. If I query http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null java.lang.NullPointerException
  at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
  at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
  at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
  at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
  at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
  at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
  at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
  at ...
{noformat}

I only need the OutOfMemory problem solved ... :-(
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835230#action_12835230 ]

Peter Karich commented on SOLR-236:

We are facing OutOfMemory problems too. We are using https://issues.apache.org/jira/secure/attachment/12425775/field-collapse-5.patch

> Are you using any other features besides plain collapsing? The field collapse
> cache gets large very quickly,
> I suggest you turn it off (if you are using it). Also you can try to make
> your filterCache smaller.

How can I turn off the collapse cache or make the filterCache smaller? Are there other workarounds, e.g. using a special version of the patch?

I read that it could help to specify collapse.maxdocs, but this didn't help in our case ... Could collapse.type=adjacent help here? (https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12495376&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12495376) What do you think?

BTW: We really like this patch and would like to use it!! :-)
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832703#action_12832703 ]

Gerald DeConto commented on SOLR-236:

I have been able to apply and use the solr-236 patch successfully. Very, very cool and powerful.

Are there any plans/hacks to include the non-collapsed document in the collapseCount and aggregate function values (i.e. so that they include ALL documents, not just the collapsed ones)? Possibly via some parameter like collapse.includeAllDocs?

I think this would be a great addition to the collapse code (and Solr functionality), via what I would think is a small change, since Solr doesn't have any other aggregation mechanism (as yet). I am trying to see how to change the code myself, but Java is not my primary language.

> Field collapsing
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch,
> SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
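For readers trying the patch, the three parameters named in the issue description can be put on an ordinary select request. The sketch below only builds the query string; the field name "site", the query "ipod", and the host and port in main() are assumptions for illustration and do not come from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: builds a query string using the collapse.* parameters from the issue description. */
final class CollapseQueryExample {

    /** Joins the parameters in insertion order as URL-encoded key=value pairs. */
    static String buildQuery(String q, String collapseField, String collapseType, int collapseMax) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("collapse.field", collapseField);            // field used to group results
        params.put("collapse.type", collapseType);              // "normal" (default) or "adjacent"
        params.put("collapse.max", String.valueOf(collapseMax)); // results allowed before collapsing
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Host, port and handler path are assumptions for a local test instance.
        System.out.println("http://localhost:8983/solr/select?"
                + buildQuery("ipod", "site", "adjacent", 1));
    }
}
```

Whether the parameters take effect depends on the patch version applied and on the collapse component being registered in the request handler.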
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831617#action_12831617 ]

Kevin Cunningham commented on SOLR-236:

No, just field collapsing. We went back to the field-collapse-5.patch for the time being. So far it's been good; we updated just to get closer to the latest, not because we were seeing issues. Thanks.
Re: [jira] Commented: (SOLR-236) Field collapsing
I also think the isTokenized() check/exception should be removed. It is probably a common use case to have a single-valued "tokenized" field, i.e. a case-insensitive string (a text field where the only filter applied is a LowerCaseFilterFactory).

I think that as long as it's documented that field collapsing "doesn't work" for fields with multiple tokens, it shouldn't be an issue. That certainly seems better to me than preventing a perfectly valid use case, since you wouldn't get any results anyway.

    if (schemaField.getType().isTokenized()) {
      throw new RuntimeException("Could not collapse, because collapse field is tokenized");
    }

I agree that it would be better to "check" whether the field has multiple values or not. In the meantime, though, perhaps the "remove the check and log a warning" approach would suffice?

-Trey

On Tue, Jan 19, 2010 at 5:46 AM, Martijn van Groningen (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802186#action_12802186 ]
>
> Martijn van Groningen commented on SOLR-236:
>
> If the field is tokenized and has more than one token, your field collapse
> result will be incorrect. If I remember correctly, it will only collapse on
> the field's last token. This of course leads to weird collapse groups.
> Because of this check, users that have only one token per collapse field are
> out of luck. Somehow I think we should let the user know that it is not
> possible to collapse on a tokenized field (at least one with multiple
> tokens), maybe by adding a warning to the response. Still, I think the
> exception is clearer, but of course it also prohibits the single-token case.
>
> bq. Or someone could come after me and write a patch that checks for
> multi-tokened fields somehow and throws an exception.
>
> Checking whether a tokenized field contains only one token is really
> inefficient, because you have to check the collapse field of every document.
> Right now the check is based on the field's definition in the schema.
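The "remove the check and log a warning" idea from this reply could look roughly like the sketch below. It is not code from any attached patch: FieldInfo is a hypothetical stand-in for the schema information the real check reads, and returning the warnings as a list is an assumption about how a caller might log them or add them to the response.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: FieldInfo is a hypothetical stand-in, not Solr's SchemaField. */
final class TokenizedCheckSketch {

    /** Hypothetical, minimal view of a schema field. */
    static final class FieldInfo {
        final String name;
        final boolean multiValued;
        final boolean tokenized;

        FieldInfo(String name, boolean multiValued, boolean tokenized) {
            this.name = name;
            this.multiValued = multiValued;
            this.tokenized = tokenized;
        }
    }

    /**
     * Still rejects missing and multivalued fields, but only warns for tokenized ones.
     * The warnings are returned so the caller can log them or attach them to the response.
     */
    static List<String> checkCollapseField(FieldInfo field) {
        if (field == null) {
            throw new IllegalArgumentException(
                    "Could not collapse, because collapse field does not exist in the schema.");
        }
        if (field.multiValued) {
            throw new IllegalArgumentException(
                    "Could not collapse, because collapse field is multivalued");
        }
        List<String> warnings = new ArrayList<>();
        if (field.tokenized) {
            warnings.add("Collapse field '" + field.name
                    + "' is tokenized; collapsing is only meaningful if each value yields one token");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // A lowercased string field: tokenized by definition, but one token per value.
        FieldInfo lowercased = new FieldInfo("title_lc", false, true);
        for (String warning : checkCollapseField(lowercased)) {
            System.out.println("WARN: " + warning);
        }
    }
}
```

The trade-off is the one discussed in the thread: a warning keeps the single-token case usable, while multi-token fields silently produce odd groups unless the user reads the warning.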
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831150#action_12831150 ]

Martijn van Groningen commented on SOLR-236:

bq. Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint?

Are you using any other features besides plain collapsing? The field collapse cache gets large very quickly, so I suggest you turn it off (if you are using it). You can also try making your filterCache smaller.

bq. What fixes would we be missing if ran Solr 1.4 with the last "field-collapse-5.patch" patch?

Not much, I believe. Some people are using it in production without too many problems.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305 ]

Kevin Cunningham commented on SOLR-236:

Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829727#action_12829727 ]

Martijn van Groningen commented on SOLR-236:

If you look into AbstractDocumentCollapser#createDocumentCollapseResult() you will see that the collapseResult will never be null. Therefore I think the null check is not necessary. I think the following code is sufficient:

{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseResult.getCollapsedDocset(),
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());
{code}

Specifying the filters is also unnecessary, because they were already taken into account when creating the uncollapsed docset.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829522#action_12829522 ]

Koji Sekiguchi commented on SOLR-236:

The following snippet is in CollapseComponent.doProcess():

{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseResult == null ? rb.getFilters() : null,
    collapseResult.getCollapsedDocset(),
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());
{code}

The 2nd line implies that collapseResult may be null. If it is null, won't we get an NPE on the 3rd line?
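The concern can be reproduced in isolation. Java evaluates method arguments left to right, so the ternary is evaluated first and the following argument then dereferences the same reference unconditionally. The sketch below uses hypothetical stand-in types (CollapseResult, search), not the classes from the patch.

```java
/** Sketch: a null-guarded argument followed by an unguarded dereference of the same reference. */
final class NullCheckSketch {

    /** Hypothetical stand-in for the collapse result object. */
    static final class CollapseResult {
        String getCollapsedDocset() {
            return "collapsedDocset";
        }
    }

    /** Hypothetical stand-in for the searcher call; it just joins its arguments. */
    static String search(Object filters, String docset) {
        return filters + "/" + docset;
    }

    /** Returns true if the call shape from the snippet throws a NullPointerException. */
    static boolean throwsNpe(CollapseResult collapseResult) {
        try {
            search(collapseResult == null ? "filters" : null, // guarded: fine when null
                   collapseResult.getCollapsedDocset());      // unguarded: throws when null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null result throws NPE: " + throwsNpe(null));
        System.out.println("non-null result throws NPE: " + throwsNpe(new CollapseResult()));
    }
}
```

So the null check only matters if the result can actually be null; if it never can be, the guard is dead code.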
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828039#action_12828039 ]

Koji Sekiguchi commented on SOLR-236:

A random comment: don't we need to check that collapse.field is indexed in checkCollapseField()?

{code}
protected void checkCollapseField(IndexSchema schema) {
  SchemaField schemaField = schema.getFieldOrNull(collapseField);
  if (schemaField == null) {
    throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema.");
  }

  if (schemaField.multiValued()) {
    throw new RuntimeException("Could not collapse, because collapse field is multivalued");
  }

  if (schemaField.getType().isTokenized()) {
    throw new RuntimeException("Could not collapse, because collapse field is tokenized");
  }
}
{code}

I accidentally specified an unindexed field for collapse.field and got an unexpected result without any errors.
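An extra guard along the lines suggested here might look like the sketch below. It uses a hypothetical FieldInfo stand-in so that it runs without Solr on the classpath; in the real method the flag would presumably come from the schema field itself (Solr's SchemaField exposes indexed()), and the exception message is an assumption modelled on the existing ones.

```java
/** Sketch only: FieldInfo is a hypothetical stand-in for the schema field. */
final class IndexedCheckSketch {

    /** Hypothetical, minimal view of a schema field. */
    static final class FieldInfo {
        final boolean indexed;
        final boolean multiValued;

        FieldInfo(boolean indexed, boolean multiValued) {
            this.indexed = indexed;
            this.multiValued = multiValued;
        }
    }

    /** Fails fast for fields that cannot produce meaningful collapse groups. */
    static void checkCollapseField(FieldInfo field) {
        if (field == null) {
            throw new RuntimeException(
                    "Could not collapse, because collapse field does not exist in the schema.");
        }
        if (!field.indexed) {
            // Proposed extra check: an unindexed field has no terms to collapse on.
            throw new RuntimeException("Could not collapse, because collapse field is not indexed");
        }
        if (field.multiValued) {
            throw new RuntimeException("Could not collapse, because collapse field is multivalued");
        }
    }

    public static void main(String[] args) {
        checkCollapseField(new FieldInfo(true, false)); // indexed, single-valued: accepted
        try {
            checkCollapseField(new FieldInfo(false, false));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```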
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802512#action_12802512 ]

Martijn van Groningen commented on SOLR-236:

Hi Yaniv, I tried the same on the 1.4 branch (from svn) and on the svn trunk. Applying the patch to both sources went fine, but when building (ant dist) on trunk I also got compile errors. This was because SolrQueryResponse moved from the request package to the response package. I will update the patch shortly.

Building on the 1.4 branch (ant dist) went without any problems. What errors occurred when you ran ant dist on the 1.4 branch?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802334#action_12802334 ]

Yaniv S. commented on SOLR-236:

Hi All, this is a very exciting feature and I'm trying to apply it on our system. I've tried patching on 1.4 and on the trunk version but both give me build errors. Any suggestions on how I can build 1.4 or latest with this patch?

Many Thanks,
Yaniv
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802186#action_12802186 ] Martijn van Groningen commented on SOLR-236: If the field is tokenized and has more than one token, your field collapse result will be incorrect. What happens, if I remember correctly, is that it will only collapse on the field's last token. This of course leads to weird collapse groups. Users that have only one token per collapse field are out of luck because of this check. Somehow I think we should let the user know that it is not possible to collapse on a tokenized field (at least one with multiple tokens), maybe by adding a warning to the response. Still, I think the exception is clearer, although it of course also prohibits the single-token case.

bq. Or someone could come after me and write a patch that checks for multi-tokened fields somehow and throws an exception.

Checking whether a tokenized field contains only one token is really inefficient, because you have to check the collapse field of every document. Currently the check is based on the field's definition in the schema.
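To illustrate how a multi-token field can end up collapsing on a single token, here is a minimal sketch. It assumes, purely for illustration, that the collapse value of each document is read from a one-slot-per-document array that is filled term by term; this thread does not confirm that the patch works this way, and every name below (`LastTokenSketch`, `oneValuePerDoc`, `sampleIndex`) is hypothetical.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustration only: a toy term-to-documents index, not Solr or Lucene code. */
final class LastTokenSketch {

    /**
     * Fills one slot per document by walking the terms in sorted order.
     * A document that contains several terms is overwritten each time,
     * so only the last term visited survives as its collapse value.
     */
    static String[] oneValuePerDoc(SortedMap<String, int[]> termToDocs, int maxDoc) {
        String[] valueOfDoc = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : termToDocs.entrySet()) {
            for (int doc : entry.getValue()) {
                valueOfDoc[doc] = entry.getKey();
            }
        }
        return valueOfDoc;
    }

    /** Three documents that all contain the token "acme". */
    static SortedMap<String, int[]> sampleIndex() {
        SortedMap<String, int[]> termToDocs = new TreeMap<>();
        termToDocs.put("acme", new int[] {0, 1, 2}); // shared by every document
        termToDocs.put("berlin", new int[] {1});     // doc 1 = "acme berlin"
        termToDocs.put("paris", new int[] {2});      // doc 2 = "acme paris"
        return termToDocs;
    }

    public static void main(String[] args) {
        String[] values = oneValuePerDoc(sampleIndex(), 3);
        for (int doc = 0; doc < values.length; doc++) {
            System.out.println("doc " + doc + " collapses on: " + values[doc]);
        }
    }
}
```

In this toy model all three documents share the token "acme", yet each one lands in a different collapse group, which is the kind of weird grouping described above. A single-token field (such as a lowercased email address) never gets overwritten, so it would collapse correctly.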
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801987#action_12801987 ] Michael Gundlach commented on SOLR-236: --- I've found the need to collapse on an analyzed field which contains one token (an email field, which is analyzed in order to lowercase it.) I had to apply a patch on top of field-collapse-5.patch in order to comment out the isTokenized() check in AbstractCollapseComponent.java , at which point the code worked perfectly. Is there a strong argument for keeping the isTokenized() check in? Anyone who needs to collapse an analyzed, single-token field is out of luck with this check in place. I understand that the current version protects users from incorrect results if they collapse a multi-token tokenized field, but maybe collapsing on analyzed fields is worth that risk. (Or someone could come after me and write a patch that checks for multi-tokened fields somehow and throws an exception.) 
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799692#action_12799692 ] Martijn van Groningen commented on SOLR-236: I believe the field-collapse-5.patch should work for 1.4. Some bugs were fixed in later patches, so I recommend using the latest patch on the latest successful nightly build if that is an option for you. Applying the latest patch to the 1.4 sources will probably result in some minor merge errors, but I think these should be easy to fix.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799409#action_12799409 ] Kevin Cunningham commented on SOLR-236: --- Which patch is recommended for those running a stock 1.4 release?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794 ] Martijn van Groningen commented on SOLR-236:

bq. The result document of our prefix query, which was at position 1 without collapsing, was with collapsing not even within the top 10 results. We are using the option collapse.maxdocs=150 and after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing once the threshold is met, but it does so on the uncollapsed docset, which is not sorted in any way. As a result, documents that would normally appear on the first page may not appear in the search result at all, because in the end the collapse component uses the collapsed docset as the result set, not the uncollapsed one.

bq. Also, we noticed a huge memory leak problem when using collapsing. We configured the component with . Without setting the option collapse.field, it works normally, and there are no memory problems so far. If requests with enabled collapsing are received by the Solr server, the whole memory (oldgen could not be freed; eden space is heavily in use; ...) gets full after a few requests. By using a profiler, we noticed that the filterCache was extraordinarily large. We supposed that there could be a caching problem (collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse cache. It has to be addressed, and certainly will be in the new field-collapse implementation. In the patch you're using, too much is being cached (some of the data could even be left out of the cache). Also, in some cases strings are being cached that could be replaced with hash codes.

bq. Additionally it might be very useful if the parameter collapse=true|false would work again and could be used to enable/disable the collapsing functionality. Currently, the existence of a field chosen for collapsing enables this feature and there is no possibility to configure the fields for collapsing within the request handlers. With that, we could configure it and only enable/disable it within the requests, as is conveniently done for other components (highlighting, faceting, ...).

That makes sense, and it is a good reason to use the collapse.enable parameter again in the patch.

Martijn
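The collapse.maxdocs effect described in this message can be sketched with a toy model. It is only an illustration under stated assumptions: the uncollapsed docset is taken to arrive in index order, collapsing keeps the best-scoring hit per group, and the names `MaxDocsSketch`, `Hit`, `collapse` and `sampleDocset` are made up for the example rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names come from the SOLR-236 patch. */
final class MaxDocsSketch {

    /** A hit in the uncollapsed docset: index-order id, collapse value, relevance score. */
    static final class Hit {
        final int docId;
        final String group;
        final double score;

        Hit(int docId, String group, double score) {
            this.docId = docId;
            this.group = group;
            this.score = score;
        }
    }

    /**
     * Keeps the best-scoring hit per group, but only looks at the first
     * maxDocs hits of a docset that is in index order, not score order.
     */
    static List<Hit> collapse(List<Hit> docsetInIndexOrder, int maxDocs) {
        Map<String, Hit> bestPerGroup = new LinkedHashMap<>();
        int limit = Math.min(maxDocs, docsetInIndexOrder.size());
        for (int i = 0; i < limit; i++) {
            Hit hit = docsetInIndexOrder.get(i);
            Hit best = bestPerGroup.get(hit.group);
            if (best == null || hit.score > best.score) {
                bestPerGroup.put(hit.group, hit);
            }
        }
        List<Hit> collapsed = new ArrayList<>(bestPerGroup.values());
        // Sorting happens after the cut-off, so it cannot bring back skipped hits.
        collapsed.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return collapsed;
    }

    /** The most relevant document (docId 3) sits last in index order. */
    static List<Hit> sampleDocset() {
        return Arrays.asList(
                new Hit(0, "a", 0.2),
                new Hit(1, "b", 0.3),
                new Hit(2, "a", 0.1),
                new Hit(3, "c", 9.0));
    }

    public static void main(String[] args) {
        System.out.println("maxDocs=2 top hit: doc " + collapse(sampleDocset(), 2).get(0).docId);
        System.out.println("maxDocs=4 top hit: doc " + collapse(sampleDocset(), 4).get(0).docId);
    }
}
```

With the threshold set to 2, the highest-scoring document is never examined and is missing from the result; raising the threshold to cover the whole docset brings it back to position 1, which mirrors the 150 versus 15000 observation reported above.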
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797716#action_12797716 ] Patrick Jungermann commented on SOLR-236: - Hi all, we are using Solr's trunk with the latest patch of {{2009-12-24 09:54 AM}}. Within the index, there are ~3.5 million documents with string-based identifiers of a length up to 50 chars.

The result document of our prefix query, which was at position 1 without collapsing, was with collapsing not even within the top 10 results. We are using the option {{collapse.maxdocs=150}} and after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

Also, we noticed a huge memory leak problem when using collapsing. We configured the component with {{}}. Without setting the option {{collapse.field}}, it works normally, and there are no memory problems so far. If requests with enabled collapsing are received by the Solr server, the whole memory (oldgen could not be freed; eden space is heavily in use; ...) gets full after a few requests. By using a profiler, we noticed that the filterCache was extraordinarily large. We supposed that there could be a caching problem (collapseCache was not enabled).

Additionally it might be very useful if the parameter {{collapse=true|false}} would work again and could be used to enable/disable the collapsing functionality. Currently, the existence of a field chosen for collapsing enables this feature and there is no possibility to configure the fields for collapsing within the request handlers. With that, we could configure it and only enable/disable it within the requests, as is conveniently done for other components (highlighting, faceting, ...).
Patrick
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795067#action_12795067 ] Stanislaw Osinski commented on SOLR-236: Hi Grant,

{quote} I would note, in looking at the Carrot2 code, they actually have a ByFieldClusteringAlgorithm (what they call synthetic clustering) which does field collapsing/clustering on a value of a field. To quote the javadocs: Clusters documents into a flat structure based on the values of some field of the documents. By default the {@link Document#SOURCES} field is used and Name of the field to cluster by. Each non-null scalar field value with distinct hash code will give raise to a single cluster, named using the {@link Object#toString()} value of the field. If the field value is a collection, the document will be assigned to all clusters corresponding to the values in the collection. Note that arrays will not be 'unfolded' in this way. I don't know how it performs, but it seems like it would at least be worth investigating. {quote}

Carrot2's {{ByFieldClusteringAlgorithm}} is very simple. It literally throws everything into a hash map based on the field value ([source code|http://fisheye3.atlassian.com/browse/carrot2/trunk/core/carrot2-algorithm-synthetic/src/org/carrot2/clustering/synthetic/ByFieldClusteringAlgorithm.java?r=trunk#l99]). This algorithm is used in our live demo to [cluster by news source|http://search.carrot2.org/stable/search?source=boss-news&query=iphone&algorithm=source].

{quote} Note, they also have a synthetic one for collapsing based on URL: ByUrlClusteringAlgorithm {quote}

This one creates a [hierarchy based on the URL segments|http://search.carrot2.org/stable/search?source=boss-web&query=solr&algorithm=url&results=200] and might be useful to create "by-domain" collapsing if needed.
In general, my rough guess is that the criteria for content-based collapsing would be closer to duplicate detection than to the type of grouping Carrot2 produces.
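The "throw everything into a hash map based on the field value" idea from this message can be sketched as follows. This is not the Carrot2 source: documents are modelled as plain maps, and the names `ByFieldGroupingSketch`, `groupByField` and `sampleDocs` are assumptions made for the example. The handling of null and collection values follows the javadoc quoted in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: this is not the Carrot2 implementation. */
final class ByFieldGroupingSketch {

    /**
     * Groups document positions by the value of one field.
     * Missing or null values are skipped; a collection value puts the
     * document into one group per element.
     */
    static Map<Object, List<Integer>> groupByField(List<Map<String, Object>> docs, String field) {
        Map<Object, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            Object value = docs.get(i).get(field);
            if (value == null) {
                continue;
            }
            Collection<?> keys = (value instanceof Collection)
                    ? (Collection<?>) value
                    : Collections.singletonList(value);
            for (Object key : keys) {
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }
        return groups;
    }

    /** Five toy documents; the last one has no "source" field at all. */
    static List<Map<String, Object>> sampleDocs() {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Collections.<String, Object>singletonMap("source", "news"));
        docs.add(Collections.<String, Object>singletonMap("source", "blog"));
        docs.add(Collections.<String, Object>singletonMap("source", "news"));
        docs.add(Collections.<String, Object>singletonMap("source", Arrays.asList("news", "wiki")));
        docs.add(Collections.<String, Object>emptyMap());
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(groupByField(sampleDocs(), "source"));
    }
}
```

Each key of the returned map plays the role of one cluster, and its label would be the key's `toString()` value. A single pass over the documents is enough, which is why this kind of grouping is cheap compared with content-based clustering.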
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795063#action_12795063 ] Grant Ingersoll commented on SOLR-236: --

bq. I'm curious as to whether anyone has just thought of using the Clustering component for this? If your "collapse" field was a single token, I wonder if you would get the results you're looking for.

I would note, in looking at the Carrot2 code, they actually have a ByFieldClusteringAlgorithm (what they call synthetic clustering) which does field collapsing/clustering on a value of a field. To quote the javadocs:

{quote} Clusters documents into a flat structure based on the values of some field of the documents. By default the {@link Document#SOURCES} field is used {quote}

and

{quote} Name of the field to cluster by. Each non-null scalar field value with distinct hash code will give raise to a single cluster, named using the {@link Object#toString()} value of the field. If the field value is a collection, the document will be assigned to all clusters corresponding to the values in the collection. Note that arrays will not be 'unfolded' in this way. {quote}

I don't know how it performs, but it seems like it would at least be worth investigating.

Note, they also have a synthetic one for collapsing based on URL: ByUrlClusteringAlgorithm

Just food for thought.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794252#action_12794252 ]

Uri Boness commented on SOLR-236:

{quote}If we are returning a number of documents (as opposed to a number of groups) to the user, how do they avoid splitting on a page in the middle of the group?{quote}

As far as I know (Martijn, correct me if I'm wrong), Martijn's patch returns the number of groups *and* documents, where each group is actually represented as a document. So in that sense, the total count applies to the result set as is (groups count as documents), and therefore pagination just works.

{quote}The only thing this algorithm can't do (related to pagination) is give the total number of documents after collapsing (and hence can't calculate the exact number of pages). This can be fine in many circumstances as long as the gui handles it (people don't seem to mind google doing it... I just tried it. Google didn't show the result count right unless displaying the last page).{quote}

First of all, I must admit that I never noticed that in Google, so I guess you're right :-). But when you think about it, with Google, how many times do you get a low hit count that only fits in 2-3 pages? Well, I hardly ever get it, and when I do, I don't even bother to check the results; I just try to improve my search. With Solr, it is often different, especially when discovery features and faceting are so often used to narrow the search extensively... I'm not saying that lacking a perfect pagination mechanism is a problem, not at all. I'm just saying that it *might* be an issue for specific use cases or specific domains, but that's just an assumption (or a gut feeling) :-)
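A note on the pagination arithmetic discussed above: if each group really is counted as one entry in the total, a client can page with ordinary offset math. The following is only a sketch in Python under that assumption. The function names are hypothetical, and `num_found` stands for whatever total the patched Solr reports, assumed here to count each collapsed group once.

```python
import math


def page_window(page, rows):
    """Return the (start, rows) pair for a 1-based page number."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must both be at least 1")
    return (page - 1) * rows, rows


def page_count(num_found, rows):
    """Number of pages needed when every group counts as one entry.

    num_found is assumed (not confirmed by the thread) to be the
    post-collapse total, with each group counted once.
    """
    if rows < 1:
        raise ValueError("rows must be at least 1")
    return math.ceil(num_found / rows)
```

If the reported total instead counted uncollapsed documents, `page_count` would overestimate the number of pages, which is the situation the second quoted paragraph describes.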
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794052#action_12794052 ]

Martijn van Groningen commented on SOLR-236:

Yes, I used his patch. I made a small bugfix and made sure that it is in sync with the latest trunk.
Re: [jira] Commented: (SOLR-236) Field collapsing
Yes, I used his patch.

2009/12/23 Noble Paul (JIRA):
> [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794050#action_12794050 ]
>
> Noble Paul commented on SOLR-236:
>
> Isn't the patch built on the one given by Shalin? The configuration looks different...

--
Kind regards,
Martijn van Groningen
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794050#action_12794050 ]

Noble Paul commented on SOLR-236:

Isn't the patch built on the one given by Shalin? The configuration looks different...
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793958#action_12793958 ]

Shalin Shekhar Mangar commented on SOLR-236:

@ttdi - Please post your questions to the solr-user mailing list. This issue is strictly for Solr-related development (not usage).
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793898#action_12793898 ]

ttdi commented on SOLR-236:

Hi Martijn van Groningen and experts,

When I use http://localhost:8080/search/?page=1, the results on page 1 are collapsed, but when I use http://localhost:8080/search/?page=2, only the results on page 2 are collapsed; collapsing is not applied across all records. I want to collapse all records while using pagination. How can I do that? Thanks!
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793820#action_12793820 ]

Stephen Weiss commented on SOLR-236:

bq. Are you using any extra field collapse features? Such as aggregate functions. Also the collapse groups you collapse on do these have large field values? I'm going over the code and re-consider the way stuff is cached right now.

No, we're very simple in our usage of the collapse features themselves; we don't even use the output that the collapse patch adds. However, we do facet on a number of fields in this query as well, and sort by a date field. We also use local filter queries which we exclude for the facets individually (my favorite new feature). This packs a lot more action into one query than we had been doing previously (without that, we were running 8+ queries to get the same information), and I was worried at first that this was the cause of the RAM consumption.

The field we are collapsing on is of type "pint"; it can be positive or negative depending on which system the document is coming in from. Each document has several stored fields, but a whole document's stored fields are always under 1K together (it's only image metadata - there's no body text in any of these documents; this is for an image search engine).
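For readers trying to picture the kind of request described above, here is a rough sketch in Python that assembles one query string combining collapsing, a tagged filter query that is excluded from one facet, and a date sort. The field names (`group_id`, `source`, `photographer`, `date`), the tag name `src` and the function name are made up for illustration and do not come from the thread. `collapse.field` is the parameter named in the issue description, and `{!tag=...}` / `{!ex=...}` are the Solr 1.4 local params for tagging and excluding filters.

```python
from urllib.parse import urlencode


def build_query(text, collapse_field, source_filter):
    """Build a URL-encoded Solr query string (all field names are hypothetical)."""
    params = [
        ("q", text),
        # Parameter from the SOLR-236 description: field used to group results.
        ("collapse.field", collapse_field),
        # Filter query tagged "src" so that individual facets can ignore it.
        ("fq", "{!tag=src}source:" + source_filter),
        ("facet", "true"),
        # This facet excludes the tagged filter; the next one does not.
        ("facet.field", "{!ex=src}source"),
        ("facet.field", "photographer"),
        ("sort", "date desc"),
    ]
    return urlencode(params)
```

Calling `build_query("sunset", "group_id", "wire")` yields a single query string that can be appended to a select URL, in place of the several separate requests mentioned in the comment.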
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793808#action_12793808 ]

Martijn van Groningen commented on SOLR-236:

bq. It almost maxed out a machine with 18GB devoted to jetty in about 20 minutes.

Hmmm, that doesn't seem right. This is an issue. Are you using any extra field collapse features, such as aggregate functions? Also, do the groups you collapse on have large field values? I'm going over the code and reconsidering the way stuff is cached right now.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793644#action_12793644 ]

Stephen Weiss commented on SOLR-236:

Quick note on the collapse cache: we just went into production with 1.4, and right away we had to turn off the collapse cache. This was with the 1.4 dist and the patch from 12/12. With the cache enabled, RAM consumption was through the roof on the production servers; I guess with the variety of queries coming in, it filled up very fast. It almost maxed out a machine with 18GB devoted to Jetty in about 20 minutes. We just used the sample config (maxSize=512), and it looks like there were about 60 entries in the cache before we restarted. We would see the memory usage jump by as much as 2% after just one query.

Without the cache the performance is still quite good (far better than what we had before), so we're not bothered, but it may indicate there needs to be more optimization there... Generally our consumption rarely goes over 50% on this machine unless we have a lot of commits coming in. The cache *did* provide some performance benefits on some of the queries that return large numbers of results (1M+), so it would be nice to have. Of course, it's possible that with our index these levels of RAM consumption would be unavoidable. I'm not sure if there are any further specifics I could provide that would be helpful; let me know.
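The observation above (about 60 entries against a configured maxSize=512, yet very high RAM use) suggests that a limit on entry count alone does not bound memory when individual entries are large. As a purely illustrative sketch, not the patch's cache and not a Solr API, the Python class below bounds a cache by an estimated total weight instead. The class name and the caller-supplied `weight` (for example, an estimated size in bytes) are assumptions.

```python
from collections import OrderedDict


class WeightedLRUCache:
    """LRU cache bounded by total weight rather than by entry count."""

    def __init__(self, max_weight):
        self.max_weight = max_weight
        self.total_weight = 0
        self._entries = OrderedDict()  # key -> (value, weight), oldest first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, weight):
        # Drop any previous entry for this key before accounting for the new one.
        if key in self._entries:
            self.total_weight -= self._entries.pop(key)[1]
        if weight > self.max_weight:
            return  # a single entry larger than the budget is never cached
        self._entries[key] = (value, weight)
        self.total_weight += weight
        # Evict least recently used entries until the budget is respected.
        while self.total_weight > self.max_weight:
            _, (_, evicted_weight) = self._entries.popitem(last=False)
            self.total_weight -= evicted_weight
```

With a budget like this, a handful of very large collapse results would push each other out instead of accumulating, at the cost of needing some size estimate per entry.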
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793621#action_12793621 ]

Yonik Seeley commented on SOLR-236:

bq. As far as I understand from your collapse algorithm proposal, in order to save memory you'd like to restrict the group creation to only those that belong in the requested results page.

A ton of memory, and probably a good amount of time too. It may be the only variant that certain people would be able to use (but note that it is just a variant - I'm not proposing doing away with the other options).

bq. I think there might be a problem with pagination as well

Yes, pagination is a sticky issue... but I don't think this algorithm messes it up further. If we are returning a number of documents (as opposed to a number of groups) to the user, how do they avoid splitting on a page in the middle of the group? I guess they over-request a little. What if they want a fixed number of groups? I guess they over-request by a lot (nGroups*collapse.threshold). Then they need to keep track of how many documents they actually used.

The only thing this algorithm can't do (related to pagination) is give the total number of documents after collapsing (and hence can't calculate the exact number of pages). This can be fine in many circumstances as long as the GUI handles it (people don't seem to mind Google doing it... I just tried it. Google didn't show the result count right unless displaying the last page).
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793607#action_12793607 ] Shalin Shekhar Mangar commented on SOLR-236:

{quote} This is exactly the point, it's not really meta-data over the document, but on the group the document belongs to. And you also need a more obvious way to mark this document as a group representation (to distinguish it from other normal documents). {quote}

We show the highest scoring document of a group, so does the fact that the metadata belongs to the group and not the document matter at all?

{quote} But extending the current element, doesn't mean we break BWC. Adding a (or ) sub element to it, will certainly not break anything, specially when we still don't have a formal xsd for the responses (I know we're working on it, but it's still not out there so it's safe). {quote}

We are not extending anything. We're just adding a couple of fields which may not exist in the index, and this is a capability we plan to introduce anyway (however, this issue does not need to depend on SOLR-1566). The response format remains exactly the same. There is no break in compatibility.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793565#action_12793565 ] Uri Boness commented on SOLR-236:

@Yonik As far as I understand from your collapse algorithm proposal, in order to save memory you'd like to restrict the group creation to only those that belong in the requested results page. Beyond losing the faceting support over the collapsed DocSet, I think there might be a problem with pagination as well. For every page you'll end up with a different total count and therefore a different number of pages. This can be very confusing from the user's perspective - imagine going to the first page and calculating (and displaying) that you have 3 pages of results; then, when the user asks for the second page, s/he gets a response with 2 pages and a different total count.
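The shifting page count can be reproduced with a small simulation. This is a sketch of one possible reading of the proposal, not the actual algorithm: it assumes the total is estimated as the raw hit count minus the duplicates discovered while filling the pages requested so far, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a per-page total estimate; not taken from any patch. */
final class PageCountSketch {

    /**
     * Estimates the post-collapse total as seen when serving the given page (0-based).
     * groups holds the collapse field value of every hit, in rank order.
     * Assumption: only the hits needed to fill pages 0..page are inspected.
     */
    static int estimatedTotal(List<String> groups, int page, int groupsPerPage) {
        int target = (page + 1) * groupsPerPage;
        Set<String> distinct = new HashSet<>();
        int scanned = 0;
        for (String group : groups) {
            // Stop before the first hit that would start a group beyond the requested page.
            if (distinct.size() == target && !distinct.contains(group)) {
                break;
            }
            distinct.add(group);
            scanned++;
        }
        int collapsedAway = scanned - distinct.size();
        return groups.size() - collapsedAway;
    }

    /** Ceiling division: number of pages needed for total entries. */
    static int pageCount(int total, int groupsPerPage) {
        return (total + groupsPerPage - 1) / groupsPerPage;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("a", "b", "c", "c", "d");
        for (int page = 0; page < 2; page++) {
            int total = estimatedTotal(hits, page, 2);
            System.out.println("page " + page + ": total=" + total
                    + " pages=" + pageCount(total, 2));
        }
    }
}
```

With these five hits and two groups per page, the first request reports 3 pages and the second reports 2, which is the situation described in the comment.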
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793554#action_12793554 ] Uri Boness commented on SOLR-236:

bq. Why is it wrong. it is about adding meta-info to the docs. This is what we plan to do with SOLR-1566

This is exactly the point: it's not really meta-data on the document, but on the group the document belongs to. And you also need a more obvious way to mark this document as a group representation (to distinguish it from other normal documents).

bq. Even when we collapse what we are expecting is simple search results. So a drastic deviation from the standard format is not a good idea.

I definitely agree that BWC should be kept, especially here where we're dealing with a query component. But extending the current element doesn't mean we break BWC. Adding a (or ) sub element to it will certainly not break anything, especially when we still don't have a formal xsd for the responses (I know we're working on it, but it's still not out there so it's safe).
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793534#action_12793534 ] Noble Paul commented on SOLR-236:

bq. I think mixing the collapse information with document fields is wrong

Why is it wrong? It is about adding meta-info to the docs. This is what we plan to do with SOLR-1566. Even when we collapse, what we are expecting is simple search results, so a drastic deviation from the standard format is not a good idea. Moreover, if we keep it in the document, parsing and processing stay simpler.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793476#action_12793476 ] Yonik Seeley commented on SOLR-236:

bq. You think that collapse.collectDiscardedDocuments.fl is better?

Is this something that's really needed? If so, some other name ideas could be collapse.discarded.fl and collapse.discarded.limit (it doesn't seem to be a good idea to have an unbounded number).

bq. Just one thought I had about the algorithm you propose. If you only create collapse groups for the top ten documents then what about the total count of the search? Unique documents outside the top ten documents are not being grouped (if I understand you correctly) and that would impact the total count with how it currently works.

Right - one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported. I do think that, just like faceting, there will be multiple ways of doing collapsing. Anyway, this is a great example of trying to make sure the interface doesn't preclude optimizations. Perhaps the total count of the search (numFound) should be pre-collapsing if collapse.facet=before, or perhaps it should always be pre-collapsing, and we should have another optional count for post-collapsing?
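The difference between the two counts discussed here can be made concrete with a small sketch. It is illustrative only: the names are hypothetical, and it assumes that collapsing keeps at most a fixed number of documents per group value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of pre- versus post-collapse hit counts. */
final class CollapseCountsSketch {

    /** Pre-collapsing count: every hit is counted, so no grouping work is needed. */
    static int preCollapseCount(List<String> groupValues) {
        return groupValues.size();
    }

    /**
     * Post-collapsing count: at most maxPerGroup hits are counted per group value.
     * Note that this has to visit every hit, which is the cost the proposed variant avoids.
     */
    static int postCollapseCount(List<String> groupValues, int maxPerGroup) {
        Map<String, Integer> seenPerGroup = new HashMap<>();
        int total = 0;
        for (String group : groupValues) {
            int seen = seenPerGroup.merge(group, 1, Integer::sum);
            if (seen <= maxPerGroup) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("a", "a", "a", "b", "c", "c");
        System.out.println(preCollapseCount(hits));
        System.out.println(postCollapseCount(hits, 1));
    }
}
```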
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793411#action_12793411 ] Uri Boness commented on SOLR-236:

@Shalin I think mixing the collapse information with document fields is wrong. The collapse fields don't really belong to the document, but to the group the document represents, while the other fields do belong to it. The response format should somehow indicate this difference.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793365#action_12793365 ] Martijn van Groningen commented on SOLR-236:

bq. We need to open a separate issue for the core related changes.

As you have probably noticed, I have split the patch into smaller patches and created sub issues for each patch.

bq. How about we change the current field collapsing response format to the following?

Looks okay at first sight.

bq. For this to work, CollapseComponent must generate a custom SolrDocumentList and set it as "results" in the response.

Maybe we need a more elegant solution for this. All these extra fields are calculated values. We could put the calculated values into some context, and the response writers could then look the values up in that context and write them to the response. Other functionality might also benefit from this solution, such as distances from a central point when doing a geo search. It is just an idea. I recall there is an issue in Jira that proposes something like this, but I couldn't find it.

bq. "collapse.aggregate" - Can we make this a multi-valued parameter instead of comma separated?

I think that is a good idea; other parameters (like fq) are also multi-valued.

BTW, I think we should continue further technical discussions in the sub issues. We have space there for a lot of comments :-)
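To make the multi-valued suggestion concrete, here is a minimal sketch of how a client might send collapse.aggregate as a repeated parameter, the way fq can already be repeated. The helper class is hypothetical; only java.net.URLEncoder from the JDK is used, and the function names max(field1) and avg(field1) are taken from the example in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical client-side helper for building a repeated query parameter. */
final class MultiValuedParamSketch {

    /** Emits name=value once per value, joined with '&', each part URL-encoded. */
    static String toQuery(String name, List<String> values) {
        StringBuilder query = new StringBuilder();
        for (String value : values) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(URLEncoder.encode(name, StandardCharsets.UTF_8))
                 .append('=')
                 .append(URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQuery("collapse.aggregate",
                List.of("max(field1)", "avg(field1)")));
    }
}
```

One possible benefit of the repeated form, offered here as an observation rather than something stated in the thread, is that a value containing a comma would not need any special escaping.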
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793101#action_12793101 ] Shalin Shekhar Mangar commented on SOLR-236:

How about we change the current field collapsing response format to the following? We add new well-known fields to the document itself, say:
# "collapse.value" - contains the group field's value for this document
# "collapse.count" - the number of results collapsed under this document
# "collapse.aggregate.function(field-name)" - the aggregate value for the given function applied to the given field for this document's group

Example: {code:xml} 0 2 manu_exact max(field1) avg(field1) title:test title collapse F8V7067-APL-KIT Belkin 1 100 50.0 TWINX2048-3200PRO Corsair Microsystems Inc. 3 100 50.0 {code}

No need to have another section and correlate based on uniqueKeys. For this to work, CollapseComponent must generate a custom SolrDocumentList and set it as "results" in the response.

For request parameters:
# "collapse.aggregate" - Can we make this a multi-valued parameter instead of comma separated?
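As a rough illustration of the proposed layout, the sketch below attaches the three well-known fields to a document represented as a plain map. It is not CollapseComponent code: the class is hypothetical, a java.util.Map stands in for a Solr document, and the pairing of the sample values (Belkin, 1, 100, 50.0) with field names is an assumption based on the flattened example in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a document carrying collapse fields; not Solr API. */
final class CollapsedDocSketch {

    /**
     * Returns a copy of the stored fields with the proposed collapse fields added.
     * aggregates maps a function expression such as "max(field1)" to its value.
     */
    static Map<String, Object> withCollapseFields(Map<String, Object> storedFields,
                                                  String groupValue,
                                                  int collapseCount,
                                                  Map<String, Double> aggregates) {
        Map<String, Object> doc = new LinkedHashMap<>(storedFields);
        doc.put("collapse.value", groupValue);
        doc.put("collapse.count", collapseCount);
        aggregates.forEach((function, value) ->
                doc.put("collapse.aggregate." + function, value));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "F8V7067-APL-KIT");

        Map<String, Double> aggregates = new LinkedHashMap<>();
        aggregates.put("max(field1)", 100.0);
        aggregates.put("avg(field1)", 50.0);

        System.out.println(withCollapseFields(stored, "Belkin", 1, aggregates));
    }
}
```

Because the group information travels inside each document, a client can read it without correlating a separate response section by uniqueKey, which is the point made in the comment.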
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793090#action_12793090 ] Noble Paul commented on SOLR-236:

We need to open a separate issue for the core-related changes.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793048#action_12793048 ]

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

bq. I support your suggestion on splitting this issue into two. i.e make the core changes in a separate patch . That is the plan anyway.

The changes in the core that should be in a separate patch are:
# SolrIndexSearcher
# DocSetHitCollector
# DocSetAwareCollector

The above files were changed for the following reasons:
# The getDocSet(...) methods in the SolrIndexSearcher did not allow me to specify a Lucene Collector, which I needed to get the uncollapsed docset while leveraging the Solr caches. I changed them so that I was able to do that.
# The patch also contains an extra getDocListAndSet(...) method that allows specifying a filter docset, which in the case of field collapsing is the collapsed docset.

The QueryComponent has changed as well. The only reason these changes were made was to support the pseudo-distributed field collapsing. Maybe a separate patch should be created for distributed field collapsing, with this change as a start.

Last but not least, the SolrJ code. I think a separate patch should be created for these changes as well. Maybe a sub-issue should be created in Jira for each patch. The rest of the files in the patch do not impact any core files and I think they should remain in one patch.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792997#action_12792997 ]

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

ttdi,
The latest patch is not in sync with the latest trunk. You can try to apply it to the trunk or use a previous patch for the 1.4 code.

Yonik,
The parameter descriptions are a bit poor. The response format of the older patches contains two separate lists of collapse group counts. The first is a list with counts per most relevant document id, enabled or disabled with the collapse.info.doc param. The second is a list with counts per field value of the most relevant document, controlled with the collapse.info.count param. Now that the response format has changed we should rename them to something more descriptive. Maybe something like collapse.showCount, which adds the collapse count to the collapse group in the response (defaults to true), and collapse.showFieldValue, which adds the field value of the most relevant document to the group (defaults to false)?

The collapse.maxdocs param specifies when to abort field collapsing: after n documents have been processed. I have never used it. I can imagine that one would use it to shorten the search time.

The collapse.includeCollapsedDocs.fl param enables a collapse collector that collects the documents that have been discarded and outputs the specified fields of the discarded documents to the field collapse response per collapse group (* for all fields). The parameter name does not reflect that behaviour entirely. Do you think that collapse.collectDiscardedDocuments.fl is better? Personally, however, I would not use this because of the negative impact it has on performance. Usually one wants to know something like the average / highest / lowest price of a collapse group. The AggregateCollapseCollector would fit those needs better.

bq. Should I be able to specify a completely different sort within a group? collapse.sort=... seems nice... what are the implications? One bit of strangeness: it would seem to allow a highly ranked document responsible for the group being at the top of the list being dropped from the group due to a different sort criteria within the group. It's not necessarily an implementation problem though (sort values for the group should be maintained separately).

I'm not sure about that. It would make things more complicated. Sorting the discarded documents in combination with the collapse.includeCollapsedDocs.fl functionality would maybe make more sense.

bq. The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach takes)? Or stick that extra info in with some of the documents somehow? Or if collapse=true, replace the list of documents with a list of groups, each which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

I think the latter would make more sense, because field collapsing does change the search result. It would just make it more obvious.

bq. Is there a way to specify the number of groups that I want back instead of the number of documents?

No, there is not, but if the list of documents is replaced with a list of groups then the rows parameter should be used to indicate the number of groups to be displayed instead of the number of documents to be displayed.

Just one thought I had about the algorithm you propose. If you only create collapse groups for the top ten documents, then what about the total count of the search? Unique documents outside the top ten documents are not being grouped (if I understand you correctly), and that would impact the total count given how it currently works.
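The point above about preferring per-group aggregates (average / highest / lowest price) over returning every discarded document can be illustrated with a small standalone sketch. It does not use the patch's AggregateCollapseCollector or its aggregate function classes; the `Doc` type, the `price` field, and the class name are hypothetical and exist only to show the idea of keeping a few running numbers per collapse group.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative sketch only: per-group min / max / average instead of full documents. */
final class GroupAggregateSketch {

    /** Hypothetical document: its collapse-field value plus one numeric field. */
    static final class Doc {
        final String group;
        final double price;

        Doc(String group, double price) {
            this.group = group;
            this.price = price;
        }
    }

    /** Keeps count, min, max and sum per group; no document is retained. */
    static Map<String, DoubleSummaryStatistics> aggregate(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
            d -> d.group,
            TreeMap::new, // sorted by group value for stable output
            Collectors.summarizingDouble(d -> d.price)));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("apple", 10.0), new Doc("apple", 30.0), new Doc("dell", 20.0));
        aggregate(docs).forEach((group, s) -> System.out.printf(
            "%s count=%d min=%.1f max=%.1f avg=%.1f%n",
            group, s.getCount(), s.getMin(), s.getMax(), s.getAverage()));
    }
}
```

The memory cost here is a fixed handful of numbers per group, which is the reason an aggregate is usually cheaper than collecting the fields of all discarded documents.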
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792916#action_12792916 ]

Yonik Seeley commented on SOLR-236:
-----------------------------------

First, thanks to everyone who has spent so much time working on this - lack of committer attention doesn't equate to lack of interest... this is a very much needed feature!

I'd agree with Erik that the most important thing is the interface to the client, and making it well thought out and semantically "tight". Martijn's recent improvements to the response structure are an example of improvements in this area.

It's also important to think about the interface in terms of how easy it will be to add further features, optimizations, and support for distributed search. If the code isn't sufficiently standalone, we also need to see how easily it fits into the rest of Solr (what APIs it adds or modifies, etc). Actually implementing performance improvements and more distributed search can come later - as long as we've thought about it now so we haven't boxed ourselves in.

It seems like field collapsing should just be additional functionality of the query component rather than a separate component, since it changes the results?

The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach takes)? Or stick that extra info in with some of the documents somehow? Or if collapse=true, replace the list of documents with a list of groups, each of which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

From the wiki:

collapse.maxdocs - what does this actually mean? I assume it collects arbitrary documents up to the max (normally by index order)? Does this really make sense? Does it affect faceting, etc? If it does make sense, it seems like it would also make sense for normal non-collapsed query results too, in which case it should be implemented at that level.

collapse.info.doc - what does that do? I understand counts per group, but what's count per doc?

collapse.includeCollapsedDocs.fl - I don't understand this one, and can't find an example on the wiki or blogs. It says "Parameter indicating to return the collapsed documents in the response"... but I thought documents were included up until collapse.threshold.

collapse.debug - should perhaps just be rolled into debugQuery, or another general debug param (someone recently suggested using a comma separated list... debug=timings,query, etc).

Should I be able to specify a completely different sort *within* a group? collapse.sort=... seems nice... what are the implications? One bit of strangeness: it would seem to allow a highly ranked document responsible for the group being at the top of the list being dropped from the group due to a different sort criteria within the group. It's not necessarily an implementation problem though (sort values for the group should be maintained separately).

Is there a way to specify the number of groups that I want back instead of the number of documents? Or am I supposed to just over-request (rows=num_groups_I_want*threshold) and ignore if I get too many documents back?

Random thought: We need a test to make sure this works with multi-select faceting (SimpleFacets asks for the docset of the base query...)

Distributed Search: should be able to use the same type of algorithm that faceting does to ensure accurate counts.

Performance: yes, it looks like the current code uses a *lot* of memory. Here's an algorithm that I thought of on my last plane ride that can do much better (assuming max() is the aggregation function):

{code}
=== two pass collapsing algorithm for collapse.aggregate=max

First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)
We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseC
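The first pass of the algorithm described in this comment can be sketched in plain Java as follows. This is a reading of the steps as listed, not code from any patch: the `Hit` type, the use of a float score as the sort value, the doc-id tie-break, and all names are assumptions. The second pass is cut off in the comment text, so it is not sketched here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative sketch only: first pass (collapseCount=1) keeping the top n groups. */
final class FirstPassCollapseSketch {

    /** Hypothetical hit: document id, collapse-field value, and sort value. */
    static final class Hit {
        final int docId;
        final String group;
        final float score;

        Hit(int docId, String group, float score) {
            this.docId = docId;
            this.group = group;
            this.score = score;
        }
    }

    // Lowest score first; doc id breaks ties so two distinct hits never compare equal
    // (a TreeSet would otherwise silently drop one of them).
    private static final Comparator<Hit> BY_SCORE =
        Comparator.<Hit>comparingDouble(h -> h.score).thenComparingInt(h -> h.docId);

    /** Returns the best hit of each of the top n groups, best group first. */
    static List<Hit> topGroups(Iterable<Hit> hits, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        TreeSet<Hit> queue = new TreeSet<>(BY_SCORE);    // priority queue with removal
        Map<String, Hit> bestByGroup = new HashMap<>();  // group -> its entry in the queue
        for (Hit hit : hits) {
            Hit current = bestByGroup.get(hit.group);
            if (current != null) {
                // Group is already queued: swap in the new hit only if it ranks higher.
                if (BY_SCORE.compare(hit, current) > 0) {
                    queue.remove(current);
                    queue.add(hit);
                    bestByGroup.put(hit.group, hit);
                }
            } else if (queue.size() < n) {
                // Queue not full yet: every new group is admitted.
                queue.add(hit);
                bestByGroup.put(hit.group, hit);
            } else if (BY_SCORE.compare(hit, queue.first()) > 0) {
                // New group beats the weakest queued group: evict that group.
                Hit evicted = queue.pollFirst();
                bestByGroup.remove(evicted.group);
                queue.add(hit);
                bestByGroup.put(hit.group, hit);
            }
            // Otherwise the hit is smaller than everything queued and is discarded.
        }
        return new ArrayList<>(queue.descendingSet());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit(1, "a", 0.9f), new Hit(2, "b", 0.5f), new Hit(3, "a", 0.95f),
            new Hit(4, "c", 0.7f), new Hit(5, "d", 0.6f));
        for (Hit h : topGroups(hits, 2)) {
            System.out.println(h.group + " doc=" + h.docId + " score=" + h.score);
        }
    }
}
```

Both the TreeSet and the HashMap hold at most n entries, which matches the efficiency note in the comment: memory depends on the number of groups requested, not on the number of matching documents.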
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792803#action_12792803 ]

ttdi commented on SOLR-236:
---------------------------

Hi experts, thanks for the great work!

I downloaded Solr 1.4 from http://apache.freelamp.com/lucene/solr/1.4.0/apache-solr-1.4.0.zip and applied this patch: SOLR-236.patch (2009-12-18 10:16 AM, Shalin Shekhar Mangar), like this:

G:\doc\apache-solr-1.4.0>patch.exe -p0 < SOLR-236.patch

It shows some errors. Does this patch (SOLR-236.patch, 2009-12-18 10:16 AM) not support Solr 1.4? The result is:

patching file src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file src/test/test-files/solr/conf/schema-fieldcollapse.xml
patching file src/test/test-files/solr/conf/solrconfig.xml
patching file src/test/test-files/fieldcollapse/testResponse.xml
can't find file to patch at input line 787
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|
|Property changes on: src/test/test-files/fieldcollapse/testResponse.xml
|___
|Added: svn:keywords
| + Date Author Id Revision HeadURL
|Added: svn:eol-style
| + native
|
|Index: src/test/org/apache/solr/BaseDistributedSearchTestCase.java
|===
|--- src/test/org/apache/solr/BaseDistributedSearchTestCase.java(revision 891214)
|+++ src/test/org/apache/solr/BaseDistributedSearchTestCase.java(working copy)
--
File to patch: SOLR-236.patch S: No such file or directory
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
patching file src/test/org/apache/solr/search/fieldcollapse/FieldCollapsingIntegrationTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapserTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/AdjacentCollapserTest.java
patching file src/test/org/apache/solr/handler/component/CollapseComponentTest.java
patching file src/test/org/apache/solr/client/solrj/response/FieldCollapseResponseTest.java
patching file src/java/org/apache/solr/search/DocSetAwareCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/CollapseGroup.java
patching file src/java/org/apache/solr/search/fieldcollapse/DocumentCollapseResult.java
patching file src/java/org/apache/solr/search/fieldcollapse/DocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AverageFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MinFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/SumFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MaxFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AggregateFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseContext.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/DocumentFieldsCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/AggregateCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/AbstractCollapseCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/AbstractDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/AdjacentDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/util/Counter.java
patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
patching file src/java/org/apache/solr/search/DocSetHitCollector.java
patching file src/java/org/apache/solr/handler/component/CollapseComponent.java
patching file src/java/org/apache/solr/handler/component/QueryComponent.java
Hunk #5 succeeded at 521 with fuzz 2.
Hunk #6 succeeded at 562 (offset -5 lines).
patching file src/java/org/apache/solr/util/DocSetScoreCollector.java
patching file src/common/org/apache/solr/common/params/CollapseParams.java
patching file src/solrj/org/apache/solr/client/solrj/SolrQuery.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 50.
Hunk #3 FAILED a
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792793#action_12792793 ]

Mark Miller commented on SOLR-236:
----------------------------------

bq. This is a huge difference, considering the number of non-committers involved in this issue.

It's not really any different than putting it in trunk. Non-committers can still post patches to the branch in JIRA, the same as if the issue was in trunk. Smaller, more focused patches. If there are no benefits to a branch in this regard, what is the argument for putting this in trunk for further dev? It might as well just stay in patch form until it's ready then.

bq. If your patch does not modify any existing files you never have to sync it w/ trunk. It is always synced.

You have to apply the patch. With a branch you have to type a merge command. It's the same effort - a single command.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792784#action_12792784 ]

Noble Paul commented on SOLR-236:
---------------------------------

bq. The main difference I see is that its easier for non committers to share updated patches

This is a huge difference, considering the number of non-committers involved in this issue.

If your patch does not modify any existing files you never have to sync it w/ trunk. It is always synced.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792781#action_12792781 ]

Mark Miller commented on SOLR-236:
----------------------------------

bq. On the other hand if the code lives in a branch it is more work to keep it synced w/ the trunk than the patch itself.

Is that true? Syncing a branch is the same as syncing a patch - non-conflicting changes are merged automatically and conflicts must be handled - the same with a patch or a branch. And a patch gets out of date just as easily as a branch.

The main difference I see is that it's easier for non-committers to share updated patches, whereas merging the branch will require the help of a committer if you want to share the merge with others. Anyone can check out the branch and merge with trunk though - it's literally the same effort as updating an out-of-date patch.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792777#action_12792777 ] Noble Paul commented on SOLR-236: bq. Solr already has a few places where the response format is still marked as experimental and as subject to changes in the future Marking the output format as experimental is just trying to be safe. We strive hard to ensure that we don't change it, or that if we do, the change is not disruptive. So let us not take this as an excuse to be lax in reviewing the public API. On keeping a separate branch: I would say a branch is less useful than a patch. If the patch applies to the trunk, I can be sure that I have the latest and greatest stuff. On the other hand, if the code lives in a branch it is more work to keep it synced w/ the trunk than the patch itself. @Uri I support your suggestion on splitting this issue into two, i.e. making the core changes in a separate patch. That is the plan anyway.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792686#action_12792686 ] Uri Boness commented on SOLR-236: Essentially it boils down to two options:
# Keep it out of the trunk, in which case users who need this functionality will only get it by working with a patched Solr version of their own, or by using a branch (in both cases, they will most likely miss the continuous work done on the trunk unless they keep merging the changes)
# Keep it in the trunk with some caveats, in which case the users have a chance to use this functionality out of the box
In both cases, the users have a choice to make:
- be satisfied with the performance of this feature
- look for an alternative solution (other products)
- give up this functionality altogether (if their business requirements allow that)
So the main difference here, I would say, is in how easy you'd like to make it for users to get this functionality. On the Solr development side, indeed once this is committed to the trunk there's much more responsibility on the committers to make it work (enhance performance and fix bugs)... but this is a *good* thing, as there is a high demand for this feature and, as a community-driven project, this demand should be satisfied. And I *do* think that the number of users using this patch already is a good indicator that it is good enough for quite a lot of use cases. I do agree though that before committing anything, the public API should be re-evaluated to minimize the chances of BWC issues later on. BTW, regarding the response, Solr already has a few places where the response format is still marked as experimental and as subject to changes in the future (but it doesn't stop people from using this functionality, as they take the responsibility to adapt to any such future changes when they come).
Now... writing this, it suddenly occurred to me that there might be another solution to this whole discussion, which is in a way a combination of many of the suggestions in this thread. What if this patch were split in two: the changes to the core and the component itself? Now, if the changes to the core are not that drastic and make sense (or at least everyone can live with them) then perhaps they can be committed to the trunk. As for the rest of the patch (which consists of the search components and their supporting classes), this can be put in SVN as a separate branch for contrib. The good thing about this solution is that the work done on this functionality will be in SVN, so you benefit from it as David mentioned above. The other benefit is that with this layout you can actually build the branched code base separately and distribute this functionality as a separate jar which can be deployed in a Solr 1.5x distribution. Again, a bit of work is left to the users (too much for my taste) but at least they're not forced to use a patched version of Solr. Would that be a possible solution?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792587#action_12792587 ] Patrick Eger commented on SOLR-236: Hi, this is possibly not important, but I would like to give my perspective as a user. Specifically, the code is very much production ready in our opinion, albeit under a limited set of circumstances that we are comfortable with (< 5 million docs, no distributed search). Within those confines it works great and satisfies our needs, and we are more than willing to pay the performance hit since it's absolutely essential to the correct functionality. I suppose I'd disagree with the assertion that the performance is "unacceptable", as I think that is a value judgement each user will have to make. Modulo the discussion about the request format, output format and config (stuff that is hard to change later), I would much rather have the code be in and documented, with those caveats clearly spelled out and probably tracked in separate JIRA issues, i.e. DO NOT USE IF SHARDING, >5 million docs, etc. Again, just my 2c as a satisfied user.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792539#action_12792539 ] Grant Ingersoll commented on SOLR-236: I'm not sold on the output yet, either. Have we considered making it inline? We're getting more and more parallel arrays we need to consider. I think that with the other Solr issues that are looking at pseudo-fields and the ability for components to add results, we could rework these things. Also, why don't the aggregate functions just work w/ all the existing functions?
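The difference between a parallel structure and inline output, as raised in Grant Ingersoll's comment, can be sketched in a few lines of Python. The response layout here is purely hypothetical: the thread does not show the patch's actual output format, and `inline_collapse_info`, `collapse_count` and the sample data are names invented for this illustration.

```python
def inline_collapse_info(docs, collapse_counts, key="id", target="collapse_count"):
    """Return copies of `docs` with per-document collapse info merged in.

    `collapse_counts` plays the role of a parallel structure that a client
    would otherwise have to join against the document list by `key`.
    """
    merged = []
    for doc in docs:
        enriched = dict(doc)  # copy, so the input documents stay untouched
        enriched[target] = collapse_counts.get(doc[key], 0)
        merged.append(enriched)
    return merged


# Hypothetical "parallel" response: documents in one place, counts in another.
docs = [{"id": "doc1", "site": "a.example"}, {"id": "doc3", "site": "b.example"}]
collapse_counts = {"doc1": 2}

inline_docs = inline_collapse_info(docs, collapse_counts)
```

In the inline form each document carries its own collapse information, so a client does not need to keep two structures aligned; the cost is that the document entries gain a field that is not stored in the index, which is where the pseudo-field work mentioned in the comment would come in.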
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792535#action_12792535 ] Noble Paul commented on SOLR-236: The main problem with the patch is that the performance/resource consumption is unacceptable.
* Is it true that the perf cost is avoidable?
* Or are there implementation details which can be optimized?
We are working to make it ready for trunk. So anything that helps us move towards that objective is welcome.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792518#action_12792518 ] Mark Miller commented on SOLR-236: bq. I very much disagree with a policy blocking non-production-ready code from being in source control Just to be clear, there is no such policy that I've seen - each decision just comes down to consensus. And as far as I know, our branch policy is pretty much "anything goes" - trunk is very different than svn. Anyone can play around with a branch for anything if they want. I agree with your thoughts on a branch - if the argument is that we want it to be easier for devs to check out and work on this, or for users to check out and build this without applying patches, why not just make a branch? Merging is annoying but not difficult - I've been doing plenty of branch merging lately, and while it's not glorious work, modern tools make it more of a grind than a challenge.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792514#action_12792514 ] David Smiley commented on SOLR-236: I've been watching this thread forever without saying anything, but I want to offer my two cents and then I'll butt out. I very much disagree with a policy blocking non-production-ready code from being in source control. All code starts off this way and it would be quite a shame not to leverage the advantages of source control simply because it isn't ready yet. If people are uncomfortable with it being in trunk then _simply_ use a branch. Of course, how simple "simple" is depends on one's comfort with source control, the particular source control technology used, and the tools that help you (e.g. IDEs). By the way, git makes "feature branches" (which is what this would be) easy to manage and integrates bidirectionally with subversion. If you're not comfortable with branching because you're not familiar with it then you need to learn. By "you" I don't mean anyone in particular, I mean all professional software developers. Source control and branching are tools of our trade.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792510#action_12792510 ] Mark Miller commented on SOLR-236: bq. (Faceting got a 50 times perf boost in 1.4) No it didn't. Certain cases have gotten a boost (I think you might be referring to multi-field faceting cases?). And general faceting was always relatively fast and scalable. I'm against committing features to trunk with a warning that the feature is not ready for trunk.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792509#action_12792509 ] Noble Paul commented on SOLR-236: bq. This patch has quite a resource/performance hit. I've seen and read about the resource hit. It's rather large. The performance price is paid only if you use this component. Having the functionality itself in Solr is quite important. Performance can obviously be improved. (Faceting got a 50 times perf boost in 1.4). As long as the performance of the component is within the acceptable range we should leave that call to the user. The cost actually depends on the data set too. As long as the component has a correct public API (req params/response format/configuration) I believe it can be committed with a clear warning.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792477#action_12792477 ]

Mark Miller commented on SOLR-236:

I'm with Grant on this one. Trunk is not a sandbox, and getting more developer attention is not a good reason to put something in trunk. Issues should go in when they are ready. Tons of interest and votes don't mean rush to trunk. If that type of thing moves you, it means start putting some work into it to make it ready for trunk.

This patch has quite a resource/performance hit. I've seen and read about the resource hit. It's rather large. The performance hit is not any better. The linked blog post puts performance with collapsing at 5-10 times slower than without.

Personally, I don't think this issue is ready for trunk.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792458#action_12792458 ]

Uri Boness commented on SOLR-236:

bq. I'm curious as to whether anyone has just thought of using the Clustering component for this? If your "collapse" field was a single token, I wonder if you would get the results you're looking for.

The main difference between the two components is this: the clustering component works more as a function, where the input is the doclist/docset and the output is a separate data structure representing the groups. The collapse component operates directly on the docset and doclist, modifies them, and incorporates the groups within the final search result. In all cases where we found the need for the collapse component, we needed to incorporate the grouping within the search result and adjust the sorting and the pagination accordingly. As far as I know you cannot do that with the clustering component. This tight integration with the result is also the reason why the collapse component right now is actually a replacement for the query component.
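To make the point about pagination concrete, here is a minimal sketch in plain Java. It is not code from the patch and uses no Solr classes: the Hit class, the collapse and page methods, and the sample data are all hypothetical names chosen for illustration. It assumes non-adjacent collapsing with a fixed maximum number of hits per field value, and shows that start/rows are applied to the collapsed list rather than to the raw hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollapsedPagingSketch {
    /** Hypothetical stand-in for a search hit: a document id plus its collapse field value. */
    static final class Hit {
        final String id;
        final String group;

        Hit(String id, String group) {
            this.id = id;
            this.group = group;
        }

        @Override
        public String toString() {
            return id;
        }
    }

    /** Keeps at most maxPerGroup hits per group value, preserving the incoming rank order. */
    static List<Hit> collapse(List<Hit> ranked, int maxPerGroup) {
        Map<String, Integer> seen = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : ranked) {
            // merge returns the updated count for this group value
            int count = seen.merge(hit.group, 1, Integer::sum);
            if (count <= maxPerGroup) {
                kept.add(hit);
            }
        }
        return kept;
    }

    /** Applies start/rows to the collapsed list, so paging counts surviving hits only. */
    static List<Hit> page(List<Hit> collapsed, int start, int rows) {
        int from = Math.min(start, collapsed.size());
        int to = Math.min(from + rows, collapsed.size());
        return collapsed.subList(from, to);
    }

    public static void main(String[] args) {
        List<Hit> ranked = Arrays.asList(
            new Hit("1", "a.com"), new Hit("2", "a.com"), new Hit("3", "b.com"),
            new Hit("4", "a.com"), new Hit("5", "c.com"));
        List<Hit> collapsed = collapse(ranked, 1);
        System.out.println(page(collapsed, 0, 2)); // first page of the collapsed result
        System.out.println(page(collapsed, 2, 2)); // second page of the collapsed result
    }
}
```

A component that only returns groups as a separate structure would leave the original doclist, and therefore the page boundaries, untouched; that is the difference described above.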
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792446#action_12792446 ]

Grant Ingersoll commented on SOLR-236:

I'm curious as to whether anyone has just thought of using the Clustering component for this? If your "collapse" field was a single token, I wonder if you would get the results you're looking for.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792426#action_12792426 ]

Martijn van Groningen commented on SOLR-236:

For Shalin:

{quote}
I just don't think that we should introduce new tags and new kinds of components in solrconfig.xml, particularly those that are useful to only a single component. That introduces changes in SolrConfig.java so that it knows how to load such things. That is why I moved that configuration inside CollapseComponent. Ideally, all components will use PluginInfo and load whatever they need from their own PluginInfo object and SolrConfig would not need to be changed unless we introduce new kinds of Solr plugins.
{quote}

I agree about the PluginInfo and I think it is the right place for the field collapse config.

bq. Just curious, what would be a use-case for sharing factories (other than reducing duplication of configuration) and having multiple CollapseComponent?

Besides differently configured CollapseCollectorFactories, none.

bq. I don't think we need to add that functionality to CoreContainer and SolrDispatchFilter. It is still possible to specify a different solrconfig and schema for a test. Let me see if I can make this work with BaseDistributedSearchTestCase

That would be great!
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792350#action_12792350 ]

Shalin Shekhar Mangar commented on SOLR-236:

For Martijn:

{quote}
The reason I added ... was to be able to support sharing of collapseCollectorFactory instances between different collapse components in the near future. Do you think that is a valid reason? Or do you think that collapseCollectorFactories shouldn't be shared?
{quote}

I just don't think that we should introduce new tags and new kinds of components in solrconfig.xml, particularly those that are useful to only a single component. That introduces changes in SolrConfig.java so that it knows how to load such things. That is why I moved that configuration inside CollapseComponent. Ideally, all components will use PluginInfo and load whatever they need from their own PluginInfo object, and SolrConfig would not need to be changed unless we introduce new kinds of Solr plugins.

Just curious, what would be a use-case for sharing factories (other than reducing duplication of configuration) and having multiple CollapseComponent?

{quote}
The CollapseComponentTest was failing. The field collapseCollectorFactories in CollapseComponent was null when not specifying any collapse collector factories in the solrconfig.xml, which resulted in an NPE.
{quote}

Oops, sorry about that. I only ran the tests inside org.apache.solr.search.fieldcollapse. I didn't notice there are other tests too. Thanks!

bq. The DistributedFieldCollapsingIntegrationTest is still failing, because you left out changes in JettySolrRunner, CoreContainer and SolrDispatchFilter from my original patch.

I don't think we need to add that functionality to CoreContainer and SolrDispatchFilter. It is still possible to specify a different solrconfig and schema for a test. Let me see if I can make this work with BaseDistributedSearchTestCase.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792349#action_12792349 ]

Noble Paul commented on SOLR-236:

bq. I think that is all the more reason why it needs to be done right and not just be a "good start".

The fact that it has been around for so long means that the "good start" is gonna take longer to happen. In my opinion, we should fix the obvious stuff and commit this with a clear warning in the javadocs and wiki that this has perf issues and that the code/API/configuration may change incompatibly in the future.

bq. Committed stuff I'll try out easier than patches actually.

+1. There is a better chance of developers taking a look at it if it is already in the trunk.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792199#action_12792199 ]

Grant Ingersoll commented on SOLR-236:

{quote}
Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later.
{quote}

Which is why it should not go in unless it is ready. A large patch that isn't right should not be committed just b/c it's been around for a while and is "hard to maintain". The problem w/ committing something that isn't ready is that we then have to do even more work to maintain it, thus taking away from the opportunity to make it better.

As for the voting and the popularity, I think that is all the more reason why it needs to be done right and not just be a "good start". With this many eyes on it, it should be easy to get people testing it and giving feedback. If the issue is that the patch is too big, then perhaps it needs to be broken up into smaller pieces that lay the framework for field collapsing to work.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792191#action_12792191 ]

Erik Hatcher commented on SOLR-236:

I'll just add my 0,02€: the main thing to vet now that it works (first make it work) is the interface to the client. Are the request params ideal? Is the response data structure locked down? If so, get this committed ASAP and iterate on the internals of distributed and performance issues.

Admittedly I've not tried this feature out myself though. Committed stuff I'll try out easier than patches actually.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792190#action_12792190 ]

Martijn van Groningen commented on SOLR-236:

I have updated the response examples on the wiki.

Some time ago I tried to come up with an *accurate* distributed solution, but I ran into a problem, as I have described in a previous comment:

{quote}
Field collapsing keeps track of the number of documents collapsed per unique field value and the total count of documents encountered per unique field value. If the total count is greater than the specified collapse threshold, then the number of documents collapsed is the difference between the total count and the threshold. Let's say we have two shards, and each shard has one document with the same field value. The collapse threshold is one, meaning that if we run the collapsing algorithm on each shard individually, neither document will ever be collapsed. But when the algorithm is applied to both shards together, one of the documents must be collapsed; however, neither shard knows that its document is the one to collapse.

There are more situations like the one described above, but it all boils down to the fact that each shard does not have meta information about the other shards in the cluster. Sharing the intermediate collapse results between the shards is in my opinion not an option. This is because if you do that, then you also need to share information about documents / fields that have a collapse count of zero. This is totally impractical for large indexes.
{quote}

I'm really curious how others have addressed this issue. I have not stumbled on any literature on this particular issue; maybe someone else has.
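The two-shard scenario in the quoted comment can be reproduced with a few lines of plain Java. This is only an illustrative sketch under stated assumptions, not the patch's algorithm: the class and method names are hypothetical, each shard is reduced to a list of collapse field values, and "collapsed" is computed as total count minus threshold, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardCollapseSketch {
    /**
     * For each field value, returns how many documents exceed the threshold,
     * i.e. max(0, total - threshold). Hypothetical helper, not a Solr API.
     */
    static Map<String, Integer> collapsedCounts(List<String> fieldValues, int threshold) {
        Map<String, Integer> totals = new HashMap<>();
        for (String value : fieldValues) {
            totals.merge(value, 1, Integer::sum);
        }
        Map<String, Integer> collapsed = new HashMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            collapsed.put(entry.getKey(), Math.max(0, entry.getValue() - threshold));
        }
        return collapsed;
    }

    public static void main(String[] args) {
        int threshold = 1;
        // Each shard holds one document with the same field value "x".
        List<String> shard1 = Arrays.asList("x");
        List<String> shard2 = Arrays.asList("x");

        // Run per shard: neither shard sees more than one document for "x".
        System.out.println("shard1: " + collapsedCounts(shard1, threshold));
        System.out.println("shard2: " + collapsedCounts(shard2, threshold));

        // Run over the combined data: one of the two documents has to be collapsed.
        List<String> combined = new ArrayList<>(shard1);
        combined.addAll(shard2);
        System.out.println("combined: " + collapsedCounts(combined, threshold));
    }
}
```

Summing the per-shard counts gives zero collapsed documents for "x", while the combined run gives one. To close that gap a coordinator would need the per-value totals from every shard, including values whose local collapse count is zero, which is the cost the comment calls impractical for large indexes.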
> Field collapsing > > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.3 >Reporter: Emmanuel Keller >Assignee: Shalin Shekhar Mangar > Fix For: 1.5 > > Attachments: collapsing-patch-to-1.3.0-dieter.patch, > collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, > collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, > field-collapse-4-with-solrj.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, > field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, > field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, > field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, > field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, > quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, > SOLR-236_collapsing.patch > > > This patch include a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given > field to a single entry in the result set. Site collapsing is a special case > of this, where all results for a given web site is collapsed into one or two > entries in the result set, typically with an associated "more documents from > this site" link. See also Duplicate detection." 
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation add 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before > collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
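As a rough illustration of how the three query parameters from the issue description might be passed, the sketch below builds a request URL. The parameter names are taken from the description above and may differ in later patches; the host, port, path and the field name "site" are assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path to your setup.
base_url = "http://localhost:8983/solr/select"

params = {
    "q": "solr",
    "collapse.field": "site",   # hypothetical field to group results on
    "collapse.type": "normal",  # "normal" (default) or "adjacent"
    "collapse.max": 1,          # results allowed before collapsing starts
}

query_url = base_url + "?" + urlencode(params)
print(query_url)
```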
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792189#action_12792189 ] Uri Boness commented on SOLR-236: {quote} Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later. {quote} I think managing a separate branch will be just as hard as managing a patch. I do however agree that it's about time this patch was committed to the trunk. Even though the current solution is not scalable in terms of distributed search (and I agree that the current solution for that is not really a viable solution), many are already using it and it is the most wanted feature in JIRA after all. One thing you can do is apply the changes to the core (there are not really many) and commit the rest of the patch as a contrib (along with all the disclaimers Shalin mentioned above).
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792115#action_12792115 ] Shalin Shekhar Mangar commented on SOLR-236: {quote} I'd define large scale for this in a couple of ways: 1. Lots of docs in the result set (10K+) 2. Lots of overall docs (100M+) 3. Lots of queries (> 10 QPS) {quote} Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later. What do you think? 
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791995#action_12791995 ] Oleg Gnatovskiy commented on SOLR-236: Grant - I agree regarding the current distributed implementation. The implementation is pretty much pseudo-distributed and would cause many companies (ours included) to have to completely restructure their indexes. What we tried long ago was to have the process method on each shard return the id that is being collapsed on, along with the documentId and score. Then, in mergeIds, we would do another level of collapse: basically keeping only one of the documents with a given collapseId and removing the others from all other shards. Obviously this caused several problems, not the least of which was that facet counts would always be slightly off, since we might have removed a document that was counted by the facetComponent.
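The second-level collapse described in this comment can be sketched roughly as follows. This is not the code that was actually tried; the function name merge_ids, the tuple layout and the choice to keep the highest-scoring document per collapse id are assumptions for illustration (the comment does not say which document was kept).

```python
def merge_ids(shard_results):
    """Merge per-shard hits, keeping one hit per collapse id.

    Each hit is a (collapse_id, doc_id, score) tuple. The best-scoring
    hit per collapse id survives (an assumption, see the lead-in).
    Returns the surviving hits sorted by descending score.
    """
    best = {}
    for hits in shard_results:
        for collapse_id, doc_id, score in hits:
            current = best.get(collapse_id)
            if current is None or score > current[2]:
                best[collapse_id] = (collapse_id, doc_id, score)
    return sorted(best.values(), key=lambda hit: hit[2], reverse=True)


# Hypothetical hits from two shards that were already collapsed locally.
shard_1 = [("example.com", "doc1", 0.9), ("apache.org", "doc2", 0.5)]
shard_2 = [("example.com", "doc7", 0.7)]

merged = merge_ids([shard_1, shard_2])
print(merged)
```

In this sketch doc7 is dropped during the merge even though its shard may already have counted it in its facets, which is the kind of facet count discrepancy the comment mentions.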
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791992#action_12791992 ] Grant Ingersoll commented on SOLR-236: -- {quote} I think you also referring to sharding. Sharding is supported, but not in a very elegant way. You will need to partition your documents to your shards in such a way that all documents belonging to a collapse group appear on one shard. To be honest I have never tested the patch on a corpus of 100M docs. {quote} That doesn't seem good and I don't think it will work w/ all the distributed work going on. I will likely have some time next week to help out. Has anyone looked at how Google or others do this? Clearly they collapse at very large scale w/ no noticeable detrimental effect. Anyone looked at the literature on this? bq. The first two response examples are for 'old' patches. The last response example is for the more recent patches (and current patch). OK, good to know. Can you update the page to reflect the latest patch? 
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791986#action_12791986 ] Martijn van Groningen commented on SOLR-236:

Shalin.
1. This configuration also looks fine to me. The reason I added ... was to be able to support sharing of collapseCollectorFactory instances between different collapse components in the near future. Do you think that is a valid reason? Or do you think that collapseCollectorFactories shouldn't be shared?
2. I forgot to create that, so it's a good thing you added it.
3. I think leaving out those changes will make the distributed integration tests fail (I haven't checked it).

Noble.
1. The reason I gave a name to collapseCollectorFactory was to be able to use an instance twice for different collapse components.
2. Moving the classname to the class attribute looks better than having it in the function element, so I think we should change that.

Grant.
1. I think you are also referring to sharding. Sharding is supported, but not in a very elegant way. You will need to partition your documents across your shards in such a way that all documents belonging to a collapse group appear on one shard. To be honest, I have never tested the patch on a corpus of 100M docs.
2. Field collapsing can impact the search time in a very negative way. I wrote a small paragraph about it on my [blog|http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/].
3. The first two response examples are for 'old' patches. The last response example is for the more recent patches (and the current patch).
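One way to satisfy the partitioning requirement mentioned under "Grant. 1." is to choose the shard from the collapse field value alone. The sketch below is a minimal illustration under that assumption; it is not part of the patch, and the function name shard_for, the sample documents and the use of CRC32 are all hypothetical choices.

```python
import zlib


def shard_for(collapse_value, num_shards):
    """Pick a shard from the collapse field value alone.

    A stable hash (CRC32) is used so that every document with the same
    collapse value is routed to the same shard.
    """
    return zlib.crc32(collapse_value.encode("utf-8")) % num_shards


# Hypothetical documents: (doc id, value of the collapse field).
docs = [("doc1", "example.com"), ("doc2", "apache.org"), ("doc3", "example.com")]
num_shards = 4

placement = {doc_id: shard_for(site, num_shards) for doc_id, site in docs}
print(placement)
```

With this routing, doc1 and doc3 always land on the same shard, so each shard can collapse its groups without knowing about the others. The cost is that one very large collapse group can unbalance the shards.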
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791977#action_12791977 ] Grant Ingersoll commented on SOLR-236: Is there a typo on the http://wiki.apache.org/solr/FieldCollapsing page in regards to the outputs? There are two different output results, but the URLs for the examples are the same. See http://wiki.apache.org/solr/FieldCollapsing#Examples. I think the second one is intended to show a collapse count for fields? Also, I'm not sold on having collapse elements separate from the actual response (I know other things do it too, so it isn't a huge deal), but the list of "parallel arrays" that one needs to traverse in order to render results is growing (highlighter, MLT, now Field Collapsing).
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791972#action_12791972 ] Grant Ingersoll commented on SOLR-236: I'd define large scale for this in a couple of ways: 1. Lots of docs in the result set (10K+) 2. Lots of overall docs (100M+) 3. Lots of queries (> 10 QPS)
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791968#action_12791968 ] Stephen Weiss commented on SOLR-236: How do we define "large scale"? I have an index of about 5 million docs. Does that qualify? I'm working on it right now, I can run whatever benchmarks you like.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791953#action_12791953 ] Grant Ingersoll commented on SOLR-236: bq. Does anybody have a reason for why this should not be committed to trunk as it stands right now? It's been a while, but the last time I looked at it (3-4 mos. ago) I had the impression that it wouldn't scale. Has anyone benchmarked this at large scale?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791952#action_12791952 ]

Noble Paul commented on SOLR-236:

Shalin, the names may not be necessary on the collapseCollectorFactory because they are never referred to by name.

> Field collapsing
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
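As an illustration of the three parameters named in the quoted issue description, here is a minimal standalone sketch of what "adjacent" and "normal" collapsing with a maximum could mean for a ranked list of field values. It is not code from any of the attached patches. The class name `CollapseSketch` and both method names are hypothetical, and reading "normal" as collapsing across the whole result list is an assumption that the description does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the SOLR-236 patches.
public class CollapseSketch {

    /**
     * "adjacent" reading: keep at most max consecutive results that share
     * the same field value. Returns the positions of the results kept.
     * Field values are assumed to be non-null.
     */
    public static List<Integer> collapseAdjacent(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            // Extend the current run, or start a new one when the value changes.
            run = value.equals(previous) ? run + 1 : 1;
            previous = value;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    /**
     * Assumed "normal" reading: keep at most max results per field value
     * over the whole list, whether or not they are adjacent.
     */
    public static List<Integer> collapseNormal(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (int i = 0; i < fieldValues.size(); i++) {
            // merge() returns the updated count for this field value.
            int count = seen.merge(fieldValues.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One "site" value per ranked result, best result first.
        List<String> sites = Arrays.asList("a.com", "a.com", "b.com", "a.com", "a.com", "a.com");
        System.out.println(collapseAdjacent(sites, 1)); // [0, 2, 3]
        System.out.println(collapseNormal(sites, 1));   // [0, 2]
    }
}
```

In this sketch the field values play the role of `collapse.field`, the choice of method plays the role of `collapse.type`, and `max` plays the role of `collapse.max`. The real component works on Lucene document ids and also reports how many documents were collapsed, which is omitted here.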
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791850#action_12791850 ]

Martijn van Groningen commented on SOLR-236:

Well, that is nice to hear, Stephen :-). I think I will add a 1.4-compatible patch to the issue, so people do not have problems while patching.

Shalin, I think it is a good idea to add the patch to the trunk as it is. The patch is quite stable now. For any future work related to field collapsing we should open new issues (this is the longest issue I've ever seen). Does anyone else have a reason why field collapsing shouldn't be committed to the trunk?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791836#action_12791836 ]

Shalin Shekhar Mangar commented on SOLR-236:

Does anybody have a reason why this should not be committed to trunk as it stands right now?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791672#action_12791672 ]

Stephen Weiss commented on SOLR-236:

Martijn,

I'm about to upgrade our production servers to Solr 1.4 with this latest patch you just posted and the difference is *incredible*. The time from startup to first collapsed query results has gone from 90 down to about 20 seconds, and subsequent searches seem to execute about twice as fast on average. SOLR-236 has come a very long way in the year since we last patched. Thanks for all the hard work, it's truly great.

FYI, it doesn't patch cleanly against the 1.4 distribution tarball, but I don't even understand what the conflict is. Reading the patch, the original code in the area that failed looked identical to what the patch was expecting:

(in QueryComponent.java)

       sreq.params.remove(ResponseBuilder.FIELD_SORT_VALUES);   // this was there
+
+      // disable collapser
+      sreq.params.remove("collapse.field");
+
       // make sure that the id is returned for correlation.   // and so was this?

Maybe it's a whitespace issue? Anyway, it works fine if you just paste it in place.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789550#action_12789550 ]

Chad Kouse commented on SOLR-236:

Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE). The patch did NOT install cleanly (1 hunk failed), but I couldn't really tell why, since it looked like it should have worked. I just manually copied the hunk into the right class.

Sorry I didn't note what failed.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789381#action_12789381 ]

Marc Menghin commented on SOLR-236:

Hi,

I'm new to Solr, so sorry for my likely still incomplete setup. I got everything from Solr SVN and applied the patch (field-collapse-5.patch, 2009-12-08 09:43 PM).

When I search I get an NPE because I do not seem to have a cache for the collapsing. The code wants to add an entry to the cache but can't, because there is none at that time. It checks for the cache earlier in AbstractDocumentCollapser.collapse, but still tries to use it later in AbstractDocumentCollapser.createDocumentCollapseResult. I suppose that's a bug? Or is something wrong on my side?

The exception I get is:

java.lang.NullPointerException
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:278)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

I fixed it locally by only adding something to the cache if there is one (fieldCollapseCache != null). But I'm not very familiar with the code, so I'm not sure whether that's a good or correct way to fix it.

Thanks,
Marc
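The local fix Marc describes (only writing to the cache when one is configured) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the actual AbstractDocumentCollapser code. The class `NullSafeCollapseCacheSketch`, the `Map`-based cache, and the string result are all hypothetical stand-ins, and only the `fieldCollapseCache != null` guard comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the collapser; only the null guard mirrors the comment.
public class NullSafeCollapseCacheSketch {

    // null means that no collapse cache has been configured.
    private final Map<String, String> fieldCollapseCache;

    public NullSafeCollapseCacheSketch(Map<String, String> fieldCollapseCache) {
        this.fieldCollapseCache = fieldCollapseCache;
    }

    /** Builds a placeholder collapse result and caches it only if a cache exists. */
    public String createCollapseResult(String cacheKey) {
        String result = "collapsed:" + cacheKey; // placeholder for the real collapse result
        if (fieldCollapseCache != null) {
            // Without this guard, put() on a missing cache throws a NullPointerException.
            fieldCollapseCache.put(cacheKey, result);
        }
        return result;
    }

    public static void main(String[] args) {
        // No cache configured: the result is still produced, nothing is stored.
        System.out.println(new NullSafeCollapseCacheSketch(null).createCollapseResult("q1"));

        // Cache configured: the result is produced and stored under its key.
        Map<String, String> cache = new HashMap<>();
        new NullSafeCollapseCacheSketch(cache).createCollapseResult("q2");
        System.out.println(cache.containsKey("q2"));
    }
}
```

Whether a missing cache should be tolerated this way, or treated as a configuration error, is a design decision for the patch authors. The guard only avoids the exception.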
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781279#action_12781279 ]

German Attanasio Ruiz commented on SOLR-236:

Tomorrow I'm going to try the patch. Next time I hope to help, and not only report the problem.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779418#action_12779418 ]

Martijn van Groningen commented on SOLR-236:

I can confirm this bug. I will attach a new patch that fixes this issue shortly. Thanks for noticing.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779061#action_12779061 ]

German Attanasio Ruiz commented on SOLR-236:

Sorting of results doesn't work properly. Below I detail the steps I followed and the problem I faced.

I am using Solr as a search engine for web pages, where I use a field named "site" for collapsing and sort by score.

Steps

After downloading the latest version of Solr, "solr-2009-11-15", and applying the patch "field-collapse-5.patch 2009-11-15 08:55 PM Martijn van Groningen 239 kB":

STEP 1 - I run a search using field collapsing and the result is correct; the highest score is 0.477.
STEP 2 - I run the same search and field collapsing returns a different result with score 0.17; the (correct) result of step 1 does not appear again.

Possible problem

Step 1 stores the document in the cache for future searches. At step 2 the search is done over the cache and does not find the previously stored document.

Possible solution

I believe that the problem is in the storing of the document in the cache, since if we repeat step 2 we get the same result and the document with score 0.17 is not removed from the results; the only result removed is the document with score 0.477.

Conclusion

Documents are not sorted properly when using "fieldcollapsing + solrcache", that is, when documents stored in the Solr cache are required.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778443#action_12778443 ]

Thomas Woodard commented on SOLR-236:

And this morning, without changing anything, it is working fine. I don't know what happened on Friday, but the changes I made then must have fixed it without showing up for some reason. In any case, thank you for the assistance.