[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841756#action_12841756 ]

Peter Karich edited comment on SOLR-236 at 3/5/10 8:53 AM:
---

It seems to me that the provided changes are necessary to make the OutOfMemory exception go away (see the 3 attached files). Please apply the files with caution, because I made the changes against an old patch (from Nov 2009).

was (Author: peathal):
It seems to me that the provides changes are necessary to make the OutOfMemory exception gone. Please apply the files with caution, because I made the changes from an old patch (from Nov 2009)

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: 1.5
Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

This patch includes a new feature called field collapsing. It is used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also "Duplicate detection".
http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
collapse.field: chooses the field used to group results
collapse.type: normal (default value) or adjacent
collapse.max: selects how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
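To illustrate the three collapse.* parameters listed in the issue description, here is a minimal sketch of assembling a request URL that carries them. Only the parameter names and the values normal/adjacent come from the description; the base URL, the field name "site" and the class and method names are assumptions made up for the example, and the sketch does not talk to Solr at all.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any SOLR-236 patch.
class CollapseQuerySketch {

    // Builds "<baseUrl>?q=...&collapse.field=...&collapse.type=...&collapse.max=..."
    static String buildQuery(String baseUrl, String q, String field, String type, int max) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", q);
        params.put("collapse.field", field);               // field used to group results
        params.put("collapse.type", type);                 // "normal" (default) or "adjacent"
        params.put("collapse.max", Integer.toString(max)); // results allowed before collapsing

        StringBuilder sb = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (!first) {
                    sb.append('&'); // parameters must be separated by '&'
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                first = false;
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed host, path and field name, for illustration only.
        System.out.println(buildQuery("http://localhost:8983/solr/select", "*:*", "site", "normal", 1));
    }
}
```

Building the query string from a map rather than by hand makes it harder to lose the '&' between q and collapse.field.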
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich edited comment on SOLR-236 at 3/4/10 9:48 AM:
---

Regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap is that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method:

{noformat}
public int advance(int target) throws IOException {
    // now we need a treemap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}

Then - I think - I discovered a bug/inconsistent behaviour: if I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, the scores array is created via new float[maxDocs] in the old version. But the array is never filled with any values, so

Float value1 = values.get(doc1);

returns null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the TreeMap is 0!). I work around this via

{noformat}
if (value1 == null) value1 = 0f;
if (value2 == null) value2 = 0f;
{noformat}

I think the compare method should NOT be called if no docs are in the scores array ... ?

was (Author: peathal):
regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, althoug I could post the 3 changed files.)

The point why I used a TreeMap instead a HashMap was that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method:

{noformat}
public int advance(int target) throws IOException {
    // now we need a treemap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}

Then - I think - I discovered a bug/inconsistent behaviour: If I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore then the scores arrays will be created ala new float[maxDocs] in the old version. But the array will never be filled with some values so

Float value1 = values.get(doc1);

will return null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of TreeMap is 0!); I work around this via

{noformat}
if (value1 == null) value1 = 0f;
if (value2 == null) value2 = 0f;
{noformat}

although the compare method should be called if no docs are in the scores array ... ?
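The idea in this comment (a sorted map in place of a float[maxDocs] array, plus a null-safe comparison) can be sketched in isolation as follows. This is an assumption-laden illustration and not the code from the attached files: the class SparseScores and its methods are hypothetical, and it does not extend any Lucene class. One deliberate difference from the quoted snippet: advance here returns the first stored document id at or after target, which is how I read the DocIdSetIterator contract, whereas the quoted version returns target itself.

```java
import java.util.TreeMap;

// Hypothetical stand-in for the score storage discussed above; not the patch's actual class.
class SparseScores {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Only documents that actually received a score occupy memory,
    // unlike a float[maxDocs] array sized for the whole index.
    private final TreeMap<Integer, Float> scores = new TreeMap<Integer, Float>();

    void put(int doc, float score) {
        scores.put(doc, score);
    }

    // Returns the first stored doc id >= target, or NO_MORE_DOCS if there is none.
    // This needs a sorted map: HashMap offers neither ceilingKey nor tailMap.
    int advance(int target) {
        Integer next = scores.ceilingKey(target);
        return next != null ? next.intValue() : NO_MORE_DOCS;
    }

    // Null-safe comparison: a doc without a stored score counts as 0f,
    // mirroring the workaround described in the comment.
    int compare(int doc1, int doc2) {
        Float value1 = scores.get(doc1);
        Float value2 = scores.get(doc2);
        if (value1 == null) value1 = 0f;
        if (value2 == null) value2 = 0f;
        return Float.compare(value1, value2);
    }

    public static void main(String[] args) {
        SparseScores s = new SparseScores();
        s.put(3, 1.5f);
        s.put(10, 0.5f);
        System.out.println(s.advance(4));     // next stored doc at or after 4
        System.out.println(s.compare(3, 42)); // doc 42 has no score and is treated as 0f
    }
}
```

The trade-off is the usual one: the map costs more per stored entry (boxing, tree nodes) than the array, so it only saves memory when few documents carry a score relative to maxDocs.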
Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich edited comment on SOLR-236 at 3/4/10 9:46 AM:
---

Regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, although I could post the 3 changed files).

The reason I used a TreeMap instead of a HashMap is that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method:

{noformat}
public int advance(int target) throws IOException {
    // now we need a treemap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}

Then - I think - I discovered a bug/inconsistent behaviour: if I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, the scores array is created via new float[maxDocs] in the old version. But the array is never filled with any values, so

Float value1 = values.get(doc1);

returns null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the TreeMap is 0!). I work around this via

{noformat}
if (value1 == null) value1 = 0f;
if (value2 == null) value2 = 0f;
{noformat}

although the compare method should be called if no docs are in the scores array ... ?

was (Author: peathal):
regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, althoug I could post the 3 changed files.)

The point why I used a TreeMap instead a HashMap was that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method:

{noformat}
public int advance(int target) throws IOException {
    // now we need a treemap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}

Then - I think - I discovered a bug/inconsistent behaviour: If I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore then the scores arrays will be created ala new float[maxDocs] in the old version. But the array will never be filled with some values so

Float value1 = values.get(doc1);

will return null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of TreeMap is 0!); I work around this via

{noformat}
if (value1 == null) value1 = 0f;
if (value2 == null) value2 = 0f;
{noformat}

although the compare method should be called if no docs are in the scores array ... ?
Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836919#action_12836919 ]

Peter Steevensz edited comment on SOLR-236 at 2/22/10 9:40 PM:
---

I applied this patch to the nightly build of Feb 22 and it compiles without any problem. I can start Solr and it runs fine. But when I add the field collapse component to solrconfig.xml, I cannot start Solr anymore.

After adding this line to my solrconfig.xml:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

I get this error when I run Solr:

2010-02-22 22:24:30.722::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

(I am using CentOS with Java 1.6.0_13)

Any help is greatly appreciated!!

was (Author: steevensz):
I applied this patch to the nightlybuild of feb 22 and this compilers without any problem. I can start Solr and it runs fine. But when i add the Field Collapse in the solrconfig.xml i cannot start Solr anymore.

After adding this line to my solrconfig.xml:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

I get this error when i run Solr:

2010-02-22 22:24:30.722::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836919#action_12836919 ]

Peter Steevensz edited comment on SOLR-236 at 2/22/10 9:43 PM:
---

I applied this patch to the nightly build of Feb 22 and it compiles without any problem. I can start Solr and it runs fine. But when I add the field collapse component to solrconfig.xml, I cannot start Solr anymore.

After adding this line to my solrconfig.xml:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

I get this error when I run Solr:

2010-02-22 22:24:30.722::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

(I am using CentOS with Java 1.6.0_13)

Any help is greatly appreciated!!

was (Author: steevensz):
I applied this patch to the nightlybuild of feb 22 and this compilers without any problem. I can start Solr and it runs fine. But when i add the Field Collapse in the solrconfig.xml i cannot start Solr anymore.

After adding this line to my solrconfig.xml:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

I get this error when i run Solr:

2010-02-22 22:24:30.722::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:

Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 from the nightly build but does not work. If I query http://server/cs-bidcs/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ... :-(

was (Author: peathal):
Trying the latest patch from 1th Feb 2010 compiles against solr-2010-02-13 from nightly build but does not work. If I query http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ...
:-( Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation add 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:

Trying the latest patch from 1st Feb 2010: it compiles against solr-2010-02-13 from the nightly build but does not work. If I query http://server/solr-app/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ... :-(

was (Author: peathal):
Trying the latest patch from 1th Feb 2010 compiles against solr-2010-02-13 from nightly build but does not work. If I query http://server/cs-bidcs/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ...
:-( Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation add 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent collapse.max to
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich edited comment on SOLR-236 at 2/18/10 4:07 PM:

Trying the latest patch from 1st Feb 2010. It compiles against solr-2010-02-13 from the nightly build dir, but does not work. If I query http://server/solr-app/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ... :-(

was (Author: peathal):
Trying the latest patch from 1th Feb 2010 compiles against solr-2010-02-13 from nightly build but does not work. If I query http://server/solr-app/select?q=*:*&collapse.field=myfield it fails with:

{noformat}
HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}

I only need the OutOfMemory problem solved ...
:-(
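The request URL quoted in the comment above passes collapse.field as an ordinary query parameter next to q. As a minimal sketch of how a client might assemble such a URL, the following joins parameters with '&' and percent-encodes them. The class and method names (CollapseQueryUrl, build) are hypothetical and are not part of Solr or of the patch; the host and field name are the placeholders from the comment. The sketch only shows how the request is formed and says nothing about the cause of the NullPointerException.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or the SOLR-236 patch.
class CollapseQueryUrl {

    /** Joins the parameters with '&' and percent-encodes every name and value. */
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "?" + query;
    }

    private static String encode(String s) {
        // URLEncoder.encode(String, Charset) is available from Java 10 onwards.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("collapse.field", "myfield");
        // prints http://server/solr-app/select?q=*%3A*&collapse.field=myfield
        System.out.println(build("http://server/solr-app/select", params));
    }
}
```

Encoding the colon in *:* is optional for most servers, but building the query string this way guarantees that the separator between q and collapse.field is never lost.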
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830305#action_12830305 ] Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM: Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint? We had been running the 11-29 field-collapse-5.patch patch and saw nothing near this amount of memory consumption. was (Author: kunningham): Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint? 
Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: 1.5
Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

This patch includes a new feature called Field collapsing. It is used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection.
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
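The difference between the two collapse.type values in the issue description can be illustrated with a toy model. The sketch below is not the patch's implementation: it assumes that "normal" counts occurrences of a field value over the whole ranked result list, that "adjacent" counts only consecutive runs, and that in both cases at most collapse.max results per group survive. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of field collapsing; an assumption-laden sketch, not Solr code.
class CollapseSketch {

    /** Normal: keep at most max results per field value, counted over the whole list. */
    static List<Integer> collapseNormal(List<String> fieldValues, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < fieldValues.size(); i++) {
            int count = seen.merge(fieldValues.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    /** Adjacent: keep at most max results per consecutive run of the same field value. */
    static List<Integer> collapseAdjacent(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            run = value.equals(previous) ? run + 1 : 1;
            previous = value;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Field values of the results in rank order; the returned lists hold kept positions.
        List<String> ranked = List.of("a", "a", "b", "a");
        System.out.println(collapseNormal(ranked, 1));   // prints [0, 2]
        System.out.println(collapseAdjacent(ranked, 1)); // prints [0, 2, 3]
    }
}
```

In this model the last "a" survives adjacent collapsing because a "b" separates it from the earlier run, while normal collapsing drops it.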
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305 ]

Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM:

Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint?

We had been running the 11-29 field-collapse-5.patch patch and saw nothing near this amount of memory consumption.

What fixes would we be missing if we ran Solr 1.4 with the last field-collapse-5.patch patch?

was (Author: kunningham):

Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory. Were there any confirmed issues that may have been addressed with the later patches? We're using the 12-24 patch. Any toggles we can switch to still get the feature, yet minimize the memory footprint?

We had been running the 11-29 field-collapse-5.patch patch and saw nothing near this amount of memory consumption.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794 ]

Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:

bq. The result document of our prefix query, which was at position 1 without collapsing, was with collapsing not even within the top 10 results. We are using the option collapse.maxdocs=150 and after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it does so based on the uncollapsed docset, which is not sorted in any way. As a result, documents that would normally appear on the first page don't appear in the search result at all. Eventually the collapse component uses the collapsed docset as the result set and not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem when using collapsing. We configured the component with <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting the option collapse.field, it works normally and there are no memory problems. If requests with enabled collapsing are received by the Solr server, the whole memory (oldgen could not be freed; eden space is heavily in use; ...) fills up after just a few requests. By using a profiler, we noticed that the filterCache was extraordinarily large. We supposed that there could be a caching problem (collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse cache. This is something that has to be addressed and certainly will be in the new field-collapse implementation. In the patch you're using, too much is being cached (some data could even be left out of the cache). Also, in some cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful if the parameter collapse=true|false would work again and could be used to enable/disable the collapsing functionality. Currently, the existence of a field chosen for collapsing enables this feature and there is no possibility to configure the fields for collapsing within the request handlers. With that, we could configure it and only enable/disable it within the requests like it will be conveniently used by other components (highlighting, faceting, ...).

It actually makes sense to use the collapse.enable parameter again in the patch.

Martijn

was (Author: martijn):

bq. The result document of our prefix query, which was at position 1 without collapsing, was with collapsing not even within the top 10 results. We are using the option collapse.maxdocs=150 and after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it does so based on the uncollapsed docset, which is not sorted in any way. As a result, documents that would normally appear on the first page don't appear in the search result at all. Eventually the collapse component uses the collapsed docset as the result set and not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem when using collapsing. We configured the component with <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting the option collapse.field, it works normally and there are no memory problems. If requests with enabled collapsing are received by the Solr server, the whole memory (oldgen could not be freed; eden space is heavily in use; ...) fills up after just a few requests. By using a profiler, we noticed that the filterCache was extraordinarily large. We supposed that there could be a caching problem (collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse cache. This is something that has to be addressed and certainly will be in the new field-collapse implementation. In the patch you're using, too much is being cached (some data could even be left out of the cache). Also, in some cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful if the parameter collapse=true|false would work again and could be used to enable/disable the collapsing functionality. Currently, the existence of a field chosen for collapsing enables this feature and there is no possibility to configure the fields for collapsing within the request handlers. With that, we could configure it and only enable/disable it within the requests like it will be
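The collapse.maxdocs behaviour described in this comment can be illustrated with a toy model. The sketch below is not code from the patch: it assumes the uncollapsed docset is walked in internal document order rather than score order, and that processing simply stops after maxDocs documents. The names MaxDocsSketch and bestDocWithin are hypothetical.

```java
// Toy model of a collapse.maxdocs-style cut-off; an assumption-laden sketch, not Solr code.
class MaxDocsSketch {

    /**
     * Returns the index of the best-scoring document among the first maxDocs
     * entries, where the array is ordered by internal document id, not by score.
     * Returns -1 if no document is examined.
     */
    static int bestDocWithin(double[] scoresByDocId, int maxDocs) {
        int limit = Math.min(maxDocs, scoresByDocId.length);
        int best = -1;
        for (int doc = 0; doc < limit; doc++) {
            if (best < 0 || scoresByDocId[doc] > scoresByDocId[best]) {
                best = doc;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The best match (score 0.9) happens to sit last in document order.
        double[] scores = {0.2, 0.1, 0.3, 0.9};
        System.out.println(bestDocWithin(scores, 2)); // prints 0: the true top hit is never examined
        System.out.println(bestDocWithin(scores, 4)); // prints 3: a larger threshold reaches it
    }
}
```

This mirrors the report in the quoted text: with a small threshold the expected top result disappears, and raising the threshold makes it reappear.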
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792510#action_12792510 ]

Mark Miller edited comment on SOLR-236 at 12/18/09 3:41 PM:

bq. (Faceting got a 50 times perf boost in 1.4)

No it didn't. Certain cases have gotten a boost (I think you might be referring to multi-valued field faceting cases?). And general faceting was always relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not ready for trunk.

was (Author: markrmil...@gmail.com):

bq. (Faceting got a 50 times perf boost in 1.4)

No it didn't. Certain cases have gotten a boost (I think you might be referring to multi-field faceting cases?). And general faceting was always relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not ready for trunk.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792518#action_12792518 ]

Mark Miller edited comment on SOLR-236 at 12/18/09 4:12 PM:

bq. I very much disagree with a policy blocking non-production-ready code from being in source control

Just to be clear, there is no such policy that I've seen - each decision just comes down to consensus. And as far as I know, our branch policy is pretty much anything goes - trunk is very different than svn. Anyone (anyone with access to svn that is) can play around with a branch for anything if they want.

I agree with your thoughts on a branch - if the argument is, we want it to be easier for devs to check out and work on this, or for users to check out and build this without applying patches, why not just make a branch? Merging is annoying but not difficult - I've been doing plenty of branch merging lately, and while it's not glorious work, modern tools make it more of a grind than a challenge.

was (Author: markrmil...@gmail.com):

bq. I very much disagree with a policy blocking non-production-ready code from being in source control

Just to be clear, there is no such policy that I've seen - each decision just comes down to consensus. And as far as I know, our branch policy is pretty much anything goes - trunk is very different than svn. Anyone can play around with a branch for anything if they want.

I agree with your thoughts on a branch - if the argument is, we want it to be easier for devs to check out and work on this, or for users to check out and build this without applying patches, why not just make a branch? Merging is annoying but not difficult - I've been doing plenty of branch merging lately, and while it's not glorious work, modern tools make it more of a grind than a challenge.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791952#action_12791952 ]

Noble Paul edited comment on SOLR-236 at 12/17/09 2:55 PM:

shalin, the names may not be necessary on the collapseCollectorFactory because they are never referred to by name.

How about making the functions plugins as well, as in:
{code:xml}
<collapseCollectorFactory class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
  <function name="sum" class="org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction"/>
  <function name="avg" class="org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction"/>
  <function name="min" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction"/>
  <function name="max" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction"/>
</collapseCollectorFactory>
{code}

was (Author: noble.paul):

shalin, the names may not be necessary on the collapseCollectorFactory because they are never referred to by name.
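The proposal above is to register aggregate functions by name so that new ones can be plugged in through configuration. A minimal sketch of that idea follows. The AggregateFunction interface and AggregateFunctionRegistry class are hypothetical stand-ins and do not reflect the patch's actual classes or signatures; only the function names sum, avg, min and max come from the comment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract; the patch's real interface may look different.
interface AggregateFunction {
    double apply(double[] values);
}

// Hypothetical registry that looks functions up by their configured name.
class AggregateFunctionRegistry {
    private final Map<String, AggregateFunction> byName = new HashMap<>();

    void register(String name, AggregateFunction function) {
        byName.put(name, function);
    }

    double evaluate(String name, double[] values) {
        AggregateFunction function = byName.get(name);
        if (function == null) {
            throw new IllegalArgumentException("Unknown aggregate function: " + name);
        }
        return function.apply(values);
    }

    /** Registers the four function names used in the configuration example. */
    static AggregateFunctionRegistry withDefaults() {
        AggregateFunctionRegistry registry = new AggregateFunctionRegistry();
        registry.register("sum", v -> Arrays.stream(v).sum());
        registry.register("avg", v -> Arrays.stream(v).average().orElse(Double.NaN));
        registry.register("min", v -> Arrays.stream(v).min().orElse(Double.NaN));
        registry.register("max", v -> Arrays.stream(v).max().orElse(Double.NaN));
        return registry;
    }

    public static void main(String[] args) {
        AggregateFunctionRegistry registry = withDefaults();
        double[] prices = {1.0, 2.0, 3.0};
        System.out.println(registry.evaluate("sum", prices)); // prints 6.0
        System.out.println(registry.evaluate("avg", prices)); // prints 2.0
    }
}
```

With a lookup keyed on the function name, adding another aggregate only needs one more register call, which is what a name-to-class mapping in the configuration would express.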
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792191#action_12792191 ]

Erik Hatcher edited comment on SOLR-236 at 12/17/09 11:03 PM:

I'll just add my 0,02€ - the main thing to vet now that it works (first make it work) is the interface to the client. Are the request params ideal? Is the response data structure locked down? If so, get this committed ASAP and iterate on the internals of distributed and performance issues (then make it right).

Admittedly I've not tried this feature out myself though. I'll actually try out committed stuff more readily than patches.

was (Author: ehatcher):

I'll just add my 0,02€ - the main thing to vet now that it works (first make it work) is the interface to the client. Are the request params ideal? Is the response data structure locked down? If so, get this committed ASAP and iterate on the internals of distributed and performance issues.

Admittedly I've not tried this feature out myself though. I'll actually try out committed stuff more readily than patches.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789550#action_12789550 ]

Chad Kouse edited comment on SOLR-236 at 12/11/09 9:37 PM:
---
Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE). The patch did NOT install cleanly (1 hunk failed), but I couldn't really tell why, since it looked like it should have worked. I just manually copied the hunk into the correct class. Sorry, I didn't note what failed.

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Fix For: 1.5
Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

This patch includes a new feature called field collapsing. It collapses a group of results with similar values for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection.
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
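To make the difference between the two collapse.type values in the issue description concrete, here is a minimal sketch in plain Java. It is not code from any of the attached patches: the class CollapseSketch, the Hit type and the threshold argument (standing in for collapse.max) are hypothetical names, and the real component works on Lucene doc ids and scores rather than on simple lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {
    /** A ranked search hit reduced to its id and the value of the collapse field (hypothetical type). */
    static final class Hit {
        final String id;
        final String fieldValue;

        Hit(String id, String fieldValue) {
            this.id = id;
            this.fieldValue = fieldValue;
        }
    }

    /** Adjacent collapsing: keep at most `threshold` hits per run of consecutive equal field values. */
    static List<String> collapseAdjacent(List<Hit> ranked, int threshold) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        int runLength = 0;
        for (Hit hit : ranked) {
            runLength = hit.fieldValue.equals(previous) ? runLength + 1 : 1;
            previous = hit.fieldValue;
            if (runLength <= threshold) {
                kept.add(hit.id);
            }
        }
        return kept;
    }

    /** Normal collapsing: keep at most `threshold` hits per field value, wherever they appear. */
    static List<String> collapseNormal(List<Hit> ranked, int threshold) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (Hit hit : ranked) {
            int count = seen.merge(hit.fieldValue, 1, Integer::sum);
            if (count <= threshold) {
                kept.add(hit.id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> ranked = Arrays.asList(
                new Hit("001", "ccc"), new Hit("002", "ccc"),
                new Hit("003", "aaa"), new Hit("004", "ccc"));
        System.out.println("adjacent: " + collapseAdjacent(ranked, 1));
        System.out.println("normal:   " + collapseNormal(ranked, 1));
    }
}
```

With this input, adjacent collapsing keeps 004 because it is separated from the first "ccc" run by 003, while normal collapsing drops it.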
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Hey there,
I have been testing the latest patch, and I think either I am missing something or the way collapsed documents are shown with adjacent collapsing can sometimes be confusing.

I am using the patch by replacing queryComponent with collapseComponent (not using both at the same time):

  <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>

What I have noticed is this. Imagine you get these results in the search:

  doc1: id:001 collapseField:ccc
  doc2: id:002 collapseField:aaa
  doc3: id:003 collapseField:ccc
  doc4: id:004 collapseField:bbb

And in collapse_counts you get:

  <int name="collapseCount">1</int>
  <str name="fieldValue">ccc</str>
  <result name="collapsedDocs" numFound="1" start="0">
    <doc>
      <long name="id">008</long>
      <str name="content">aaa aaa</str>
      <str name="col">ccc</str>
    </doc>
  </result>

Now, how can I know the head document of doc 008? Both 001 and 003 could be. Wouldn't it make sense to connect the uniqueField with the collapsed documents in some way, by adding something like this to collapse_counts?

  <int name="collapseCount">1</int>
  <str name="fieldValue">ccc</str>
  <str name="uniqueFieldId">003</str>

I have currently hacked FieldValueCountCollapseCollectorFactory to return:

  <str name="fieldValue">ccc#003</str>

but this response looks dirty...
As I said, maybe I am misunderstanding something and this can be known in some way. In that case, can someone tell me how?
Thanks in advance
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Hi Marc,

I'm not sure I follow you completely, but the example you gave is not complete; a few tags are missing. Let's assume the following response, which the latest patches produce:

  <lst name="collapse_counts">
    <str name="field">cat</str>
    <lst name="results">
      <lst name="009">
        <str name="fieldValue">hard</str>
        <int name="collapseCount">1</int>
        <result name="collapsedDocs" numFound="1" start="0">
          <doc>
            <long name="id">008</long>
            <str name="content">aaa aaa</str>
            <str name="col">ccc</str>
          </doc>
        </result>
      </lst>
      ...
    </lst>
  </lst>

The results list contains the collapse groups. The names of its child elements are the collapse head ids. Everything under a collapse head belongs to that collapse group, so adding the head document id to the field value is unnecessary. In the above example, the document with id 009 is the head of the document with id 008, and the document with id 009 should be displayed in the search result. From what you have said, it seems that you configured the patch properly.

Martijn
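For readers who want to consume the collapse_counts structure Martijn describes (each child of "results" named after its collapse head id), the following is a rough sketch using only the JDK's DOM parser. The class and method names are made up for illustration, and it assumes the unique key field is called "id", as in the example response; it is not part of the patch or of SolrJ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class CollapseCountsReader {
    /** Maps each collapse head id to the ids of the documents collapsed under it. */
    static Map<String, List<String>> headToCollapsedIds(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable response", e);
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element results = (Element) lists.item(i);
            if (!"results".equals(results.getAttribute("name"))) {
                continue;
            }
            // Each direct <lst> child of "results" is one collapse group, named by its head id.
            for (Node child = results.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element && "lst".equals(((Element) child).getTagName())) {
                    Element group = (Element) child;
                    groups.put(group.getAttribute("name"), collapsedIds(group));
                }
            }
        }
        return groups;
    }

    /** Collects the value of the field named "id" (assumed unique key) from every <doc> in the group. */
    private static List<String> collapsedIds(Element group) {
        List<String> ids = new ArrayList<>();
        NodeList docs = group.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            for (Node f = docs.item(i).getFirstChild(); f != null; f = f.getNextSibling()) {
                if (f instanceof Element && "id".equals(((Element) f).getAttribute("name"))) {
                    ids.add(f.getTextContent().trim());
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<lst name=\"collapse_counts\"><str name=\"field\">cat</str>"
                + "<lst name=\"results\"><lst name=\"009\">"
                + "<str name=\"fieldValue\">hard</str><int name=\"collapseCount\">1</int>"
                + "<result name=\"collapsedDocs\" numFound=\"1\" start=\"0\"><doc>"
                + "<long name=\"id\">008</long><str name=\"col\">ccc</str>"
                + "</doc></result></lst></lst></lst>";
        System.out.println(headToCollapsedIds(xml));
    }
}
```

A group whose lst element has no name attribute, as in the broken response reported later in this thread, would show up here under an empty key, which is one quick way to spot that problem.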
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Yes, it should look similar to that. What is the exact request you send to Solr? Also, to check whether the patch works correctly, can you run:

  ant clean test

There are a number of tests that cover the field collapse functionality.

Martijn

2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

I see, it looks like I am applying the patch wrongly somehow. This is the complete collapse_counts response I am getting:

  <lst name="collapse_counts">
    <str name="field">col</str>
    <lst name="results">
      <lst>
        <int name="collapseCount">1</int>
        <int name="collapseCount">1</int>
        <int name="collapseCount">1</int>
        <str name="fieldValue">bbb</str>
        <str name="fieldValue">ccc</str>
        <str name="fieldValue">xxx</str>
        <result name="collapsedDocs" numFound="1" start="0">
          <doc>
            <long name="id">2</long>
            <str name="content">aaa aaa</str>
            <str name="col">bbb</str>
          </doc>
        </result>
        <result name="collapsedDocs" numFound="1" start="0">
          <doc>
            <long name="id">8</long>
            <str name="content">aaa aaa aaa sd</str>
            <str name="col">ccc</str>
          </doc>
        </result>
        <result name="collapsedDocs" numFound="4" start="0">
          <doc>
            <long name="id">12</long>
            <str name="content">aaa aaa aaa v</str>
            <str name="col">xxx</str>
          </doc>
        </result>
      </lst>
    </lst>
  </lst>

As you can see, I am getting a lst tag with no name. As I understood what you told me, I should be getting as many lst tags as there are collapsed groups, and the name attribute of each lst should be the unique field value. So, if the patch was applied correctly, the response should look like:

  <lst name="collapse_counts">
    <str name="field">col</str>
    <lst name="results">
      <lst name="354"> (the head value of the collapsed group)
        <int name="collapseCount">1</int>
        <str name="fieldValue">bbb</str>
        <result name="collapsedDocs" numFound="1" start="0">
          <doc>
            <long name="id">2</long>
            <str name="content">aaa aaa</str>
            <str name="col">bbb</str>
          </doc>
        </result>
      </lst>
      <lst name="654">
        <int name="collapseCount">1</int>
        <str name="fieldValue">ccc</str>
        <result name="collapsedDocs" numFound="1" start="0">
          <doc>
            <long name="id">8</long>
            <str name="content">aaa aaa aaa sd</str>
            <str name="col">ccc</str>
          </doc>
        </result>
      </lst>
      <lst name="654">
        <int name="collapseCount">1</int>
        <str name="fieldValue">xxx</str>
        <result name="collapsedDocs" numFound="4" start="0">
          <doc>
            <long name="id">12</long>
            <str name="content">aaa aaa aaa v</str>
            <str name="col">xxx</str>
          </doc>
        </result>
      </lst>
    </lst>
  </lst>

Is this the way the response looks when you use the patch?
Thanks in advance
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
The request I am sending is:

  http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true

I search for 'aaa' in the content field. All the documents in the result contain that string in the content field.
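A request like the one Marc describes is easy to get wrong by hand, so here is a small sketch that assembles the query string from a parameter map. The helper class CollapseRequest is hypothetical and uses only the JDK; the parameter names are the ones that appear in this thread, and whether a given patch version accepts them is not something this sketch can confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class CollapseRequest {
    /** Appends the parameters to baseUrl as a URL-encoded query string, in insertion order. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "aaa");
        params.put("rows", "20");
        params.put("collapse.field", "col");
        params.put("collapse.type", "adjacent");
        params.put("collapse.includeCollapsedDocs.fl", "*");
        System.out.println(build("http://localhost:8983/solr/select/", params));
    }
}
```

A LinkedHashMap keeps the parameters in the order they were added, which makes the generated URL easier to compare with one typed by hand.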
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
The last two parameters are not necessary, since both default to true. Could you run the field collapse tests successfully?
Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Yes, I can reproduce the same situation here. I will update the patch asap and add it to Jira.

Martijn

2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

Hey! Got it working!
The problem was that my uniqueField is indexed as long, which is not supported by the patch.
The value is obtained in the getCollapseGroupResult function in AbstractCollapseCollector.java as:

  String schemaId = searcher.doc(docId).get(uniqueIdFieldname);

To support long, int, slong, sint, float, sfloat... it should be obtained by doing something like:

  FieldType idFieldType = searcher.getSchema().getFieldType(uniqueIdFieldname);
  String schemaId = "";
  Fieldable name_field = null;
  try {
      name_field = searcher.doc(docId).getFieldable(uniqueIdFieldname);
  } catch (IOException ex) {
      // deal with exception
  }
  if (name_field != null) {
      // convert the stored form of the id into its readable form
      schemaId = idFieldType.storedToReadable(name_field);
  }
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484 ]

Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
--
I have attached a new patch that has the following changes:
# Added caching for the field collapse functionality. Check the [solr wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure field-collapsing with caching.
# Removed the collapse.max parameter (collapse.threshold must be used instead). It was deprecated for a long time.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781232#action_12781232 ] Martijn van Groningen edited comment on SOLR-236 at 11/22/09 10:06 PM:

The search results after the first search were incorrect because the scores were not preserved in the cache. As a result, the collapsing algorithm could not properly group the documents into collapse groups (the most relevant document per group could not be determined), because there was no score information when retrieving the documents from the cache (as DocSet in SolrIndexSearcher). In the attached patch the score is also saved in the cache, so the collapsing algorithm can do its work properly when the documents are retrieved from the cache. Because the scores are now stored with the cached documents, the actual size of the filterCache in memory will increase.

was (Author: martijn): The reason why the search results after the first search were incorrect was, because the score was not preserved in the cache. The result of that was that the collapsing algorithm could not properly group the documents into the collapse groups (the most relevant document per document group could not be determined properly), because there was not score information when retrieving the documents from cache (as DocSet in SolrIndexSearcher) . I made sure that in the attached patch the score is also saved in the cache, so the collapsing algorithm can do its work properly when the documents are retrieved from the cache. Because the scores are now stored with the cached documents the actual size of the filterCache in memory will increase.
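To illustrate why the cached entries need scores, here is a minimal, self-contained sketch. It does not use any Solr classes: ScoredDoc and collapse are hypothetical names, and the grouping logic is a simplification of what a collapser does, not the code from the patch. It keeps the highest-scoring document per collapse-field value. When every score is missing (0.0f here, standing in for a cache entry stored without scores), the group head is simply whichever document happens to be seen first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollapseByScoreSketch {

    /** Hypothetical stand-in for a cached hit: doc id, collapse-field value, score. */
    static final class ScoredDoc {
        final int id;
        final String group;
        final float score;

        ScoredDoc(int id, String group, float score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /** Returns, per group, the id of the highest-scoring document (first one wins on ties). */
    static Map<String, Integer> collapse(ScoredDoc[] docs) {
        Map<String, ScoredDoc> best = new LinkedHashMap<String, ScoredDoc>();
        for (ScoredDoc doc : docs) {
            ScoredDoc current = best.get(doc.group);
            if (current == null || doc.score > current.score) {
                best.put(doc.group, doc);
            }
        }
        Map<String, Integer> heads = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, ScoredDoc> e : best.entrySet()) {
            heads.put(e.getKey(), e.getValue().id);
        }
        return heads;
    }

    public static void main(String[] args) {
        ScoredDoc[] withScores = {
            new ScoredDoc(1, "a.com", 0.2f),
            new ScoredDoc(2, "a.com", 0.9f),
            new ScoredDoc(3, "b.com", 0.5f),
        };
        ScoredDoc[] withoutScores = {
            new ScoredDoc(1, "a.com", 0.0f),
            new ScoredDoc(2, "a.com", 0.0f),
            new ScoredDoc(3, "b.com", 0.0f),
        };
        System.out.println(collapse(withScores));    // doc 2 heads group a.com
        System.out.println(collapse(withoutScores)); // doc 1 heads group a.com
    }
}
```

With scores present, document 2 represents the a.com group; without them, document 1 does, which matches the kind of incorrect grouping described for searches served from the cache.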
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777659#action_12777659 ] Thomas Woodard edited comment on SOLR-236 at 11/13/09 9:10 PM:

I tried the build again, and you are right, it does work fine with the default search handler. I had been trying to get it working with our search handler, which is dismax. That still doesn't work. Here is the handler configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
    <str name="tie">0.1</str>
    <str name="facet">true</str>
    <str name="facet.field">stars</str>
    <str name="facet.field">directors</str>
    <str name="facet.field">keywords</str>
    <str name="facet.field">studio</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
{code}

Edit: The search fails even if you don't pass a collapse field.

was (Author: gtfoomw): I tried the build again, and you are right, it does work fine with the default search handler. I had been trying to get it working with our search handler, which is dismax. That still doesn't work. Here is the handler configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
    <str name="tie">0.1</str>
    <str name="facet">true</str>
    <str name="facet.field">stars</str>
    <str name="facet.field">directors</str>
    <str name="facet.field">keywords</str>
    <str name="facet.field">studio</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
{code}
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775925#action_12775925 ] Michael Gundlach edited comment on SOLR-236 at 11/10/09 4:13 PM: - This patch (quasidistributed.additional.patch) does not apply field collapsing. Apply this patch in addition to the latest field collapsing patch, to avoid an NPE when: - you are collapsing on a field F, - you are sharding into multiple cores, using the hash of field F as your sharding key, AND - you perform a distributed search on a tokenized field. Note that if you attempt to use this patch to collapse on a field F1 and shard according to a field F2, you will get buggy search behavior. was (Author: gundlach): This patch does not apply field collapsing. Apply this patch in addition to the latest field collapsing patch, to avoid an NPE when: - you are collapsing on a field F, - you are sharding into multiple cores, using the hash of field F as your sharding key, AND - you perform a distributed search on a tokenized field. Note that if you attempt to use this patch to collapse on a field F1 and shard according to a field F2, you will get buggy search behavior. 
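The sharding scheme described above (route each document by a hash of the collapse field F, so that all documents sharing a value land on the same core) can be sketched as follows. This is only an illustration under assumptions: shardFor, the core names, and the use of String.hashCode() are hypothetical choices and are not part of the patch or of Solr.

```java
final class ShardByCollapseFieldSketch {

    /** Picks a shard index from the collapse-field value; equal values always map to the same shard. */
    static int shardFor(String collapseFieldValue, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(collapseFieldValue.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        String[] cores = {"core1", "core2", "core3"};
        String[] emails = {"alice@example.com", "bob@example.com", "alice@example.com"};
        for (String email : emails) {
            // Both alice@example.com documents are routed to the same core.
            System.out.println(email + " -> " + cores[shardFor(email, cores.length)]);
        }
    }
}
```

Because every value of F lives on exactly one core, each core can collapse on F independently. Collapsing on a different field F1 loses that guarantee, since the same F1 value can appear on several cores, which is consistent with the buggy behavior noted above.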
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775192#action_12775192 ] Michael Gundlach edited comment on SOLR-236 at 11/9/09 11:45 PM: - I've found an NPE that occurs when performing quasi-distributed field collapsing. My company only has one use case for field collapsing: collapsing on email address. Our index is spread across multiple cores. We found that if we shard by email address, so that all documents with a given email address are guaranteed to appear on the same core, then we can do distributed field collapsing. We add collapse.field=email and shards=core1,core2,... to a regular query. Each core collapses on email and sends the results back to the requestor. Since no emails appear on more than one core, we've accomplished distributed search. We do lose the collapse_count section, but that's not needed for our purpose -- we just need an accurate total document count, and to have no more than one document for a given email address in the results. Unfortunately, this throws an NPE when searching on a tokenized field. Searching string fields is fine. I don't understand exactly why the NPE appears, but I did bandaid over it by checking explicitly for nulls at the appropriate line in the code. No more NPE. There's a downside, which is that if we attempt to collapse on a field other than email -- one which has documents appearing in multiple cores -- the results are buggy: the first search returns few documents, and the number of documents actually displayed don't always match the numFound value. Then upon refresh we get what we think is the correct numFound, and the correct list of documents. This doesn't bother me too much, as you're guaranteed to get incorrect answers from the collapse code anyway when collapsing on a field that you didn't use as your key for sharding. 
In the spirit of Yonik's law of patches, I have made two imperfect patches attempting to contribute the fix, or at least point out the error:
1. I pulled trunk, applied the latest SOLR-236 patch, made my 2 line change, and created a patch file. The resultant patch file looks very different from the latest SOLR-236 patchfile, so I assume I did something wrong.
2. I pulled trunk, made my 2 line change, and created another patch file. This file is tiny but of course is missing all of the field collapsing changes.

Would you like me to post either of these patchfiles to this issue? Or is it sufficient to just tell you that the NPE occurred in QueryComponent.java on line 556? (rb._responseDocs.set(sdoc.positionInResponse, doc); where sdoc was null.) Perhaps my use case is extraordinary enough that you're happy leaving the NPE in place and telling other users to not do what I'm doing?

Thanks! Michael

was (Author: gundlach): I've found an NPE that occurs when performing quasi-distributed field collapsing. My company only has one use case for field collapsing: collapsing on email address. Our index is spread across multiple cores. We found that if we shard by email address, so that a given all documents with a given email address are guaranteed to appear on the same core, then we can do distributed field collapsing. We add collapse.field=email and shards=core1,core2,... to a regular query. Each core collapses on email and sends the results back to the requestor. Since no emails appear on more than one core, we've accomplished distributed search. We do lose the collapse_count section, but that's not needed for our purpose -- we just need an accurate total document count, and to have no more than one document for a given email address in the results. Unfortunately, this throws an NPE when searching on a tokenized field. Searching string fields is fine.
I don't understand exactly why the NPE appears, but I did bandaid over it by checking explicitly for nulls at the appropriate line in the code. No more NPE. There's a downside, which is that if we attempt to collapse on a field other than email -- one which has documents appearing in multiple cores -- the results are buggy: the first search returns few documents, and the number of documents actually displayed don't always match the numFound value. Then upon refresh we get what we think is the correct numFound, and the correct list of documents. This doesn't bother me too much, as you're guaranteed to get incorrect answers from the collapse code anyway when collapsing on a field that you didn't use as your key for sharding. In the spirit of Yonik's law of patches, I have made two imperfect patches attempting to contribute the fix, or at least point out the error: 1. I pulled trunk, applied the latest SOLR-236 patch, made my 2 line change, and created a patch file. The resultant patch file looks very different from the latest SOLR-236
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765076#action_12765076 ] Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:46 AM:

Hi all,
I just applied field-collapse-5.patch and I guess there are problems with filter queries. Here it is:
1- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284
2- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912
3- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419
4- When using q instead of fq, which is http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

The thing is, as I understand it, instead of applying AND for each filter query it applies OR. Checked:
http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this? Thx.

was (Author: aytek): Hi all, Just applied field-collapse-5.patch and i guess there are problems with filter queries. Here it is:
1- Use one(first) filter http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8] numFound: 6284
2- Use second filter http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9] numFound: 16912
3- Use both filters http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9] numFound: 19419
4- When using q instead of fq which is : http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9] numFound: 3777 (which is the only correct number)
The thing is, as i understand, instead of applying AND for each filter query it applies OR. Checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9] numFound: 19419 (same as 3rd one) Any idea how to fix this? Thx.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765076#action_12765076 ] Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:48 AM:

Hi all,
I just applied field-collapse-5.patch and I guess there are problems with filter queries. Here it is:
1- select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284
2- select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912
3- select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419
4- When using q instead of fq, which is: select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

The thing is, as I understand it, instead of applying AND for each filter query it applies OR. Checked:
select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this? Thx.

was (Author: aytek): Hi all, Just applied field-collapse-5.patch and i guess there are problems with filter queries. Here it is:
1- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8] numFound: 6284
2- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9] numFound: 16912
3- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9] numFound: 19419
4- When using q instead of fq which is http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9] numFound: 3777 (which is the only correct number)
The thing is, as i understand, instead of applying AND for each filter query it applies OR. Checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9] numFound: 19419 (same as 3rd one) Any idea how to fix this? Thx.
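The reported counts are consistent with the two filters being combined as a union rather than an intersection: by inclusion-exclusion, 6284 + 16912 - 3777 = 19419, which is exactly the numFound seen with both fq parameters. The sketch below checks that arithmetic and shows the two combinations on small doc-id sets using java.util.BitSet. The class and method names are hypothetical, and it illustrates the set logic only, not how Solr or the patch applies filters.

```java
import java.util.BitSet;

final class FilterCombinationSketch {

    /** Size of the union implied by |A|, |B| and |A AND B| (inclusion-exclusion). */
    static int unionSize(int sizeA, int sizeB, int intersectionSize) {
        return sizeA + sizeB - intersectionSize;
    }

    /** Documents matching both filters: what multiple fq parameters are expected to return. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Documents matching either filter: what the reported numFound suggests happened. */
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        // Counts from the report: lat filter, lng filter, and the "lat AND lng" query.
        System.out.println(unionSize(6284, 16912, 3777)); // 19419

        BitSet lat = new BitSet();
        lat.set(0, 4);   // docs 0..3 match the lat range
        BitSet lng = new BitSet();
        lng.set(2, 8);   // docs 2..7 match the lng range
        System.out.println(intersect(lat, lng).cardinality()); // 2
        System.out.println(union(lat, lng).cardinality());     // 8
    }
}
```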
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753335#action_12753335 ] Paul Nelson edited comment on SOLR-236 at 9/9/09 5:07 PM: -- Hey All: Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the count of documents) - should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases, and while these queries are 0.5sec without collapsing, they are 3-4sec with collapsing. Meanwhile, collapse using *:* or other simple queries come back in 0.5sec - so it appears to be primarily a query-complexity issue. I'm wondering if the filter cache (or some other cache) might be able to help with this situation? was (Author: pnelsoncomposer): Hey All: Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the count of documents) - should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases, and while these queries are 0.5sec without collapsing, they are 3-4sec with collapsing. I'm wondering if the filter cache (or some other cache) might be able to help with this situation? 
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753335#action_12753335 ] Paul Nelson edited comment on SOLR-236 at 9/9/09 5:09 PM: -- Hey All: Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the count of documents) - should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases, and while these queries are 0.5sec without collapsing, they are 3-4sec with collapsing. Meanwhile, collapse using \*:\* or other simple queries come back in 0.5sec - so it appears to be primarily a query-complexity issue. I'm wondering if the filter cache (or some other cache) might be able to help with this situation? was (Author: pnelsoncomposer): Hey All: Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the count of documents) - should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases, and while these queries are 0.5sec without collapsing, they are 3-4sec with collapsing. Meanwhile, collapse using *:* or other simple queries come back in 0.5sec - so it appears to be primarily a query-complexity issue. I'm wondering if the filter cache (or some other cache) might be able to help with this situation? 
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751243#action_12751243 ] Abdul Chaudhry edited comment on SOLR-236 at 9/3/09 5:56 PM:

I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice: once for the collapse and then again for the response object, assuming you asked for the info count in the response. That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large string array, and to keep it around for both the collapse and the response object. I changed the code so that I keep the cache around like so:

    /**
     * Keep the field cached for the collapsed fields for the response object as well
     */
    private FieldCache.StringIndex collapseIndex;

To get the index, use something like this instead of getting the string array for all docs:

    collapseIndex = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), collapseField);

When collapsing, you can get the current value using something like this and remove the code passing the array:

    int currentId = i.nextDoc();
    String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache like so:

    if (collapseInfoCount) {
        resCount.add(collapseFieldType.indexedToReadable(
            collapseIndex.lookup[collapseIndex.order[id]]), count);
    }

I also added timing for the cache access, as it could be slow if you are doing a lot of updates.

I have added code for displaying selected fields for the duplicates, but it's difficult to submit. I hope this gets committed, as it's hard to submit a patch when the feature is not in svn and I cannot submit a patch to a patch to a patch .. you get the idea.

was (Author: abdollar): I have some ideas for performance improvements.
I noticed that the code fetches the field cache twice, once for the collapse and then for the response object, assuming you asked for the info count in the response. That seems expensive, especially for real-time content. I think its better to use FieldCache.StringIndex instead of returning a large string array and keep it around for the collapse and the response object. I changed the code so that I keep the cache around like so /** * Keep the field cached for the collapsed fields for the response object as well */ private FieldCache.StringIndex collapseIndex; when collapsing , you can get the current value using something like this and remove the code passing the array int currentId = i.nextDoc(); String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]]; when building the response for the info count, you can reference the same cache like so:- if (collapseInfoCount) { resCount.add(collapseFieldType.indexedToReadable( collapseIndex.lookup[collapseIndex.order[id]]), count); } I also added timing for the cache access as it could be slow if you are doing a lot of updates I have added code for displaying selected fields for the duplicates but its difficult to submit . I hope this gets committed as its hard to sumbit a patch as its not in svn and I cannot submit a patch to a patch to a patch .. you get the idea. 
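The lookup/order pair used in the comment above can be pictured with a small stand-in written in plain Java. This is only a sketch of the data layout as I understand it: StringIndexSketch is a hypothetical class, not Lucene's FieldCache.StringIndex, and reserving slot 0 for documents without a value is an assumption about that layout.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical stand-in for a lookup/order pair; not Lucene's class. */
final class StringIndexSketch {
    final String[] lookup; // sorted distinct values; slot 0 is assumed to mean "no value"
    final int[] order;     // order[docId] is an index into lookup

    StringIndexSketch(String[] valuePerDoc) {
        // Collect the distinct values in sorted order.
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : valuePerDoc) {
            if (v != null) {
                distinct.add(v);
            }
        }
        lookup = new String[distinct.size() + 1];
        int next = 1;
        for (String v : distinct) {
            lookup[next++] = v;
        }
        // Store one small int per document instead of one String per document.
        order = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            String v = valuePerDoc[doc];
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
    }

    /** Same access pattern as collapseIndex.lookup[collapseIndex.order[docId]]. */
    String valueOf(int docId) {
        return lookup[order[docId]];
    }

    public static void main(String[] args) {
        StringIndexSketch idx =
            new StringIndexSketch(new String[]{"melkweg", "paradiso", "melkweg", null});
        System.out.println(idx.valueOf(2));               // melkweg
        System.out.println(idx.order[0] == idx.order[2]); // true
    }
}
```

One consequence of this layout is that two documents belong to the same collapse group exactly when their order entries are equal, so the grouping step can compare ints and only resolve the string when building the response.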
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750582#action_12750582 ]

Martijn van Groningen (martijn) edited comment on SOLR-236 at 9/2/09 11:18 AM:

Yes, specifying which collapse fields to return is a good idea, just like the fl parameter for a normal request.
I was thinking about how to fit this new feature into the current patch, and I thought that it might be a good idea to revise the current field collapse result format so that the results of this feature fit nicely into the response. Currently the collapse response looks like this:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="doc">
        <int name="233238">1</int>
    </lst>
    <lst name="count">
        <int name="melkweg">1</int>
    </lst>
</lst>
{code}

I think a response format like the following would be better:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238">
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <lst name="collapsedValues">
                <str name="price">10.99, 1.999,99</str>
                <str name="name">adapter, laptop</str>
            </lst>
        </lst>
    </lst>
</lst>
{code}

As you can see, the data is grouped together more closely and is therefore easier to parse. The collapsedValues element can have one or more fields, each containing the collapsed field values in a comma-separated format. The _collapsedValues_ element will of course only be added when the client specifies the collapsed fields in the request.

What do you think about this new result format?
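As a small client-side illustration of the comma-separated collapsedValues idea from the comment above, the sketch below splits the two sample strings from the proposed response. The class and method names are hypothetical and the splitting rule (comma followed by optional whitespace) is my assumption; the proposal itself does not define one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical client helper; not part of any SOLR-236 patch or of SolrJ. */
final class CollapsedValuesSketch {

    /** Naive split of a comma-separated collapsed value string. */
    static List<String> splitValues(String joined) {
        return Arrays.asList(joined.split(",\\s*"));
    }

    public static void main(String[] args) {
        // Sample strings copied from the proposed response format.
        System.out.println(splitValues("adapter, laptop")); // [adapter, laptop]
        System.out.println(splitValues("10.99, 1.999,99")); // [10.99, 1.999, 99]
    }
}
```

The price example yields three tokens although collapseCount is 2, presumably because one of the values itself contains a comma. A client cannot recover the original values from such a string, so it may be worth considering one element per collapsed value instead of a joined string.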
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721100#action_12721100 ]

Martijn van Groningen (martijn) edited comment on SOLR-236 at 6/18/09 2:26 AM:

I have not found an online example yet, but I copied this config from the javadoc of the DistanceCalculatingComponent class and modified it.
The patch also modifies the Solr examples, so if you look there you can see how the patch is used (example/solr/conf/schema.xml and example/solr/conf/solrconfig.xml). You need to add an extra update processor, plus an extra field and dynamic field, in order to make it work.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719677#action_12719677 ]

Shekhar (csnirkhe) edited comment on SOLR-236 at 6/15/09 3:34 PM:

Here is the solrconfig file.

<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>localsolr</str>
    <str>collapse</str>
  </arr>
</requestHandler>

You can get more details from http://www.gissearch.com/localsolr

===

Following are the results I am getting:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">146</int>
    <lst name="params">
      <str name="lat">41.883784</str>
      <str name="radius">50</str>
      <str name="collapse.field">resource_id</str>
      <str name="rows">2</str>
      <str name="indent">on</str>
      <str name="fl">resource_id,geo_distance</str>
      <str name="q">TV</str>
      <str name="qt">geo</str>
      <str name="long">-87.637668</str>
    </lst>
  </lst>
  <result name="response" numFound="4294" start="0">
    <doc>
      <int name="resource_id">10018</int>
      <double name="geo_distance">26.16691883965225</double>
    </doc>
    <doc>
      <int name="resource_id">10102</int>
      <double name="geo_distance">39.90588996589528</double>
    </doc>
  </result>
  <lst name="collapse_counts">
    <str name="field">resource_id</str>
    <lst name="doc">
      <int name="10022">116</int>
      <int name="11701">4</int>
    </lst>
    <lst name="count">
      <int name="10015">116</int>
      <int name="10018">4</int>
    </lst>
    <lst name="debug">
      <str name="Docset type">BitDocSet(5201)</str>
      <long name="Total collapsing time(ms)">46</long>
      <long name="Create uncollapsed docset(ms)">22</long>
      <long name="Collapsing normal time(ms)">24</long>
      <long name="Creating collapseinfo time(ms)">0</long>
      <long name="Convert to bitset time(ms)">0</long>
      <long name="Create collapsed docset time(ms)">0</long>
    </lst>
  </lst>
  <result name="response" numFound="5201" start="0">
    <doc>
      <int name="resource_id">10015</int>
    </doc>
    <doc>
      <int name="resource_id">10018</int>
    </doc>
  </result>
</response>
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716128#action_12716128 ]

Ron Veenstra (ronunism) edited comment on SOLR-236 at 6/3/09 7:22 PM:

I require assistance.

I've installed a fresh Solr (1.3.0), and all appears to operate well. I then patch using SOLR-236_collapsing.patch [by Thomas Traeger] (the last patch I saw that claimed to work with 1.3.0), without error. I then add the following to solrconfig.xml (per http://wiki.apache.org/solr/FieldCollapsing):

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />

Upon restart, I get a long configuration error, which seems to hinge on:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml
-
org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.CollapseComponent'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)

[The full error can be included if desired.]

I've verified that the CollapseComponent file exists in the proper place.
I've moved CollapseParams as required (move CollapseParams.java from common/org/apache/solr/common/params to java/org/apache/solr/common/params/).
I've tried multiple iterations of the patch (on fresh installs), all with the same issue.

Are there additional steps, patches, or configurations that are required?
Is this a known issue?

Any help is very much appreciated.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714442#action_12714442 ]

Martijn van Groningen (martijn) edited comment on SOLR-236 at 5/29/09 6:02 AM:

Hi, I have modified the latest patch of Thomas and made two performance improvements:

1) Improved normal field collapsing. I tested it with an index of 1.1 million documents. When collapsing on all documents with no sorting specified (so sorting on score), the query time is around 130 ms, compared with around 1.5 s for the previous patch. When I then add sorting on a string field, the query time is around 220 ms, compared with around 5.2 s for the previous patch.
The reason why it is faster is because the latest patch queries for a doclist instead of a docset. In the normal collapse method it keeps track of the most relevant documents, so the end result is the same; also, creating a docList of 1.1 million documents (and ordering it) is very expensive.
Note: I did not improve adjacent collapsing, because the adjacent method needs (as far as I understand it) a completely sorted list of documents (docList).

2) Slightly improved faceting in combination with field collapsing, by reusing the uncollapsed docset that is created during the collapsing process (the previous patch invoked a second search).

I have also added documentation, added a few unit tests for the collapsing process itself, and made the debug information more readable.
I'm very interested in other people's experiences with this patch and feedback on the patch itself.

Cheers,
Martijn
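The idea in the comment above of keeping track of the most relevant documents while collapsing, rather than ordering every matching document first, can be sketched as a single pass over unordered hits. This is only an illustration for the collapse.max=1 case: the class, method and parameter names are hypothetical, the parallel-array input is an assumption, and the real patch may well work differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; hypothetical names, not code from the SOLR-236 patches. */
final class TopPerGroupSketch {

    /**
     * For each group value, returns the id of the highest-scoring document.
     * The hits may arrive in any order, so no global sort is needed.
     */
    static Map<String, Integer> bestDocPerGroup(int[] docIds, float[] scores, String[] groups) {
        // Remember, per group, the position of the best hit seen so far.
        Map<String, Integer> bestPosition = new HashMap<>();
        for (int i = 0; i < docIds.length; i++) {
            Integer current = bestPosition.get(groups[i]);
            if (current == null || scores[i] > scores[current]) {
                bestPosition.put(groups[i], i);
            }
        }
        // Translate positions into document ids.
        Map<String, Integer> bestDoc = new HashMap<>();
        for (Map.Entry<String, Integer> e : bestPosition.entrySet()) {
            bestDoc.put(e.getKey(), docIds[e.getValue()]);
        }
        return bestDoc;
    }

    public static void main(String[] args) {
        // Assumed sample data: four hits in arbitrary order, two groups.
        int[] docIds = {7, 3, 9, 4};
        float[] scores = {0.2f, 0.9f, 0.5f, 0.1f};
        String[] groups = {"x", "x", "y", "y"};
        Map<String, Integer> best = bestDocPerGroup(docIds, scores, groups);
        System.out.println(best.get("x")); // 3
        System.out.println(best.get("y")); // 9
    }
}
```

The pass is linear in the number of hits and only the surviving documents (one per group here) would need to be ordered afterwards, which is one plausible reading of why the normal method got faster while the adjacent method, which depends on the full ranked order, did not.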
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714442#action_12714442 ]

Martijn van Groningen (martijn) edited comment on SOLR-236 at 5/29/09 11:38 AM:

Hi, I have modified the latest patch of Thomas and made two performance improvements:

1) Improved normal field collapsing. I tested it with an index of 1.1 million documents. When collapsing on all documents with no sorting specified (so sorting on score), the query time is around 130 ms, compared with around 1.5 s for the previous patch. When I then add sorting on a string field, the query time is around 220 ms, compared with around 5.2 s for the previous patch.
The reason why it is faster is because the latest patch queries for a doclist instead of a docset. In the normal collapse method it keeps track of the most relevant documents, so the end result is the same; also, creating a docList of 1.1 million documents (and ordering it) is very expensive.
Note: I did not improve adjacent collapsing, because the adjacent method needs (as far as I understand it) a completely sorted list of documents (docList).

2) Slightly improved faceting in combination with field collapsing, by reusing the uncollapsed docset that is created during the collapsing process (the previous patch invoked a second search).

I have also added documentation, added a few unit tests for the collapsing process itself, and made the debug information more readable.
This patch works from revision 779335 (last Wednesday) and up. This patch depends on some changes in Solr and a change inside Lucene.
I'm very interested in other people's experiences with this patch and feedback on the patch itself.

Cheers,
Martijn
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705959#action_12705959 ]

Domingo Gómez García (dgomezca) edited comment on SOLR-236 at 5/5/09 1:53 AM:

The results of collapse_counts are not what I expected. It loses many categories, showing only a few.
I tried incrementing the collapse.max parameter:

max=1 results

<lst name="doc">
  <int name="2008/LICOBLE-00023">109</int>
  <int name="2008/LICOBLE-3">5</int>
  <int name="2009/LICOBLE-00036">4</int>
  <int name="2009/LICOBLE-00095">1</int>
</lst>
<lst name="count">
  <int name="12740">109</int>
  <int name="12741">5</int>
  <int name="13282">4</int>
  <int>1</int>
</lst>

max=2 results

<lst name="doc">
  <int name="2009/LICOBLE-8">108</int>
  <int name="2007/LICOBLE-1">4</int>
</lst>
<lst name="count">
  <int name="12740">108</int>
  <int name="12741">4</int>
</lst>

max=3 results

<lst name="doc">
  <int name="2008/LICOBLE-00020">107</int>
  <int name="2008/LICOBLE-00021">3</int>
</lst>
<lst name="count">
  <int name="12740">107</int>
  <int name="12741">3</int>
</lst>

max=4

<lst name="doc">
  <int name="2009/LICOBLE-00060">106</int>
</lst>
<lst name="count">
  <int name="12740">106</int>
</lst>

How is it possible to get fewer results each time? There are about 70 categories; is there any way to obtain all those counts? Am I missing some collapsing concept?
Thanks.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ]

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:23 AM:

I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. I have upgraded from 1.2 to 1.3.0 (patched) and I get a lot of PermGen exceptions, especially in calls from SolrJ.

was (Author: dgomezca):

I made checkout on svn release-1.3.0 and applied SOLR-236_collapsing.patch. After the task generate-maven-artifacts I use the resulting distribution and made http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=normal (from wiki). No collapsed results. It seems to be ignoring CollapseComponent or something like that. Do I have to configure something else? Could anyone bring to me a working version/patch? Thank you.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ]

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:29 AM:

I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. When I use the collapse parameters I always get PermGen exceptions. How much memory can collapsing use compared with normal queries?

was (Author: dgomezca):

I made checkout on svn release-1.3.0 and applied SOLR-236_collapsing.patch. I have upgraded from 1.2 to 1.3.0 (patched) and I get a lot of permgen exceptions. Specially in calls from solrj.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ]

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 6:46 AM:

I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. Is there any way to integrate it with SolrJ?

was (Author: dgomezca):

I made checkout on svn release-1.3.0 and applied SOLR-236_collapsing.patch. When I use collapse parameters I always get permgen exceptions. How much memory could use collapse vs normal querys?
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699903#action_12699903 ]

Jeff edited comment on SOLR-236 at 4/16/09 3:22 PM:

We have tried to integrate the most recent patch into our 1.4 install. The patching was smooth and overall it works well. However, it appears the issue with fq has returned. Whenever I try to filter the query it gives "Either filter or filterList may be set in the QueryCommand, but not both." Not sure what happened. Which part of the patch makes it possible for fq to work? It may no longer be there. Additionally, collapse.facet=before does not seem to work. Any help in this area would be greatly appreciated.

was (Author: jnewburn):

We have tried to integrate the most recent patch into our 1.4 install. The patching was smooth and overall it works good. However, it appears the issue with fq has returned. Whenever I try to filter the query it gives Either filter or filterList may be set in the QueryCommand, but not both. Not sure what happened. What part of the patch makes it possible for fq to work as it may not be there now.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ]

Dave Redford edited comment on SOLR-236 at 4/10/09 6:47 PM:

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request. [Update: this is only an issue when both standard results and collapse results are present - which I was using for testing]

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id), but adding another field - even a bogus one - works.

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also, using an fq makes it work, eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that, great so far... thanks to all.

was (Author: dredford):

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request. [Update: this is only an issue when both standard results and collapse results are present - which I was using for testing]

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id) but adding a another field - even a bogus one - works.

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also using an fq makes it work eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that great so far...
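The request variants reported in the comment above differ only in the fl and fq parameters, so it can help to generate them from one place when reproducing the ordering problem. The following is a small sketch using Python's standard library; the helper name `collapse_query` and its default values are assumptions for illustration, and only the parameter names and values are taken from the comment.

```python
from urllib.parse import urlencode


def collapse_query(q, collapse_field, collapse_max=1, fl="Id,score", fq=None):
    """Build the query string for a collapsed search request.

    Hypothetical helper; parameter order mirrors the requests quoted in
    the thread so that the output can be compared with them directly.
    """
    params = []
    if fq is not None:
        params.append(("fq", fq))
    params += [
        ("q", q),
        ("version", "2.2"),
        ("start", 0),
        ("rows", 10),
        ("indent", "on"),
        ("fl", fl),
        ("collapse.field", collapse_field),
        ("collapse.max", collapse_max),
    ]
    # Keep "," and ":" unescaped so the result matches the quoted requests.
    return urlencode(params, safe=",:")


failing = collapse_query("ford", "PrimaryId")
with_bogus_field = collapse_query("ford", "PrimaryId", fl="Id,score,bogus")
with_filter = collapse_query("ford", "PrimaryId", fq="Type:articles")
```

Appending one of these strings to the select handler URL of a patched instance should reproduce the three cases described; the host and handler path are deployment specific and are left out here.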
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ]

Dave Redford edited comment on SOLR-236 at 4/10/09 6:46 PM:

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request. [Update: this is only an issue when both standard results and collapse results are present - which I was using for testing]

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id), but adding another field - even a bogus one - works.

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also, using an fq makes it work, eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that, great so far...

was (Author: dredford):

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id) but adding a another field - even a bogus one - works.

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also using an fq makes it work eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that great so far...
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ]

Dave Redford edited comment on SOLR-236 at 4/1/09 5:56 PM:

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id), but adding another field - even a bogus one - works.

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also, using an fq makes it work, eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that, great so far...

was (Author: dredford):

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

eg: q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong order (note: Id is our unique Id) but

q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also using an fq make it work eg:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0. Apart from that great so far...
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679603#action_12679603 ]

jove4015 edited comment on SOLR-236 at 3/6/09 6:13 AM:

Help!! We've been using this patch in production for months now, and suddenly in the last 3 days it is crashing constantly.

[Edit - It's Ivan's latest patch, #3, with Solr 1.3 dist]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
    at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
    at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
    at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
    at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
    at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
    at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

It seems to happen randomly: there's no special request happening, nothing new added to the index, nothing. We've made no configuration changes. The only thing that's happened is that more documents have been added since then. The schema is the same; we have perhaps 20 more documents in the index now than we did when we first went live with it. It was a 32-bit machine allocated 2GB of RAM for Java before. We just upgraded it to 64-bit and increased the heap space to 3GB, and still it went down last night.

I'm at my wits' end. I don't know what to do, but this functionality has been live so long now that it's going to be extremely painful to take it away. Someone, please tell me if there's anything I can do to save this thing.

was (Author: jove4015):

Help!! We've been using this patch in production for months now, and suddenly in the last 3 days it is crashing constantly.

[Edit - It's Ivan's latest patch, #3]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
    at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
    at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
    at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
    at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
    at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
    at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
    at
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655269#action_12655269 ]

ivan.prado edited comment on SOLR-236 at 12/10/08 8:34 AM:

I have attached new patch with the problems solved in my first submitted patch. Doug Steigerwald, could you check if this patch works with for you? Thanks.

was (Author: ivan.prado):

A new patch with problems solved in my first submitted patch.

Field collapsing
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Fix For: 1.4
Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655269#action_12655269 ]

ivan.prado edited comment on SOLR-236 at 12/10/08 8:35 AM:

I have attached new patch with the problems solved in my first submitted patch. Doug Steigerwald, could you check if this patch works well for you? Thanks.

was (Author: ivan.prado):

I have attached new patch with the problems solved in my first submitted patch. Doug Steigerwald, could you check if this patch works with for you? Thanks.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638359#action_12638359 ]

[EMAIL PROTECTED] edited comment on SOLR-236 at 10/9/08 12:53 PM:

bq. What's a hard drive sort?

Sorry - that was not very clear. Just like sorting, finding dupes can be done in memory or using external storage (hard drive). I am only just looking into this stuff myself, but it seems that in the best case you would want to do it in memory with a hash-based approach, which can scale linearly. If you have too many items to look for dupes in, you have to use external storage - one good method is two sorts (we get one from the search), but I think there are other options too. In this case the sorts can be done in memory, though, and I think the hashtable method of identifying dupes is much less memory efficient (too many unique terms).

was (Author: [EMAIL PROTECTED]):

bq. What's a hard drive sort?

Sorry - was not very clear. Just like sorting, finding dupes can be done in memory or using external storage (harddrive). I am only just looking into this stuff myself, but it seems in the best case you would want to do it in memory with a hash system which can be linear scalability. If you have too many items to look for dupes in, you have to use external storage - one good method is two external sorts (we get one from the search), but there are other options too I think.

Field collapsing
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
Fix For: 1.4
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
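To make the trade-off discussed in the comment above concrete, here is a small Python sketch of the two ways of finding duplicate field values: a hash table of counters, and a sort followed by a scan of equal neighbours. It is only an illustration of the general techniques; the function names are made up for this example, the in-memory `sorted` call stands in for whatever sort (in memory or external) is actually used, and nothing here is taken from the SOLR-236 patches.

```python
from collections import defaultdict
from itertools import groupby


def dupes_by_hash(values):
    """One pass with a hash table of counters.

    Time is linear in the number of values, but the table holds one
    entry per unique value, which is the memory concern raised above.
    """
    counts = defaultdict(int)
    for value in values:
        counts[value] += 1
    return {value: n for value, n in counts.items() if n > 1}


def dupes_by_sort(values):
    """Sort first, then count runs of equal neighbours.

    After sorting, duplicates sit next to each other, so a single scan
    finds them while holding only the current run in working state.
    """
    result = {}
    for value, run in groupby(sorted(values)):
        n = sum(1 for _ in run)
        if n > 1:
            result[value] = n
    return result


sites = ["a.com", "b.com", "a.com", "c.com", "b.com", "a.com"]
hash_result = dupes_by_hash(sites)
sort_result = dupes_by_sort(sites)
```

Both functions return the same mapping from duplicated value to occurrence count; they differ only in where the cost goes (per-unique-value memory for the hash table, sorting time for the scan).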
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636978#action_12636978 ]

[EMAIL PROTECTED] edited comment on SOLR-236 at 10/6/08 12:21 PM:

Sorting twice (when not sorting on the collapse field) only makes sense if we are doing external sorts (hard drive), correct? It seems to me that this should be closer to the facet code (using the field cache) and then use a hash table of accumulators: linear time (is that generally the case?), right? (edit: looks like that's _too_ memory intensive)

As Otis mentions above, this issue appears very popular. We should finish it up.

was (Author: [EMAIL PROTECTED]):

Sorting twice (when not sorting on the collapse field) only makes sense if we are doing external sorts (hard drive), correct? It seems to me that this should be closer to the facet code (using the field cache) and then use a hash table of accumulators: linear time (is that generally the case?), right?

As Otis mentions above, this issue appears very popular. We should finish it up.

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
Fix For: 1.4
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
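The "hash table of accumulators" idea from this comment can be sketched as a single pass over the ranked results. This is a minimal sketch under assumptions: `AccumulatorCollapse`, `rankedDocIds`, `fieldValueByDoc` and `maxPerValue` are hypothetical names, the `String[]` indexed by doc id merely stands in for a field cache, and this is not the code of the CollapseFilter in the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a one-pass collapse using per-value counters. */
final class AccumulatorCollapse {
    /** Doc ids that survive collapsing, in their original rank order. */
    final List<Integer> kept = new ArrayList<>();
    /** Field value -> number of documents hidden behind the kept ones. */
    final Map<String, Integer> collapsedCounts = new LinkedHashMap<>();

    /**
     * @param rankedDocIds    doc ids in result order (best first)
     * @param fieldValueByDoc stand-in for a field cache: value of the collapse field per doc id
     * @param maxPerValue     how many documents per field value stay visible
     */
    static AccumulatorCollapse collapse(int[] rankedDocIds, String[] fieldValueByDoc, int maxPerValue) {
        AccumulatorCollapse result = new AccumulatorCollapse();
        Map<String, Integer> seen = new HashMap<>(); // one accumulator per unique value
        for (int docId : rankedDocIds) {
            String value = fieldValueByDoc[docId];
            int count = seen.merge(value, 1, Integer::sum);
            if (count <= maxPerValue) {
                result.kept.add(docId);
            } else {
                result.collapsedCounts.merge(value, 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] category = {"Bar", "Restaurant", "Restaurant", "Restaurant", "Bar"};
        AccumulatorCollapse r = collapse(new int[]{3, 0, 2, 1, 4}, category, 1);
        System.out.println(r.kept);
        System.out.println(r.collapsedCounts);
    }
}
```

The pass is linear in the number of results, but the `seen` map holds one entry per unique field value, which matches the "too memory intensive" remark in the edit when the collapse field has many distinct terms.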
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624421#action_12624421 ]

oleg_gnatovskiy edited comment on SOLR-236 at 8/21/08 10:34 AM:

I was able to hack the latest patch in and get it to work, but it required some pretty heavy, naive changes...

If you are getting an NPE, try this: in the SolrIndexSearcher class, in the getDocListC method, change

    out = new DocListAndSet();

to

    DocListAndSet out = null;
    if (qr.getDocListAndSet() == null)
        out = new DocListAndSet();
    else
        out = qr.getDocListAndSet();

was (Author: oleg_gnatovskiy):

I was able to hack the latest patch in and get it to work, but it required some pretty heavy, naive changes...

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
Fix For: 1.4
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ]

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:15 PM:

Hello everyone. I am planning to implement chain collapsing on a high traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

    <lst name="collapse_counts">
      <int name="Restaurant">2414</int>
      <int name="Bar/Club">9</int>
      <int name="Directory Services">37</int>
    </lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, then that's great. However, when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

was (Author: oleg_gnatovskiy):

Hello everyone. I am planning to implement chain collapsing on a high traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

    <code>
    <lst name="collapse_counts">
      <int name="Restaurant">2414</int>
      <int name="Bar/Club">9</int>
      <int name="Directory Services">37</int>
    </lst>
    </code>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, then that's great. However, when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ]

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:18 PM:

Hello everyone. I am planning to implement chain collapsing on a high traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

    <lst name="collapse_counts">
      <int name="Restaurant">2414</int>
      <int name="Bar/Club">9</int>
      <int name="Directory Services">37</int>
    </lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, then that's great. However, when I collapse on some fields I get an empty collapse_counts list. It could be that those fields have a large number of different values to collapse on. Is there a limit to the number of values that collapse_counts displays?

Thanks in advance for any help you can provide!

was (Author: oleg_gnatovskiy):

Hello everyone. I am planning to implement chain collapsing on a high traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

    <lst name="collapse_counts">
      <int name="Restaurant">2414</int>
      <int name="Bar/Club">9</int>
      <int name="Directory Services">37</int>
    </lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, then that's great. However, when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564966#action_12564966 ]

clh edited comment on SOLR-236 at 2/1/08 2:59 PM:

Ah ... got the beginnings of a diagnosis.

The problem appears when the DocSet {{qDocSet}} returned by DocSetHitCollector.getDocSet() (called at org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the field_collapsing patch applied, inside getDocListAndSetNC()) is a BitDocSet, and not when it's a HashDocSet.

As the stack trace above shows, calling intersection() on a BitDocSet object invokes the superclass' DocSetBase.intersection() method, which invokes a call chain that blows up when it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{code}
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
{code}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for dealing with {{other}} parameters that are instances of HashDocSet; does it also need special-casing logic for dealing with {{other}} parameters that are NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or something else entirely?

was (Author: clh):

Ah ... got the beginnings of a diagnosis.

The problem appears when the DocSet {{qDocSet}} returned by DocSetHitCollector.getDocSet() (called at org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the field_collapsing patch applied, inside getDocListAndSetNC()) is a BitDocSet, and not when it's a HashDocSet.

As the stack trace above shows, calling intersection() on a BitDocSet object invokes the superclass' DocSetBase.intersection() method, which invokes a call chain that blows up when it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{{
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
}}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for dealing with {{other}} parameters that are instances of HashDocSet; does it also need special-casing logic for dealing with {{other}} parameters that are NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or something else entirely?

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
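One of the options raised in this comment (special-casing the intersection so that a negated set is never iterated) can be sketched with stand-in types. Everything below is an assumption for illustration: `SimpleDocSet`, `BitSetDocSet` and `NegatedSimpleDocSet` are hypothetical classes, not Solr's DocSet, BitDocSet or NegatedDocSet, and the sketch does not claim to reproduce how DocSetBase.intersection() actually works.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Stand-in for a doc set: membership test plus optional iteration. */
interface SimpleDocSet {
    boolean exists(int docId);
    PrimitiveIterator.OfInt iterator();
}

/** Concrete, finite set backed by a java.util.BitSet. */
final class BitSetDocSet implements SimpleDocSet {
    private final BitSet bits;

    BitSetDocSet(BitSet bits) { this.bits = bits; }

    public boolean exists(int docId) { return bits.get(docId); }

    public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }

    /** Iterates only this (finite) set and probes the other side with exists(). */
    SimpleDocSet intersection(SimpleDocSet other) {
        BitSet result = new BitSet();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            if (other.exists(doc)) {
                result.set(doc);
            }
        }
        return new BitSetDocSet(result);
    }
}

/** Complement of another set: membership is cheap, iteration is deliberately unsupported. */
final class NegatedSimpleDocSet implements SimpleDocSet {
    private final SimpleDocSet source;

    NegatedSimpleDocSet(SimpleDocSet source) { this.source = source; }

    public boolean exists(int docId) { return !source.exists(docId); }

    public PrimitiveIterator.OfInt iterator() {
        throw new UnsupportedOperationException("a negated set cannot be iterated");
    }
}

final class IntersectionDemo {
    public static void main(String[] args) {
        BitSet query = new BitSet();
        query.set(1); query.set(3); query.set(5);
        BitSet collapsedAway = new BitSet();
        collapsedAway.set(3);
        SimpleDocSet filter = new NegatedSimpleDocSet(new BitSetDocSet(collapsedAway));
        SimpleDocSet out = new BitSetDocSet(query).intersection(filter);
        System.out.println(out.exists(1) + " " + out.exists(3) + " " + out.exists(5));
    }
}
```

The design choice here is that the intersection only ever iterates the receiver, so a filter that supports nothing but a membership test can still be applied. Whether that fits the real call chain described in the comment would need to be checked against the Solr source.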
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556652#action_12556652 ]

clh edited comment on SOLR-236 at 1/7/08 1:51 PM:

bq. UPDATE: Doug Steigerwald's patch (field_collapsing_dsteigerwald.diff) applies cleanly to trunk

I'm having trouble applying field_collapsing_1.3.patch to the head of trunk.

{noformat}
[EMAIL PROTECTED]:~/solr/src/java$ patch -p0 < /home/charlie/downloads/field_collapsing_1.3.patch
patching file org/apache/solr/search/CollapseFilter.java
patching file org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 694 (offset -8 lines).
Hunk #2 succeeded at 1252 (offset -1 lines).
patching file org/apache/solr/common/params/CollapseParams.java
patching file org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 90.
Hunk #3 FAILED at 117.
3 out of 3 hunks FAILED -- saving rejects to file org/apache/solr/handler/StandardRequestHandler.java.rej
patching file org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 311.
Hunk #4 FAILED at 339.
4 out of 4 hunks FAILED -- saving rejects to file org/apache/solr/handler/DisMaxRequestHandler.java.rej
{noformat}

I'm guessing that the field collapsing patch needs to be updated for the SearchHandler refactoring that was done as part of SOLR-281. If so, I'll take a whack at migrating the changes to SearchHandler.java and see if I can produce a better patch.

was (Author: clh):

I'm having trouble applying field_collapsing_1.3.patch to the head of trunk.

{noformat}
[EMAIL PROTECTED]:~/solr/src/java$ patch -p0 < /home/charlie/downloads/field_collapsing_1.3.patch
patching file org/apache/solr/search/CollapseFilter.java
patching file org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 694 (offset -8 lines).
Hunk #2 succeeded at 1252 (offset -1 lines).
patching file org/apache/solr/common/params/CollapseParams.java
patching file org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 90.
Hunk #3 FAILED at 117.
3 out of 3 hunks FAILED -- saving rejects to file org/apache/solr/handler/StandardRequestHandler.java.rej
patching file org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 311.
Hunk #4 FAILED at 339.
4 out of 4 hunks FAILED -- saving rejects to file org/apache/solr/handler/DisMaxRequestHandler.java.rej
{noformat}

I'm guessing that the field collapsing patch needs to be updated for the SearchHandler refactoring that was done as part of SOLR-281. If so, I'll take a whack at migrating the changes to SearchHandler.java and see if I can produce a better patch.

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556032#action_12556032 ]

dsteigerwald edited comment on SOLR-236 at 1/4/08 11:43 AM:

I've created a CollapseComponent for field collapsing. Everything seems to work fine with it. The only issue I'm having is that I cannot use the query component: when it isn't commented out, the non-collapsed results are displayed as well, and I can't figure out how to remove them. Someone might be able to figure that part out.

[http://localhost:8983/solr/search?q=id:[0%20TO%20*]&collapse=true&collapse.field=inStock&collapse.type=normal&collapse.threshold=0]

Here's the config I'm using:

    <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />
    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
      </lst>
      <arr name="components">
        <!-- <str>query</str> -->
        <str>facet</str>
        <!-- <str>mlt</str> -->
        <!-- <str>highlight</str> -->
        <!-- <str>debug</str> -->
        <str>collapse</str>
      </arr>
    </requestHandler>

was (Author: dsteigerwald):

I've created a CollapseComponent for field collapsing. Everything seems to work fine with it. The only issue I'm having is that I cannot use the query component: when it isn't commented out, the non-collapsed results are displayed as well, and I can't figure out how to remove them. Someone might be able to figure that part out.

http://localhost:8983/solr/search?q=id:[0%20TO%20*]&collapse=true&collapse.field=inStock&collapse.type=normal&collapse.threshold=0

Here's the config I'm using:

    <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />
    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
      </lst>
      <arr name="components">
        <!-- <str>query</str> -->
        <str>facet</str>
        <!-- <str>mlt</str> -->
        <!-- <str>highlight</str> -->
        <!-- <str>debug</str> -->
        <str>collapse</str>
      </arr>
    </requestHandler>

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538339 ]

ekeller edited comment on SOLR-236 at 10/28/07 1:55 PM:

Here is the patch for Solr 1.3 rev 589395. I made some performance improvements. There is no more cache. I use a BitDocSet or a HashDocSet depending on the solrconfig.hashdocsetmaxsize variable.

Regards,
Emmanuel Keller.

was (Author: ekeller):

Here is the patch for solr 1.3 rev 589395. I made some performance improvment. No more cache. We are using bitdocset or hashdocset using solrconfig.hashdocsetmaxsize variable.

Regards,
Emmanuel Keller.

Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
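The size-based choice between a bit set and a hash set mentioned in this comment can be sketched as follows. This is a minimal sketch under assumptions: `DocSetChooser`, `Membership` and the `hashMaxSize` parameter are hypothetical names standing in for the configured threshold, and the code uses plain `java.util` collections rather than Solr's BitDocSet and HashDocSet.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: picks a set representation from the number of matching docs. */
final class DocSetChooser {

    /** Minimal membership view shared by both representations. */
    interface Membership {
        boolean contains(int docId);
        String kind();
    }

    /**
     * @param docIds      matching document ids
     * @param maxDoc      size of the index (upper bound for doc ids)
     * @param hashMaxSize hypothetical threshold: at or below it, use a hash set
     */
    static Membership build(int[] docIds, int maxDoc, int hashMaxSize) {
        if (docIds.length <= hashMaxSize) {
            // Few matches: memory grows with the number of matches only.
            Set<Integer> ids = new HashSet<>();
            for (int id : docIds) {
                ids.add(id);
            }
            return new Membership() {
                public boolean contains(int docId) { return ids.contains(docId); }
                public String kind() { return "hash"; }
            };
        }
        // Many matches: one bit per document in the index.
        BitSet bits = new BitSet(maxDoc);
        for (int id : docIds) {
            bits.set(id);
        }
        return new Membership() {
            public boolean contains(int docId) { return docId >= 0 && bits.get(docId); }
            public String kind() { return "bits"; }
        };
    }

    public static void main(String[] args) {
        System.out.println(build(new int[]{2, 7}, 10, 3).kind());
        System.out.println(build(new int[]{0, 1, 2, 3, 4}, 10, 3).kind());
    }
}
```

A hash set costs memory per match, while a bit set costs one bit per document regardless of how many match, so a threshold on the match count is a natural switch point between the two.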