[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-05 Thread Peter Karich (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841756#action_12841756 ]

Peter Karich edited comment on SOLR-236 at 3/5/10 8:53 AM:
---

It seems to me that the provided changes are necessary to make the OutOfMemory 
exception go away (see the 3 attached files). Please apply the files with caution, 
because I made the changes against an old patch (from Nov 2009).

  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
 field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection:
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many consecutive results are allowed before collapsing
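 The two collapse.type modes and collapse.max can be illustrated with a toy model 
 over a ranked list of field values. This is a minimal sketch under stated 
 assumptions, not the patch's code: it assumes "normal" keeps at most 
 collapse.max results per field value anywhere in the list, and "adjacent" keeps 
 at most collapse.max consecutive results sharing a value. The class and method 
 names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the collapse.type / collapse.max semantics (hypothetical, not the patch).
 * Input: the collapse-field values of the ranked results, in rank order.
 * Output: the indices of the results that survive collapsing.
 */
public class CollapseSketch {

    /** "adjacent": keep at most max consecutive results with the same value. */
    static List<Integer> collapseAdjacent(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            run = v.equals(previous) ? run + 1 : 1; // length of the current run
            previous = v;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    /** "normal": keep at most max results per value, wherever they occur. */
    static List<Integer> collapseNormal(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (int i = 0; i < values.size(); i++) {
            int count = seen.merge(values.get(i), 1, Integer::sum); // occurrences so far
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "b.com", "a.com", "a.com");
        System.out.println(collapseAdjacent(sites, 1)); // [0, 2, 3]
        System.out.println(collapseNormal(sites, 1));   // [0, 2]
    }
}
```

 With max = 1, adjacent collapsing keeps index 3 because the a.com run restarts 
 after b.com, while normal collapsing drops it because a.com was already shown.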
 TODO (in progress):
 - More documentation (in the source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-04 Thread Peter Karich (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich edited comment on SOLR-236 at 3/4/10 9:48 AM:
---

Regarding the OutOfMemory problem: we are now testing the suggested change in 
production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial. (I cannot easily provide a patch, because we are using an older 
patch, although I could post the 3 changed files.)

I used a TreeMap instead of a HashMap because I needed the tailMap method in 
the advance method of the class NonAdjacentDocumentCollapser.PredefinedScorer:

{noformat}
public int advance(int target) throws IOException {
    // now we need a TreeMap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}
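The memory argument and the advance step can be sketched in isolation. A 
float[maxDoc] costs 4 bytes for every document in the index whether or not it 
matched, while a TreeMap only holds entries for scored documents (at a much 
higher cost per entry, so it pays off only when few documents are scored). This 
is a hypothetical sketch, not the patch's PredefinedScorer: the class and method 
names are made up, and it returns the first scored document >= target, which is 
what Lucene's DocIdSetIterator.advance contract describes. The snippet above 
returns target itself even when target has no entry.

```java
import java.util.TreeMap;

/**
 * Hypothetical sparse score store: only scored docs get an entry,
 * instead of allocating a float[maxDoc] per request.
 */
public class SparseScores {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final TreeMap<Integer, Float> scores = new TreeMap<>();

    void put(int doc, float score) {
        scores.put(doc, score);
    }

    /** Returns the first scored doc id >= target, or NO_MORE_DOCS if there is none. */
    int advance(int target) {
        // ceilingKey(target) equals tailMap(target).firstKey() when the tail is
        // non-empty; both need sorted keys, hence TreeMap rather than HashMap.
        Integer doc = scores.ceilingKey(target);
        return doc == null ? NO_MORE_DOCS : doc;
    }

    public static void main(String[] args) {
        SparseScores s = new SparseScores();
        s.put(3, 1.5f);
        s.put(10, 0.5f);
        System.out.println(s.advance(3));                  // 3
        System.out.println(s.advance(4));                  // 10, not 4
        System.out.println(s.advance(11) == NO_MORE_DOCS); // true
    }
}
```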

Then, I think, I discovered a bug or at least inconsistent behaviour: if I run the 
test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, 
the scores array is created as new float[maxDocs] in the old version, but it is 
never filled with any values. With the TreeMap, Float value1 = values.get(doc1); 
therefore returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the 
TreeMap is 0!). I work around this via:

{noformat} 
if (value1 == null)
value1 = 0f;
if (value2 == null)
value2 = 0f;
{noformat} 
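The same null workaround can be expressed with Map.getOrDefault, so that a 
document without an entry compares as if it had scored 0f. This is a hypothetical 
comparator written for illustration (the names ScoreComparatorSketch and byScore 
are not from the patch). With an empty map, as in the test described above, every 
pair compares as equal, which suggests the comparisons do no useful work in that 
case.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical comparator: docs missing from the sparse score map count as 0f. */
public class ScoreComparatorSketch {

    static Comparator<Integer> byScore(Map<Integer, Float> values) {
        return (doc1, doc2) -> {
            float value1 = values.getOrDefault(doc1, 0f); // null -> 0f
            float value2 = values.getOrDefault(doc2, 0f); // null -> 0f
            return Float.compare(value1, value2);
        };
    }

    public static void main(String[] args) {
        Map<Integer, Float> values = new TreeMap<>();
        values.put(1, 2.0f);
        Comparator<Integer> cmp = byScore(values);
        System.out.println(cmp.compare(1, 7)); // 1: doc 7 has no score, treated as 0f
        System.out.println(cmp.compare(5, 7)); // 0: both missing
    }
}
```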

I think the compare method should NOT be called at all if there are no docs in 
the scores array...?

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-03-04 Thread Peter Karich (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich edited comment on SOLR-236 at 3/4/10 9:46 AM:
---

Regarding the OutOfMemory problem: we are now testing the suggested change in 
production.

I replaced the float array with a TreeMap<Integer, Float>. The change was 
nearly trivial. (I cannot easily provide a patch, because we are using an older 
patch, although I could post the 3 changed files.)

I used a TreeMap instead of a HashMap because I needed the tailMap method in 
the advance method of the class NonAdjacentDocumentCollapser.PredefinedScorer:

{noformat}
public int advance(int target) throws IOException {
    // now we need a TreeMap method:
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}

Then, I think, I discovered a bug or at least inconsistent behaviour: if I run the 
test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, 
the scores array is created as new float[maxDocs] in the old version, but it is 
never filled with any values. With the TreeMap, Float value1 = values.get(doc1); 
therefore returns null in the method 
NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the 
TreeMap is 0!). I work around this via:

{noformat} 
if (value1 == null)
value1 = 0f;
if (value2 == null)
value2 = 0f;
{noformat} 

But should the compare method be called at all if there are no docs in the 
scores array...?

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-22 Thread Peter Steevensz (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836919#action_12836919 ]

Peter Steevensz edited comment on SOLR-236 at 2/22/10 9:40 PM:
---

I applied this patch to the nightly build of Feb 22, and it compiles without 
any problem.

I can start Solr and it runs fine. But when I add the field collapse component 
to solrconfig.xml, I cannot start Solr anymore.

After adding this line to my solrconfig.xml: 

<searchComponent name="query"
  class="org.apache.solr.handler.component.CollapseComponent" />


I get this error when I run Solr:

2010-02-22 22:24:30.722::WARN:  Failed startup of context 
org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

(I am using CentOS with Java 1.6.0_13.)

Any help is greatly appreciated!



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-22 Thread Peter Steevensz (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836919#action_12836919 ]

Peter Steevensz edited comment on SOLR-236 at 2/22/10 9:43 PM:
---

I applied this patch to the nightly build of Feb 22, and it compiles without 
any problem.

I can start Solr and it runs fine. But when I add the field collapse component 
to solrconfig.xml, I cannot start Solr anymore.

After adding this line to my solrconfig.xml: 

<searchComponent name="query"
  class="org.apache.solr.handler.component.CollapseComponent" />


I get this error when I run Solr:

2010-02-22 22:24:30.722::WARN:  Failed startup of context 
org.mortbay.jetty.webapp.webappcont...@7f5580{/solr,jar:file:/opt/apache-solr-1.5-dev/example/webapps/solr.war!/}
java.lang.NullPointerException
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

(I am using CentOS with Java 1.6.0_13.)

Any help is greatly appreciated!


 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:


I tried the latest patch from 1 Feb 2010. It compiles against the solr-2010-02-13 
nightly build, but does not work. If I query

http://server/cs-bidcs/select?q=*:*&collapse.field=myfield
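In this request q and collapse.field are two separate parameters, which must be 
joined with & (a separator that is easily lost when copying URLs). A small sketch 
of assembling and encoding such a query string with only the JDK, assuming Java 
10 or later for the URLEncoder.encode(String, Charset) overload; the class name 
is hypothetical and the host and path are the placeholders from the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseQuery {

    /** Encodes each key and value and joins the pairs with '&', in insertion order. */
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("q", "*:*");
        params.put("collapse.field", "myfield");
        // prints http://server/cs-bidcs/select?q=*%3A*&collapse.field=myfield
        System.out.println("http://server/cs-bidcs/select?" + queryString(params));
    }
}
```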

it fails with: 

{noformat}
HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}


I only need the OutOfMemory problem solved ... :-(

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ]

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:


I tried the latest patch from 1 Feb 2010. It compiles against the solr-2010-02-13 
nightly build, but does not work. If I query

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat}
HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at ...
{noformat}


I only need the OutOfMemory problem solved ... :-(

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-18 Thread Peter Karich (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258
 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:07 PM:


I tried the latest patch from 1st Feb 2010. It compiles against solr-2010-02-13
from the nightly build directory, but it does not work. If I query

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

  was (Author: peathal):
Trying the latest patch from 1st Feb 2010 compiles against solr-2010-02-13 
from nightly build but does not work. If I query 

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-05 Thread Kevin Cunningham (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305
 ] 

Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM:


Regarding Patrick's comment about a memory leak, we are seeing something
similar: very large memory usage, eventually using up all the available
memory. Were there any confirmed issues that may have been addressed in the
later patches? We're using the 12-24 patch. Are there any toggles we can switch
to keep the feature while minimizing the memory footprint?

We had been running the 11-29 field-collapse-5.patch and saw nothing near
this amount of memory consumption.

  was (Author: kunningham):
Regarding Patrick's comment about a memory leak, we are seeing something 
similar - very large memory usage and eventually using all the available 
memory.  Were there any confirmed issues that may have been addressed with the 
later patches?  We're using the 12-24 patch.  Any toggles we can switch to 
still get the feature, yet minimize the memory footprint?
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-02-05 Thread Kevin Cunningham (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305
 ] 

Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM:


Regarding Patrick's comment about a memory leak, we are seeing something
similar: very large memory usage, eventually using up all the available
memory. Were there any confirmed issues that may have been addressed in the
later patches? We're using the 12-24 patch. Are there any toggles we can switch
to keep the feature while minimizing the memory footprint?

We had been running the 11-29 field-collapse-5.patch and saw nothing near
this amount of memory consumption.

What fixes would we be missing if we ran Solr 1.4 with the last
field-collapse-5.patch?

  was (Author: kunningham):
Regarding Patrick's comment about a memory leak, we are seeing something 
similar - very large memory usage and eventually using all the available 
memory.  Were there any confirmed issues that may have been addressed with the 
later patches?  We're using the 12-24 patch.  Any toggles we can switch to 
still get the feature, yet minimize the memory footprint?

We had been running the 11-29 field-collapse-5.patch patch and saw nothing near 
this amount of memory consumption.
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-01-07 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:


bq. The result document of our prefix query, which was at position 1 without
collapsing, was not even within the top 10 results with collapsing. We were
using the option collapse.maxdocs=150, and after changing this option to 15000,
the results seem to be as expected. Because of that, we concluded that there
has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs parameter aborts collapsing once the threshold is met, but
it applies that threshold to the uncollapsed docset, which is not sorted in any
way. As a result, documents that would normally appear on the first page don't
appear in the search result at all, because in the end the collapse component
uses the collapsed docset as the result set, not the uncollapsed docset.
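To make this effect concrete, here is a minimal, hypothetical Java sketch; it is not the patch's code, and the `Doc` and `MaxDocsDemo` classes are invented for illustration. It collapses documents by group value but stops after `maxDocs` documents taken in index order (modelling an uncollapsed docset that is not sorted by score), and only then sorts the collapsed groups by score.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical matching document: index-order id, collapse-field value, score.
final class Doc {
    final int id;
    final String group;
    final float score;

    Doc(int id, String group, float score) {
        this.id = id;
        this.group = group;
        this.score = score;
    }
}

final class MaxDocsDemo {
    // Keeps the best-scoring doc per group, but stops after maxDocs docs have
    // been examined. Docs are examined in the order given, which models an
    // uncollapsed docset that is NOT sorted by score.
    static List<Doc> collapse(List<Doc> docsInIndexOrder, int maxDocs) {
        Map<String, Doc> bestPerGroup = new LinkedHashMap<>();
        int examined = 0;
        for (Doc d : docsInIndexOrder) {
            if (examined++ >= maxDocs) {
                break; // threshold met: the remaining docs are never collapsed
            }
            Doc best = bestPerGroup.get(d.group);
            if (best == null || d.score > best.score) {
                bestPerGroup.put(d.group, d);
            }
        }
        // The collapsed groups become the result set, sorted by descending score.
        List<Doc> result = new ArrayList<>(bestPerGroup.values());
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

For example, given three documents in different groups with scores 0.1, 0.2 and 0.9 in that index order, `collapse(docs, 2)` never reaches the 0.9 document, so the best match is missing from the result, while `collapse(docs, 3)` ranks it first.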

bq. Also, we noticed a huge memory leak problem when using collapsing. We
configured the component with <searchComponent name="query"
class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting
the option collapse.field, it works normally and so far there are no memory
problems. If requests with collapsing enabled are received by the Solr server,
the whole memory fills up after a few requests (old gen could not be freed;
eden space is heavily in use; ...). Using a profiler, we noticed that the
filterCache was extraordinarily large. We suspected that there could be a
caching problem (collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse
cache. It has to be addressed, and it certainly will be in the new
field-collapse implementation. In the patch you're using, too much is being
cached (some of that data could even be left out of the cache entirely). Also,
in some cases strings are cached that could actually be replaced with hash codes.
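Caching hash codes instead of strings trades exactness for memory. The following is a minimal, hypothetical Java sketch (the class `GroupKeyCache` and its methods are invented for illustration and are not part of any patch on this issue): it keeps one `int` per document instead of a `String`, and groups documents by comparing those ints. The catch is that distinct values can share a hash code (in Java, `"Aa"` and `"BB"` both hash to 2112), so a real implementation would still need a collision check or a collision-free mapping such as term ordinals.

```java
// Hypothetical per-document cache of the collapse-field value, storing
// String.hashCode() values instead of the strings themselves.
final class GroupKeyCache {
    // One int (4 bytes) per document, instead of a String reference plus the
    // String object it points to.
    private final int[] keyHashes;

    GroupKeyCache(String[] fieldValuePerDoc) {
        keyHashes = new int[fieldValuePerDoc.length];
        for (int doc = 0; doc < fieldValuePerDoc.length; doc++) {
            String value = fieldValuePerDoc[doc];
            // Missing values map to 0 here; note that "" also hashes to 0, so a
            // real cache would need a separate marker for "no value".
            keyHashes[doc] = value == null ? 0 : value.hashCode();
        }
    }

    // Documents are treated as one group when their hashes match.
    // Caveat: distinct strings can share a hash code, so groups may be merged wrongly.
    boolean sameGroup(int docA, int docB) {
        return keyHashes[docA] == keyHashes[docB];
    }
}
```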

bq. Additionally, it might be very useful if the parameter collapse=true|false
worked again and could be used to enable/disable the collapsing
functionality. Currently, the presence of a field chosen for collapsing
enables this feature, and there is no way to configure the fields for
collapsing within the request handlers. With such a parameter, we could
configure the fields in the handler and only enable/disable collapsing per
request, the way other components (highlighting, faceting, ...) are
conveniently used.

It actually makes sense to use the collapse.enable parameter again in the
patch.
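A minimal sketch of such a switch, assuming request parameters simply override handler defaults and that collapsing stays off unless `collapse.enable=true` is present (both are assumptions, not behaviour confirmed by the patch). Plain `Map`s stand in for Solr's parameter classes, and `CollapseSwitch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical gate for the collapse component.
final class CollapseSwitch {
    // Handler defaults first, then request params, so request values win.
    static Map<String, String> effectiveParams(Map<String, String> handlerDefaults,
                                               Map<String, String> requestParams) {
        Map<String, String> merged = new HashMap<>(handlerDefaults);
        merged.putAll(requestParams);
        return merged;
    }

    // Assumed rule: collapse only when explicitly enabled AND a field is configured.
    static boolean shouldCollapse(Map<String, String> params) {
        boolean enabled = Boolean.parseBoolean(params.getOrDefault("collapse.enable", "false"));
        String field = params.get("collapse.field");
        return enabled && field != null && !field.isEmpty();
    }
}
```

With `collapse.field` set as a handler default, a request then only needs to add `collapse.enable=true` to turn collapsing on.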

Martijn

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792510#action_12792510
 ] 

Mark Miller edited comment on SOLR-236 at 12/18/09 3:41 PM:


bq. (Faceting got a 50 times perf boost in 1.4)

No, it didn't. Certain cases have gotten a boost (I think you might be referring
to multi-valued field faceting cases?). General faceting was always
relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not 
ready for trunk.

  was (Author: markrmil...@gmail.com):
bq. (Faceting got a 50 times perf boost in 1.4)

No it didn't. Certain cases have gotten a boost (I think you might be referring 
to multi-field faceting cases?). And general faceting was always relatively 
fast and scalable.

I'm against committing features to trunk with a warning that the feature is not 
ready for trunk.
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792518#action_12792518
 ] 

Mark Miller edited comment on SOLR-236 at 12/18/09 4:12 PM:


bq. I very much disagree with a policy blocking non-production-ready code from 
being in source control

Just to be clear, there is no such policy that I've seen; each decision just
comes down to consensus. And as far as I know, our branch policy is pretty much
anything goes; trunk is very different than svn. Anyone (anyone with access
to svn, that is) can play around with a branch for anything if they want.


I agree with your thoughts on a branch: if the argument is that we want it to be
easier for devs to check out and work on this, or for users to check out and
build this without applying patches, why not just make a branch? Merging is
annoying but not difficult. I've been doing plenty of branch merging lately,
and while it's not glorious work, modern tools make it more of a grind than a
challenge.

  was (Author: markrmil...@gmail.com):
bq. I very much disagree with a policy blocking non-production-ready code 
from being in source control

Just to be clear, there is no such policy that I've seen - each decision just 
comes down to consensus. And as far as I know, our branch policy is pretty much 
anything goes - trunk is very different than svn. Anyone can play around with 
a branch for anything if they want.


I agree with your thoughts on a branch - if the argument is, we want it to be 
easier for devs to check out and work on this, or for users to check out and 
build this without applying patches, why not just make a branch? Merging is 
annoying but not difficult - I've been doing plenty of branch merging lately, 
and while it's not glorious work, modern tools make it more of a grind than a 
challenge.
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791952#action_12791952
 ] 

Noble Paul edited comment on SOLR-236 at 12/17/09 2:55 PM:
---

Shalin, the names may not be necessary on the collapseCollectorFactory because
they are never referred to by name.

How about making the functions plugins as well, like this:
{code:xml}
<collapseCollectorFactory class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
  <function name="sum" class="org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction"/>
  <function name="avg" class="org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction"/>
  <function name="min" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction"/>
  <function name="max" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction"/>
</collapseCollectorFactory>

{code}
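The idea of named function plugins can be sketched in Java as a registry that maps each configured `name` to a factory. `AggregateFunction` and `AggregateFunctionRegistry` are hypothetical names, not classes from the patch; in the proposed configuration each function would presumably be instantiated from its `class` attribute, whereas this sketch registers lambdas to stay self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plugin contract: aggregate the values of one collapse group.
interface AggregateFunction {
    double apply(double[] values);
}

final class AggregateFunctionRegistry {
    private final Map<String, Supplier<AggregateFunction>> byName = new HashMap<>();

    // Register a function under the name used in the configuration (e.g. name="sum").
    void register(String name, Supplier<AggregateFunction> factory) {
        byName.put(name, factory);
    }

    AggregateFunction create(String name) {
        Supplier<AggregateFunction> factory = byName.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown aggregate function: " + name);
        }
        return factory.get();
    }

    // Mirrors the four <function> entries in the configuration above.
    // Empty groups yield NaN for avg/min/max in this sketch.
    static AggregateFunctionRegistry withDefaults() {
        AggregateFunctionRegistry r = new AggregateFunctionRegistry();
        r.register("sum", () -> values -> Arrays.stream(values).sum());
        r.register("avg", () -> values -> Arrays.stream(values).average().orElse(Double.NaN));
        r.register("min", () -> values -> Arrays.stream(values).min().orElse(Double.NaN));
        r.register("max", () -> values -> Arrays.stream(values).max().orElse(Double.NaN));
        return r;
    }
}
```

Because functions are looked up by name, adding a new aggregate would only mean registering another entry, without touching the collector itself.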

  was (Author: noble.paul):
shalin, the names may not be necessary on the collapseCollectorFactory  
because they are never referred by the name


  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-17 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792191#action_12792191
 ] 

Erik Hatcher edited comment on SOLR-236 at 12/17/09 11:03 PM:
--

I'll just add my 0,02€: the main thing to vet now that it works (first make it
work) is the interface to the client. Are the request params ideal? Is the
response data structure locked down? If so, get this committed ASAP and
iterate on the internals (distributed support and performance issues), then
make it right.

Admittedly, I haven't tried this feature out myself, though. I'm more likely
to try out committed code than patches.

  was (Author: ehatcher):
I'll just add my 0,02€ - the main thing to vet now that it works (first 
make it work), is the interface to the client.  are the request params ideal?  
is the response data structure locked down?   if so, get this committed ASAP 
and iterate on the internals of distributed and performance issues.

Admittedly I've not tried this feature out myself though.  Committed stuff I'll 
try out easier than patches actually.
  
[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-11 Thread Chad Kouse (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789550#action_12789550
 ] 

Chad Kouse edited comment on SOLR-236 at 12/11/09 9:37 PM:
---

Just wanted to comment that I am experiencing the same behavior as Marc Menghin 
above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I 
couldn't really tell why since it looked like it should have worked -- I just 
manually copied the hunk into the correct class. Sorry I didn't note what 
failed.

  was (Author: chadkouse):
Just wanted to comment that I am experiencing the same behavior as Marc 
Menghin above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but 
I couldn't really tell why since it looked like it should have worked -- I just 
manually copied the hunk into the write class Sorry I didn't note what 
failed
  



Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Marc Sturlese

Hey there, I have been testing the latest patch and I think either I am missing
something, or the way collapsed documents are shown with adjacent collapsing
can sometimes be confusing.
I am using the patch with CollapseComponent replacing QueryComponent (not
using both at the same time):
  <searchComponent name="query"
      class="org.apache.solr.handler.component.CollapseComponent" />
What I have noticed is this: imagine you get these results in the search:
doc1:
   id:001
   collapseField:ccc
doc2:
   id:002
   collapseField:aaa
doc3:
   id:003
   collapseField:ccc
doc4:
   id:004
   collapseField:bbb

And in the collapse_counts you get:
<int name="collapseCount">1</int>
<str name="fieldValue">ccc</str>
<result name="collapsedDocs" numFound="1" start="0">
  <doc>
    <long name="id">008</long>
    <str name="content">aaa aaa</str>
    <str name="col">ccc</str>
  </doc>
</result>

Now, how can I know the head document of doc 008? Both 001 and 003 could
be. Wouldn't it make sense to link the uniqueField to the collapsed
documents in some way?

For example, by adding something like this to collapse_counts:
<int name="collapseCount">1</int>
<str name="fieldValue">ccc</str>
<str name="uniqueFieldId">003</str>

I have currently hacked FieldValueCountCollapseCollectorFactory to return:
<str name="fieldValue">ccc#003</str>
but this response looks dirty...

As I said, maybe I am misunderstanding something and this can already be
known somehow. In that case, can someone tell me how?
Thanks in advance
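To show why this question comes up specifically with adjacent collapsing, here is a simplified Java sketch of the two collapse types as the issue description defines them. Normal collapsing groups every hit with the same field value, while adjacent collapsing only groups consecutive hits. This is not the patch's implementation. The class and method names are hypothetical, and the position of doc 008 in the ranking is an assumption, since the thread does not show it. In adjacent mode the value ccc heads two separate groups (001 and 003), so a collapse_counts entry keyed only by field value cannot tell which head 008 belongs to. Keying each group by its head id removes that ambiguity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollapseSketch {

    /** A search hit: its unique id and the value of the collapse field. */
    static final class Hit {
        final String id;
        final String value;
        Hit(String id, String value) { this.id = id; this.value = value; }
    }

    /**
     * Returns a map from group head id to the ids collapsed under that head.
     * The first `threshold` hits of a group are shown and later ones are collapsed.
     * In adjacent mode a group ends as soon as a hit with a different value
     * appears, so the same value can start several groups.
     */
    static Map<String, List<String>> collapse(List<Hit> hits, int threshold, boolean adjacent) {
        Map<String, List<String>> collapsedByHead = new LinkedHashMap<>();
        Map<String, String> openHeadByValue = new LinkedHashMap<>(); // field value -> head of its open group
        Map<String, Integer> shownByHead = new LinkedHashMap<>();    // head id -> hits shown so far
        String previousValue = null;
        for (Hit hit : hits) {
            if (adjacent && !hit.value.equals(previousValue)) {
                openHeadByValue.clear(); // a new run starts, so all earlier groups are closed
            }
            String head = openHeadByValue.get(hit.value);
            if (head == null) {
                // First hit of a new group: it becomes the head and is shown.
                openHeadByValue.put(hit.value, hit.id);
                shownByHead.put(hit.id, 1);
                collapsedByHead.put(hit.id, new ArrayList<>());
            } else if (shownByHead.get(head) < threshold) {
                shownByHead.merge(head, 1, Integer::sum); // still under the threshold, so shown
            } else {
                collapsedByHead.get(head).add(hit.id);    // over the threshold, so collapsed
            }
            previousValue = hit.value;
        }
        return collapsedByHead;
    }

    public static void main(String[] args) {
        // Hypothetical ranking using the ids from this thread; where 008 sits is assumed.
        List<Hit> hits = List.of(
                new Hit("001", "ccc"), new Hit("008", "ccc"), new Hit("002", "aaa"),
                new Hit("003", "ccc"), new Hit("004", "bbb"));
        System.out.println("normal:   " + collapse(hits, 1, false)); // {001=[008, 003], 002=[], 004=[]}
        System.out.println("adjacent: " + collapse(hits, 1, true));  // {001=[008], 002=[], 003=[], 004=[]}
    }
}
```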






JIRA j...@apache.org wrote:
 
 
 [
 https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783484#action_12783484
 ] 
 
 Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
 --
 
 I have attached a new patch that has the following changes:
 # Added caching for the field collapse functionality. Check the [solr
 wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
 field-collapsing with caching.
 # Removed the collapse.max parameter (collapse.threshold must be used
 instead). It was deprecated for a long time. 
 
   was (Author: martijn):
 I have attached a new patch that has the following changes:
 # Added caching for the field collapse functionality. Check the [solr
 wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure the
 field-collapsing with caching.
 # Removed the collapse.max parameter (collapse.threshold must be used
 instead). It was deprecated for a long time. 
   

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Hi Marc,

I'm not sure I follow you completely, but the example you gave is
not complete. I'm missing a few tags in your example. Let's assume the
following response, which the latest patches produce:

<lst name="collapse_counts">
  <str name="field">cat</str>
  <lst name="results">
    <lst name="009">
      <str name="fieldValue">hard</str>
      <int name="collapseCount">1</int>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">008</long>
          <str name="content">aaa aaa</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
    ...
  </lst>
</lst>

The results list contains collapse groups. The names of the child
elements are the collapse head ids. Everything that falls under a
collapse head belongs to that collapse group, so adding the document
head id to the field value is unnecessary. In the above example,
the document with id 009 is the head of the document with id 008.
The document with id 009 should be displayed in the search result.

From what you have said, it seems that you properly configured the patch.

Martijn
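Martijn's description of the response can be made concrete with a small client-side reader. The sketch below is a minimal, JDK-only Java example that walks the collapse_counts structure shown above and maps each group's name (the head id) to the ids of the documents collapsed under it. It assumes, as in the thread's examples, that the unique key is a long field named id. The class and method names are hypothetical. A group with no name attribute, which is the symptom Marc reports later in this thread, is treated as an error, because without it the head cannot be identified.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollapseCountsReader {

    /** Returns head id -> ids of the documents collapsed under that head. */
    static Map<String, List<String>> collapsedIdsByHead(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element results = (Element) lists.item(i);
            if (!"results".equals(results.getAttribute("name"))) continue;
            // Each child <lst> of "results" is one collapse group, named after its head id.
            for (Node child = results.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (!(child instanceof Element) || !"lst".equals(child.getNodeName())) continue;
                Element group = (Element) child;
                String headId = group.getAttribute("name"); // "" when the attribute is missing
                if (headId.isEmpty()) {
                    throw new IllegalStateException("collapse group without a head id");
                }
                // Assumes the unique key is a <long name="id"> field, as in the thread's examples.
                List<String> ids = new ArrayList<>();
                NodeList longs = group.getElementsByTagName("long");
                for (int j = 0; j < longs.getLength(); j++) {
                    Element field = (Element) longs.item(j);
                    if ("id".equals(field.getAttribute("name"))) ids.add(field.getTextContent().trim());
                }
                result.put(headId, ids);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<lst name=\"collapse_counts\"><str name=\"field\">cat</str>"
                + "<lst name=\"results\"><lst name=\"009\">"
                + "<str name=\"fieldValue\">hard</str><int name=\"collapseCount\">1</int>"
                + "<result name=\"collapsedDocs\" numFound=\"1\" start=\"0\">"
                + "<doc><long name=\"id\">008</long><str name=\"content\">aaa aaa</str>"
                + "<str name=\"col\">ccc</str></doc></result></lst></lst></lst>";
        System.out.println(collapsedIdsByHead(xml)); // {009=[008]}
    }
}
```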


Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Yes, it should look similar to that. What is the exact request you send to Solr?
Also, to check whether the patch works correctly, can you run: ant clean test
There are a number of tests that cover the field collapse functionality.

Martijn


2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

 I see, it looks like I am applying the patch wrongly somehow.
 This is the complete collapse_counts response I am getting:
 <lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <str name="fieldValue">ccc</str>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
 </lst>

 As you can see, I am getting an lst tag with no name. As I understood
 what you told me, I should be getting as many lst tags as collapsed groups,
 and the name attribute of each lst should be the unique field value. So, if
 the patch was applied correctly, the response should look like this:

 <lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst name="354"> <!-- the head value of the collapsed group -->
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">ccc</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
 </lst>

 Is this what the response looks like when you use the patch?
 Thanks in advance



Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Marc Sturlese

The request I am sending is:
http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true

I search for 'aaa' in the content field. All the documents in the result
contain that string in the content field.

Martijn v Groningen wrote:
 
 Yes it should look similar to that. What is the exact request you send to
 Solr?
 Also to check if the patch works correctly can you run: ant clean test
 There are a number of tests that test the Field collapse functionality.
 
 Martijn
 
 

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
The last two parameters are not necessary, since they both default to
true. Did the field collapse tests run successfully?
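Because the archived copy of the request lost its ampersands, it helps to see the request assembled explicitly. Given Martijn's note that collapse.info.doc and collapse.info.count both default to true, they can be left out. The following is a minimal, JDK-only Java sketch that joins a parameter map into a query string and URL-encodes each value. The host, port and parameter values come from Marc's request. The class and method names are hypothetical, and this is not a SolrJ API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseRequest {

    /** Joins the parameters into "base?k1=v1&k2=v2...", URL-encoding each value. */
    static String buildUrl(String base, Map<String, String> params) throws UnsupportedEncodingException {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), "UTF-8"));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "aaa");
        params.put("version", "2.2");
        params.put("start", "0");
        params.put("rows", "20");
        params.put("indent", "on");
        params.put("collapse.field", "col");
        params.put("collapse.includeCollapsedDocs.fl", "*"); // URLEncoder leaves '*' unchanged
        params.put("collapse.type", "adjacent");
        // collapse.info.doc and collapse.info.count are omitted: per the thread, both default to true.
        System.out.println(buildUrl("http://localhost:8983/solr/select/", params));
    }
}
```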

2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

 The request I am sending is:
 http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true

 I search for 'aaa' in the content field. All the documents in the result
 contain that string in the field content


Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Yes, I can reproduce the same situation here. I will update the patch
asap and add it to Jira.

Martijn

2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

 Hey! Got it working!
 The problem was that my uniqueField is indexed as a long, which the patch
 does not support.
 The value is obtained in the getCollapseGroupResult function in
 AbstractCollapseCollector.java as:

 String schemaId = searcher.doc(docId).get(uniqueIdFieldname);

 To support long, int, slong, sint, float, sfloat, ...
 it should be obtained by doing something like:

 // Requires java.io.IOException, org.apache.lucene.document.Fieldable
 // and org.apache.solr.schema.FieldType.
 FieldType idFieldType =
 searcher.getSchema().getFieldType(uniqueIdFieldname);
 String schemaId = "";
 Fieldable idField = null;
 try {
      idField = searcher.doc(docId).getFieldable(uniqueIdFieldname);
 } catch (IOException ex) {
      // deal with exception
 }
 if (idField != null) {
   // Convert the stored (possibly encoded) value into its readable form.
   schemaId = idFieldType.storedToReadable(idField);
 }


 Martijn v Groningen wrote:

 The last two parameters are not necessary, since they default both to
 true. Could you run the field collapse tests tests successful?

 2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

 The request I am sending is:
 http://localhost:8983/solr/select/?q=aaaversion=2.2start=0rows=20indent=oncollapse.field=colcollapse.includeCollapsedDocs.fl=*collapse.type=adjacentcollapse.info.doc=truecollapse.info.count=true

 I search for 'aaa' in the content field. All the documents in the result
 contain that string in the field content

 Martijn v Groningen wrote:

 Yes it should look similar to that. What is the exact request you send
 to
 Solr?
 Also to check if the patch works correctly can you run: ant clean test
 There are a number of tests that test the Field collapse functionality.

 Martijn


 2009/12/7 Marc Sturlese marc.sturl...@gmail.com:

lst name=collapse_counts
   str name=fieldcat/str
    lst name=results
        lst name=009
            str name=fieldValuehard/str
           int name=collapseCount1/int
            result name=collapsedDocs numFound=1 start=0
                 doc
                    long name=id008/long
                    str name=contentaaa aaa/str
                    str name=colccc/str
                 /doc
            /result
        /lst
        ...
    /lst
/lst
 I see, looks like I am applying the patch wrongly somehow.
 This the complete collapse_counts response I am getting:
 <lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <str name="fieldValue">ccc</str>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
 </lst>

 As you can see, I am getting an lst tag with no name. As I understood
 what you told me, I should be getting as many lst tags as collapsed
 groups, and the name attribute of each lst should be the unique field
 value. So, if the patch was applied correctly, the response should look
 like:

 <lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst name="354"> (the head value of the collapsed group)
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">ccc</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
 </lst>

 Is this what the response looks like when you use the patch?
 Thanks in advance


 Martijn v Groningen wrote:

 Hi Marc,

 I'm not sure if I follow you completely, but the example you gave is
 not complete; a few tags are missing. Let's assume the following
 response, which the latest patches produce:

 <lst name="collapse_counts">
     <str name="field">cat</str>
     <lst name="results">
         <lst name="009">
             <str name="fieldValue">hard</str>
             <int name="collapseCount">1</int>
             result 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-11-29 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783484#action_12783484
 ] 

Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
--

I have attached a new patch that has the following changes:
# Added caching for the field collapse functionality. Check the [solr 
wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure 
field-collapsing with caching.
# Removed the collapse.max parameter (collapse.threshold must be used instead). 
It had been deprecated for a long time. 

  was (Author: martijn):
I have attached a new patch that has the following changes:
# Added caching for the field collapse functionality. Check the [solr 
wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure the 
field-collapsing with caching.
# Removed the collapse.max parameter (collapse.threshold must be used instead). 
It was deprecated for a long time. 
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with the same value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many continuous results are allowed before collapsing
 TODO (in progress):
 - More documentation (in the source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)
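The difference between the two collapse types can be illustrated on a plain ranked list. This is a minimal sketch, assuming that `normal` keeps at most N hits per field value anywhere in the list while `adjacent` only collapses runs of consecutive hits with the same value, where N plays the role of collapse.max (later replaced by collapse.threshold). The class and method names are hypothetical, not the patch's API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {
    // One ranked hit: the document id and its value in the collapse field.
    static final class Hit {
        final String id;
        final String value;

        Hit(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    // "normal" (assumed semantics): keep at most `threshold` hits per field
    // value across the whole ranked list; later hits with that value collapse.
    static List<Hit> normal(List<Hit> ranked, int threshold) {
        Map<String, Integer> seen = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit h : ranked) {
            int count = seen.merge(h.value, 1, Integer::sum);
            if (count <= threshold) {
                kept.add(h);
            }
        }
        return kept;
    }

    // "adjacent" (assumed semantics): only runs of consecutive hits sharing a
    // value are collapsed; the same value may reappear later in a new run.
    static List<Hit> adjacent(List<Hit> ranked, int threshold) {
        List<Hit> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (Hit h : ranked) {
            run = h.value.equals(previous) ? run + 1 : 1;
            previous = h.value;
            if (run <= threshold) {
                kept.add(h);
            }
        }
        return kept;
    }

    // The ids of the kept hits, in rank order.
    static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }
}
```

With N = 1 and collapse values a, a, b, a, b for hits 1 to 5, `normal` keeps hits 1 and 3, while `adjacent` keeps 1, 3, 4 and 5, because hit 4 starts a new run of `a`.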

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-11-22 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781232#action_12781232
 ] 

Martijn van Groningen edited comment on SOLR-236 at 11/22/09 10:06 PM:
---

The search results after the first search were incorrect because the scores 
were not preserved in the cache. As a result, the collapsing algorithm could not 
properly group the documents into collapse groups (the most relevant document 
per group could not be determined), because there was no score information when 
the documents were retrieved from the cache (as a DocSet in SolrIndexSearcher).

In the attached patch the score is also saved in the cache, so the collapsing 
algorithm can do its work properly when the documents are retrieved from the 
cache. Because the scores are now stored with the cached documents, the 
filterCache will take up more memory.
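A minimal sketch of why the score has to travel with the cached documents: choosing the head of each collapse group is a per-group maximum over scores, so a cache entry holding only document ids (as a plain DocSet does) gives the algorithm nothing to compare. `ScoredDoc` and `headPerGroup` are hypothetical names; this is not the patch's cache code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ScoredCacheSketch {
    // A cached hit that keeps its score next to the document id.
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Picks the most relevant document of each collapse group. Assumes every
    // cached doc has an entry in groupOf. On equal scores the earlier doc wins.
    static Map<String, Integer> headPerGroup(List<ScoredDoc> cached, Map<Integer, String> groupOf) {
        Map<String, ScoredDoc> best = new HashMap<>();
        for (ScoredDoc d : cached) {
            best.merge(groupOf.get(d.docId), d, (current, candidate) ->
                    candidate.score > current.score ? candidate : current);
        }
        Map<String, Integer> heads = new TreeMap<>();
        best.forEach((group, doc) -> heads.put(group, doc.docId));
        return heads;
    }
}
```

Each cached entry in this sketch carries an extra `float` next to its id, which is the same kind of memory growth described for the filterCache.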

  was (Author: martijn):
The reason why the search results after the first search were incorrect 
was that the score was not preserved in the cache. The result of that was 
that the collapsing algorithm could not properly group the documents into the 
collapse groups (the most relevant document per document group could not be 
determined properly), because there was no score information when retrieving 
the documents from cache (as DocSet in SolrIndexSearcher).

I made sure that in the attached patch the score is also saved in the cache, so 
the collapsing algorithm can do its work properly when the documents are 
retrieved from the cache. Because the scores are now stored with the cached 
documents the actual size of the filterCache in memory will increase. 
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch





[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-11-13 Thread Thomas Woodard (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777659#action_12777659
 ] 

Thomas Woodard edited comment on SOLR-236 at 11/13/09 9:10 PM:
---

I tried the build again, and you are right: it does work fine with the default 
search handler. I had been trying to get it working with our own search 
handler, which uses dismax, and that still doesn't work. Here is the handler 
configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
    <str name="tie">0.1</str>
    <str name="facet">true</str>
    <str name="facet.field">stars</str>
    <str name="facet.field">directors</str>
    <str name="facet.field">keywords</str>
    <str name="facet.field">studio</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
{code}

Edit: The search fails even if you don't pass a collapse field.

  was (Author: gtfoomw):
I tried the build again, and you are right, it does work fine with the 
default search handler. I had been trying to get it working with our search 
handler, which is dismax. That still doesn't work. Here is the handler 
configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
    <str name="tie">0.1</str>
    <str name="facet">true</str>
    <str name="facet.field">stars</str>
    <str name="facet.field">directors</str>
    <str name="facet.field">keywords</str>
    <str name="facet.field">studio</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
{code}
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch





[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-11-10 Thread Michael Gundlach (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775925#action_12775925
 ] 

Michael Gundlach edited comment on SOLR-236 at 11/10/09 4:13 PM:
-

This patch (quasidistributed.additional.patch) does not itself contain the field collapsing changes.

Apply this patch in addition to the latest field collapsing patch, to avoid an 
NPE when:

 - you are collapsing on a field F,
 - you are sharding into multiple cores, using the hash of field F as your 
sharding key, AND
 - you perform a distributed search on a tokenized field.

Note that if you attempt to use this patch to collapse on a field F1 and shard 
according to a field F2, you will get buggy search behavior.
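The routing condition in the list above (shard by the hash of field F) can be sketched as follows. The comment does not say which hash or routing code is used, so `String.hashCode`, `Math.floorMod` and the `ShardRouter` class are illustrative assumptions:

```java
final class ShardRouter {
    // Picks a shard from the hash of the collapse-field value (e.g. an email
    // address), so all documents sharing that value go to the same shard.
    // Math.floorMod keeps the result in [0, numShards) even for negative hashes.
    static int shardFor(String collapseValue, int numShards) {
        return Math.floorMod(collapseValue.hashCode(), numShards);
    }
}
```

`String.hashCode` is specified by the Java platform, so every JVM routes a given value to the same shard; what matters is that all documents are indexed with one consistent function.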

  was (Author: gundlach):
This patch does not apply field collapsing.

Apply this patch in addition to the latest field collapsing patch, to avoid an 
NPE when:

 - you are collapsing on a field F,
 - you are sharding into multiple cores, using the hash of field F as your 
sharding key, AND
 - you perform a distributed search on a tokenized field.

Note that if you attempt to use this patch to collapse on a field F1 and shard 
according to a field F2, you will get buggy search behavior.
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch





[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-11-09 Thread Michael Gundlach (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775192#action_12775192
 ] 

Michael Gundlach edited comment on SOLR-236 at 11/9/09 11:45 PM:
-

I've found an NPE that occurs when performing quasi-distributed field 
collapsing.

My company only has one use case for field collapsing: collapsing on email 
address.  Our index is spread across multiple cores.  We found that if we shard 
by email address, so that all documents with a given email address are 
guaranteed to appear on the same core, then we can do distributed field 
collapsing.

We add collapse.field=email and shards=core1,core2,... to a regular query.  
Each core collapses on email and sends the results back to the requestor.  
Since no emails appear on more than one core, we've accomplished distributed 
search.  We do lose the collapse_count section, but that's not needed for our 
purpose -- we just need an accurate total document count, and to have no more 
than one document for a given email address in the results.
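The per-core collapse followed by a plain merge can be modelled in a few lines. This is a simplified sketch under stated assumptions: each core returns its hits in rank order, the coordinator only concatenates and re-sorts by score, and `QuasiDistributedCollapse` with its `Hit` class is hypothetical rather than Solr's distributed search code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QuasiDistributedCollapse {
    // One ranked hit returned by a core.
    static final class Hit {
        final String id;
        final String email;
        final double score;

        Hit(String id, String email, double score) {
            this.id = id;
            this.email = email;
            this.score = score;
        }
    }

    // Per-core step: hits arrive in rank order, so the first hit seen for an
    // email address is the one that is kept.
    static List<Hit> collapseShard(List<Hit> ranked) {
        Map<String, Hit> firstPerEmail = new LinkedHashMap<>();
        for (Hit h : ranked) {
            firstPerEmail.putIfAbsent(h.email, h);
        }
        return new ArrayList<>(firstPerEmail.values());
    }

    // Coordinator step: concatenate the collapsed per-core results and re-sort
    // by descending score. Nothing is collapsed again here.
    static List<Hit> merge(List<List<Hit>> shardResults) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> result : shardResults) {
            all.addAll(result);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all;
    }

    // The property the scheme relies on: no email address appears twice.
    static boolean atMostOnePerEmail(List<Hit> hits) {
        Set<String> seen = new HashSet<>();
        for (Hit h : hits) {
            if (!seen.add(h.email)) {
                return false;
            }
        }
        return true;
    }
}
```

The final check only holds because no email address spans two cores; if the same address were indexed on two shards, both heads would survive the merge, which matches the incorrect-results case described further down.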

Unfortunately, this throws an NPE when searching on a tokenized field.  
Searching string fields is fine.  I don't understand exactly why the NPE 
appears, but I put a band-aid over it by explicitly checking for nulls at the 
appropriate line in the code.  No more NPE.

There's a downside, which is that if we attempt to collapse on a field other 
than email -- one which has documents appearing in multiple cores -- the 
results are buggy: the first search returns few documents, and the number of 
documents actually displayed doesn't always match the numFound value.  Then 
upon refresh we get what we think is the correct numFound, and the correct list 
of documents.  This doesn't bother me too much, as you're guaranteed to get 
incorrect answers from the collapse code anyway when collapsing on a field that 
you didn't use as your key for sharding.

In the spirit of Yonik's law of patches, I have made two imperfect patches 
attempting to contribute the fix, or at least point out the error:

1. I pulled trunk, applied the latest SOLR-236 patch, made my 2 line change, 
and created a patch file.  The resultant patch file looks very different from 
the latest SOLR-236 patchfile, so I assume I did something wrong.

2. I pulled trunk, made my 2 line change, and created another patch file.  This 
file is tiny but of course is missing all of the field collapsing changes.

Would you like me to post either of these patch files to this issue?  Or is it 
sufficient to just tell you that the NPE occurred in QueryComponent.java on line 
556 (rb._responseDocs.set(sdoc.positionInResponse, doc);, where sdoc was 
null)?  Perhaps my use case is unusual enough that you're happy leaving 
the NPE in place and telling other users not to do what I'm doing?
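The band-aid described above amounts to a null check before the failing statement. A minimal sketch of that step, with a hypothetical `ShardDoc` class and a map from document id to its reserved response slot; the real QueryComponent code around line 556 is not shown in this comment, so this is only a model:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class MergeGuardSketch {
    // Hypothetical stand-in for the coordinator's per-document record; only
    // the slot index used at the failing line is modelled.
    static final class ShardDoc {
        final int positionInResponse;

        ShardDoc(int positionInResponse) {
            this.positionInResponse = positionInResponse;
        }
    }

    // Puts each fetched document into its reserved slot. A document whose id
    // has no ShardDoc entry is skipped instead of dereferencing null.
    static List<String> place(int slots, Map<String, ShardDoc> resultIds, List<String> fetchedIds) {
        List<String> responseDocs = new ArrayList<>(Collections.nCopies(slots, (String) null));
        for (String id : fetchedIds) {
            ShardDoc sdoc = resultIds.get(id);
            if (sdoc == null) {
                continue; // the band-aid: without it, sdoc.positionInResponse throws an NPE
            }
            responseDocs.set(sdoc.positionInResponse, id);
        }
        return responseDocs;
    }
}
```

In this model the skipped document leaves its slot as `null`, so the guard suppresses the exception without explaining why the entry was missing.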

Thanks!
Michael

  was (Author: gundlach):
I've found an NPE that occurs when performing quasi-distributed field 
collapsing.

My company only has one use case for field collapsing: collapsing on email 
address.  Our index is spread across multiple cores.  We found that if we shard 
by email address, so that all documents with a given email address are 
guaranteed to appear on the same core, then we can do distributed field 
collapsing.

We add collapse.field=email and shards=core1,core2,... to a regular query.  
Each core collapses on email and sends the results back to the requestor.  
Since no emails appear on more than one core, we've accomplished distributed 
search.  We do lose the collapse_count section, but that's not needed for our 
purpose -- we just need an accurate total document count, and to have no more 
than one document for a given email address in the results.

Unfortunately, this throws an NPE when searching on a tokenized field.  
Searching string fields is fine.  I don't understand exactly why the NPE 
appears, but I did bandaid over it by checking explicitly for nulls at the 
appropriate line in the code.  No more NPE.

There's a downside, which is that if we attempt to collapse on a field other 
than email -- one which has documents appearing in multiple cores -- the 
results are buggy: the first search returns few documents, and the number of 
documents actually displayed doesn't always match the numFound value.  Then 
upon refresh we get what we think is the correct numFound, and the correct list 
of documents.  This doesn't bother me too much, as you're guaranteed to get 
incorrect answers from the collapse code anyway when collapsing on a field that 
you didn't use as your key for sharding.

In the spirit of Yonik's law of patches, I have made two imperfect patches 
attempting to contribute the fix, or at least point out the error:

1. I pulled trunk, applied the latest SOLR-236 patch, made my 2 line change, 
and created a patch file.  The resultant patch file looks very different from 
the latest SOLR-236 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-10-13 Thread Aytek Ekici (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765076#action_12765076
 ] 

Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:46 AM:


Hi all,
I just applied field-collapse-5.patch and I think there are problems with 
filter queries.

Here it is:

1- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using q instead of fq, which is 
http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

As I understand it, instead of applying AND across the filter queries, 
it applies OR. I checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this?
Thx.
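The reported counts fit the OR explanation exactly: by inclusion-exclusion, |A and B| = |A| + |B| - |A or B| = 6284 + 16912 - 19419 = 3777, which is the numFound of the AND query. A minimal sketch of the expected filter semantics, modelling each filter's matches as a `java.util.BitSet` of document ids (a simplification; `FilterIntersection` is a hypothetical name, not Solr code):

```java
import java.util.BitSet;

final class FilterIntersection {
    // Expected fq semantics: a document must match every filter, so the
    // matching sets are intersected (AND), never unioned (OR).
    // The input sets are left unmodified.
    static BitSet applyFilters(BitSet first, BitSet... rest) {
        BitSet result = (BitSet) first.clone();
        for (BitSet filter : rest) {
            result.and(filter);
        }
        return result;
    }

    // Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    static int intersectionSize(int sizeA, int sizeB, int unionSize) {
        return sizeA + sizeB - unionSize;
    }
}
```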

  was (Author: aytek):
Hi all,
I just applied field-collapse-5.patch and I think there are problems with 
filter queries.

Here it is:

1- Use one (first) filter

http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- Use second filter
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- Use both filters
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using q instead of fq, which is: 
http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

As I understand it, instead of applying AND across the filter queries, 
it applies OR. I checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this?
Thx.
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch





[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-10-13 Thread Aytek Ekici (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765076#action_12765076
 ] 

Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:48 AM:


Hi all,
I just applied field-collapse-5.patch and I think there are problems with 
filter queries.

Here it is:

1- select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using q instead of fq, which is: 
select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

As I understand it, instead of applying AND across the filter queries, 
it applies OR. I checked select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this?
Thx.

  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch





[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-09-09 Thread Paul Nelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753335#action_12753335
 ] 

Paul Nelson edited comment on SOLR-236 at 9/9/09 5:07 PM:
--

Hey All:  I just upgraded to 1.4 to get the new patch (many thanks, Martijn). The 
new algorithm appears to be sensitive to the size and complexity of the query 
(rather than simply the number of documents). Should this be the case? 
Unfortunately, we have rather large and complex queries with dozens of terms 
and several phrases, and while these queries take 0.5 sec without collapsing, 
they take 3-4 sec with collapsing. Meanwhile, collapsing with *:* or other simple 
queries comes back in 0.5 sec, so it appears to be primarily a query-complexity 
issue.

I'm wondering if the filter cache (or some other cache) might be able to help 
with this situation?
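Whether Solr's own caches help depends on how the patch uses them, which this thread does not show. As a generic illustration of the idea only: an LRU cache keyed by the full query string that stores the matching document set, so a repeated complex query is not executed again. `QueryResultCache` is hypothetical and is not Solr's filterCache or queryResultCache:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class QueryResultCache {
    private final Map<String, BitSet> entries;

    QueryResultCache(final int capacity) {
        // accessOrder = true: iteration runs from least to most recently used,
        // so the "eldest" entry is the least recently used one.
        this.entries = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached matches for `query`, running `execute` only on a miss.
    BitSet get(String query, Function<String, BitSet> execute) {
        BitSet matches = entries.get(query); // a hit also marks the entry as recently used
        if (matches == null) {
            matches = execute.apply(query);
            entries.put(query, matches); // may evict the least recently used entry
        }
        return matches;
    }

    int size() {
        return entries.size();
    }
}
```

This only pays off when the same query string repeats; the first run of each distinct complex query still costs the full 3-4 sec.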

  was (Author: pnelsoncomposer):
Hey all: just upgraded to 1.4 to get the new patch (many thanks, Martijn). 
The new algorithm appears to be sensitive to the size and complexity of the 
query (rather than simply the number of documents) - should this be the case? 
Unfortunately, we have rather large and complex queries with dozens of terms 
and several phrases, and while these queries take 0.5 s without collapsing, 
they take 3-4 s with collapsing.

I'm wondering if the filter cache (or some other cache) might be able to help 
with this situation?
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-09-09 Thread Paul Nelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753335#action_12753335
 ] 

Paul Nelson edited comment on SOLR-236 at 9/9/09 5:09 PM:
--

Hey all: just upgraded to 1.4 to get the new patch (many thanks, Martijn). The 
new algorithm appears to be sensitive to the size and complexity of the query 
(rather than simply the number of documents) - should this be the case? 
Unfortunately, we have rather large and complex queries with dozens of terms 
and several phrases, and while these queries take 0.5 s without collapsing, 
they take 3-4 s with collapsing. Meanwhile, collapsing with *:* or other simple 
queries comes back in 0.5 s, so it appears to be primarily a query-complexity 
issue.

I'm wondering if the filter cache (or some other cache) might be able to help 
with this situation?

  was (Author: pnelsoncomposer):
Hey all: just upgraded to 1.4 to get the new patch (many thanks, Martijn). 
The new algorithm appears to be sensitive to the size and complexity of the 
query (rather than simply the number of documents) - should this be the case? 
Unfortunately, we have rather large and complex queries with dozens of terms 
and several phrases, and while these queries take 0.5 s without collapsing, 
they take 3-4 s with collapsing. Meanwhile, collapsing with *:* or other simple 
queries comes back in 0.5 s, so it appears to be primarily a query-complexity 
issue.

I'm wondering if the filter cache (or some other cache) might be able to help 
with this situation?
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-09-03 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751243#action_12751243
 ] 

Abdul Chaudhry edited comment on SOLR-236 at 9/3/09 5:56 PM:
-

I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice, once for the collapse 
and then for the response object, assuming you asked for the info count in the 
response.

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large 
string array, and to keep it around for both the collapse and the response 
object.

I changed the code so that I keep the cache around, like so:

  /**
   * Keep the field cached for the collapse field so it can also be used for
   * the response object.
   */
  private FieldCache.StringIndex collapseIndex;


To get the index, use something like this instead of getting the string array 
for all docs:

  collapseIndex = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), collapseField);

When collapsing, you can get the current value using something like this, and 
remove the code that passes the array around:

  int currentId = i.nextDoc();
  String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache 
like so:

  if (collapseInfoCount) {
    resCount.add(collapseFieldType.indexedToReadable(
        collapseIndex.lookup[collapseIndex.order[id]]), count);
  }
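
A standalone sketch of the reuse idea, without any Lucene dependency: the 
hypothetical class below has the same two arrays as FieldCache.StringIndex 
(order maps a doc id to an ordinal, lookup maps an ordinal to the term). In this 
sketch ordinal 0 stands for "no value", which I believe mirrors StringIndex, but 
treat that as an assumption. Built once, it serves both the collapse loop and 
the info-count response, and counting by int ordinal avoids hashing a String per 
document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in shaped like Lucene's FieldCache.StringIndex:
 * order[doc] is the ordinal of the doc's term, lookup[ordinal] is the term.
 */
public class StringIndexSketch {
    final int[] order;
    final String[] lookup;

    StringIndexSketch(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    /** Field value of a document, resolved through the shared index. */
    String valueOf(int doc) {
        return lookup[order[doc]];
    }

    /**
     * Counts documents per value using int ordinals, and only turns an
     * ordinal back into a String once per group when building the result.
     * Documents with ordinal 0 (no value) end up under a null key.
     */
    Map<String, Integer> countPerValue(int[] docs) {
        int[] counts = new int[lookup.length];
        for (int doc : docs) {
            counts[order[doc]]++;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                result.put(lookup[ord], counts[ord]);
            }
        }
        return result;
    }
}
```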

I also added timing for the cache access, as it could be slow if you are doing a 
lot of updates.

I have added code for displaying selected fields for the duplicates, but it's 
difficult to submit. I hope this gets committed, as it's hard to submit a patch 
when it's not in SVN and I cannot submit a patch to a patch to a patch... you 
get the idea.



  was (Author: abdollar):
I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice, once for the collapse 
and then for the response object, assuming you asked for the info count in the 
response.

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large 
string array, and to keep it around for both the collapse and the response 
object.

I changed the code so that I keep the cache around, like so:

  /**
   * Keep the field cached for the collapse field so it can also be used for
   * the response object.
   */
  private FieldCache.StringIndex collapseIndex;


When collapsing, you can get the current value using something like this, and 
remove the code that passes the array around:

  int currentId = i.nextDoc();
  String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache 
like so:

  if (collapseInfoCount) {
    resCount.add(collapseFieldType.indexedToReadable(
        collapseIndex.lookup[collapseIndex.order[id]]), count);
  }

I also added timing for the cache access, as it could be slow if you are doing a 
lot of updates.

I have added code for displaying selected fields for the duplicates, but it's 
difficult to submit. I hope this gets committed, as it's hard to submit a patch 
when it's not in SVN and I cannot submit a patch to a patch to a patch... you 
get the idea.

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-09-02 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750582#action_12750582
 ] 

Martijn van Groningen edited comment on SOLR-236 at 9/2/09 11:18 AM:
-

Yes, specifying which collapse fields to return is a good idea, just like the 
fl parameter for a normal request. 
I was thinking about how to fit this new feature into the current patch, and I 
thought it might be a good idea to revise the current field collapse result 
format so that the results of this feature fit nicely into the response. 

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
  <str name="field">venue</str>
  <lst name="doc">
    <int name="233238">1</int>
  </lst>
  <lst name="count">
    <int name="melkweg">1</int>
  </lst>
</lst>
{code}

I think a response format like the following would be better:
{code:xml}
<lst name="collapse_counts">
  <str name="field">venue</str>
  <lst name="results">
    <lst name="233238">
      <str name="fieldValue">melkweg</str>
      <int name="collapseCount">2</int>
      <lst name="collapsedValues">
        <str name="price">10.99, 1.999,99</str>
        <str name="name">adapter, laptop</str>
      </lst>
    </lst>
  </lst>
</lst>
{code}
As you can see, the data is grouped together more closely and is therefore 
easier to parse. The collapsedValues element can contain one or more fields, 
each holding the collapsed field values in a comma-separated format. The 
_collapsedValues_ element will of course only be added when the client 
specifies the collapsed fields in the request.
What do you think about this new result format? 
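
To make the "easier to parse" point concrete, here is a minimal client-side 
sketch using the JDK's built-in DOM parser (javax.xml.parsers). It assumes the 
proposed element names (results, collapseCount); CollapseResultParser and its 
methods are hypothetical, and nothing here comes from the patch itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollapseResultParser {

    /** Maps each document id listed under "results" to its collapseCount. */
    static Map<String, Integer> collapseCounts(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        Map<String, Integer> counts = new LinkedHashMap<>();
        Element results = childByName(root, "results");
        if (results == null) {
            return counts;
        }
        NodeList docs = results.getChildNodes();
        for (int i = 0; i < docs.getLength(); i++) {
            if (!(docs.item(i) instanceof Element)) {
                continue; // skip whitespace text nodes
            }
            Element doc = (Element) docs.item(i);
            Element count = childByName(doc, "collapseCount");
            if (count != null) {
                counts.put(doc.getAttribute("name"),
                           Integer.parseInt(count.getTextContent().trim()));
            }
        }
        return counts;
    }

    /** First direct child element whose name attribute equals the given value. */
    static Element childByName(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n instanceof Element && name.equals(((Element) n).getAttribute("name"))) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

With the current format, the same information has to be joined across the 
separate doc and count lists; here each document's data sits in one element.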

  was (Author: martijn):
Yes, specifying which collapse fields to return is a good idea, just like the 
fl parameter for a normal request. 
I was thinking about how to fit this new feature into the current patch, and I 
thought it might be a good idea to revise the current field collapse result 
format so that the results of this feature fit nicely into the response. 

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
  <str name="field">venue</str>
  <lst name="doc">
    <int name="233238">1</int>
  </lst>
  <lst name="count">
    <int name="melkweg">1</int>
  </lst>
</lst>
{code}

I think a response format like the following would be better:
{code:xml}
<lst name="collapse_counts">
  <str name="field">venue</str>
  <lst name="">
    <lst name="233238">
      <str name="fieldValue">melkweg</str>
      <int name="collapseCount">2</int>
      <lst name="collapsedValues">
        <str name="price">10.99, 1.999,99</str>
        <str name="name">adapter, laptop</str>
      </lst>
    </lst>
  </lst>
{code}
As you can see, the data is grouped together more closely and is therefore 
easier to parse. The collapsedValues element can contain one or more fields, 
each holding the collapsed field values in a comma-separated format. The 
_collapsedValues_ element will of course only be added when the client 
specifies the collapsed fields in the request.
What do you think about this new result format? 
  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-06-18 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721100#action_12721100
 ] 

Martijn van Groningen edited comment on SOLR-236 at 6/18/09 2:26 AM:
-

I have not found an online example yet, but I copied this config from the 
javadoc of the DistanceCalculatingComponent class and modified it. The patch 
also modifies the Solr examples, so if you look there you can see how the 
patch is used (example/solr/conf/schema.xml and 
example/solr/conf/solrconfig.xml). You need to add an extra update processor, 
an extra field and an extra dynamic field in order to make it work.

  was (Author: martijn):
I have not found an online example yet, but I copied this config from the 
javadoc of the DistanceCalculatingComponent class and modified it. 
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-06-15 Thread Shekhar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719677#action_12719677
 ] 

Shekhar edited comment on SOLR-236 at 6/15/09 3:34 PM:
---

Here is the solrconfig file:


<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>

  <arr name="components">
    <str>localsolr</str>
    <str>collapse</str>
  </arr>
</requestHandler>

You can get more details from http://www.gissearch.com/localsolr


===

Following are the results I am getting:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">146</int>
    <lst name="params">
      <str name="lat">41.883784</str>
      <str name="radius">50</str>
      <str name="collapse.field">resource_id</str>
      <str name="rows">2</str>
      <str name="indent">on</str>
      <str name="fl">resource_id,geo_distance</str>
      <str name="q">TV</str>
      <str name="qt">geo</str>
      <str name="long">-87.637668</str>
    </lst>
  </lst>
  <result name="response" numFound="4294" start="0">
    <doc>
      <int name="resource_id">10018</int>
      <double name="geo_distance">26.16691883965225</double>
    </doc>
    <doc>
      <int name="resource_id">10102</int>
      <double name="geo_distance">39.90588996589528</double>
    </doc>
  </result>
  <lst name="collapse_counts">
    <str name="field">resource_id</str>
    <lst name="doc">
      <int name="10022">116</int>
      <int name="11701">4</int>
    </lst>
    <lst name="count">
      <int name="10015">116</int>
      <int name="10018">4</int>
    </lst>
    <lst name="debug">
      <str name="Docset type">BitDocSet(5201)</str>
      <long name="Total collapsing time(ms)">46</long>
      <long name="Create uncollapsed docset(ms)">22</long>
      <long name="Collapsing normal time(ms)">24</long>
      <long name="Creating collapseinfo time(ms)">0</long>
      <long name="Convert to bitset time(ms)">0</long>
      <long name="Create collapsed docset time(ms)">0</long>
    </lst>
  </lst>
  <result name="response" numFound="5201" start="0">
    <doc>
      <int name="resource_id">10015</int>
    </doc>
    <doc>
      <int name="resource_id">10018</int>
    </doc>
  </result>
</response>

  was (Author: csnirkhe):
Here is the solrconfig file:


<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>

  <arr name="components">
    <str>localsolr</str>
    <str>collapse</str>
  </arr>
</requestHandler>


Following are the results I am getting:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">146</int>
    <lst name="params">
      <str name="lat">41.883784</str>
      <str name="radius">50</str>
      <str name="collapse.field">resource_id</str>
      <str name="rows">2</str>
      <str name="indent">on</str>
      <str name="fl">resource_id,geo_distance</str>
      <str name="q">TV</str>
      <str name="qt">geo</str>
      <str name="long">-87.637668</str>
    </lst>
  </lst>
  <result name="response" numFound="4294" start="0">
    <doc>
      <int name="resource_id">10018</int>
      <double name="geo_distance">26.16691883965225</double>
    </doc>
    <doc>
      <int name="resource_id">10102</int>
      <double name="geo_distance">39.90588996589528</double>
    </doc>
  </result>
  <lst name="collapse_counts">
    <str name="field">resource_id</str>
    <lst name="doc">
      <int name="10022">116</int>
      <int name="11701">4</int>
    </lst>
    <lst name="count">
      <int name="10015">116</int>
      <int name="10018">4</int>
    </lst>
    <lst name="debug">
      <str name="Docset type">BitDocSet(5201)</str>
      <long name="Total collapsing time(ms)">46</long>
      <long name="Create uncollapsed docset(ms)">22</long>
      <long name="Collapsing normal time(ms)">24</long>
      <long name="Creating collapseinfo time(ms)">0</long>
      <long name="Convert to bitset time(ms)">0</long>
      <long name="Create collapsed docset time(ms)">0</long>
    </lst>
  </lst>
  <result name="response" numFound="5201" start="0">
    <doc>
      <int name="resource_id">10015</int>
    </doc>
    <doc>
      <int name="resource_id">10018</int>
    </doc>
  </result>
</response>
  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-06-03 Thread Ron Veenstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716128#action_12716128
 ] 

Ron Veenstra edited comment on SOLR-236 at 6/3/09 7:22 PM:
---

I require assistance. I've installed a fresh Solr (1.3.0), and everything 
appears to operate well. I then patch using SOLR-236_collapsing.patch [by 
Thomas Traeger] (the last patch I saw that claimed to work with 1.3.0), without 
error. I then add the following to solrconfig.xml (per: 
http://wiki.apache.org/solr/FieldCollapsing):

  <searchComponent name="collapse" 
      class="org.apache.solr.handler.component.CollapseComponent" />

Upon restart, I get a long configuration error, which seems to hinge on:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for 
more detailed information on what may be wrong. If you want solr to continue 
after configuration errors, change: 
<abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml 
- 
org.apache.solr.common.SolrException: Error loading class 
'org.apache.solr.handler.component.CollapseComponent' at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)

[the full error can be included if desired.]

I've verified that the CollapseComponent file exists in the proper place.
I've moved CollapseParams as required (moving CollapseParams.java from 
common/org/apache/solr/common/params to java/org/apache/solr/common/params/).
I've tried multiple iterations of the patch (on fresh installs), all with the 
same issue.

Are there additional steps, patches, or configurations that are required?
Is this a known issue?
Any help is very much appreciated.
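
The stack trace says SolrResourceLoader.findClass could not load 
org.apache.solr.handler.component.CollapseComponent. As a rough, hedged check 
(a hypothetical standalone program, not a Solr tool), the class below reports 
whether a given class name can be loaded from whatever classpath it is run 
with, for example the jars the servlet container actually deploys.

```java
/**
 * Hypothetical diagnostic: reports whether a class name can be loaded
 * from the classpath this program is started with (-cp ...).
 */
public class ClassCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: only check that the class can be found and linked
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
            : "org.apache.solr.handler.component.CollapseComponent";
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT")
            + " loadable from this classpath");
    }
}
```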

  was (Author: ronunism):
I require assistance. I've installed a fresh Solr (1.3.0), and everything 
appears to operate well. I then patch using SOLR-236_collapsing.patch (the last 
patch I saw that claimed to work with 1.3.0), without error. I then add the 
following to solrconfig.xml (per: http://wiki.apache.org/solr/FieldCollapsing):

  <searchComponent name="collapse" 
      class="org.apache.solr.handler.component.CollapseComponent" />

Upon restart, I get a long configuration error, which seems to hinge on:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for 
more detailed information on what may be wrong. If you want solr to continue 
after configuration errors, change: 
<abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml 
- 
org.apache.solr.common.SolrException: Error loading class 
'org.apache.solr.handler.component.CollapseComponent' at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)

[the full error can be included if desired.]

I've verified that the CollapseComponent file exists in the proper place.
I've moved CollapseParams as required (moving CollapseParams.java from 
common/org/apache/solr/common/params to java/org/apache/solr/common/params/).
I've tried multiple iterations of the patch (on fresh installs), all with the 
same issue.

Are there additional steps, patches, or configurations that are required?
Is this a known issue?
Any help is very much appreciated.
  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-05-29 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12714442#action_12714442
 ] 

Martijn van Groningen edited comment on SOLR-236 at 5/29/09 6:02 AM:
-

Hi,

I have modified the latest patch of Thomas and made two performance 
improvements: 
1) Improved normal field collapsing. I tested it with an index of 1.1 million 
documents. When collapsing on all documents with no sorting specified (so 
sorting on score), the query time is around 130 ms, compared with around 1.5 s 
for the previous patch. When I then add sorting on a string field, the query 
time is around 220 ms, compared with around 5.2 s for the previous patch. 

It is faster because the previous patch queried for a DocList instead of a 
DocSet. The normal collapse method keeps track of the most relevant documents 
itself, so the end result is the same, whereas creating a DocList of 
1.1 million documents (and sorting it) is very expensive.
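
A hedged sketch of why this can be cheaper: if normal collapsing only needs the 
best document per group, it can make one pass over an unordered set of hits, 
remembering the top-scoring hit per field value, and then sort only those 
survivors instead of the whole list. NormalCollapseSketch, Hit and collapse are 
hypothetical names; this illustrates the idea only, and ignores collapse.max and 
custom sort fields.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NormalCollapseSketch {

    /** One matching document: its id, score and collapse.field value. */
    static final class Hit {
        final int doc;
        final float score;
        final String group;

        Hit(int doc, float score, String group) {
            this.doc = doc;
            this.score = score;
            this.group = group;
        }
    }

    /**
     * Single pass over hits in any order: remember the best-scoring hit per
     * group, then sort only the surviving heads (one per group).
     */
    static List<Hit> collapse(Iterable<Hit> unorderedHits) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit hit : unorderedHits) {
            // keep the incoming hit only if it beats the current head of its group
            best.merge(hit.group, hit,
                (current, incoming) -> incoming.score > current.score ? incoming : current);
        }
        List<Hit> heads = new ArrayList<>(best.values());
        heads.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return heads;
    }
}
```

The full sort then costs O(g log g) for g groups rather than O(n log n) for all 
n matching documents.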

Note: I did not improve adjacent collapsing, because the adjacent method needs 
(as far as I understand it) a completely sorted list of documents (DocList).

2) Slightly improved faceting in combination with field collapsing, by 
reusing the uncollapsed DocSet that is created during the collapsing process 
(the previous patch invoked a second search).

I have also added documentation, added a few unit tests for the collapsing 
process itself and made the debug information more readable.

I'm very interested in other people's experiences with this patch and feedback 
on the patch itself. 

Cheers,

Martijn 


  was (Author: martijn):
Hi,

I have modified the latest patch of Thomas and made two performance 
improvements: 
1) Improved normal field collapsing. I tested it with an index of 1.1 million 
documents. When collapsing on all documents with no sorting specified (so 
sorting on score), the query time is around 130 ms, compared with around 1.5 s 
for the previous patch. When I then add sorting on a string field, the query 
time is around 220 ms, compared with around 5.2 s for the previous patch. 

It is faster because the previous patch queried for a DocList instead of a 
DocSet. The normal collapse method keeps track of the most relevant documents 
itself, so the end result is the same, whereas creating a DocList of 
1.1 million documents (and sorting it) is very expensive.

Note: I did not improve adjacent collapsing, because the adjacent method needs 
(as far as I understand it) a completely sorted list of documents (DocList).

2) Slightly improved faceting in combination with field collapsing, by reusing 
the uncollapsed DocSet that is created during the collapsing process (the 
previous patch invoked a second search).

I have also added documentation, added a few unit tests for the collapsing 
process itself and made the debug information more readable.

I'm very interested in other people's experiences with this patch and feedback 
on the patch itself. 

Cheers,

Martijn 

  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given
 field into a single entry in the result set. Site collapsing is a special case
 of this, where all results for a given web site are collapsed into one or two
 entries in the result set, typically with an associated "more documents from
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many continuous results are allowed before collapsing
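The difference between the two collapse.type values can be illustrated with a minimal sketch. It assumes that collapse.max means "keep at most max results per field value" for normal collapsing and "keep at most max consecutive results with the same field value" for adjacent collapsing. It works on the collapse.field values of an already sorted result list rather than on Solr documents. The class and method names are hypothetical and are not part of the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collapse.type=normal vs. collapse.type=adjacent.
// Input: the collapse.field value of each result, in final sort order.
public class CollapseTypes {

    // normal: at most `max` results per field value, wherever they appear.
    static List<String> normalCollapse(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String v : values) {
            int count = seen.merge(v, 1, Integer::sum);
            if (count <= max) {
                kept.add(v);
            }
        }
        return kept;
    }

    // adjacent: at most `max` consecutive results with the same field value;
    // a value that reappears after a different one starts a new run.
    static List<String> adjacentCollapse(List<String> values, int max) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(previous) ? run + 1 : 1;
            previous = v;
            if (run <= max) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("a", "a", "b", "a", "a", "c");
        System.out.println(normalCollapse(sorted, 1));   // [a, b, c]
        System.out.println(adjacentCollapse(sorted, 1)); // [a, b, a, c]
    }
}
```

With max=1, the normal type returns one result per value. The adjacent type lets "a" through twice, because its two runs are separated by "b".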
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-05-29 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12714442#action_12714442
 ] 

Martijn van Groningen edited comment on SOLR-236 at 5/29/09 11:38 AM:
--

Hi,

I have modified Thomas's latest patch and made two performance
improvements:
1) Improved normal field collapsing. I tested it with an index of 1.1 million
documents. When collapsing on all documents with no sorting specified (so
sorting on score), the query time is around 130 ms, compared with around 1.5 s
for the previous patch. When I then add sorting on a string field, the query
time is around 220 ms, compared with around 5.2 s for the previous patch.

It is faster because the previous patch queries for a DocList instead of a
DocSet. The normal collapse method already keeps track of the most relevant
documents, so the end result is the same, whereas creating a DocList of
1.1 million documents (and sorting it) is very expensive.
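The idea behind this speed-up can be sketched as follows. The sketch assumes that matches arrive unordered (as from a DocSet), each with a score and a collapse-field value. Instead of building and sorting a list of every match, a single pass keeps only the best-scoring document per group, and only the much smaller collapsed result is sorted. Match, ranksAbove and collapse are hypothetical names, not classes from the patch. Breaking ties by doc id is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse an unordered set of matches by remembering only
// the most relevant document per group, then sort just the collapsed result.
public class BestPerGroup {

    static final class Match {
        final int docId;
        final String group; // value of the collapse field
        final float score;

        Match(int docId, String group, float score) {
            this.docId = docId;
            this.group = group;
            this.score = score;
        }
    }

    // True if a ranks above b: higher score first, lower doc id breaks ties.
    static boolean ranksAbove(Match a, Match b) {
        if (a.score != b.score) {
            return a.score > b.score;
        }
        return a.docId < b.docId;
    }

    // One pass over the matches; the map holds one entry per group.
    static List<Integer> collapse(List<Match> matches) {
        Map<String, Match> best = new HashMap<>();
        for (Match m : matches) {
            best.merge(m.group, m, (kept, candidate) -> ranksAbove(kept, candidate) ? kept : candidate);
        }
        List<Match> collapsed = new ArrayList<>(best.values());
        collapsed.sort((a, b) -> ranksAbove(a, b) ? -1 : (ranksAbove(b, a) ? 1 : 0));
        List<Integer> docIds = new ArrayList<>();
        for (Match m : collapsed) {
            docIds.add(m.docId);
        }
        return docIds;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
                new Match(0, "x", 1.0f),
                new Match(1, "y", 3.0f),
                new Match(2, "x", 2.5f),
                new Match(3, "y", 0.5f),
                new Match(4, "z", 2.0f));
        System.out.println(collapse(matches)); // [1, 2, 4]
    }
}
```

Sorting then costs time proportional to the number of groups rather than the number of matches. With 1.1 million matches and far fewer groups, that is where the savings come from.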

Note: I did not improve adjacent collapsing, because the adjacent method needs
(as far as I understand it) a completely sorted list of documents (DocList).

2) Slightly improved faceting in combination with field collapsing, by
reusing the uncollapsed DocSet that is created during the collapsing process
(the previous patch invoked a second search).
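Reusing the uncollapsed set for faceting can be sketched as follows. To keep the sketch self-contained, plain lists and maps stand in for Solr's DocSet and field values; this is an assumption, not the patch's code. The collapsing pass records every match it sees, so facet counts over the uncollapsed results can come from that list instead of from a second search. Counting over the collapsed list shows how the two sets of counts can differ. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: the collapsing pass keeps the uncollapsed match list,
// so facet counts can be computed from it without running the query again.
public class FacetReuse {

    static final class CollapseResult {
        final List<Integer> uncollapsed = new ArrayList<>();
        final List<Integer> collapsed = new ArrayList<>();
    }

    // Normal collapsing with max=1 over matches in sort order.
    static CollapseResult collapse(List<Integer> matches, Map<Integer, String> collapseField) {
        CollapseResult result = new CollapseResult();
        Set<String> seen = new HashSet<>();
        for (int doc : matches) {
            result.uncollapsed.add(doc); // retained for faceting
            if (seen.add(collapseField.get(doc))) {
                result.collapsed.add(doc);
            }
        }
        return result;
    }

    // Count facet values over any list of doc ids.
    static Map<String, Integer> facet(List<Integer> docs, Map<Integer, String> facetField) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : docs) {
            counts.merge(facetField.get(doc), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, String> site = Map.of(0, "site1", 1, "site1", 2, "site2", 3, "site1");
        Map<Integer, String> type = Map.of(0, "news", 1, "news", 2, "blog", 3, "news");
        CollapseResult r = collapse(List.of(0, 1, 2, 3), site);
        System.out.println(facet(r.uncollapsed, type)); // {blog=1, news=3}
        System.out.println(facet(r.collapsed, type));   // {blog=1, news=1}
    }
}
```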

I have also added documentation and a few unit tests for the collapsing
process itself, and made the debug information more readable.
This patch works with revision 779335 (last Wednesday) and later. It depends
on some changes in Solr and on a change inside Lucene.

I'm very interested in other people's experiences with this patch and feedback 
on the patch itself. 

Cheers,

Martijn 


  was (Author: martijn):
Hi,

I have modified Thomas's latest patch and made two performance
improvements:
1) Improved normal field collapsing. I tested it with an index of 1.1 million
documents. When collapsing on all documents with no sorting specified (so
sorting on score), the query time is around 130 ms, compared with around 1.5 s
for the previous patch. When I then add sorting on a string field, the query
time is around 220 ms, compared with around 5.2 s for the previous patch.

It is faster because the previous patch queries for a DocList instead of a
DocSet. The normal collapse method already keeps track of the most relevant
documents, so the end result is the same, whereas creating a DocList of
1.1 million documents (and sorting it) is very expensive.

Note: I did not improve adjacent collapsing, because the adjacent method needs
(as far as I understand it) a completely sorted list of documents (DocList).

2) Slightly improved faceting in combination with field collapsing, by
reusing the uncollapsed DocSet that is created during the collapsing process
(the previous patch invoked a second search).

I have also added documentation and a few unit tests for the collapsing
process itself, and made the debug information more readable.

I'm very interested in other people's experiences with this patch and feedback 
on the patch itself. 

Cheers,

Martijn 

  

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-05-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12705959#action_12705959
 ] 

Domingo Gómez García edited comment on SOLR-236 at 5/5/09 1:53 AM:
---

The results of collapse_counts are not what I expected. Many categories are
lost; only a few are shown. I tried increasing the collapse.max
parameter:

max=1 results 

lst name=doc
int name=2008/LICOBLE-00023109/int
int name=2008/LICOBLE-35/int
int name=2009/LICOBLE-000364/int
int name=2009/LICOBLE-000951/int
/lst
lst name=count
int name=12740109/int
int name=127415/int
int name=132824/int
int1/int
/lst


max=2 results

lst name=doc
int name=2009/LICOBLE-8108/int
int name=2007/LICOBLE-14/int
/lst
lst name=count
int name=12740108/int
int name=127414/int
/lst


max=3 results

lst name=doc
int name=2008/LICOBLE-00020107/int
int name=2008/LICOBLE-000213/int
/lst
lst name=count
int name=12740107/int
int name=127413/int
/lst


max=4

lst name=doc
int name=2009/LICOBLE-00060106/int
/lst
lst name=count
int name=12740106/int
/lst

How is it possible to get fewer results each time? There are about 70
categories; is there any way to obtain all those counts? Am I missing some
collapsing concept?
Thanks.

  was (Author: dgomezca):
The results of collapse_counts are not what I expected. It loses many
categories, only showing . I tried increasing the collapse.max parameter:

max=1 results 

lst name=doc
int name=2008/LICOBLE-00023109/int
int name=2008/LICOBLE-35/int
int name=2009/LICOBLE-000364/int
int name=2009/LICOBLE-000951/int
/lst
lst name=count
int name=12740109/int
int name=127415/int
int name=132824/int
int1/int
/lst


max=2 results

lst name=doc
int name=2009/LICOBLE-8108/int
int name=2007/LICOBLE-14/int
/lst
lst name=count
int name=12740108/int
int name=127414/int
/lst


max=3 results

lst name=doc
int name=2008/LICOBLE-00020107/int
int name=2008/LICOBLE-000213/int
/lst
lst name=count
int name=12740107/int
int name=127413/int
/lst


max=4

lst name=doc
int name=2009/LICOBLE-00060106/int
/lst
lst name=count
int name=12740106/int
/lst

How is it possible to get fewer results each time? There are about 70
categories; is there any way to obtain all those counts? Am I missing some
collapsing concept?
Thanks.
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701862#action_12701862
 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:23 AM:


I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
I have upgraded from 1.2 to 1.3.0 (patched) and I get a lot of PermGen
exceptions, especially in calls from SolrJ.

  was (Author: dgomezca):
I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
After running the generate-maven-artifacts task, I used the resulting
distribution and requested
http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=normal
(from the wiki). There are no collapsed results; it seems to be ignoring
CollapseComponent or something like that.
Do I have to configure something else?
Could anyone provide a working version/patch?
Thank you.
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701862#action_12701862
 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:29 AM:


I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
When I use the collapse parameters, I always get PermGen exceptions. How much
memory does collapsing use compared with normal queries?

  was (Author: dgomezca):
I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
I have upgraded from 1.2 to 1.3.0 (patched) and I get a lot of PermGen
exceptions, especially in calls from SolrJ.
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701862#action_12701862
 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 6:46 AM:


I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
Is there any way to integrate it with SolrJ?

  was (Author: dgomezca):
I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch.
When I use the collapse parameters, I always get PermGen exceptions. How much
memory does collapsing use compared with normal queries?
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-16 Thread Jeff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699903#action_12699903
 ] 

Jeff edited comment on SOLR-236 at 4/16/09 3:22 PM:


We have tried to integrate the most recent patch into our 1.4 install. The
patching was smooth and overall it works well. However, it appears the issue
with fq has returned. Whenever I try to filter the query, it gives the error
"Either filter or filterList may be set in the QueryCommand, but not both." I'm
not sure what happened. Which part of the patch makes fq work? It may be
missing now.

Additionally, collapse.facet=before does not seem to work. Any help in this
area would be greatly appreciated.
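The error quoted above describes an invariant: a query command takes either one filter or a list of filters, but not both. The sketch below is a hypothetical stand-in for that kind of command. It is not Solr's QueryCommand, and strings replace real filter objects. It shows how setting both triggers the error, and one way a component could avoid it by folding its extra filter into the fq list. Whether the patch's actual fix looks like this is not known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query command that accepts a single filter or a
// list of filters, but not both. Strings stand in for real filter objects.
public class FilterCommand {
    static final String CONFLICT =
            "Either filter or filterList may be set in the QueryCommand, but not both.";

    private String filter;
    private List<String> filterList;

    void setFilter(String f) {
        if (f != null && filterList != null) {
            throw new IllegalArgumentException(CONFLICT);
        }
        filter = f;
    }

    void setFilterList(List<String> list) {
        if (list != null && filter != null) {
            throw new IllegalArgumentException(CONFLICT);
        }
        filterList = list;
    }

    List<String> getFilterList() {
        return filterList;
    }

    // One way to avoid the conflict: append the extra filter to the fq list
    // and set only filterList.
    static List<String> withExtraFilter(List<String> fqs, String extra) {
        List<String> merged = new ArrayList<>(fqs);
        merged.add(extra);
        return merged;
    }

    public static void main(String[] args) {
        List<String> fqs = List.of("Type:articles");

        FilterCommand clash = new FilterCommand();
        clash.setFilterList(fqs);
        try {
            clash.setFilter("collapseFilter"); // both set: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        FilterCommand merged = new FilterCommand();
        merged.setFilterList(withExtraFilter(fqs, "collapseFilter"));
        System.out.println(merged.getFilterList()); // [Type:articles, collapseFilter]
    }
}
```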

  was (Author: jnewburn):
We have tried to integrate the most recent patch into our 1.4 install. The
patching was smooth and overall it works well. However, it appears the issue
with fq has returned. Whenever I try to filter the query, it gives the error
"Either filter or filterList may be set in the QueryCommand, but not both." I'm
not sure what happened. Which part of the patch makes fq work? It may be
missing now.
  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-10 Thread Dave Redford (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694851#action_12694851
 ] 

Dave Redford edited comment on SOLR-236 at 4/10/09 6:47 PM:


There is an issue with collapsed result ordering when querying with only the
unique Id and score fields in the request.

[Update: this is only an issue when both standard results and collapse results
are present, which I was using for testing]

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id)

but adding another field (even a bogus one) works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far... thanks to all.
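The collapse parameters in these requests are ordinary request parameters, joined with & and URL-encoded like any others. Below is a small sketch of building such a query string in Java with the standard URLEncoder, using its Charset overload (Java 10+). The parameter values are taken from the report. The host, port and /solr/select path are assumptions based on Solr's example setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Build a select query string with explicit '&' separators and URL-encoded values.
public class CollapseQueryString {

    static String queryString(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "ford");
        params.put("fl", "Id,score");
        params.put("collapse.field", "PrimaryId");
        params.put("collapse.max", "1");
        // Host, port and path are assumptions (Solr's example defaults).
        System.out.println("http://localhost:8983/solr/select?" + queryString(params));
        // http://localhost:8983/solr/select?q=ford&fl=Id%2Cscore&collapse.field=PrimaryId&collapse.max=1
    }
}
```

The comma in fl is percent-encoded as %2C. The servlet container decodes it back to a comma before Solr reads the parameter.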


  was (Author: dredford):
There is an issue with collapsed result ordering when querying with only
the unique Id and score fields in the request.

[Update: this is only an issue when both standard results and collapse results
are present, which I was using for testing]

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id)

but adding another field (even a bogus one) works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...

  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-10 Thread Dave Redford (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694851#action_12694851
 ] 

Dave Redford edited comment on SOLR-236 at 4/10/09 6:46 PM:


There is an issue with collapsed result ordering when querying with only the
unique Id and score fields in the request.

[Update: this is only an issue when both standard results and collapse results
are present, which I was using for testing]

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id)

but adding another field (even a bogus one) works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...


  was (Author: dredford):
There is an issue with collapsed result ordering when querying with only
the unique Id and score fields in the request.

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id)

but adding another field (even a bogus one) works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...

  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-04-01 Thread Dave Redford (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694851#action_12694851
 ] 

Dave Redford edited comment on SOLR-236 at 4/1/09 5:56 PM:
---

There is an issue with collapsed result ordering when querying with only the
unique Id and score fields in the request.

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id)

but adding another field (even a bogus one) works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...


  was (Author: dredford):
There is an issue with collapsed result ordering when querying with only
the unique Id and score fields in the request.

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong order (note: Id is our unique Id)

but
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...

  



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-03-06 Thread Stephen Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12679603#action_12679603
 ] 

jove4015 edited comment on SOLR-236 at 3/6/09 6:13 AM:


Help!!

We've been using this patch in production for months now, and suddenly in the 
last 3 days it is crashing constantly.

[Edit - It's Ivan's latest patch, #3, with Solr 1.3 dist]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
	at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
	at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
	at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
	at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
	at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
	at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)


It seems to happen randomly: there's no special request happening, nothing new 
added to the index, nothing. We've made no configuration changes. The only 
thing that's happened is that more documents have been added since then. The 
schema is the same; we have perhaps 20 more documents in the index now than we 
did when we first went live with it.

Before, it was a 32-bit machine with 2GB of RAM allocated to Java. We just 
upgraded it to 64-bit and increased the heap space to 3GB, and it still went 
down last night. I'm at my wits' end. I don't know what to do, but this 
functionality has been live so long now that it's going to be extremely painful 
to take it away. Someone, please tell me if there's anything I can do to save 
this thing.
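The trace above fails while OpenBitSet grows its backing array inside CollapseFilter.addDoc(). A rough estimate helps put that allocation in proportion. It is an assumption about sizing and not an analysis of the patch. A bitset that can hold doc ids 0..maxDoc-1 needs at least one 64-bit word per 64 documents. BitSetMemoryEstimate is a hypothetical helper. It ignores object overhead and any extra capacity a growth strategy may allocate, so treat its result as a lower bound.

```java
/**
 * Hypothetical back-of-the-envelope helper (not part of Solr or the patch):
 * the minimum size of a bitset's backing long[] when it must hold doc ids
 * 0..maxDoc-1. Real growth strategies may allocate more than this.
 */
class BitSetMemoryEstimate {

    static long minBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64; // round up to whole 64-bit words
        return words * 8L;               // each long is 8 bytes
    }
}
```

For example, minBytes(10_000_000L) returns 1,250,000 bytes. The figure scales linearly with maxDoc and is paid again for every bitset allocated.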

  was (Author: jove4015):
Help!!

We've been using this patch in production for months now, and suddenly in the 
last 3 days it is crashing constantly.

[Edit - It's Ivan's latest patch, #3]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
	at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
	at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
	at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
	at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
	at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
	at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655269#action_12655269
 ] 

ivan.prado edited comment on SOLR-236 at 12/10/08 8:34 AM:
--

I have attached a new patch that fixes the problems in my first submitted patch. 
Doug Steigerwald, could you check whether this patch works for you? Thanks. 

  was (Author: ivan.prado):
A new patch with problems solved in my first submitted patch. 
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.4

 Attachments: collapsing-patch-to-1.3.0-ivan.patch, 
 collapsing-patch-to-1.3.0-ivan_2.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655269#action_12655269
 ] 

ivan.prado edited comment on SOLR-236 at 12/10/08 8:35 AM:
--

I have attached a new patch that fixes the problems in my first submitted patch. 
Doug Steigerwald, could you check if this patch works well for you? Thanks. 

  was (Author: ivan.prado):
I have attached a new patch that fixes the problems in my first submitted 
patch. Doug Steigerwald, could you check if this patch works for you? 
Thanks. 
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.4

 Attachments: collapsing-patch-to-1.3.0-ivan.patch, 
 collapsing-patch-to-1.3.0-ivan_2.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-10-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12638359#action_12638359
 ] 

[EMAIL PROTECTED] edited comment on SOLR-236 at 10/9/08 12:53 PM:


bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external 
storage (hard drive). I am only just looking into this stuff myself, but it 
seems that in the best case you would want to do it in memory with a hash-based 
system, which can scale linearly. If you have too many items to check for dupes, 
you have to use external storage; one good method is two sorts (we get one 
from the search), but I think there are other options too. In this case, 
though, the sorts can be done in memory, and I think the hashtable method of 
identifying dupes is much less memory efficient (too many unique terms).
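The two strategies contrasted above can be sketched side by side. This is a hypothetical, purely in-memory illustration: DupeDetection and its methods are made-up names. Arrays.sort stands in for the external, on-disk sort that would be needed when the data does not fit in memory. Both methods count duplicates as extra occurrences beyond the first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory comparison of the two duplicate-finding strategies. */
class DupeDetection {

    // Hash-based: a single pass with expected O(n) time, but memory grows
    // with the number of distinct values held in the set.
    static int countDupesByHash(String[] values) {
        Set<String> seen = new HashSet<>();
        int dupes = 0;
        for (String v : values) {
            if (!seen.add(v)) { // add() returns false if v was already present
                dupes++;
            }
        }
        return dupes;
    }

    // Sort-based: O(n log n); once sorted, equal values are adjacent, so one scan
    // comparing neighbours finds duplicates. Arrays.sort stands in here for the
    // external (on-disk) sort needed when the data does not fit in memory.
    static int countDupesBySort(String[] values) {
        String[] sorted = values.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int dupes = 0;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i].equals(sorted[i - 1])) {
                dupes++;
            }
        }
        return dupes;
    }
}
```

The hash version trades memory per distinct value for a single pass. The sort version needs no per-value table, which is why it generalizes to external storage.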

  was (Author: [EMAIL PROTECTED]):
bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external 
storage (hard drive). I am only just looking into this stuff myself, but it 
seems that in the best case you would want to do it in memory with a hash-based 
system, which can scale linearly. If you have too many items to check for dupes, 
you have to use external storage; one good method is two external sorts (we get 
one from the search), but I think there are other options too.
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
 Fix For: 1.4

 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-10-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12636978#action_12636978
 ] 

[EMAIL PROTECTED] edited comment on SOLR-236 at 10/6/08 12:21 PM:


Sorting twice (when not sorting on the collapse field) only makes sense if we 
are doing external sorts (hard drive), correct? It seems to me that this should 
be closer to the facet stuff (in using the field cache) and then use a hash 
table of accumulators: linear time (in general?), right? (edit: looks like 
that's _too_ memory intensive)

As Otis mentions above, this issue appears very popular. We should finish it up.
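A minimal sketch of the facet-style idea follows. It assumes each document's collapse-field value has already been mapped to a term ordinal, the kind of array a field cache can provide. With ordinals, the "hash table of accumulators" can be a plain int[] indexed by ordinal. OrdinalAccumulators and countPerValue are hypothetical names. The accumulator array is sized by the number of distinct values, not by the number of hits. That is one way the approach becomes memory intensive on fields with many unique terms.

```java
/**
 * Hypothetical facet-style counting: ordByDoc maps each doc id to the ordinal
 * of its collapse-field value (the kind of array a field cache can provide),
 * so the accumulators are a plain int[] indexed by ordinal.
 */
class OrdinalAccumulators {

    static int[] countPerValue(int[] ordByDoc, int numTerms, int[] matchingDocs) {
        int[] counts = new int[numTerms]; // one slot per distinct value, even if unused
        for (int doc : matchingDocs) {
            counts[ordByDoc[doc]]++;      // one array update per matching doc
        }
        return counts;
    }
}
```

Counting takes linear time in the number of matching documents. Memory, however, is fixed by numTerms.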

  was (Author: [EMAIL PROTECTED]):
Sorting twice (when not sorting on the collapse field) only makes sense if 
we are doing external sorts (hard drive), correct? It seems to me that this 
should be closer to the facet stuff (in using the field cache) and then use a 
hash table of accumulators: linear time (in general?), right?

As Otis mentions above, this issue appears very popular. We should finish it up.
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
 Fix For: 1.4

 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-08-21 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12624421#action_12624421
 ] 

oleg_gnatovskiy edited comment on SOLR-236 at 8/21/08 10:34 AM:


I was able to hack the latest patch in, and to get it to work, but it required 
some pretty heavy naive changes...

If you are getting an NPE, try this: in the SolrIndexSearcher class, in the 
getDocListC method, change {{out = new DocListAndSet();}} to:

DocListAndSet out = null;
if (qr.getDocListAndSet() == null) {
  out = new DocListAndSet();
} else {
  out = qr.getDocListAndSet();
}

  was (Author: oleg_gnatovskiy):
I was able to hack the latest patch in, and to get it to work, but it 
required some pretty heavy naive changes...
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Otis Gospodnetic
 Fix For: 1.4

 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-02-07 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12566864#action_12566864
 ] 

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:15 PM:
--

Hello everyone. I am planning to implement chain collapsing in a high-traffic 
production environment, so I'd like to use a stable version of Solr. It doesn't 
seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 
patch. It seems to work fine at collapsing, but how do I get a count for the 
documents other than the one being displayed?

As a result I see:

<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory &amp; Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more 
Directory & Services? If so, then that's great.

However when I collapse on some integer fields I get an empty list for 
collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

  was (Author: oleg_gnatovskiy):
Hello everyone. I am planning to implement chain collapsing in a high-traffic 
production environment, so I'd like to use a stable version of Solr. It 
doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the 
Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count 
for the documents other than the one being displayed?

As a result I see:
<code>
<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory &amp; Services">37</int>
</lst>
</code>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more 
Directory & Services? If so, then that's great.

However when I collapse on some integer fields I get an empty list for 
collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-02-07 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12566864#action_12566864
 ] 

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:18 PM:
--

Hello everyone. I am planning to implement chain collapsing in a high-traffic 
production environment, so I'd like to use a stable version of Solr. It doesn't 
seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 
patch. It seems to work fine at collapsing, but how do I get a count for the 
documents other than the one being displayed?

As a result I see:

<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory &amp; Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more 
Directory & Services? If so, then that's great.

However, when I collapse on some fields I get an empty collapse_counts list. It 
could be that those fields have a large number of different values to collapse 
on. Is there a limit to the number of values that collapse_counts displays?

Thanks in advance for any help you can provide!

  was (Author: oleg_gnatovskiy):
Hello everyone. I am planning to implement chain collapsing in a high-traffic 
production environment, so I'd like to use a stable version of Solr. It 
doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the 
Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count 
for the documents other than the one being displayed?

As a result I see:

<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory &amp; Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more 
Directory & Services? If so, then that's great.

However when I collapse on some integer fields I get an empty list for 
collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-02-01 Thread Charles Hornberger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564966#action_12564966
 ] 

clh edited comment on SOLR-236 at 2/1/08 2:59 PM:
-

Ah ... got the beginnings of a diagnosis. The problem appears when the DocSet 
{{qDocSet}} returned by DocSetHitCollector.getDocSet() (called at 
org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the 
field_collapsing patch applied, inside getDocListAndSetNC()) is a BitDocSet, 
and not when it's a HashDocSet. As the stack trace above shows, calling 
intersection() on a BitDocSet object invokes the superclass's 
DocSetBase.intersection() method, which invokes a call chain that blows up when 
it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} 
parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{code}
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
{code}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for 
dealing with {{other}} parameters that are instances of HashDocSet; does it 
also need special-casing logic for dealing with {{other}} parameters that are 
NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or 
something else entirely?
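One way to read the special-casing question: intersecting a set with a negated set never needs to enumerate the complement, because "A AND NOT B" is just A minus B. The sketch below uses java.util.BitSet in place of Solr's DocSet classes. NegatedIntersection is a hypothetical name, and this is not how DocSetBase is actually implemented.

```java
import java.util.BitSet;

/**
 * Hypothetical illustration with java.util.BitSet standing in for Solr's
 * DocSet classes: "matches AND NOT excluded" is computed as a set difference,
 * so the complement of 'excluded' is never enumerated.
 */
class NegatedIntersection {

    static BitSet intersectWithNegation(BitSet matches, BitSet excluded) {
        BitSet result = (BitSet) matches.clone(); // leave the caller's set untouched
        result.andNot(excluded);                  // clear every bit that is set in 'excluded'
        return result;
    }
}
```

With matches {1, 2, 3, 5} and excluded {2, 5}, the result is {1, 3}. The cost depends only on the two concrete sets, not on the size of the negated set.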

  was (Author: clh):
Ah ... got the beginnings of a diagnosis. The problem appears when the 
DocSet {{qDocSet}} returned by DocSetHitCollector.getDocSet() (called at 
org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the 
field_collapsing patch applied, inside getDocListAndSetNC()) is a BitDocSet, 
and not when it's a HashDocSet. As the stack trace above shows, calling 
intersection() on a BitDocSet object invokes the superclass's 
DocSetBase.intersection() method, which invokes a call chain that blows up when 
it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} 
parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{{
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
}}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for 
dealing with {{other}} parameters that are instances of HashDocSet; does it 
also need special-casing logic for dealing with {{other}} parameters that are 
NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or 
something else entirely?
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-01-07 Thread Charles Hornberger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12556652#action_12556652
 ] 

clh edited comment on SOLR-236 at 1/7/08 1:51 PM:
-

bq. UPDATE: Doug Steigerwald's patch (field_collapsing_dsteigerwald.diff) 
applies cleanly to trunk

I'm having trouble applying field_collapsing_1.3.patch to the head of trunk.

{noformat}
[EMAIL PROTECTED]:~/solr/src/java$ patch -p0 < /home/charlie/downloads/field_collapsing_1.3.patch 
patching file org/apache/solr/search/CollapseFilter.java
patching file org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 694 (offset -8 lines).
Hunk #2 succeeded at 1252 (offset -1 lines).
patching file org/apache/solr/common/params/CollapseParams.java
patching file org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 90.
Hunk #3 FAILED at 117.
3 out of 3 hunks FAILED -- saving rejects to file org/apache/solr/handler/StandardRequestHandler.java.rej
patching file org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 311.
Hunk #4 FAILED at 339.
4 out of 4 hunks FAILED -- saving rejects to file org/apache/solr/handler/DisMaxRequestHandler.java.rej
{noformat}

I'm guessing that maybe the field collapsing patch needs to be updated for the 
SearchHandler refactoring that was done as part of SOLR-281? If so, I'll take a 
whack at migrating the changes to SearchHandler.java, and see if I can produce 
a better patch.

  was (Author: clh):
I'm having trouble applying field_collapsing_1.3.patch to the head of trunk.

{noformat}
[EMAIL PROTECTED]:~/solr/src/java$ patch -p0 < /home/charlie/downloads/field_collapsing_1.3.patch 
patching file org/apache/solr/search/CollapseFilter.java
patching file org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 694 (offset -8 lines).
Hunk #2 succeeded at 1252 (offset -1 lines).
patching file org/apache/solr/common/params/CollapseParams.java
patching file org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 90.
Hunk #3 FAILED at 117.
3 out of 3 hunks FAILED -- saving rejects to file org/apache/solr/handler/StandardRequestHandler.java.rej
patching file org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 311.
Hunk #4 FAILED at 339.
4 out of 4 hunks FAILED -- saving rejects to file org/apache/solr/handler/DisMaxRequestHandler.java.rej
{noformat}

I'm guessing that maybe the field collapsing patch needs to be updated for the 
SearchHandler refactoring that was done as part of SOLR-281? If so, I'll take a 
whack at migrating the changes to SearchHandler.java, and see if I can produce 
a better patch.
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called field collapsing.
 It collapses a group of results that share the same value in a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection:
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many consecutive results are allowed before 
   collapsing
 TODO (in progress):
 - More documentation (in the source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr 1.1.0
 P.S.: Feedback and spelling corrections are welcome ;-)
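To make the difference between the two collapse.type modes concrete, here is a minimal Java sketch (not the patch's code). It assumes that collapse.max caps how many hits with the same field value are kept, and it simply drops the extra hits instead of folding them into one entry with a count. The Hit record and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapseSketch {

    // A ranked search hit: document id plus the value of the collapse field (hypothetical type).
    record Hit(int docId, String fieldValue) {}

    // "normal": keep at most maxPerValue hits for each field value,
    // no matter where they appear in the ranked list.
    static List<Hit> collapseNormal(List<Hit> ranked, int maxPerValue) {
        Map<String, Integer> seen = new HashMap<>();
        List<Hit> out = new ArrayList<>();
        for (Hit h : ranked) {
            int count = seen.merge(h.fieldValue(), 1, Integer::sum);
            if (count <= maxPerValue) {
                out.add(h);
            }
        }
        return out;
    }

    // "adjacent": only runs of consecutive hits with the same value are collapsed;
    // the counter resets whenever the value changes.
    static List<Hit> collapseAdjacent(List<Hit> ranked, int maxPerRun) {
        List<Hit> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (Hit h : ranked) {
            run = h.fieldValue().equals(prev) ? run + 1 : 1;
            prev = h.fieldValue();
            if (run <= maxPerRun) {
                out.add(h);
            }
        }
        return out;
    }

    static List<Integer> ids(List<Hit> hits) {
        List<Integer> r = new ArrayList<>();
        for (Hit h : hits) {
            r.add(h.docId());
        }
        return r;
    }

    public static void main(String[] args) {
        List<Hit> ranked = List.of(
            new Hit(1, "a.com"), new Hit(2, "a.com"), new Hit(3, "b.com"),
            new Hit(4, "a.com"), new Hit(5, "a.com"));
        System.out.println(ids(collapseNormal(ranked, 1)));   // [1, 3]
        System.out.println(ids(collapseAdjacent(ranked, 1))); // [1, 3, 4]
    }
}
```

With a limit of 1, normal mode keeps only the first a.com hit overall, while adjacent mode lets a.com reappear (doc 4) once a b.com hit has broken the run.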

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-01-04 Thread Doug Steigerwald (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12556032#action_12556032
 ] 

dsteigerwald edited comment on SOLR-236 at 1/4/08 11:43 AM:


I've created a CollapseComponent for field collapsing. Everything seems to 
work fine with it. The only issue I'm having is that I cannot use the query component: 
when it isn't commented out, the uncollapsed results are 
displayed and I can't figure out how to remove them. Someone might be able to 
figure that part out.

[http://localhost:8983/solr/search?q=id:[0%20TO%20*]&collapse=true&collapse.field=inStock&collapse.type=normal&collapse.threshold=0]
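For reference, a minimal Java sketch that assembles the same request URL, URL-encoding each value (so the colon, brackets, and spaces in q are escaped) and joining the parameters with '&'. The class and method names are hypothetical; the parameter names and values are copied from the URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseQueryUrl {

    // Joins key=value pairs with '&', URL-encoding each value.
    // Keys are assumed to contain only URL-safe characters, so they are not encoded.
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "id:[0 TO *]");
        params.put("collapse", "true");
        params.put("collapse.field", "inStock");
        params.put("collapse.type", "normal");
        params.put("collapse.threshold", "0");
        System.out.println(buildUrl("http://localhost:8983/solr/search", params));
    }
}
```

URLEncoder uses form encoding, so spaces come out as '+' rather than %20; Solr decodes both forms the same way.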

Here's the config I'm using:

<searchComponent name="collapse"
    class="org.apache.solr.handler.component.CollapseComponent" />
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <!-- <str>query</str> -->
    <str>facet</str>
    <!-- <str>mlt</str> -->
    <!-- <str>highlight</str> -->
    <!-- <str>debug</str> -->
    <str>collapse</str>
  </arr>
</requestHandler>




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2007-10-28 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538339
 ] 

ekeller edited comment on SOLR-236 at 10/28/07 1:55 PM:


Here is the patch for Solr 1.3 (rev 589395).

I made some performance improvements: I removed the cache, and I now use a BitDocSet or a 
HashDocSet depending on the solrconfig.hashdocsetmaxsize setting.
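The trade-off behind that choice can be sketched in plain Java. The types below are hypothetical stand-ins, not Solr's BitDocSet/HashDocSet classes: a bitset costs one bit per document in the index regardless of how many are set, while a hash set stores only the ids present, so a size threshold picks the hash set for small result sets. The hashDocSetMaxSize parameter stands in for solrconfig.hashdocsetmaxsize, and 3000 in main is only an example value.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DocSetChoice {

    // Minimal doc-id set abstraction (hypothetical).
    interface IntSet {
        void add(int docId);
        boolean contains(int docId);
        int size();
    }

    // Dense representation: one bit per document in the index.
    static final class BitIntSet implements IntSet {
        private final BitSet bits;
        BitIntSet(int maxDoc) { bits = new BitSet(maxDoc); }
        public void add(int docId) { bits.set(docId); }
        public boolean contains(int docId) { return bits.get(docId); }
        public int size() { return bits.cardinality(); }
    }

    // Sparse representation: only the ids actually present are stored.
    static final class HashIntSet implements IntSet {
        private final Set<Integer> ids = new HashSet<>();
        public void add(int docId) { ids.add(docId); }
        public boolean contains(int docId) { return ids.contains(docId); }
        public int size() { return ids.size(); }
    }

    // Use the sparse set when the expected number of documents is at most
    // hashDocSetMaxSize; otherwise use a bitset sized to the whole index.
    static IntSet create(int expectedSize, int maxDoc, int hashDocSetMaxSize) {
        return expectedSize <= hashDocSetMaxSize ? new HashIntSet() : new BitIntSet(maxDoc);
    }

    public static void main(String[] args) {
        IntSet small = create(10, 1_000_000, 3000);
        IntSet large = create(50_000, 1_000_000, 3000);
        System.out.println(small.getClass().getSimpleName()); // HashIntSet
        System.out.println(large.getClass().getSimpleName()); // BitIntSet
    }
}
```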

Regards,
Emmanuel Keller.

  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch

