[ 
https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921324#action_12921324
 ] 

Peter Karich commented on SOLR-1311:
------------------------------------

Hi Marc,

Could this issue be closed now that field collapsing is in trunk and more 
mature?

Why can't it be integrated as a plugin?

> pseudo-field-collapsing
> -----------------------
>
>                 Key: SOLR-1311
>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Marc Sturlese
>             Fix For: Next
>
>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>
>
> I am trying to develop a new way of doing field collapsing based on the 
> adjacent field collapsing algorithm. I started developing it because I 
> am experiencing performance problems with the field collapsing patch on a 
> big index (8G).
> The algorithm does adjacent pseudo-field collapsing. It collapses only the 
> first X documents. Instead of making the collapsed docs disappear, the 
> algorithm sends them to a given position in the relevance results list.
> The reason I collapse only the first X documents is that if I have, for 
> example, 600000 results and I am showing 10 results per page, I really 
> don't need to do collapsing on page 30000, or even on page 3000. Doing 
> this I am seeing dramatically better performance. The problem is that I 
> couldn't find a way to plug the algorithm in as a component and keep good 
> performance. I had to hack a few classes in SolrIndexSearcher.java.
> This patch is just experimental and for testing purposes. If someone 
> finds it interesting, it would be good to find a better way to integrate 
> it than the current one.
> Advice is more than welcome.
>       
> Functionality:
> In solrconfig.xml we specify the pseudo-collapsing parameters:
>      <str name="plus.considerMoreDocs">true</str>
>      <str name="plus.considerHowMany">3000</str>
>      <str name="plus.considerField">name</str>
> (at the moment there is no threshold, nor the other parameters that exist 
> in the current collapse-field patch)
> plus.considerMoreDocs enables pseudo-collapsing
> plus.considerHowMany sets the number of result documents to which we want 
> to apply the algorithm
> plus.considerField is the field to pseudo-collapse on
> If the number of results is lower than plus.considerHowMany, the algorithm 
> will be applied to all the results.
> Let's say there is a query with 600000 results and we've set considerHowMany 
> to 3000 (and we already have the docs sorted by relevance). 
> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be 
> collapsed, it is sent to position 2999 of the relevance results array. If 
> the 3rd has to be collapsed too, it goes to position 2998, and so on.
> The algorithm is not applied when a sortspec is set or plus.considerMoreDocs 
> is set to false. Nor is it applied when using MoreLikeThisRequestHandler.
> Example with a query of 9 results:
> Results sorted by relevance without pseudo-collapse-algorithm:
> doc1 - collapse_field_value 3
> doc2 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc6 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> Results pseudo-collapsed with plus.considerHowMany = 5
> doc1 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc2 - collapse_field_value 3*
> doc6 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> Results pseudo-collapsed with plus.considerHowMany = 9
> doc1 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> doc6 - collapse_field_value 6*
> doc2 - collapse_field_value 3*
> *pseudo-collapsed documents
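
The reordering described in the quoted issue can be sketched in a few lines of 
Java. This is not the code from the attached patch. The class and method names 
(PseudoCollapseSketch, pseudoCollapse) are hypothetical, and it assumes that a 
document "has to be collapsed" when its field value equals that of the document 
directly before it in the relevance order. It works on plain lists rather than 
on Solr's internal result structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PseudoCollapseSketch {

    /**
     * Reorders relevance-sorted doc ids. Within the first considerHowMany
     * results, a doc whose field value equals that of the doc directly
     * before it is moved to the tail of that window instead of being dropped.
     * Docs beyond the window keep their original order.
     */
    static List<String> pseudoCollapse(List<String> ids,
                                       List<String> fieldValues,
                                       int considerHowMany) {
        // If there are fewer results than considerHowMany, use all of them.
        int window = Math.max(0, Math.min(considerHowMany, ids.size()));
        List<String> kept = new ArrayList<>();
        List<String> collapsed = new ArrayList<>();

        for (int i = 0; i < window; i++) {
            // Assumption: "adjacent" means same value as the previous result.
            boolean sameAsPrevious =
                i > 0 && fieldValues.get(i).equals(fieldValues.get(i - 1));
            if (sameAsPrevious) {
                collapsed.add(ids.get(i));
            } else {
                kept.add(ids.get(i));
            }
        }

        // The first collapsed doc takes the last slot of the window,
        // the next one the slot before it, and so on.
        Collections.reverse(collapsed);

        List<String> result = new ArrayList<>(kept);
        result.addAll(collapsed);
        result.addAll(ids.subList(window, ids.size()));
        return result;
    }

    public static void main(String[] args) {
        // The nine-result example from the issue description.
        List<String> ids = Arrays.asList(
            "doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8", "doc9");
        List<String> values = Arrays.asList(
            "3", "3", "4", "7", "6", "6", "5", "1", "2");

        System.out.println(pseudoCollapse(ids, values, 5));
        System.out.println(pseudoCollapse(ids, values, 9));
    }
}
```

Run against the nine-result example, this sketch yields the same two orderings 
as listed in the issue: with a window of 5 only doc2 moves (to position 5), and 
with a window of 9 doc6 and doc2 end up in the last two positions.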
