[ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921324#action_12921324 ]
Peter Karich commented on SOLR-1311:
------------------------------------

Hi Marc,

could this issue be closed, now that field collapsing is in trunk and more mature? Why can't it be integrated as a plugin?

> pseudo-field-collapsing
> -----------------------
>
>                 Key: SOLR-1311
>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Marc Sturlese
>             Fix For: Next
>
>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>
>
> I am trying to develop a new way of doing field collapsing based on the
> adjacent field collapsing algorithm. I started developing it because I
> am experiencing performance problems with the field collapsing patch on a
> big index (8G).
> The algorithm does adjacent pseudo-field collapsing. It collapses only the
> first X documents. Instead of making the collapsed docs disappear, the
> algorithm sends them to a given position in the relevance results list.
> The reason I collapse only the first X documents is that if I have,
> for example, 600000 results and I am showing 10 results per page, I really
> don't need to collapse on page 30000, or even on page 3000. Doing
> this, I am seeing dramatically better performance. The problem is that I
> couldn't find a way to plug the algorithm in as a component and keep good
> performance. I had to hack a few classes in SolrIndexSearcher.java.
> This patch is just experimental and for testing purposes. In case someone
> finds it interesting, it would be good to find a way to integrate it in a
> better way than it is at the moment.
> Advice is more than welcome.
>
> Functionality:
> In solrconfig.xml we specify the pseudo-collapsing parameters:
>      <str name="plus.considerMoreDocs">true</str>
>      <str name="plus.considerHowMany">3000</str>
>      <str name="plus.considerField">name</str>
> (at the moment there is no threshold, nor the other parameters that exist
> in the current collapse-field patch)
> plus.considerMoreDocs enables pseudo-collapsing
> plus.considerHowMany sets the number of result documents to which we want
> to apply the algorithm
> plus.considerField is the field to pseudo-collapse on
> If the number of results is lower than plus.considerHowMany, the algorithm
> is applied to all the results.
> Let's say there is a query with 600000 results and we've set considerHowMany
> to 3000 (and we already have the docs sorted by relevance).
> What adjacent pseudo-collapse does is this: if the 2nd doc has to be
> collapsed, it is sent to position 2999 of the relevance results array. If
> the 3rd has to be collapsed too, it goes to position 2998, and so on.
> The algorithm is not applied when a sortspec is set or plus.considerMoreDocs
> is set to false. Nor is it applied when using MoreLikeThisRequestHandler.
> Example with a query of 9 results:
> Results sorted by relevance without pseudo-collapse-algorithm:
> doc1 - collapse_field_value 3
> doc2 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc6 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> Results pseudo-collapsed with plus.considerHowMany = 5
> doc1 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc2 - collapse_field_value 3*
> doc6 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> Results pseudo-collapsed with plus.considerHowMany = 9
> doc1 - collapse_field_value 3
> doc3 - collapse_field_value 4
> doc4 - collapse_field_value 7
> doc5 - collapse_field_value 6
> doc7 - collapse_field_value 5
> doc8 - collapse_field_value 1
> doc9 - collapse_field_value 2
> doc6 - collapse_field_value 6*
> doc2 - collapse_field_value 3*
> *pseudo-collapsed documents
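
The reordering in the quoted example can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the issue, not code from the attached patch: the function name `pseudo_collapse` and its parameters are hypothetical, and it assumes that a document "has to be collapsed" when its field value equals that of the document immediately before it in relevance order.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def pseudo_collapse(
    docs: Sequence[Tuple[str, int]],
    consider_how_many: int,
    field_value: Callable[[Tuple[str, int]], Hashable],
) -> List[Tuple[str, int]]:
    """Reorder the first `consider_how_many` docs; leave the rest untouched.

    Hypothetical helper: a doc whose field value matches the doc right
    before it is moved to the end of the window. The first collapsed doc
    takes the last slot of the window, the second the slot before it, etc.
    """
    window = list(docs[:consider_how_many])
    tail = list(docs[consider_how_many:])

    kept = []        # docs that stay in relevance order
    collapsed = []   # docs pushed to the end of the window
    previous = None
    for i, doc in enumerate(window):
        value = field_value(doc)
        if i > 0 and value == previous:
            collapsed.append(doc)
        else:
            kept.append(doc)
        previous = value

    # Collapsed docs fill the window from its last position backwards,
    # so they appear in reverse order of collapsing.
    return kept + collapsed[::-1] + tail


# The nine results from the example, as (name, collapse_field_value).
RESULTS = [
    ("doc1", 3), ("doc2", 3), ("doc3", 4), ("doc4", 7), ("doc5", 6),
    ("doc6", 6), ("doc7", 5), ("doc8", 1), ("doc9", 2),
]

if __name__ == "__main__":
    for how_many in (5, 9):
        order = pseudo_collapse(RESULTS, how_many, lambda d: d[1])
        print(how_many, [name for name, _ in order])
```

With a window of 5 only doc2 moves (doc6 lies outside the window); with a window of 9 both doc2 and doc6 move, and doc2 ends up last because it was collapsed first.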