[ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031266#comment-13031266 ]
Michael McCandless commented on LUCENE-1421:
--------------------------------------------

bq. I think that grouping code should be part of Lucene instead of Solr.

+1

This is a very popular issue (currently tied for 2nd place in votes).

Unfortunately, I think the single-pass collector attached here doesn't scale very well to a large maxDoc and/or a large number of unique groups. Also, it pulls a DocTermsIndex on the top-level reader (costly in an NRT/reopen setting since it's not per-segment).

So I decided to factor out parts of Solr's current two-pass approach into a shared "grouping" module.

The downside of the two-pass approach is that you run the query twice, automatically halving your QPS. (It's even worse because the grouping itself is somewhat compute-intensive too.)

To try to help mitigate this, I also added a new CachingCollector, which just holds hits (docID and optionally score) up to a max allowed RAM consumption, and can then replay them for the 2nd pass. It includes a "max RAM" setting so that if too many hits are found, it stops caching (and you must then re-execute the query).

But one nice side effect of the two-phase approach is that sharding is in theory straightforward (I think?). I.e., all shards would do the first phase, concurrently, to get the top N groups. Then you merge-sort the resulting top groups, then run the second phase (finding docs within the top groups) on all shards, then merge results from the same group across all shards.

> Ability to group search results by field
> ----------------------------------------
>
> Key: LUCENE-1421
> URL: https://issues.apache.org/jira/browse/LUCENE-1421
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Artyom Sokolov
> Priority: Minor
> Attachments: lucene-grouping.patch
>
>
> It would be awesome to group search results by specified field. Some
> functionality was provided for Apache Solr but I think it should be done in
> Core Lucene.
> There could be some useful information like total hits about
> collapsed data like total count and so on.
> Thanks,
> Artyom
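To make the caching idea from the comment concrete, here is a minimal sketch of a RAM-bounded hit cache that records hits during a first pass and replays them for a second. It is an illustration only and not the CachingCollector from the patch: the class name `BoundedHitCache`, the `HitConsumer` callback, and the byte accounting (4 bytes per docID plus 4 per score, ignoring array overhead) are all assumptions made for this example.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is NOT Lucene's CachingCollector API. */
final class BoundedHitCache {

    /** Hypothetical callback that receives each hit (first pass or replay). */
    interface HitConsumer {
        void accept(int docId, float score);
    }

    private final long maxBytes;        // assumed "max RAM" budget for cached hits
    private final boolean cacheScores;  // scores are optional, as in the comment
    private int[] docIds = new int[16];
    private float[] scores;
    private int size = 0;
    private boolean cached = true;      // flips to false once the budget is exceeded

    BoundedHitCache(long maxBytes, boolean cacheScores) {
        this.maxBytes = maxBytes;
        this.cacheScores = cacheScores;
        this.scores = cacheScores ? new float[16] : null;
    }

    /** Record one hit from the first pass, unless caching was already abandoned. */
    void collect(int docId, float score) {
        if (!cached) {
            return;
        }
        // Simplified accounting: counts stored hits only, not spare array capacity.
        long bytesPerHit = Integer.BYTES + (cacheScores ? Float.BYTES : 0);
        if ((size + 1) * bytesPerHit > maxBytes) {
            // Over budget: stop caching and release what was held so far.
            cached = false;
            docIds = null;
            scores = null;
            size = 0;
            return;
        }
        if (size == docIds.length) {
            docIds = Arrays.copyOf(docIds, size * 2);
            if (cacheScores) {
                scores = Arrays.copyOf(scores, size * 2);
            }
        }
        docIds[size] = docId;
        if (cacheScores) {
            scores[size] = score;
        }
        size++;
    }

    /** True if every hit seen so far is still held and can be replayed. */
    boolean isCached() {
        return cached;
    }

    /** Feed the cached hits, in collection order, to the second pass. */
    void replay(HitConsumer secondPass) {
        if (!cached) {
            throw new IllegalStateException("cache overflowed; re-execute the query");
        }
        for (int i = 0; i < size; i++) {
            // Float.NaN stands in for "no score" when scores were not cached.
            secondPass.accept(docIds[i], cacheScores ? scores[i] : Float.NaN);
        }
    }
}
```

A caller would feed every first-pass hit to `collect`, then check `isCached()` before the second pass: if it returns true, call `replay` with the second-pass logic; otherwise fall back to running the query again, which is the trade-off the comment describes.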