Thanks for sharing, Ian. Perhaps one day Solr will have a 3rd party plugin registry of sorts where you could publish it.
It wasn't clear if the bugs you listed were in your plugin or in Lucene/Solr. If the latter, and if you have time, please file JIRA issue(s).

On Sun, Nov 18, 2018 at 10:25 PM Ian Caldwell <[email protected]> wrote:

> We have been working on a search index that contains archived web pages
> that have been collected over a number of years. This can result in the same
> page (URL) being collected on many dates. The problem we faced is that
> we wanted to group results by site (domain), but this left us with the same
> page being found many times, so we needed a second level of grouping.
>
> I have extended the SOLR 5.5.3 grouping code to allow for 2-level
> grouping. Through discussions with some of the people involved in
> archiving websites, it was requested that the code be shared with the
> SOLR developers. I have made the code public on GitHub: SOLR Grouping
> <https://github.com/nla/solr-grouping>.
>
> When extending the SOLR grouping code I tried to keep the code generic so
> that it could possibly be used elsewhere, but I did not try to make all
> existing features work, focusing only on the parts that we needed for our
> system. Along the way I found a couple of bugs that I fixed in this code:
> (1) integer overflow in holding the total record count, and (2) not searching
> all shards when performing the second phase of the query (get all records
> within a group).
>
> Ian Caldwell
> National Library of Australia

--
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
