[ https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn van Groningen updated SOLR-2205:
----------------------------------------
Attachment: SOLR-2205.patch
The code I initially wrote was based on the pre-flex code base, so I ported it
to trunk. Someone should definitely check whether all the changes I made are
the right ones.
I tested this patch on my local machine. A search (q=*:*) on an index that
holds 10M documents took around 300 ms, whereas the same query without the code
changes took around 2.8 seconds, so it is roughly 9 times faster. These numbers
are based on a basic search, so no faceting, highlighting, etc.
I found that the following piece of code took a relatively large amount of time
to execute (when it runs millions and millions of times, you start to notice):
{code}
filler.fillValue(doc);
groupMap.get(mval);
{code}
This fragment is used in the TopGroupCollector and the Phase2GroupCollector. I
put some code in front of it to cheaply exclude documents that are not
competitive. In both classes this check is cheaper than executing the fragment
above.
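The idea of rejecting non-competitive documents before the expensive lookup can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not the code in SOLR-2205.patch: it assumes groups are ranked by the best relevance score of any of their documents, that ties are won by the earlier document, and the class name `EarlyExitGroupCollector` and its methods are invented for illustration. The `expensiveLookups` counter stands in for `filler.fillValue(doc); groupMap.get(mval);`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a "top N groups by score" collector.
// Not the actual Solr TopGroupCollector; names and structure are illustrative.
public class EarlyExitGroupCollector {

    // Best (highest) document score seen so far for one group.
    static final class Group {
        final String key;
        float topScore;

        Group(String key, float topScore) {
            this.key = key;
            this.topScore = topScore;
        }
    }

    private final int topN;
    private final Map<String, Group> groupMap = new HashMap<>();
    // Min-heap on topScore: peek() returns the weakest of the current top N groups.
    private final PriorityQueue<Group> queue =
            new PriorityQueue<>((a, b) -> Float.compare(a.topScore, b.topScore));
    // Counts how often the costly path (value lookup + map lookup) actually ran.
    int expensiveLookups = 0;

    EarlyExitGroupCollector(int topN) {
        if (topN < 1) {
            throw new IllegalArgumentException("topN must be >= 1");
        }
        this.topN = topN;
    }

    public void collect(String groupValue, float score) {
        // Cheap pre-check: once N groups are held, a document whose score does not
        // beat the weakest group's best score can neither bring a new group into the
        // top N nor raise the best score of a group already in it, so skip it.
        if (queue.size() == topN && score <= queue.peek().topScore) {
            return;
        }
        // Stand-in for the expensive fragment: filler.fillValue(doc); groupMap.get(mval);
        expensiveLookups++;
        Group g = groupMap.get(groupValue);
        if (g == null) {
            g = new Group(groupValue, score);
            groupMap.put(groupValue, g);
            queue.add(g);
            if (queue.size() > topN) {
                // Evict the weakest group so at most N groups are tracked.
                groupMap.remove(queue.poll().key);
            }
        } else if (score > g.topScore) {
            // Remove and re-add so the heap ordering reflects the new score.
            queue.remove(g);
            g.topScore = score;
            queue.add(g);
        }
    }

    // Group keys ordered by best score, highest first.
    public List<String> topGroups() {
        List<Group> groups = new ArrayList<>(queue);
        groups.sort((a, b) -> Float.compare(b.topScore, a.topScore));
        List<String> keys = new ArrayList<>();
        for (Group g : groups) {
            keys.add(g.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        EarlyExitGroupCollector c = new EarlyExitGroupCollector(2);
        String[] groups = {"a", "b", "c", "a", "d", "b"};
        float[] scores = {3.0f, 5.0f, 1.0f, 4.0f, 2.0f, 0.5f};
        for (int i = 0; i < groups.length; i++) {
            c.collect(groups[i], scores[i]);
        }
        // prints "[b, a] lookups=3"
        System.out.println(c.topGroups() + " lookups=" + c.expensiveLookups);
    }
}
```

In this toy run, three of the six documents are rejected by a single float comparison against the heap's minimum, without touching the group map. Keeping the current top groups in a min-heap is what makes the pre-check cheap: the weakest competitive score is always available via `peek()`.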
Since I ported the code from the pre-flex code base, I needed to make some
changes to it and add support for grouping by function. The code I initially
wrote only needed to support grouping on a field. Since trunk also supports
grouping by function query, I added two methods to DocValues and implemented
these methods in three subclasses. I don't know whether this particular change
is a good one, but it works. I think it would be really helpful if someone could
give feedback on this particular change.
> Grouping performance improvements
> ---------------------------------
>
> Key: SOLR-2205
> URL: https://issues.apache.org/jira/browse/SOLR-2205
> Project: Solr
> Issue Type: Sub-task
> Components: search
> Affects Versions: 4.0
> Reporter: Martijn van Groningen
> Fix For: 4.0
>
> Attachments: SOLR-2205.patch
>
>
> This issue is dedicated to the performance of the grouping functionality.
> I've noticed that the code does not perform well on large indexes. Doing a
> search (q=*:*) with grouping on an index of around 5M documents took around
> one second on my local development machine. We had to support grouping on an
> index that holds around 50M documents per machine, so we made some changes
> and were able to happily serve that amount of documents. Patch will follow
> soon.