Hey Erick
Thank you very much for your help.
So I dove into the Solr code and read the
http://wiki.apache.org/solr/HowToContribute section. Really informative :-)
I created a Jira issue about my problem and attached a patch file with an
implementation of pivot faceting with ngroup and visible
Here is the link to the Jira issue:
https://issues.apache.org/jira/browse/SOLR-5079
Best Regards Sandro
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, July 21, 2013 14:59
To: solr-user@lucene.apache.org
Subject: Re: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot
faceting requests with facet.pivot.ngroup=true and
facet.pivot.showLastList=false
Sorry, life's been really hectic lately. I don't know the pivot code, so can't
make much of a comment on that. But when it comes to code changes, it's
perfectly reasonable to open up a JIRA and attach the code as a patch. You
might have to nudge people a bit to get them to carry it forward...
The case will be strengthened if you can say that all the tests pass with your
patch. If the tests don't pass, it may point to issues with your patch;
take a quick look at the tests that fail and see if they're related to your
changes.
Start here:
http://wiki.apache.org/solr/HowToContribute
Best
Erick
On Fri, Jul 19, 2013 at 9:25 AM, Sandro Zbinden zbin...@imagic.ch wrote:
Dear Members,
Do you think I would be better off asking this question in the Solr developer
group?
To summarize: I would like to add a facet.pivot.ngroup=true parameter that
shows the number of entries in the facet list. I would also like to avoid
out-of-memory exceptions by reducing the size of the result of a facet.pivot
query.
Best Regards
Sandro Zbinden
-----Original Message-----
From: Sandro Zbinden [mailto:zbin...@imagic.ch]
Sent: Wednesday, July 17, 2013 13:45
To: solr-user@lucene.apache.org
Subject: Avoid Solr Pivot Faceting Out of Memory / Shorter result for
pivot faceting requests with facet.pivot.ngroup=true and
facet.pivot.showLastList=false
Dear user group,
I am getting an out-of-memory exception in the following scenario.
I have 4 SQL tables: patient, visit, study and image, which are
denormalized for the Solr index. The Solr index looks like the
following:
| p_id | p_lastname | v_id | v_name  | ...
|    1 | Miller     |   10 | Study 1 | ...
|    2 | Miller     |   11 | Study 2 | ...
|    2 | Miller     |   12 | Study 3 | ...  -- duplication because of denormalization
|    3 | Smith      |   13 | Study 4 | ...
Now I am executing a facet query:
q=*:*&facet=true&facet.pivot=p_lastname,p_id&facet.limit=-1
And I get the following result:
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">1</int>
      <int name="count">1</int>
    </lst>
    <lst>
      <str name="field">p_id</str>
      <int name="value">2</int>
      <int name="count">2</int>
    </lst>
  </arr>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">3</int>
      <int name="count">1</int>
    </lst>
  </arr>
</lst>
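For reference, the facet request above can be assembled with proper parameter separators and URL encoding. This is only a minimal sketch in plain Java (10 or later), not SolrJ; the class name PivotQueryString and its two methods are made up for illustration, and only the four parameters come from the request in this message.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
class PivotQueryString {

    // Joins the parameters as key=value pairs separated by '&',
    // URL-encoding both the key and the value.
    static String build(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // The four parameters of the request in this message, in the same order.
    static Map<String, String> pivotParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.pivot", "p_lastname,p_id");
        params.put("facet.limit", "-1"); // -1 asks for all facet values
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build(pivotParams()));
    }
}
```

Running main prints q=*%3A*&facet=true&facet.pivot=p_lastname%2Cp_id&facet.limit=-1, which is the same request with the colon and the comma percent-encoded.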
The goal is to show our clients a list of the group values and, in parentheses,
how many patients each group contains.
- Miller (2)
- Smith (1)
This is why we need to use facet.pivot with facet.limit=-1. As far as I know,
it is the only way to get a grouping on two criteria.
We also need the pivot list to count how many patients are in a group.
Currently this works well on smaller indexes, but if we have around 1'000'000
patients and execute a query like the one above, we run out of memory.
I figured out that the problem is not the calculation of the pivot but the
presentation of the result.
Because we load all fields (we cannot use facet.offset because we need to
order the results ascending and descending), the result can get really big.
To avoid this overload I made a change to the PivotFacetHandler.java class in
solr-core.
In the method doPivots I added the following code:
NamedList<Integer> nl = this.getTermCounts(subField);
pivot.add( "ngroups", nl.size() );
This gives me the group size of the list.
Then I removed the recursive call
pivot.add( "pivot", doPivots( nl, subField, nextField, fnames, subset ) );
With this change my result looks like the following:
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <int name="ngroup">2</int>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <int name="ngroup">1</int>
</lst>
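To make the two numbers in the compact result concrete: count is the number of index rows per last name, and ngroup is the number of distinct p_id values under it. The following is a rough sketch in plain Java collections over the four sample rows from this message. It does not use any Solr classes and is not the patch itself; PivotGroupSummary, Row, Summary and sampleRows are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, independent of the Solr code base.
class PivotGroupSummary {

    // One denormalized index row, reduced to the two pivot fields.
    static final class Row {
        final String lastName;
        final int patientId;

        Row(String lastName, int patientId) {
            this.lastName = lastName;
            this.patientId = patientId;
        }
    }

    // One compact entry: no nested pivot list, only the two numbers.
    static final class Summary {
        final String value;
        final int count;   // rows with this last name
        final int ngroup;  // distinct patient ids with this last name

        Summary(String value, int count, int ngroup) {
            this.value = value;
            this.count = count;
            this.ngroup = ngroup;
        }

        String label() {
            return value + " (" + ngroup + ")";
        }
    }

    // Counts rows and distinct patient ids per last name, in first-seen order.
    static List<Summary> summarize(List<Row> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Map<String, Set<Integer>> ids = new LinkedHashMap<>();
        for (Row r : rows) {
            counts.merge(r.lastName, 1, Integer::sum);
            ids.computeIfAbsent(r.lastName, k -> new LinkedHashSet<>()).add(r.patientId);
        }
        List<Summary> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(new Summary(e.getKey(), e.getValue(), ids.get(e.getKey()).size()));
        }
        return out;
    }

    // The sample rows from the table in this message (p_lastname, p_id).
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Miller", 1));
        rows.add(new Row("Miller", 2));
        rows.add(new Row("Miller", 2)); // duplicate caused by denormalization
        rows.add(new Row("Smith", 3));
        return rows;
    }

    public static void main(String[] args) {
        for (Summary s : summarize(sampleRows())) {
            System.out.println("- " + s.label() + " count=" + s.count);
        }
    }
}
```

For the sample rows this yields Miller with count 3 and ngroup 2, and Smith with count 1 and ngroup 1, matching the compact result. The sketch keeps every distinct id in memory, so it only illustrates the meaning of the numbers and says nothing about how Solr should compute them efficiently.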
My question now is whether something like facet.pivot.ngroup=true and
facet.pivot.showLastList=false is already planned to improve the
performance of pivot faceting.
Is