Re: FieldCache insanity with field used as facet and group

2013-06-03 Thread Elodie Sannier

I can reproduce the problem with the 4.2.1 example and 2 shards.

1) Started up the Solr shards, indexed the example data, and confirmed that
the fieldCaches were empty:
[sanniere@funlevel-dx example]$ java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
[sanniere@funlevel-dx example2]$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

2) Used both grouping and faceting on the popularity field, then checked
the fieldCache insanity count:
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity" > /dev/null
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=popularity" > /dev/null
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true" | grep -E "entries_count|insanity_count"
"entries_count":10,
"insanity_count":2,

"insanity#0":"VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_g(4.2.1):C1)+popularity\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',class
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#12129794\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n",
"insanity#1":"VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_f(4.2.1):C9)+popularity\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',class
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1130715\n"}}},
"HIGHLIGHTING",{},
"OTHER",{}]}
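Step 2 can be wrapped in a small shell function so the same check can be pointed at another field, for example merchantid on the production cluster. This is a minimal sketch, not something shipped with Solr: reproduce_insanity is a hypothetical name, the base URL defaults to the example node above (override it with SOLR_URL), and the MBean path and key are the ones used in the transcript. grep -E is used because plain grep treats "|" as a literal character.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): group and facet on one field,
# then print the fieldCache entries_count and insanity_count lines.
# Assumes the 4.2.1 example node; override with SOLR_URL=...
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

reproduce_insanity() {
  field="$1"
  # Populate the FieldCache through the grouping code path.
  curl -sS "$SOLR_URL/select?q=*:*&group=true&group.field=$field" > /dev/null
  # Populate it again through the faceting code path (default facet.method).
  curl -sS "$SOLR_URL/select?q=*:*&facet=true&facet.field=$field" > /dev/null
  # -E turns "|" into alternation; plain grep would search for a literal "|".
  curl -sS "$SOLR_URL/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true" \
    | grep -E '"(entries_count|insanity_count)"'
}

# Usage:
#   reproduce_insanity popularity
#   SOLR_URL=http://otherhost:8983/solr reproduce_insanity merchantid
```

The insanity count only starts from zero on a freshly started node, so run it right after startup if you want to compare fields.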

I've updated https://issues.apache.org/jira/browse/SOLR-4866

Elodie

--
Kelkoo

*Elodie Sannier* Software engineer

*E* elodie.sann...@kelkoo.fr
*Y!Messenger* kelkooelodies
*T* +33 (0)4 56 09 07 55 *M*
*A* 4/6 Rue des Méridiens 38130 Echirolles




Kelkoo SAS
Simplified joint-stock company (Société par Actions Simplifiée)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.

Re: FieldCache insanity with field used as facet and group

2013-05-28 Thread Elodie Sannier

I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

Re: FieldCache insanity with field used as facet and group

2013-05-07 Thread Chris Hostetter

: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one who named the result of these "sanity checks"
"FieldCacheInsanity", and I have regretted it ever since; a better label
would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field "merchantid" of type "int" used as a
: facet field and as a group field.

Interesting: it appears that the grouping code and the faceting code are not
consistent in how they build the FieldCache entries, so you are getting two
objects in the cache for each segment.
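The "two objects for each segment" can be read directly off a VALUEMISMATCH report: each entry ends in =>ImplClass#id, and entries with the same #id appear to point at the same object. The sketch below assumes that suffix identifies the cached object, as the reports in this thread suggest; distinct_values is a hypothetical helper name, not a Lucene or Solr tool.

```shell
#!/bin/sh
# Hypothetical helper: read a VALUEMISMATCH report on stdin and print each
# distinct cached value object once. The "Class#id" suffix after the last
# "=>" of an entry is assumed to identify the object, so entries that share
# it (int,null and int,NUMERIC_UTILS_INT_PARSER) collapse to one line.
distinct_values() {
  grep -oE '=>org\.apache\.lucene\.search\.FieldCacheImpl\$[A-Za-z]+#[0-9]+' \
    | sed 's/^=>//' \
    | sort -u
}

# Usage: the quoted 'EOF' delimiter stops the shell from expanding "$".
distinct_values <<'EOF'
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
EOF
```

For the three merchantid entries this prints two lines, one ending in IntsFromArray#1105988713 and one ending in SortedDocValuesImpl#557711353: an int array and a SortedDocValues terms index for the same field and segment.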

I haven't checked whether this happens with the example configs, but if
you could, please file a bug with the details of which Solr version you
are using, the schema fieldType and field declarations for your
merchantid field, and the MBean stats output showing the FieldCache
insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in 
it)

: Can this insanity have a performance impact?
: How can I fix it?

The impact is just that more RAM is being used than is probably strictly
necessary. Unless there is something unusual in your fieldType
declaration, I don't think there is an easy fix you can apply; we need to
fix the underlying code.
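To put a rough number on the extra RAM: assuming the IntsFromArray entry is a plain int[] with one 4-byte slot per document in the segment, and that the number after C in the segment name (C4493997/853637 for _11i) is the segment's maxDoc with 853637 of those documents deleted, the int copy for that one segment is about 17 MiB, and every other segment of the field adds its own. This back-of-the-envelope sketch ignores JVM object overhead and the size of the SortedDocValues entry; int_cache_bytes is a hypothetical name.

```shell
#!/bin/sh
# Rough size of an int-array FieldCache entry for one segment, assuming one
# 4-byte int per document slot (maxDoc) and ignoring JVM object overhead.
int_cache_bytes() {
  max_doc="$1"
  echo $((4 * max_doc))
}

# Segment _11i from the report above: C4493997 is taken as maxDoc (assumed).
bytes=$(int_cache_bytes 4493997)
echo "$bytes bytes (~$((bytes / 1024 / 1024)) MiB)"
# prints: 17975988 bytes (~17 MiB)
```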

-Hoss

FieldCache insanity with field used as facet and group

2013-04-25 Thread Elodie Sannier

Hello,

I am using the Lucene FieldCache with SolrCloud and I have "insane" instances 
with messages like:

VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field "merchantid" of type "int" used as a
facet field and as a group field.

I'm using a custom SearchHandler that makes two sub-queries: a first query
with group.field=merchantid and a second query with facet.field=merchantid.

When I use the parameter facet.method=enum, the insane instance no longer
appears, but I'm not sure it is the right fix.
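One way to test the facet.method=enum observation in isolation is to run the group and facet queries for a given facet method and read the insanity_count afterwards. This sketch assumes a single Solr node at the example URL and that Solr is restarted between runs, since FieldCache entries belong to the segment readers and may survive a commit that leaves those segments untouched. It uses Solr's per-field override f.<field>.facet.method so only that field switches method; insanity_after is a hypothetical name. Keep in mind that enum walks every term and leans on the filterCache, which is generally recommended for fields with few distinct values, so it may be slower for a field like merchantid.

```shell
#!/bin/sh
# Hypothetical check: group and facet on one field with a given facet.method,
# then print the numeric insanity_count from the fieldCache MBean.
# Restart Solr before each run so earlier FieldCache entries do not linger.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

insanity_after() {
  field="$1"
  method="$2"   # e.g. fc or enum
  curl -sS "$SOLR_URL/select?q=*:*&group=true&group.field=$field" > /dev/null
  # f.<field>.facet.method overrides facet.method for this field only.
  curl -sS "$SOLR_URL/select?q=*:*&facet=true&facet.field=$field&f.$field.facet.method=$method" > /dev/null
  # Non-indented JSON is a single line; pull out the first insanity_count.
  curl -sS "$SOLR_URL/admin/mbeans?stats=true&key=fieldCache&wt=json" \
    | sed -n 's/.*"insanity_count":\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (restart Solr between the two calls):
#   insanity_after merchantid fc
#   insanity_after merchantid enum
```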

Can this insanity have a performance impact?
How can I fix it?

Elodie Sannier