Query function error - can not use FieldCache on multivalued field

2020-09-14 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use the Solr query function as a boost for term matches in
the title field. Here's my boost function:

bf=if(exists(query({!v='title:Import data'})),10,0)

This throws the following error --> can not use FieldCache on multivalued
field: data

The function only seems to work for a single-term query. The title field is
not multivalued, but it is configured to analyze terms. Here's the field
definition.



I was under the impression that I would be able to use the query function
to evaluate a regular field query. Am I missing something? If there's a
constraint on this function, can this boost be done in a different way?
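
Since bf values are split on whitespace, one common workaround is to move the
multi-word query into a separate parameter and dereference it. This is only a
sketch; the parameter name titleq is an illustrative assumption, not something
from this thread:

```
bf=if(exists(query({!v=$titleq})),10,0)
titleq=title:(Import data)
```

With dereferencing, the space inside the title query no longer splits the bf
value into separate function fragments.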

Any pointers will be appreciated.

Thanks,
Shamik


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
On Mon, Feb 26, 2018 at 7:14 PM, Erick Erickson 
wrote:

>
> Faceting works on multivalued fields, perhaps you can do something with
> that?
>

The main difference I see in this case between facets and groups is that
groups are sorted by score, so the most relevant group comes first. Which is
very useful when I have to return grouped results to the user.


-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
Of course, and in that use-case you'd want a particular document to
appear in all three categories.

Another client may want the doc to appear in only the "most important"
category, however that's defined.

Another client may want the doc to appear in "the more recent" day
(assuming we're grouping by date). Or "the oldest day".

That's what I meant by "rather than do something which will be wrong
it throws an error". Whatever you choose will be "wrong" in some
use-case.

Your use-case is certainly valid, but nobody has come forward with a
patch to allow it that I know of.

Faceting works on multivalued fields, perhaps you can do something with that?

Best,
Erick

On Mon, Feb 26, 2018 at 9:10 AM, Vincenzo D'Amore  wrote:
> Hi Erick,
>
> please consider this case where there is a group of products that are
> televisions.
>
> Now I have only one category per product, but in some cases, like
> televisions, I could have more than one.
>
> Some products should be available simultaneously in more categories; that's
> why the field I was trying to group on is multivalued, for example:
>
> /home-video/televisions/tv-led (516)
> /home-video/televisions/tv-ultra-hd-4k (363)
> /home-video/televisions/smart-tv (19)
>
> So there can be a television that is simultaneously a TV led, a smart tv
> and is ultra hd 4k.
>
> So, for example, I should be able to submit the following query:
>
> - fq=available:true
> - fq=vertical:0
> - q=television
> - rows=3
> - group=true
> - group.field=category
> - group.limit=0
>
> So the returned groups should be something like this (this is the output I
> have now for the single value field)
>
> <lst name="grouped">
>   <lst name="category">
>     <int name="matches">51653</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">/home-video/televisions/tv-led</str>
>         <result name="doclist" numFound="516" start="0" maxScore="0.6224861">
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">/home-video/televisions/tv-ultra-hd-4k</str>
>         <result name="doclist" numFound="363" start="0" maxScore="0.5923965">
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">/home-video/televisions/smart-tv</str>
>         <result name="doclist" numFound="19" start="0">
>         </result>
>       </lst>
>     </arr>
>   </lst>
> </lst>
>
>
>
> On Mon, Feb 26, 2018 at 4:44 PM, Erick Erickson 
> wrote:
>
>> What does "group by" mean on a field with more than one value? Say I
>> have "A" and "B" in the field in a single document. What group does it
>> go in, one labeled "A" or one labeled "B"?
>>
>> So IIUC, rather than do something which will be wrong it throws an
>> error if the field is defined as multiValued. And whatever option is
>> chosen (e.g. use the min or max or) will be wrong sometime.
>>
>> Although admittedly the error is a bit obscure...
>>
>> Best,
>> Erick
>>
>> On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore 
>> wrote:
>> > Hi Amrit,
>> >
>> > thanks for your help.
>> >
>> > I know that only 5/10% of documents in the collection have more than one
>> > value for the field I was trying to group by.
>> >
>> > So there isn't significant memory usage in this case. Do you know of
>> > any other drawback I should be aware of?
>> >
>> > I was thinking to avoid this problem hacking the source code and deploy a
>> > personalised version of Solr.
>> >
>> > Best regards,
>> > Vincenzo
>> >
>> >
>> >
>> > On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar 
>> > wrote:
>> >
>> >> Vincenzo,
>> >>
>> >> As I read the source code;  SchemaField.java
>> >>
>> >> /**
>> >>  * Sanity checks that the properties of this field type are plausible
>> >>  * for a field that may be used to get a FieldCacheSource, throwing
>> >>  * an appropriate exception (including the field name) if it is not.
>> >>  * FieldType subclasses can choose to call this method in their
>> >>  * getValueSource implementation
>> >>  * @see FieldType#getValueSource
>> >>  */
>> >> public void checkFieldCacheSource() throws SolrException {
>> >>   if ( multiValued() ) {
>> >> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>> >> "can not use FieldCache on multivalued
>> field: "
>> >> + getName());
>> >>   }
>> >>   if (! hasDocValues() ) {
>> >> if ( ! ( indexed() && null != this.type.getUninversionType(this) )
>> ) {
>> >>   throw new SolrExcepti

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Erick,

please consider this case where there is a group of products that are
televisions.

Now I have only one category per product, but in some cases, like
televisions, I could have more than one.

Some products should be available simultaneously in more categories; that's
why the field I was trying to group on is multivalued, for example:

/home-video/televisions/tv-led (516)
/home-video/televisions/tv-ultra-hd-4k (363)
/home-video/televisions/smart-tv (19)

So there can be a television that is simultaneously an LED TV, a smart TV,
and ultra HD 4K.

So, for example, I should be able to submit the following query:

- fq=available:true
- fq=vertical:0
- q=television
- rows=3
- group=true
- group.field=category
- group.limit=0
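
Assembled into a single request, those parameters would look roughly like
this (the host and collection name are illustrative):

```
http://localhost:8983/solr/products/select?q=television&fq=available:true&fq=vertical:0&rows=3&group=true&group.field=category&group.limit=0
```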

So the returned groups should be something like this (this is the output I
have now for the single value field)


  
<lst name="grouped">
  <lst name="category">
    <int name="matches">51653</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">/home-video/televisions/tv-led</str>
        <result name="doclist" numFound="516" start="0" maxScore="0.6224861">
        </result>
      </lst>
      <lst>
        <str name="groupValue">/home-video/televisions/tv-ultra-hd-4k</str>
        <result name="doclist" numFound="363" start="0" maxScore="0.5923965">
        </result>
      </lst>
      <lst>
        <str name="groupValue">/home-video/televisions/smart-tv</str>
        <result name="doclist" numFound="19" start="0">
        </result>
      </lst>
    </arr>
  </lst>
</lst>




On Mon, Feb 26, 2018 at 4:44 PM, Erick Erickson 
wrote:

> What does "group by" mean on a field with more than one value? Say I
> have "A" and "B" in the field in a single document. What group does it
> go in, one labeled "A" or one labeled "B"?
>
> So IIUC, rather than do something which will be wrong it throws an
> error if the field is defined as multiValued. And whatever option is
> chosen (e.g. use the min or max or) will be wrong sometime.
>
> Although admittedly the error is a bit obscure...
>
> Best,
> Erick
>
> On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore 
> wrote:
> > Hi Amrit,
> >
> > thanks for your help.
> >
> > I know that only 5/10% of documents in the collection have more than one
> > value for the field I was trying to group by.
> >
> > So there isn't significant memory usage in this case. Do you know of
> > any other drawback I should be aware of?
> >
> > I was thinking to avoid this problem hacking the source code and deploy a
> > personalised version of Solr.
> >
> > Best regards,
> > Vincenzo
> >
> >
> >
> > On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar 
> > wrote:
> >
> >> Vincenzo,
> >>
> >> As I read the source code;  SchemaField.java
> >>
> >> /**
> >>  * Sanity checks that the properties of this field type are plausible
> >>  * for a field that may be used to get a FieldCacheSource, throwing
> >>  * an appropriate exception (including the field name) if it is not.
> >>  * FieldType subclasses can choose to call this method in their
> >>  * getValueSource implementation
> >>  * @see FieldType#getValueSource
> >>  */
> >> public void checkFieldCacheSource() throws SolrException {
> >>   if ( multiValued() ) {
> >> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
> >> "can not use FieldCache on multivalued
> field: "
> >> + getName());
> >>   }
> >>   if (! hasDocValues() ) {
> >> if ( ! ( indexed() && null != this.type.getUninversionType(this) )
> ) {
> >>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
> >>   "can not use FieldCache on a field w/o
> >> docValues unless it is indexed and supports Uninversion: "
> >>   + getName());
> >> }
> >>   }
> >> }
> >>
> >> Seems like FieldCache are not allowed to un-invert values for
> >> multi-valued fields.
> >>
> >> I can suspect the reason, multiple values will eat up more memory? Not
> >> sure, someone else can weigh in.
> >>
> >>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> Medium: https://medium.com/@sarkaramrit2
> >>
> >> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > while trying to run a group query on a multivalue field I received
> this
> >> > error:
> >> >
> >> > can not use FieldCache on multivalued field:
> >> >
> >> > 
> >> > 
> >> >
> >> > 
> >> >   true
> >> >   400
> >> >   4
> >> > 
> >> > 
> >> >   
> >> > org.apache.solr.common.SolrException str>
> >> > org.apache.solr.common.
> >> > SolrException
> >> >   
> >> >   can not use FieldCache on multivalued field:
> >> > categoryLevels
> >> >   400
> >> > 
> >> > 
> >> >
> >> > I don't understand why this is happening.
> >> >
> >> > Do you know any way to work around this problem?
> >> >
> >> > Thanks in advance,
> >> > Vincenzo
> >> >
> >> > --
> >> > Vincenzo D'Amore
> >> >
> >>
> >
> >
> >
> > --
> > Vincenzo D'Amore
>



-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
What does "group by" mean on a field with more than one value? Say I
have "A" and "B" in the field in a single document. What group does it
go in, one labeled "A" or one labeled "B"?

So IIUC, rather than do something which will be wrong, it throws an
error if the field is defined as multiValued. And whatever option is
chosen (e.g. use the min or the max or ...) will be wrong sometimes.

Although admittedly the error is a bit obscure...

Best,
Erick
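
The ambiguity can be sketched in a few lines of Python (hypothetical
in-memory documents, not Solr code): each policy for placing a document
whose field holds ["A", "B"] gives a different, equally defensible result.

```python
# Sketch of why grouping on a multivalued field is ambiguous: a
# document carrying ["A", "B"] could reasonably land in either group
# (or in both), so Solr refuses rather than pick one silently.
from collections import defaultdict

docs = [
    {"id": 1, "category": ["A"]},
    {"id": 2, "category": ["A", "B"]},  # which group should doc 2 join?
    {"id": 3, "category": ["B"]},
]

def group_first_value(docs):
    """Arbitrary policy: a doc joins the group of its first value."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["category"][0]].append(d["id"])
    return dict(groups)

def group_every_value(docs):
    """Facet-like policy: a doc is counted once per value it carries."""
    groups = defaultdict(list)
    for d in docs:
        for v in d["category"]:
            groups[v].append(d["id"])
    return dict(groups)

# The two policies disagree about doc 2, which is the point above: any
# single choice Solr made would be "wrong" for some use-case.
print(group_first_value(docs))  # {'A': [1, 2], 'B': [3]}
print(group_every_value(docs))  # {'A': [1, 2], 'B': [2, 3]}
```

The second policy is essentially what faceting does, which is why faceting
works on multivalued fields while grouping refuses them.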

On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore  wrote:
> Hi Amrit,
>
> thanks for your help.
>
> I know that only 5/10% of documents in the collection have more than one
> value for the field I was trying to group by.
>
> So there isn't significant memory usage in this case. Do you know of any
> other drawback I should be aware of?
>
> I was thinking to avoid this problem hacking the source code and deploy a
> personalised version of Solr.
>
> Best regards,
> Vincenzo
>
>
>
> On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar 
> wrote:
>
>> Vincenzo,
>>
>> As I read the source code;  SchemaField.java
>>
>> /**
>>  * Sanity checks that the properties of this field type are plausible
>>  * for a field that may be used to get a FieldCacheSource, throwing
>>  * an appropriate exception (including the field name) if it is not.
>>  * FieldType subclasses can choose to call this method in their
>>  * getValueSource implementation
>>  * @see FieldType#getValueSource
>>  */
>> public void checkFieldCacheSource() throws SolrException {
>>   if ( multiValued() ) {
>> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>> "can not use FieldCache on multivalued field: "
>> + getName());
>>   }
>>   if (! hasDocValues() ) {
>> if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
>>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>>   "can not use FieldCache on a field w/o
>> docValues unless it is indexed and supports Uninversion: "
>>   + getName());
>> }
>>   }
>> }
>>
>> Seems like FieldCache are not allowed to un-invert values for
>> multi-valued fields.
>>
>> I can suspect the reason, multiple values will eat up more memory? Not
>> sure, someone else can weigh in.
>>
>>
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore 
>> wrote:
>>
>> > Hi,
>> >
>> > while trying to run a group query on a multivalue field I received this
>> > error:
>> >
>> > can not use FieldCache on multivalued field:
>> >
>> > 
>> > 
>> >
>> > 
>> >   true
>> >   400
>> >   4
>> > 
>> > 
>> >   
>> > org.apache.solr.common.SolrException
>> > org.apache.solr.common.
>> > SolrException
>> >   
>> >   can not use FieldCache on multivalued field:
>> > categoryLevels
>> >   400
>> > 
>> > 
>> >
>> > I don't understand why this is happening.
>> >
>> > Do you know any way to work around this problem?
>> >
>> > Thanks in advance,
>> > Vincenzo
>> >
>> > --
>> > Vincenzo D'Amore
>> >
>>
>
>
>
> --
> Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Amrit,

thanks for your help.

I know that only 5-10% of the documents in the collection have more than one
value for the field I was trying to group by.

So there isn't significant memory usage in this case. Do you know of any
other drawback I should be aware of?

I was thinking of working around this problem by hacking the source code and
deploying a customised version of Solr.

Best regards,
Vincenzo



On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar 
wrote:

> Vincenzo,
>
> As I read the source code;  SchemaField.java
>
> /**
>  * Sanity checks that the properties of this field type are plausible
>  * for a field that may be used to get a FieldCacheSource, throwing
>  * an appropriate exception (including the field name) if it is not.
>  * FieldType subclasses can choose to call this method in their
>  * getValueSource implementation
>  * @see FieldType#getValueSource
>  */
> public void checkFieldCacheSource() throws SolrException {
>   if ( multiValued() ) {
> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
> "can not use FieldCache on multivalued field: "
> + getName());
>   }
>   if (! hasDocValues() ) {
> if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>   "can not use FieldCache on a field w/o
> docValues unless it is indexed and supports Uninversion: "
>   + getName());
> }
>   }
> }
>
> Seems like FieldCache are not allowed to un-invert values for
> multi-valued fields.
>
> I can suspect the reason, multiple values will eat up more memory? Not
> sure, someone else can weigh in.
>
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore 
> wrote:
>
> > Hi,
> >
> > while trying to run a group query on a multivalue field I received this
> > error:
> >
> > can not use FieldCache on multivalued field:
> >
> > 
> > 
> >
> > 
> >   true
> >   400
> >   4
> > 
> > 
> >   
> > org.apache.solr.common.SolrException
> > org.apache.solr.common.
> > SolrException
> >   
> >   can not use FieldCache on multivalued field:
> > categoryLevels
> >   400
> > 
> > 
> >
> > I don't understand why this is happening.
> >
> > Do you know any way to work around this problem?
> >
> > Thanks in advance,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> >
>



-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Amrit Sarkar
Vincenzo,

As I read the source code;  SchemaField.java

/**
 * Sanity checks that the properties of this field type are plausible
 * for a field that may be used to get a FieldCacheSource, throwing
 * an appropriate exception (including the field name) if it is not.
 * FieldType subclasses can choose to call this method in their
 * getValueSource implementation
 * @see FieldType#getValueSource
 */
public void checkFieldCacheSource() throws SolrException {
  if ( multiValued() ) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "can not use FieldCache on multivalued field: " + getName());
  }
  if (! hasDocValues() ) {
    if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "can not use FieldCache on a field w/o docValues unless it is indexed and supports Uninversion: "
          + getName());
    }
  }
}

It seems like the FieldCache is not allowed to un-invert values for
multi-valued fields.

I can only suspect the reason: multiple values would eat up more memory? Not
sure; someone else can weigh in.
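
A rough sketch of the single-slot assumption behind un-inversion (plain
Python, not Solr code): the FieldCache walks the inverted index and fills
exactly one value slot per document, which cannot work if a document
carries several values.

```python
# Un-inversion sketch: turn an inverted index (term -> doc ids) back
# into a forward array with one value slot per document. A multivalued
# field would have to overwrite a slot, losing data -- which mirrors
# why checkFieldCacheSource() rejects multivalued fields outright.

def uninvert(inverted_index, num_docs):
    values = [None] * num_docs          # one slot per document
    for term, doc_ids in inverted_index.items():
        for doc in doc_ids:
            if values[doc] is not None:
                # Second value for the same doc: single-slot model breaks.
                raise ValueError(f"doc {doc} has multiple values")
            values[doc] = term
    return values

single = {"led": [0, 2], "smart": [1]}   # one value per doc: fine
print(uninvert(single, 3))               # ['led', 'smart', 'led']

multi = {"led": [0], "smart": [0]}       # doc 0 carries two values
# uninvert(multi, 1) raises ValueError, analogous to Solr's BAD_REQUEST
```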



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore 
wrote:

> Hi,
>
> while trying to run a group query on a multivalue field I received this
> error:
>
> can not use FieldCache on multivalued field:
>
> 
> 
>
> 
> <lst name="responseHeader">
>   <bool name="zkConnected">true</bool>
>   <int name="status">400</int>
>   <int name="QTime">4</int>
> </lst>
> <lst name="error">
>   <arr name="metadata">
>     <str name="error-class">org.apache.solr.common.SolrException</str>
>     <str name="root-error-class">org.apache.solr.common.SolrException</str>
>   </arr>
>   <str name="msg">can not use FieldCache on multivalued field: categoryLevels</str>
>   <int name="code">400</int>
> </lst>
> 
> 
>
> I don't understand why this is happening.
>
> Do you know any way to work around this problem?
>
> Thanks in advance,
> Vincenzo
>
> --
> Vincenzo D'Amore
>


Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi,

while trying to run a group query on a multivalue field I received this
error:

can not use FieldCache on multivalued field:





<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">400</int>
  <int name="QTime">4</int>
</lst>
<lst name="error">
  <arr name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">org.apache.solr.common.SolrException</str>
  </arr>
  <str name="msg">can not use FieldCache on multivalued field: categoryLevels</str>
  <int name="code">400</int>
</lst>



I don't understand why this is happening.

Do you know any way to work around this problem?

Thanks in advance,
Vincenzo

-- 
Vincenzo D'Amore


Re: Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-30 Thread David Smiley
Frederick,

RE LatLonType: Weird. Is the dynamic field "*_coordinate" defined?  It
should be; ensure it has indexed=true on it.  I forget whether indexed needs
to be set on that or on the LatLonType field that refers to it, but to be
safe set it on both.
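
A schema.xml sketch of that setup (the field and type names below are
assumptions for illustration; only subFieldSuffix="_coordinate" comes from
the original mail):

```xml
<!-- tdouble must be a numeric type defined elsewhere in the schema -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latitudeLongitude" type="location" indexed="true" stored="false" multiValued="false"/>
```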

RE LatLonPointSpatialField: You should use this for sure, assuming you are
using the latest Solr release (6.5.x). You said "Solr version 6.1.0", which
doesn't have this field type, though.

~ David

On Thu, Apr 27, 2017 at 8:26 AM freddy79 
wrote:

> Hi,
>
> when doing a query with spatial search i get the error: can not use
> FieldCache on a field which is neither indexed nor has doc values:
> latitudeLongitude_0_coordinate
>
> *SOLR Version:* 6.1.0
> *schema.xml:*
>
>  subFieldSuffix="_coordinate" />
>  stored="false" multiValued="false"  />
>
> *Query:*
>
> http://localhost:8983/solr/career_educationVacancyLocation/select?q=*:*&fq={!geofilt}&sfield=latitudeLongitude&pt=48.15,16.23&d=10
>
> *Error Message:*
> can not use FieldCache on a field which is neither indexed nor has doc
> values: latitudeLongitude_0_coordinate
>
> What is wrong? Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spatial-Search-can-not-use-FieldCache-on-a-field-which-is-neither-indexed-nor-has-doc-values-latitude-tp4332185.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-27 Thread freddy79
It does work with "solr.LatLonPointSpatialField" instead of
"solr.LatLonType".



But why not with "solr.LatLonType"?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-Search-can-not-use-FieldCache-on-a-field-which-is-neither-indexed-nor-has-doc-values-latitude-tp4332185p4332199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-27 Thread freddy79
Hi,

when doing a query with spatial search I get the error: can not use
FieldCache on a field which is neither indexed nor has doc values:
latitudeLongitude_0_coordinate

*SOLR Version:* 6.1.0
*schema.xml:*




*Query:*
http://localhost:8983/solr/career_educationVacancyLocation/select?q=*:*&fq={!geofilt}&sfield=latitudeLongitude&pt=48.15,16.23&d=10

*Error Message:*
can not use FieldCache on a field which is neither indexed nor has doc
values: latitudeLongitude_0_coordinate

What is wrong? Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-Search-can-not-use-FieldCache-on-a-field-which-is-neither-indexed-nor-has-doc-values-latitude-tp4332185.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
OK, then I need to configure it to reduce the size of the cache.

Thanks for the help Mikhail.

--

/Yago Riveiro

On 9 Jan 2017 17:01 +, Mikhail Khludnev , wrote:
> This probably says why
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258
>
> On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro  wrote:
>
> > The documentation says that the only caches configurable are:
> >
> > - filterCache
> > - queryResultCache
> > - documentCache
> > - user defined caches
> >
> > There is no entry for fieldValueCache and in my case all of list in the
> > documentation are disable ...
> >
> > --
> >
> > /Yago Riveiro
> >
> > On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> > > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  > wrote:
> > >
> > > > Thanks for re reply Mikhail,
> > > >
> > > > Do you know if the 1 value is configurable?
> > >
> > > yes. in solrconfig.xml
> > > https://cwiki.apache.org/confluence/display/solr/Query+
> > Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> > > iirc you cant' fully disable it setting size to 0.
> > >
> > >
> > > > My insert rate is so high
> > > > (5000 docs/s) that the cache it's quite useless.
> > > >
> > > > In the case of the Lucene field cache, it's possible "clean" it in some
> > > > way?
> > > >
> > > > Even it would be possible, the first sorting query or so loads it back.
> > >
> > > > Some cache is eating my memory heap.
> > > >
> > > Probably you need to dedicate master which won't load FieldCache.
> > >
> > >
> > > >
> > > >
> > > >
> > > > -
> > > > Best regards
> > > >
> > > > /Yago
> > > > --
> > > > View this message in context: http://lucene.472066.n3.
> > > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
This probably says why
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258

On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro  wrote:

> The documentation says that the only caches configurable are:
>
> - filterCache
> - queryResultCache
> - documentCache
> - user defined caches
>
> There is no entry for fieldValueCache and in my case all of list in the
> documentation are disable ...
>
> --
>
> /Yago Riveiro
>
> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro 
> wrote:
> >
> > > Thanks for re reply Mikhail,
> > >
> > > Do you know if the 1 value is configurable?
> >
> > yes. in solrconfig.xml
> > https://cwiki.apache.org/confluence/display/solr/Query+
> Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> > iirc you cant' fully disable it setting size to 0.
> >
> >
> > > My insert rate is so high
> > > (5000 docs/s) that the cache it's quite useless.
> > >
> > > In the case of the Lucene field cache, it's possible "clean" it in some
> > > way?
> > >
> > > Even it would be possible, the first sorting query or so loads it back.
> >
> > > Some cache is eating my memory heap.
> > >
> > Probably you need to dedicate master which won't load FieldCache.
> >
> >
> > >
> > >
> > >
> > > -
> > > Best regards
> > >
> > > /Yago
> > > --
> > > View this message in context: http://lucene.472066.n3.
> > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread billnbell
Try disabling it and perf may get better.

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 6:41 AM, Yago Riveiro  wrote:
> 
> The documentation says that the only caches configurable are:
> 
> - filterCache
> - queryResultCache
> - documentCache
> - user defined caches
> 
> There is no entry for fieldValueCache and in my case all of list in the 
> documentation are disable ...
> 
> --
> 
> /Yago Riveiro
> 
>> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
>>> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:
>>> 
>>> Thanks for re reply Mikhail,
>>> 
>>> Do you know if the 1 value is configurable?
>> 
>> yes. in solrconfig.xml
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
>> iirc you cant' fully disable it setting size to 0.
>> 
>> 
>>> My insert rate is so high
>>> (5000 docs/s) that the cache it's quite useless.
>>> 
>>> In the case of the Lucene field cache, it's possible "clean" it in some
>>> way?
>>> 
>>> Even it would be possible, the first sorting query or so loads it back.
>> 
>>> Some cache is eating my memory heap.
>>> 
>> Probably you need to dedicate master which won't load FieldCache.
>> 
>> 
>>> 
>>> 
>>> 
>>> -
>>> Best regards
>>> 
>>> /Yago
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
The documentation says that the only caches configurable are:

- filterCache
- queryResultCache
- documentCache
- user defined caches

There is no entry for fieldValueCache and in my case all of list in the 
documentation are disable ...

--

/Yago Riveiro

On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:
>
> > Thanks for re reply Mikhail,
> >
> > Do you know if the 1 value is configurable?
>
> yes. in solrconfig.xml
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> iirc you cant' fully disable it setting size to 0.
>
>
> > My insert rate is so high
> > (5000 docs/s) that the cache it's quite useless.
> >
> > In the case of the Lucene field cache, it's possible "clean" it in some
> > way?
> >
> > Even it would be possible, the first sorting query or so loads it back.
>
> > Some cache is eating my memory heap.
> >
> Probably you need to dedicate master which won't load FieldCache.
>
>
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:

> Thanks for re reply Mikhail,
>
> Do you know if the 1 value is configurable?

Yes, in solrconfig.xml:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
IIRC you can't fully disable it by setting the size to 0.


> My insert rate is so high
> (5000 docs/s) that the cache it's quite useless.
>
> In the case of the Lucene field cache, it's possible "clean" it in some
> way?
>
> Even it would be possible, the first sorting query or so loads it back.

> Some cache is eating my memory heap.
>
Probably you need to dedicate master which won't load FieldCache.


>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Thanks for the reply, Mikhail.

Do you know if the 1 value is configurable? My insert rate is so high
(5000 docs/s) that the cache is quite useless.

In the case of the Lucene field cache, is it possible to "clean" it in some
way?

Some cache is eating my memory heap.



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
Hello, Yago.

"size": "1", "showItems": "-1", "initialSize": "10", "name":
"fieldValueCache"

These are Solr's UnInvertedFields, not Lucene's FieldCache.
That 1 is for all fields of the collection schema.
A collection reload or a commit drops all entries from this cache.
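
For reference, that cache is configured in the <query> section of
solrconfig.xml; a sketch with illustrative sizes (the class name and the
numbers here are assumptions, not Solr defaults):

```xml
<query>
  <!-- Backs Solr's UnInvertedFields; sizes below are illustrative only. -->
  <fieldValueCache class="solr.FastLRUCache"
                   size="512"
                   initialSize="10"
                   showItems="-1" />
</query>
```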


On Mon, Jan 9, 2017 at 1:30 PM, Yago Riveiro  wrote:

> Hi,
>
> After some reading into the documentation, supposedly the Lucene FieldCache
> is the only one that it's not possible to disable.
>
> Fetching the config for a collection through the REST API I found an entry
> like this:
>
> "query": {
> "useFilterForSortedQuery": true,
> "queryResultWindowSize": 1,
> "queryResultMaxDocsCached": 0,
> "enableLazyFieldLoading": true,
> "maxBooleanClauses": 8192,
> "": {
> "size": "1",
> "showItems": "-1",
> "initialSize": "10",
> "name": "fieldValueCache"
> }
> },
>
> My questions:
>
> > - Is that size, 1, for all fields of the collection schema, or is it 1
> > for each field defined?
> > - If I reload the collection, are the caches wiped?
>
> Regards,
>
> /Yago
>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Question-about-Lucene-FieldCache-tp4313062.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Hi,

After some reading of the documentation, the Lucene FieldCache is
supposedly the only cache that cannot be disabled.

Fetching the config for a collection through the REST API I found an entry
like this:

"query": {
"useFilterForSortedQuery": true,
"queryResultWindowSize": 1,
"queryResultMaxDocsCached": 0,
"enableLazyFieldLoading": true,
"maxBooleanClauses": 8192,
"": {
"size": "1",
"showItems": "-1",
"initialSize": "10",
"name": "fieldValueCache"
}
},

My questions:

- Is that size, 1, for all fields of the collection schema, or is it 1 for
each field defined?
- If I reload the collection, are the caches wiped?

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Phillip Peleshok
Funny you say that, as that's exactly what happened.  Tried them a couple
weeks ago and nothing.  Going at them again and will see what happens.

Yeah, we're in the same boat.  We started with the profilers (Yourkit) to
track down the causes.  Mainly got hit in the field cache and ordinal maps
(and all the objects just to build them).  Since we transitioned from
classic facets to json facets, unfortunately SOLR-8922 doesn't lend much
but it looks really good.  We were looking at cutting out the ordinal cache
depending on the cardinality but that's still a PoC at this point, but does
allow us to cap the memory usage.  Then given the (
http://stackoverflow.com/questions/214362/java-very-large-heap-sizes) we
stumbled across the off-heap and were giving that a go to see if it's worth
the avenue.  But after reading the UnSafe, started getting cold feet and
that's why I was trying to dig up a little more history.

Was actually thinking about the isolation of JVM per shard too.  Going
through the whiteboarding, decided against since it didn't lend itself to
our scenarios, but would be interested in how it turns out for you.

Thanks!
Phil

On Fri, Jun 3, 2016 at 8:33 AM, Jeff Wartes  wrote:

>
> For what it’s worth, I’d suggest you go into a conversation with Azul with
> a more explicit “I’m looking to buy” approach. I reached out to them with a
> more “I’m exploring my options” attitude, and never even got a trial. I get
> the impression their business model involves a fairly expensive (to them)
> trial process, so they’re looking for more urgency on the part of the
> client than I was expressing.
>
> Instead, I spent a few weeks analyzing how my specific index allocated
> memory. This turned out to be quite worthwhile. Armed with that
> information, I was able to file a few patches (coming in 6.1, perhaps?)
> that reduced allocations by a pretty decent amount on large indexes.
> (SOLR-8922, particularly) It also straight-up ruled out certain things Solr
> supports, because the allocations were just too heavy. (SOLR-9125)
>
> I suppose the next thing I’m considering is using multiple JVMs per host,
> essentially one per shard. This wouldn’t change the allocation rate, but
> does serve to reduce the worst-case GC pause, since each JVM can have a
> smaller heap. I’d be trading a little p50 latency for some p90 latency
> reduction, I’d expect. Of course, that adds a bunch of headache to managing
> replica locations too.
>
>
> On 6/2/16, 6:30 PM, "Phillip Peleshok"  wrote:
>
> >Fantastic! I'm sorry I couldn't find that JIRA before and for getting you
> >to track it down.
> >
> >Yup, I noticed that for the docvalues with the ordinal map and I'm
> >definitely leveraging all that but I'm hitting the terms limit now and
> that
> >ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
> >my readings using theUnsafe seemed a little sketchy (
> >http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
> >I'm glad that seemed to be the point of contention bringing it in and not
> >anything else.
> >
> >Thank you very much for the info,
> >Phil
> >
> >On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
> >wrote:
> >
> >> Basically it never reached consensus, see the discussion at:
> >> https://issues.apache.org/jira/browse/SOLR-6638
> >>
> >> If you can afford it I've seen people with very good results
> >> using Zing/Azul, but that can be expensive.
> >>
> >> DocValues can help for fields you facet and sort on,
> >> those essentially move memory into the OS
> >> cache.
> >>
> >> But memory is an ongoing struggle I'm afraid.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
> >> wrote:
> >> > Hey everyone,
> >> >
> >> > I've been using Solr for some time now and running into GC issues as
> most
> >> > others have.  Now I've exhausted all the traditional GC settings
> >> > recommended by various individuals (ie Shawn Heisey, etc) but none
> >> > proved sufficient.  The one solution that I've seen that proved
> useful is
> >> > Heliosearch and the off-heap implementation.
> >> >
> >> > My question is this, why wasn't the off-heap FieldCache
> implementation (
> >> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever
> rolled
> >> into
> >> > Solr when the other HelioSearch improvements were merged? Was there a
> >> > fundamental design problem or just a matter of time/testing that
> would be
> >> > incurred by the move?
> >> >
> >> > Thanks,
> >> > Phil
> >>
>
>


Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Phillip Peleshok
Thank you for the info on this.  Yeah, I should've raised this in the dev
lists; sorry about that.  Funny you mention that since I was trending in
that direction as well.  Then saw the off-heap stuff and thought it might
have had an easy way out.  I'd like to focus on the re-use scheme to be
honest.  Already looking at that approach for the ordinal maps.

Thanks again,
Phil

On Fri, Jun 3, 2016 at 4:33 AM, Toke Eskildsen 
wrote:

> On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> > But memory is an ongoing struggle I'm afraid.
>
> With fear of going too far into devel-territory...
>
>
> There are several places in Solr where memory usage is far from optimal
> with high-cardinality data and where improvements can be made without
> better GC or off-heap.
>
> Some places it is due to "clean object oriented" programming, for
> example with priority queues filled with objects, which gets very GC
> expensive for 100K+ entries. Some of this can be remedied by less clean
> coding and bit-hacking, but often results in less-manageable code.
>
> https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/
>
>
> Other places it is large arrays that are hard to avoid, for example with
> docID-bitmaps and counter-arrays for String faceting. These put quite a
> strain on GC as they are being allocated and released all the time.
> Unless the index is constantly updated, DocValues does not help much
> with GC as the counters are the same, DocValues or not.
>
> The layout of these structures is well-defined: As long as the Searcher
> has not been re-opened, each new instance of an array is of the exact
> same size as the previous one. When the searcher is re-opened, all the
> sizes change. Putting those structures off-heap is one solution,
> another is to re-use the structures.
>
> Our experiments with re-using faceting counter structures have been very
> promising (far less GC, lower response times). I would think that the
> same would be true for a similar docID-bitmap re-use scheme.
>
>
> So yes, very much an on-going struggle, but one where there are multiple
> known remedies. Not necessarily easy to implement though.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Jeff Wartes

For what it’s worth, I’d suggest you go into a conversation with Azul with a 
more explicit “I’m looking to buy” approach. I reached out to them with a more 
“I’m exploring my options” attitude, and never even got a trial. I get the 
impression their business model involves a fairly expensive (to them) trial 
process, so they’re looking for more urgency on the part of the client than I 
was expressing.

Instead, I spent a few weeks analyzing how my specific index allocated memory. 
This turned out to be quite worthwhile. Armed with that information, I was able 
to file a few patches (coming in 6.1, perhaps?) that reduced allocations by a 
pretty decent amount on large indexes. (SOLR-8922, particularly) It also 
straight-up ruled out certain things Solr supports, because the allocations 
were just too heavy. (SOLR-9125)

I suppose the next thing I’m considering is using multiple JVMs per host, 
essentially one per shard. This wouldn’t change the allocation rate, but does 
serve to reduce the worst-case GC pause, since each JVM can have a smaller 
heap. I’d be trading a little p50 latency for some p90 latency reduction, I’d 
expect. Of course, that adds a bunch of headache to managing replica locations 
too.


On 6/2/16, 6:30 PM, "Phillip Peleshok"  wrote:

>Fantastic! I'm sorry I couldn't find that JIRA before and for getting you
>to track it down.
>
>Yup, I noticed that for the docvalues with the ordinal map and I'm
>definitely leveraging all that but I'm hitting the terms limit now and that
>ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
>my readings using theUnsafe seemed a little sketchy (
>http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
>I'm glad that seemed to be the point of contention bringing it in and not
>anything else.
>
>Thank you very much for the info,
>Phil
>
>On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
>wrote:
>
>> Basically it never reached consensus, see the discussion at:
>> https://issues.apache.org/jira/browse/SOLR-6638
>>
>> If you can afford it I've seen people with very good results
>> using Zing/Azul, but that can be expensive.
>>
>> DocValues can help for fields you facet and sort on,
>> those essentially move memory into the OS
>> cache.
>>
>> But memory is an ongoing struggle I'm afraid.
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
>> wrote:
>> > Hey everyone,
>> >
>> > I've been using Solr for some time now and running into GC issues as most
>> > others have.  Now I've exhausted all the traditional GC settings
>> > recommended by various individuals (ie Shawn Heisey, etc) but none
>> > proved sufficient.  The one solution that I've seen that proved useful is
>> > Heliosearch and the off-heap implementation.
>> >
>> > My question is this, why wasn't the off-heap FieldCache implementation (
>> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled
>> into
>> > Solr when the other HelioSearch improvements were merged? Was there a
>> > fundamental design problem or just a matter of time/testing that would be
>> > incurred by the move?
>> >
>> > Thanks,
>> > Phil
>>



Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Toke Eskildsen
On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> But memory is an ongoing struggle I'm afraid.

With fear of going too far into devel-territory...


There are several places in Solr where memory usage is far from optimal
with high-cardinality data and where improvements can be made without
better GC or off-heap.

Some places it is due to "clean object oriented" programming, for
example with priority queues filled with objects, which gets very GC
expensive for 100K+ entries. Some of this can be remedied by less clean
coding and bit-hacking, but often results in less-manageable code.

https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/


Other places it is large arrays that are hard to avoid, for example with
docID-bitmaps and counter-arrays for String faceting. These put quite a
strain on GC as they are being allocated and released all the time.
Unless the index is constantly updated, DocValues does not help much
with GC as the counters are the same, DocValues or not.

The layout of these structures is well-defined: As long as the Searcher
has not been re-opened, each new instance of an array is of the exact
same size as the previous one. When the searcher is re-opened, all the
sizes change. Putting those structures off-heap is one solution,
another is to re-use the structures.

Our experiments with re-using faceting counter structures have been very
promising (far less GC, lower response times). I would think that the
same would be true for a similar docID-bitmap re-use scheme.
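The re-use scheme described above can be sketched as a toy model: instead of allocating a fresh counter array per request (and handing the old one to the GC), zero it out and hand it back to a pool, which stays valid as long as the searcher (and thus the ordinal count) is unchanged. The `CounterPool` name and API are invented for illustration; this is not Solr's implementation.

```python
# Toy model of re-using faceting counter arrays between requests.
class CounterPool:
    def __init__(self, num_ordinals):
        self.num_ordinals = num_ordinals   # fixed while searcher is open
        self.free = []                     # released arrays ready for re-use

    def acquire(self):
        # re-use a parked array if one exists, otherwise allocate
        return self.free.pop() if self.free else [0] * self.num_ordinals

    def release(self, counters):
        for i in range(len(counters)):     # zero in place, no reallocation
            counters[i] = 0
        self.free.append(counters)

pool = CounterPool(num_ordinals=5)
c = pool.acquire()
for ordinal in [0, 3, 3]:                  # count facet hits by ordinal
    c[ordinal] += 1
assert c == [1, 0, 0, 2, 0]
pool.release(c)                            # zeroed and parked for re-use
assert pool.acquire() is c                 # same array comes back
```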


So yes, very much an on-going struggle, but one where there are multiple
known remedies. Not necessarily easy to implement though.

- Toke Eskildsen, State and University Library, Denmark




Re: Solr off-heap FieldCache & HelioSearch

2016-06-02 Thread Phillip Peleshok
Fantastic! I'm sorry I couldn't find that JIRA before and for getting you
to track it down.

Yup, I noticed that for the docvalues with the ordinal map and I'm
definitely leveraging all that but I'm hitting the terms limit now and that
ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
my readings using theUnsafe seemed a little sketchy (
http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
I'm glad that seemed to be the point of contention bringing it in and not
anything else.

Thank you very much for the info,
Phil

On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
wrote:

> Basically it never reached consensus, see the discussion at:
> https://issues.apache.org/jira/browse/SOLR-6638
>
> If you can afford it I've seen people with very good results
> using Zing/Azul, but that can be expensive.
>
> DocValues can help for fields you facet and sort on,
> those essentially move memory into the OS
> cache.
>
> But memory is an ongoing struggle I'm afraid.
>
> Best,
> Erick
>
> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
> wrote:
> > Hey everyone,
> >
> > I've been using Solr for some time now and running into GC issues as most
> > others have.  Now I've exhausted all the traditional GC settings
> > recommended by various individuals (ie Shawn Heisey, etc) but none
> > proved sufficient.  The one solution that I've seen that proved useful is
> > Heliosearch and the off-heap implementation.
> >
> > My question is this, why wasn't the off-heap FieldCache implementation (
> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled
> into
> > Solr when the other HelioSearch improvements were merged? Was there a
> > fundamental design problem or just a matter of time/testing that would be
> > incurred by the move?
> >
> > Thanks,
> > Phil
>


Re: Solr off-heap FieldCache & HelioSearch

2016-06-02 Thread Erick Erickson
Basically it never reached consensus, see the discussion at:
https://issues.apache.org/jira/browse/SOLR-6638

If you can afford it I've seen people with very good results
using Zing/Azul, but that can be expensive.

DocValues can help for fields you facet and sort on,
those essentially move memory into the OS
cache.

But memory is an ongoing struggle I'm afraid.

Best,
Erick

On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok  wrote:
> Hey everyone,
>
> I've been using Solr for some time now and running into GC issues as most
> others have.  Now I've exhausted all the traditional GC settings
> recommended by various individuals (ie Shawn Heisey, etc) but none
> proved sufficient.  The one solution that I've seen that proved useful is
> Heliosearch and the off-heap implementation.
>
> My question is this, why wasn't the off-heap FieldCache implementation (
> http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled into
> Solr when the other HelioSearch improvements were merged? Was there a
> fundamental design problem or just a matter of time/testing that would be
> incurred by the move?
>
> Thanks,
> Phil


Solr off-heap FieldCache & HelioSearch

2016-06-01 Thread Phillip Peleshok
Hey everyone,

I've been using Solr for some time now and running into GC issues as most
others have.  Now I've exhausted all the traditional GC settings
recommended by various individuals (ie Shawn Heisey, etc) but none
proved sufficient.  The one solution that I've seen that proved useful is
Heliosearch and the off-heap implementation.

My question is this, why wasn't the off-heap FieldCache implementation (
http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled into
Solr when the other HelioSearch improvements were merged? Was there a
fundamental design problem or just a matter of time/testing that would be
incurred by the move?

Thanks,
Phil


Are fieldCache and/or DocValues used by Function Queries

2016-02-11 Thread Andrea Roggerone
Hi,
I need to evaluate different boost solutions performance and I can't find
any relevant documentation about it. Are fieldCache and/or DocValues used
by Function Queries?


Does bf for eDismax use DocValue or FieldCache?

2016-02-10 Thread Andrea Roggerone
Hi,
I need to boost documents at runtime according to a set of roles and
related ids. For instance I would have the fields:
ceo:1234-abcd-5678-poiu
tl:-abcd-5678-abc

and a set of boosts to apply at runtime, for instance
ceo = 10
tl = 5

I don't want to do any complex operation with the weights, and I am happy to
boost by the value of the most relevant role, which in the previous case
would be ceo.
Since I use eDismax parser, the syntax I'd like to use would be:

bf=if(termfreq(ceo,"85a09bd5-2ff2-464c-9bc5-33a38a7f1234"),3,if(termfreq(tl,"85a09bd5-2ff2-464c-9bc5-33a38a7123456"),2,1))


however I am worried about performance.
My questions are:
- In the bf parameter are FieldCache and DocValues used?
- Is termfreq calculated all the time, or do we simply read the existing value?
- Increasing the number of clauses (for instance adding more nested ifs),
what kind of impact would that have on my performance?
- My alternative would be to use the payload. Is that a better option and
why?
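For reference, the nested-if boost above can be modelled as a toy function: walk the roles in order of relevance and return the boost of the first role whose field contains the given id, else a neutral default. This is an illustrative Python sketch, not Solr's function-query implementation, and the role ids and boost values are shortened/made up.

```python
# Toy model of if(termfreq(ceo,...),3,if(termfreq(tl,...),2,1)).
def role_boost(doc, rules, default=1):
    """rules: (field, value, boost) triples, most relevant role first."""
    for field, value, boost in rules:
        if value in doc.get(field, []):
            return boost               # first matching role wins
    return default                     # no role matched: neutral boost

rules = [("ceo", "1234-abcd", 3), ("tl", "5678-efgh", 2)]
assert role_boost({"ceo": ["1234-abcd"]}, rules) == 3
assert role_boost({"tl": ["5678-efgh"]}, rules) == 2
assert role_boost({}, rules) == 1
```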

Thanks!!


Re: FieldCache

2016-01-20 Thread Yonik Seeley
On Thu, Jan 14, 2016 at 2:43 PM, Lewin Joy (TMS)  wrote:
> Thanks for the reply.
> But, the grouping on a multivalued field is working for me even with multiple
> values in the field.
> I also tested this on the tutorial collection from the later Solr version
> 5.3.1, which works as well.

Older versions of Solr would happily populate a FieldCache entry with
a multi-valued field by overwriting old values with new values while
uninverting.  Thus the FieldCache entry (used for sorting, faceting,
grouping, function queries, etc) would contain just the last/highest
value for any document.
So that sort-of explains how it was working in the past I think...
probably not how you intended.
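The overwrite behaviour described above can be sketched as a toy model (illustrative Python, not Solr's actual uninverting code): terms are enumerated in order, and each term seen for a document overwrites the previous one, so only the last value survives.

```python
# Toy model: uninverting a multi-valued field into a single-value-per-doc
# cache entry silently keeps only the last term enumerated per document.
def uninvert(postings):
    """postings: {term: [doc_ids]} -> {doc_id: last surviving value}"""
    cache = {}
    for term in sorted(postings):      # term enumeration is ordered
        for doc in postings[term]:
            cache[doc] = term          # overwrites any earlier value
    return cache

# doc 1 has values {hiking, music}; doc 2 has {hiking, zoology}
postings = {"hiking": [1, 2], "music": [1], "zoology": [2]}
assert uninvert(postings) == {1: "music", 2: "zoology"}
```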

If it works sometimes, but not other times, it may be due to details
of the request that cause one code path to be executed vs another, and
you hit a path where the check is done vs not.  The check inspects the
schema only.

For example, in StrField.java:

  @Override
  public ValueSource getValueSource(SchemaField field, QParser parser) {
field.checkFieldCacheSource(parser);
return new StrFieldSource(field.getName());
  }

There are different implementations of grouping... and only some go
through a ValueSource I believe... and those are the only ones that
would check to see if the field was single valued.  The grouping code
started in Solr, but was refactored and moved to Lucene, and I'm no
longer that familiar with it.

-Yonik


RE: FieldCache

2016-01-14 Thread Lewin Joy (TMS)
Hi Toke,

Thanks for the reply. 
But, the grouping on a multivalued field is working for me even with multiple
values in the field.
I also tested this on the tutorial collection from the later Solr version
5.3.1, which works as well.
Maybe the wiki needs to be updated?

-Lewin

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Thursday, January 14, 2016 12:31 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldCache

On Thu, 2016-01-14 at 00:18 +, Lewin Joy (TMS) wrote:
> I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to 
> group results on a multivalued field, let's say "interests".
...
> But after I just re-indexed the data, it started working.

Grouping is not supposed to be supported for multi-valued fields:
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

I wonder if there might be an edge case where the field is marked as
multiValued in schema.xml, but only contains single values?

- Toke Eskildsen, State and University Library, Denmark




Re: FieldCache

2016-01-14 Thread Toke Eskildsen
On Thu, 2016-01-14 at 00:18 +, Lewin Joy (TMS) wrote:
> I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to
> group results on a multivalued field, let's say "interests".
...
> But after I just re-indexed the data, it started working.

Grouping is not supposed to be supported for multi-valued fields:
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

I wonder if there might be an edge case where the field is marked as
multiValued in schema.xml, but only contains single values?

- Toke Eskildsen, State and University Library, Denmark




FieldCache

2016-01-13 Thread Lewin Joy (TMS)
Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin



Error: FieldCache on multivalued field

2016-01-13 Thread Lewin Joy (TMS)
*updated subject line

Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin



Re: FieldCache?

2015-11-04 Thread Chris Hostetter

: What is the implication of this? Should we move all facets to DocValues
: when we have high cardinality (lots of values) ? Are we adding it back?

1) Using DocValues is almost certainly a good idea moving forward for 
situations where the FieldCache was used in the past.

: FieldCache is gone (moved to a dedicated UninvertingReader in the misc module).

2) Solr implicitly uses the UninvertingReader under the covers on your 
behalf automatically in cases where FieldCache is not available.



-Hoss
http://www.lucidworks.com/


Re: FieldCache?

2015-10-06 Thread Alessandro Benedetti
For completeness this is the related issue :

https://issues.apache.org/jira/browse/SOLR-8096

Cheers

2015-10-06 11:21 GMT+01:00 Alessandro Benedetti 
:

> We should add some precision here.
> When dealing with faceting, there are currently 2 main approaches:
>
> 1) *Enum Algorithm* - best for low cardinality value fields, it is based
> on retrieving the term enum for all the terms in the index, and then
> intersecting the related posting list with the query result set
>
> 2) *Un-Inverting Algorithms* - Best for high cardinality value fields, it
> is based on uninverting the index, checking the value for each query result
> document field, and counting the occurrences
>
> Within the 2nd approach :
>
> 2a) *Doc Values* - better for dynamic indexes; built at indexing
> time and stored on disk (setting the specific attribute for the field),
> OR calculated at runtime thanks to the UninvertingReader, which will
> uninvert to an in-memory structure that looks like DocValues
>
> 2b) *Uninverted field* - for indexes that change less frequently. After
> the removal, only per-segment field caches are available, which means you
> should be able to use it with the fcs method.
>
> This should be the current situation,
> I will take a look into details and let you know if I understood something
> wrong.
>
> Cheers
>
>
>
> 2015-10-06 5:03 GMT+01:00 William Bell :
>
>> So the FieldCache was removed from Solr 5.
>>
>> What is the implication of this? Should we move all facets to DocValues
>> when we have high cardinality (lots of values) ? Are we adding it back?
>>
>> Other ideas to improve performance?
>>
>> From Mike M:
>>
>> FieldCache is gone (moved to a dedicated UninvertingReader in the
>> misc module).
>> This means when you intend to sort on a field, you should index that field
>> using doc values, which is much faster and less heap consuming than
>> FieldCache.
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: FieldCache?

2015-10-06 Thread Alessandro Benedetti
We should add some precision here.
When dealing with faceting, there are currently 2 main approaches:

1) *Enum Algorithm* - best for low cardinality value fields, it is based on
retrieving the term enum for all the terms in the index, and then
intersecting the related posting list with the query result set

2) *Un-Inverting Algorithms* - Best for high cardinality value fields, it
is based on uninverting the index, checking the value for each query result
document field, and counting the occurrences

Within the 2nd approach :

2a) *Doc Values* - better for dynamic indexes; built at indexing
time and stored on disk (setting the specific attribute for the field),
OR calculated at runtime thanks to the UninvertingReader, which will
uninvert to an in-memory structure that looks like DocValues

2b) *Uninverted field* - for indexes that change less frequently. After
the removal, only per-segment field caches are available, which means you
should be able to use it with the fcs method.
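The two counting strategies can be contrasted with a toy model (illustrative Python, not Solr's faceting code): the enum approach intersects each term's posting list with the query result set, while the un-inverting approach walks the result set and counts each document's values. Both yield the same counts; they differ in cost depending on field cardinality.

```python
# Toy contrast of the two faceting approaches described above.
def enum_facet(postings, result_docs):
    """Approach 1: intersect each term's posting list with the result set."""
    result = set(result_docs)
    counts = {}
    for term, docs in postings.items():
        n = len(result & set(docs))
        if n:
            counts[term] = n
    return counts

def uninverted_facet(doc_values, result_docs):
    """Approach 2: walk the result set and count each document's values."""
    counts = {}
    for doc in result_docs:
        for value in doc_values.get(doc, []):
            counts[value] = counts.get(value, 0) + 1
    return counts

postings = {"red": [1, 2, 5], "blue": [2, 3], "green": [7]}
doc_values = {1: ["red"], 2: ["red", "blue"], 3: ["blue"],
              5: ["red"], 7: ["green"]}
assert enum_facet(postings, [1, 2, 3]) == uninverted_facet(doc_values, [1, 2, 3])
```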

This should be the current situation,
I will take a look into details and let you know if I understood something
wrong.

Cheers



2015-10-06 5:03 GMT+01:00 William Bell :

> So the FieldCache was removed from Solr 5.
>
> What is the implication of this? Should we move all facets to DocValues
> when we have high cardinality (lots of values) ? Are we adding it back?
>
> Other ideas to improve performance?
>
> From Mike M:
>
> FieldCache is gone (moved to a dedicated UninvertingReader in the
> misc module).
> This means when you intend to sort on a field, you should index that field
> using doc values, which is much faster and less heap consuming than
> FieldCache.
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


FieldCache?

2015-10-05 Thread William Bell
So the FieldCache was removed from Solr 5.

What is the implication of this? Should we move all facets to DocValues
when we have high cardinality (lots of values) ? Are we adding it back?

Other ideas to improve performance?

From Mike M:

FieldCache is gone (moved to a dedicated UninvertingReader in the misc module).
This means when you intend to sort on a field, you should index that field
using doc values, which is much faster and less heap consuming than
FieldCache.

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: FieldCache error for multivalued fields in json facets.

2015-09-21 Thread Vishnu Mishra
Hi, I am using Solr 5.3 and I have the same problem while doing a JSON facet
on a multivalued field. Below is the error stack trace:




2015-09-21 21:26:09,292 ERROR org.apache.solr.core.SolrCore  ?
org.apache.solr.common.SolrException: can not use FieldCache on multivalued
field: FLAG
at
org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:187)
at
org.apache.solr.schema.TrieField.getValueSource(TrieField.java:231)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:378)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:235)
at
org.apache.solr.search.ValueSourceParser$79.parse(ValueSourceParser.java:845)
at
org.apache.solr.search.FunctionQParser.parseAgg(FunctionQParser.java:414)
at
org.apache.solr.search.facet.FacetParser.parseStringStat(FacetRequest.java:272)
at
org.apache.solr.search.facet.FacetParser.parseStringFacetOrStat(FacetRequest.java:265)
at
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:199)
at
org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:179)
at
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:427)
at
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:416)
at
org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:617)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCache-error-for-multivalued-fields-in-json-facets-tp4216995p4230304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCache error for multivalued fields in json facets.

2015-07-14 Thread Iana Bondarska
Yonik, Upayavira,
thanks for the response. Here is the stack trace from the Solr logs.
I can make my field single-valued, but are there any plans to fix this, or
should multivalued fields in general not be used for metric calculation?
What about other metrics, e.g. avg, min, max -- should I be able to
calculate them on multivalued fields?

org.apache.solr.common.SolrException: can not use FieldCache on
multivalued field: sales
at 
org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:187)
at org.apache.solr.schema.TrieField.getValueSource(TrieField.java:236)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:378)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:235)
at 
org.apache.solr.search.ValueSourceParser$79.parse(ValueSourceParser.java:832)
at 
org.apache.solr.search.FunctionQParser.parseAgg(FunctionQParser.java:414)
at 
org.apache.solr.search.facet.FacetParser.parseStringStat(FacetRequest.java:522)
at 
org.apache.solr.search.facet.FacetParser.parseStringFacetOrStat(FacetRequest.java:515)
at 
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:449)
at 
org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:429)
at 
org.apache.solr.search.facet.FacetFieldParser.parse(FacetRequest.java:728)
at 
org.apache.solr.search.facet.FacetParser.parseFieldFacet(FacetRequest.java:500)
at 
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:486)
at 
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:479)
at 
org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:429)
at 
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:646)
at 
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:635)
at 
org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:229)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


2015-07-13 19:24 GMT+03:00 Yonik Seeley :

> On Mon, Jul 13, 2015 at 1:55 AM, Iana Bondarska 
> wrote:
> > Hi,
> > I'm using json query api for solr 5.2. When query for metrics for
> > multivalued fields, I get error:
> > can not use FieldCache on multivalued field: sales.
> >
> > I've found in solr wiki that to avoid using fieldcache I should set
> > facet.method parameter to "enum".
> > Now my question is how can I add facet.enum parameter to query?
> > My original query looks like this:

Re: FieldCache error for multivalued fields in json facets.

2015-07-13 Thread Yonik Seeley
On Mon, Jul 13, 2015 at 1:55 AM, Iana Bondarska  wrote:
> Hi,
> I'm using json query api for solr 5.2. When query for metrics for
> multivalued fields, I get error:
> can not use FieldCache on multivalued field: sales.
>
> I've found in solr wiki that to avoid using fieldcache I should set
> facet.method parameter to "enum".
> Now my question is how can I add facet.enum parameter to query?
> My original query looks like this:
> {"limit":0,"offset":0,"facet":{"facet":{"facet":{"mechanicnumbers_sum":"sum(sales)"},"limit":0,"field":"brand","type":"terms"}}}

sum(field) is currently only implemented for single-valued numeric fields.
Can you make the sales field single-valued, or do you actually need
multiple values per document?

-Yonik


Re: FieldCache error for multivalued fields in json facets.

2015-07-13 Thread Upayavira


On Mon, Jul 13, 2015, at 06:55 AM, Iana Bondarska wrote:
> Hi,
> I'm using json query api for solr 5.2. When query for metrics for
> multivalued fields, I get error:
> can not use FieldCache on multivalued field: sales.
> 
> I've found in solr wiki that to avoid using fieldcache I should set
> facet.method parameter to "enum".
> Now my question is how can I add facet.enum parameter to query?
> My original query looks like this:
> {"limit":0,"offset":0,"facet":{"facet":{"facet":{"mechanicnumbers_sum":"sum(sales)"},"limit":0,"field":"brand","type":"terms"}}}
> 
> Adding method:enum inside facet doesn't help. Adding facet.method=enum
> outside json parameter also doesn't help.

Can you provide the whole exception, including stack trace? This looks
like a bug to me, as it should switch to using the FieldValueCache for
multivalued fields rather than fail to use the FieldCache.

Upayavira


FieldCache error for multivalued fields in json facets.

2015-07-12 Thread Iana Bondarska
Hi,
I'm using json query api for solr 5.2. When query for metrics for
multivalued fields, I get error:
can not use FieldCache on multivalued field: sales.

I've found in solr wiki that to avoid using fieldcache I should set
facet.method parameter to "enum".
Now my question is how can I add facet.enum parameter to query?
My original query looks like this:
{"limit":0,"offset":0,"facet":{"facet":{"facet":{"mechanicnumbers_sum":"sum(sales)"},"limit":0,"field":"brand","type":"terms"}}}

Adding method:enum inside facet doesn't help. Adding facet.method=enum
outside json parameter also doesn't help.

Best Regards,
Iana
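For reference, in the JSON Facet API per-facet options such as `method` are placed inside the facet object itself -- there is no top-level `facet.method` parameter for JSON facets. A sketch of what that would look like for the query above (the facet/field names are taken from the message; whether Solr 5.2 honors `method` for this facet type is an assumption -- later versions document it for terms facets). Note too that, per Yonik's reply in this thread, `sum()` only works on single-valued numeric fields, so the method hint alone may not make the aggregation succeed:

```python
import json

# Sketch (not a verified fix): per-facet options like "method" go inside
# the facet object itself in the JSON Facet API.
facet_request = {
    "query": "*:*",
    "limit": 0,
    "facet": {
        "brands": {
            "type": "terms",
            "field": "brand",
            "limit": 0,
            "method": "enum",  # per-facet option, not facet.method=enum
            "facet": {"sales_sum": "sum(sales)"},
        }
    },
}

# This JSON body would be POSTed to the collection's /query endpoint.
body = json.dumps(facet_request)
print(body)
```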


What contribute to a Solr core's FieldCache entry_count?

2015-06-16 Thread forest_soup
For the fieldCache, what determines the entries_count? 

Does each search request containing a sort on a non-docValues field
contribute one entry to the entries_count?

For example, will search A ( q=owner:1&sort=maildate asc ) and search B (
q=owner:2&sort=maildate asc ) contribute 2 field cache entries?

I have a collection containing only one core with only one doc in it, so
why are there so many Lucene FieldCache entries?

<http://lucene.472066.n3.nabble.com/file/n4212148/%244FA9F550C60D3BA2.jpg> 
<http://lucene.472066.n3.nabble.com/file/n4212148/Untitled.png> 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-contribute-to-a-Solr-core-s-FieldCache-entry-count-tp4212148.html


Re: Compression vs FieldCache for doc ids retrieval

2014-06-01 Thread jim ferenczi
@William Firstly because I was sure that the ticket (or an equivalent) was
already opened but I just could not find it. Thanks @Manuel. Secondly
because I wanted to start the discussion, I have the feeling that the
compression of the documents, activated by default, can be a killer for
some applications (if the number of shards is big or if you have a lot of
deep paging queries) and I wanted to check if someone noticed the problem
in a benchmark. Let's say that you have 10 shards and you want to return 10
documents per request, in the first stage of the search each shard would
need to decompress 10 blocks of 16k each whereas the second stage would
need to decompress only 10 blocks total. This makes me believe that this
patch should be the default behaviour for any distributed search in Solr (I
mean more than 1 shard).
Maybe it's better to continue the discussion on the ticket created by
Manuel, but still, I think that it could speed up every query (not only
deep paging queries like in the patch proposed in Manuel's ticket).

Jim



2014-06-01 14:06 GMT+09:00 William Bell :

> Why not just submit a JIRA issue - and add your patch so that we can all
> benefit?
>
>
> On Fri, May 30, 2014 at 5:34 AM, Manuel Le Normand <
> manuel.lenorm...@gmail.com> wrote:
>
> > Is the issue SOLR-5478 what you were looking for?
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>
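Jim's two-stage arithmetic can be sketched as follows (numbers assumed from his example: 10 shards, 10 returned rows, one 16 KB compressed block decompressed per fetched document):

```python
# Back-of-envelope model of stored-field decompression cost in a two-stage
# distributed search (assumed numbers from the example above).
shards = 10    # number of shards queried
rows = 10      # documents to return to the client
block_kb = 16  # compressed block that must be decompressed per doc read

# Stage 1: every shard materializes its own top `rows` docs just to
# extract ids/scores -> one block decompressed per candidate doc.
stage1_blocks = shards * rows   # 100 blocks, ~1600 KB decompressed

# Stage 2: only the merged global top `rows` docs are fetched in full.
stage2_blocks = rows            # 10 blocks, ~160 KB decompressed

print(stage1_blocks, stage2_blocks, stage1_blocks // stage2_blocks)
```

So the id-only first phase does 10x the decompression work of the actual fetch phase, which is exactly the cost the field-cache/doc-values shortcut avoids.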


Re: Compression vs FieldCache for doc ids retrieval

2014-05-31 Thread William Bell
Why not just submit a JIRA issue - and add your patch so that we can all
benefit?


On Fri, May 30, 2014 at 5:34 AM, Manuel Le Normand <
manuel.lenorm...@gmail.com> wrote:

> Is the issue SOLR-5478 what you were looking for?
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Compression vs FieldCache for doc ids retrieval

2014-05-30 Thread Manuel Le Normand
Is the issue SOLR-5478 what you were looking for?


Compression vs FieldCache for doc ids retrieval

2014-05-26 Thread jim ferenczi
Dear Solr users,

we migrated our solution from Solr 4.0 to Solr 4.3 and we noticed a
degradation of the search performance. We compared the two versions and
found out that most of the time is spent in the decompression of the
retrievable fields in Solr 4.3. The block compression of the documents is a
great feature for us because it reduces the size of our index but we don’t
have enough resources (I mean cpus) to safely migrate to the new version.
In order to reduce the cost of the decompression we tried a simple patch in
the BinaryResponseWriter; during the first phase of the distributed search
the response writer gets the documents from the index reader to only
extract the doc ids of the top N results. Our patch uses the field cache to
get the doc ids during the first phase and thus replaces a full
decompression of 16k blocks (for a single document) by a simple get in an
array (the field cache or the doc values). Thanks to this patch we are now
able to handle the same number of QPS as before (with Solr 4.0). Of
course the document cache could help as well, but not as much as one
would have thought (mainly because we have a lot of deep paging queries).

I am sure that the idea we implemented is not new, but I haven't seen any
Jira about it. Should we create one? (I mean, does it have a chance of
being included in a future release of Solr, or is anybody already working
on this?)

Cheers,

Jim


Re: Solr Core Reload causing JVM Memory Leak through FieldCache/LRUCache/LFUCache

2013-11-15 Thread Umesh Prasad
The mailing list removes attachments by default, so I uploaded it to Google
Drive:

https://drive.google.com/file/d/0B-RnB4e-vaJhX280NVllMUdHYWs/edit?usp=sharing



On Fri, Nov 15, 2013 at 2:28 PM, Umesh Prasad  wrote:

> Hi All,
> We are seeing memory leaks in our Search application whenever core
> reload happens after replication.
> We are using Solr 3.6.2 and I have observed this consistently on all
> servers.
>
> The leak suspect analysis from MAT is attached with the mail.
>
> Problem Suspect 1
>
> One instance of *"org.apache.lucene.search.FieldCacheImpl"* loaded by
> *"org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30"*
> occupies *8,726,099,312 (35.49%)* bytes. The memory is accumulated in
> one instance of *"java.util.HashMap$Entry[]"* loaded by *"<system class
> loader>"*.
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> java.util.HashMap$Entry[]
> org.apache.lucene.search.FieldCacheImpl
>
> Problem Suspect 2
>
> 69 instances of *"org.apache.solr.util.ConcurrentLRUCache"*, loaded by 
> *"org.apache.catalina.loader.WebappClassLoader
> @ 0x7f7b0a5b8b30"* occupy *6,309,187,392 (25.66%)* bytes.
>
> Biggest instances:
>
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7fe74ef120 - 755,575,672 (3.07%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e74b7a068 - 728,731,344 (2.96%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7d0a6bd1b8 - 711,828,392 (2.90%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7c6c12e800 - 708,657,624 (2.88%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7fcb092058 - 568,473,352 (2.31%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7f268cb2f0 - 568,400,040 (2.31%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e31b60c58 - 544,078,600 (2.21%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e65c2b2d8 - 489,578,480 (1.99%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7d81ea8538 - 467,833,720 (1.90%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7f31996508 - 444,383,992 (1.81%) bytes.
>
>
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> org.apache.solr.util.ConcurrentLRUCache
>
> 194 instances of *"org.apache.solr.util.ConcurrentLFUCache"*, loaded by 
> *"org.apache.catalina.loader.WebappClassLoader
> @ 0x7f7b0a5b8b30"* occupy *4,583,727,104 (18.64%)* bytes.
>
> Biggest instances:
>
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7cdd4735a0 - 410,628,176 (1.67%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7c7d48e180 - 390,690,864 (1.59%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7f1edfd008 - 348,193,312 (1.42%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7f37b01990 - 340,595,920 (1.39%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7fe02d8dd8 - 274,611,632 (1.12%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7fa9dcfb20 - 253,848,232 (1.03%) bytes.
>
>
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> org.apache.solr.util.ConcurrentLFUCache
>
>
> ---
> Thanks & Regards
> Umesh Prasad
>
> SDE @ Flipkart  : The Online Megastore at your doorstep ..
>



-- 
---
Thanks & Regards
Umesh Prasad


Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen

On 9/12/13 3:28 PM, Toke Eskildsen wrote:

On Thu, 2013-09-12 at 14:48 +0200, Per Steffensen wrote:

Actually some months back I made PoC of a FieldCache that could expand
beyond the heap. Basically imagine a FieldCache with room for
"unlimited" data-arrays, that just behind the scenes goes to
memory-mapped files when there is no more room on heap.

That sounds a lot like disk-based DocValues.


He he

But that solution will also have the "running out of swap space"-problems.

Not really. Memory mapping works like the disk cache: There is no
requirement that a certain amount of physical memory needs to be
available, it just takes what it can get. If there are not a lot of
physical memory, it will require a lot of storage access, but it will
not over-allocate swap space.
That was also my impression, but during the work I experienced some 
problems around swap space. I do not remember exactly what I saw, 
or how I concluded that everything in mm-files actually has 
to fit in physical mem + swap. I might very well have been wrong in that 
conclusion

It seems that different setups vary quite a lot in this area and some
systems are prone to aggressive use of the swap file, which can severely
harm responsiveness of applications with out-swapped data.

However, this should still not result in any OOM's, as the system can
always discard some of the memory mapped data if it needs more physical
memory.

I saw no OOMs

- Toke Eskildsen, State and University Library, Denmark





Re: No or limited use of FieldCache

2013-09-12 Thread Toke Eskildsen
On Thu, 2013-09-12 at 14:48 +0200, Per Steffensen wrote:
> Actually some months back I made PoC of a FieldCache that could expand 
> beyond the heap. Basically imagine a FieldCache with room for 
> "unlimited" data-arrays, that just behind the scenes goes to 
> memory-mapped files when there is no more room on heap.

That sounds a lot like disk-based DocValues.

[...]

> But that solution will also have the "running out of swap space"-problems.

Not really. Memory mapping works like the disk cache: There is no
requirement that a certain amount of physical memory needs to be
available, it just takes what it can get. If there are not a lot of
physical memory, it will require a lot of storage access, but it will
not over-allocate swap space.


It seems that different setups vary quite a lot in this area and some
systems are prone to aggressive use of the swap file, which can severely
harm responsiveness of applications with out-swapped data.

However, this should still not result in any OOM's, as the system can
always discard some of the memory mapped data if it needs more physical
memory.

- Toke Eskildsen, State and University Library, Denmark




Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen

Yes, thanks.

Actually some months back I made PoC of a FieldCache that could expand 
beyond the heap. Basically imagine a FieldCache with room for 
"unlimited" data-arrays, that just behind the scenes goes to 
memory-mapped files when there is no more room on heap. Never finished 
it, and it might be kinda stupid because you actually just go read the 
data from lucene indices and write them to memory-mapped files in order 
to use them. It is better to just use the data in the Lucene indices 
instead. But it had some nice features. But that solution will also have 
the "running out of swap space"-problems.


Regards, Per Steffensen

On 9/12/13 12:48 PM, Erick Erickson wrote:

Per:

One thing I'll be curious about. From my reading of DocValues, it uses
little or no heap. But it _will_ use memory from the OS if I followed
Simon's slides correctly. So I wonder if you'll hit swapping issues...
Which are better than OOMs, certainly...

Thanks,
Erick
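The file-backed FieldCache PoC Per describes can be sketched in miniature with Python's `mmap` (everything below is an invented illustration -- the real PoC was Java code against Lucene's FieldCache API): a fixed-width slot per docid lives in a memory-mapped file, so the OS pages values in and out on demand instead of the whole array sitting on the (JVM) heap.

```python
import mmap
import os
import struct
import tempfile

# One fixed-width slot per document, stored in a memory-mapped file.
item = struct.Struct("<q")  # one signed 64-bit value per document
n_docs = 1_000

path = os.path.join(tempfile.mkdtemp(), "fieldcache.bin")
with open(path, "wb") as f:
    f.truncate(n_docs * item.size)  # allocate the backing file up front

backing = open(path, "r+b")
mm = mmap.mmap(backing.fileno(), n_docs * item.size)

def put(doc_id: int, value: int) -> None:
    # Write one document's field value straight into the mapped region.
    item.pack_into(mm, doc_id * item.size, value)

def get(doc_id: int) -> int:
    # Read it back without any heap-resident array.
    return item.unpack_from(mm, doc_id * item.size)[0]

put(42, 12345)
print(get(42))  # 12345
```

As the thread notes, this is close to what disk-based DocValues already give you, which is why the PoC was never finished.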




Re: No or limited use of FieldCache

2013-09-12 Thread Erick Erickson
Per:

One thing I'll be curious about. From my reading of DocValues, it uses
little or no heap. But it _will_ use memory from the OS if I followed
Simon's slides correctly. So I wonder if you'll hit swapping issues...
Which are better than OOMs, certainly...

Thanks,
Erick


On Thu, Sep 12, 2013 at 2:07 AM, Per Steffensen  wrote:

> Thanks, guys. Now I know a little more about DocValues and realize that
> they will do the job wrt FieldCache.
>
> Regards, Per Steffensen
>
>
> On 9/12/13 3:11 AM, Otis Gospodnetic wrote:
>
>> Per,  check zee Wiki, there is a page describing docvalues. We used them
>> successfully in a solr for analytics scenario.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Sep 11, 2013 9:15 AM, "Michael Sokolov" <msoko...@safaribooksonline.com>
>> wrote:
>>
>>  On 09/11/2013 08:40 AM, Per Steffensen wrote:
>>>
>>>  The reason I mention sort is that we in my project, half a year ago,
>>>> have
>>>> dealt with the FieldCache->OOM-problem when doing sort-requests. We
>>>> basically just reject sort-requests unless they hit below X documents -
>>>> in
>>>> case they do we just find them without sorting and sort them ourselves
>>>> afterwards.
>>>>
>>>> Currently our problem is, that we have to do a group/distinct (in
>>>> SQL-language) query and we have found that we can do what we want to do
>>>> using group (http://wiki.apache.org/solr/FieldCollapsing)
>>>> or facet - either will work for us. Problem is that they both use
>>>> FieldCache and we "know" that using FieldCache will lead to
>>>> OOM-exceptions
>>>> with the amount of data each of our Solr-nodes administrate. This time
>>>> we
>>>> have really no option of just "limit" usage as we did with sort.
>>>> Therefore
>>>> we need a group/distinct-functionality that works even on huge
>>>> data-amounts
>>>> (and an algorithm using FieldCache will not)
>>>>
>>>> I believe setting facet.method=enum will actually make facet not use the
>>>> FieldCache. Is that true? Is it a bad idea?
>>>>
>>>> I do not know much about DocValues, but I do not believe that you will
>>>> avoid FieldCache by using DocValues? Please elaborate, or point to
>>>> documentation where I will be able to read that I am wrong. Thanks!
>>>>
>>> There is Simon Willnauer's presentation
>>> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
>>>
>>> and this blog post
>>> http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
>>>
>>> and this one that shows some performance comparisons:
>>> http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
>>>
>>>
>>>
>>>
>>>
>


Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
Thanks, guys. Now I know a little more about DocValues and realize that 
they will do the job wrt FieldCache.


Regards, Per Steffensen

On 9/12/13 3:11 AM, Otis Gospodnetic wrote:

Per,  check zee Wiki, there is a page describing docvalues. We used them
successfully in a solr for analytics scenario.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 11, 2013 9:15 AM, "Michael Sokolov" 
wrote:


On 09/11/2013 08:40 AM, Per Steffensen wrote:


The reason I mention sort is that we in my project, half a year ago, have
dealt with the FieldCache->OOM-problem when doing sort-requests. We
basically just reject sort-requests unless they hit below X documents - in
case they do we just find them without sorting and sort them ourselves
afterwards.

Currently our problem is, that we have to do a group/distinct (in
SQL-language) query and we have found that we can do what we want to do
using group (http://wiki.apache.org/solr/FieldCollapsing)
or facet - either will work for us. Problem is that they both use
FieldCache and we "know" that using FieldCache will lead to OOM-exceptions
with the amount of data each of our Solr-nodes administrate. This time we
have really no option of just "limit" usage as we did with sort. Therefore
we need a group/distinct-functionality that works even on huge data-amounts
(and an algorithm using FieldCache will not)

I believe setting facet.method=enum will actually make facet not use the
FieldCache. Is that true? Is it a bad idea?

I do not know much about DocValues, but I do not believe that you will
avoid FieldCache by using DocValues? Please elaborate, or point to
documentation where I will be able to read that I am wrong. Thanks!


There is Simon Willnauer's presentation
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene

and this blog post
http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/

and this one that shows some performance comparisons:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/








Re: No or limited use of FieldCache

2013-09-11 Thread Otis Gospodnetic
Per,  check zee Wiki, there is a page describing docvalues. We used them
successfully in a solr for analytics scenario.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 11, 2013 9:15 AM, "Michael Sokolov" 
wrote:

> On 09/11/2013 08:40 AM, Per Steffensen wrote:
>
>> The reason I mention sort is that we in my project, half a year ago, have
>> dealt with the FieldCache->OOM-problem when doing sort-requests. We
>> basically just reject sort-requests unless they hit below X documents - in
>> case they do we just find them without sorting and sort them ourselves
>> afterwards.
>>
>> Currently our problem is, that we have to do a group/distinct (in
>> SQL-language) query and we have found that we can do what we want to do
>> using group (http://wiki.apache.org/solr/FieldCollapsing)
>> or facet - either will work for us. Problem is that they both use
>> FieldCache and we "know" that using FieldCache will lead to OOM-exceptions
>> with the amount of data each of our Solr-nodes administrate. This time we
>> have really no option of just "limit" usage as we did with sort. Therefore
>> we need a group/distinct-functionality that works even on huge data-amounts
>> (and an algorithm using FieldCache will not)
>>
>> I believe setting facet.method=enum will actually make facet not use the
>> FieldCache. Is that true? Is it a bad idea?
>>
>> I do not know much about DocValues, but I do not believe that you will
>> avoid FieldCache by using DocValues? Please elaborate, or point to
>> documentation where I will be able to read that I am wrong. Thanks!
>>
> There is Simon Willnauer's presentation
> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
>
> and this blog post
> http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
>
> and this one that shows some performance comparisons:
> http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
>
>
>
>


Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov

On 09/11/2013 08:40 AM, Per Steffensen wrote:
The reason I mention sort is that we in my project, half a year ago, 
have dealt with the FieldCache->OOM-problem when doing sort-requests. 
We basically just reject sort-requests unless they hit below X 
documents - in case they do we just find them without sorting and sort 
them ourselves afterwards.


Currently our problem is, that we have to do a group/distinct (in 
SQL-language) query and we have found that we can do what we want to 
do using group (http://wiki.apache.org/solr/FieldCollapsing) or facet 
- either will work for us. Problem is that they both use FieldCache 
and we "know" that using FieldCache will lead to OOM-exceptions with 
the amount of data each of our Solr-nodes administrate. This time we 
have really no option of just "limit" usage as we did with sort. 
Therefore we need a group/distinct-functionality that works even on 
huge data-amounts (and an algorithm using FieldCache will not)


I believe setting facet.method=enum will actually make facet not use 
the FieldCache. Is that true? Is it a bad idea?


I do not know much about DocValues, but I do not believe that you will 
avoid FieldCache by using DocValues? Please elaborate, or point to 
documentation where I will be able to read that I am wrong. Thanks!
There is Simon Willnauer's presentation 
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene


and this blog post 
http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/


and this one that shows some performance comparisons: 
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/






Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
The reason I mention sort is that we in my project, half a year ago, 
have dealt with the FieldCache->OOM-problem when doing sort-requests. We 
basically just reject sort-requests unless they hit below X documents - 
in case they do we just find them without sorting and sort them 
ourselves afterwards.


Currently our problem is, that we have to do a group/distinct (in 
SQL-language) query and we have found that we can do what we want to do 
using group (http://wiki.apache.org/solr/FieldCollapsing) or facet - 
either will work for us. The problem is that they both use FieldCache and we 
"know" that using FieldCache will lead to OOM-exceptions with the amount 
of data each of our Solr-nodes administrates. This time we have really no 
option of just "limiting" usage as we did with sort. Therefore we need a 
group/distinct functionality that works even on huge data-amounts (and an 
algorithm using FieldCache will not)


I believe setting facet.method=enum will actually make facet not use the 
FieldCache. Is that true? Is it a bad idea?


I do not know much about DocValues, but I do not believe that you will 
avoid FieldCache by using DocValues? Please elaborate, or point to 
documentation where I will be able to read that I am wrong. Thanks!


Regards, Per Steffensen

On 9/11/13 1:38 PM, Erick Erickson wrote:

I don't know any more than Michael, but I'd _love_ some reports from the
field.

There are some restriction on DocValues though, I believe one of them
is that they don't really work on analyzed data

FWIW,
Erick
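The reject-or-sort-locally workaround Per describes reads roughly like this (a hypothetical helper: `execute` stands in for whatever client call returns unsorted hits, and the cutoff value is invented -- the "X documents" in the text):

```python
MAX_SORTABLE = 10_000  # invented cutoff ("X documents" in Per's message)

def search_sorted(execute, query, sort_key):
    """execute(query) is assumed to return (num_found, list_of_docs)."""
    # Plain unsorted request: no server-side sort, so no FieldCache
    # entry gets built for the sort field.
    num_found, docs = execute(query)
    if num_found > MAX_SORTABLE:
        # Too many hits: reject rather than risk an OOM on the server.
        raise ValueError("result set too large for client-side sorting")
    return sorted(docs, key=sort_key)  # sort the small result set locally

# Toy stand-in for a Solr client call:
fake_execute = lambda q: (3, [{"price": 9}, {"price": 1}, {"price": 5}])
hits = search_sorted(fake_execute, "*:*", lambda d: d["price"])
print([d["price"] for d in hits])  # [1, 5, 9]
```

This only works when the accepted result sets are small enough to fetch in full, which is exactly why it doesn't generalize to the group/distinct case discussed in the thread.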




Re: No or limited use of FieldCache

2013-09-11 Thread Erick Erickson
I don't know any more than Michael, but I'd _love_ some reports from the
field.

There are some restriction on DocValues though, I believe one of them
is that they don't really work on analyzed data

FWIW,
Erick


On Wed, Sep 11, 2013 at 7:00 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> On 9/11/13 3:11 AM, Per Steffensen wrote:
>
>> Hi
>>
>> We have a SolrCloud setup handling huge amounts of data. When we do
>> group, facet or sort searches Solr will use its FieldCache, and add data in
>> it for every single document we have. For us it is not realistic that this
>> will ever fit in memory and we get OOM exceptions. Are there some way of
>> disabling the FieldCache (taking the performance penalty of course) or make
>> it behave in a nicer way where it only uses up to e.g. 80% of the memory
>> available to the JVM? Or other suggestions?
>>
>> Regards, Per Steffensen
>>
> I think you might want to look into using DocValues fields, which are
> column-stride fields stored as compressed arrays - one value per document
> -- for the fields on which you are sorting and faceting. My understanding
> (which is limited) is that these avoid the use of the field cache, and I
> believe you have the option to control whether they are held in memory or
> on disk.  I hope someone who knows more will elaborate...
>
> -Mike
>


Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov

On 9/11/13 3:11 AM, Per Steffensen wrote:

Hi

We have a SolrCloud setup handling huge amounts of data. When we do 
group, facet or sort searches Solr will use its FieldCache, and add 
data in it for every single document we have. For us it is not 
realistic that this will ever fit in memory and we get OOM exceptions. 
Are there some way of disabling the FieldCache (taking the performance 
penalty of course) or make it behave in a nicer way where it only uses 
up to e.g. 80% of the memory available to the JVM? Or other suggestions?


Regards, Per Steffensen
I think you might want to look into using DocValues fields, which are 
column-stride fields stored as compressed arrays - one value per 
document -- for the fields on which you are sorting and faceting. My 
understanding (which is limited) is that these avoid the use of the 
field cache, and I believe you have the option to control whether they 
are held in memory or on disk.  I hope someone who knows more will 
elaborate...


-Mike
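Mike's suggestion translates to a schema change along these lines (a sketch only: the field names and types here are invented, and exact docValues support and on-disk/in-memory behaviour depends on the Solr version in use):

```xml
<!-- schema.xml sketch: docValues="true" stores per-document values as
     on-disk column-stride data, so sorting/faceting on these fields can
     avoid un-inverting the index into the heap-resident FieldCache. -->
<field name="brand" type="string" indexed="true" stored="false" docValues="true"/>
<field name="sales" type="tlong"  indexed="true" stored="false" docValues="true"/>
```

A reindex is needed after adding `docValues="true"`, since the column-stride data is written at index time.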


No or limited use of FieldCache

2013-09-11 Thread Per Steffensen

Hi

We have a SolrCloud setup handling huge amounts of data. When we do 
group, facet or sort searches Solr will use its FieldCache, and add data 
in it for every single document we have. For us it is not realistic that 
this will ever fit in memory and we get OOM exceptions. Is there some 
way of disabling the FieldCache (taking the performance penalty of 
course) or making it behave more gracefully, where it only uses up to e.g. 
80% of the memory available to the JVM? Or other suggestions?


Regards, Per Steffensen


Re: Who's cleaning the Fieldcache?

2013-08-15 Thread Andrea Gazzarini

Hi Chris, Robert

Thank you very much. First, answers to your questions:

1) which version of Solr are you using?

3.6.0


2) is it possibly you have multiple searchers open (ie: one in use
while another one is warming up) when you're seeing these stats?

No, no multiple searchers.


Now, after one day of experiments, I think I got what's happening. 
Briefly, the behaviour seems to be exactly what Chris described (Weak 
references that are garbage collected when needed). Instead I'm not 
seeing what described by Robert.


This is what I understood aboyut my problem:

- Xms4GB
- sort fields definitely too big...once loaded they got about more than 
2GB of memory


So when replication occurs, the "new" sort field values (belonging to the new 
replicated segment) are loaded in memory...but before the old segment's sort 
field values are garbage collected on the slave, I don't have enough memory 
(2GB + 2GB...only for sort fields), so et voilà: OutOfMemory


What I've done is reduce the number of unique values in the sort fields (now 
the FieldCacheImpl is about 600MB) so there's enough memory for ordinary 
work, for replication, and for retaining two different FieldCacheImpl 
references (old and replicated segment)...in this way I see exactly what 
Chris described: when replication occurs, memory grows (on the slave) by 
about 800MB; afterwards the memory grows very slowly but steadily, and when 
it reaches a certain point (about 3.7GB) the garbage collector runs and 
frees something like 2.2GB. I repeated this test many times and the 
behaviour is always the same.


Thank you very much to both of you
Andrea


On 08/14/2013 11:58 PM, Chris Hostetter wrote:

: > FieldCaches are managed using a WeakHashMap - so once the IndexReaders
: > associated with those FieldCaches are no longer used, they will be garbage
: > collected when and if the JVM's garbage collector gets around to it.
: >
: > if they sit around after you are done with them, they might look like they
: > take up a lot of memory, but that just means your JVM Heap has that memory
: > to spare and hasn't needed to clean them up yet.
:
: I don't think this is correct.
:
: When you register an entry in the fieldcache, it registers event
: listeners on the segment's core so that when it's close()d, any entries
: are purged rather than waiting on GC.
:
: See FieldCacheImpl.java

Ah ... sweet.  I didn't realize that got added.

(In any case: it looks like a WeakHashMap is still used in case the
listeners never get called, correct?)

But based on the details from the OP's first message, it looks like he's
running Solr 3.x (there are mentions of "SolrIndexReader", which from what
I can tell was gone by 4.0), so perhaps this is an older version from before
all the kinks were worked out in the reader close listeners used by the
fieldcache?  (I'm noticing things like LUCENE-3644 in particular)

Andrea:

1) which version of Solr are you using?
2) is it possibly you have multiple searchers open (ie: one in use
while another one is warming up) when you're seeing these stats?



-Hoss




Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:58 PM, Chris Hostetter
 wrote:
>
> : > FieldCaches are managed using a WeakHashMap - so once the IndexReaders
> : > associated with those FieldCaches are no longer used, they will be garbage
> : > collected when and if the JVM's garbage collector gets around to it.
> : >
> : > if they sit around after you are done with them, they might look like they
> : > take up a lot of memory, but that just means your JVM Heap has that memory
> : > to spare and hasn't needed to clean them up yet.
> :
> : I don't think this is correct.
> :
> : When you register an entry in the fieldcache, it registers event
> : listeners on the segment's core so that when it's close()d, any entries
> : are purged rather than waiting on GC.
> :
> : See FieldCacheImpl.java
>
> Ah ... sweet.  I didn't realize that got added.
>
> (In any case: it looks like a WeakHashMap is still used in case the
> listeners never get called, correct?)
>

I think it might be the other way around: I think it was a weak map
from the start; the close listeners were then added sometime in the 3.x
series, so we registered purge events "as an optimization".

But one way to look at it is: readers should really get closed, so why
have the weak map and not just a regular HashMap?

Even if we want to keep the weak map (seriously, I don't care, and I
don't want to be the guy fielding complaints on this), I'm going to
open an issue with a patch that removes it and fails tests in
@AfterClass if there are any entries. This way it's totally clear
if/when/where anything is "relying on GC" today and we can at
least look at that.


Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Chris Hostetter

: > FieldCaches are managed using a WeakHashMap - so once the IndexReaders
: > associated with those FieldCaches are no longer used, they will be garbage
: > collected when and if the JVM's garbage collector gets around to it.
: >
: > if they sit around after you are done with them, they might look like they
: > take up a lot of memory, but that just means your JVM Heap has that memory
: > to spare and hasn't needed to clean them up yet.
: 
: I don't think this is correct.
: 
: When you register an entry in the fieldcache, it registers event
: listeners on the segment's core so that when it's close()d, any entries
: are purged rather than waiting on GC.
: 
: See FieldCacheImpl.java

Ah ... sweet.  I didn't realize that got added.

(In any case: it looks like a WeakHashMap is still used in case the 
listeners never get called, correct?)

But based on the details from the OP's first message, it looks like he's 
running Solr 3.x (there are mentions of "SolrIndexReader", which from what 
I can tell was gone by 4.0), so perhaps this is an older version from before 
all the kinks were worked out in the reader close listeners used by the 
fieldcache?  (I'm noticing things like LUCENE-3644 in particular)

Andrea: 

1) which version of Solr are you using?
2) is it possibly you have multiple searchers open (ie: one in use 
while another one is warming up) when you're seeing these stats?



-Hoss


Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:29 PM, Chris Hostetter
 wrote:
>
> : why? Those are my sort fields and they are occupying a lot of space (doubled
> : in this case but I see that sometimes I have three or four "old" segment
> : references)
> :
> : Is there something I can do to remove those old references? I tried to
> : reload the core and it seems the old references are discarded (i.e. garbage
> : collected) but I believe it is not a good workaround; I would avoid
> : reloading the core for every replication cycle.
>
> You don't need to reload the core to get rid of the old FieldCaches -- in
> fact, there is nothing about reloading the core that will guarantee old
> FieldCaches get removed.
>
> FieldCaches are managed using a WeakHashMap - so once the IndexReaders
> associated with those FieldCaches are no longer used, they will be garbage
> collected when and if the JVM's garbage collector gets around to it.
>
> if they sit around after you are done with them, they might look like they
> take up a lot of memory, but that just means your JVM Heap has that memory
> to spare and hasn't needed to clean them up yet.

I don't think this is correct.

When you register an entry in the fieldcache, it registers event
listeners on the segment's core so that when it's close()d, any entries
are purged rather than waiting on GC.

See FieldCacheImpl.java
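The purge-on-close pattern Robert describes can be sketched in plain Java. All names here (SegmentCore, SimpleFieldCache) are invented for illustration; the real logic lives in Lucene's FieldCacheImpl.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy model of the purge-on-close pattern: the first cache entry for a
// segment registers a listener on that segment's core, and close()
// removes every entry for the segment immediately instead of waiting
// for the GC to clear a weak reference.
class SegmentCore {
    interface ClosedListener { void onClose(SegmentCore core); }
    private final List<ClosedListener> listeners = new CopyOnWriteArrayList<>();
    void addClosedListener(ClosedListener l) { listeners.add(l); }
    void close() { for (ClosedListener l : listeners) l.onClose(this); }
}

class SimpleFieldCache {
    private final Map<SegmentCore, Map<String, int[]>> cache = new ConcurrentHashMap<>();

    int[] getInts(SegmentCore core, String field) {
        Map<String, int[]> perField = cache.computeIfAbsent(core, c -> {
            c.addClosedListener(cache::remove);   // purge eagerly on close
            return new ConcurrentHashMap<>();
        });
        // Stand-in for uninverting the field into an array.
        return perField.computeIfAbsent(field, f -> new int[] {0, 1, 2});
    }

    boolean hasEntriesFor(SegmentCore core) { return cache.containsKey(core); }
}
```

With this in place, closing the core drops its cached arrays without any reliance on garbage collection.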


Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Chris Hostetter

: why? Those are my sort fields and they are occupying a lot of space (doubled
: in this case but I see that sometimes I have three or four "old" segment
: references)
: 
: Is there something I can do to remove those old references? I tried to reload
: the core and it seems the old references are discarded (i.e. garbage
: collected) but I believe it is not a good workaround; I would avoid reloading the
: core for every replication cycle.

You don't need to reload the core to get rid of the old FieldCaches -- in 
fact, there is nothing about reloading the core that will guarantee old 
FieldCaches get removed.

FieldCaches are managed using a WeakHashMap - so once the IndexReaders 
associated with those FieldCaches are no longer used, they will be garbage 
collected when and if the JVM's garbage collector gets around to it.

if they sit around after you are done with them, they might look like they 
take up a lot of memory, but that just means your JVM Heap has that memory 
to spare and hasn't needed to clean them up yet.


-Hoss
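The WeakHashMap behaviour Hoss describes can be demonstrated with plain JDK code (no Lucene involved; the map key below merely stands in for an IndexReader):

```java
import java.util.Map;
import java.util.WeakHashMap;

// A WeakHashMap entry survives only as long as something else strongly
// references the key. Once the last strong reference is dropped, a GC
// cycle clears the weak reference and the map expunges the entry.
class WeakCacheDemo {
    static int sizeAfterKeyDropped() {
        Map<Object, String> cache = new WeakHashMap<>();
        Object reader = new Object();            // stand-in for an IndexReader
        cache.put(reader, "uninverted sort-field arrays");
        reader = null;                           // drop the last strong reference
        // Nudge the collector; the entry disappears once GC has run.
        for (int i = 0; i < 1000 && !cache.isEmpty(); i++) {
            System.gc();
        }
        return cache.size();
    }
}
```

This is exactly why "sitting around" entries are harmless in principle (they vanish under memory pressure) but also why the close-listener purge discussed above is preferable: it does not depend on when the collector runs.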


Who's cleaning the Fieldcache?

2013-08-14 Thread Andrea Gazzarini

After doing some replications (replicationOnOptimize) I see

- on master filesystem files that belong to two segments (I suppose the 
oldest is just a commit point)
- on master admin console 
(SolrIndexReader{this=4f2452c6,r=ReadOnlyDirectoryReader@4f2452c6,refCnt=1,segments=1})


but on slave

- on filesystem I see files belonging just to the latest segment (which in 
this case is called _mx)

- on admin console I see there's just one segment
- on stats page I see (FieldCache) references to both the new and the previous 
(old) segment (_mv and _mx)


entries_count : 11
...
entry#2 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mv.frq")'=>'title_sort',class 
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#883647064
entry#3 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mv.frq")'=>'author_sort',class 
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1606785643

...
entry#7 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mx.frq")'=>'title_sort',class 
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#144024863
entry#8 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mx.frq")'=>'author_sort',class 
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#411802272

...

why? Those are my sort fields and they are occupying a lot of space 
(doubled in this case but I see that sometimes I have three or four 
"old" segment references)


Is there something I can do to remove those old references? I tried to 
reload the core and it seems the old references are discarded (i.e. 
garbage collected) but I believe it is not a good workaround; I would 
avoid reloading the core for every replication cycle.


Best
Andrea






Re: TrieField and FieldCache confusion

2013-08-01 Thread Paul Masurel
Thank you very much for your very fast answer and
all the pointers.

That's what I thought, but then I got confused by the last note at
http://wiki.apache.org/solr/StatsComponent

"TrieFields <http://wiki.apache.org/solr/TrieFields> has to use a
precisionStep of -1 to avoid using
UnInvertedField<http://wiki.apache.org/solr/UnInvertedField>.java.
Consider using one field for doing stats, and one for doing range facetting
on. "

I assume it referred to a former version of Solr.




On Wed, Jul 31, 2013 at 7:43 PM, Chris Hostetter
wrote:

>
> : Can I expect the FieldCache of Lucene to return the correct values when
> : working
> : with TrieField with the precisionStep higher than 0. If not, what did I
> get
> : wrong?
>
> Yes -- the code for building FieldCaches from Trie fields is smart enough
> to ensure that only the "real" original values are used to populate the
> Cache
>
> (See for example: FieldCache.NUMERIC_UTILS_INT_PARSER and the classes
> linked to from it's javadocs...
>
>
> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/FieldCache.html#NUMERIC_UTILS_INT_PARSER
>
> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/NumericUtils.html
>
> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/document/IntField.html
>
> (Solr's Trie fields are backed by the various numeric fields in lucene --
> ie: solr:TrieIntField -> lucene:IntField.  the "Trie*" prefix is used in
> solr because there were already classes named IntField, DoubleField, etc...
> when the Trie-based impls were added to lucene)
>
>
> -Hoss
>



-- 
__

 Masurel Paul
 e-mail: paul.masu...@gmail.com


Re: TrieField and FieldCache confusion

2013-07-31 Thread Chris Hostetter

: Can I expect the FieldCache of Lucene to return the correct values when
: working
: with TrieField with the precisionStep higher than 0. If not, what did I get
: wrong?

Yes -- the code for building FieldCaches from Trie fields is smart enough 
to ensure that only the "real" original values are used to populate the 
Cache

(See for example: FieldCache.NUMERIC_UTILS_INT_PARSER and the classes 
linked to from it's javadocs...

https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/FieldCache.html#NUMERIC_UTILS_INT_PARSER
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/NumericUtils.html
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/document/IntField.html

(Solr's Trie fields are backed by the various numeric fields in lucene -- 
ie: solr:TrieIntField -> lucene:IntField.  the "Trie*" prefix is used in 
solr because there were already classes named IntField, DoubleField, etc... 
when the Trie-based impls were added to lucene)


-Hoss


TrieField and FieldCache confusion

2013-07-31 Thread Paul Masurel
Hello everyone,

I have a question about Solr TrieField and Lucene FieldCache.

From my understanding, Solr added the implementation of TrieField to
perform faster range queries.
For each value it will index multiple terms, the n-th term being a masked
version of the value, showing only its first (precisionStep * n) bits.

When uninverting the field to populate a FieldCache, the last value in
lexicographical order will be retained, which from my understanding
should be the term with the highest precision.

Can I expect the FieldCache of Lucene to return the correct values when
working
with TrieField with the precisionStep higher than 0. If not, what did I get
wrong?

Regards,

Paul Masurel
e-mail: paul.masu...@gmail.com
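The per-precision-level masking Paul describes can be sketched as follows. This is a simplification: Lucene's real NumericUtils additionally prefix-codes each term into bytes and encodes the shift, but the masking is the core idea. The class name is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// For an int value and a positive precisionStep, produce one "term" per
// precision level: at shift s, the low s bits are zeroed so only the
// high-order (32 - s) bits remain.
class TrieTerms {
    static List<Integer> trieTerms(int value, int precisionStep) {
        List<Integer> terms = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            terms.add(value & (-1 << shift)); // zero out the low `shift` bits
        }
        return terms;
    }
}
```

For example, with precisionStep = 4 a 32-bit value yields eight terms (shifts 0, 4, ..., 28), of which only the shift-0 term carries the original value — which is why cache-building code must pick out the full-precision terms.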


Re: Using per-segment FieldCache or DocValues in custom component?

2013-07-02 Thread Robert Muir
Where do you get the docid from? Usually it's best to just look at the whole
algorithm, e.g. docids come from per-segment readers by default anyway, so
ideally you want to access any per-document things from that same
segment reader.

As far as supporting docvalues, FieldCache API "passes thru" to docvalues
transparently if it's enabled for the field.

On Mon, Jul 1, 2013 at 4:55 PM, Michael Ryan  wrote:

> I have some custom code that uses the top-level FieldCache (e.g.,
> FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign
> this to use the per-segment FieldCaches so that re-opening a Searcher is
> fast(er). In most cases, I've got a docId and I want to get the value for a
> particular single-valued field for that doc.
>
> Is there a good place to look to see example code of per-segment
> FieldCache use? I've been looking at PerSegmentSingleValuedFaceting, but
> hoping there might be something less confusing :)
>
> Also thinking DocValues might be a better way to go for me... is there any
> documentation or example code for that?
>
> -Michael
>


Using per-segment FieldCache or DocValues in custom component?

2013-07-01 Thread Michael Ryan
I have some custom code that uses the top-level FieldCache (e.g., 
FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign 
this to use the per-segment FieldCaches so that re-opening a Searcher is 
fast(er). In most cases, I've got a docId and I want to get the value for a 
particular single-valued field for that doc.

Is there a good place to look to see example code of per-segment FieldCache 
use? I've been looking at PerSegmentSingleValuedFaceting, but hoping there 
might be something less confusing :)

Also thinking DocValues might be a better way to go for me... is there any 
documentation or example code for that?

-Michael
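As a starting point for the per-segment approach Michael asks about, the first step is mapping a top-level docId to its segment and segment-local docId — the idea behind Lucene's ReaderUtil.subIndex(). A self-contained sketch (the doc-base array is supplied by hand here; in real code it comes from the reader's leaves):

```java
// Map a top-level docId to (segment index, local docId) via the segments'
// starting doc bases, using binary search for the last base <= docId.
class SegmentLookup {
    /** docBases[i] = first top-level docId of segment i; ascending, docBases[0] == 0. */
    static int segmentFor(int docId, int[] docBases) {
        int lo = 0, hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= docId) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    /** Convert a top-level docId into the docId within its own segment. */
    static int localDoc(int docId, int[] docBases) {
        return docId - docBases[segmentFor(docId, docBases)];
    }
}
```

Once resolved, the per-segment cache (or a per-segment DocValues reader) is consulted with the local docId, so reopening a searcher only rebuilds caches for new segments.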


OOM fieldCache problem

2013-06-26 Thread Markus Klose
Hi all,

I have some memory problems (OOM) with Solr 3.5.0 and I suppose that it has
something to do with the fieldCache. The entry count of the fieldCache
grows and grows; why is it not rebuilt after a commit? I commit every 60
seconds, but the memory consumption of Solr increased within one day from
2GB to 10GB (index size: ~200MB). 

I tried to solve the problem by reducing the other cache sizes (filterCache,
documentCache, queryResultCache). It delayed the OOM exception but it did
not solve the problem that the memory consumption increases continuously. Is
it possible to reset the fieldCache explicitly?

Markus



Re: FieldCache insanity with field used as facet and group

2013-06-03 Thread Elodie Sannier

I'm reproducing the problem with the 4.2.1 example with 2 shards.

1) started up solr shards, indexed the example data, and confirmed empty
fieldCaches
[sanniere@funlevel-dx example]$ java
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
[sanniere@funlevel-dx example2]$ java -Djetty.port=7574
-DzkHost=localhost:9983 -jar start.jar

2) used both grouping and faceting on the popularity field, then checked
the fieldcache insanity count
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity";
> /dev/null
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=popularity";
> /dev/null
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true";
| grep "entries_count|insanity_count"
"entries_count":10,
"insanity_count":2,

"insanity#0":"VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_g(4.2.1):C1)+popularity\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',class
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#12129794\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n",
"insanity#1":"VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_f(4.2.1):C9)+popularity\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',class
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1130715\n"}}},
"HIGHLIGHTING",{},
"OTHER",{}]}

I've updated https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 28.05.2013 at 10:22, Elodie Sannier wrote:

I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 07.05.2013 at 18:19, Chris Hostetter wrote:

: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one that named the result of these "sanity checks"
"FieldCacheInsanity" and I have regretted it ever since -- a better label
would have been "inconsistency"

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',class
: 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field "merchantid" of type "int" used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not
being consistent in how they are building the field cache, so you are
getting two objects in the cache for each segment

I haven't checked if this happens much with the example configs, but if
you could: please file a bug with the details of which Solr version you
are using along with the schema fieldType & field declarations for your
merchantid field, along with the mbean stats output showing the field
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in
it)

: This insanity can have performance impact ?
: How can I fix it ?

the impact is just that more RAM is being used than is probably strictly
necessary.  Unless there is something unusual in your fieldType
declaration, I don't think there is an easy fix you can apply -- we need to
fix the underlying code.

Re: FieldCache insanity with field used as facet and group

2013-05-28 Thread Elodie Sannier

I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 07.05.2013 at 18:19, Chris Hostetter wrote:

: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one that named the result of these "sanity checks"
"FieldCacheInsanity" and I have regretted it ever since -- a better label
would have been "inconsistency"

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',class
: 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field "merchantid" of type "int" used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not
being consistent in how they are building the field cache, so you are
getting two objects in the cache for each segment

I haven't checked if this happens much with the example configs, but if
you could: please file a bug with the details of which Solr version you
are using along with the schema fieldType & field declarations for your
merchantid field, along with the mbean stats output showing the field
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in
it)

: This insanity can have performance impact ?
: How can I fix it ?

the impact is just that more RAM is being used than is probably strictly
necessary.  Unless there is something unusual in your fieldType
declaration, I don't think there is an easy fix you can apply -- we need to
fix the underlying code.

-Hoss



--
Kelkoo

Elodie Sannier - Software engineer

E: elodie.sann...@kelkoo.fr
Y!Messenger: kelkooelodies
T: +33 (0)4 56 09 07 55
A: 4/6 Rue des Méridiens 38130 Echirolles


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively 
for their recipients. If you are not the intended recipient of this 
message, please delete it and notify the sender.


Re: Insane FieldCache usage when using group.facet=true

2013-05-21 Thread Elisabeth Adler
Hi,
I found a different solution. Instead of using the parameter
&group.facet=true, I am now using &group.truncate=true, which is giving me
the correct number of counts.
Best,
Elisabeth

On 21 May 2013 09:55, Elisabeth Adler  wrote:

> Hi,
> I did a few more tests but still can't get Solr to give me the group
> counts when faceting instead of the document counts.
> Any ideas?
> Thanks,
> Elisabeth
>
>
> On 17 May 2013 14:11, Elisabeth Adler  wrote:
>
>> Dear all,
>>
>> I am running a grouped query including facets in my Junit Test cases
>> against a Solr 4.2.1 Embedded Server. When faceting the groups, I want the
>> counts to reflect the number of groups, not the number of documents. But
>> when I enable "&group.facet=true" on the query, the test fails with the
>> following message:
>>
>> *** BEGIN testSearchByQuery(com.test.InsaneFieldCacheTest): Insane
>> FieldCache usage(s) ***
>> VALUEMISMATCH: Multiple distinct value objects for
>> SegmentCoreReader(owner=_0(4.2.1):C12)+course_id
>> 'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',class
>> org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#297645694
>> (size =~ 320 bytes)
>>
>> 'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#758496471
>> (size =~ 80 bytes)
>> *** END testSearchByQuery(com.test.InsaneFieldCacheTest): Insane
>> FieldCache usage(s) ***
>>
>> When disabling the group.facet, the test runs as expected.
>>
>> I found a related issue where the same field was used for the facet and
>> the group. I verified that the given query does not use the "course_id" field
>> twice. The error message suggests that sorting on "course_id" was done and
>> this is stored in the cache.
>> I set up a copy-field, one for grouping (course_id_grouping) and one for
>> all other uses (course_id). I got the message again, now for
>> course_id_grouping. Disabling all caches did not help.
>>
>> I put a test case on
>> https://github.com/lischen3229/solrInsaneFieldCacheErrorTest to
>> replicate the issue.
>>
>> Any pointers on how to get the facets displaying the group counts instead
>> of the document counts highly appreciated.
>>
>> Best,
>> Elisabeth
>>
>
>


Re: Insane FieldCache usage when using group.facet=true

2013-05-21 Thread Elisabeth Adler
Hi,
I did a few more tests but still can't get Solr to give me the group
counts when faceting instead of the document counts.
Any ideas?
Thanks,
Elisabeth

On 17 May 2013 14:11, Elisabeth Adler  wrote:

> Dear all,
>
> I am running a grouped query including facets in my Junit Test cases
> against a Solr 4.2.1 Embedded Server. When faceting the groups, I want the
> counts to reflect the number of groups, not the number of documents. But
> when I enable "&group.facet=true" on the query, the test fails with the
> following message:
>
> *** BEGIN testSearchByQuery(com.test.InsaneFieldCacheTest): Insane
> FieldCache usage(s) ***
> VALUEMISMATCH: Multiple distinct value objects for
> SegmentCoreReader(owner=_0(4.2.1):C12)+course_id
> 'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',class
> org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#297645694
> (size =~ 320 bytes)
>
> 'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#758496471
> (size =~ 80 bytes)
> *** END testSearchByQuery(com.test.InsaneFieldCacheTest): Insane
> FieldCache usage(s) ***
>
> When disabling the group.facet, the test runs as expected.
>
> I found a related issue where the same field was used for the facet and
> the group. I verified that the given query does not use the "course_id" field
> twice. The error message suggests that sorting on "course_id" was done and
> this is stored in the cache.
> I set up a copy-field, one for grouping (course_id_grouping) and one for
> all other uses (course_id). I got the message again, now for
> course_id_grouping. Disabling all caches did not help.
>
> I put a test case on
> https://github.com/lischen3229/solrInsaneFieldCacheErrorTest to replicate
> the issue.
>
> Any pointers on how to get the facets displaying the group counts instead
> of the document counts highly appreciated.
>
> Best,
> Elisabeth
>


Insane FieldCache usage when using group.facet=true

2013-05-17 Thread Elisabeth Adler
Dear all,

I am running a grouped query including facets in my Junit Test cases
against a Solr 4.2.1 Embedded Server. When faceting the groups, I want the
counts to reflect the number of groups, not the number of documents. But
when I enable "&group.facet=true" on the query, the test fails with the
following message:

*** BEGIN testSearchByQuery(com.test.InsaneFieldCacheTest): Insane
FieldCache usage(s) ***
VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_0(4.2.1):C12)+course_id
'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',class
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#297645694
(size =~ 320 bytes)

'SegmentCoreReader(owner=_0(4.2.1):C12)'=>'course_id',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#758496471
(size =~ 80 bytes)
*** END testSearchByQuery(com.test.InsaneFieldCacheTest): Insane FieldCache
usage(s) ***

When disabling the group.facet, the test runs as expected.

I found a related issue where the same field was used for the facet and the
group. I verified that the given query does not use the "course_id" field
twice. The error message suggests that sorting on "course_id" was done and
this is stored in the cache.
I set up a copy-field, one for grouping (course_id_grouping) and one for
all other uses (course_id). I got the message again, now for
course_id_grouping. Disabling all caches did not help.

I put a test case on
https://github.com/lischen3229/solrInsaneFieldCacheErrorTest to replicate
the issue.

Any pointers on how to get the facets displaying the group counts instead
of the document counts highly appreciated.

Best,
Elisabeth


Re: FieldCache insanity with field used as facet and group

2013-05-07 Thread Chris Hostetter

: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one that named the result of these "sanity checks" 
"FieldCacheInsanity" and I have regretted it ever since -- a better label 
would have been "inconsistency"

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',class
: 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
: All insane instances are for a field "merchantid" of type "int" used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not 
being consistent in how they are building the field cache, so you are 
getting two objects in the cache for each segment

I haven't checked if this happens much with the example configs, but if 
you could: please file a bug with the details of which Solr version you 
are using along with the schema fieldType & field declarations for your 
merchantid field, along with the mbean stats output showing the field 
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in 
it)

: Can this insanity have a performance impact?
: How can I fix it?

the impact is just that more RAM is being used than is strictly 
necessary.  unless there is something unusual in your fieldType 
declaration, i don't think there is an easy fix you can apply -- we need to 
fix the underlying code.

-Hoss
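An illustrative (non-Solr) Python sketch of the inconsistency Hoss describes: two code paths that cache values for the same segment/field under different keys end up with two distinct entries where one would do. The cache and key shapes here are hypothetical, not Lucene's actual FieldCache internals.

```python
# Toy per-segment value cache. Real FieldCache keys include the parser
# used to decode the field; the key shapes here are illustrative only.
cache = {}

def get_sorted_values(segment, field):
    # Path used by grouping: caches under a "sorted" representation.
    return cache.setdefault((segment, field, "sorted"), object())

def get_ints(segment, field):
    # Path used by faceting: caches under an "int parser" representation.
    return cache.setdefault((segment, field, "int_parser"), object())

get_sorted_values("_11i", "merchantid")
get_ints("_11i", "merchantid")

# The sanity checker flags this: two distinct value objects for the
# same (segment, field) pair -- roughly double the memory needed.
entries = [k for k in cache if k[:2] == ("_11i", "merchantid")]
print(len(entries))  # 2
```

This is why the fix has to be in the underlying code: both call sites must agree on one cache representation before the duplicate entry disappears.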

FieldCache insanity with field used as facet and group

2013-04-25 Thread Elodie Sannier

Hello,

I am using the Lucene FieldCache with SolrCloud and I have "insane" instances 
with messages like:

VALUEMISMATCH: Multiple distinct value objects for 
SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)+merchantid 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',class 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field "merchantid" of type "int" used as facet 
and group field.

I'm using a custom SearchHandler which makes two sub-queries, a first query 
with group.field=merchantid and a second query with facet.field=merchantid.

When I'm using the parameter facet.method=enum, I don't have the insane 
instance, but I'm not sure it is the right fix.

Can this insanity have a performance impact?
How can I fix it?

Elodie Sannier


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively 
for their addressees. If you are not the intended recipient of this 
message, please delete it and notify the sender.


Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-22 Thread Shawn Heisey
On 3/22/2013 8:54 AM, Per Steffensen wrote:
> Me too. I will find out soon - I hope! But re-indexing is kind of a
> problem for us, but we will figure it out.
> Any "guide to re-index all your stuff" anywhere, so I do it the easiest
> way? Guess maybe there are some nice tricks about streaming data directly
> from one Solr running the old index into a new Solr running the new
> index, and then discard the old index afterwards?

There is no guide to reindexing, because there are so many ways to
index.  The basic procedure is to repeat whatever you did the first
time, possibly deleting the entire index first.  Because Lucene and Solr
indexes often require changes to deal with changing requirements, the
full index procedure should be automated and repeatable.

The dataimport handler has a SolrEntityProcessor that can index from
another Solr instance.  All fields must be stored for this to work,
because it just retrieves documents and ignores the search index.  Many
people (including myself) do not store all fields, in an attempt to keep
the index size down.

Thanks,
Shawn
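Since there is no single reindexing guide, the usual shape is a paged copy loop. A minimal sketch with the transport abstracted away (`fetch_page` and `index_batch` are hypothetical callables you would back with your Solr client); as Shawn notes, this only works if all fields are stored.

```python
def reindex(fetch_page, index_batch, rows=500):
    """Copy stored documents from an old index to a new one, page by page.

    fetch_page(start, rows) -> list of documents (dicts); empty when done.
    index_batch(docs) sends one batch to the new index.
    Returns the number of documents copied.
    """
    start = copied = 0
    while True:
        docs = fetch_page(start, rows)
        if not docs:
            break
        index_batch(docs)
        copied += len(docs)
        start += rows
    return copied

# In-memory stand-ins for the two Solr instances:
source = [{"id": i} for i in range(1234)]
target = []
copied = reindex(lambda s, r: source[s:s + r], target.extend)
print(copied)  # 1234
```

The SolrEntityProcessor mentioned above does essentially this for you inside the dataimport handler, with the same all-fields-stored constraint.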



Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-22 Thread Per Steffensen

On 3/21/13 10:50 PM, Shawn Heisey wrote:

On 3/21/2013 4:05 AM, Per Steffensen wrote:

Can anyone else elaborate? How to "activate" it? How to make sure, for
sorting, that sort-field-value for all docs are not read into memory for
sorting - leading to OOM when you have a lot of docs? Can this feature
be activated on top of an existing 4.0 index, or do you have to re-index
everything?


There is one requirement that may not be obvious - every document must 
have a value in the field, so you must either make the field 
required or give it a default value in the schema.  Solr 4.2 will 
refuse to start the core if this requirement is not met.

That is not problem for us. The field exist on every document.
The example schema hints that the value might need to be 
single-valued.  I have not tested this.  Sorting is already 
problematic on multi-valued fields, so I assume that this won't be the 
case for you.

That is not a problem for us either. The field is single-valued.


To use docValues, add docValues="true" and then either set 
required="true" or default="" on the field definition in 
schema.xml, restart Solr or reload the core, and reindex.  Your index 
will get bigger.

So the answer to "...or do you have to re-index everything?" is yes!?


If the touted behavior of handling the sort mechanism in OS disk cache 
memory (or just reading the disk if there's not enough memory) rather 
than heap is correct, then it should solve your issues.  I hope it does!
Me too. I will find out soon - I hope! But re-indexing is kind of a 
problem for us, but we will figure it out.
Any "guide to re-index all your stuff" anywhere, so I do it the easiest 
way? Guess maybe there are some nice tricks about streaming data directly 
from one Solr running the old index into a new Solr running the new 
index, and then discard the old index afterwards?


Thanks,
Shawn



Thanks a lot, Shawn!

Regards, Per Steffensen


Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Shawn Heisey

On 3/21/2013 4:05 AM, Per Steffensen wrote:

Can anyone else elaborate? How to "activate" it? How to make sure, for
sorting, that sort-field-value for all docs are not read into memory for
sorting - leading to OOM when you have a lot of docs? Can this feature
be activated on top of an existing 4.0 index, or do you have to re-index
everything?


There is one requirement that may not be obvious - every document must 
have a value in the field, so you must either make the field 
required or give it a default value in the schema.  Solr 4.2 will refuse 
to start the core if this requirement is not met.  The example schema 
hints that the value might need to be single-valued.  I have not tested 
this.  Sorting is already problematic on multi-valued fields, so I 
assume that this won't be the case for you.


To use docValues, add docValues="true" and then either set 
required="true" or default="" on the field definition in 
schema.xml, restart Solr or reload the core, and reindex.  Your index 
will get bigger.


If the touted behavior of handling the sort mechanism in OS disk cache 
memory (or just reading the disk if there's not enough memory) rather 
than heap is correct, then it should solve your issues.  I hope it does!


Thanks,
Shawn
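As a sketch of the change Shawn describes (the field name and type here are hypothetical), the schema.xml declaration would look something like:

```xml
<!-- hypothetical single-valued sort field backed by docValues;
     required="true" satisfies the every-document-must-have-a-value rule -->
<field name="created_at" type="tlong" indexed="true" stored="false"
       multiValued="false" docValues="true" required="true"/>
```

After reloading the core, a full reindex is still needed before sorting stops going through the heap-resident FieldCache.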



Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen

On 3/21/13 10:52 AM, Toke Eskildsen wrote:

On Thu, 2013-03-21 at 09:57 +0100, Per Steffensen wrote:

Thanks Toke! Can you please elaborate a little bit? How to use it? What
it is supposed to do for you?

Sorry, no, I only know about it on the abstract level. The release notes
for Solr 4.2 say

* DocValues have been integrated into Solr. DocValues can be loaded up a
lot faster than the field cache and can also use different compression
algorithms as well as in RAM or on Disk representations. Faceting,
sorting, and function queries all get to benefit. How about the OS
handling faceting and sorting caches off heap? No more tuning 60
gigabyte heaps? How about a snappy new per segment DocValues faceting
method? Improved numeric faceting? Sweet.

Spending 5 minutes searching on how to activate the new powers did not
get me much; my Google-fu is clearly not strong enough. The example
schema shows that docValues="true" is a valid attribute for "StrField,
UUIDField and all Trie*Fields", but I do not know if they are used
automatically by sort or if they should be requested explicitly.

Regards,
Toke Eskildsen



Thanks again, Toke!

Can anyone else elaborate? How to "activate" it? How to make sure, for 
sorting, that sort-field-value for all docs are not read into memory for 
sorting - leading to OOM when you have a lot of docs? Can this feature 
be activated on top of an existing 4.0 index, or do you have to re-index 
everything?


Thanks a lot for any feedback!

Regards, Per Steffensen


Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Toke Eskildsen
On Thu, 2013-03-21 at 09:57 +0100, Per Steffensen wrote:
> Thanks Toke! Can you please elaborate a little bit? How to use it? What 
> it is supposed to do for you?

Sorry, no, I only know about it on the abstract level. The release notes
for Solr 4.2 say

* DocValues have been integrated into Solr. DocValues can be loaded up a
lot faster than the field cache and can also use different compression
algorithms as well as in RAM or on Disk representations. Faceting,
sorting, and function queries all get to benefit. How about the OS
handling faceting and sorting caches off heap? No more tuning 60
gigabyte heaps? How about a snappy new per segment DocValues faceting
method? Improved numeric faceting? Sweet.

Spending 5 minutes searching on how to activate the new powers did not
get me much; my Google-fu is clearly not strong enough. The example
schema shows that docValues="true" is a valid attribute for "StrField,
UUIDField and all Trie*Fields", but I do not know if they are used
automatically by sort or if they should be requested explicitly.

Regards,
Toke Eskildsen



Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen

On 3/21/13 9:48 AM, Toke Eskildsen wrote:

On Thu, 2013-03-21 at 09:13 +0100, Per Steffensen wrote:

We have a lot of docs in Solr. Each particular Solr-node handles a lot
of docs distributed among several replica. When you issue a sort query,
it seems to me that, the value of the sort-field of ALL docs under the
Solr-node is added to the FieldCache. [...]

I haven't used it yet, but DocValues in Solr 4.2 seems to be the answer.

- Toke Eskildsen


Thanks Toke! Can you please elaborate a little bit? How to use it? What 
it is supposed to do for you?


Regards, Per Steffensen


Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Toke Eskildsen
On Thu, 2013-03-21 at 09:13 +0100, Per Steffensen wrote:
> We have a lot of docs in Solr. Each particular Solr-node handles a lot 
> of docs distributed among several replica. When you issue a sort query, 
> it seems to me that, the value of the sort-field of ALL docs under the 
> Solr-node is added to the FieldCache. [...]

I haven't used it yet, but DocValues in Solr 4.2 seems to be the answer.

- Toke Eskildsen



Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen

Hi

We have a lot of docs in Solr. Each particular Solr-node handles a lot 
of docs distributed among several replica. When you issue a sort query, 
it seems to me that, the value of the sort-field of ALL docs under the 
Solr-node is added to the FieldCache. This leads to OOM-exceptions at 
some point when you have enough docs under the Solr-node - relative to 
its Xmx of course. Are there any "tricks" to get around this issue, so 
that a sort-query will never trigger an OOM, no matter how many docs are 
handled by a particular Solr-node. Of course you need to be ready to 
accept the penalty of more disk-IO as soon as the entire thing does not 
fit in memory, but I would rather accept that than accept OOM's.


Regards, Per Steffensen


Using FieldCache in SolrIndexSearcher for distributed id retrieval

2013-01-29 Thread Michael Ryan
Following up from a post I made back in 2011...

> I am a user of Solr 3.2 and I make use of the distributed search capabilities 
> of Solr using
> a fairly simple architecture of a coordinator + some shards.
> 
> Correct me if I am wrong:  In a standard distributed search with 
> QueryComponent, the first
> query sent to the shards asks for fl=myUniqueKey or fl=myUniqueKey,score.  
> When the response
> is being generated to send back to the coordinator, SolrIndexSearcher.doc 
> (int i, Set
> fields) is called for each document.  As I understand it, this will read each 
> document from
> the index _on disk_ and retrieve the myUniqueKey field value for each 
> document.
> 
> My idea is to have a FieldCache for the myUniqueKey field in 
> SolrIndexSearcher (or somewhere
> else?) that would be used in cases where the only field that needs to be 
> retrieved is myUniqueKey.
>  Is this something that would improve performance?
> 
> In our actual setup, we are using an extended version of QueryComponent that 
> queries for a
> couple other fields besides myUniqueKey in the initial query to the shards, 
> and it asks a
> lot of rows when doing so, many more than what the user ends up getting back 
> when they see
> the results.  (The reasons for this are complicated and aren't related much 
> to this question.)
>  We already maintain FieldCaches for the fields that we are asking for, but 
> for other purposes.
>  Would it make sense to utilize these FieldCaches in SolrIndexSearcher?  Is 
> this something
> that anyone else has done before?

We did end up doing this inside of the SolrIndexSearcher.doc() method. 
Basically I check if the fields Set only contains fields that I am willing to 
use the FieldCache for, and if so, build up the Document from the data inside 
of the FieldCache. Basically looks like this...

if (fieldNamesToRetrieveFromFieldCache.containsAll(fields)) {
  d = new Document();
  if (fields.contains("myUniqueKeyField")) {
    long value = FieldCache.DEFAULT.getLongs(reader, "myUniqueKeyField")[i];
    if (value != 0) {
      d.add(new NumericField("myUniqueKeyField", Field.Store.YES, true).setLongValue(value));
    }
  }
  if (fields.contains("someOtherField")) {
    long value = FieldCache.DEFAULT.getLongs(reader, "someOtherField")[i];
    if (value != 0) {
      d.add(new NumericField("someOtherField", Field.Store.YES, true).setLongValue(value));
    }
  }
}

I don't have a more generalized patch that makes it easily configurable, but 
the idea is fairly simple.

We have had good results from this. For a system of n shards, this reduces the 
average number of docs to retrieve from disk per shard from rows to rows/n. For 
requests with a large rows parameter (e.g., 1000) and many shards, this makes a 
noticeable difference in response time. Obviously this isn't the typical Solr 
use case, so your mileage may vary. 

-Michael
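The rows to rows/n claim above is just the arithmetic of the two distributed phases: with unique keys served from the FieldCache, only the merged final page (rows docs total, spread across the shards) is read from stored fields on disk. A toy model (the function name is hypothetical):

```python
def avg_stored_docs_per_shard(rows, shards, ids_from_field_cache):
    """Average stored-document fetches per shard for one distributed query.

    Without the FieldCache trick, every shard materializes `rows` docs in
    the ID-retrieval phase; with it, only the merged final page (rows docs
    total, spread evenly across shards) is read from disk.
    """
    return rows / shards if ids_from_field_cache else rows

print(avg_stored_docs_per_shard(1000, 10, True))   # 100.0
print(avg_stored_docs_per_shard(1000, 10, False))  # 1000
```

This also shows why the gain grows with both the rows parameter and the shard count, and why it is negligible for small result pages.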


Re: multivalued field question (FieldCache error)

2012-10-08 Thread giovanni.bricc...@banzai.it

Thank you very much!

I've put every fl field in my solrconfig on a single line with the spaces 
removed, and now the app works fine


Giovanni

On 05/10/12 20:49, Chris Hostetter wrote:

: So extracting the attachment you will be able to track down what happens
:
: this is the query that shows the error, and below you can see the latest stack
: trace and the qt definition

Awesome -- exactly what we needed.

I've reproduced your problem, and verified that it has something to do
with the extra newlines which are confusing the parsing into not
recognizing "store_slug" as a simple field name.

The workaround is to modify the fl in your config to look like this...

  sku,store_slug

...or even like this...

 sku,  store_slug   

...and then it should work fine.

having a newline immediately following the store_slug field name is
somehow confusing things, and making it not recognize "store_slug" as a
simple field name -- so then it tries to parse it as a function, and
since bare field names can also be used as functions that parsing works,
but then you get the error that the field can't be used as a function
since it's multivalued.

I'll try to get a fix for this into 4.0-FINAL...

https://issues.apache.org/jira/browse/SOLR-3916

-Hoss






Re: multivalued field question (FieldCache error)

2012-10-05 Thread Chris Hostetter

: So extracting the attachment you will be able to track down what happens
: 
: this is the query that shows the error, and below you can see the latest stack
: trace and the qt definition

Awesome -- exactly what we needed.

I've reproduced your problem, and verified that it has something to do 
with the extra newlines which are confusing the parsing into not 
recognizing "store_slug" as a simple field name.

The workaround is to modify the fl in your config to look like this...

 sku,store_slug

...or even like this...

sku,  store_slug   

...and then it should work fine.  

having a newline immediately following the store_slug field name is 
somehow confusing things, and making it not recognize "store_slug" as a 
simple field name -- so then it tries to parse it as a function, and 
since bare field names can also be used as functions that parsing works, 
but then you get the error that the field can't be used as a function 
since it's multivalued.

I'll try to get a fix for this into 4.0-FINAL...

https://issues.apache.org/jira/browse/SOLR-3916

-Hoss
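A toy Python model of the failure mode (this is not Solr's actual parser): a bare identifier in fl is taken as a field name, and anything else falls back to function parsing, where a multivalued field triggers the FieldCache error. If the token is not trimmed before matching, a trailing newline pushes it down the function path.

```python
import re

# A bare field name: identifier characters only, up to end of string.
BARE_FIELD = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')

def classify(fl, trim=True):
    """Return ('field' | 'function', token) pairs for an fl string.

    With trim=False we mimic the bug: the untrimmed token ends in a
    newline, fails the bare-name match, and is parsed as a function.
    """
    result = []
    for token in fl.split(','):
        candidate = token.strip() if trim else token
        kind = 'field' if BARE_FIELD.match(candidate) else 'function'
        result.append((kind, candidate.strip()))
    return result

print(classify("sku,store_slug\n"))              # both classified 'field'
print(classify("sku,store_slug\n", trim=False))  # second falls to 'function'
```

Trimming before the match is essentially the single-line fl workaround applied inside the parser, which is what SOLR-3916 tracks.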


Re: multivalued field question (FieldCache error)

2012-10-04 Thread giovanni.bricc...@banzai.it

Thank you for the support!

Unfortunately my configuration is very large, but I was able to 
reproduce the error in a new test collection (I have a multicore setup).

So extracting the attachment you will be able to track down what happens

this is the query that shows the error, and below you can see the latest 
stack trace and the qt definition


i'm using solr version "4.0.0-BETA 1370099 - rmuir - 2012-08-06 22:50:47"

http://src-eprice-dev:8080/solr/test/select?q=ciao&wt=xml&qt=eprice

SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on 
multivalued field: store_slug
at 
org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:174)

at org.apache.solr.schema.StrField.getValueSource(StrField.java:44)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:376)
at 
org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:70)

at org.apache.solr.search.QParser.getQuery(QParser.java:145)
at org.apache.solr.search.ReturnFields.add(ReturnFields.java:289)
at 
org.apache.solr.search.ReturnFields.parseFieldList(ReturnFields.java:115)

at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:101)
at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:77)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:97)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:185)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:662)


  

 dismax
 explicit
 1
 
sku^1
 
 
sku^1
 
 
sku,store_slug
 

 
 
 2
 2
 *:*

true
1
0
count
store_slug
false


  

On 03/10/12 19:51, Chris Hostetter wrote:

: Here is the stack trace

what exactly does your fl param look like when you get this error?  and
what exactly are the field/fieldType declarations for each of the fields
in your fl?

Because if i'm reading this correctly, Solr thinks you are trying to
include in the response the results of a function on your store_slug
field, ie...

   fl=foo, bar, baz, somefunction(store_slug)

...it's possible there is a bug in the parsing code -- it includes some
heuristics to deal with the possibility of atypical field names that might
look like function names, but it shouldn't get confused by a field name as
simple as "store_slug" which leads me to believe something earlier in the
fl list is confusing it.

(Details really matter.  When you only give us part of the information
-- ie: "..." in your solrconfig, a one line error message instead of the
full stack trace -- and we have to ask lots of follow up questions to get
the basic info about what/how you got an error, it really makes it hard to
help diagnose problems)


: Oct 3, 2012 3:07:38 PM org.apache.solr.common.SolrException log
: SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on
: multivalued field: store_slug
: at
: org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:174)
: at org.apache.solr.schema.StrField.getValueSource(StrField.java:44)
: at
: 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:376)
: at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:70)
: at org.apache.solr.search.QParser.getQuery(QParser.java:145)
: at org.apache.solr.search.ReturnFields.add(ReturnFields.java:289)
: at
: org.apache.solr.search.ReturnFields.parseFieldList(ReturnFields.java:115)
: at org.apache

Re: multivalued field question (FieldCache error)

2012-10-03 Thread Chris Hostetter

: Here is the stack trace

what exactly does your fl param look like when you get this error?  and 
what exactly are the field/fieldType declarations for each of the fields 
in your fl?

Because if i'm reading this correctly, Solr thinks you are trying to 
include in the response the results of a function on your store_slug 
field, ie... 

  fl=foo, bar, baz, somefunction(store_slug)

...it's possible there is a bug in the parsing code -- it includes some 
heuristics to deal with the possibility of atypical field names that might 
look like function names, but it shouldn't get confused by a field name as 
simple as "store_slug" which leads me to believe something earlier in the 
fl list is confusing it.

(Details really matter.  When you only give us part of the information 
-- ie: "..." in your solrconfig, a one line error message instead of the 
full stack trace -- and we have to ask lots of follow up questions to get 
the basic info about what/how you got an error, it really makes it hard to 
help diagnose problems)


: Oct 3, 2012 3:07:38 PM org.apache.solr.common.SolrException log
: SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on
: multivalued field: store_slug
: at
: org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:174)
: at org.apache.solr.schema.StrField.getValueSource(StrField.java:44)
: at
: 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:376)
: at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:70)
: at org.apache.solr.search.QParser.getQuery(QParser.java:145)
: at org.apache.solr.search.ReturnFields.add(ReturnFields.java:289)
: at
: org.apache.solr.search.ReturnFields.parseFieldList(ReturnFields.java:115)
: at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:101)
: at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:77)
: at
: 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:97)
: at
: 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:185)
: at
: 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
: at
: 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
: at
: 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
: at
: 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
: at
: 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
: at
: 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
: at
: 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
: at
: org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
: at
: org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
: at
: 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
: at
: org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
: at
: org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
: at
: 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
: at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
: at java.lang.Thread.run(Thread.java:662)
: 
: On 02/10/12 19:40, Chris Hostetter wrote:
: > : I'm also using that field for a facet:
: > 
: > Hmmm... that still doesn't make sense.  faceting can use FieldCache, but
: > it will check if the field is multivalued to decide if/when/how to do this.
: > 
: > There's nothing else in your requestHandler config that would suggest why
: > you might get this error.
: > 
: > can you please provide more details about the error you are getting -- in
: > particular: the complete stack trace from the server logs.  that should
: > help us identify the code path leading to the problem.
: > 
: > 
: > :
: > : |
: > : 
: > :  dismax
: > :  explicit
: > :  1
: > :  
: > :many field but not store_slug
: > :  
: > :  
: > :|many field but not store_slug|||
: > : 
: > : ..., store_slug
: > :  
: > :   
: > :  2
: > :  2
: > :  *:*
: > : default
: > :   true
: > :   true
: > :   10
: > :   true  
: > : true
: > : 1
: > : 0
: > : count
: > : ...
: > : store_slug
: > : ...
: > : false
: > : 
: > : 
: > :   spellcheck
: > : 
: > :
: > :   |
: > :
: > :
: > : Il 01/10/12 18:34

Re: multivalued field question (FieldCache error)

2012-10-03 Thread giovanni.bricc...@banzai.it

Here is the stack trace



Oct 3, 2012 3:07:38 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on 
multivalued field: store_slug
at 
org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:174)

at org.apache.solr.schema.StrField.getValueSource(StrField.java:44)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:376)
at 
org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:70)

at org.apache.solr.search.QParser.getQuery(QParser.java:145)
at org.apache.solr.search.ReturnFields.add(ReturnFields.java:289)
at 
org.apache.solr.search.ReturnFields.parseFieldList(ReturnFields.java:115)

at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:101)
at org.apache.solr.search.ReturnFields.<init>(ReturnFields.java:77)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:97)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:185)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:662)

On 02/10/12 19:40, Chris Hostetter wrote:

: I'm also using that field for a facet:

Hmmm... that still doesn't make sense.  faceting can use FieldCache, but
it will check if the field is multivalued to decide if/when/how to do this.

There's nothing else in your requestHandler config that would suggest why
you might get this error.

can you please provide more details about the error you are getting -- in
particular: the complete stack trace from the server logs.  that should
help us identify the code path leading to the problem.


:
: |
: 
:  dismax
:  explicit
:  1
:  
:many field but not store_slug
:  
:  
:|many field but not store_slug|||
: 
: ..., store_slug
:  
:   
:  2
:  2
:  *:*
: default
:   true
:   true
:   10
:   true  
: true
: 1
: 0
: count
: ...
: store_slug
: ...
: false
: 
: 
:   spellcheck
: 
:
:   |
:
:
: On 01/10/12 18:34, Erik Hatcher wrote:
: > How is your request handler defined?  Using store_slug for anything but fl?
: >
: > Erik
: >
: > On Oct 1, 2012, at 10:51,"giovanni.bricc...@banzai.it"
: >   wrote:
: >
: > > Hello,
: > >
: > > I would like to put a multivalued field into a qt definition as output
: > > field. to do this I edit the current solrconfig.xml definition and add the
: > > field in the fl specification.
: > >
: > > Unexpectedly when I do the query q=*:*&qt=mytype I get the error
: > >
: > > 
: > > can not use FieldCache on multivalued field: store_slug
: > > 
: > >
: > > But if I instead run the query
: > >
: > > 
http://src-eprice-dev:8080/solr/0/select/?q=*:*&qt=mytype&fl=otherfield,mymultivaluedfiled
: > >
: > > I don't get the error
: > >
: > > Have you got any suggestions?
: > >
: > > I'm using solr 4 beta
: > >
: > > solr-spec 4.0.0.2012.08.06.22.50.47
: > > lucene-impl 4.0.0-BETA 1370099
: > >
: > >
: > > Giovanni
:
:
: --
:
:
:  Giovanni Bricconi
:
: Banzai Consulting
: cell. 348 7283865
: ufficio 02 00643839
: via Gian Battista Vico 42
: 20132 Milano (MI)
:
:
:
:

-Hoss



--


 Giovanni Bricconi

Banzai Consulting
cell. 348 7283865
ufficio 02 00643839
via Gian Battista Vico 42
20132 Milano (MI)


Re: multivalued field question (FieldCache error)

2012-10-02 Thread Chris Hostetter

: I'm also using that field for a facet:

Hmmm... that still doesn't make sense.  faceting can use FieldCache, but
it will check if the field is multivalued to decide if/when/how to do this.

There's nothing else in your requestHandler config that would suggest why 
you might get this error.

can you please provide more details about the error you are getting -- in
particular: the complete stack trace from the server logs.  that should
help us identify the code path leading to the problem.


: 
: |
: 
:  dismax
:  explicit
:  1
:  
:many field but not store_slug
:  
:  
:|many field but not store_slug|||
: 
: ..., store_slug
:  
:   
:  2
:  2
:  *:*
: default
:   true
:   true
:   10
:   true  
: true
: 1
: 0
: count
: ...
: store_slug
: ...
: false
: 
: 
:   spellcheck
: 
: 
:   |
: 
: 
: On 01/10/12 18:34, Erik Hatcher wrote:
: > How is your request handler defined?  Using store_slug for anything but fl?
: > 
: > Erik
: > 
: > On Oct 1, 2012, at 10:51,"giovanni.bricc...@banzai.it"
: >   wrote:
: > 
: > > Hello,
: > > 
: > > I would like to put a multivalued field into a qt definition as output
: > > field. To do this I edit the current solrconfig.xml definition and add the
: > > field in the fl specification.
: > > 
: > > Unexpectedly when I do the query q=*:*&qt=mytype I get the error
: > > 
: > > 
: > > can not use FieldCache on multivalued field: store_slug
: > > 
: > > 
: > > But if I instead run the query
: > > 
: > > 
http://src-eprice-dev:8080/solr/0/select/?q=*:*&qt=mytype&fl=otherfield,mymultivaluedfiled
: > > 
: > > I don't get the error
: > > 
: > > Have you got any suggestions?
: > > 
: > > I'm using solr 4 beta
: > > 
: > > solr-spec 4.0.0.2012.08.06.22.50.47
: > > lucene-impl 4.0.0-BETA 1370099
: > > 
: > > 
: > > Giovanni
: 
: 
: -- 
: 
: 
:  Giovanni Bricconi
: 
: Banzai Consulting
: cell. 348 7283865
: ufficio 02 00643839
: via Gian Battista Vico 42
: 20132 Milano (MI)
: 
: 
: 
: 

-Hoss
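Hoss's point above — that faceting checks whether a field is multivalued before deciding how to use FieldCache — reflects the basic FieldCache contract: un-inverting a field yields a flat array with exactly one value slot per document, which is only well-defined for single-valued fields. A self-contained toy model of that constraint (plain Python illustrating the idea; this is not Lucene API):

```python
# Toy model of Lucene's FieldCache un-inversion: one value slot per docID.
# A multivalued field has no place to put a second value, so the cache
# refuses to build rather than silently picking one value.

def uninvert(docs, field):
    """Build a docID -> value array for `field`, as FieldCache would."""
    cache = [None] * len(docs)
    for doc_id, doc in enumerate(docs):
        values = doc.get(field, [])
        if len(values) > 1:
            raise ValueError(
                "can not use FieldCache on multivalued field: %s" % field)
        if values:
            cache[doc_id] = values[0]
    return cache

docs = [
    {"store_slug": ["shop-a"]},
    {"store_slug": ["shop-b"]},
]
print(uninvert(docs, "store_slug"))  # ['shop-a', 'shop-b']

docs[1]["store_slug"].append("shop-c")  # field is now multivalued
try:
    uninvert(docs, "store_slug")
except ValueError as e:
    print(e)  # can not use FieldCache on multivalued field: store_slug
```

Raising an error rather than guessing is deliberate: any single pick (first value, last value, "most important" value) would be wrong for some use-case.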


Re: multivalued field question (FieldCache error)

2012-10-01 Thread giovanni.bricc...@banzai.it



I'm also using that field for a facet:

|

 dismax
 explicit
 1
 
   many field but not store_slug
 
 
   |many field but not store_slug|||
   
 

..., store_slug
 
 
 

 2
 2
 *:*
default
  true
  true
  10
  true  


true
1
0
count
...
store_slug
...
false


  spellcheck


  |


On 01/10/12 18:34, Erik Hatcher wrote:

How is your request handler defined?  Using store_slug for anything but fl?

Erik

On Oct 1, 2012, at 10:51,"giovanni.bricc...@banzai.it"  
  wrote:


Hello,

I would like to put a multivalued field into a qt definition as output field. 
To do this I edit the current solrconfig.xml definition and add the field in 
the fl specification.

Unexpectedly when I do the query q=*:*&qt=mytype I get the error


can not use FieldCache on multivalued field: store_slug


But if I instead run the query

http://src-eprice-dev:8080/solr/0/select/?q=*:*&qt=mytype&fl=otherfield,mymultivaluedfiled

I don't get the error

Have you got any suggestions?

I'm using solr 4 beta

solr-spec 4.0.0.2012.08.06.22.50.47
lucene-impl 4.0.0-BETA 1370099


Giovanni



--


 Giovanni Bricconi

Banzai Consulting
cell. 348 7283865
ufficio 02 00643839
via Gian Battista Vico 42
20132 Milano (MI)





Re: Understanding fieldCache SUBREADER "insanity"

2012-10-01 Thread Aaron Daubman
Hi Yonik,

I've been attempting to fix the SUBREADER insanity in our custom
component, and have made perhaps some progress (or is this worse?) -
I've gone from SUBREADER to VALUEMISMATCH insanity:
---snip---
entries_count : 12
entry#0 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',class
org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=>org.apache.lucene.util.FixedBitSet#1387502754
entry#1 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_track_count',class
org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=>org.apache.lucene.util.Bits$MatchAllBits#233863705
entry#2 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#652215925
entry#3 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class
java.lang.String,null=>[Ljava.lang.String;#1036517187
entry#4 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'thingID',class
java.lang.String,null=>[Ljava.lang.String;#357017445
entry#5 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#322888397
entry#6 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=>org.apache.lucene.search.FieldCache$CreationPlaceholder#1229311421
entry#7 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,null=>[F#322888397
entry#8 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER=>org.apache.lucene.search.FieldCache$CreationPlaceholder#92920526
entry#9 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,null=>[I#494669113
entry#10 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#494669113
entry#11 : 
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_track_count',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#994584654
insanity_count : 1
insanity#0 : VALUEMISMATCH: Multiple distinct value objects for
MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")+s_artistID
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class
org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#652215925
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class
java.lang.String,null=>[Ljava.lang.String;#1036517187
---snip---

Any suggestions on what the cause of this VALUEMISMATCH is, whether it is
the "normal" case, or how to "fix" it?

For anybody else with SUBREADER insanity issues, this is the change I
made to get this far (get the first leafReader, since we are using a
merged/optimized index):
---snip---
SolrIndexReader reader = searcher.getReader().getLeafReaders()[0];
collapseIDs = FieldCache.DEFAULT.getInts(reader, COLLAPSE_KEY_NAME);
hotnessValues = FieldCache.DEFAULT.getFloats(reader, HOTNESS_KEY_NAME);
artistIDs = FieldCache.DEFAULT.getStrings(reader, ARTIST_KEY_NAME);
---snip---

Thanks,
 Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley  wrote:
>> already-optimized, single-segment index
>
> That part is interesting... if true, then the type of "insanity" you
> saw should be impossible, and either the insanity detection or
> something else is broken.
>
> -Yonik
> http://lucidworks.com
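Aaron's fix above takes only the first leaf reader, which works here only because the index is merged down to a single segment. In the general multi-segment case, each segment's cache array is indexed by segment-local docIDs, and a global docID is resolved through the segment's docBase offset. A toy illustration of that mapping (plain Python; not actual Lucene API):

```python
# Toy model of per-segment FieldCache arrays: each segment caches values
# for its own local docIDs; a global docID maps to (segment, local id)
# via the segment's docBase, so no duplicate top-level cache is needed.
segments = [
    {"docBase": 0, "values": [10, 11]},  # segment 0 holds docs 0-1
    {"docBase": 2, "values": [12]},      # segment 1 holds doc 2
]

def value_for(global_doc_id):
    # Find the last segment whose docBase is <= the global docID.
    for seg in reversed(segments):
        if global_doc_id >= seg["docBase"]:
            return seg["values"][global_doc_id - seg["docBase"]]
    raise IndexError(global_doc_id)

print([value_for(d) for d in range(3)])  # [10, 11, 12]
```

Reading per segment like this is what avoids the duplicated top-level entries that the SUBREADER insanity check flags.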


Re: multivalued field question (FieldCache error)

2012-10-01 Thread Erik Hatcher
How is your request handler defined?  Using store_slug for anything but fl?

   Erik

On Oct 1, 2012, at 10:51, "giovanni.bricc...@banzai.it" 
 wrote:

> Hello,
> 
> I would like to put a multivalued field into a qt definition as output field. 
> To do this I edit the current solrconfig.xml definition and add the field in 
> the fl specification.
> 
> Unexpectedly when I do the query q=*:*&qt=mytype I get the error
> 
> 
> can not use FieldCache on multivalued field: store_slug
> 
> 
> But if I instead run the query
> 
> http://src-eprice-dev:8080/solr/0/select/?q=*:*&qt=mytype&fl=otherfield,mymultivaluedfiled
> 
> I don't get the error
> 
> Have you got any suggestions?
> 
> I'm using solr 4 beta
> 
> solr-spec 4.0.0.2012.08.06.22.50.47
> lucene-impl 4.0.0-BETA 1370099
> 
> 
> Giovanni


multivalued field question (FieldCache error)

2012-10-01 Thread giovanni.bricc...@banzai.it

Hello,

I would like to put a multivalued field into a qt definition as output 
field. To do this I edit the current solrconfig.xml definition and add 
the field in the fl specification.


Unexpectedly when I do the query q=*:*&qt=mytype I get the error


can not use FieldCache on multivalued field: store_slug


But if I instead run the query

http://src-eprice-dev:8080/solr/0/select/?q=*:*&qt=mytype&fl=otherfield,mymultivaluedfiled

I don't get the error

Have you got any suggestions?

I'm using solr 4 beta

solr-spec 4.0.0.2012.08.06.22.50.47
lucene-impl 4.0.0-BETA 1370099


Giovanni


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-21 Thread Aaron Daubman
Yonik, et al.

I believe I found the section of code pushing me into 'insanity' status:
---snip---
int[] collapseIDs = null;
float[] hotnessValues = null;
String[] artistIDs = null;
try {
collapseIDs = FieldCache.DEFAULT.getInts(searcher.getIndexReader(), COLLAPSE_KEY_NAME);
hotnessValues = FieldCache.DEFAULT.getFloats(searcher.getIndexReader(), HOTNESS_KEY_NAME);
artistIDs = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), ARTIST_KEY_NAME);
} ...
---snip---

Since it seems like this code is using the 'old-style' pre-Lucene 2.9
top-level indexReaders, is there any example code you can point me to
that could show how to convert to using the leaf level segmentReaders?
If the limited information I've been able to find is correct, this
could explain some of the significant memory usage I am seeing...

Thanks again,
 Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley  wrote:
>> already-optimized, single-segment index
>
> That part is interesting... if true, then the type of "insanity" you
> saw should be impossible, and either the insanity detection or
> something else is broken.
>
> -Yonik
> http://lucidworks.com


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Yonik Seeley
> already-optimized, single-segment index

That part is interesting... if true, then the type of "insanity" you
saw should be impossible, and either the insanity detection or
something else is broken.

-Yonik
http://lucidworks.com


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Tomás Fernández Löbbe
Some function queries also use the field cache. I *think* those usually use
the segment level cache, but I'm not sure.

On Wed, Sep 19, 2012 at 4:36 PM, Yonik Seeley  wrote:

> The other thing to realize is that it's only "insanity" if it's
> unexpected or not-by-design (so the term is rather mis-named).
> It's more for core developers - if you are just using Solr without
> custom plugins, don't worry about it.
>
> -Yonik
> http://lucidworks.com
>
>
> On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe
>  wrote:
> > Hi Aaron, here there is some information about the "insanity count":
> > http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
> >
> > As for the SUBREADER type, the javadocs say:
> > "Indicates an overlap in cache usage on a given field in sub/super
> > readers."
> >
> > This probably means that you are using the same field for faceting and
> > for sorting (tf_normalizedTotalHotttnesss): sorting uses the segment-level
> > cache, and faceting by default uses the global field cache. This can be a
> > problem because the field is duplicated in cache, and then it uses twice
> > the memory.
> >
> > One way to solve this would be to change the faceting method on that
> > field to 'fcs', which uses the segment-level cache (but may be a little
> > bit slower).
> >
> > Tomás
> >
> >
> > On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman 
> wrote:
> >
> >> Hi all,
> >>
> >> In reviewing a solr instance with somewhat variable performance, I
> >> noticed that its fieldCache stats show an insanity_count of 1 with the
> >> insanity type SUBREADER:
> >>
> >> ---snip---
> >> insanity_count : 1
> >> insanity#0 : SUBREADER: Found caches for descendants of
> >> ReadOnlyDirectoryReader(segments_k
> >> _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
> >> 'ReadOnlyDirectoryReader(segments_k
> >>
> >>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
> >> 'ReadOnlyDirectoryReader(segments_k
> >>
> >>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
> >>
> >>
> 'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
> >> ---snip---
> >>
> >> How can I decipher what this means and what, if anything, I should do
> >> to fix/improve the "insanity"?
> >>
> >> Thanks,
> >>  Aaron
> >>
>
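Tomás's 'fcs' suggestion above is controlled by the facet.method request parameter, which Solr also accepts as a per-field override. A sketch of the request parameters, using the field name from this thread as an assumption (facet.method=fcs is the per-segment method for single-valued fields, so it can share segment-level cache entries with sorting):

```
q=*:*
facet=true
facet.field=tf_normalizedTotalHotttnesss
f.tf_normalizedTotalHotttnesss.facet.method=fcs
```

The same parameters can also be set as defaults in the requestHandler definition in solrconfig.xml.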

