Re: What controls field cache size and eviction rates?
Should say -- Can anyone confirm if it's right *still*, since the article is 10 years old :)

On Fri, Mar 5, 2021 at 10:36 AM Stephen Lewis Bianamara <stephen.bianam...@gmail.com> wrote:
> Hi SOLR Community,
>
> Just following up here with an update. I found this article, which goes
> into depth on the field cache though stops short of discussing how it
> handles eviction. Can anyone confirm if this info is right?
>
> https://lucidworks.com/post/scaling-lucene-and-solr/
>
> Also, can anyone speak to how the field cache handles evictions?
>
> Best,
> Stephen
Re: What controls field cache size and eviction rates?
Hi SOLR Community,

Just following up here with an update. I found this article, which goes into depth on the field cache though stops short of discussing how it handles eviction. Can anyone confirm if this info is right?

https://lucidworks.com/post/scaling-lucene-and-solr/

Also, can anyone speak to how the field cache handles evictions?

Best,
Stephen

On Wed, Feb 24, 2021 at 4:43 PM Stephen Lewis Bianamara <stephen.bianam...@gmail.com> wrote:
> Hi SOLR Community,
>
> I've been trying to understand how the field cache in SOLR manages its
> evictions; neither the code nor the documentation makes it easy to answer
> the simple question of when and how something gets evicted from the field
> cache. This cache also doesn't show hit ratio, total hits, eviction ratio,
> total evictions, etc. in the web UI.
>
> For example: I've observed that if I write one document and trigger a
> query with a sort on the field, it will generate two entries in the field
> cache. Then if I repush the document, the entries get removed, but they
> will otherwise stay there seemingly forever. If my query matches 2 docs,
> same thing but with 4 entries (2 each). Then, if I rewrite one of the
> docs, those two entries go away but not the two from the first one. This
> obviously implies that this cache has implications for write-throughput
> performance, so the fact that it is not configurable by the user and
> doesn't have very clear documentation is a bit worrisome.
>
> Can someone here help out and explain how the field cache handles
> evictions, or perhaps point me to the documentation if I missed it?
>
> Thanks!
> Stephen
What controls field cache size and eviction rates?
Hi SOLR Community,

I've been trying to understand how the field cache in SOLR manages its evictions; neither the code nor the documentation makes it easy to answer the simple question of when and how something gets evicted from the field cache. This cache also doesn't show hit ratio, total hits, eviction ratio, total evictions, etc. in the web UI.

For example: I've observed that if I write one document and trigger a query with a sort on the field, it will generate two entries in the field cache. Then if I repush the document, the entries get removed, but they will otherwise stay there seemingly forever. If my query matches 2 docs, same thing but with 4 entries (2 each). Then, if I rewrite one of the docs, those two entries go away but not the two from the first one. This obviously implies that this cache has implications for write-throughput performance, so the fact that it is not configurable by the user and doesn't have very clear documentation is a bit worrisome.

Can someone here help out and explain how the field cache handles evictions, or perhaps point me to the documentation if I missed it?

Thanks!
Stephen
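For readers of the archive: the behavior described above is consistent with a cache that is keyed per index segment and purged only when a segment's reader closes. The sketch below is a simplified illustration of that design, assuming (not quoting) Lucene's FieldCacheImpl; the class and method names are hypothetical, and there is deliberately no size limit or LRU in it, because the real field cache has none either.

import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-segment field cache. Entries live for as
// long as their segment's reader is open; "eviction" only happens when
// the reader closes (e.g. the segment was rewritten or merged away after
// a re-push and a commit swapped in a new searcher).
class FieldCacheSketch {

  interface SegmentHandle {
    Object coreKey();                 // identity of the underlying segment
    Object uninvert(String field);    // build the in-memory field values
  }

  // weak keys: once nothing references a closed segment, its map goes away
  private final Map<Object, Map<String, Object>> cache = new WeakHashMap<>();

  synchronized Object getOrLoad(SegmentHandle segment, String field) {
    Map<String, Object> perSegment =
        cache.computeIfAbsent(segment.coreKey(), k -> new ConcurrentHashMap<>());
    // the first sort/facet on a field un-inverts it for this segment, once
    return perSegment.computeIfAbsent(field, segment::uninvert);
  }

  // wired up as a reader close listener in the real implementation
  synchronized void purgeSegment(Object coreKey) {
    cache.remove(coreKey);
  }
}

Under this model the observations above fall out naturally: re-pushing a document means the segment holding the old version is eventually rewritten or merged away, its reader closes, and only that segment's entries disappear; entries for untouched segments stay resident indefinitely.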
Re: Unknown field "cache"
Which are you using - schema.xml or managed-schema? You must be using one or the other, but not both. It's likely you're using managed-schema; that's where changes need to be made.

Best,
Erick

On Sun, Sep 2, 2018 at 11:55 AM Bineesh wrote:
>
> Hi Govind,
>
> Thanks for the reply. Please see below the schema.xml and managed-schema.
>
> 1: schema.xml
>
> precisionStep="6" positionIncrementGap="0"/>
> precisionStep="6" positionIncrementGap="0" multiValued="true"/>
> int, float, long, date, double, including the "Trie" variants.
>
> 2: managed-schema
>
> - For maximum indexing performance, use the ConcurrentUpdateSolrServer
> "add-unknown-fields-to-the-schema" update request processor chain declared
> stored="true"/>
> multiValued="true"/>
> stored="true"/>
> stored="true"/>
> This includes "string", "boolean", "int", "float", "long", "date", "double",
> multiValued="true"/>
> precisionStep="0" positionIncrementGap="0"/>
> precisionStep="0" positionIncrementGap="0" multiValued="true"/>
> precisionStep="6" positionIncrementGap="0"/>
> precisionStep="6" positionIncrementGap="0" multiValued="true"/>
>
> Please let me know if I'm missing anything
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Unknown field "cache"
Hi Govind,

Thanks for the reply. Please see below the schema.xml and managed-schema.

1: schema.xml

precisionStep="6" positionIncrementGap="0"/>
precisionStep="6" positionIncrementGap="0" multiValued="true"/>
int, float, long, date, double, including the "Trie" variants.

2: managed-schema

- For maximum indexing performance, use the ConcurrentUpdateSolrServer
"add-unknown-fields-to-the-schema" update request processor chain declared
stored="true"/>
multiValued="true"/>
stored="true"/>
stored="true"/>
This includes "string", "boolean", "int", "float", "long", "date", "double",
multiValued="true"/>
precisionStep="0" positionIncrementGap="0"/>
precisionStep="0" positionIncrementGap="0" multiValued="true"/>
precisionStep="6" positionIncrementGap="0"/>
precisionStep="6" positionIncrementGap="0" multiValued="true"/>

Please let me know if I'm missing anything

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Unknown field "cache"
Please mention the schema definition of date. If you edit the Solr schema manually, you need to reload the Solr core.

On Sat, Sep 1, 2018 at 3:38 AM kunhu0...@gmail.com wrote:
> Hello Team,
>
> Need suggestions on Solr indexing. We are using Solr 6.6.3 and Nutch 1.14.
>
> I see an unknown field 'cache' error while indexing the data to Solr, so I
> added the below entry in the field section of schema.xml for Solr.
>
> Tried indexing the data again and this time the error is unknown field
> 'date'. However I have the
>
> Please suggest
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
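As a concrete illustration of the advice above - a hedged sketch, since the actual field definitions were stripped from the archived messages: the missing fields would be declared in schema.xml (or managed-schema) with entries along these lines, where the types are assumptions based on the stock Solr 6.x schemas and should be adjusted to whatever Nutch actually sends:

<field name="cache" type="string" indexed="true" stored="true"/>
<field name="date" type="tdate" indexed="true" stored="true"/>

After a manual schema edit, reload the core so the change takes effect, for example (core name hypothetical):

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=nutch"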
Unknown field "cache"
Hello Team,

Need suggestions on Solr indexing. We are using Solr 6.6.3 and Nutch 1.14.

I see an unknown field 'cache' error while indexing the data to Solr, so I added the below entry in the field section of schema.xml for Solr.

Tried indexing the data again and this time the error is unknown field 'date'. However I have the

Please suggest

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Default Field Cache
Yes. Thanks.

On 9/1/16 4:53 AM, Alessandro Benedetti wrote:
> Are you looking for this?
>
> org/apache/solr/core/SolrConfig.java:243
>
> CacheConfig conf = CacheConfig.getConfig(this, "query/fieldValueCache");
> if (conf == null) {
>   Map<String, String> args = new HashMap<>();
>   args.put(NAME, "fieldValueCache");
>   args.put("size", "10000");
>   args.put("initialSize", "10");
>   args.put("showItems", "-1");
>   conf = new CacheConfig(FastLRUCache.class, args, null);
> }
> fieldValueCacheConfig = conf;
>
> Cheers
Re: Default Field Cache
Are you looking for this?

org/apache/solr/core/SolrConfig.java:243

CacheConfig conf = CacheConfig.getConfig(this, "query/fieldValueCache");
if (conf == null) {
  Map<String, String> args = new HashMap<>();
  args.put(NAME, "fieldValueCache");
  args.put("size", "10000");
  args.put("initialSize", "10");
  args.put("showItems", "-1");
  conf = new CacheConfig(FastLRUCache.class, args, null);
}
fieldValueCacheConfig = conf;

Cheers

On Thu, Sep 1, 2016 at 2:41 AM, Rallavagu wrote:
> But, the configuration is commented out (disabled). As the comments
> section mentions:
>
> "The fieldValueCache is created by default even if not configured here"
>
> I would like to know what the configuration of the default
> fieldValueCache created would be.

--
--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
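For reference, the implicit default in that code corresponds to an explicit solrconfig.xml entry along these lines - a sketch inferred from the code excerpt above rather than copied from a shipped config - placed inside the <query> section if you want the default to be visible and tunable:

<fieldValueCache class="solr.FastLRUCache"
                 size="10000"
                 initialSize="10"
                 showItems="-1"/>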
Re: Default Field Cache
But, the configuration is commented out (disabled). As the comments section mentions:

"The fieldValueCache is created by default even if not configured here"

I would like to know what the configuration of the default fieldValueCache created would be.

On 8/31/16 6:37 PM, Zheng Lin Edwin Yeo wrote:
> If I didn't get your question wrong, what you have listed is already the
> default configuration that comes with your version of Solr.
>
> Regards,
> Edwin
>
> On 30 August 2016 at 07:49, Rallavagu wrote:
>> Solr 5.4.1
>>
>> Wondering what is the default configuration for "fieldValueCache".
Re: Default Field Cache
If I didn't get your question wrong, what you have listed is already the default configuration that comes with your version of Solr.

Regards,
Edwin

On 30 August 2016 at 07:49, Rallavagu wrote:
> Solr 5.4.1
>
> Wondering what is the default configuration for "fieldValueCache".
Default Field Cache
Solr 5.4.1

Wondering what is the default configuration for "fieldValueCache".
Re: Disable or limit the size of Lucene field cache
A follow-up question: I see docValues has been in Lucene since 4.0, so can I use docValues with my current SolrCloud version, 4.8.x?

The reason I am asking is that I have the deployment mechanism and index security (using Tomcat Valves) all built out on Tomcat, which I would need to figure out again with Jetty. So I'm wondering if I could use docValues with Solr/Lucene 4.8.x.

--
View this message in context: http://lucene.472066.n3.nabble.com/Disable-or-limit-the-size-of-Lucene-field-cache-tp4198798p4209873.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Disable or limit the size of Lucene field cache
Thank you. This really helps.

--
View this message in context: http://lucene.472066.n3.nabble.com/Disable-or-limit-the-size-of-Lucene-field-cache-tp4198798p4199646.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting problem in Solr due to Lucene Field Cache
Take a look at Solr's use of DocValues: https://cwiki.apache.org/confluence/display/solr/DocValues. There are docValues options that use less memory than the FieldCache.

Joel Bernstein
Search Engineer at Heliosearch

On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son wrote:
> Hello, I'm struggling with large data indexed and searched by Solr.
>
> The schema of the documents consists of a date (YYYY-MM-DD), text
> (tokenized and indexed with the Natural Language Toolkit), and several
> numerical fields.
>
> Each document is small-sized, but the number of docs is very large -
> around 10 million per date. The server has 32GB of memory and I allocated
> around 30GB for the Solr JVM.
>
> My Solr server has to return documents sorted by one of the numerical
> fields when requested with a specific date and text
> (ex. q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in
> Lucene requires lots of Field Cache and Solr can't handle the Field Cache
> well. The Field Cache gets larger as more queries are executed and is not
> evicted. When the whole memory is filled with Field Cache, the Solr
> server stalls or generates an Out of Memory exception.
>
> Solr cannot control the Lucene field cache at all, so I'm having a
> difficult time solving this problem. I'm considering these three ways to
> solve it:
>
> 1) Add more memory.
> This can relieve the problem but I don't think it can completely solve
> it. The memory would still fill up with Field Cache as the server handles
> search requests.
>
> 2) Separate numerical data from text data.
> I find Solr/Lucene isn't suitable for sorting large numerical data.
> Therefore I'm thinking of storing the numerical data in another DB
> (HBase, MongoDB, ...); the Solr server would then just do text search.
>
> 3) Switch to Elasticsearch.
> According to this page (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
> ) Elasticsearch can control its field cache. I think ES could solve my
> problem.
>
> I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If
> you have any better ideas please let me know. I've gone through too many
> troubles, so it's time to make a decision. I want my choices reviewed by
> other excellent Solr users and developers and also want to find better
> solutions. I really appreciate any help you can provide.
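To make the suggestion concrete: docValues is enabled per field in the schema, roughly as below. The field name and type here are hypothetical stand-ins for the numerical sort field described in the quoted message, and the index must be rebuilt after the change:

<field name="rank_value" type="tlong" indexed="true" stored="true" docValues="true"/>

With docValues on the sort field, sorting reads column-oriented values from disk through the OS page cache instead of un-inverting the field into the JVM heap, which is exactly what fills the FieldCache.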
Sorting problem in Solr due to Lucene Field Cache
Hello, I'm struggling with large data indexed and searched by Solr.

The schema of the documents consists of a date (YYYY-MM-DD), text (tokenized and indexed with the Natural Language Toolkit), and several numerical fields.

Each document is small-sized, but the number of docs is very large - around 10 million per date. The server has 32GB of memory and I allocated around 30GB for the Solr JVM.

My Solr server has to return documents sorted by one of the numerical fields when requested with a specific date and text (ex. q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene requires lots of Field Cache and Solr can't handle the Field Cache well. The Field Cache gets larger as more queries are executed and is not evicted. When the whole memory is filled with Field Cache, the Solr server stalls or generates an Out of Memory exception.

Solr cannot control the Lucene field cache at all, so I'm having a difficult time solving this problem. I'm considering these three ways to solve it:

1) Add more memory.
This can relieve the problem but I don't think it can completely solve it. The memory would still fill up with Field Cache as the server handles search requests.

2) Separate numerical data from text data.
I find Solr/Lucene isn't suitable for sorting large numerical data. Therefore I'm thinking of storing the numerical data in another DB (HBase, MongoDB, ...); the Solr server would then just do text search.

3) Switch to Elasticsearch.
According to this page (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html) Elasticsearch can control its field cache. I think ES could solve my problem.

I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If you have any better ideas please let me know. I've gone through too many troubles, so it's time to make a decision. I want my choices reviewed by other excellent Solr users and developers and also want to find better solutions. I really appreciate any help you can provide.
[ANN] Heliosearch 0.03 with off-heap field cache
A new Heliosearch pre-release has been cut for people to try out:
https://github.com/Heliosearch/heliosearch/releases

Release Notes:
This is Heliosearch v0.03

Heliosearch is forked from Apache Solr and includes the following additional features:

- Off-Heap Filters to reduce garbage collection pauses and overhead.
  http://www.heliosearch.org/off-heap-filters
- Removed the 1024 limit on the number of clauses in a boolean query. For example:
  q=id:(doc1 doc2 doc3 doc4 doc5 ... doc2000)
  will now work correctly without throwing an exception.
- Deep Paging with cursorMark. This is not yet in a current release of Apache Solr, but should be in Solr 4.7
  http://heliosearch.org/solr/paging-and-deep-paging/
- nCache - the new Off-Heap FieldCache to reduce garbage collection overhead and accelerate sorting, faceting, and function queries.
  http://heliosearch.org/solr-off-heap-fieldcache

-Yonik
http://heliosearch.com -- making solr shine
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Patrick,

Are you getting these stalls following a commit? If so, then the issue is most likely fieldCache warming pauses. To stop your users from seeing this pause you'll need to add static warming queries to your solrconfig.xml to warm the fieldCache before the new searcher is registered.

On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone wrote:
> Well, I want to include everything that will start in the next 5 minute
> interval and everything that came before. The query is more like:
>
> fq=start_time:[* TO NOW+5MINUTE/5MINUTE]
>
> so that it rounds to the nearest 5 minute interval on the right-hand
> side. But, as soon as 1 second after that 5 minute window, everything
> pauses waiting for the filter cache (at least that's my working theory
> based on observation). Is it possible to do something like:
>
> fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE]
>
> where it would use the filter cache to narrow down by day resolution and
> then filter as part of the standard query, or something like that?
>
> My thought is that this would still gain a benefit from a query cache,
> but somewhat slower since it must remove results for things appearing
> later in the day.
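The static warming advice above translates into a newSearcher listener in solrconfig.xml. A hedged sketch - the sort field name is taken from this thread, and the warming queries would need to match the real production sort and facet load:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">start_time desc</str>
    </lst>
  </arr>
</listener>

A matching firstSearcher listener is usually configured as well, so the very first searcher after startup gets warmed the same way; with this in place the un-inversion cost is paid by the warming query instead of a user request.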
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like:

fq=start_time:[* TO NOW+5MINUTE/5MINUTE]

so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second after that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like:

fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE]

where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that?

My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day.

> If you want a start time within the next 5 minutes, I think your filter
> is not the right one. * will be replaced by the first date in your field.
>
> Try:
> fq=start_time:[NOW TO NOW+5MINUTE]
>
> Franck Brisbart
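For later readers: the split sketched above is a known technique for keeping a moving time window from thrashing the filterCache. A hedged illustration (query syntax only; exact parameter support depends on the Solr version in use):

fq=start_time:[* TO NOW/DAY+1DAY]
  - coarse bound, identical for a whole day, so one filterCache entry is reused all day
q=+text:KEYWORD +start_time:[* TO NOW/MINUTE]
  - fine-grained bound kept in the main query, outside the filterCache

or, on versions that support the cache local param, keeping the fine-grained bound as a non-cached filter:

fq={!cache=false}start_time:[* TO NOW/MINUTE]

Either way the idea is the same: round the cached clause so that many requests share one filterCache entry, and keep the fast-moving bound out of the cache.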
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well and the issue ALWAYS happens - no matter what I do. If I'm taking heavy traffic (200 requests per second), then as soon as I hit a 5 minute mark the world stops - garbage collection would be less predictable than that. Nearly all of my requests have this 5 minute windowing behavior on time though, which is why I have it as a strong suspect now. If it blocks on that - even for a couple of seconds - my traffic backlog will be 600-800 requests.

> Did you add the garbage collection JVM options I suggested?
>
> -XX:+UseG1GC -XX:MaxGCPauseMillis=50
>
> Guido.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Did you add the garbage collection JVM options I suggested?

-XX:+UseG1GC -XX:MaxGCPauseMillis=50

Guido.

On 09/12/13 16:33, Patrick O'Lone wrote:
> Unfortunately, in a test environment, this happens in version 4.4.0 of
> Solr as well.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
If you want a start time within the next 5 minutes, I think your filter is not the right one. * will be replaced by the first date in your field.

Try:
fq=start_time:[NOW TO NOW+5MINUTE]

Franck Brisbart

On Monday, 09 December 2013 at 09:07 -0600, Patrick O'Lone wrote:
> I have a new question about this issue - I create filter queries of the
> form:
>
> fq=start_time:[* TO NOW/5MINUTE]
>
> This is used to restrict the set of documents to only items that have a
> start time within the next 5 minutes. Most of my indexes have millions
> of documents, with few documents that start sometime in the future.
> Nearly all of my queries include this. Would this cause every other
> search thread to block until the filter query is re-cached every 5
> minutes, and if so, is there a better way to do it? Thanks for any
> continued help with this issue!
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well.

> I was trying to locate the release notes for 3.6.x; it is too old. If I
> were you I would update to 3.6.2 (from 3.6.1) - it shouldn't affect you
> since it is a minor release. Locate the release notes and see if
> something that is affecting you got fixed. Also, I would be thinking of
> moving on to 4.x, which is quite stable and fast.
>
> Like anything with Java and concurrency, it will just get better (and
> faster) with bigger numbers and concurrency frameworks becoming more and
> more reliable, standard and stable.
>
> Regards,
>
> Guido.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I was trying to locate the release notes for 3.6.x; it is too old. If I were you I would update to 3.6.2 (from 3.6.1) - it shouldn't affect you since it is a minor release. Locate the release notes and see if something that is affecting you got fixed. Also, I would be thinking of moving on to 4.x, which is quite stable and fast.

Like anything with Java and concurrency, it will just get better (and faster) with bigger numbers and concurrency frameworks becoming more and more reliable, standard and stable.

Regards,

Guido.

On 09/12/13 15:07, Patrick O'Lone wrote:
> I have a new question about this issue - I create filter queries of the
> form:
>
> fq=start_time:[* TO NOW/5MINUTE]
>
> This is used to restrict the set of documents to only items that have a
> start time within the next 5 minutes. Most of my indexes have millions
> of documents, with few documents that start sometime in the future.
> Nearly all of my queries include this. Would this cause every other
> search thread to block until the filter query is re-cached every 5
> minutes, and if so, is there a better way to do it? Thanks for any
> continued help with this issue!
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I have a new question about this issue - I create filter queries of the form:

fq=start_time:[* TO NOW/5MINUTE]

This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start sometime in the future. Nearly all of my queries include this. Would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue!

> We have a webapp running with a very high HEAP size (24GB) and we have
> no problems with it AFTER we enabled the new GC that is meant to replace
> the CMS GC sometime in the future, but you have to have Java 6 update
> "some number I couldn't find, but latest should cover" to be able to
> use it:
>
> 1. Remove all GC options you have and...
> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>
> As a test, of course. More information you can read in the following
> (and interesting) article; we also have Solr running with these
> options - no more pauses or HEAP size hitting the sky.
>
> Don't get bored reading the 1st (and small) introduction page of the
> article; pages 2 and 3 will make a lot of sense:
> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>
> HTH,
>
> Guido.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
We have a webapp running with a very high heap size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you have to have Java 6 update "Some number I couldn't find but latest should cover" to be able to use it:

1. Remove all GC options you have and...
2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"

As a test, of course; you can read more in the following (and interesting) article - we also have Solr running with these options, and no more pauses or heap size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article; pages 2 and 3 will make a lot of sense:

http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061

HTH,

Guido.

On 26/11/13 21:59, Patrick O'Lone wrote:

We do perform a lot of sorting - on multiple fields, in fact. We have different kinds of Solr configurations - our news searches do little with regard to faceting, but sort heavily. We provide classified ad searches, and those use faceting heavily. I might try reducing the JVM memory a bit, and the amount of perm generation, as suggested earlier. It feels like a GC issue, and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.

My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field.

-Michael

-Original Message-
From: Patrick O'Lone [mailto:pol...@townnews.com]
Sent: Tuesday, November 26, 2013 11:59 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache

I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's handling about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that call sits inside a synchronized block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. The old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time.

Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB of the 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, it doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and it seems to require less field cache but doesn't address the stalling issues.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.
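For concreteness, here is a sketch of what the startup command from this thread would look like with the CMS/ParNew options swapped for the two G1 flags suggested above. The heap size and Tomcat paths are carried over unchanged from the original post; note that explicit young-generation sizing such as -XX:NewRatio is generally dropped with G1, since it overrides the pause-time goal. Verify that your JVM build actually supports G1 before trying this outside a test environment:

/usr/java/jre/bin/java \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -server \
  -Dcom.sun.management.jmxremote \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=50 \
  -Xms30720M \
  -Xmx30720M \
  -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
  -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
  -Dcatalina.base=/usr/local/share/apache-tomcat \
  -Dcatalina.home=/usr/local/share/apache-tomcat \
  -Djava.io.tmpdir=/tmp \
  org.apache.catalina.startup.Bootstrap start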
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
We do perform a lot of sorting - on multiple fields, in fact. We have different kinds of Solr configurations - our news searches do little with regard to faceting, but sort heavily. We provide classified ad searches, and those use faceting heavily. I might try reducing the JVM memory a bit, and the amount of perm generation, as suggested earlier. It feels like a GC issue, and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.

> My gut instinct is that your heap size is way too high. Try decreasing it to
> like 5-10G. I know you say it uses more than that, but that just seems bizarre
> unless you're doing something like faceting and/or sorting on every field.
>
> -Michael
>
> -Original Message-
> From: Patrick O'Lone [mailto:pol...@townnews.com]
> Sent: Tuesday, November 26, 2013 11:59 AM
> To: solr-user@lucene.apache.org
> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>
> I've been tracking a problem in our Solr environment for a while with periodic
> stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I
> might get some insight from others on this list.
>
> The load on the server is normally anywhere between 1-3. It's an 8-core
> machine with 40GB of RAM. I have about 25GB of index data that is replicated
> to this server every 5 minutes. It's handling about 200 connections per second,
> and roughly every 5-10 minutes it will stall for about 30 seconds to a minute.
> The stall causes the load to go as high as 90. It is all CPU bound in user
> space - all cores go to 99% utilization (spinlock?). When doing a thread dump,
> the following line is blocked in all running Tomcat threads:
>
> org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )
>
> Looking at the source code in 3.6.1, that call sits inside a synchronized block,
> which blocks all other threads and causes the backlog. I've tried to correlate
> these events to the replication events - but even with replication disabled,
> this still happens. We run multiple data centers using Solr, and comparing
> garbage collection between them I noted that the old generation is collected
> very differently on this data center versus the others. The old generation is
> collected in one massive collection event (several gigabytes' worth) - the
> other data center is more saw-toothed and collects only 500MB-1GB at a time.
> Here are my parameters to java (the same in all environments):
>
> /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \
> -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \
> -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \
> -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \
> -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath
> /usr/local/share/apache-tomcat/bin/bootstrap.jar \
> -Dcatalina.base=/usr/local/share/apache-tomcat \
> -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \
> org.apache.catalina.startup.Bootstrap start
>
> I've tried a few GC option changes from this (been running this way for a
> couple of years now) - primarily removing CMS incremental mode, as we have 8
> cores and remarks on the internet suggest it is only for smaller SMP setups.
> Removing it did not fix anything.
>
> I've considered that the heap is way too large (30GB of the 40GB) and may not
> leave enough memory for mmap operations (MMap appears to be used in the field
> cache). Based on active memory utilization in Java, it seems like I might be
> able to reduce it to 22GB safely - but I'm not sure if that will help with the
> CPU issues.
>
> I think the field cache is used for sorting and faceting. I've started to
> investigate facet.method, but from what I can tell, it doesn't seem to
> influence sorting at all - only facet queries. I've tried setting
> useFilterForSortQuery, and it seems to require less field cache but doesn't
> address the stalling issues.
>
> Is there something I am overlooking? Perhaps the system is becoming
> oversubscribed in terms of resources? Thanks for any help that is offered.
>
> --
> Patrick O'Lone
> Director of Software Development
> TownNews.com
>
> E-mail ... pol...@townnews.com
> Phone 309-743-0809
> Fax .. 309-743-0830

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
RE: Solr 3.6.1 stalling with high CPU and blocking on field cache
My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field.

-Michael

-Original Message-
From: Patrick O'Lone [mailto:pol...@townnews.com]
Sent: Tuesday, November 26, 2013 11:59 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache

I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's handling about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that call sits inside a synchronized block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. The old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB of the 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, it doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and it seems to require less field cache but doesn't address the stalling issues.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.
--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
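A rough back-of-envelope on the numbers in this thread supports the "heap is too high" instinct, at least as far as the OS page cache is concerned (this assumes little else runs on the box):

  40 GB RAM - 30 GB Java heap  =  ~10 GB left for the OS page cache
  25 GB index vs ~10 GB cache  ->  well under half the index can stay memory-resident

So mmap reads against the cold parts of the index have to go to disk, which can compound the cost of any GC stall.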
RE: Solr 3.6.1 stalling with high CPU and blocking on field cache
I am not completely sure about that, but if I remember correctly (it has been more than a year since I did that, and I was lazy enough not to take notes :( ), it helped when I reduced the percentage of the heap given to the permanent generation (somehow, more GC on a smaller permanent gen - but that kind does not block the system, and it may be that it prevents really large GCs at the cost of more smaller ones). But this is far from sound advice; it is just a somewhat distant memory, and I could also have mixed things up (I've been doing many other things in between), so my advice could well be misleading (and make sure that your heap is still big enough - once you get below a reasonable value, nothing will help).

P.S. If it worked for you, just let us know.

Regards, Patrice Monroe Pustavrh, Software developer, Bisnode Slovenia d.o.o.

-Original Message-
From: Patrick O'Lone [mailto:pol...@townnews.com]
Sent: Tuesday, November 26, 2013 5:59 PM
To: solr-user@lucene.apache.org
Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache

I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's handling about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that call sits inside a synchronized block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. The old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB of the 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, it doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and it seems to require less field cache but doesn't address the stalling issues.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
Solr 3.6.1 stalling with high CPU and blocking on field cache
I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's handling about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that call sits inside a synchronized block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. The old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB of the 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, it doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and it seems to require less field cache but doesn't address the stalling issues.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
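To make the two knobs from this post concrete: facet.method is a per-request parameter, while the sorting option is a solrconfig.xml setting (spelled useFilterForSortedQuery there). A sketch of both, with a hypothetical host and field name:

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=section&facet.method=enum

  <!-- inside the <query> section of solrconfig.xml -->
  <useFilterForSortedQuery>true</useFilterForSortedQuery>

facet.method=enum walks the terms and uses the filterCache (one cached DocSet per term) instead of the FieldCache, while facet.method=fc does the opposite - consistent with the observation above that neither changes how sorting itself populates the FieldCache.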
Re: Solr 3.6 optimize and field cache question
Not a solution for the short term but sounds like a good use case to migrate to Solr 4.X and use DocValues instead of FieldCache for faceting. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-optimize-and-field-cache-question-tp4076398p4076822.html Sent from the Solr - User mailing list archive at Nabble.com.
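For later readers, the DocValues route is a per-field schema.xml change (available from Solr 4.2 on). A minimal sketch with a hypothetical field name - faceting on such a field reads the on-disk DocValues structures instead of un-inverting the field onto the heap:

  <field name="category" type="string" indexed="true" stored="false" docValues="true"/>

Reindexing is required after adding the attribute.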
Re: Solr 3.6 optimize and field cache question
Hi,

70 GB heap and still OOMing? Hmmm... sure, 14 fields for faceting, but still - 70 GB heap!

Don't have the source handy, but I quickly looked at the FC source here - http://search-lucene.com/c/Lucene:core/src/java/org/apache/lucene/search/FieldCache.java - and I see mentions of "delete" there, so I would guess FC is delete-aware.

Have you tried just committing after your purge? (purge == delete (by query), I assume) Try that and let's see your heap before/after.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Mon, Jul 8, 2013 at 6:11 PM, Joshi, Shital wrote:
> Hi,
>
> We have Solr 3.6 set up with a master and two slaves, each with a 70GB JVM.
> We run into java.lang.OutOfMemoryError when we cross 250 million documents.
> Every time this happens we purge documents, bring it below 200 million, and
> bounce both slaves. We have facets on 14 fields. We usually don't run
> optimize after the purge. Will the deleted documents be part of the field
> cache if we don't run optimize after the purge? Will I see a difference in
> Java heap memory utilization before and after running optimize? I thought
> optimize only affected disk space.
>
> Thanks!
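A minimal sketch of the "purge, then commit" experiment Otis suggests, using the 3.x XML update endpoint; the URL and the delete query are hypothetical placeholders for whatever your actual purge criterion is:

  curl 'http://localhost:8983/solr/update?commit=true' \
    -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>timestamp:[* TO NOW-90DAYS]</query></delete>'

Then compare heap usage (and the FieldCache entries on the stats page) before and after - the commit opens a new searcher, which lets cache entries tied to the old readers be garbage collected.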
RE: Solr 3.6 optimize and field cache question
I'm 99% sure that the deleted docs will indeed use up space in the field cache, at least until the segments those documents are in are merged - which is what an optimize will do. Of course, these segments will automatically be merged eventually, but that might take days, depending on how often your index is updated.

-Michael

-Original Message-
From: Joshi, Shital [mailto:shital.jo...@gs.com]
Sent: Monday, July 08, 2013 6:12 PM
To: 'solr-user@lucene.apache.org'
Subject: Solr 3.6 optimize and field cache question

Hi,

We have Solr 3.6 set up with a master and two slaves, each with a 70GB JVM. We run into java.lang.OutOfMemoryError when we cross 250 million documents. Every time this happens we purge documents, bring it below 200 million, and bounce both slaves. We have facets on 14 fields. We usually don't run optimize after the purge. Will the deleted documents be part of the field cache if we don't run optimize after the purge? Will I see a difference in Java heap memory utilization before and after running optimize? I thought optimize only affected disk space.

Thanks!
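If you do decide to force the merge rather than wait for natural merging, an optimize is just another update message. A sketch with a hypothetical host; be aware that an optimize rewrites the entire index, which is expensive on a collection this size:

  curl 'http://localhost:8983/solr/update?optimize=true'

or, equivalently, as an XML message:

  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'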
Solr 3.6 optimize and field cache question
Hi,

We have Solr 3.6 set up with a master and two slaves, each with a 70GB JVM. We run into java.lang.OutOfMemoryError when we cross 250 million documents. Every time this happens we purge documents, bring it below 200 million, and bounce both slaves. We have facets on 14 fields. We usually don't run optimize after the purge. Will the deleted documents be part of the field cache if we don't run optimize after the purge? Will I see a difference in Java heap memory utilization before and after running optimize? I thought optimize only affected disk space.

Thanks!
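A very rough estimate of why 14 facet fields at 250 million documents strain even a 70 GB heap - this assumes the single-valued FieldCache StringIndex representation (roughly one 4-byte ord per document per field) and ignores the term data itself:

  250,000,000 docs x 4 bytes x 14 fields  ~=  14 GB for the ord arrays alone

The arrays are sized by each segment's maxDoc, so deleted-but-unmerged documents still occupy slots until their segments are merged away - which bears directly on the optimize question above.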
Re: Avoid loading Lucene's field cache for certain fields
I am using Solr 4.2.1.

./zahoor

On 20-May-2013, at 11:48 AM, J Mohamed Zahoor wrote:
> Hi
>
> I am trying to avoid loading some fields in Lucene's FieldCache.
>
> Is there a way to avoid loading certain fields in Lucene's FieldCache?
> One way is to declare them multi-valued...
>
> Is there any other way?
>
> ./zahoor
Avoid loading Lucene's field cache for certain fields
Hi,

I am trying to avoid loading some fields in Lucene's FieldCache.

Is there a way to avoid loading certain fields in Lucene's FieldCache? One way is to declare them multi-valued...

Is there any other way?

./zahoor
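The multi-valued workaround mentioned above, as a schema.xml sketch (hypothetical field name). Solr refuses to sort on a multiValued field, so the single-valued FieldCache never gets populated for it, and faceting on it goes through UnInvertedField instead - a blunt instrument rather than a targeted "skip the cache" switch:

  <field name="description" type="text" indexed="true" stored="true" multiValued="true"/>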
reranking and field cache usage
Hello,

I'm experimenting with ways to add some degree of diversity to search results by re-ranking them. For example, I might want to take the top 100 docs (sorted by score) and rearrange them so that no more than 2 results share a particular attribute x within any 20-result block. It's a best-effort algorithm, since there may be more than 10 results that have x. And if the original list already satisfies the diversity goal, then the ordering is unchanged.

So, 2 questions:

1. What's a good way to implement this? The most obvious solution (at least for this particular example) might be field collapsing. But we need faceting as well, and the two don't yet work together according to http://wiki.apache.org/solr/FieldCollapsing . It also wouldn't be applicable if the re-ranking function depended on things other than field values (like the score). Custom sorting (FieldComparatorSource) doesn't seem to work either, because the relative ordering of 2 docs depends not only on their field values but on what other docs match the query as well. So right now I'm doing post-processing: sort by score, look up x for each top doc, then rearrange if necessary. Is there a better way?

2. We need a fast way to fetch x for a large (100s) number of docs. It'd be great if the doc()/document() methods could automatically use the field cache - perhaps with something like https://issues.apache.org/jira/browse/SOLR-1961 . That hasn't been accepted, though. So I wrote this on top of the Solr API:

private static void loadCachedFields(SolrDocument doc, SolrIndexSearcher searcher,
                                     int docId, final Set<String> cachedFields) throws IOException {
  // find the leaf reader and doc id offset for this doc
  SolrIndexReader reader = searcher.getReader();
  int[] offsets = reader.getLeafOffsets();
  int idx = SolrIndexReader.readerIndex(docId, offsets);
  SolrIndexReader leafReader = reader.getLeafReaders()[idx];
  int offset = offsets[idx];
  IndexSchema schema = searcher.getSchema();
  // pull each requested field's value out of the FieldCache array for this leaf
  for (String f : cachedFields) {
    Object val;
    if (schema.getField(f).getType() instanceof IntField) {
      val = FieldCache.DEFAULT.getInts(leafReader, f)[docId - offset];
    } else ...
    doc.addField(f, val);
  }
}

(I borrowed the doc id offset code from QueryComponent.) Does this look like a reasonable solution?

Thanks!
- zhi-da
Field Cache
Hi,

I have read that the Lucene field cache is used in faceting and sorting. Is it also populated/used when only selected fields are retrieved, using the 'fl' or 'included fields in collapse' parameters? Is it also used for collapsing?

--
Regards,
Samarth
Re: Sort fields all look Strings in field cache, no matter schema type
Oh, forgot to add (just to keep the thread complete), the field is being used for a sort, so it was able to use TrieDoubleField. Thanks again, -Jay On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill wrote: > This field is of class type solr.SortableDoubleField. > > I'm actually migrating a project from Solr 1.1 to 1.4, and am in the > process of trying to update the schema and solrconfig in stages. Updating > the field to TrieDoubleField w/ precisionStep=0 definitely helped. > > Thanks Yonik! > -Jay > > > > > On Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley > wrote: > >> On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote: >> > One thing that struck me as odd in the output of the stats.jsp page is >> that >> > the field cache always shows a String type for a field, even if it is >> not a >> > String. For example, the output below is for a field "cscore" that is a >> > double: >> >> What's the class type of the double? Older style SortableDouble had >> to use the string index. Newer style trie-double based should use a >> double[]. >> >> It also matters what the FieldCache entry is being used for... certain >> things like faceting on single valued fields still use the >> StringIndex. I believe the stats component does too. Sorting and >> function queries should work as expected. >> >> -Yonik >> > >
Re: Sort fields all look Strings in field cache, no matter schema type
This field is of class type solr.SortableDoubleField. I'm actually migrating a project from Solr 1.1 to 1.4, and am in the process of trying to update the schema and solrconfig in stages. Updating the field to TrieDoubleField w/ precisionStep=0 definitely helped. Thanks Yonik! -Jay On Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley wrote: > On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote: > > One thing that struck me as odd in the output of the stats.jsp page is > that > > the field cache always shows a String type for a field, even if it is not > a > > String. For example, the output below is for a field "cscore" that is a > > double: > > What's the class type of the double? Older style SortableDouble had > to use the string index. Newer style trie-double based should use a > double[]. > > It also matters what the FieldCache entry is being used for... certain > things like faceting on single valued fields still use the > StringIndex. I believe the stats component does too. Sorting and > function queries should work as expected. > > -Yonik >
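Sketched as schema.xml entries, the before and after of this migration might look as follows - the attribute set mirrors the Solr 1.4 example schema, and cscore is the field from this thread:

  <!-- old: sorting goes through the FieldCache StringIndex -->
  <fieldType name="sdouble" class="solr.SortableDoubleField" omitNorms="true"/>

  <!-- new: sorting uses a primitive double[] in the FieldCache -->
  <fieldType name="tdouble0" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <field name="cscore" type="tdouble0" indexed="true" stored="true"/>

With precisionStep="0" no extra precision terms are indexed, which is the usual trade-off for a field used only for sorting and function queries rather than range queries.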
Re: Sort fields all look Strings in field cache, no matter schema type
On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote: > One thing that struck me as odd in the output of the stats.jsp page is that > the field cache always shows a String type for a field, even if it is not a > String. For example, the output below is for a field "cscore" that is a > double: What's the class type of the double? Older style SortableDouble had to use the string index. Newer style trie-double based should use a double[]. It also matters what the FieldCache entry is being used for... certain things like faceting on single valued fields still use the StringIndex. I believe the stats component does too. Sorting and function queries should work as expected. -Yonik
Sort fields all look Strings in field cache, no matter schema type
I'm on a project where I'm trying to determine the size of the field cache. We're seeing lots of memory problems, and I suspect that the field cache is extremely large, but I'm trying to get exact counts on what's in the field cache.

One thing that struck me as odd in the output of the stats.jsp page is that the field cache always shows a String type for a field, even if it is not a String. For example, the output below is for a field "cscore" that is a double:

entry#0 : 'org.apache.lucene.index.readonlydirectoryrea...@6239da8a'=>'cscore',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#297347471

The index has 4,292,426 documents, so I would expect the field cache size for this field to be:

cscore: double (8 bytes) x 4,292,426 docs = 34,339,408 bytes

But can someone explain why a double is using FieldCache$StringIndex please? No matter what the type of the field is in the schema the field cache stats always show FieldCache$StringIndex.

Thanks,
-Jay
Multi-valued field cache
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution. I don't see a method in the UnInvertedField class that will give me a list of field values per document. I only see methods that give values per document set. Should I use one of those methods and create document sets of size 1 for each document? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Multi-valued-field-cache-tp25684952p25684952.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multivalue Field Cache
Have a look at UninvertedField.java. I think that might help.

On Sep 23, 2009, at 2:35 PM, Amit Nithian wrote:

Are there any good implementations of a field cache that will return all values of a multivalued field? I am in the process of writing one for my immediate needs, but I was wondering if there is a complete implementation of one that handles the different field types. If not, then I can continue on with mine and donate it back.

Thanks!
Amit

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Multivalue Field Cache
Are there any good implementations of a field cache that will return all values of a multivalued field? I am in the process of writing one for my immediate needs, but I was wondering if there is a complete implementation of one that handles the different field types. If not, then I can continue on with mine and donate it back.

Thanks!
Amit
Re: quickie: do facetfields use same cached items in field cache as FQ-param?
As a related question: is there a way to inspect the queries currently in the filterCache?

Britske wrote:
>
> Yeah, I meant filter-cache, thanks.
> It seemed that the particular field (cityname) was using a keywordtokenizer
> (which doesn't show at the front), which is why I missed it, I guess :-S.
> This means the term field is tokenized, so the termEnums approach is used.
> This results in about 10,000 inserts on facet.field=cityname on a cold
> searcher, which matches the number of different terms in that field. At
> least that explains that.
>
> So if I understand correctly, if I use that same field in an FQ-param, say
> fq=cityname:amsterdam, and amsterdam is a term of field cityname, then the
> FQ-query can utilize the cached 'query' cityname:amsterdam which is already
> put into the filterCache by the query facet.field=cityname, right?
>
> The thing that I still don't get is why my filterCache starts to have
> evictions although its size is 16,000+. This shouldn't be happening given
> that:
> I currently only use faceting on cityname and use this field in FQ as well,
> as already said (which adds +/- 1 item to the filterCache, given that
> faceting and fq share cached items).
> Moreover, I use FQ on about 2500 different fields (named _ddp*), but only
> check to see if a value exists by doing, for example, fq=_ddp1234:[* TO *].
> I sometimes add them together like so: fq=_ddp1234:[* TO *]
> &fq=_ddp2345:[* TO *]. But never like so: fq=_ddp1234:[* TO *]
> +_ddp2345:[* TO *]. Which means each _ddp*-field is only added once to the
> filterCache.
>
> Wouldn't this mean that at a maximum I can only have 12,500 items in the
> filterCache?
> Still my filterCache starts to have evictions although its size is 16,000+.
>
> What am I missing here?
> Geert-Jan
>
>
> hossman wrote:
>>
>>
>> : ..fq=country:france
>> :
>> : do these queries share cached items in the fieldcache? (in this example:
>> : country:france) or do they somehow live as separate entities in the cache?
>> : The latter would explain my fieldcache having evictions at the moment.
>>
>> FieldCache can't have evictions. it's a really low-level "cache" where
>> the key is a field name and the value is an array containing a value for
>> every document (you can think of it as an inverted-inverted-index) that
>> Lucene maintains directly. items are never removed, they just get garbage
>> collected when the IndexReader is no longer used. It's primarily for
>> sorting, but the SimpleFacets code also leverages it for facets in some
>> cases -- Solr has no way of showing you what's in the FieldCache, because
>> Lucene doesn't expose any inspection APIs to query it (it's a heisenberg
>> cache .. once you ask if something is in it, it's in it)
>>
>> are you referring to the "filterCache"?
>>
>> filterCache contains records whose key is a "query" and whose value is a
>> DocSet (an unordered collection of all docs matching a query) ... it's
>> used whenever you use an "fq" param, for faceting on some fields (when the
>> TermEnum method is used, a filterCache entry is added for each term
>> tested), and even for some sorted queries if the <useFilterForSortedQuery>
>> config option is set to true.
>>
>> the easiest way to know whether your faceting is using the FieldCache is
>> to start your server cold (no newSearcher warming) and then send it a
>> simple query with a single facet.field. depending on the query, you might
>> get 0 or 1 entries in the filterCache if SimpleFacets is using the
>> FieldCache -- but if it's using the TermEnums, and generating a DocSet per
>> term, you'll see *lots* of inserts into the filterCache.
>>
>>
>>
>> -Hoss
>>
>>

--
View this message in context: http://www.nabble.com/quickie%3A-do-facetfields-use-same-cached-items-in-field-cache-as-FQ-param--tf4609795.html#a13170530
Sent from the Solr - User mailing list archive at Nabble.com.
Re: quickie: do facetfields use same cached items in field cache as FQ-param?
Yeah, I meant filter-cache, thanks.

It seemed that the particular field (cityname) was using a keywordtokenizer (which doesn't show at the front), which is why I missed it, I guess :-S. This means the term field is tokenized, so the termEnums approach is used. This results in about 10,000 inserts on facet.field=cityname on a cold searcher, which matches the number of different terms in that field. At least that explains that.

So if I understand correctly, if I use that same field in an FQ-param, say fq=cityname:amsterdam, and amsterdam is a term of field cityname, then the FQ-query can utilize the cached 'query' cityname:amsterdam which is already put into the filterCache by the query facet.field=cityname, right?

The thing that I still don't get is why my filterCache starts to have evictions although its size is 16,000+. This shouldn't be happening given that:
I currently only use faceting on cityname and use this field in FQ as well, as already said (which adds +/- 1 item to the filterCache, given that faceting and fq share cached items).
Moreover, I use FQ on about 2500 different fields (named _ddp*), but only check to see if a value exists by doing, for example, fq=_ddp1234:[* TO *]. I sometimes add them together like so: fq=_ddp1234:[* TO *] &fq=_ddp2345:[* TO *]. But never like so: fq=_ddp1234:[* TO *] +_ddp2345:[* TO *]. Which means each _ddp*-field is only added once to the filterCache.

Wouldn't this mean that at a maximum I can only have 12,500 items in the filterCache?
Still my filterCache starts to have evictions although its size is 16,000+.

What am I missing here?
Geert-Jan

hossman wrote:
>
>
> : ..fq=country:france
> :
> : do these queries share cached items in the fieldcache? (in this example:
> : country:france) or do they somehow live as separate entities in the cache?
> : The latter would explain my fieldcache having evictions at the moment.
>
> FieldCache can't have evictions. it's a really low-level "cache" where
> the key is a field name and the value is an array containing a value for
> every document (you can think of it as an inverted-inverted-index) that
> Lucene maintains directly. items are never removed, they just get garbage
> collected when the IndexReader is no longer used. It's primarily for
> sorting, but the SimpleFacets code also leverages it for facets in some
> cases -- Solr has no way of showing you what's in the FieldCache, because
> Lucene doesn't expose any inspection APIs to query it (it's a heisenberg
> cache .. once you ask if something is in it, it's in it)
>
> are you referring to the "filterCache"?
>
> filterCache contains records whose key is a "query" and whose value is a
> DocSet (an unordered collection of all docs matching a query) ... it's
> used whenever you use an "fq" param, for faceting on some fields (when the
> TermEnum method is used, a filterCache entry is added for each term
> tested), and even for some sorted queries if the <useFilterForSortedQuery>
> config option is set to true.
>
> the easiest way to know whether your faceting is using the FieldCache is
> to start your server cold (no newSearcher warming) and then send it a
> simple query with a single facet.field. depending on the query, you might
> get 0 or 1 entries in the filterCache if SimpleFacets is using the
> FieldCache -- but if it's using the TermEnums, and generating a DocSet per
> term, you'll see *lots* of inserts into the filterCache.
> > > > -Hoss > > -- View this message in context: http://www.nabble.com/quickie%3A-do-facetfields-use-same-cached-items-in-field-cache-as-FQ-param--tf4609795.html#a13169935 Sent from the Solr - User mailing list archive at Nabble.com.
Re: quickie: do facetfields use same cached items in field cache as FQ-param?
: ..fq=country:france
:
: do these queries share cached items in the fieldcache? (in this example:
: country:france) or do they somehow live as separate entities in the cache?
: The latter would explain my fieldcache having evictions at the moment.

FieldCache can't have evictions. it's a really low-level "cache" where the key is a field name and the value is an array containing a value for every document (you can think of it as an inverted-inverted-index) that Lucene maintains directly. items are never removed, they just get garbage collected when the IndexReader is no longer used. It's primarily for sorting, but the SimpleFacets code also leverages it for facets in some cases -- Solr has no way of showing you what's in the FieldCache, because Lucene doesn't expose any inspection APIs to query it (it's a heisenberg cache .. once you ask if something is in it, it's in it)

are you referring to the "filterCache"?

filterCache contains records whose key is a "query" and whose value is a DocSet (an unordered collection of all docs matching a query) ... it's used whenever you use an "fq" param, for faceting on some fields (when the TermEnum method is used, a filterCache entry is added for each term tested), and even for some sorted queries if the <useFilterForSortedQuery> config option is set to true.

the easiest way to know whether your faceting is using the FieldCache is to start your server cold (no newSearcher warming) and then send it a simple query with a single facet.field. depending on the query, you might get 0 or 1 entries in the filterCache if SimpleFacets is using the FieldCache -- but if it's using the TermEnums, and generating a DocSet per term, you'll see *lots* of inserts into the filterCache.

-Hoss
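Hoss's cold-start test, spelled out as concrete steps (hypothetical host, with the country field from the question; the counters live on the admin stats page):

  # 1. restart Solr with no newSearcher/firstSearcher warming queries configured
  # 2. send a single facet query:
  curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=country'
  # 3. check the filterCache "inserts" counter on admin/stats.jsp:
  #      0-1 inserts          -> SimpleFacets used the FieldCache
  #      one insert per term  -> TermEnum faceting (a DocSet cached per term)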
quickie: do facetfields use same cached items in field cache as FQ-param?
Say I have the following (partial) querystring:

...&facet=true&facet.field=country

Field 'country' is not tokenized, not multi-valued, and not boolean, so the field-cache approach is used. Moreover, the following (partial) querystring is used as well:

..fq=country:france

Do these queries share cached items in the fieldcache? (in this example: country:france) Or do they somehow live as separate entities in the cache? The latter would explain my fieldcache having evictions at the moment.

Geert-Jan

--
View this message in context: http://www.nabble.com/quickie%3A-do-facetfields-use-same-cached-items-in-field-cache-as-FQ-param--tf4609795.html#a13164249
Sent from the Solr - User mailing list archive at Nabble.com.