Re: Solr caching the index file make server refuse serving

2017-08-24 Thread Erick Erickson
10 billion documents on 12 cores is over 800M documents/shard at best.
This is _very_ aggressive for a shard. Could you give more information
about your setup?

I've seen 250M docs fit in 12G memory. I've also seen 10M documents
strain 32G of memory. Details matter a lot. The only way I've been
able to determine what a reasonable number of docs is, with my queries on
my data, is to do "the sizing exercise", which I've outlined here:

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

While this was written over 5 years ago, it's still accurate.

Best,
Erick

On Thu, Aug 24, 2017 at 6:10 PM, 陈永龙  wrote:
> Hello,
>
> ENV:  solrcloud 6.3
>
> 3*dell server
>
> 128G 12cores 4.3T /server
>
> 3 solr node /server
>
> 20G /node (with parameter -m 20G)
>
> 10 billion documents total
>
> Problem:
>
>  When we start SolrCloud, the cached index pushes memory use to 98% or
> more. And if we continue to index documents (batch commits of 10,000
> documents), one or more servers stops serving: we cannot log in via
> SSH, and even the local console is unresponsive.
>
> So, how can I limit Solr's behavior of caching the index in memory?
>
> Thanks in advance!
>


Solr caching the index file make server refuse serving

2017-08-24 Thread 陈永龙
Hello,

ENV:  solrcloud 6.3  

3*dell server

128G 12cores 4.3T /server

3 solr node /server

20G /node (with parameter -m 20G)

10 billion documents total

Problem:

 When we start SolrCloud, the cached index pushes memory use to 98% or
more. And if we continue to index documents (batch commits of 10,000
documents), one or more servers stops serving: we cannot log in via
SSH, and even the local console is unresponsive.

So, how can I limit Solr's behavior of caching the index in memory?

Thanks in advance!



Re: Solr Caching (documentCache) not working

2015-08-18 Thread Daniel Collins
I think this is expected.  As Shawn mentioned, your hard commits have
openSearcher=false, so they flush changes to disk, but don't force a
re-open of the active searcher.
By contrast, softCommit sets openSearcher=true; the point of softCommit is
to make the changes visible, and to do that you have to re-open a searcher.

Currently, most of the Solr Caches are searcher-based, so opening a new
searcher means creating (and optionally warming) a new cache.  I know there
is work in progress to make these caches more segment-based (which would
lead to more re-use between searchers) but currently each commit would
create a new set of caches.  You can warm those caches, but that's where
the trade-off comes, warming a cache with records takes time and
processing, so if you are committing frequently, the warming doesn't really
have time to take effect.

https://wiki.apache.org/solr/SolrCaching explains this (probably better
than me) and from that page, the DocumentCache can't be auto-warmed since
DocIds can change between searchers.
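For reference, the per-searcher caches and their autowarm counts discussed above are configured per cache in solrconfig.xml. A minimal sketch (the cache classes are standard Solr classes; the sizes and autowarm counts are illustrative assumptions, not recommendations):

```xml
<!-- Sketch of solrconfig.xml cache settings. documentCache has no
     autowarmCount because, as noted above, it cannot be autowarmed. -->
<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                    autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```

The trade-off in the paragraph above applies directly: the larger the autowarmCount values, the longer each new searcher takes to open after a commit.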


On 18 August 2015 at 06:19, Maulin Rathod mrat...@asite.com wrote:

 Hi Shawn,



 Thanks for your feedback.

 In our scenario documents are added frequently (Approx 10 documents added
 in 1 minute) and we want to make it available for search near realtime
 (within 5 second).  Even if we set  autosoftcommit 5 second (so that
 document will be available for search after 5 second), it flushes all
 documents from documentCache. Just wanted to understand if we are doing
 something wrong or its solr expected behavior.





 <autoSoftCommit>

   <maxTime>5000</maxTime>

 </autoSoftCommit>





 Regards,



 Maulin







 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: 17 August 2015 19:02
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Caching (documentCache) not working



 On 8/17/2015 7:04 AM, Maulin Rathod wrote:

  We have observed that Intermittently querying become slower when
 documentCache become empty. The documentCache is getting flushed whenever
 new document added to the collection.

 

  Is there any way by which we can ensure that newly added documents are
 visible without losing data in documentCache? We are trying to use soft
 commit but it also flushes all documents in documentCache.



 snip



  <autoSoftCommit>

    <maxTime>50</maxTime>

  </autoSoftCommit>



 You are doing a soft commit within 50 milliseconds of adding a new
 document.  Solr can have severe performance problems when autoSoftCommit is
 set to 1000 -- one second.  50 milliseconds is one twentieth of a very low
 value that is known to cause problems.  It can make the problem much more
 than 20 times worse.



 Please read this article:




 http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/



 Note one particular section, which says the following: "Don't listen to
 your product manager who says we need no more than 1 second latency."



 You need to set your commit interval as long as you possibly can.  I
 personally wouldn't go longer than 60 seconds, 30 seconds if the commits
 complete particularly fast.  It should be several minutes if that will meet
 your needs.  When your commit interval is very low, Solr's caches can
 become useless, as you've noticed.



 TL;DR info:  Your autoCommit settings have openSearcher set to false, so
 they do not matter for the problem you have described. I would probably
 increase that to 5 minutes rather than 15 seconds, but that is not very
 important here, and 15 seconds for hard commits that don't open a new
 searcher is known to have a low impact on performance.  Low impact isn't
 the same as NO impact, so I keep this interval long as well.



 Thanks,

 Shawn





Re: Solr Caching (documentCache) not working

2015-08-18 Thread Shawn Heisey
On 8/18/2015 2:30 AM, Daniel Collins wrote:
 I think this is expected.  As Shawn mentioned, your hard commits have
 openSearcher=false, so they flush changes to disk, but don't force a
 re-open of the active searcher.
 By contrast, softCommit sets openSearcher=true; the point of softCommit is
 to make the changes visible, and to do that you have to re-open a searcher.
 
 Currently, most of the Solr Caches are searcher-based, so opening a new
 searcher means creating (and optionally warming) a new cache.  I know there
 is work in progress to make these caches more segment-based (which would
 lead to more re-use between searchers) but currently each commit would
 create a new set of caches.  You can warm those caches, but that's where
 the trade-off comes, warming a cache with records takes time and
 processing, so if you are committing frequently, the warming doesn't really
 have time to take effect.
 
 https://wiki.apache.org/solr/SolrCaching explains this (probably better
 than me) and from that page, the DocumentCache can't be auto-warmed since
 DocIds can change between searchers.
 
 
 On 18 August 2015 at 06:19, Maulin Rathod mrat...@asite.com wrote:
 
 Hi Shawn,



 Thanks for your feedback.

 In our scenario documents are added frequently (Approx 10 documents added
 in 1 minute) and we want to make it available for search near realtime
 (within 5 second).  Even if we set  autosoftcommit 5 second (so that
 document will be available for search after 5 second), it flushes all
 documents from documentCache. Just wanted to understand if we are doing
 something wrong or its solr expected behavior.

10 documents per minute won't even make Solr breathe hard.  Under the
right conditions, Solr can index thousands of documents per second.
Unless there is a very large number of new documents, the length of
time that a commit takes usually has no correlation to the number of
documents that were added.  It has more to do with the number of
documents in the entire index, the total index size, and how you have
configured your cache warming.

Daniel has provided good information regarding how searchers and caches
interact.

Thanks,
Shawn



Re: Solr Caching (documentCache) not working

2015-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2015 at 4:36 PM, Daniel Collins danwcoll...@gmail.com wrote:
 we had to turn off
 ALL the Solr caches (warming is useless at that kind of frequency

Warming and caching are related, but different.  Caching still
normally makes sense without warming, and Solr is generally written
with the assumption that caches are present.

-Yonik


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Shawn Heisey
On 8/17/2015 7:04 AM, Maulin Rathod wrote:
 We have observed that Intermittently querying become slower when 
 documentCache become empty. The documentCache is getting flushed whenever new 
 document added to the collection.
 
 Is there any way by which we can ensure that newly added documents are 
 visible without losing data in documentCache? We are trying to use soft 
 commit but it also flushes all documents in documentCache.

snip

 <autoSoftCommit>
   <maxTime>50</maxTime>
 </autoSoftCommit>

You are doing a soft commit within 50 milliseconds of adding a new
document.  Solr can have severe performance problems when autoSoftCommit
is set to 1000 -- one second.  50 milliseconds is one twentieth of a
very low value that is known to cause problems.  It can make the problem
much more than 20 times worse.

Please read this article:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Note one particular section, which says the following: "Don't listen to
your product manager who says we need no more than 1 second latency."

You need to set your commit interval as long as you possibly can.  I
personally wouldn't go longer than 60 seconds, 30 seconds if the commits
complete particularly fast.  It should be several minutes if that will
meet your needs.  When your commit interval is very low, Solr's caches
can become useless, as you've noticed.

TL;DR info:  Your autoCommit settings have openSearcher set to false, so
they do not matter for the problem you have described. I would probably
increase that to 5 minutes rather than 15 seconds, but that is not very
important here, and 15 seconds for hard commits that don't open a new
searcher is known to have a low impact on performance.  Low impact
isn't the same as NO impact, so I keep this interval long as well.
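A solrconfig.xml sketch along the lines of this advice (the exact intervals are assumptions to tune against your latency needs, not prescriptions):

```xml
<!-- Hard commits every 5 minutes, flushing to disk without opening a
     new searcher; soft commits every 30 seconds for visibility.
     Values are illustrative. -->
<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>
```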

Thanks,
Shawn



Solr Caching (documentCache) not working

2015-08-17 Thread Maulin Rathod

Hi,

We are using solr cloud 5.2 version.

We have observed that querying intermittently becomes slower when the
documentCache becomes empty. The documentCache gets flushed whenever a new
document is added to the collection.

Is there any way we can ensure that newly added documents are visible
without losing the data in the documentCache? We are trying to use soft
commit, but it also flushes all documents from the documentCache.

We have following setting in solrconfig.xml.


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>50</maxTime>
</autoSoftCommit>



Regards,

Maulin




RE: Solr Caching (documentCache) not working

2015-08-17 Thread Maulin Rathod
Hi Shawn,



Thanks for your feedback.

In our scenario, documents are added frequently (approx. 10 documents per
minute) and we want to make them available for search in near real time
(within 5 seconds). Even if we set autoSoftCommit to 5 seconds (so that a
document becomes searchable after 5 seconds), it flushes all documents from
the documentCache. We just wanted to understand whether we are doing
something wrong or this is Solr's expected behavior.





<autoSoftCommit>

   <maxTime>5000</maxTime>

</autoSoftCommit>





Regards,



Maulin







-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 17 August 2015 19:02
To: solr-user@lucene.apache.org
Subject: Re: Solr Caching (documentCache) not working



On 8/17/2015 7:04 AM, Maulin Rathod wrote:

 We have observed that Intermittently querying become slower when 
 documentCache become empty. The documentCache is getting flushed whenever new 
 document added to the collection.



 Is there any way by which we can ensure that newly added documents are 
 visible without losing data in documentCache? We are trying to use soft 
 commit but it also flushes all documents in documentCache.



snip



 <autoSoftCommit>

   <maxTime>50</maxTime>

 </autoSoftCommit>



You are doing a soft commit within 50 milliseconds of adding a new document.  
Solr can have severe performance problems when autoSoftCommit is set to 1000 -- 
one second.  50 milliseconds is one twentieth of a very low value that is known 
to cause problems.  It can make the problem much more than 20 times worse.



Please read this article:



http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/



Note one particular section, which says the following: "Don't listen to
your product manager who says we need no more than 1 second latency."



You need to set your commit interval as long as you possibly can.  I personally 
wouldn't go longer than 60 seconds, 30 seconds if the commits complete 
particularly fast.  It should be several minutes if that will meet your needs.  
When your commit interval is very low, Solr's caches can become useless, as 
you've noticed.



TL;DR info:  Your autoCommit settings have openSearcher set to false, so they 
do not matter for the problem you have described. I would probably increase 
that to 5 minutes rather than 15 seconds, but that is not very important here, 
and 15 seconds for hard commits that don't open a new searcher is known to have 
a low impact on performance.  Low impact isn't the same as NO impact, so I 
keep this interval long as well.



Thanks,

Shawn




Re: Solr Caching (documentCache) not working

2015-08-17 Thread Mikhail Khludnev
On Mon, Aug 17, 2015 at 11:36 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Just to open the can of worms, it *can* be possible to have very low commit
 times, we have 250ms currently and are in production with that.  But it
 does come with pain (no such thing as a free lunch!), we had to turn off
 ALL the Solr caches


Gentlemen,
Excuse me for hijacking; here is a small lunchbox of segmentation work:
 - segmented filters, which are much cheaper on commit:
http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
 - docvalues facets on steroids
https://issues.apache.org/jira/browse/SOLR-7730 (since 5.3)
But the document cache hasn't been segmented yet. Hence
https://issues.apache.org/jira/browse/SOLR-7937, which has been hanging
around for a while and deserves to be pursued.


 (warming is useless at that kind of frequency, it will
 take longer to warm the cache than the time before the next commit), and
 throw a lot of RAM and expensive SSDs at the problem.

 That said, Shawn's advice is correct, anything less than 1s commit
 shouldn't be needed for most users, and I would concur with staying away
 from it unless you absolutely decide you have to have it.

 You only go that route if you are prepared to commit (no pun intended!) a
 fair amount of time, money and resources to investigating and dealing with
 issues.  We will have a talk at Revolution this year about some of the
 scale and latency issues we have to deal with (blatant plug for my team
 lead who's giving the talk!)

 On 17 August 2015 at 14:31, Shawn Heisey apa...@elyograg.org wrote:

  On 8/17/2015 7:04 AM, Maulin Rathod wrote:
   We have observed that Intermittently querying become slower when
  documentCache become empty. The documentCache is getting flushed whenever
  new document added to the collection.
  
   Is there any way by which we can ensure that newly added documents are
  visible without losing data in documentCache? We are trying to use soft
  commit but it also flushes all documents in documentCache.
 
  snip
 
   <autoSoftCommit>
     <maxTime>50</maxTime>
   </autoSoftCommit>
 
  You are doing a soft commit within 50 milliseconds of adding a new
  document.  Solr can have severe performance problems when autoSoftCommit
  is set to 1000 -- one second.  50 milliseconds is one twentieth of a
  very low value that is known to cause problems.  It can make the problem
  much more than 20 times worse.
 
  Please read this article:
 
 
 
 http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
 
  Note one particular section, which says the following: "Don't listen to
  your product manager who says we need no more than 1 second latency."
 
  You need to set your commit interval as long as you possibly can.  I
  personally wouldn't go longer than 60 seconds, 30 seconds if the commits
  complete particularly fast.  It should be several minutes if that will
  meet your needs.  When your commit interval is very low, Solr's caches
  can become useless, as you've noticed.
 
  TL;DR info:  Your autoCommit settings have openSearcher set to false, so
  they do not matter for the problem you have described. I would probably
  increase that to 5 minutes rather than 15 seconds, but that is not very
  important here, and 15 seconds for hard commits that don't open a new
  searcher is known to have a low impact on performance.  Low impact
  isn't the same as NO impact, so I keep this interval long as well.
 
  Thanks,
  Shawn
 
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Daniel Collins
Just to open the can of worms, it *can* be possible to have very low commit
times, we have 250ms currently and are in production with that.  But it
does come with pain (no such thing as a free lunch!), we had to turn off
ALL the Solr caches (warming is useless at that kind of frequency, it will
take longer to warm the cache than the time before the next commit), and
throw a lot of RAM and expensive SSDs at the problem.

That said, Shawn's advice is correct, anything less than 1s commit
shouldn't be needed for most users, and I would concur with staying away
from it unless you absolutely decide you have to have it.

You only go that route if you are prepared to commit (no pun intended!) a
fair amount of time, money and resources to investigating and dealing with
issues.  We will have a talk at Revolution this year about some of the
scale and latency issues we have to deal with (blatant plug for my team
lead who's giving the talk!)

On 17 August 2015 at 14:31, Shawn Heisey apa...@elyograg.org wrote:

 On 8/17/2015 7:04 AM, Maulin Rathod wrote:
  We have observed that Intermittently querying become slower when
 documentCache become empty. The documentCache is getting flushed whenever
 new document added to the collection.
 
  Is there any way by which we can ensure that newly added documents are
 visible without losing data in documentCache? We are trying to use soft
 commit but it also flushes all documents in documentCache.

 snip

  <autoSoftCommit>
    <maxTime>50</maxTime>
  </autoSoftCommit>

 You are doing a soft commit within 50 milliseconds of adding a new
 document.  Solr can have severe performance problems when autoSoftCommit
 is set to 1000 -- one second.  50 milliseconds is one twentieth of a
 very low value that is known to cause problems.  It can make the problem
 much more than 20 times worse.

 Please read this article:


 http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 Note one particular section, which says the following: "Don't listen to
 your product manager who says we need no more than 1 second latency."

 You need to set your commit interval as long as you possibly can.  I
 personally wouldn't go longer than 60 seconds, 30 seconds if the commits
 complete particularly fast.  It should be several minutes if that will
 meet your needs.  When your commit interval is very low, Solr's caches
 can become useless, as you've noticed.

 TL;DR info:  Your autoCommit settings have openSearcher set to false, so
 they do not matter for the problem you have described. I would probably
 increase that to 5 minutes rather than 15 seconds, but that is not very
 important here, and 15 seconds for hard commits that don't open a new
 searcher is known to have a low impact on performance.  Low impact
 isn't the same as NO impact, so I keep this interval long as well.

 Thanks,
 Shawn




Re: Question on Solr Caching

2014-12-08 Thread Manohar Sripada
Thanks Shawn,

Can you please re-direct me to any wiki which describes (in detail) the
differences between MMapDirectoryFactory and NRTCachingDirectoryFactory? I
found this blog
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html very
helpful which describes about MMapDirectory. I want to know in detail about
NRTCachingFactory as well.

Also, when I ran this rest request solr/admin/cores?action=STATUS, I got
the below result (pasted partial result only). I have set the
DirectoryFactory as NRTCachingDirectory in solrconfig.xml. But, it also
shows MMapDirectory in the element below. Does this mean
NRTCachingDirectory is using MMapDirectory internally?

<str name="directory">
org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/instance/solr/collection1_shard2_replica1/data/index
lockFactory=NativeFSLockFactory@/instance/solr/collection1_shard2_replica1/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>

What does maxCacheMB and maxMergeSizeMB indicate? How to control it?


Thanks,
Manohar

On Fri, Dec 5, 2014 at 11:04 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 12/4/2014 10:06 PM, Manohar Sripada wrote:
  If you use MMapDirectory, Lucene will map the files into memory off heap
  and the OS's disk cache will cache the files in memory for you. Don't use
  RAMDirectory, it's not better than MMapDirectory for any use I'm aware
 of.
 
  Will that mean it will cache the Inverted index as well to OS disk's
  cache? The reason I am asking is, Solr searches this Inverted Index first
  to get the data. How about if we can keep this in memory?

 If you have enough memory, the operating system will cache *everything*.
  It does so by simply loading the data that's on the disk into RAM ...
 it is not aware that certain parts are the inverted index, it simply
 caches whatever data gets read.  A subsequent read will come out of
 memory, the disk heads will never even move.  If certain data in the
 index is never accessed, then it will not get cached.

 http://en.wikipedia.org/wiki/Page_cache

 Thanks,
 Shawn




Re: Question on Solr Caching

2014-12-08 Thread Shawn Heisey
On 12/8/2014 2:42 AM, Manohar Sripada wrote:
 Can you please re-direct me to any wiki which describes (in detail) the
 differences between MMapDirectoryFactory and NRTCachingDirectoryFactory? I
 found this blog
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html very
 helpful which describes about MMapDirectory. I want to know in detail about
 NRTCachingFactory as well.
 
 Also, when I ran this rest request solr/admin/cores?action=STATUS, I got
 the below result (pasted partial result only). I have set the
 DirectoryFactory as NRTCachingDirectory in solrconfig.xml. But, it also
 shows MMapDirectory in the below element. Does this means
 NRTCachingDirectory is using MMapDirectory internally??
 
 <str name="directory">
 org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/instance/solr/collection1_shard2_replica1/data/index
 lockFactory=NativeFSLockFactory@/instance/solr/collection1_shard2_replica1/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
 
 What does maxCacheMB and maxMergeSizeMB indicate? How to control it?

NRTCachingDirectoryFactory creates instances of NRTCachingDirectory.
This is is a wrapper on top of another Directory implementation.
Normally it wraps MMapDirectory, so you get all the MMap advantages.
The javadoc for NRTCachingDirectory says that it "wraps a RAMDirectory
around any provided delegate directory, to be used during NRT search."

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/store/NRTCachingDirectory.html

Further down in that javadoc, the constructor documentation has this to
say: "We will cache a newly created output if 1) it's a flush or a merge
and the estimated size of the merged segment is <= maxMergeSizeMB, and
2) the total cached bytes is <= maxCachedMB."

Basically, if a newly created or merged segment is small enough, it
won't be written to disk right away, it will be saved into RAM until
another cacheable segment won't fit in available RAM and the oldest
cached segment must be flushed to disk.  Near Real Time search becomes
easier.

This DirectoryFactory implementation is default in 4.x, so as I
understand it, it's critically important for Solr to have a replayable
transaction log ... without it, any data that is cached in RAM will be
lost if the program crashes or exits.  The main Solr example *does* have
the transaction log enabled.
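If I read the factory right, those two thresholds can be set as init args on the directory factory in solrconfig.xml. A sketch (the values mirror the defaults seen in the STATUS output quoted above; treat them as assumptions to verify against your Solr version):

```xml
<!-- Sketch: configuring NRTCachingDirectoryFactory thresholds.
     Segments smaller than maxMergeSizeMB are held in RAM until the
     maxCachedMB budget is exceeded. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory">
  <double name="maxCachedMB">48.0</double>
  <double name="maxMergeSizeMB">4.0</double>
</directoryFactory>
```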

Thanks,
Shawn



Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Hi,

I am working on implementing Solr in my product. I have a few questions on
caching.

1. Do the posting list and term list of the index reside in memory? If
not, how can I load them into memory? I don't want to load the entire
data, as with DocumentCache; nor do I want to use RAMDirectoryFactory,
as the data will be lost on restart.

2. For FilterCache, there is a way to specify whether the filter should be
cached or not in the query. Similarly, Is there a way where I can specify
the list of stored fields to be loaded to Document Cache? I know Document
Cache is not associated to query. Just curious to know.

3. Similarly, Is there a way I can specify list of fields to be cached for
FieldCache?

Thanks,
Manohar


Re: Question on Solr Caching

2014-12-04 Thread Michael Della Bitta

Hi, Manohar,


1. Does posting-list and term-list of the index reside in the memory? If

not, how to load this to memory. I don't want to load entire data, like
using DocumentCache. Either I want to use RAMDirectoryFactory as the data
will be lost if you restart


If you use MMapDirectory, Lucene will map the files into memory off heap 
and the OS's disk cache will cache the files in memory for you. Don't 
use RAMDirectory, it's not better than MMapDirectory for any use I'm 
aware of.


 2. For FilterCache, there is a way to specify whether the filter 
should be cached or not in the query.


If you add {!cache=false}  to your filter query, it will bypass the 
cache. I'm fairly certain it will not subsequently be cached.
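As a concrete illustration of that local-params syntax (the field and value here are hypothetical):

```
# Filter cached in the filterCache (default behavior):
q=*:*&fq=category:books

# Same filter, bypassing the filterCache:
q=*:*&fq={!cache=false}category:books
```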


 Similarly, Is there a way where I can specify the list of stored 
fields to be loaded to Document Cache?


If you have lazy loading enabled, the DocumentCache will only have the 
fields you asked for in it.


 3. Similarly, Is there a way I can specify list of fields to be 
cached for FieldCache? Thanks, Manohar


You basically don't have much control over the FieldCache in Solr other 
than warming it with queries.
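Those warming queries are typically wired up as a newSearcher listener in solrconfig.xml. A sketch (the sort field is a hypothetical example; the point is that sorting or faceting on a field populates its cache before the searcher goes live):

```xml
<!-- Sketch: run a warming query on each new searcher so field-level
     caches are populated before user queries arrive. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```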


You should check out this wiki page, it will probably answer some questions:

https://wiki.apache.org/solr/SolrCaching

I hope that helps!

Michael



Re: Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Thanks Micheal for the response.

If you use MMapDirectory, Lucene will map the files into memory off heap
and the OS's disk cache will cache the files in memory for you. Don't use
RAMDirectory, it's not better than MMapDirectory for any use I'm aware of.

 Will that mean it will cache the inverted index in the OS's disk
cache as well? The reason I am asking is that Solr searches this inverted
index first to get the data. What about keeping it in memory?

Thanks,
Manohar



On Thu, Dec 4, 2014 at 10:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Hi, Manohar,

  1. Does posting-list and term-list of the index reside in the memory? If

 not, how to load this to memory. I don't want to load entire data, like
 using DocumentCache. Either I want to use RAMDirectoryFactory as the data
 will be lost if you restart


 If you use MMapDirectory, Lucene will map the files into memory off heap
 and the OS's disk cache will cache the files in memory for you. Don't use
 RAMDirectory, it's not better than MMapDirectory for any use I'm aware of.

  2. For FilterCache, there is a way to specify whether the filter should
 be cached or not in the query.

 If you add {!cache=false}  to your filter query, it will bypass the cache.
 I'm fairly certain it will not subsequently be cached.

  Similarly, Is there a way where I can specify the list of stored fields
 to be loaded to Document Cache?

 If you have lazy loading enabled, the DocumentCache will only have the
 fields you asked for in it.

  3. Similarly, Is there a way I can specify list of fields to be cached
 for FieldCache? Thanks, Manohar

 You basically don't have much control over the FieldCache in Solr other
 than warming it with queries.

 You should check out this wiki page, it will probably answer some
 questions:

 https://wiki.apache.org/solr/SolrCaching

 I hope that helps!

 Michael




Re: Question on Solr Caching

2014-12-04 Thread Shawn Heisey
On 12/4/2014 10:06 PM, Manohar Sripada wrote:
 If you use MMapDirectory, Lucene will map the files into memory off heap
 and the OS's disk cache will cache the files in memory for you. Don't use
 RAMDirectory, it's not better than MMapDirectory for any use I'm aware of.
 
 Will that mean it will cache the Inverted index as well to OS disk's
 cache? The reason I am asking is, Solr searches this Inverted Index first
 to get the data. How about if we can keep this in memory?

If you have enough memory, the operating system will cache *everything*.
 It does so by simply loading the data that's on the disk into RAM ...
it is not aware that certain parts are the inverted index, it simply
caches whatever data gets read.  A subsequent read will come out of
memory, the disk heads will never even move.  If certain data in the
index is never accessed, then it will not get cached.

http://en.wikipedia.org/wiki/Page_cache

Thanks,
Shawn



Re: Solr caching clarifications

2013-07-15 Thread Erick Erickson
Manuel:

First off, anything that Mike McCandless says about low-level
details should override anything I say. The memory savings
he's talking about there are actually something he tutored me
in once on a chat.

The savings there, as I understand it, aren't huge. For large
sets I think it's a 25% savings (if I calculated right). But consider
that even without those savings, 8 filter cache entries will be
more than the entire structure that JIRA talks about.

As to your fq question, absolutely! Any yes/no clause that,
as you say, does not contribute to the score is a candidate to be
moved to an fq clause. There are a couple of things to
be aware of though.
1) Be a little careful of using NOW. If you don't use it correctly,
 fq clauses will not be re-used. See:
 http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
2) How you usually do this is through the UI, not by users entering
 a query. For instance, if you have a date-range picker, your app
 constructs the fq clause from that. Or you append fq clauses to the
 links you create when you display facets, and so on.

No, there's no automatic tool for this. There's not likely to be one
since there's no way to infer the intent. Say you put in a clause like
q=a AND b.
That scores things. It would give the same result set as
q=*:*&fq=a&fq=b
which would compute no scores. How could a tool infer when this
was or wasn't OK?
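To make the two request shapes side by side concrete (field names here are hypothetical):

```
# Scoring form: both clauses contribute to relevance scores
q=status:active AND type:invoice

# Filter form: same result set, no scoring, and each fq entry
# is cached independently in the filterCache
q=*:*&fq=status:active&fq=type:invoice
```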

Best
Erick

On Sun, Jul 14, 2013 at 6:10 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
 Alright, thanks Erick. For the question about memory usage of merges, taken
 from Mike McCandless' blog:

 The big thing that stays in RAM is a logical int[] mapping old docIDs to
 new docIDs, but in more recent versions of Lucene (4.x) we use a much more
 efficient structure than a simple int[] ... see
 https://issues.apache.org/jira/browse/LUCENE-2357

 How much RAM is required is mostly a function of how many documents (lots
 of tiny docs use more RAM than fewer huge docs).


 A related clarification:
 As my users are not aware of the fq possibility, I was wondering how I can
 make the best out of the filter cache. Would it be efficient to implicitly
 transform their queries into filter queries on fields that are boolean
 searches (date ranges etc. that do not affect the score of a document)? Is
 this a good practice? Is there any plugin for a query parser that does it?




 Inline

 On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
 manuel.lenorm...@gmail.com wrote:
  Hello,
  As a result of frequent Java OOM exceptions, I'm trying to investigate
  the Solr JVM memory heap usage more deeply.
  Please correct me if I am mistaken; this is my understanding of usages
  for the heap (per replica on a Solr instance):
  1. Buffers for indexing - bounded by ramBufferSize
  2. Solr caches
  3. Segment merge
  4. Miscellaneous- buffers for Tlogs, servlet overhead etc.
 
  Particularly I'm concerned by Solr caches and segment merges.
  1. How much memory consuming (bytes per doc) are FilterCaches
 (bitDocSet)
  and queryResultCaches (DocList)? I understand it is related to the skip
  spaces between doc id's that match (so it's not saved as a bitmap). But
  basically, is every id saved as a java int?

 Different beasts. filterCache consumes, essentially, maxDoc/8 bytes (you
 can get the maxDoc number from your Solr admin page). Plus some overhead
 for storing the fq text, but that's usually not much. This is for each
 entry, up to the cache's configured size.




 queryResultCache is usually trivial unless you've configured it
 extravagantly.
 It's the query string length + queryResultWindowSize integers per entry
 (queryResultWindowSize is from solrconfig.xml).

  2. QueryResultMaxDocsCached - (for example = 100) means that any query
  resulting in more than 100 docs will not be cached (at all) in the
  queryResultCache? Or does it have to do with the documentCache?
 It's just a limit on the queryResultCache entry size as far as I can
 tell. But again
 this cache is relatively small, I'd be surprised if it used
 significant resources.

  3. DocumentCache - written on the wiki it should be greater than
  max_results*concurrent_queries. Max result is just the num of rows
  displayed (rows-start) param, right? Not the queryResultWindow.

 Yes. This a cache (I think) for the _contents_ of the documents you'll
 be returning to be manipulated by various components during the life
 of the query.

  4. LazyFieldLoading=true - when querying for ids only (fl=id) will this
  cache be used? (on the expense of eviction of docs that were already
 loaded
  with stored fields)

 Not sure, but I don't think this will contribute much to memory pressure.
 This
  is about how many fields are loaded to get a single value from a doc in
 the
 results list, and since one is usually working with 20 or so docs this
 is usually
 a small amount of memory.

  5. How large is the heap used by merging? Assuming we have a merge of
 10
  segments of 500MB each (half inverted files - *.pos *.doc etc, half 

Re: Solr caching clarifications

2013-07-15 Thread Manuel Le Normand
Great explanation and article.

Yes, this buffer for merges seems very small, and still optimized. That's
impressive.


Re: Solr caching clarifications

2013-07-14 Thread Manuel Le Normand
Alright, thanks Erick. For the question about memory usage of merges, taken
from  Mike McCandless Blog

The big thing that stays in RAM is a logical int[] mapping old docIDs to
new docIDs, but in more recent versions of Lucene (4.x) we use a much more
efficient structure than a simple int[] ... see
https://issues.apache.org/jira/browse/LUCENE-2357

How much RAM is required is mostly a function of how many documents (lots
of tiny docs use more RAM than fewer huge docs).
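To put the quoted savings in rough numbers, here is a back-of-envelope sketch that assumes the naive layout of one Java int per document, not Lucene's actual packed structure:

```python
def naive_merge_map_bytes(num_docs):
    # Naive old-docID -> new-docID mapping: one 4-byte Java int per document.
    return num_docs * 4

# Merging segments holding 50 million docs in total:
mb = naive_merge_map_bytes(50_000_000) / (1024 * 1024)
print(f"~{mb:.0f} MB for the naive int[] mapping")  # ~191 MB before packing
```

The LUCENE-2357-style packed structure shrinks this, but even the naive figure shows why "lots of tiny docs" cost more than "fewer huge docs" for the same index size.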


A related clarification:
As my users are not aware of the fq possibility, I was wondering how I can
make the best out of the filter cache. Would it be efficient to implicitly
transform their queries into filter queries on fields that are boolean
searches (date ranges etc. that do not affect the score of a document)? Is
this a good practice? Is there any plugin for a query parser that does it?




 Inline

 On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
 manuel.lenorm...@gmail.com wrote:
  Hello,
  As a result of frequent Java OOM exceptions, I'm trying to investigate
  the Solr JVM memory heap usage more deeply.
  Please correct me if I am mistaken; this is my understanding of usages
  for the heap (per replica on a Solr instance):
  1. Buffers for indexing - bounded by ramBufferSize
  2. Solr caches
  3. Segment merge
  4. Miscellaneous- buffers for Tlogs, servlet overhead etc.
 
  Particularly I'm concerned by Solr caches and segment merges.
  1. How much memory consuming (bytes per doc) are FilterCaches
(bitDocSet)
  and queryResultCaches (DocList)? I understand it is related to the skip
  spaces between doc id's that match (so it's not saved as a bitmap). But
  basically, is every id saved as a java int?

 Different beasts. filterCache consumes, essentially, maxDoc/8 bytes (you
 can get the maxDoc number from your Solr admin page). Plus some overhead
 for storing the fq text, but that's usually not much. This is for each
 entry, up to the cache's configured size.




 queryResultCache is usually trivial unless you've configured it
extravagantly.
 It's the query string length + queryResultWindowSize integers per entry
 (queryResultWindowSize is from solrconfig.xml).

  2. QueryResultMaxDocsCached - (for example = 100) means that any query
  resulting in more than 100 docs will not be cached (at all) in the
  queryResultCache? Or does it have to do with the documentCache?
 It's just a limit on the queryResultCache entry size as far as I can
 tell. But again
 this cache is relatively small, I'd be surprised if it used
 significant resources.

  3. DocumentCache - written on the wiki it should be greater than
  max_results*concurrent_queries. Max result is just the num of rows
  displayed (rows-start) param, right? Not the queryResultWindow.

 Yes. This a cache (I think) for the _contents_ of the documents you'll
 be returning to be manipulated by various components during the life
 of the query.

  4. LazyFieldLoading=true - when querying for ids only (fl=id) will this
  cache be used? (on the expense of eviction of docs that were already
loaded
  with stored fields)

 Not sure, but I don't think this will contribute much to memory pressure.
This
 is about how many fields are loaded to get a single value from a doc in
the
 results list, and since one is usually working with 20 or so docs this
 is usually
 a small amount of memory.

  5. How large is the heap used by merging? Assuming we have a merge of
10
  segments of 500MB each (half inverted files - *.pos *.doc etc, half non
  inverted files - *.fdt, *.tvd), how much heap should be left unused for
  this merge?

 Again, I don't think this is much of a memory consumer, although I
 confess I don't
 know the internals. Merging is mostly about I/O.

 
  Thanks in advance,
  Manu

 But take a look at the admin page, you can see how much memory various
 caches are using by looking at the plugins/stats section.

 Best
 Erick


Re: Solr caching clarifications

2013-07-12 Thread Erick Erickson
Inline

On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
 Hello,
 As a result of frequent Java OOM exceptions, I'm trying to investigate the
 Solr JVM memory heap usage more deeply.
 Please correct me if I am mistaken; this is my understanding of usages for
 the heap (per replica on a Solr instance):
 1. Buffers for indexing - bounded by ramBufferSize
 2. Solr caches
 3. Segment merge
 4. Miscellaneous- buffers for Tlogs, servlet overhead etc.

 Particularly I'm concerned by Solr caches and segment merges.
 1. How much memory consuming (bytes per doc) are FilterCaches (bitDocSet)
 and queryResultCaches (DocList)? I understand it is related to the skip
 spaces between doc id's that match (so it's not saved as a bitmap). But
 basically, is every id saved as a java int?

Different beasts. filterCache consumes, essentially, maxDoc/8 bytes (you
can get the maxDoc number from your Solr admin page). Plus some overhead
for storing the fq text, but that's usually not much. This is for each
entry, up to the cache's configured size.

queryResultCache is usually trivial unless you've configured it extravagantly.
It's the query string length + queryResultWindowSize integers per entry
(queryResultWindowSize is from solrconfig.xml).
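Erick's two figures can be turned into rough arithmetic. This is a sketch only, ignoring per-entry Java object overhead and the stored fq/query text; the index size and entry counts are assumed numbers:

```python
def filter_cache_bytes(max_doc, entries):
    # Each filterCache entry is essentially a bitset over all docs: maxDoc/8 bytes.
    return (max_doc // 8) * entries

def query_result_cache_bytes(entries, window_size, avg_query_chars=64):
    # Each entry: the query string plus queryResultWindowSize 4-byte doc IDs.
    return entries * (avg_query_chars + window_size * 4)

max_doc = 100_000_000  # assumed index size
print(filter_cache_bytes(max_doc, 512) // 2**20,
      "MiB for 512 filterCache entries")
print(query_result_cache_bytes(10_000, 20) // 2**10,
      "KiB for 10k queryResultCache entries")
```

The asymmetry is the point: the filterCache dominates heap usage on large indexes, while the queryResultCache stays tiny even with many entries.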

 2. QueryResultMaxDocsCached - (for example = 100) means that any query
 resulting in more than 100 docs will not be cached (at all) in the
 queryResultCache? Or does it have to do with the documentCache?
It's just a limit on the queryResultCache entry size as far as I can
tell. But again
this cache is relatively small, I'd be surprised if it used
significant resources.

 3. DocumentCache - written on the wiki it should be greater than
 max_results*concurrent_queries. Max result is just the num of rows
 displayed (rows-start) param, right? Not the queryResultWindow.

Yes. This a cache (I think) for the _contents_ of the documents you'll
be returning to be manipulated by various components during the life
of the query.

 4. LazyFieldLoading=true - when querying for ids only (fl=id) will this
 cache be used? (on the expense of eviction of docs that were already loaded
 with stored fields)

Not sure, but I don't think this will contribute much to memory pressure. This
is about how many fields are loaded to get a single value from a doc in the
results list, and since one is usually working with 20 or so docs this
is usually
a small amount of memory.

 5. How large is the heap used by merging? Assuming we have a merge of 10
 segments of 500MB each (half inverted files - *.pos *.doc etc, half non
 inverted files - *.fdt, *.tvd), how much heap should be left unused for
 this merge?

Again, I don't think this is much of a memory consumer, although I
confess I don't
know the internals. Merging is mostly about I/O.


 Thanks in advance,
 Manu

But take a look at the admin page, you can see how much memory various
caches are using by looking at the plugins/stats section.

Best
Erick


Solr caching clarifications

2013-07-11 Thread Manuel Le Normand
Hello,
As a result of frequent Java OOM exceptions, I'm trying to investigate the
Solr JVM memory heap usage more deeply.
Please correct me if I am mistaken; this is my understanding of usages for
the heap (per replica on a Solr instance):
1. Buffers for indexing - bounded by ramBufferSize
2. Solr caches
3. Segment merge
4. Miscellaneous- buffers for Tlogs, servlet overhead etc.

Particularly I'm concerned by Solr caches and segment merges.
1. How much memory consuming (bytes per doc) are FilterCaches (bitDocSet)
and queryResultCaches (DocList)? I understand it is related to the skip
spaces between doc id's that match (so it's not saved as a bitmap). But
basically, is every id saved as a java int?
2. QueryResultMaxDocsCached - (for example = 100) means that any query
resulting in more than 100 docs will not be cached (at all) in the
queryResultCache? Or does it have to do with the documentCache?
3. DocumentCache - written on the wiki it should be greater than
max_results*concurrent_queries. Max result is just the num of rows
displayed (rows-start) param, right? Not the queryResultWindow.
4. LazyFieldLoading=true - when querying for ids only (fl=id) will this
cache be used? (on the expense of eviction of docs that were already loaded
with stored fields)
5. How large is the heap used by merging? Assuming we have a merge of 10
segments of 500MB each (half inverted files - *.pos *.doc etc, half non
inverted files - *.fdt, *.tvd), how much heap should be left unused for
this merge?

Thanks in advance,
Manu


Solr Caching

2013-04-17 Thread Furkan KAMACI
I've just started to read about Solr caching. I want to learn one thing.
Let's assume that I have given 4 GB of RAM to my Solr application and I have
10 GB of RAM in total. When the Solr caching mechanism starts to work, does
it use memory from that 4 GB, or does it let the operating system cache from
the 6 GB of RAM remaining outside the Solr application?


Re: Solr Caching

2013-04-17 Thread Walter Underwood
On Apr 17, 2013, at 3:09 PM, Furkan KAMACI wrote:

 I've just started to read about Solr caching. I want to learn one thing.
 Let's assume that I have given 4 GB of RAM to my Solr application and I have
 10 GB of RAM in total. When the Solr caching mechanism starts to work, does
 it use memory from that 4 GB, or does it let the operating system cache from
 the 6 GB of RAM remaining outside the Solr application?

Both.

Solr manages caches of Java objects. These are stored in the Java heap.

The OS manages caches of files. These are stored in file buffers managed by the 
OS.

All are in RAM.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?

2012-09-30 Thread Erick Erickson
4.0 is significantly more efficient memory-wise, both in the usage and
number of objects allocated. See:

http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/

Erick

On Sun, Sep 30, 2012 at 12:25 AM, varun srivastava
varunmail...@gmail.com wrote:
 Hi Erick,
  You mentioned that the 4.0 memory pattern is much different than 3.x. Can
 you elaborate on whether it's worse or better? Does 4.0 tend to use more
 memory for a similar index size as compared to 3.x?

 Thanks
 Varun

 On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson 
 erickerick...@gmail.com wrote:

 Well, I haven't had experience with JDK7, so I'll skip that part...

 But about caches. First, as far as memory is concerned, be
 sure to read Uwe's blog about MMapDirectory here:
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 As to the caches.

 Be a little careful here. Getting high hit rates on _all_ your caches
 is a waste.

 filterCache. This is the exception, you want as high a hit ratio as you can
 get for this one, it's where the results of all the fq= clauses go and is a
 major factor in speeding up QPS.

 queryResultCache. Hmmm, given the lack of updates to your index, this one
 may actually get more hits than I'd expect. But it's a very cheap cache
 memory
 wise. Think of it as a map where the key is the query and the value is an
 array of queryResultWindowSize longs (document IDs). It's really intended
 for paging mostly. It's also often the case that the chances of the exact
 same query (except for start and rows) being issued is actually
 relatively
 small. As always YMMV. I usually see hit rates on this cache < 10%.
 Evictions
 merely mean it's been around a long time, bumping the size of this cache
 probably won't affect the hit rate unless your app somehow submits just
 a few queries.


 documentCache. Again, this often doesn't have a great hit ratio. It's main
 use as I understand it is to keep various parts of a query component chain
 from having to re-access the disk. Each element in a query component is
 completely separate from the others, so if two or more components want
 values from the doc, having them cached is useful. The usual recommendation
 is (#docs returned to user) * (expected simultaneous queries), where
 # docs returned to user is really the rows value.

 One of the consequences of having huge amounts of memory allocated to
 the JVM can be really long garbage collections. They happen less frequently
 but have more work to do when they happen.

 Oh, and when you start using 4.0, the memory patterns are much different...

 Finally, here's a great post on solr memory tuning, too bad the image links
 are broken...
 http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

 Best
 Erick

 On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman daub...@gmail.com wrote:
  Greetings,
 
  I've recently moved to running some of our Solr (3.6.1) instances
  using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
  100ms range). By and large, it has been working well (or, perhaps I
  should say that without requiring much tuning it works much better in
  general than my haphazard attempts to tune CMS).
 
  I have two instances in particular, one with a heap size of 14G and
  one with a heap size of 60G. I'm attempting to squeeze out additional
  performance by increasing Solr's cache sizes (I am still seeing the
  hit ratio go up as I increase max size and decrease the number of
  evictions), and am guessing this is the cause of some recent
  situations where the 14G instance especially eventually (12-24 hrs
  later under 100s of queries per minute) makes it to 80%-90% of the
  heap and then spirals into major GC with long-pause territory.
 
  I am wondering:
  1) if anybody has experience tuning the G1 GC, especially for use with
  Solr (what are decent max-pause times to use?)
  2) how to better tune Solr's cache sizes - e.g. how to even tell the
  actual amount of memory used by each cache (not # entries as the stats
  show, but # bits)
  3) if there are any guidelines on when increasing a cache's size (even
  if it does continue to increase the hit ratio) runs into the law of
  diminishing returns or even starts to hurt - e.g. if the document
  cache has a current maxSize of 65536 and has seen 4409275 evictions,
  and currently has a hit ratio of 0.74, should the max be increased
  further? If so, how much ram needs to be added to the heap, and how
  much larger should its max size be made?
 
  I should mention that these solr instances are read-only (so cache is
  probably more valuable than in other scenarios - we only invalidate
  the searcher every 20-24hrs or so) and are also backed with indexes
  (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
  concerned about leaving RAM for linux to cache the index files (I'd
  much rather actually cache the post-transformed values).
 
  Thanks as always,
   Aaron



Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?

2012-09-29 Thread Aaron Daubman
Greetings,

I've recently moved to running some of our Solr (3.6.1) instances
using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
100ms range). By and large, it has been working well (or, perhaps I
should say that without requiring much tuning it works much better in
general than my haphazard attempts to tune CMS).

I have two instances in particular, one with a heap size of 14G and
one with a heap size of 60G. I'm attempting to squeeze out additional
performance by increasing Solr's cache sizes (I am still seeing the
hit ratio go up as I increase max size and decrease the number of
evictions), and am guessing this is the cause of some recent
situations where the 14G instance especially eventually (12-24 hrs
later under 100s of queries per minute) makes it to 80%-90% of the
heap and then spirals into major GC with long-pause territory.

I am wondering:
1) if anybody has experience tuning the G1 GC, especially for use with
Solr (what are decent max-pause times to use?)
2) how to better tune Solr's cache sizes - e.g. how to even tell the
actual amount of memory used by each cache (not # entries as the stats
show, but # bits)
3) if there are any guidelines on when increasing a cache's size (even
if it does continue to increase the hit ratio) runs into the law of
diminishing returns or even starts to hurt - e.g. if the document
cache has a current maxSize of 65536 and has seen 4409275 evictions,
and currently has a hit ratio of 0.74, should the max be increased
further? If so, how much ram needs to be added to the heap, and how
much larger should its max size be made?

I should mention that these solr instances are read-only (so cache is
probably more valuable than in other scenarios - we only invalidate
the searcher every 20-24hrs or so) and are also backed with indexes
(6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
concerned about leaving RAM for linux to cache the index files (I'd
much rather actually cache the post-transformed values).

Thanks as always,
 Aaron


Re: Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?

2012-09-29 Thread Erick Erickson
Well, I haven't had experience with JDK7, so I'll skip that part...

But about caches. First, as far as memory is concerned, be
sure to read Uwe's blog about MMapDirectory here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

As to the caches.

Be a little careful here. Getting high hit rates on _all_ your caches
is a waste.

filterCache. This is the exception, you want as high a hit ratio as you can
get for this one, it's where the results of all the fq= clauses go and is a
major factor in speeding up QPS.

queryResultCache. Hmmm, given the lack of updates to your index, this one
may actually get more hits than I'd expect. But it's a very cheap cache memory
wise. Think of it as a map where the key is the query and the value is an
array of queryResultWindowSize longs (document IDs). It's really intended
for paging mostly. It's also often the case that the chances of the exact
same query (except for start and rows) being issued is actually relatively
small. As always YMMV. I usually see hit rates on this cache < 10%. Evictions
merely mean it's been around a long time, bumping the size of this cache
probably won't affect the hit rate unless your app somehow submits just
a few queries.


documentCache. Again, this often doesn't have a great hit ratio. It's main
use as I understand it is to keep various parts of a query component chain
from having to re-access the disk. Each element in a query component is
completely separate from the others, so if two or more components want
values from the doc, having them cached is useful. The usual recommendation
is (#docs returned to user) * (expected simultaneous queries), where
# docs returned to user is really the rows value.
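The sizing rule of thumb above reduces to simple arithmetic; the rows and concurrency figures below are assumptions you would replace with measurements from your own app:

```python
def document_cache_size(rows_per_query, concurrent_queries):
    # documentCache rule of thumb:
    # (# docs returned to user) * (expected simultaneous queries)
    return rows_per_query * concurrent_queries

# e.g. pages of 20 results with up to 50 queries in flight:
print(document_cache_size(20, 50))  # → 1000
```

Anything much larger than this mostly caches documents that no in-flight query component will ask for again.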

One of the consequences of having huge amounts of memory allocated to
the JVM can be really long garbage collections. They happen less frequently
but have more work to do when they happen.

Oh, and when you start using 4.0, the memory patterns are much different...

Finally, here's a great post on solr memory tuning, too bad the image links
are broken...
http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

Best
Erick

On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman daub...@gmail.com wrote:
 Greetings,

 I've recently moved to running some of our Solr (3.6.1) instances
 using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
 100ms range). By and large, it has been working well (or, perhaps I
 should say that without requiring much tuning it works much better in
 general than my haphazard attempts to tune CMS).

 I have two instances in particular, one with a heap size of 14G and
 one with a heap size of 60G. I'm attempting to squeeze out additional
 performance by increasing Solr's cache sizes (I am still seeing the
 hit ratio go up as I increase max size and decrease the number of
 evictions), and am guessing this is the cause of some recent
 situations where the 14G instance especially eventually (12-24 hrs
 later under 100s of queries per minute) makes it to 80%-90% of the
 heap and then spirals into major GC with long-pause territory.

 I am wondering:
 1) if anybody has experience tuning the G1 GC, especially for use with
 Solr (what are decent max-pause times to use?)
 2) how to better tune Solr's cache sizes - e.g. how to even tell the
 actual amount of memory used by each cache (not # entries as the stats
 show, but # bits)
 3) if there are any guidelines on when increasing a cache's size (even
 if it does continue to increase the hit ratio) runs into the law of
 diminishing returns or even starts to hurt - e.g. if the document
 cache has a current maxSize of 65536 and has seen 4409275 evictions,
 and currently has a hit ratio of 0.74, should the max be increased
 further? If so, how much ram needs to be added to the heap, and how
 much larger should its max size be made?

 I should mention that these solr instances are read-only (so cache is
 probably more valuable than in other scenarios - we only invalidate
 the searcher every 20-24hrs or so) and are also backed with indexes
 (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
 concerned about leaving RAM for linux to cache the index files (I'd
 much rather actually cache the post-transformed values).

 Thanks as always,
  Aaron


Re: Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?

2012-09-29 Thread varun srivastava
Hi Erick,
 You mentioned that the 4.0 memory pattern is much different than 3.x. Can you
elaborate on whether it's worse or better? Does 4.0 tend to use more memory
for a similar index size as compared to 3.x?

Thanks
Varun

On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.com wrote:

 Well, I haven't had experience with JDK7, so I'll skip that part...

 But about caches. First, as far as memory is concerned, be
 sure to read Uwe's blog about MMapDirectory here:
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 As to the caches.

 Be a little careful here. Getting high hit rates on _all_ your caches
 is a waste.

 filterCache. This is the exception, you want as high a hit ratio as you can
 get for this one, it's where the results of all the fq= clauses go and is a
 major factor in speeding up QPS.

 queryResultCache. Hmmm, given the lack of updates to your index, this one
 may actually get more hits than I'd expect. But it's a very cheap cache
 memory
 wise. Think of it as a map where the key is the query and the value is an
 array of queryResultWindowSize longs (document IDs). It's really intended
 for paging mostly. It's also often the case that the chances of the exact
 same query (except for start and rows) being issued is actually
 relatively
 small. As always YMMV. I usually see hit rates on this cache < 10%.
 Evictions
 merely mean it's been around a long time, bumping the size of this cache
 probably won't affect the hit rate unless your app somehow submits just
 a few queries.


 documentCache. Again, this often doesn't have a great hit ratio. It's main
 use as I understand it is to keep various parts of a query component chain
 from having to re-access the disk. Each element in a query component is
 completely separate from the others, so if two or more components want
 values from the doc, having them cached is useful. The usual recommendation
 is (#docs returned to user) * (expected simultaneous queries), where
 # docs returned to user is really the rows value.

 One of the consequences of having huge amounts of memory allocated to
 the JVM can be really long garbage collections. They happen less frequently
 but have more work to do when they happen.

 Oh, and when you start using 4.0, the memory patterns are much different...

 Finally, here's a great post on solr memory tuning, too bad the image links
 are broken...
 http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

 Best
 Erick

 On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman daub...@gmail.com wrote:
  Greetings,
 
  I've recently moved to running some of our Solr (3.6.1) instances
  using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
  100ms range). By and large, it has been working well (or, perhaps I
  should say that without requiring much tuning it works much better in
  general than my haphazard attempts to tune CMS).
 
  I have two instances in particular, one with a heap size of 14G and
  one with a heap size of 60G. I'm attempting to squeeze out additional
  performance by increasing Solr's cache sizes (I am still seeing the
  hit ratio go up as I increase max size and decrease the number of
  evictions), and am guessing this is the cause of some recent
  situations where the 14G instance especially eventually (12-24 hrs
  later under 100s of queries per minute) makes it to 80%-90% of the
  heap and then spirals into major GC with long-pause territory.
 
  I am wondering:
  1) if anybody has experience tuning the G1 GC, especially for use with
  Solr (what are decent max-pause times to use?)
  2) how to better tune Solr's cache sizes - e.g. how to even tell the
  actual amount of memory used by each cache (not # entries as the stats
  show, but # bits)
  3) if there are any guidelines on when increasing a cache's size (even
  if it does continue to increase the hit ratio) runs into the law of
  diminishing returns or even starts to hurt - e.g. if the document
  cache has a current maxSize of 65536 and has seen 4409275 evictions,
  and currently has a hit ratio of 0.74, should the max be increased
  further? If so, how much ram needs to be added to the heap, and how
  much larger should its max size be made?
 
  I should mention that these solr instances are read-only (so cache is
  probably more valuable than in other scenarios - we only invalidate
  the searcher every 20-24hrs or so) and are also backed with indexes
  (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
  concerned about leaving RAM for linux to cache the index files (I'd
  much rather actually cache the post-transformed values).
 
  Thanks as always,
   Aaron



Re: Solr caching memory consumption Problem

2012-04-02 Thread Suneel
Hello friends,

I am using DIH for Solr indexing. I have 60 million records in SQL which
need to be uploaded to Solr. I started caching and it was working smoothly,
with normal memory consumption, but after some time memory consumption grew
incrementally and the process reached more than 6 GB. That is the reason I
am not able to cache my data.
Please advise me if anything needs to be changed in the configuration or in
the Tomcat configuration.

This will be very helpful for me.


-
Regards,

Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-caching-memory-consumption-Problem-tp3873158p3877081.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr caching memory consumption Problem

2012-04-02 Thread Shawn Heisey

On 3/31/2012 4:30 AM, Suneel wrote:

Hello friends,

I am using DIH for Solr indexing. I have 60 million records in SQL which
need to be uploaded to Solr. I started caching and it was working smoothly,
with normal memory consumption, but after some time memory consumption grew
incrementally and the process reached more than 6 GB. That is the reason I
am not able to cache my data.
Please advise me if anything needs to be changed in the configuration or in
the Tomcat configuration.


I saw your later message about virtual memory and the directoryFactory - 
most of the time it is best to go with the default 
(solr.StandardDirectoryFactory), which you can do by specifying it 
explicitly or by leaving that configuration out.


When you talk about caching, are you talking about Solr's caches or 
OS/process memory and disk cache? If you are talking about the caches 
that you can configure in solrconfig.xml (filterCache, queryResultCache, 
and documentCache), you should not be trying to cache large portions of 
your index there.  I have over 11 million documents in each of my index 
shards (68 million for the whole index) and my numbers for those three 
caches are 64, 512, and 16384, with autoWarm counts of 4 and 32, since 
the documentCache doesn't directly support warming.
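For reference, Shawn's numbers correspond to a solrconfig.xml fragment along these lines (a sketch using the stock cache elements of that Solr era; the class choices are illustrative, not what Shawn stated):

```xml
<!-- Sketch of the cache sizes described above; tune for your own index. -->
<filterCache      class="solr.FastLRUCache" size="64"    initialSize="64"    autowarmCount="4"/>
<queryResultCache class="solr.LRUCache"     size="512"   initialSize="512"   autowarmCount="32"/>
<!-- The documentCache cannot be autowarmed, so no autowarmCount is given. -->
<documentCache    class="solr.LRUCache"     size="16384" initialSize="16384"/>
```

The point of the small filterCache/queryResultCache and larger documentCache is that these caches hold query artifacts, not the index itself; bulk index caching is left to the OS.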


If you are talking about how much memory Windows says the Java process 
is taking up, take a look at the replies you have already gotten 
on your Virtual Memory message.  As Erick and Michael told you, if you 
are using the latest version (3.5) with the standard directoryFactory 
config, most of the memory that you are seeing there is because the OS 
is memory mapping your entire on-disk index, taking advantage of the OS 
disk cache to speed up disk access without actually allocating the 
memory involved.  This is a good thing, even though the process numbers 
look bad.  JConsole or another java memory tool can show you the true 
picture.


With 60 million records, even if those records are small, your Solr 
index will probably grow to several gigabytes.  For the best 
performance, your server must have enough memory so that the entire 
index can fit into RAM, after discounting memory usage for the OS itself 
and the java process that contains Solr.  If you can get MOST of the 
index into RAM, performance will likely still be acceptable.


Your message implies that 6GB worries you very much, so I am guessing 
that your server has somewhere in the range of 4GB to 8GB of RAM, but 
your index is very much larger than this.  You don't actually say 
whether you lose performance.  Do you, or are you just worried about the 
memory usage?  If Solr's query times start increasing, that is usually a 
good indicator that it is not healthy.


Thanks,
Shawn



Solr caching memory consumption Problem

2012-03-31 Thread Suneel
Hello friends,

I am using DIH for Solr indexing. I have 60 million records in SQL that
need to be uploaded to Solr. Indexing starts smoothly and memory consumption
is normal, but after some time memory consumption climbs incrementally until
the process exceeds 6 GB. For that reason I am not able to finish caching my
data. Please advise if anything needs to be changed in the Solr or Tomcat
configuration.

This will be very helpful for me.







-
Regards,

Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-caching-memory-consumption-Problem-tp3873158p3873158.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr caching problem

2009-09-24 Thread Lance Norskog
There are now two excellent books, Lucene In Action 2 and Solr 1.4
Enterprise Search Server, that describe the inner workings of these
technologies and how they fit together.

Otherwise, Solr and Lucene knowledge is only available in a fragmented
form across many wiki pages, bug reports and email discussions.

But the direct answer is: before you commit your changes, you will not
see them in queries. When you commit, all caches are thrown away and
rebuilt as the same queries are run again. This rebuilding process has
various tools to control it in solrconfig.xml.
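Those controls include, for example, the QuerySenderListener; a rough sketch of a warming entry in solrconfig.xml (the query shown is a made-up example, not from this thread):

```xml
<!-- Run a warming query against every newly opened searcher so its
     caches are populated before it starts serving live traffic. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```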

On Wed, Sep 23, 2009 at 8:27 PM, satya tosatyaj...@gmail.com wrote:
 Is there any way to analyze or see which documents are getting cached
 by documentCache -

  <documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>



 On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote:

 First of all, thanks a lot for the clarification. Is there any way to see
 how this cache is working internally, what objects are being stored, and
 how much memory it is consuming, so that we can get a clear picture in
 mind? And how can we test the performance gained through the cache?


 On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote:

  1) Then do you mean that if we delete a particular doc, it is going to
  be deleted from the cache also?

 When you delete a document and then COMMIT your changes, new caches will be
 warmed up (and prepopulated with some key-value pairs from the old instances),
 etc:

  <!-- documentCache caches Lucene Document objects (the stored fields for
       each document). Since Lucene internal document ids are transient,
       this cache will not be autowarmed. -->
  <documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

 - this one won't be 'prepopulated'.




  2) In Solr, does the cache store the entire document in memory, or only
  references to documents?

 There are many different cache instances; DocumentCache should store
 (ID, Document) pairs, etc.








-- 
Lance Norskog
goks...@gmail.com


Re: solr caching problem

2009-09-23 Thread satya
Is there any way to analyze or see which documents are getting cached
by documentCache -

<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>



On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote:

 First of all, thanks a lot for the clarification. Is there any way to see
 how this cache is working internally, what objects are being stored, and
 how much memory it is consuming, so that we can get a clear picture in
 mind? And how can we test the performance gained through the cache?


 On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote:

  1) Then do you mean that if we delete a particular doc, it is going to
  be deleted from the cache also?

 When you delete a document and then COMMIT your changes, new caches will be
 warmed up (and prepopulated with some key-value pairs from the old instances),
 etc:

  <!-- documentCache caches Lucene Document objects (the stored fields for
       each document). Since Lucene internal document ids are transient,
       this cache will not be autowarmed. -->
  <documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

 - this one won't be 'prepopulated'.




  2) In Solr, does the cache store the entire document in memory, or only
  references to documents?

 There are many different cache instances; DocumentCache should store
 (ID, Document) pairs, etc.






solr caching problem

2009-09-22 Thread satyasundar jena
I configured the filter cache in solrconfig.xml as follows:
<filterCache
  class="solr.FastLRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="4096"/>

<useFilterForSortedQuery>true</useFilterForSortedQuery>

as per
http://wiki.apache.org/solr/SolrCaching#head-b6a7d51521d55fa0c89f2b576b2659f297f9

And executed a query as:
http://localhost:8080/solr/select/?q=*:*&fq=id:(172704 TO 2079813)&sort=id asc

But when I deleted the doc id:172704 and executed the query again, I didn't
find the same doc (172704) in my result.
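For what it's worth, a deleted document disappears from results only once a commit opens a new searcher; a rough sketch of the two XML update messages involved (the host and id are taken from the query above, purely illustrative):

```xml
<!-- First POST to http://localhost:8080/solr/update: -->
<delete><id>172704</id></delete>

<!-- Second POST to the same URL: the commit opens a new searcher,
     throwing away the old caches and warming fresh ones. -->
<commit/>
```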


Re: solr caching problem

2009-09-22 Thread Yonik Seeley
Solr's caches should be transparent - they should only speed up
queries, not change the result of queries.

-Yonik
http://www.lucidimagination.com

On Tue, Sep 22, 2009 at 9:45 AM, satyasundar jena tosatyaj...@gmail.com wrote:
 I configured the filter cache in solrconfig.xml as follows:
 <filterCache
   class="solr.FastLRUCache"
   size="16384"
   initialSize="4096"
   autowarmCount="4096"/>

 <useFilterForSortedQuery>true</useFilterForSortedQuery>

 as per
 http://wiki.apache.org/solr/SolrCaching#head-b6a7d51521d55fa0c89f2b576b2659f297f9

 And executed a query as:
 http://localhost:8080/solr/select/?q=*:*&fq=id:(172704 TO 2079813)&sort=id asc

 But when I deleted the doc id:172704 and executed the query again, I didn't
 find the same doc (172704) in my result.



Re: solr caching problem

2009-09-22 Thread satyasundar jena
1) Then do you mean that if we delete a particular doc, it is going to be
deleted from the cache also?
2) In Solr, does the cache store the entire document in memory, or only
references to documents?
And how do we test this caching, after all?
I'll be thankful for an elaboration.

On Tue, Sep 22, 2009 at 8:46 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 Solr's caches should be transparent - they should only speed up
 queries, not change the result of queries.

 -Yonik
 http://www.lucidimagination.com

 On Tue, Sep 22, 2009 at 9:45 AM, satyasundar jena tosatyaj...@gmail.com
 wrote:
  I configured the filter cache in solrconfig.xml as follows:
  <filterCache
    class="solr.FastLRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="4096"/>

  <useFilterForSortedQuery>true</useFilterForSortedQuery>
 
  as per
 
 http://wiki.apache.org/solr/SolrCaching#head-b6a7d51521d55fa0c89f2b576b2659f297f9
 
  And executed a query as:
  http://localhost:8080/solr/select/?q=*:*&fq=id:(172704 TO 2079813)&sort=id asc
 
  But when I deleted the doc id:172704 and executed the query again, I didn't
  find the same doc (172704) in my result.
 



RE: solr caching problem

2009-09-22 Thread Fuad Efendi
 1) Then do you mean that if we delete a particular doc, it is going to be
 deleted from the cache also?

When you delete a document and then COMMIT your changes, new caches will be
warmed up (and prepopulated with some key-value pairs from the old instances),
etc:

<!-- documentCache caches Lucene Document objects (the stored fields for
     each document). Since Lucene internal document ids are transient,
     this cache will not be autowarmed. -->
<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

- this one won't be 'prepopulated'.




 2) In Solr, does the cache store the entire document in memory, or only
 references to documents?

There are many different cache instances; DocumentCache should store
(ID, Document) pairs, etc.




Re: solr caching problem

2009-09-22 Thread satya
First of all, thanks a lot for the clarification. Is there any way to see
how this cache is working internally, what objects are being stored, and
how much memory it is consuming, so that we can get a clear picture in
mind? And how can we test the performance gained through the cache?

On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote:

  1) Then do you mean that if we delete a particular doc, it is going to
  be deleted from the cache also?

 When you delete a document and then COMMIT your changes, new caches will be
 warmed up (and prepopulated with some key-value pairs from the old instances),
 etc:

 <!-- documentCache caches Lucene Document objects (the stored fields for
      each document). Since Lucene internal document ids are transient,
      this cache will not be autowarmed. -->
 <documentCache
   class="solr.LRUCache"
   size="512"
   initialSize="512"
   autowarmCount="0"/>

 - this one won't be 'prepopulated'.




  2) In Solr, does the cache store the entire document in memory, or only
  references to documents?

 There are many different cache instances; DocumentCache should store
 (ID, Document) pairs, etc.





Contributions Needed: Faceting Performance, SOLR Caching

2008-10-19 Thread Funtick

Users & Developers & Possible Contributors,


Hi,

Recently I did some code hacks and I am using frequency calcs from the
TermVector instead of the default out-of-the-box DocSet intersections. It
improves performance hundreds of times at the shopping engine
http://www.tokenizer.org - please check
http://issues.apache.org/jira/browse/SOLR-711 - I feel the term faceting
(and the related architectural decision made for CNET several years ago) is
completely wrong. Default SOLR response times: 30-180 seconds; with
TermVector: 0.2 seconds (25 million documents, tokenized field). For a
non-tokenized field it also looks natural to use frequency calcs, but I
have not done it yet.

Sorry... too busy with Liferay Portal contract assignments,
http://www.linkedin.com/in/liferay

Another possible performance improvement: create a safe & concurrent cache
for SOLR; you may check LingPipe, and also
http://issues.apache.org/jira/browse/SOLR-665 and
http://issues.apache.org/jira/browse/SOLR-667.

The Lucene developers are doing a great job removing synchronization in
several places too, such as the isDeleted() method call... it would be nice
to have an unsynchronized API version for read-only indexes.


Thanks!




-- 
View this message in context: 
http://www.nabble.com/Contributions-Needed%3A-Faceting-Performance%2C-SOLR-Caching-tp20058987p20058987.html
Sent from the Solr - User mailing list archive at Nabble.com.