Re: Question about startup memory usage

2019-11-14 Thread Shawn Heisey

On 11/14/2019 1:46 AM, Hongxu Ma wrote:

Thank you @Shawn Heisey, you have helped me many times.

My -Xms is 1G.
When I restart Solr, I can see memory usage increasing steadily (from 1G to 9G,
taking nearly 10s).

I have a guess: maybe Solr is loading some needed files into heap memory, e.g.
*.tip (the term index file). What are your thoughts?


Solr's basic operation involves quite a lot of Java memory allocation. 
Most of what gets allocated turns into garbage almost immediately, but 
Java does not reuse that memory right away ... it can only be reused 
after garbage collection on the appropriate memory region runs.


The algorithms in Java that decide between either grabbing more memory 
(up to the configured heap limit) or running garbage collection are 
beyond my understanding.  For programs with heavy memory allocation, 
like Solr, the preference does seem to lean towards allocating more
memory, if it's available, rather than performing garbage collection.


I can imagine that initial loading of indexes containing billions of 
documents will require quite a bit of heap.  I do not know what data is 
stored in that memory.
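
If it helps to connect that to what you saw in top, the used/committed/max split of the heap can be read from inside the JVM. A minimal, hypothetical sketch using only the standard java.lang.management API (this is not Solr code, and the class name is made up for illustration):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    public static void main(String[] args) {
        // "used" = live objects plus not-yet-collected garbage,
        // "committed" = memory the OS has actually handed to the heap (roughly the
        //               heap portion of what top reports as RES),
        // "max" = the -Xmx ceiling.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}

The same numbers are visible without any code via the Solr admin UI's JVM memory gauge or "jstat -gc <pid>"; the point is only that RES growing from 1G towards 9G reflects committed heap, which is not necessarily live data.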


Thanks,
Shawn


Re: Question about startup memory usage

2019-11-14 Thread Hongxu Ma
Thank you @Shawn Heisey <apa...@elyograg.org>, you have helped me many times.

My -Xms is 1G.
When I restart Solr, I can see memory usage increasing steadily (from 1G to 9G,
taking nearly 10s).

I have a guess: maybe Solr is loading some needed files into heap memory, e.g.
*.tip (the term index file). What are your thoughts?

Thanks.



From: Shawn Heisey 
Sent: Thursday, November 14, 2019 1:15
To: solr-user@lucene.apache.org 
Subject: Re: Question about startup memory usage

On 11/13/2019 2:03 AM, Hongxu Ma wrote:
> I have a solr-cloud cluster with a big collection; after startup (no
> search/index operations at all), its JVM memory usage is 9GB (via top: RES).
>
> Cluster and collection info:
> each host: total 64G mem, two solr nodes with -Xmx=15G
> collection: 9 billion docs in total (but each doc is very small: only some
> bytes), total size 3TB.
>
> My question is:
> Is the 9G mem usage after startup normal? If so, I am worried that follow-up
> index/search operations will cause an OOM error.
> And how can I reduce the memory usage? Maybe I should introduce more hosts
> with nodes, but besides this, is there any other solution?

With the "-Xmx=15G" option, you've told Java that it can use up to 15GB
for heap.  Its total resident memory usage is eventually going to reach
a little over 15GB and probably never go down.  This is how Java works.

The amount of memory that Java allocates immediately on program startup
is related to the -Xms setting.  Normally Solr uses the same number for
both -Xms and -Xmx, but that can be changed if you desire.  We recommend
using the same number.  If -Xms is smaller than -Xmx, Java may allocate
less memory as soon as it starts, then Solr is going to run through its
startup procedure.  We will not know exactly how much memory allocation
is going to occur when that happens ... but with billions of documents,
it's not going to be small.

Thanks,
Shawn


Re: Question about startup memory usage

2019-11-13 Thread Shawn Heisey

On 11/13/2019 2:03 AM, Hongxu Ma wrote:

I have a solr-cloud cluster with a big collection; after startup (no
search/index operations at all), its JVM memory usage is 9GB (via top: RES).

Cluster and collection info:
each host: total 64G mem, two solr nodes with -Xmx=15G
collection: 9 billion docs in total (but each doc is very small: only some
bytes), total size 3TB.

My question is:
Is the 9G mem usage after startup normal? If so, I am worried that follow-up
index/search operations will cause an OOM error.
And how can I reduce the memory usage? Maybe I should introduce more hosts
with nodes, but besides this, is there any other solution?


With the "-Xmx=15G" option, you've told Java that it can use up to 15GB 
for heap.  Its total resident memory usage is eventually going to reach
a little over 15GB and probably never go down.  This is how Java works.


The amount of memory that Java allocates immediately on program startup 
is related to the -Xms setting.  Normally Solr uses the same number for 
both -Xms and -Xmx, but that can be changed if you desire.  We recommend 
using the same number.  If -Xms is smaller than -Xmx, Java may allocate 
less memory as soon as it starts, then Solr is going to run through its 
startup procedure.  We will not know exactly how much memory allocation 
is going to occur when that happens ... but with billions of documents, 
it's not going to be small.


Thanks,
Shawn


Question about startup memory usage

2019-11-13 Thread Hongxu Ma
Hi
I have a solr-cloud cluster with a big collection; after startup (no
search/index operations at all), its JVM memory usage is 9GB (via top: RES).

Cluster and collection info:
each host: total 64G mem, two solr nodes with -Xmx=15G
collection: 9 billion docs in total (but each doc is very small: only some
bytes), total size 3TB.

My question is:
Is the 9G mem usage after startup normal? If so, I am worried that follow-up
index/search operations will cause an OOM error.
And how can I reduce the memory usage? Maybe I should introduce more hosts
with nodes, but besides this, is there any other solution?

Thanks.






Re: Question about memory usage and file handling

2019-11-11 Thread Erick Erickson
(1) No. The internal RAM buffer will pretty much limit the amount of heap used,
however.

(2) You actually have several segments. “.cfs” stands for “Compound File”, see: 

https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html
"An optional "virtual" file consisting of all the other index files for systems 
that frequently run out of file handles.”

IOW, _0.cfs is a complete segment. _1.cfs is a different, complete segment, etc.
The merge policy (TieredMergePolicy) controls when these are used vs. the
segment being kept in separate files.

New segments are created whenever the RAM buffer is flushed or whenever you do
a commit (closing the IndexWriter also creates a segment, IIUC). However, under control
of the merge policy, segments are merged. See: 
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

You’re confusing closing a writer with merging segments. Essentially, every 
time a commit happens, the merge policy is called to determine if segments 
should be merged, see Mike’s blog above.

Additionally, you say "I was hoping there would be only _0.cfs file". This'll
pretty much never happen. Segment names always increase, at best you’d have 
something like _ab.cfs, if not 10-15 _ab* files.

Lucene likes file handles; essentially, when searching, a file handle will be
open for _every_ file in your index all the time.

All that said, counting the number of files seems like a waste of time. If
you're running on a *nix box, the usual advice (for Solr, I'll admit, but I think
it applies to Lucene as well) is to set the open-file limit to 65K or so.

And if you're truly concerned, and since you say this is an immutable index, you
can do a forceMerge. Prior to Lucene 7.5, this would by default form exactly one
segment. For Lucene 7.5 and later, it'll respect the max segment size (a parameter
in TieredMergePolicy, defaults to 5g) unless you specify a segment count of 1.
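
A minimal sketch of those knobs, assuming Lucene 7.x, a local FSDirectory, and a throwaway index path (all of which are illustrative placeholders, not anything from this thread):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    cfg.setRAMBufferSizeMB(256);         // each flush of this buffer writes a new segment
    cfg.setUseCompoundFile(true);        // write flushed segments as _N.cfs compound files
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setMaxMergedSegmentMB(5 * 1024); // the ~5g max merged segment size mentioned above
    cfg.setMergePolicy(tmp);

    try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/test-index"));
         IndexWriter writer = new IndexWriter(dir, cfg)) {
      Document doc = new Document();
      doc.add(new StringField("id", "1", Field.Store.YES));
      writer.addDocument(doc);
      writer.commit();
      // For an immutable index: collapse everything into a single segment.
      writer.forceMerge(1);
    }
  }
}

The _0, _1, ... prefixes on the resulting files are per-segment names; the segments_N file is the commit point, not a segment itself.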

Best,
Erick

> On Nov 11, 2019, at 5:47 PM, Shawn Heisey  wrote:
> 
> On 11/11/2019 1:40 PM, siddharth teotia wrote:
>> I have a few questions about Lucene indexing and file handling. It would be
>> great if someone can help with these. I had earlier asked these questions
>> on gene...@lucene.apache.org but was asked to seek help here.
> 
> This mailing list (solr-user) is for Solr.  Questions about Lucene do not 
> belong on this list.
> 
> You should ask on the java-user mailing list, which is for questions related 
> to the core (Java) version of Lucene.
> 
> http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
> 
> I have put the original sender address in the BCC field just in case you are 
> not subscribed here.
> 
> Thanks,
> Shawn



Re: Question about memory usage and file handling

2019-11-11 Thread Shawn Heisey

On 11/11/2019 1:40 PM, siddharth teotia wrote:

I have a few questions about Lucene indexing and file handling. It would be
great if someone can help with these. I had earlier asked these questions
on gene...@lucene.apache.org but was asked to seek help here.


This mailing list (solr-user) is for Solr.  Questions about Lucene do 
not belong on this list.


You should ask on the java-user mailing list, which is for questions 
related to the core (Java) version of Lucene.


http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg

I have put the original sender address in the BCC field just in case you 
are not subscribed here.


Thanks,
Shawn


Question about memory usage and file handling

2019-11-11 Thread siddharth teotia
Hi All,

I have a few questions about Lucene indexing and file handling. It would be
great if someone can help with these. I had earlier asked these questions
on gene...@lucene.apache.org but was asked to seek help here.


(1) During indexing, is there any knob to tell the writer to use off-heap
memory for buffering? I didn't find anything in the docs, so probably the answer is
no. Just confirming.

(2) I did some experiments with buffering threshold using
setMaxRAMBufferSizeMB() on IndexWriterConfig. I varied it from 16MB
(default), 128MB, 256MB and 512MB. The experiment was ingesting 5million
documents. It turns out that buffering threshold also controls the number
of files that are created in the index directory. In all the cases, I see
only 1 segment (since there was just one segments_1 file), but there were
multiple .cfs files -- _0.cfs, _1.cfs, _2.cfs, _3.cfs.

How can there be multiple cfs files when there is just one segment? My
understanding from the documentation was that all files for each segment
will have the same name but different extension. In this case, even though
there is only 1 segment, there are still cfs files. Does each flush result
in a new file?

The reason to do this experiment is to understand the number of open files
both while building the index and querying. I am not quite sure why I am
seeing multiple CFS files when there is only 1 segment. I was hoping there
would be only a _0.cfs file.  This is true when the buffer threshold is 512MB, but
there are 2 cfs files when threshold is set to 256MB, 5 cfs files when set
to 128MB and I didn't see the CFS file for the default 16MB threshold.
There were individual files (.fdx, .fdt, .tip etc). I thought by default
Lucene creates a compound file at least after the writer closes. Is that
not true?

I can see that during querying, only the cfs file is kept opened. But I
would like to understand a little bit about the number of cfs files and
based on that we can set the buffering threshold to control the heap
overhead while building the index.
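
A quick way to see the actual segment/file breakdown after each run is to read the latest commit point. A minimal sketch, assuming Lucene 7.x and a local index directory (the path is just a placeholder):

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

public class ListSegments {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/test-index"))) {
      // Read the latest commit (the segments_N file) and list what it references.
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
      System.out.println("segments in last commit: " + infos.size());
      // All files belonging to those segments, plus the segments_N file itself.
      System.out.println("files referenced: " + infos.files(true));
    }
  }
}

With the compound file format, each segment shows up as _N.cfs/_N.cfe plus an entry in the single segments_N file, which is why several _N.cfs files can coexist under one segments_1.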

(3) In my experiments, the writer commits and is closed after ingesting all
the 5million documents and after that there is no need for us to index
more. So essentially it is an immutable index. However, I want to
understand the threshold for creating a new segment. Is that pretty high?
Or if the writer is reopened, then the next set of documents will go into
the next segment and so on?

I would really appreciate some help with above questions.

Thanks,
Siddharth


Re: investigating high heap memory usage particularly on overseer / collection leaders

2019-10-22 Thread Paras Lehana
Since you say that it could be related to client usage patterns, have you
tried analyzing the queries that take the most time? See this
<https://lucene.apache.org/solr/guide/7_6/configuring-logging.html#logging-slow-queries>
.

On Tue, 8 Oct 2019 at 02:42, dshih  wrote:

> 3-node SOLR 7.4.0
> 24gb max heap memory
> 13 collections, each with 500mb-2gb index (on disk)
>
> We are investigating high heap memory usage/spikes with our SOLR cluster
> (details above).  After rebooting the cluster, all three instances stay
> under 2gb for about a day.  Then suddenly, one instance (srch01 in the
> below
> graph) spikes to about 7.5gb and begins a cycle of 3gb-7.5gb
> ups-and-downs.
> On this cluster, srch01 is both the overseer and the leader for all
> collections.  A few days later, the same trend begins occurring for another
> node (srch02).
>
> Are there known usage patterns that would cause this kind of memory usage
> with SOLR?  In particular, it seems odd that it would only affect the
> overseer/leaders node for days.  Also, any tips on investigation?  We
> haven't been able to deduce much from visualvm profiling.
>
> Additional context.  For years, we set max heap memory to 4gb.  But our
> SOLR
> instances recently began to OOM.  Increasing to 8gb helped, but the OOMs
> still eventually occurred.  This is how we eventually set it to 24gb
> (following SOLR documentation saying 10-20gb was not uncommon for
> production
> instances).  But the recent change is what makes us suspicious that some
> client usage pattern is the root cause.
>
> <https://lucene.472066.n3.nabble.com/file/t494250/ue1_10-4_to_10-7.jpg>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



investigating high heap memory usage particularly on overseer / collection leaders

2019-10-07 Thread dshih
3-node SOLR 7.4.0
24gb max heap memory
13 collections, each with 500mb-2gb index (on disk)

We are investigating high heap memory usage/spikes with our SOLR cluster
(details above).  After rebooting the cluster, all three instances stay
under 2gb for about a day.  Then suddenly, one instance (srch01 in the below
graph) spikes to about 7.5gb and begins a cycle of 3gb-7.5gb ups-and-downs. 
On this cluster, srch01 is both the overseer and the leader for all
collections.  A few days later, the same trend begins occurring for another
node (srch02).

Are there known usage patterns that would cause this kind of memory usage
with SOLR?  In particular, it seems odd that it would only affect the
overseer/leaders node for days.  Also, any tips on investigation?  We
haven't been able to deduce much from visualvm profiling.

Additional context.  For years, we set max heap memory to 4gb.  But our SOLR
instances recently began to OOM.  Increasing to 8gb helped, but the OOMs
still eventually occurred.  This is how we eventually set it to 24gb
(following SOLR documentation saying 10-20gb was not uncommon for production
instances).  But the recent change is what makes us suspicious that some
client usage pattern is the root cause.

<https://lucene.472066.n3.nabble.com/file/t494250/ue1_10-4_to_10-7.jpg> 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Synonym filters memory usage

2019-10-02 Thread Dominique Bejean
Thank you for all your responses.
Dominique

On Mon, Sep 30, 2019 at 13:38, Erick Erickson wrote:

> Solr/Lucene _better_ not have a copy of the synonym map for every segment,
> if so it’s a JIRA for sure. I’ve seen indexes with 100s of segments. With a
> large synonym file it’d be terrible.
>
> I would be really, really, really surprised if this is the case. The
> Lucene people are very careful with memory usage and would hop on this in
> an instant if true I’d guess.
>
> Best,
> Erick
>
> > On Sep 30, 2019, at 5:27 AM, Andrea Gazzarini 
> wrote:
> >
> > That sounds really strange to me.
> > Segments are created gradually depending on changes applied to the
> index, while the Schema should have a completely different lifecycle,
> independent from that.
> > If that is true, that would mean each time a new segment is created Solr
> would instantiate a new Schema instance (or at least, assuming this is
> valid only for synonyms, one SynonymFilterFactory, one SynonymFilter, one
> SynonymMap), which again, sounds really strange.
> >
> > Thanks for the point, I'll check and I'll let you know
> >
> > Cheers,
> > Andrea
> >
> > On 30/09/2019 09:58, Bernd Fehling wrote:
> >> Yes, I think so.
> >> While integrating a Thesaurus as synonyms.txt I saw massive memory
> usage.
> >> A heap dump and analysis with MemoryAnalyzer pointed out that the
> >> SynonymMap took 3 times a huge amount of memory, together with each
> >> opened index segment.
> >> Just try it and check that by yourself with heap dump and
> MemoryAnalyzer.
> >>
> >> Regards
> >> Bernd
> >>
> >>
> >> On 30.09.19 at 09:44, Andrea Gazzarini wrote:
> >>> mmm, ok for the core but are you sure things in this case are working
> per-segment? I would expect a FilterFactory instance per index, initialized
> at schema loading time.
> >>>
> >>> On 30/09/2019 09:04, Bernd Fehling wrote:
> >>>> And I think this is per core per index segment.
> >>>>
> >>>> 2 cores per instance, each core with 3 index segments, sums up to 6
> times
> >>>> the 2 SynonymMaps. Results in 12 times SynonymMaps.
> >>>>
> >>>> Regards
> >>>> Bernd
> >>>>
> >>>>
> >>>> On 30.09.19 at 08:41, Andrea Gazzarini wrote:
> >>>>>   Hi,
> >>>>> looking at the stateful nature of SynonymGraphFilter/FilterFactory
> classes,
> >>>>> the answer should be 2 times (one time per type instance).
> >>>>> The SynonymMap, which internally holds the synonyms table, is a
> private
> >>>>> member of the filter factory and it is loaded each time the factory
> needs
> >>>>> to create a type.
> >>>>>
> >>>>> Best,
> >>>>> Andrea
> >>>>>
> >>>>> On 29/09/2019 23:49, Dominique Bejean wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> My concern is about memory used by synonym filter, especially if
> synonyms
> >>>>> resources files are large.
> >>>>>
> >>>>> If in my schema, there are two field types "TypeSyno1" and
> "TypeSyno2"
> >>>>> using synonym filter with the same synonyms files.
> >>>>> For each of these two field types there are two fields
> >>>>>
> >>>>> Field1 type is TypeSyno1
> >>>>> Field2 type is TypeSyno1
> >>>>> Field3 type is TypeSyno2
> >>>>> Field4 type is TypeSyno2
> >>>>>
> >>>>> How many times is the synonym file loaded in memory?
> >>>>> 4 times, so one time per field?
> >>>>> 2 times, so one time per instantiated type?
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Dominique
> >>>
> >
> > --
> > Andrea Gazzarini
> > Search Consultant, R&D Software Engineer
> >
> >
> >
> > mobile: +39 349 513 86 25
> > email: a.gazzar...@sease.io
> >
>
>


Re: Synonym filters memory usage

2019-09-30 Thread Erick Erickson
Solr/Lucene _better_ not have a copy of the synonym map for every segment, if 
so it’s a JIRA for sure. I’ve seen indexes with 100s of segments. With a large 
synonym file it’d be terrible.

I would be really, really, really surprised if this is the case. The Lucene 
people are very careful with memory usage and would hop on this in an instant 
if true I’d guess.

Best,
Erick

> On Sep 30, 2019, at 5:27 AM, Andrea Gazzarini  wrote:
> 
> That sounds really strange to me. 
> Segments are created gradually depending on changes applied to the index, 
> while the Schema should have a completely different lifecycle, independent 
> from that.
> If that is true, that would mean each time a new segment is created Solr 
> would instantiate a new Schema instance (or at least, assuming this is valid 
> only for synonyms, one SynonymFilterFactory, one SynonymFilter, one 
> SynonymMap), which again, sounds really strange.
> 
> Thanks for the point, I'll check and I'll let you know
> 
> Cheers, 
> Andrea
> 
> On 30/09/2019 09:58, Bernd Fehling wrote:
>> Yes, I think so. 
>> While integrating a Thesaurus as synonyms.txt I saw massive memory usage. 
>> A heap dump and analysis with MemoryAnalyzer pointed out that the 
>> SynonymMap took 3 times a huge amount of memory, together with each 
>> opened index segment. 
>> Just try it and check that by yourself with heap dump and MemoryAnalyzer. 
>> 
>> Regards 
>> Bernd 
>> 
>> 
>>> On 30.09.19 at 09:44, Andrea Gazzarini wrote:
>>> mmm, ok for the core but are you sure things in this case are working 
>>> per-segment? I would expect a FilterFactory instance per index, initialized 
>>> at schema loading time. 
>>> 
>>> On 30/09/2019 09:04, Bernd Fehling wrote: 
>>>> And I think this is per core per index segment. 
>>>> 
>>>> 2 cores per instance, each core with 3 index segments, sums up to 6 times 
>>>> the 2 SynonymMaps. Results in 12 times SynonymMaps. 
>>>> 
>>>> Regards 
>>>> Bernd 
>>>> 
>>>> 
>>>> On 30.09.19 at 08:41, Andrea Gazzarini wrote:
>>>>>   Hi, 
>>>>> looking at the stateful nature of SynonymGraphFilter/FilterFactory 
>>>>> classes, 
>>>>> the answer should be 2 times (one time per type instance). 
>>>>> The SynonymMap, which internally holds the synonyms table, is a private 
>>>>> member of the filter factory and it is loaded each time the factory needs 
>>>>> to create a type. 
>>>>> 
>>>>> Best, 
>>>>> Andrea 
>>>>> 
>>>>> On 29/09/2019 23:49, Dominique Bejean wrote: 
>>>>> 
>>>>> Hi, 
>>>>> 
>>>>> My concern is about memory used by synonym filter, especially if synonyms 
>>>>> resources files are large. 
>>>>> 
>>>>> If in my schema, there are two field types "TypeSyno1" and "TypeSyno2" 
>>>>> using synonym filter with the same synonyms files. 
>>>>> For each of these two field types there are two fields 
>>>>> 
>>>>> Field1 type is TypeSyno1 
>>>>> Field2 type is TypeSyno1 
>>>>> Field3 type is TypeSyno2 
>>>>> Field4 type is TypeSyno2 
>>>>> 
>>>>> How many times is the synonym file loaded in memory?
>>>>> 4 times, so one time per field?
>>>>> 2 times, so one time per instantiated type?
>>>>> 
>>>>> Regards 
>>>>> 
>>>>> Dominique 
>>> 
> 
> -- 
> Andrea Gazzarini
> Search Consultant, R&D Software Engineer
> 
> 
> 
> mobile: +39 349 513 86 25
> email: a.gazzar...@sease.io 
> 



Re: Synonym filters memory usage

2019-09-30 Thread Andrea Gazzarini

That sounds really strange to me.
Segments are created gradually depending on changes applied to the 
index, while the Schema should have a completely different lifecycle, 
independent from that.
If that is true, that would mean each time a new segment is created Solr 
would instantiate a new Schema instance (or at least, assuming this is 
valid only for synonyms, one SynonymFilterFactory, one SynonymFilter, 
one SynonymMap), which again, sounds really strange.


Thanks for the point, I'll check and I'll let you know

Cheers,
Andrea

On 30/09/2019 09:58, Bernd Fehling wrote:

Yes, I think so.
While integrating a Thesaurus as synonyms.txt I saw massive memory usage.
A heap dump and analysis with MemoryAnalyzer pointed out that the
SynonymMap took 3 times a huge amount of memory, together with each
opened index segment.
Just try it and check that by yourself with heap dump and MemoryAnalyzer.

Regards
Bernd


On 30.09.19 at 09:44, Andrea Gazzarini wrote:
mmm, ok for the core but are you sure things in this case are working 
per-segment? I would expect a FilterFactory instance per index, 
initialized at schema loading time.


On 30/09/2019 09:04, Bernd Fehling wrote:

And I think this is per core per index segment.

2 cores per instance, each core with 3 index segments, sums up to 6 
times

the 2 SynonymMaps. Results in 12 times SynonymMaps.

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:

  Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory 
classes,

the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a 
private
member of the filter factory and it is loaded each time the factory 
needs

to create a type.

Best,
Andrea

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if 
synonyms

resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique




--
Andrea Gazzarini
/Search Consultant, R&D Software Engineer/

Sease Ltd

mobile: +39 349 513 86 25
email: a.gazzar...@sease.io



Re: Synonym filters memory usage

2019-09-30 Thread Bernd Fehling

Yes, I think so.
While integrating a Thesaurus as synonyms.txt I saw massive memory usage.
A heap dump and analysis with MemoryAnalyzer pointed out that the
SynonymMap took 3 times a huge amount of memory, together with each
opened index segment.
Just try it and check that by yourself with heap dump and MemoryAnalyzer.

Regards
Bernd


On 30.09.19 at 09:44, Andrea Gazzarini wrote:
mmm, ok for the core but are you sure things in this case are working per-segment? I would expect a FilterFactory instance per index, 
initialized at schema loading time.


On 30/09/2019 09:04, Bernd Fehling wrote:

And I think this is per core per index segment.

2 cores per instance, each core with 3 index segments, sums up to 6 times
the 2 SynonymMaps. Results in 12 times SynonymMaps.

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:

  Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory classes,
the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a private
member of the filter factory and it is loaded each time the factory needs
to create a type.

Best,
Andrea

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if synonyms
resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique




Re: Synonym filters memory usage

2019-09-30 Thread Andrea Gazzarini
mmm, ok for the core but are you sure things in this case are working 
per-segment? I would expect a FilterFactory instance per index, 
initialized at schema loading time.


On 30/09/2019 09:04, Bernd Fehling wrote:

And I think this is per core per index segment.

2 cores per instance, each core with 3 index segments, sums up to 6 times
the 2 SynonymMaps. Results in 12 times SynonymMaps.

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:

  Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory 
classes,

the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a private
member of the filter factory and it is loaded each time the factory 
needs

to create a type.

Best,
Andrea

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if 
synonyms

resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique


--
Andrea Gazzarini
/Search Consultant, R&D Software Engineer/

Sease Ltd

mobile: +39 349 513 86 25
email: a.gazzar...@sease.io



Re: Synonym filters memory usage

2019-09-30 Thread Andrea Gazzarini



On 30/09/2019 09:04, Bernd Fehling wrote:

And I think this is per core per index segment.

2 cores per instance, each core with 3 index segments, sums up to 6 times
the 2 SynonymMaps. Results in 12 times SynonymMaps.

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:

  Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory 
classes,

the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a private
member of the filter factory and it is loaded each time the factory 
needs

to create a type.

Best,
Andrea

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if 
synonyms

resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique


--
Andrea Gazzarini
/Search Consultant, R&D Software Engineer/

Sease Ltd

mobile: +39 349 513 86 25
email: a.gazzar...@sease.io



Re: Synonym filters memory usage

2019-09-30 Thread Bernd Fehling

And I think this is per core per index segment.

2 cores per instance, each core with 3 index segments, sums up to 6 times
the 2 SynonymMaps. Results in 12 times SynonymMaps.

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:

  Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory classes,
the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a private
member of the filter factory and it is loaded each time the factory needs
to create a type.

Best,
Andrea

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if synonyms
resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique


Re: Synonym filters memory usage

2019-09-29 Thread Andrea Gazzarini
 Hi,
looking at the stateful nature of SynonymGraphFilter/FilterFactory classes,
the answer should be 2 times (one time per type instance).
The SynonymMap, which internally holds the synonyms table, is a private
member of the filter factory and it is loaded each time the factory needs
to create a type.
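
For anyone who wants to measure what one such SynonymMap actually costs on the heap, here is a minimal sketch. It assumes Lucene's SolrSynonymParser, and the synonyms file path is just a placeholder:

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.synonym.SolrSynonymParser;
import org.apache.lucene.analysis.synonym.SynonymMap;

public class SynonymMapSize {
  public static void main(String[] args) throws Exception {
    // Parse a synonyms file in the Solr format and build the in-memory map once.
    SolrSynonymParser parser = new SolrSynonymParser(true, true, new WhitespaceAnalyzer());
    try (InputStreamReader reader = new InputStreamReader(
            new FileInputStream("/path/to/synonyms.txt"), StandardCharsets.UTF_8)) {
      parser.parse(reader);
    }
    SynonymMap map = parser.build();
    // The FST is the dominant in-heap structure; every factory instance that
    // builds its own map pays roughly this much again.
    System.out.println("synonym FST bytes: " + map.fst.ramBytesUsed());
  }
}

Whether that load happens once per field type, per core, or per segment is exactly what this thread is trying to pin down; the sketch only shows the cost of a single load.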

Best,
Andrea

-- 
Andrea Gazzarini
*Search Consultant, R&D Software Engineer*


www.sease.io

email: a.gazzar...@sease.io
cell: +39 349 513 86 25

On 29/09/2019 23:49, Dominique Bejean wrote:

Hi,

My concern is about memory used by synonym filter, especially if synonyms
resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique


Synonym filters memory usage

2019-09-29 Thread Dominique Bejean
Hi,

My concern is about memory used by synonym filter, especially if synonyms
resources files are large.

If in my schema, there are two field types "TypeSyno1" and "TypeSyno2"
using synonym filter with the same synonyms files.
For each of these two field types there are two fields

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, so one time per field?
2 times, so one time per instantiated type?

Regards

Dominique


Re: Determing Solr heap requirments and analyzing memory usage

2019-05-02 Thread Erick Erickson
Brian: 

Many thanks for letting us know what you found. I’ll attach this to SOLR-13003 
which is about this exact issue but doesn’t contain this information. This is a 
great help.

> On May 2, 2019, at 6:15 AM, Brian Ecker  wrote:
> 
> Just to update here in order to help others that might run into similar
> issues in the future, the problem is resolved. The issue was caused by the
> queryResultCache. This was very easy to determine by analyzing a heap dump.
> In our setup we had the following config:
> 
>   <queryResultCache class="solr.FastLRUCache" maxRamMB="3072" autowarmCount="0"/>
> 
> In reality this maxRamMB="3072" was not as expected, and this cache was
> using *way* more memory (about 6-8 times the amount). See the following
> screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
> the left window that ramBytes, the internal calculation of how much memory
> Solr currently thinks this cache is using, is 1894333464B (1894MB). Now
> notice that the highlighted line, the ConcurrentLRUCache used internally by
> the FastLRUCache representing the queryResultCache, is actually using
> 12212779160B (12212MB). On further investigation, I realized that this
> cache is a map from a query with all its associated objects as the key, to
> a very simple object containing an array of document (integer) ids as the
> value.
> 
> Looking into the lucene-solr source, I found the following line for the
> calculation of ramBytesUsed
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
> Surprisingly, the query objects used as keys in the queryResultCache do not
> implement Accountable as far as I can tell, and this lines up very well
> with our observation of memory usage because in the heap dump we can also
> see that the keys in the cache are using substantially more memory than the
> values and completely account for the additional memory usage. It was quite
> surprising to me that the keys were given a default value of 192B as
> specified in LRUCache.DEFAULT_RAM_BYTES_USED because I can't actually
> imagine a case where the keys in the queryResultCache would be so small. I
> imagine that in almost all cases the keys would actually be larger than the
> values for the queryResultCache, but that's probably not true for all
> usages of a FastLRUCache.
> 
> We solved our memory usage issue by drastically reducing the maxRamMB value
> and calculating the actual max usage as maxRamMB * 8. It would be quite
> useful to have this detail at least documented somewhere.
> 
> -Brian
> 
> On Tue, Apr 23, 2019 at 9:49 PM Shawn Heisey  wrote:
> 
>> On 4/23/2019 11:48 AM, Brian Ecker wrote:
>>> I see. The other files I meant to attach were the GC log (
>>> https://pastebin.com/raw/qeuQwsyd), the heap histogram (
>>> https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
>>> http://oi64.tinypic.com/21r0bk.jpg).
>> 
>> I have no idea what to do with the histogram.  I doubt it's all that
>> useful anyway, as it wouldn't have any information about what parts of
>> the system are using the most.
>> 
>> The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
>>  To get anything useful out of a GC log, it would probably need to
>> cover hours of runtime.
>> 
>> But if you are experiencing OutOfMemoryError, then either you have run
>> into something where a memory leak exists, or there's something about
>> your index or your queries that needs more heap than you have allocated.
>>  Memory leaks are not super common in Solr, but they have happened.
>> 
>> Tuning GC will never help OOME problems.
>> 
>> The screenshot looks like it matches the info below.
>> 
>>> I'll work on getting the heap dump, but would it also be sufficient to
>> use
>>> say a 5GB dump from when it's half full and then extrapolate to the
>>> contents of the heap when it's full? That way the dump would be a bit
>>> easier to work with.
>> 
>> That might be useful.  The only way to know for sure is to take a look
>> at it to see if the part of the code using lots of heap is detectable.
>> 
>>> There are around 2,100,000 documents.
>> 
>>> The data takes around 9GB on disk.
>> 
>> Ordinarily, I would expect that level of data to not need a whole lot of
>> heap.  10GB would be more than I would think necessary, but if your
>> queries are big consumers of memory, I could be wrong.  I ran indexes
>> with 30 million documents taking up 50GB of disk space on an 8GB heap.
>> I probably could have gone lower with no problems.
>> 
>> I have absolutely no idea what kind of requirements the spellcheck
>> feature has.  I've never used that beyond a few test queries.  If the
>> query information you sent is complete, I wouldn't expect the
>> non-spellcheck parts to require a whole lot of heap.  So perhaps
>> spellcheck is the culprit here.  Somebody else will need to comment on
>> that.
>> 
>> Thanks,
>> Shawn
>> 



Re: Determing Solr heap requirments and analyzing memory usage

2019-05-02 Thread Brian Ecker
Just to update here in order to help others that might run into similar
issues in the future, the problem is resolved. The issue was caused by the
queryResultCache. This was very easy to determine by analyzing a heap dump.
In our setup we had the following config:

  <queryResultCache class="solr.FastLRUCache" maxRamMB="3072" autowarmCount="0"/>

In reality this maxRamMB="3072" was not as expected, and this cache was
using *way* more memory (about 6-8 times the amount). See the following
screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
the left window that ramBytes, the internal calculation of how much memory
Solr currently thinks this cache is using, is 1894333464B (1894MB). Now
notice that the highlighted line, the ConcurrentLRUCache used internally by
the FastLRUCache representing the queryResultCache, is actually using
12212779160B (12212MB). On further investigation, I realized that this
cache is a map from a query with all its associated objects as the key, to
a very simple object containing an array of document (integer) ids as the
value.

Looking into the lucene-solr source, I found the following line for the
calculation of ramBytesUsed
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
Surprisingly, the query objects used as keys in the queryResultCache do not
implement Accountable as far as I can tell, and this lines up very well
with our observation of memory usage because in the heap dump we can also
see that the keys in the cache are using substantially more memory than the
values and completely account for the additional memory usage. It was quite
surprising to me that the keys were given a default value of 192B as
specified in LRUCache.DEFAULT_RAM_BYTES_USED because I can't actually
imagine a case where the keys in the queryResultCache would be so small. I
imagine that in almost all cases the keys would actually be larger than the
values for the queryResultCache, but that's probably not true for all
usages of a FastLRUCache.

We solved our memory usage issue by drastically reducing the maxRamMB value
and calculating the actual max usage as maxRamMB * 8. It would be quite
useful to have this detail at least documented somewhere.
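
To illustrate the failure mode, here is a simplified sketch of the estimation idea (not the actual Solr ConcurrentLRUCache code): entries whose keys or values implement Accountable report their real size, while everything else is charged a small fixed default, so a cache full of large non-Accountable keys can blow far past maxRamMB while its own accounting still looks fine.

import org.apache.lucene.util.Accountable;

public class CacheRamEstimateSketch {
  // Same order of magnitude as the 192-byte default mentioned above; the real
  // constant lives in Solr's LRUCache (DEFAULT_RAM_BYTES_USED).
  static final long DEFAULT_RAM_BYTES_USED = 192;

  // Estimate the heap charged to one cache entry the way a RAM-bounded cache might:
  // trust Accountable objects, fall back to a flat default for everything else.
  static long estimateEntryRam(Object key, Object value) {
    long bytes = 0;
    bytes += (key instanceof Accountable) ? ((Accountable) key).ramBytesUsed()
                                          : DEFAULT_RAM_BYTES_USED;
    bytes += (value instanceof Accountable) ? ((Accountable) value).ramBytesUsed()
                                            : DEFAULT_RAM_BYTES_USED;
    return bytes;
  }

  public static void main(String[] args) {
    // A query key is a complex object graph that can easily be kilobytes on the heap,
    // but without Accountable it is still charged only the 192-byte default here.
    Object queryKey = new Object();
    Object docListValue = new Object();
    System.out.println("estimated bytes for this entry: " + estimateEntryRam(queryKey, docListValue));
  }
}

Under that model the workaround above (treating the effective limit as several times maxRamMB) follows directly: the cache only evicts once its own, underestimated total crosses maxRamMB.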

-Brian

On Tue, Apr 23, 2019 at 9:49 PM Shawn Heisey  wrote:

> On 4/23/2019 11:48 AM, Brian Ecker wrote:
> > I see. The other files I meant to attach were the GC log (
> > https://pastebin.com/raw/qeuQwsyd), the heap histogram (
> > https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
> > http://oi64.tinypic.com/21r0bk.jpg).
>
> I have no idea what to do with the histogram.  I doubt it's all that
> useful anyway, as it wouldn't have any information about what parts of
> the system are using the most.
>
> The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
>   To get anything useful out of a GC log, it would probably need to
> cover hours of runtime.
>
> But if you are experiencing OutOfMemoryError, then either you have run
> into something where a memory leak exists, or there's something about
> your index or your queries that needs more heap than you have allocated.
>   Memory leaks are not super common in Solr, but they have happened.
>
> Tuning GC will never help OOME problems.
>
> The screenshot looks like it matches the info below.
>
> > I'll work on getting the heap dump, but would it also be sufficient to
> use
> > say a 5GB dump from when it's half full and then extrapolate to the
> > contents of the heap when it's full? That way the dump would be a bit
> > easier to work with.
>
> That might be useful.  The only way to know for sure is to take a look
> at it to see if the part of the code using lots of heap is detectable.
>
> > There are around 2,100,000 documents.
> 
> > The data takes around 9GB on disk.
>
> Ordinarily, I would expect that level of data to not need a whole lot of
> heap.  10GB would be more than I would think necessary, but if your
> queries are big consumers of memory, I could be wrong.  I ran indexes
> with 30 million documents taking up 50GB of disk space on an 8GB heap.
> I probably could have gone lower with no problems.
>
> I have absolutely no idea what kind of requirements the spellcheck
> feature has.  I've never used that beyond a few test queries.  If the
> query information you sent is complete, I wouldn't expect the
> non-spellcheck parts to require a whole lot of heap.  So perhaps
> spellcheck is the culprit here.  Somebody else will need to comment on
> that.
>
> Thanks,
> Shawn
>


Re: Determing Solr heap requirments and analyzing memory usage

2019-04-23 Thread Shawn Heisey

On 4/23/2019 11:48 AM, Brian Ecker wrote:

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).


I have no idea what to do with the histogram.  I doubt it's all that 
useful anyway, as it wouldn't have any information about what parts of 
the system are using the most.


The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time. 
 To get anything useful out of a GC log, it would probably need to 
cover hours of runtime.


But if you are experiencing OutOfMemoryError, then either you have run 
into something where a memory leak exists, or there's something about 
your index or your queries that needs more heap than you have allocated. 
 Memory leaks are not super common in Solr, but they have happened.


Tuning GC will never help OOME problems.

The screenshot looks like it matches the info below.


I'll work on getting the heap dump, but would it also be sufficient to use
say a 5GB dump from when it's half full and then extrapolate to the
contents of the heap when it's full? That way the dump would be a bit
easier to work with.


That might be useful.  The only way to know for sure is to take a look 
at it to see if the part of the code using lots of heap is detectable.



There are around 2,100,000 documents.



The data takes around 9GB on disk.


Ordinarily, I would expect that level of data to not need a whole lot of 
heap.  10GB would be more than I would think necessary, but if your 
queries are big consumers of memory, I could be wrong.  I ran indexes 
with 30 million documents taking up 50GB of disk space on an 8GB heap. 
I probably could have gone lower with no problems.


I have absolutely no idea what kind of requirements the spellcheck 
feature has.  I've never used that beyond a few test queries.  If the 
query information you sent is complete, I wouldn't expect the 
non-spellcheck parts to require a whole lot of heap.  So perhaps 
spellcheck is the culprit here.  Somebody else will need to comment on that.


Thanks,
Shawn


Re: Determing Solr heap requirments and analyzing memory usage

2019-04-23 Thread Brian Ecker
Thanks for your response. See below please for detailed responses.

On Tue, Apr 23, 2019 at 6:04 PM Shawn Heisey  wrote:

> On 4/23/2019 6:34 AM, Brian Ecker wrote:
> > What I’m trying to determine is (1) How much heap does
> > this setup need before it stabilizes and stops crashing with OOM errors,
> > (2) can this requirement somehow be reduced so that we can use less
> > memory, and (3) from the heap histogram, what is actually using memory
> > (lots of primitive type arrays and data structures, but what part of
> > Solr is using those)?
>
> Exactly one attachment made it through:  The file named
> solrconfig-anonymized.xml.  Attachments can't be used to share files
> because the mailing list software is going to eat them and we won't see
> them.  You'll need to use a file sharing website.  Dropbox is often a
> good choice.
>

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).

>
> We won't be able to tell anything about what's using all the memory from
> a histogram.  We would need an actual heap dump from Java.  This file
> will be huge -- if you have a 10GB heap, and that heap is full, the file
> will likely be larger than 10GB.


I'll work on getting the heap dump, but would it also be sufficient to use
say a 5GB dump from when it's half full and then extrapolate to the
contents of the heap when it's full? That way the dump would be a bit
easier to work with.

>
> There is no way for us to know how much heap you need.  With a large
> amount of information about your setup, we can make a guess, but that
> guess will probably be wrong.  Info we'll need to make a start:
>

I believe I already provided most of this information in my original post,
as I understand that it's not trivial to make this assessment accurately.
I'll re-iterate below, but please see the original post too because I tried
to provide as much detail as possible.

>
> *) How many documents is this Solr instance handling?  You find this out
> by looking at every core and adding up all the "maxDoc" numbers.
>

There are around 2,100,000 documents.

>
> *) How much disk space is the index data taking?  This could be found
> either by getting a disk usage value for the solr home, or looking at
> every core and adding up the size of each one.
>

The data takes around 9GB on disk.

>
> *) What kind of queries are you running?  Anything with facets, or
> grouping?  Are you using a lot of sort fields?


No facets or grouping and no sort fields. The application performs a
full-text search complete-as-you-type function. Much of this is done using
prefix analyzers and edge ngrams. We also make heavy use of spellchecking.
An example of one of the queries produced is the following:

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1
(single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar&bf=product(def(myfield3.ar
,0),1)&rows=200&df=dummy&spellcheck=on&spellcheck.dictionary=spellchecker.es&spellcheck.dictionary=spellchecker.und&spellcheck.q=baril&spellcheck.accuracy=0.5&spellcheck.count=1&fq=+myfield1:(100
OR 200 OR 500)&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar


> *) What kind of data is in each document, and how large is that data?
>

The data contained is mostly 1-5 words of text in various fields and in
various languages. We apply different tokenizers and some language specific
analyzers for different fields, but almost every field is tokenized. There
are 215 fields in total, 77 of which are stored. Based on the index size on
disk and the number of documents, I guess that gives 4.32 KB/doc on
average.

>
> Your cache sizes are reasonable.  So you can't reduce heap requirements
> by much by reducing cache sizes.
>
> Here's some info about what takes a lot of heap and ideas for reducing
> the requirements:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Thank you, but I've seen that page already and that's part of why I'm
confused, as I believe most of those points that usually take a lot of heap
don't seem to apply to my setup.

>
>
> That page also reiterates what I said above:  It's unlikely that anybody
> will be able to tell you exactly how much heap you need at a minimum.
> We can make guesses, but those guesses might be wrong.
>
> Thanks,
> Shawn
>


Re: Determing Solr heap requirments and analyzing memory usage

2019-04-23 Thread Shawn Heisey

On 4/23/2019 6:34 AM, Brian Ecker wrote:
What I’m trying to determine is (1) How much heap does 
this setup need before it stabilizes and stops crashing with OOM errors, 
(2) can this requirement somehow be reduced so that we can use less 
memory, and (3) from the heap histogram, what is actually using memory 
(lots of primitive type arrays and data structures, but what part of 
Solr is using those)?


Exactly one attachment made it through:  The file named 
solrconfig-anonymized.xml.  Attachments can't be used to share files 
because the mailing list software is going to eat them and we won't see 
them.  You'll need to use a file sharing website.  Dropbox is often a 
good choice.


We won't be able to tell anything about what's using all the memory from 
a histogram.  We would need an actual heap dump from Java.  This file 
will be huge -- if you have a 10GB heap, and that heap is full, the file 
will likely be larger than 10GB.


There is no way for us to know how much heap you need.  With a large 
amount of information about your setup, we can make a guess, but that 
guess will probably be wrong.  Info we'll need to make a start:


*) How many documents is this Solr instance handling?  You find this out 
by looking at every core and adding up all the "maxDoc" numbers.


*) How much disk space is the index data taking?  This could be found 
either by getting a disk usage value for the solr home, or looking at 
every core and adding up the size of each one.


*) What kind of queries are you running?  Anything with facets, or 
grouping?  Are you using a lot of sort fields?


*) What kind of data is in each document, and how large is that data?

Your cache sizes are reasonable.  So you can't reduce heap requirements 
by much by reducing cache sizes.


Here's some info about what takes a lot of heap and ideas for reducing 
the requirements:


https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

That page also reiterates what I said above:  It's unlikely that anybody 
will be able to tell you exactly how much heap you need at a minimum. 
We can make guesses, but those guesses might be wrong.


Thanks,
Shawn


Determing Solr heap requirments and analyzing memory usage

2019-04-23 Thread Brian Ecker
Hello,

We are currently running into a situation where Solr (version 7.4) is
slowly using up all available memory allocated to the heap, and then
eventually hitting an OutOfMemory error. We have tried increasing the heap
size and also tuning the GC settings, but this does not seem to solve the
issue. What we see is a slow increase in G1 Old Gen heap utilization until
it eventually takes all of the heap space and causes instances to crash.
Previously we tried running each instance with 10GB of heap space
allocated. We then tried running with 20GB of heap space, and we ran into
the same issue. I have attached a histogram of the heap captured from an
instance using nearly all the available heap when allocated 10GB. What I’m
trying to determine is (1) How much heap does this setup need before it
stabilizes and stops crashing with OOM errors, (2) can this requirement
somehow be reduced so that we can use less memory, and (3) from the heap
histogram, what is actually using memory (lots of primitive type arrays and
data structures, but what part of Solr is using those)?

I am aware that distributing the index would reduce the requirements for
each shard, but we’d like to avoid that for as long as possible due to
operational difficulties associated. As far as I can tell, very few of the
conditions listed under
https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap section
actually apply to our instance. We don’t have a very large index, we never
update in production (only query), the documents don’t seem very large
(~4KB each), we don’t use faceting, caches are reasonably small (~3GB max),
RAMBufferSizeMB is 100MB, we don’t use RAMDirectoryFactory (as far as I can
tell), and we don’t use sort parameters. The solr instance is used for a
full-text complete-as-you-type use case. The typical query looks something
like the following (field names anonymized):

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1
(single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar&bf=product(def(myfield3.ar
,0),1)&rows=200&df=dummy&spellcheck=on&spellcheck.dictionary=spellchecker.es&spellcheck.dictionary=spellchecker.und&spellcheck.q=baril&spellcheck.accuracy=0.5&spellcheck.count=1&fq=+myfield1:(100
OR 200 OR 500)&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar

I have attached in various screenshots details from top on a running Solr
instance, GC logs, solr-config.xml, and also a heap histogram sampled with
Java Mission Control. I also provide various additional details below
related to how the instances are set up and details about their
configuration.

Operational summary:
We run multiple Solr instances, each acting as a completely independent
node. They are not a cluster and are not set up using Solr Cloud. Each
replica contains the entire index. These replicas run in Kubernetes on GCP.

GC Settings:
-XX:+UnlockExperimentalVMOptions -Xlog:gc*,heap=info
-XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=50
-XX:InitiatingHeapOccupancyPercent=40 -XX:-G1UseAdaptiveIHOP

Index summary:
* ~2,100,000 documents
* Total size: 9.09 GB
* Average document size = 9.09 GB / 2,100,000 docs = 4.32 KB/doc
* 215 fields per document
* 77 are stored.
* 137 are multivalued
* Makes use of many spell checkers for different languages (see
solrconfig.xml)
* Most fields include some sort of tokenization and analysis. (An example field
type configuration was attached here, but its XML tags were stripped by the
mailing list archive, so it is omitted.)
Please let me know if there is any additional information required.




[Attachment: solrconfig-anonymized.xml -- the XML element names were stripped by
the mailing list archive, leaving only attribute values and element text (e.g.
indexVersion 7.4.0, ramBufferSizeMB 100, the native lock type, and several
spellcheck components based on solr.DirectSolrSpellChecker such as
spellchecker.und and spellchecker.en). The attachment is also truncated, so it
is omitted here.]

Re: High CPU and Physical Memory Usage in solr with 4000 user load

2018-02-13 Thread rubi.hali
Hi Shawn

As asked, I have attached the GC log and a snapshot of the top command
(TopCommandSlave1.jpg).
Regarding the blocked threads: we are fetching facets and doing grouping with
the main query, and docValues were not enabled for those fields. So we enabled
docValues for them and saw that there were no such blocked threads anymore.

Regarding the custom handler: it's just a wrapper which changes the qf parameter
based on some conditions, so that should not be a problem.

CPU usage also came down to 70%, but there is still a concern about the
continuous rise in physical memory even though our heap is not getting
utilized that much.

Also, there are blocked threads on QueuedThreadPool -- not sure if this is an
issue or if it's expected, as they should get released immediately.
(GC log attached: solr_gc.current)




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: High CPU and Physical Memory Usage in solr with 4000 user load

2018-02-09 Thread Shawn Heisey

On 2/9/2018 12:48 AM, rubi.hali wrote:

As threads are blocked, our CPU is reaching 90% even when heap or
OS memory is not getting used at all.

One of the BLOCKED thread snippets:
Most of the threads are blocked either on the Jetty connector or FieldCacheImpl
for docValues.  Slave1ThreadDump5.txt

Slave1ThreadDump4.txt



Threads being blocked shouldn't cause high CPU.  It would generally be 
caused by whatever is causing the threads to be blocked.


I uploaded the Slave1ThreadDump5.txt file from the first link to the 
http://fastthread.io website.


The analysis noticed that 45 threads were blocked by another thread.  At 
the end of this message is the stacktrace for the thread blocking the 
others.


What jumps out at me is the mention of a custom plugin for Solr in the 
stacktrace.  Here's the method mentioned:


com.ril.solr.request.handler.RILMultipleQuerySearchHandler.handleRequestBody

Could there be a bug in that handler causing problems?  I can't say for 
sure, but this IS something out of the ordinary.


This thread is blocking other threads, but it is itself blocked.  It says
it's waiting for 0x7f24d881e000, but this entry does not appear
anywhere else in the thread dump, so I can't tell what's actually
blocking it.


I also tried putting the thread dump into this page:

https://spotify.github.io/threaddump-analyzer/

The analysis there says that qtp1205044462-196 is "inconsistent" and the 
mouseover for that says that the Thread is blocked on "BLOCKED (on 
object monitor)" without waiting for anything.


Can't find anything definitive, but it MIGHT be blocked by java's own 
internal operations.  Things like memory allocations, GC pauses, etc.


I noticed that the stacktrace includes mention of Solr's facet classes, 
suggesting that these queries include facets or grouping.  Facets and 
grouping can require very large amounts of heap, especially if the 
fields being used do not have docValues enabled.  You may need to make 
your heap larger so that there are fewer garbage collection events.
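
For illustration, enabling docValues on a field used for faceting or grouping is
a single attribute in the schema (the field name below is made up), though it
does require reindexing:

  <!-- hypothetical facet/group field; docValues keeps it off the FieldCache heap -->
  <field name="category" type="string" indexed="true" stored="false"
         docValues="true" multiValued="false"/>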


I have no idea what the custom search handler is going to do to Solr's 
memory requirements.


Solr should create GC logs if you have started it with the included 
script.  Can you share a GC log that includes the time the load testing 
is underway and the problem appears?


Can you log onto the Linux server, run the "top" program (not htop or 
any other variant), press shift-M to sort the list by memory, grab a 
screenshot, and share it?


Thanks,
Shawn



"qtp1205044462-196" #196 prio=5 os_prio=0 tid=0x7f2374002800 
nid=0x5171 waiting for monitor entry [0x7f24d881e000]

   java.lang.Thread.State: BLOCKED (on object monitor)
	at 
org.apache.solr.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:167)

- locked <0x000642cd7568> (a java.util.WeakHashMap)
	at 
org.apache.solr.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:767)
	at 
org.apache.solr.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:747)
	at 
org.apache.solr.uninverting.UninvertingReader.getSortedDocValues(UninvertingReader.java:319)
	at 
org.apache.lucene.index.FilterLeafReader.getSortedDocValues(FilterLeafReader.java:448)

at org.apache.lucene.index.DocValues.getSorted(DocValues.java:262)
	at 
org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:128)
	at 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)

at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
	at 
org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:692)
	at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:476)
	at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:405)
	at 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:803)
	at 
org.apache.solr.request.SimpleFacets$$Lambda$207/1284277177.call(Unknown 
Source)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.request.SimpleFacets$3.execute(SimpleFacets.java:742)
	at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:818)
	at 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:330)
	at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:274)
	at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
	at 
com.ril.solr.request.handler.RILMultipleQuerySearchHandler.handleRequestBody(RILMultipleQuerySearchHandler.java:39)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)

at org.apache.solr.core.Sol

Re: High CPU and Physical Memory Usage in solr with 4000 user load

2018-02-09 Thread rubi.hali
Hi Shawn

We tried one more round of testing after we increased the CPU cores on each
server from 4 to 16, which amounts to 64 cores in total across the 4 slaves.
But CPU usage was still high, so we took thread dumps on one of our slaves
and found that threads were blocked. I have attached them.

As far as document size is concerned, we have only one index of 1.5 GB,
amounting to 4 lakh docs on each server. The query load over a span of 10
minutes was distributed across the slaves as:
slave 1 - 1258
slave 2 - 512
slave 3 - 256
slave 4 - 1384

We are using Linux OS. 

As threads are blocked, our CPU is reaching 90% even when heap or
OS memory is not getting used at all.

One of the BLOCKED thread snippets:
Most of the threads are blocked either on the Jetty connector or on FieldCacheImpl
for docValues.  Slave1ThreadDump5.txt
  
Slave1ThreadDump4.txt
  

"qtp1205044462-26-acceptor-1@706f9d47-ServerConnector@7ea87d3a{HTTP/1.1,[http/1.1]}{0.0.0.0:8983}"
#26 prio=5 os_prio=0 tid=0x7f2530501800 nid=0x4f9c waiting for monitor
entry [0x7f2505eb8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
- waiting to lock <0x00064067fc68> (a java.lang.Object)
at
org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:373)
at
org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- None




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: High CPU and Physical Memory Usage in solr with 4000 user load

2018-02-08 Thread Shawn Heisey
On 2/8/2018 12:41 AM, rubi.hali wrote:
> We are using Solr-6.6.2 with one master and 4 Slaves each having 4 CPU core
> and 16GB RAM. We were doing load testing with 4000 users and 800 odd search
> keywords which resulted into 95% of CPU usage in less than 3 minutes and
> affected our QUERY Responses. There was spike in physical memory also which
> was not going down even when we stopped sending load.

How many queries per second is that sending?  Thousands of queries per
second may need more than five servers.

> Our JVM Heap given to Solr is 8G which still lefts with 8G for OS.
>
> We have 4 lakh documents in solr.

How big (disk space) is all the index data being handled by one of your
Solr servers?  Does all that index data add up to 40 documents, or
is that just the document count for one index out of multiple?

> Our Cache Configurations done in SOLR are
>
> size="5"
>initialSize="3"
>autowarmCount="0"/>

This is fairly large.  May not be a problem, but that will depend on how
big each document is.

>size="4096"
>  initialSize="512"
>  autowarmCount="20"/>

This is quite large for a filterCache.  It probably needs to be
smaller.  Be careful with autowarming on the filterCache -- sometimes
filters can execute very slowly.  I had to reduce my autowarmCount on
the filterCache to *four* in order for commits to be fast enough.

>   size="256"
>  initialSize="256"
>  autowarmCount="0"/>

I would probably use a non-zero autowarmCount here.  But don't make it
TOO large.
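
As a sketch of that direction (sizes are illustrative, not measured for this
index):

  <!-- illustrative sizes only; tune to the actual index and query mix -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="4"/>
  <queryResultCache class="solr.LRUCache" size="256" initialSize="256" autowarmCount="16"/>
  <documentCache class="solr.LRUCache" size="4096" initialSize="1024" autowarmCount="0"/>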

> We have enabled autocommit 
>  
>${solr.autoCommit.maxDocs:25000}
>  ${solr.autoCommit.maxTime:6} 
>true (This is true only in case of
> slaves) 
>  

I would recommend that you remove maxDocs and let it just be
time-based.  Also, *all* servers should have openSearcher set to false
on autoCommit.

>  We are also doing softcommit
>   
>${solr.autoSoftCommit.maxTime:30} 
>  

This is good.  You could probably lower it to two minutes instead of five.
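
Putting those suggestions together, the updateHandler section would look
something like this (a sketch; the exact times are examples):

  <autoCommit>
    <!-- time-based only, no maxDocs; never open a searcher on hard commit -->
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <!-- soft commit every two minutes for document visibility -->
    <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
  </autoSoftCommit>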

> Our Queries are enabled with Grouping so Query Result Cache doesnt get used.
> But still in heavy load we are seeing this behaviour which is resulting into
> high response times.
>
> Please suggest if there is any configuration mismatch or os issue which we
> should resolve for bringing down our High Response Times

If your query load is going to be high, you probably need to add more
servers.

To check whether things are optimized well on each server:  What OS is
it running on?  I would like to get some more information from your
install, but the exact method for obtaining that information will vary
depending on the OS.

Can you also provide a large solr_gc.log file so that can be analyzed?

Thanks,
Shawn



High CPU and Physical Memory Usage in solr with 4000 user load

2018-02-08 Thread rubi.hali
Hi

We are using Solr 6.6.2 with one master and 4 slaves, each having 4 CPU cores
and 16GB RAM. We were doing load testing with 4000 users and around 800 search
keywords, which resulted in 95% CPU usage in less than 3 minutes and
affected our query responses. There was also a spike in physical memory which
did not go down even when we stopped sending load.

Our JVM heap given to Solr is 8G, which still leaves 8G for the OS.

We have 4 lakh documents in solr.

Our cache configurations in Solr are:

[The cache configuration XML was stripped by the mailing-list archive; the
sizes can be seen quoted in the reply above.]
We have enabled autocommit 
   
   ${solr.autoCommit.maxDocs:25000}
   ${solr.autoCommit.maxTime:6} 
   true (This is true only in case of
slaves) 
 

 We are also doing softcommit
  
   ${solr.autoSoftCommit.maxTime:30} 
 

Our queries use grouping, so the query result cache doesn't get used.
But under heavy load we are still seeing this behaviour, which is resulting in
high response times.

Please suggest if there is any configuration mismatch or OS issue which we
should resolve to bring down our high response times.
















--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: 3 color jvm memory usage bar

2017-10-23 Thread Toke Eskildsen
On Thu, 2017-10-19 at 08:56 -0700, Nawab Zada Asad Iqbal wrote:
> I see three colors in the JVM usage bar. Dark Gray, light Gray,
> white. (left to right).  Only one dark and one light color made sense
> to me (as i could interpret them as used vs available memory), but
> there is light gray between dark gray and white parts.

The light grey is the amount of memory reserved by the JVM. It is only
visible if you do not specify Xms, so many people do not have that.

Generally the dark grey (the amount of heap that is actively used to
hold data) will fluctuate a lot and I don't find it very usable for
observing and tweaking heap size. The GC-log is better.

- Toke Eskildsen, Royal Danish Library



Re: 3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Thanks Erick

I see three colors in the JVM usage bar. Dark Gray, light Gray, white.
(left to right).  Only one dark and one light color made sense to me (as i
could interpret them as used vs available memory), but there is light gray
between dark gray and white parts.


Thanks
Nawab

On Thu, Oct 19, 2017 at 8:09 AM, Erick Erickson 
wrote:

> Nawab:
>
> Images are stripped aggressively by the Apache mail servers, your
> attachment didn't come through. You'll have to put it somewhere and
> provide a link.
>
> Generally the lighter color in each bar is the available resource and the
> darker shade is used.
>
> Best,
> Erick
>
> On Thu, Oct 19, 2017 at 7:27 AM, Nawab Zada Asad Iqbal 
> wrote:
> > Good morning,
> >
> >
> > What do the 3 colors mean in this bar on Solr dashboard page? (please see
> > attached) :
> >
> >
> > Regards
> > Nawab
>


Re: 3 color jvm memory usage bar

2017-10-19 Thread Erick Erickson
Nawab:

Images are stripped aggressively by the Apache mail servers, your
attachment didn't come through. You'll have to put it somewhere and
provide a link.

Generally the lighter color in each bar is the available resource and the
darker shade is used.

Best,
Erick

On Thu, Oct 19, 2017 at 7:27 AM, Nawab Zada Asad Iqbal  wrote:
> Good morning,
>
>
> What do the 3 colors mean in this bar on Solr dashboard page? (please see
> attached) :
>
>
> Regards
> Nawab


3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Good morning,


What do the 3 colors mean in this bar on Solr dashboard page? (please see
attached) :


Regards
Nawab


Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-10 Thread Joel Bernstein
Yes the hashJoin will read the entire "hashed" query into memory. The
documentation explains this.

In general the streaming joins were designed for OLAP type work loads.
Unless you have a large cluster powering streaming joins you are going to
have problems with high QPS workloads.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jul 9, 2017 at 10:59 PM, Zheng Lin Edwin Yeo 
wrote:

> I have found that it could be likely due to the hashJoin in the streaming
> expression, as this will store all tuples in memory?
>
> I have more than 12 million in the collections which I am querying, in 1
> shard. The index size of the collection is 45 GB.
> Physical RAM of server: 384 GB
> Java Heap: 22 GB
> Typical search latency: 2 to 4 seconds
>
> Regards,
> Edwin
>
>
> On 7 July 2017 at 16:46, Jan Høydahl  wrote:
>
> > You have not told us how many documents you have, how many shards, how
> big
> > the docs are, physical RAM, Java heap, what typical search latency is
> etc.
> >
> > If you have tried to squeeze too many docs into a single node it might
> get
> > overloaded faster, thus sharding would help.
> > If you return too much content (large fields that you won’t use) that may
> > lower the max QPS for a node, so check that.
> > If you are not using DocValues, faceting etc will take too much memory,
> > but since you use streaming I guess you use Docvalues.
> > There are products that you can put in front of Solr that can do rate
> > limiting for you, such as https://getkong.org/ <https://getkong.org/>
> >
> > You really need to debug what is the bottleneck in your case and try to
> > fix that.
> >
> > Can you share your key numbers here so we can do a qualified guess?
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 2. jul. 2017 kl. 09.00 skrev Zheng Lin Edwin Yeo  >:
> > >
> > > Hi,
> > >
> > > I'm currently facing the issue whereby the Solr crashed when I have
> > issued
> > > too many queries with error or those with high memory usage, like JSON
> > > facet or Streaming expressions.
> > >
> > > What could be the issue here?
> > >
> > > I'm using Solr 6.5.1
> > >
> > > Regards,
> > > Edwin
> >
> >
>


Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-09 Thread Zheng Lin Edwin Yeo
I have found that it could be likely due to the hashJoin in the streaming
expression, as this will store all tuples in memory?

I have more than 12 million in the collections which I am querying, in 1
shard. The index size of the collection is 45 GB.
Physical RAM of server: 384 GB
Java Heap: 22 GB
Typical search latency: 2 to 4 seconds

Regards,
Edwin


On 7 July 2017 at 16:46, Jan Høydahl  wrote:

> You have not told us how many documents you have, how many shards, how big
> the docs are, physical RAM, Java heap, what typical search latency is etc.
>
> If you have tried to squeeze too many docs into a single node it might get
> overloaded faster, thus sharding would help.
> If you return too much content (large fields that you won’t use) that may
> lower the max QPS for a node, so check that.
> If you are not using DocValues, faceting etc will take too much memory,
> but since you use streaming I guess you use Docvalues.
> There are products that you can put in front of Solr that can do rate
> limiting for you, such as https://getkong.org/ <https://getkong.org/>
>
> You really need to debug what is the bottleneck in your case and try to
> fix that.
>
> Can you share your key numbers here so we can do a qualified guess?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 2. jul. 2017 kl. 09.00 skrev Zheng Lin Edwin Yeo :
> >
> > Hi,
> >
> > I'm currently facing the issue whereby the Solr crashed when I have
> issued
> > too many queries with error or those with high memory usage, like JSON
> > facet or Streaming expressions.
> >
> > What could be the issue here?
> >
> > I'm using Solr 6.5.1
> >
> > Regards,
> > Edwin
>
>


Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-07 Thread Jan Høydahl
You have not told us how many documents you have, how many shards, how big the 
docs are, physical RAM, Java heap, what typical search latency is etc.

If you have tried to squeeze too many docs into a single node it might get 
overloaded faster, thus sharding would help.
If you return too much content (large fields that you won’t use) that may lower 
the max QPS for a node, so check that.
If you are not using DocValues, faceting etc will take too much memory, but 
since you use streaming I guess you use Docvalues.
There are products that you can put in front of Solr that can do rate limiting 
for you, such as https://getkong.org/ <https://getkong.org/>

You really need to debug what is the bottleneck in your case and try to fix 
that.

Can you share your key numbers here so we can do a qualified guess?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 2. jul. 2017 kl. 09.00 skrev Zheng Lin Edwin Yeo :
> 
> Hi,
> 
> I'm currently facing the issue whereby the Solr crashed when I have issued
> too many queries with error or those with high memory usage, like JSON
> facet or Streaming expressions.
> 
> What could be the issue here?
> 
> I'm using Solr 6.5.1
> 
> Regards,
> Edwin



Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-02 Thread Toke Eskildsen
On Sun, 2017-07-02 at 15:00 +0800, Zheng Lin Edwin Yeo wrote:
> I'm currently facing the issue whereby the Solr crashed when I have
> issued too many queries with error or those with high memory usage,
> like JSON facet or Streaming expressions.
> 
> What could be the issue here?

Solr does not have any auto-limiting of the number of concurrent
requests. You will have to build that yourself (quite hard) or impose a
hard limit in your request layer that is low enough to guarantee that
you don't run out of memory in Solr.

You could raise the amount of memory allocated for Solr, but even then
you might want to have a hard limit, just to avoid the occasional "cat
steps on F5 and the browser issues a gazillion requests"-scenario.
-- 
Toke Eskildsen, Royal Danish Library


Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-02 Thread Rick Leir
Stack trace? Memory diagnostics from top(1)? What querys?

On July 2, 2017 3:00:16 AM EDT, Zheng Lin Edwin Yeo  
wrote:
>Hi,
>
>I'm currently facing the issue whereby the Solr crashed when I have
>issued
>too many queries with error or those with high memory usage, like JSON
>facet or Streaming expressions.
>
>What could be the issue here?
>
>I'm using Solr 6.5.1
>
>Regards,
>Edwin

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-02 Thread Zheng Lin Edwin Yeo
Hi,

I'm currently facing the issue whereby the Solr crashed when I have issued
too many queries with error or those with high memory usage, like JSON
facet or Streaming expressions.

What could be the issue here?

I'm using Solr 6.5.1

Regards,
Edwin


Re: Heap memory usage is -1 in UI

2016-09-23 Thread Yago Riveiro
This is happening in 5.3.1

This metric is interesting for knowing the minimal memory footprint of a core (data
structures and caches).

I agree with Shawn that if Solr doesn't support the metric it should be removed
from the admin UI, but I maintain that it's useful for plotting memory
consumption in services like Zabbix.

--

/Yago Riveiro

On 23 Sep 2016, 01:08 +0100, Shawn Heisey , wrote:
> On 9/22/2016 4:59 PM, Yago Riveiro wrote:
> > The Heap Memory Usage in the UI it's always -1. There is some way to
> > get the amount of heap that a core consumes?
>
> In all the versions that I have looked at, up to 6.0, this number is
> either entirely too small or -1.
>
> Looking into the code, this info comes from the /admin/luke handler, and
> that handler gets it from Lucene. The -1 appears to come into play when
> the reader object is not the expected type, so I'm guessing that past
> changes in Lucene require changes in Solr that have not yet happened.
> Even if the code is fixed so the reader object(s) are calculated
> correctly, that won't be enough information for a true picture of core
> memory usage.
>
> In order for this number to be accurate, size information from other
> places, such as Lucene caches and Solr caches, must also be included.
> There might also be memory structures involved that I haven't even
> thought of. It is entirely possible that the code to gather all this
> information does not yet exist.
>
> In my opinion, the Heap Memory statistic should be removed until a time
> when it can be overhauled so that it is as accurate as possible. Can
> you open an issue in Jira?
>
> Thanks,
> Shawn
>


Re: Heap memory usage is -1 in UI

2016-09-22 Thread Shawn Heisey
On 9/22/2016 4:59 PM, Yago Riveiro wrote:
> The Heap Memory Usage in the UI it's always -1. There is some way to
> get the amount of heap that a core consumes?

In all the versions that I have looked at, up to 6.0, this number is
either entirely too small or -1.

Looking into the code, this info comes from the /admin/luke handler, and
that handler gets it from Lucene.  The -1 appears to come into play when
the reader object is not the expected type, so I'm guessing that past
changes in Lucene require changes in Solr that have not yet happened. 
Even if the code is fixed so the reader object(s) are calculated
correctly, that won't be enough information for a true picture of core
memory usage.

In order for this number to be accurate, size information from other
places, such as Lucene caches and Solr caches, must also be included. 
There might also be memory structures involved that I haven't even
thought of.  It is entirely possible that the code to gather all this
information does not yet exist.

In my opinion, the Heap Memory statistic should be removed until a time
when it can be overhauled so that it is as accurate as possible.  Can
you open an issue in Jira?

Thanks,
Shawn



Re: Heap memory usage is -1 in UI

2016-09-22 Thread Alexandre Rafalovitch
What version of Solr and which Operating System is that on?

Regards,
Alex

On 23 Sep 2016 5:59 AM, "Yago Riveiro"  wrote:

> The Heap Memory Usage in the UI it's always -1.
>
> There is some way to get the amount of heap that a core consumes?
>
>
>
> -
> Best regards
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Heap-memory-usage-is-1-in-UI-tp4297601.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Heap memory usage is -1 in UI

2016-09-22 Thread Yago Riveiro
The Heap Memory Usage in the UI is always -1.

Is there some way to get the amount of heap that a core consumes?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Heap-memory-usage-is-1-in-UI-tp4297601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory Usage increases by a lot during and after optimization .

2016-01-07 Thread Zheng Lin Edwin Yeo
Hi Shawn,

I am using Java Version 8 Update 45 (build 1.8.0_45-b15). It is a 64-bit
Java.

Thank you.

Regards,
Edwin


On 8 January 2016 at 00:15, Shawn Heisey  wrote:

> On 1/7/2016 12:53 AM, Zheng Lin Edwin Yeo wrote:
> >> Subtracting SHR from RES (or in your case, Shareable from Working)
> >> reveals the actual memory being used, and I believe you can see this
> >> actual number in the Private column, which is approximately the
> >> difference between Working and Shareable.  If I'm right, this means that
> >> the actual memory usage is almost 14GB lower than Windows is reporting.
> > Does this means that sometimes when I see the high memory usage (can be
> up
> > to 100%), it is just a memory reporting error by Windows, but Solr is
> > working exactly as it should?
>
> I believe so, yes.  It should be completely impossible for Java to
> allocate more memory than you ask it to ... and I believe that Java is
> acting correctly in this regard, but the reporting is wrong.
>
> I do not know whether anything can be done about the reporting problem.
> The fact that it happens with Java on two different operating systems
> (that I know of) suggests that the problem is in Java itself.
>
> I suspect that it is related to the fact that the index files are
> accessed via MMap, and part of that virtual memory is misreported as
> shared memory.  I am subscribed to the hotspot mailing list for
> openjdk.  I don't know whether the message will be on-topic for the
> list, but I will ask about it there.  For this message I will need to
> know the precise version of Java you are using, including whether it's
> 32 or 64-bit.
>
> Thanks,
> Shawn
>
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-06 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thank you for your explanation.

Yes, both of the top two processes are Solr. I have two Solr processes on
one machine now, as the second one is a replica of the first one. In the
future, the plan is to have them on separate machines.


>Subtracting SHR from RES (or in your case, Shareable from Working)
>reveals the actual memory being used, and I believe you can see this
>actual number in the Private column, which is approximately the
>difference between Working and Shareable.  If I'm right, this means that
>the actual memory usage is almost 14GB lower than Windows is reporting.

Does this mean that sometimes when I see high memory usage (it can go up
to 100%), it is just a memory reporting error by Windows, and Solr is
working exactly as it should?

Regards,
Edwin


On 7 January 2016 at 04:42, Shawn Heisey  wrote:

> On 1/5/2016 11:50 PM, Zheng Lin Edwin Yeo wrote:
> > Here is the new screenshot of the Memory tab of the Resource Monitor.
> > https://www.dropbox.com/s/w4bnrb66r16lpx1/Resource%20Monitor.png?dl=0
> >
> > Yes, I found that the value under the "Working Set" column is much higher
> > than the others. Also, the value which I was previously looking at under
> > the Task Manager is under the Private Column here.
> > It says that I have about 14GB of available memory, but the "Free" number
> > is much lower, at 79MB.
>
> You'll probably think I'm nuts, but I believe everything is working
> exactly as it should.
>
> The first two processes, which I assume are Solr processes, show a
> Shareable size near 7GB each.  I have seen something similar happen on
> Linux where SHR memory is huge for the Solr process, and when this
> happens, the combination of memory numbers would turn out to be
> impossible, so I think it's a memory reporting bug related to Java, one
> that affects both Linux and Windows.
>
> Subtracting SHR from RES (or in your case, Shareable from Working)
> reveals the actual memory being used, and I believe you can see this
> actual number in the Private column, which is approximately the
> difference between Working and Shareable.  If I'm right, this means that
> the actual memory usage is almost 14GB lower than Windows is reporting.
>
> If both of the top processes are Solr, I'm not sure why you have two
> Solr processes on one machine.  One Solr instance can handle multiple
> indexes with no problem.
>
> As evidence that I'm not insane, consider the following screenshot, from
> another of my servers:
>
> https://www.dropbox.com/s/64en3sar4cr1ytj/linux-solr-mem-high-shr.png?dl=0
>
> On the screenshot, the solr process shows RES size of 22GB ... which is
> highly unusual, because this Solr install has a max heap of 8GB ... but
> notice that SHR is 13GB.  The difference between 22GB and 13GB is 9GB,
> which is much more reasonable, and if we assume that the 22GB is rounded
> up and/or the 13GB is rounded down, then the difference is much closer
> to 8GB.  Looking at some other numbers, the "cached" value is 48GB.  If
> you add the 48GB cache allocation to the *reported* resident size of
> 22GB for Solr, you get a total of 70GB ... which is more memory than the
> machine even has (64GB).  This is why I am sure that when SHR is really
> high on a Java process, it is a memory reporting error.
>
> Thanks,
> Shawn
>
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-06 Thread Shawn Heisey
On 1/5/2016 11:50 PM, Zheng Lin Edwin Yeo wrote:
> Here is the new screenshot of the Memory tab of the Resource Monitor.
> https://www.dropbox.com/s/w4bnrb66r16lpx1/Resource%20Monitor.png?dl=0
>
> Yes, I found that the value under the "Working Set" column is much higher
> than the others. Also, the value which I was previously looking at under
> the Task Manager is under the Private Column here.
> It says that I have about 14GB of available memory, but the "Free" number
> is much lower, at 79MB.

You'll probably think I'm nuts, but I believe everything is working
exactly as it should.

The first two processes, which I assume are Solr processes, show a
Shareable size near 7GB each.  I have seen something similar happen on
Linux where SHR memory is huge for the Solr process, and when this
happens, the combination of memory numbers would turn out to be
impossible, so I think it's a memory reporting bug related to Java, one
that affects both Linux and Windows.

Subtracting SHR from RES (or in your case, Shareable from Working)
reveals the actual memory being used, and I believe you can see this
actual number in the Private column, which is approximately the
difference between Working and Shareable.  If I'm right, this means that
the actual memory usage is almost 14GB lower than Windows is reporting.

If both of the top processes are Solr, I'm not sure why you have two
Solr processes on one machine.  One Solr instance can handle multiple
indexes with no problem.

As evidence that I'm not insane, consider the following screenshot, from
another of my servers:

https://www.dropbox.com/s/64en3sar4cr1ytj/linux-solr-mem-high-shr.png?dl=0

On the screenshot, the solr process shows RES size of 22GB ... which is
highly unusual, because this Solr install has a max heap of 8GB ... but
notice that SHR is 13GB.  The difference between 22GB and 13GB is 9GB,
which is much more reasonable, and if we assume that the 22GB is rounded
up and/or the 13GB is rounded down, then the difference is much closer
to 8GB.  Looking at some other numbers, the "cached" value is 48GB.  If
you add the 48GB cache allocation to the *reported* resident size of
22GB for Solr, you get a total of 70GB ... which is more memory than the
machine even has (64GB).  This is why I am sure that when SHR is really
high on a Java process, it is a memory reporting error.

Thanks,
Shawn



Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Here is the new screenshot of the Memory tab of the Resource Monitor.
https://www.dropbox.com/s/w4bnrb66r16lpx1/Resource%20Monitor.png?dl=0

Yes, I found that the value under the "Working Set" column is much higher
than the others. Also, the value which I was previously looking at under
the Task Manager is under the Private Column here.
It says that I have about 14GB of available memory, but the "Free" number
is much lower, at 79MB.

Regards.
Edwin


On 6 January 2016 at 03:11, Shawn Heisey  wrote:

> On 1/5/2016 9:59 AM, Zheng Lin Edwin Yeo wrote:
> > I have uploaded the screenshot here
> > https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0
> >
> > Basically, Java(TM) Platform SE Library, which Solr is running on, is
> only
> > using about 22GB currently. However, the memory usage at the top says it
> is
> > using 73% now (which I think is already higher than the figures, given
> that
> > I have 64GB of RAM), and it could potentially go up to 100%, even though
> > the memory usage of Java(TM) Platform SE Library remains around 22GB, and
> > there is no other new task which uses alot of memory are running. The
> > figure is sorted according to the memory usage already.
>
> I would bet that the 73 percent refers to all memory allocated for *any*
> purpose, including the disk cache.
>
> The screenshot that you provided is not the best way to see everything
> that's happening in memory.  With the task manager open, click on the
> "Performance" tab, then click on "Open Resource Monitor" down at the
> bottom of the window.  This will open a whole new program.  Once that's
> open, click on the Memory tab, then click on the "Working Set" column
> header to sort by that column.  Increase the size of the window so that
> a large number of processes with the memory utilization can be seen,
> adjust the column widths so the information is clear, and make sure that
> the "Physical Memory" graph and its legend are fully visible.  Then grab
> a new screenshot.
>
> I believe that you will find the "Available" number is quite high, even
> though the "Free" number is very small ... and the difference will be
> similar to the "Cached" number.  If this is what you find, then
> everything is working exactly as it is supposed to be working.
>
> Thanks,
> Shawn
>
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Shawn Heisey
On 1/5/2016 9:59 AM, Zheng Lin Edwin Yeo wrote:
> I have uploaded the screenshot here
> https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0
>
> Basically, Java(TM) Platform SE Library, which Solr is running on, is only
> using about 22GB currently. However, the memory usage at the top says it is
> using 73% now (which I think is already higher than the figures, given that
> I have 64GB of RAM), and it could potentially go up to 100%, even though
> the memory usage of Java(TM) Platform SE Library remains around 22GB, and
> there is no other new task which uses alot of memory are running. The
> figure is sorted according to the memory usage already.

I would bet that the 73 percent refers to all memory allocated for *any*
purpose, including the disk cache.

The screenshot that you provided is not the best way to see everything
that's happening in memory.  With the task manager open, click on the
"Performance" tab, then click on "Open Resource Monitor" down at the
bottom of the window.  This will open a whole new program.  Once that's
open, click on the Memory tab, then click on the "Working Set" column
header to sort by that column.  Increase the size of the window so that
a large number of processes with the memory utilization can be seen,
adjust the column widths so the information is clear, and make sure that
the "Physical Memory" graph and its legend are fully visible.  Then grab
a new screenshot.

I believe that you will find the "Available" number is quite high, even
though the "Free" number is very small ... and the difference will be
similar to the "Cached" number.  If this is what you find, then
everything is working exactly as it is supposed to be working.

Thanks,
Shawn



Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thanks for your reply.

I have uploaded the screenshot here
https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0

Basically, Java(TM) Platform SE Library, which Solr is running on, is only
using about 22GB currently. However, the memory usage at the top says it is
at 73% now (which I think is already higher than that figure, given that
I have 64GB of RAM), and it could potentially go up to 100%, even though
the memory usage of Java(TM) Platform SE Library remains around 22GB and
there is no other new task running that uses a lot of memory. The
list is already sorted by memory usage.

Regards,
Edwin



On 5 January 2016 at 08:16, Shawn Heisey  wrote:

> On 1/3/2016 7:05 PM, Zheng Lin Edwin Yeo wrote:
> > A) Before I start the optimization, the server's memory usage
> > is consistent at around 16GB, when Solr startsup and we did some
> searching.
> > However, when I click on the optimization button, the memory usage
> > increases gradually, until it reaches the maximum of 64GB which the
> server
> > has. But this only happens to the collection with index of 200GB, and not
> > other collections which has smaller index size (they are at most 1GB at
> the
> > moment).
>
> 
>
> > A) I am quite curious at this also, because in the Task Manager of the
> > server, the amount of memory usage stated does not tally with the
> > percentage of memory usage. When I start optimizatoin, the memory usage
> > states the JVM is only using 14GB, but the percentage of memory usage is
> > almost 100%, when I have 64GB RAM. I have check the other processes
> running
> > in the server, and did not found any other processes that takes up a
> large
> > amount of memory, and the total amount of memory usage for the whole
> sever
> > is only around 16GB.
>
> Toke's reply is spot on.
>
> In your first answer above, you didn't really answer my question, which
> was "What *exactly* are you looking at that says Solr is using all your
> memory?"  You've said "the server's memory usage" but haven't described
> how you got that number.
>
> Here's a screenshot of "top" on one of my Solr servers, with the list
> sorted by memory usage:
>
> https://www.dropbox.com/s/i49s2uyfetwo3xq/solr-mem-prod-8g-heap.png?dl=0
>
> This machine has 165GB (base 2 number) of index data on it, and 64GB of
> memory.  Solr has been assigned an 8GB heap.  Here's more specific info
> about the size of the index data:
>
> root@idxb3:/index/solr5/data# du -hs data
> 165Gdata
> root@idxb3:/index/solr5/data# du -s data
> 172926520   data
>
> You can see that the VIRT memory size of the Solr process is
> approximately the same as the total index size (165GB) plus the max heap
> (8GB), which adds up to 173GB.  The RES memory size of the java process
> is 8.3GB -- just a little bit larger than the max heap.
>
> At the OS level, my server shows 46GB used out of 64GB total ... which
> probably seems excessive, until you consider the 36 million kilobytes in
> the "cached" statistic.  This is the amount of memory being used for the
> page cache.   If you subtract that memory, then you can see that this
> server has only allocated about 10GB of RAM total -- exactly what I
> would expect for a Linux machine dedicated to Solr with the max heap at
> 8GB.
>
> Although my server is indicating about 18GB of memory free, I have seen
> perfectly functioning servers with that number very close to zero.  It
> is completely normal for the "free" memory statistic on Linux and
> Windows to show a few megabytes or less, especially when you optimize a
> Solr index, which reads (and writes) all of the index data, and will
> fill up the page cache.
>
> So, I will ask something very similar to my initial question.  Where
> *exactly* are you looking to see the memory usage that you believe is a
> problem?  A screenshot would be very helpful.
>
> Here's a screenshot from my Windows client.  This machine is NOT running
> Solr, but the situation with free and cached memory is similar.
>
> https://www.dropbox.com/s/wex1gbj7e45g8ed/windows7-mem-usage.png?dl=0
>
> I am not doing anything particularly unusual with this machine, but it
> says there is *zero* free memory, out of 16GB total.  There is 9GB of
> memory in the page cache, though -- memory that the OS will instantly
> give up if any program requests it, which you can see because the
> "available" stat is also about 9GB.  This Windows machine is doing
> perfectly fine as far as memory.
>
> Thanks,
> Shawn
>
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Toke,

I read the server's memory usage from the Task Manager under Windows.

Regards,
Edwin


On 4 January 2016 at 17:17, Toke Eskildsen  wrote:

> On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote:
> > A) Before I start the optimization, the server's memory usage
> > is consistent at around 16GB, when Solr startsup and we did some
> searching.
>
> How do you read this number?
>
> > However, when I click on the optimization button, the memory usage
> > increases gradually, until it reaches the maximum of 64GB which the
> server
> > has.
>
> There are multiple ways of looking at memory. The most relevant ones in
> this context are
>
> - Total memory on the system
>   This appears to be 64GB.
>
> - Free memory on the system
>   Usually determined by 'top' under Linux or Task Manager under Windows.
>
> - Memory used for caching on the system
>   Usually determined by 'top' under Linux or Task Manager under Windows.
>
> - JVM memory usage
>   Usually determined by 'top' under Linux or Task Manager under Windows.
>   Look for "Res" (resident) for the task in Linux. It might be called
>   "physical" under Windows.
>
>
> - Maximum JVM heap (Xmx)
>   Lightest grey in "JVM-Memory" in the Solr Admin interface Dashboard.
>
> - Allocated JVM heap (Xmx)
>   Medium grey in "JVM-Memory" in the Solr Admin interface Dashboard.
>
> - Active JVM heap (Xmx)
>   Dark grey in "JVM-Memory" in the Solr Admin interface Dashboard.
>
>
> I am guessing that the number you are talking about is "Free memory on
> the system" and as Shawn and Erick points out, a full allocation there
> is expected behaviour.
>
> What we are interested in are the JVM heap numbers.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-04 Thread Shawn Heisey
On 1/3/2016 7:05 PM, Zheng Lin Edwin Yeo wrote:
> A) Before I start the optimization, the server's memory usage
> is consistent at around 16GB, when Solr startsup and we did some searching.
> However, when I click on the optimization button, the memory usage
> increases gradually, until it reaches the maximum of 64GB which the server
> has. But this only happens to the collection with index of 200GB, and not
> other collections which has smaller index size (they are at most 1GB at the
> moment).



> A) I am quite curious at this also, because in the Task Manager of the
> server, the amount of memory usage stated does not tally with the
> percentage of memory usage. When I start optimizatoin, the memory usage
> states the JVM is only using 14GB, but the percentage of memory usage is
> almost 100%, when I have 64GB RAM. I have check the other processes running
> in the server, and did not found any other processes that takes up a large
> amount of memory, and the total amount of memory usage for the whole sever
> is only around 16GB.

Toke's reply is spot on.

In your first answer above, you didn't really answer my question, which
was "What *exactly* are you looking at that says Solr is using all your
memory?"  You've said "the server's memory usage" but haven't described
how you got that number.

Here's a screenshot of "top" on one of my Solr servers, with the list
sorted by memory usage:

https://www.dropbox.com/s/i49s2uyfetwo3xq/solr-mem-prod-8g-heap.png?dl=0

This machine has 165GB (base 2 number) of index data on it, and 64GB of
memory.  Solr has been assigned an 8GB heap.  Here's more specific info
about the size of the index data:

root@idxb3:/index/solr5/data# du -hs data
165Gdata
root@idxb3:/index/solr5/data# du -s data
172926520   data

You can see that the VIRT memory size of the Solr process is
approximately the same as the total index size (165GB) plus the max heap
(8GB), which adds up to 173GB.  The RES memory size of the java process
is 8.3GB -- just a little bit larger than the max heap.

At the OS level, my server shows 46GB used out of 64GB total ... which
probably seems excessive, until you consider the 36 million kilobytes in
the "cached" statistic.  This is the amount of memory being used for the
page cache.   If you subtract that memory, then you can see that this
server has only allocated about 10GB of RAM total -- exactly what I
would expect for a Linux machine dedicated to Solr with the max heap at 8GB.

Although my server is indicating about 18GB of memory free, I have seen
perfectly functioning servers with that number very close to zero.  It
is completely normal for the "free" memory statistic on Linux and
Windows to show a few megabytes or less, especially when you optimize a
Solr index, which reads (and writes) all of the index data, and will
fill up the page cache.

So, I will ask something very similar to my initial question.  Where
*exactly* are you looking to see the memory usage that you believe is a
problem?  A screenshot would be very helpful.

Here's a screenshot from my Windows client.  This machine is NOT running
Solr, but the situation with free and cached memory is similar.

https://www.dropbox.com/s/wex1gbj7e45g8ed/windows7-mem-usage.png?dl=0

I am not doing anything particularly unusual with this machine, but it
says there is *zero* free memory, out of 16GB total.  There is 9GB of
memory in the page cache, though -- memory that the OS will instantly
give up if any program requests it, which you can see because the
"available" stat is also about 9GB.  This Windows machine is doing
perfectly fine as far as memory.

Thanks,
Shawn



Re: Memory Usage increases by a lot during and after optimization .

2016-01-04 Thread Toke Eskildsen
On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote:
> A) Before I start the optimization, the server's memory usage
> is consistent at around 16GB, when Solr startsup and we did some searching.

How do you read this number?

> However, when I click on the optimization button, the memory usage
> increases gradually, until it reaches the maximum of 64GB which the server
> has.

There are multiple ways of looking at memory. The most relevant ones in
this context are

- Total memory on the system
  This appears to be 64GB.

- Free memory on the system
  Usually determined by 'top' under Linux or Task Manager under Windows.

- Memory used for caching on the system
  Usually determined by 'top' under Linux or Task Manager under Windows.

- JVM memory usage
  Usually determined by 'top' under Linux or Task Manager under Windows.
  Look for "Res" (resident) for the task in Linux. It might be called 
  "physical" under Windows.


- Maximum JVM heap (Xmx)
  Lightest grey in "JVM-Memory" in the Solr Admin interface Dashboard.

- Allocated JVM heap
  Medium grey in "JVM-Memory" in the Solr Admin interface Dashboard.

- Active JVM heap
  Dark grey in "JVM-Memory" in the Solr Admin interface Dashboard.


I am guessing that the number you are talking about is "Free memory on
the system" and as Shawn and Erick points out, a full allocation there
is expected behaviour.

What we are interested in are the JVM heap numbers.

- Toke Eskildsen, State and University Library, Denmark




Re: Memory Usage increases by a lot during and after optimization .

2016-01-03 Thread Zheng Lin Edwin Yeo
Thanks for the reply Shawn and Erick.

What *exactly* are you looking at that says Solr is using all your
memory?  You must be extremely specific when answering this question.
This will determine whether we should be looking for a bug or not.

A) Before I start the optimization, the server's memory usage
is consistent at around 16GB, after Solr starts up and we do some searching.
However, when I click on the optimization button, the memory usage
increases gradually, until it reaches the maximum of 64GB which the server
has. But this only happens to the collection with the 200GB index, and not
to the other collections which have smaller index sizes (they are at most 1GB
at the moment).


In another message thread, you indicated that your max heap was set to
14GB.  Java will only ever use that much memory for the program that is
being run, plus a relatively small amount so that Java itself can
operate.  Any significantly large resident memory allocation beyond the
max heap would be an indication of a bug in Java, not a bug in Solr.

A) I am quite curious about this also, because in the Task Manager of the
server, the amount of memory usage stated does not tally with the
percentage of memory usage. When I start optimization, the memory usage
states that the JVM is only using 14GB, but the percentage of memory usage is
almost 100%, when I have 64GB RAM. I have checked the other processes running
on the server and did not find any other process that takes up a large
amount of memory; the total memory usage for the whole server
is only around 16GB.


Regards,
Edwin


On 3 January 2016 at 01:24, Erick Erickson  wrote:

> If you happen to be looking at "top" or the like, you
> might be seeing virtual memory, see Uwe's
> excellent blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
> On Fri, Jan 1, 2016 at 11:46 PM, Shawn Heisey  wrote:
> > On 12/31/2015 8:03 PM, Zheng Lin Edwin Yeo wrote:
> >> But the problem I'm facing now is that during optimizing, the memory
> usage
> >> of the server hit the maximum of 64GB, and I believe the optimization
> could
> >> not be completed fully as there is not enough memory, so when I check
> the
> >> index again, it says that it is not optimized. Before the optimization,
> the
> >> memory usage was less than 16GB, so the optimization actually uses up
> more
> >> than 48GB of memory.
> >>
> >> Is it normal for an index size of 200GB to use up so much memory during
> >> optimization?
> >
> > What *exactly* are you looking at that says Solr is using all your
> > memory?  You must be extremely specific when answering this question.
> > This will determine whether we should be looking for a bug or not.
> >
> > It is completely normal for all modern operating systems to use all the
> > memory when the amount of data being handled is large.  Some of the
> > memory will be allocated to programs like Java/Solr, and the operating
> > system will use everything else to cache data from I/O operations on the
> > disk.  This is called the page cache.  For Solr to perform well, the
> > page cache must be large enough to effectively cache your index data.
> >
> > https://en.wikipedia.org/wiki/Page_cache
> >
> > In another message thread, you indicated that your max heap was set to
> > 14GB.  Java will only ever use that much memory for the program that is
> > being run, plus a relatively small amount so that Java itself can
> > operate.  Any significantly large resident memory allocation beyond the
> > max heap would be an indication of a bug in Java, not a bug in Solr.
> >
> > With the index size at 200GB, I would hope to have at least 128GB of
> > memory in the server, but I would *want* 256GB.  64GB may not be enough
> > for good performance.
> >
> > Thanks,
> > Shawn
> >
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-02 Thread Erick Erickson
If you happen to be looking at "top" or the like, you
might be seeing virtual memory, see Uwe's
excellent blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Fri, Jan 1, 2016 at 11:46 PM, Shawn Heisey  wrote:
> On 12/31/2015 8:03 PM, Zheng Lin Edwin Yeo wrote:
>> But the problem I'm facing now is that during optimizing, the memory usage
>> of the server hit the maximum of 64GB, and I believe the optimization could
>> not be completed fully as there is not enough memory, so when I check the
>> index again, it says that it is not optimized. Before the optimization, the
>> memory usage was less than 16GB, so the optimization actually uses up more
>> than 48GB of memory.
>>
>> Is it normal for an index size of 200GB to use up so much memory during
>> optimization?
>
> What *exactly* are you looking at that says Solr is using all your
> memory?  You must be extremely specific when answering this question.
> This will determine whether we should be looking for a bug or not.
>
> It is completely normal for all modern operating systems to use all the
> memory when the amount of data being handled is large.  Some of the
> memory will be allocated to programs like Java/Solr, and the operating
> system will use everything else to cache data from I/O operations on the
> disk.  This is called the page cache.  For Solr to perform well, the
> page cache must be large enough to effectively cache your index data.
>
> https://en.wikipedia.org/wiki/Page_cache
>
> In another message thread, you indicated that your max heap was set to
> 14GB.  Java will only ever use that much memory for the program that is
> being run, plus a relatively small amount so that Java itself can
> operate.  Any significantly large resident memory allocation beyond the
> max heap would be an indication of a bug in Java, not a bug in Solr.
>
> With the index size at 200GB, I would hope to have at least 128GB of
> memory in the server, but I would *want* 256GB.  64GB may not be enough
> for good performance.
>
> Thanks,
> Shawn
>


Re: Memory Usage increases by a lot during and after optimization .

2016-01-01 Thread Shawn Heisey
On 12/31/2015 8:03 PM, Zheng Lin Edwin Yeo wrote:
> But the problem I'm facing now is that during optimizing, the memory usage
> of the server hit the maximum of 64GB, and I believe the optimization could
> not be completed fully as there is not enough memory, so when I check the
> index again, it says that it is not optimized. Before the optimization, the
> memory usage was less than 16GB, so the optimization actually uses up more
> than 48GB of memory.
> 
> Is it normal for an index size of 200GB to use up so much memory during
> optimization?

What *exactly* are you looking at that says Solr is using all your
memory?  You must be extremely specific when answering this question.
This will determine whether we should be looking for a bug or not.

It is completely normal for all modern operating systems to use all the
memory when the amount of data being handled is large.  Some of the
memory will be allocated to programs like Java/Solr, and the operating
system will use everything else to cache data from I/O operations on the
disk.  This is called the page cache.  For Solr to perform well, the
page cache must be large enough to effectively cache your index data.

https://en.wikipedia.org/wiki/Page_cache

In another message thread, you indicated that your max heap was set to
14GB.  Java will only ever use that much memory for the program that is
being run, plus a relatively small amount so that Java itself can
operate.  Any significantly large resident memory allocation beyond the
max heap would be an indication of a bug in Java, not a bug in Solr.

With the index size at 200GB, I would hope to have at least 128GB of
memory in the server, but I would *want* 256GB.  64GB may not be enough
for good performance.

Thanks,
Shawn



Re: Memory Usage increases by a lot during and after optimization .

2015-12-31 Thread Zheng Lin Edwin Yeo
Hi Yonik,

Yes, the plan is to do the optimizing at night after indexing, when there
are lesser user who will use the system.

But the problem I'm facing now is that during optimizing, the memory usage
of the server hit the maximum of 64GB, and I believe the optimization could
not be completed fully as there is not enough memory, so when I check the
index again, it says that it is not optimized. Before the optimization, the
memory usage was less than 16GB, so the optimization actually uses up more
than 48GB of memory.

Is it normal for an index size of 200GB to use up so much memory during
optimization?

Regards,
Edwin


On 30 December 2015 at 11:49, Yonik Seeley  wrote:

> Some people also want to control when major segment merges happen, and
> optimizing at a known time helps prevent a major merge at an unknown
> time (which can be equivalent to an optimize/forceMerge).
>
> The benefits of optimizing (and having fewer segments to search
> across) will vary depending on the requests.
> Normal full-text searches will see little benefit (merging a few terms
> across many segments is not expensive), while other operations that
> need to deal with many terms, like faceting, may see bigger speedups.
>
> -Yonik
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-31 Thread Alexandre Rafalovitch
Wouldn't collection swapping be a better strategy in that case?

Load and optimise in a separate server, then swap it in.
On 30 Dec 2015 10:08 am, "Walter Underwood"  wrote:

> The only time that a force merge might be useful is when you reindex all
> content every night or every week, then do not make any changes until the
> next reindex. But even then, it probably does not matter.
>
> Just let Solr do its thing. Solr is pretty smart.
>
> A long time ago (1996-2006), I worked on an enterprise search engine with
> the same merging algorithm as Solr (Ultraseek Server). We always had
> customers asking about force-merge/optimize. It never made a useful
> difference. Even with twenty servers at irs.gov <http://irs.gov/>, it
> didn’t make a difference.
>
> wunder
> K6WRU
> Walter Underwood
> CM87wj
> http://observer.wunderwood.org/ (my blog)
>
> > On Dec 29, 2015, at 6:59 PM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi Walter,
> >
> > Thanks for your reply.
> >
> > Then how about optimization after indexing?
> > Normally the index size is much larger after indexing, then after
> > optimization, the index size reduces. Do we still need to do that?
> >
> > Regards,
> > Edwin
> >
> > On 30 December 2015 at 10:45, Walter Underwood 
> > wrote:
> >
> >> Do not “optimize".
> >>
> >> It is a forced merge, not an optimization. It was a mistake to ever name
> >> it “optimize”. Solr automatically merges as needed. There are a few
> >> situations where a force merge might make a small difference. Maybe 10%
> or
> >> 20%, no one had bothered to measure it.
> >>
> >> If your index is continually updated, clicking that is a complete waste
> of
> >> resources. Don’t do it.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo  >
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am facing a situation, when I do an optimization by clicking on the
> >>> "Optimized" button on the Solr Admin Overview UI, the memory usage of
> the
> >>> server increases gradually, until it reaches near the maximum memory
> >>> available. There is 64GB of memory available in the server.
> >>>
> >>> Even after the optimized is completed, the memory usage stays near the
> >> 100%
> >>> range, and could not be reduced until I stop Solr. Why could this be
> >>> happening?
> >>>
> >>> Also, I don't think the optimization is completed, as the admin page
> says
> >>> the index is not optimized again after I go back to the Overview page,
> >> even
> >>> though I did not do any updates to the index.
> >>>
> >>> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
> >> 183GB.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >>
>
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread William Bell
Question: does anyone have examples of good merge settings for solrconfig? To
keep the number of segments small, like 6?
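
For reference, segment counts are steered by TieredMergePolicy in
solrconfig.xml's <indexConfig>; the sketch below is illustrative only, not a
tuned recommendation, and the index path is a placeholder:

# each live segment has one .si file, so this approximates the segment count
ls /var/solr/data/mycore/data/index/_*.si | wc -l
# in solrconfig.xml, something like this targets roughly 6 segments per tier:
#   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
#     <int name="maxMergeAtOnce">6</int>
#     <int name="segmentsPerTier">6</int>
#   </mergePolicy>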

On Tue, Dec 29, 2015 at 8:49 PM, Yonik Seeley  wrote:

> Some people also want to control when major segment merges happen, and
> optimizing at a known time helps prevent a major merge at an unknown
> time (which can be equivalent to an optimize/forceMerge).
>
> The benefits of optimizing (and having fewer segments to search
> across) will vary depending on the requests.
> Normal full-text searches will see little benefit (merging a few terms
> across many segments is not expensive), while other operations that
> need to deal with many terms, like faceting, may see bigger speedups.
>
> -Yonik
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Yonik Seeley
Some people also want to control when major segment merges happen, and
optimizing at a known time helps prevent a major merge at an unknown
time (which can be equivalent to an optimize/forceMerge).

The benefits of optimizing (and having fewer segments to search
across) will vary depending on the requests.
Normal full-text searches will see little benefit (merging a few terms
across many segments is not expensive), while other operations that
need to deal with many terms, like faceting, may see bigger speedups.

-Yonik


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Thanks for the information.

Another thing I'd like to confirm: will the Java heap size setting affect
the optimization process or the memory usage?

Is there any recommended setting we can use for an index size of 200GB?

Regards,
Edwin


On 30 December 2015 at 11:07, Walter Underwood 
wrote:

> The only time that a force merge might be useful is when you reindex all
> content every night or every week, then do not make any changes until the
> next reindex. But even then, it probably does not matter.
>
> Just let Solr do its thing. Solr is pretty smart.
>
> A long time ago (1996-2006), I worked on an enterprise search engine with
> the same merging algorithm as Solr (Ultraseek Server). We always had
> customers asking about force-merge/optimize. It never made a useful
> difference. Even with twenty servers at irs.gov <http://irs.gov/>, it
> didn’t make a difference.
>
> wunder
> K6WRU
> Walter Underwood
> CM87wj
> http://observer.wunderwood.org/ (my blog)
>
> > On Dec 29, 2015, at 6:59 PM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi Walter,
> >
> > Thanks for your reply.
> >
> > Then how about optimization after indexing?
> > Normally the index size is much larger after indexing, then after
> > optimization, the index size reduces. Do we still need to do that?
> >
> > Regards,
> > Edwin
> >
> > On 30 December 2015 at 10:45, Walter Underwood 
> > wrote:
> >
> >> Do not “optimize".
> >>
> >> It is a forced merge, not an optimization. It was a mistake to ever name
> >> it “optimize”. Solr automatically merges as needed. There are a few
> >> situations where a force merge might make a small difference. Maybe 10%
> or
> >> 20%, no one had bothered to measure it.
> >>
> >> If your index is continually updated, clicking that is a complete waste
> of
> >> resources. Don’t do it.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo  >
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am facing a situation, when I do an optimization by clicking on the
> >>> "Optimized" button on the Solr Admin Overview UI, the memory usage of
> the
> >>> server increases gradually, until it reaches near the maximum memory
> >>> available. There is 64GB of memory available in the server.
> >>>
> >>> Even after the optimized is completed, the memory usage stays near the
> >> 100%
> >>> range, and could not be reduced until I stop Solr. Why could this be
> >>> happening?
> >>>
> >>> Also, I don't think the optimization is completed, as the admin page
> says
> >>> the index is not optimized again after I go back to the Overview page,
> >> even
> >>> though I did not do any updates to the index.
> >>>
> >>> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
> >> 183GB.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >>
>
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Walter Underwood
The only time that a force merge might be useful is when you reindex all 
content every night or every week, then do not make any changes until the next 
reindex. But even then, it probably does not matter.

Just let Solr do its thing. Solr is pretty smart.

A long time ago (1996-2006), I worked on an enterprise search engine with the 
same merging algorithm as Solr (Ultraseek Server). We always had customers 
asking about force-merge/optimize. It never made a useful difference. Even with 
twenty servers at irs.gov <http://irs.gov/>, it didn’t make a difference.

wunder
K6WRU
Walter Underwood
CM87wj
http://observer.wunderwood.org/ (my blog)

> On Dec 29, 2015, at 6:59 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Walter,
> 
> Thanks for your reply.
> 
> Then how about optimization after indexing?
> Normally the index size is much larger after indexing, then after
> optimization, the index size reduces. Do we still need to do that?
> 
> Regards,
> Edwin
> 
> On 30 December 2015 at 10:45, Walter Underwood 
> wrote:
> 
>> Do not “optimize".
>> 
>> It is a forced merge, not an optimization. It was a mistake to ever name
>> it “optimize”. Solr automatically merges as needed. There are a few
>> situations where a force merge might make a small difference. Maybe 10% or
>> 20%, no one had bothered to measure it.
>> 
>> If your index is continually updated, clicking that is a complete waste of
>> resources. Don’t do it.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am facing a situation, when I do an optimization by clicking on the
>>> "Optimized" button on the Solr Admin Overview UI, the memory usage of the
>>> server increases gradually, until it reaches near the maximum memory
>>> available. There is 64GB of memory available in the server.
>>> 
>>> Even after the optimized is completed, the memory usage stays near the
>> 100%
>>> range, and could not be reduced until I stop Solr. Why could this be
>>> happening?
>>> 
>>> Also, I don't think the optimization is completed, as the admin page says
>>> the index is not optimized again after I go back to the Overview page,
>> even
>>> though I did not do any updates to the index.
>>> 
>>> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
>> 183GB.
>>> 
>>> Regards,
>>> Edwin
>> 
>> 



Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Hi Walter,

Thanks for your reply.

Then how about optimization after indexing?
Normally the index size is much larger after indexing, then after
optimization, the index size reduces. Do we still need to do that?

Regards,
Edwin

On 30 December 2015 at 10:45, Walter Underwood 
wrote:

> Do not “optimize".
>
> It is a forced merge, not an optimization. It was a mistake to ever name
> it “optimize”. Solr automatically merges as needed. There are a few
> situations where a force merge might make a small difference. Maybe 10% or
> 20%, no one had bothered to measure it.
>
> If your index is continually updated, clicking that is a complete waste of
> resources. Don’t do it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > I am facing a situation, when I do an optimization by clicking on the
> > "Optimized" button on the Solr Admin Overview UI, the memory usage of the
> > server increases gradually, until it reaches near the maximum memory
> > available. There is 64GB of memory available in the server.
> >
> > Even after the optimized is completed, the memory usage stays near the
> 100%
> > range, and could not be reduced until I stop Solr. Why could this be
> > happening?
> >
> > Also, I don't think the optimization is completed, as the admin page says
> > the index is not optimized again after I go back to the Overview page,
> even
> > though I did not do any updates to the index.
> >
> > I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
> 183GB.
> >
> > Regards,
> > Edwin
>
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Walter Underwood
Do not “optimize".

It is a forced merge, not an optimization. It was a mistake to ever name it 
“optimize”. Solr automatically merges as needed. There are a few situations 
where a force merge might make a small difference. Maybe 10% or 20%, no one had 
bothered to measure it.

If your index is continually updated, clicking that is a complete waste of 
resources. Don’t do it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> I am facing a situation, when I do an optimization by clicking on the
> "Optimized" button on the Solr Admin Overview UI, the memory usage of the
> server increases gradually, until it reaches near the maximum memory
> available. There is 64GB of memory available in the server.
> 
> Even after the optimized is completed, the memory usage stays near the 100%
> range, and could not be reduced until I stop Solr. Why could this be
> happening?
> 
> Also, I don't think the optimization is completed, as the admin page says
> the index is not optimized again after I go back to the Overview page, even
> though I did not do any updates to the index.
> 
> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is 183GB.
> 
> Regards,
> Edwin



Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Hi,

I am facing a situation, when I do an optimization by clicking on the
"Optimized" button on the Solr Admin Overview UI, the memory usage of the
server increases gradually, until it reaches near the maximum memory
available. There is 64GB of memory available in the server.

Even after the optimized is completed, the memory usage stays near the 100%
range, and could not be reduced until I stop Solr. Why could this be
happening?

Also, I don't think the optimization is completed, as the admin page says
the index is not optimized again after I go back to the Overview page, even
though I did not do any updates to the index.

I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is 183GB.

Regards,
Edwin


Re: Solr memory usage

2015-12-11 Thread Otis Gospodnetić
Hi Steve,

Fluctuation is OK.  100% utilization for more than a moment is not :)

Not sure what tool(s) you use for monitoring your Solr servers, but look
under "JVM Pool Utilization" in SPM if you're using SPM.
Or this live demo of a Solr system:
* click on https://apps.sematext.com/demo to get into the demo account
* look at "JVM Pool Utilization" on
https://apps.sematext.com/spm-reports/mainPage.do?selectedApplication=1704&r=poolReportPage&timestamp=1449865787801&stickyFiltersOff=false

And on that JVM Pool Size chart on top of the page you will see a giant
saw-tooth pattern, which is a healthy sign :)

HTH
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Wed, Dec 9, 2015 at 9:56 AM, Steven White  wrote:

> Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
> very helpful.
>
> A follow up question.  I also noticed the "JVM-Memory" report off Solr's
> home page is fluctuating.  I expect some fluctuation, but it kinda worries
> me when it fluctuates up / down in a range of 4 GB and maybe more.  I.e.:
> at times it is at 5 GB and other times it is at 10 GB (this is while I'm
> running my search tests).  What does such high fluctuation mean?
>
> If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
> first started and before I run any search on it.  I'm taking this as my
> base startup memory usage.
>
> Steve
>
> On Tue, Dec 8, 2015 at 3:17 PM, Erick Erickson 
> wrote:
>
> > You're doing nothing wrong, that particular bit of advice has
> > always needed a bit of explanation.
> >
> > Solr (well, actually Lucene) uses MMapDirectory for much of
> > the index structure which uses the OS memory rather than
> > the JVM heap. See Uwe's excellent:
> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Plus, the size on disk includes the stored data, which is in the *.fdt
> > files in data/index. Very little of the stored data is kept in the JVM
> > so that's another reason your Java heap may be smaller than
> > your raw index size on disk.
> >
> > The advice about fitting your entire index into memory really has
> > the following caveats (at least).
> > 1> "memory" includes the OS memory available to the process
> > 2> The size of the index on disk is misleading, the *.fdt files
> >  should be subtracted in order to get a truer picture.
> > 3> Both Solr and Lucene create structures in the Java JVM
> >  that are _not_ reflected in the size on disk.
> >
> > <1> and <2> mean the JVM memory necessary is smaller
> > than the size on disk.
> >
> > <3> means the JVM memory will be larger than.
> >
> > So you're doing the right thing, testing and seeing what you
> > _really_ need. I'd pretty much take your test, add some
> > padding and consider it good. You're _not_ doing the
> > really bad thing of using the same query over and over
> > again and hoping .
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Dec 8, 2015 at 11:54 AM, Steven White 
> > wrote:
> > > Hi folks,
> > >
> > > My index size on disk (optimized) is 20 GB (single core, single index).
> > I
> > > have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
> > >
> > > I have run load tests (up to 100 concurrent users) for hours where each
> > > user issuing unique searches (the same search is never executed again
> for
> > > at least 30 minute since it was last executed).  In all tests I run,
> > Solr's
> > > JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
> > >
> > > I read over and over, for optimal performance, Solr should be given
> > enough
> > > RAM to hold the index in memory.  Well, I have done that and some but
> > yet I
> > > don't see Solr using up that whole RAM.  What am I doing wrong?  Is my
> > test
> > > at fault?  I doubled the test load (number of users) and didn't see
> much
> > of
> > > a difference with RAM usage but yet my search performance went down
> > (takes
> > > about 40% longer now).  I run my tests again but this time with only 12
> > GB
> > > of RAM given to Solr.  Test result didn't differ much from the 24 GB
> run
> > > and Solr never used more than 10 GB of RAM.
> > >
> > > Can someone help me understand this?  I don't want to give Solr RAM
> that
> > it
> > > won't use.
> > >
> > > PS: This is simply search tests, there is no update to the index at
> all.
> > >
> > > Thanks in advance.
> > >
> > > Steve
> >
>


Re: Solr memory usage

2015-12-10 Thread Shawn Heisey
On 12/9/2015 7:56 AM, Steven White wrote:
> Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
> very helpful.
>
> A follow up question.  I also noticed the "JVM-Memory" report off Solr's
> home page is fluctuating.  I expect some fluctuation, but it kinda worries
> me when it fluctuates up / down in a range of 4 GB and maybe more.  I.e.:
> at times it is at 5 GB and other times it is at 10 GB (this is while I'm
> running my search tests).  What does such high fluctuation mean?
>
> If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
> first started and before I run any search on it.  I'm taking this as my
> base startup memory usage.

The heap usage at any particular instant in time (even right after
startup) is nearly useless information.  To reach any useful conclusions
and change your heap size based on those conclusions, heap usage must be
tracked (and ideally graphed) for several minutes or hours, sampling no
less frequently than about every five or ten seconds -- exactly what
programs like JConsole (included with the Java JDK) do.  You will want
to do the tracking/graphing during your heaviest usage for both
queries and indexing.
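
If a GUI like JConsole is not convenient, jstat from the same JDK can do that
kind of sampling from a shell (a minimal sketch; the process lookup and the
ten-second interval are placeholders):

# prints eden/survivor/old-gen utilisation plus GC counts and times every 10s;
# assumes a single Jetty-launched Solr process matching "start.jar"
jstat -gcutil $(pgrep -f start.jar) 10s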

See the "How much heap space do I need?" section here for some
relatively vague pointers:

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

Thanks,
Shawn



RE: Solr memory usage

2015-12-09 Thread Markus Jelsma
Steven - this fluctuation is normal. Memory is consumed when documents are
indexed or searches are handled, which makes the meter go up; the garbage
collector then frees that memory again. You can start to worry if there is a lot
of activity but no fluctuation.
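
One way to see that cycle directly is to enable GC logging for the Solr JVM and
watch the heap drop at each collection (a minimal sketch for a pre-Java-9 JVM;
the solr.in.sh location and log path are assumptions):

# added to solr.in.sh (or wherever your Solr startup options live)
SOLR_OPTS="$SOLR_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/solr/logs/solr_gc.log"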

M.
 
-Original message-
> From:Steven White 
> Sent: Wednesday 9th December 2015 15:56
> To: solr-user@lucene.apache.org
> Subject: Re: Solr memory usage
> 
> Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
> very helpful.
> 
> A follow up question.  I also noticed the "JVM-Memory" report off Solr's
> home page is fluctuating.  I expect some fluctuation, but it kinda worries
> me when it fluctuates up / down in a range of 4 GB and maybe more.  I.e.:
> at times it is at 5 GB and other times it is at 10 GB (this is while I'm
> running my search tests).  What does such high fluctuation mean?
> 
> If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
> first started and before I run any search on it.  I'm taking this as my
> base startup memory usage.
> 
> Steve
> 
> On Tue, Dec 8, 2015 at 3:17 PM, Erick Erickson 
> wrote:
> 
> > You're doing nothing wrong, that particular bit of advice has
> > always needed a bit of explanation.
> >
> > Solr (well, actually Lucene) uses MMapDirectory for much of
> > the index structure which uses the OS memory rather than
> > the JVM heap. See Uwe's excellent:
> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Plus, the size on disk includes the stored data, which is in the *.fdt
> > files in data/index. Very little of the stored data is kept in the JVM
> > so that's another reason your Java heap may be smaller than
> > your raw index size on disk.
> >
> > The advice about fitting your entire index into memory really has
> > the following caveats (at least).
> > 1> "memory" includes the OS memory available to the process
> > 2> The size of the index on disk is misleading, the *.fdt files
> >  should be subtracted in order to get a truer picture.
> > 3> Both Solr and Lucene create structures in the Java JVM
> >  that are _not_ reflected in the size on disk.
> >
> > <1> and <2> mean the JVM memory necessary is smaller
> > than the size on disk.
> >
> > <3> means the JVM memory will be larger than.
> >
> > So you're doing the right thing, testing and seeing what you
> > _really_ need. I'd pretty much take your test, add some
> > padding and consider it good. You're _not_ doing the
> > really bad thing of using the same query over and over
> > again and hoping .
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Dec 8, 2015 at 11:54 AM, Steven White 
> > wrote:
> > > Hi folks,
> > >
> > > My index size on disk (optimized) is 20 GB (single core, single index).
> > I
> > > have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
> > >
> > > I have run load tests (up to 100 concurrent users) for hours where each
> > > user issuing unique searches (the same search is never executed again for
> > > at least 30 minute since it was last executed).  In all tests I run,
> > Solr's
> > > JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
> > >
> > > I read over and over, for optimal performance, Solr should be given
> > enough
> > > RAM to hold the index in memory.  Well, I have done that and some but
> > yet I
> > > don't see Solr using up that whole RAM.  What am I doing wrong?  Is my
> > test
> > > at fault?  I doubled the test load (number of users) and didn't see much
> > of
> > > a difference with RAM usage but yet my search performance went down
> > (takes
> > > about 40% longer now).  I run my tests again but this time with only 12
> > GB
> > > of RAM given to Solr.  Test result didn't differ much from the 24 GB run
> > > and Solr never used more than 10 GB of RAM.
> > >
> > > Can someone help me understand this?  I don't want to give Solr RAM that
> > it
> > > won't use.
> > >
> > > PS: This is simply search tests, there is no update to the index at all.
> > >
> > > Thanks in advance.
> > >
> > > Steve
> >
> 


Re: Solr memory usage

2015-12-09 Thread Steven White
Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
very helpful.

A follow up question.  I also noticed the "JVM-Memory" report off Solr's
home page is fluctuating.  I expect some fluctuation, but it kinda worries
me when it fluctuates up / down in a range of 4 GB and maybe more.  I.e.:
at times it is at 5 GB and other times it is at 10 GB (this is while I'm
running my search tests).  What does such high fluctuation mean?

If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
first started and before I run any search on it.  I'm taking this as my
base startup memory usage.

Steve

On Tue, Dec 8, 2015 at 3:17 PM, Erick Erickson 
wrote:

> You're doing nothing wrong, that particular bit of advice has
> always needed a bit of explanation.
>
> Solr (well, actually Lucene) uses MMapDirectory for much of
> the index structure which uses the OS memory rather than
> the JVM heap. See Uwe's excellent:
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Plus, the size on disk includes the stored data, which is in the *.fdt
> files in data/index. Very little of the stored data is kept in the JVM
> so that's another reason your Java heap may be smaller than
> your raw index size on disk.
>
> The advice about fitting your entire index into memory really has
> the following caveats (at least).
> 1> "memory" includes the OS memory available to the process
> 2> The size of the index on disk is misleading, the *.fdt files
>  should be subtracted in order to get a truer picture.
> 3> Both Solr and Lucene create structures in the Java JVM
>  that are _not_ reflected in the size on disk.
>
> <1> and <2> mean the JVM memory necessary is smaller
> than the size on disk.
>
> <3> means the JVM memory will be larger than.
>
> So you're doing the right thing, testing and seeing what you
> _really_ need. I'd pretty much take your test, add some
> padding and consider it good. You're _not_ doing the
> really bad thing of using the same query over and over
> again and hoping .
>
> Best,
> Erick
>
>
> On Tue, Dec 8, 2015 at 11:54 AM, Steven White 
> wrote:
> > Hi folks,
> >
> > My index size on disk (optimized) is 20 GB (single core, single index).
> I
> > have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
> >
> > I have run load tests (up to 100 concurrent users) for hours where each
> > user issuing unique searches (the same search is never executed again for
> > at least 30 minute since it was last executed).  In all tests I run,
> Solr's
> > JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
> >
> > I read over and over, for optimal performance, Solr should be given
> enough
> > RAM to hold the index in memory.  Well, I have done that and some but
> yet I
> > don't see Solr using up that whole RAM.  What am I doing wrong?  Is my
> test
> > at fault?  I doubled the test load (number of users) and didn't see much
> of
> > a difference with RAM usage but yet my search performance went down
> (takes
> > about 40% longer now).  I run my tests again but this time with only 12
> GB
> > of RAM given to Solr.  Test result didn't differ much from the 24 GB run
> > and Solr never used more than 10 GB of RAM.
> >
> > Can someone help me understand this?  I don't want to give Solr RAM that
> it
> > won't use.
> >
> > PS: This is simply search tests, there is no update to the index at all.
> >
> > Thanks in advance.
> >
> > Steve
>


Re: Solr memory usage

2015-12-08 Thread Erick Erickson
You're doing nothing wrong, that particular bit of advice has
always needed a bit of explanation.

Solr (well, actually Lucene) uses MMapDirectory for much of
the index structure which uses the OS memory rather than
the JVM heap. See Uwe's excellent:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Plus, the size on disk includes the stored data, which is in the *.fdt
files in data/index. Very little of the stored data is kept in the JVM
so that's another reason your Java heap may be smaller than
your raw index size on disk.

The advice about fitting your entire index into memory really has
the following caveats (at least).
1> "memory" includes the OS memory available to the process
2> The size of the index on disk is misleading, the *.fdt files
 should be subtracted in order to get a truer picture.
3> Both Solr and Lucene create structures in the Java JVM
 that are _not_ reflected in the size on disk.

<1> and <2> mean the JVM memory necessary is smaller
than the size on disk.

<3> means the JVM memory will be larger than.
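
As a rough illustration of point 2>, the stored-field data can be measured
directly and subtracted from the on-disk total (the path is a placeholder for
your core's data directory):

# total index size on disk
du -sh /var/solr/data/mycore/data/index
# stored-field data, which largely stays out of the heap and does not need to
# be cached by the OS for good search performance
du -ch /var/solr/data/mycore/data/index/*.fdt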

So you're doing the right thing, testing and seeing what you
_really_ need. I'd pretty much take your test, add some
padding and consider it good. You're _not_ doing the
really bad thing of using the same query over and over
again and hoping .

Best,
Erick


On Tue, Dec 8, 2015 at 11:54 AM, Steven White  wrote:
> Hi folks,
>
> My index size on disk (optimized) is 20 GB (single core, single index).  I
> have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
>
> I have run load tests (up to 100 concurrent users) for hours where each
> user issuing unique searches (the same search is never executed again for
> at least 30 minute since it was last executed).  In all tests I run, Solr's
> JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
>
> I read over and over, for optimal performance, Solr should be given enough
> RAM to hold the index in memory.  Well, I have done that and some but yet I
> don't see Solr using up that whole RAM.  What am I doing wrong?  Is my test
> at fault?  I doubled the test load (number of users) and didn't see much of
> a difference with RAM usage but yet my search performance went down (takes
> about 40% longer now).  I run my tests again but this time with only 12 GB
> of RAM given to Solr.  Test result didn't differ much from the 24 GB run
> and Solr never used more than 10 GB of RAM.
>
> Can someone help me understand this?  I don't want to give Solr RAM that it
> won't use.
>
> PS: This is simply search tests, there is no update to the index at all.
>
> Thanks in advance.
>
> Steve


Solr memory usage

2015-12-08 Thread Steven White
Hi folks,

My index size on disk (optimized) is 20 GB (single core, single index).  I
have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.

I have run load tests (up to 100 concurrent users) for hours where each
user issuing unique searches (the same search is never executed again for
at least 30 minute since it was last executed).  In all tests I run, Solr's
JVM memory never goes over 10 GB (monitoring http://localhost:8983/).

I read over and over, for optimal performance, Solr should be given enough
RAM to hold the index in memory.  Well, I have done that and some but yet I
don't see Solr using up that whole RAM.  What am I doing wrong?  Is my test
at fault?  I doubled the test load (number of users) and didn't see much of
a difference with RAM usage but yet my search performance went down (takes
about 40% longer now).  I run my tests again but this time with only 12 GB
of RAM given to Solr.  Test result didn't differ much from the 24 GB run
and Solr never used more than 10 GB of RAM.

Can someone help me understand this?  I don't want to give Solr RAM that it
won't use.

PS: This is simply search tests, there is no update to the index at all.

Thanks in advance.

Steve


Re: Confusing SOLR 5 memory usage

2015-04-21 Thread Toke Eskildsen
Tom Evans  wrote:
> I do apologise for wasting anyone's time on this, the PEBKAC (my
> keyboard and chair unfortunately). When adding the new server to
> haproxy, I updated the label for the balancer entry to the new server,
> but left the host name the same, so the server that wasn't using any
> RAM... wasn't getting any requests.

No problem at all. On the contrary, thank you for closing the issue.

- Toke Eskildsen


Re: Confusing SOLR 5 memory usage

2015-04-21 Thread Tom Evans
I do apologise for wasting anyone's time on this, the PEBKAC (my
keyboard and chair unfortunately). When adding the new server to
haproxy, I updated the label for the balancer entry to the new server,
but left the host name the same, so the server that wasn't using any
RAM... wasn't getting any requests.

Again, sorry!

Tom

On Tue, Apr 21, 2015 at 11:54 AM, Tom Evans  wrote:
> We monitor them with munin, so I have charts if attachments are
> acceptable? Having said that, they have only been running for a day
> with this memory allocation..
>
> Describing them, the master consistently has 8GB used for apps, the
> 8GB used in cache, whilst the slave consistently only uses ~1.5GB for
> apps, 14GB used in cache.
>
> We are trying to use our SOLR servers to do a lot more facet queries,
> previously we were mainly doing searches, and the
> SolrPerformanceProblems wiki page mentions that faceting (amongst
> others) require a lot of JVM heap, so I'm confused why it is not using
> the heap we've allocated on one server, whilst it is on the other
> server. Perhaps our master server needs even more heap?
>
> Also, my infra guy is wondering why I asked him to add more memory to
> the slave server, if it is "just" in cache, although I did try to
> explain that ideally, I'd have even more in cache - we have about 35GB
> of index data.
>
> Cheers
>
> Tom
>
> On Tue, Apr 21, 2015 at 11:25 AM, Markus Jelsma
>  wrote:
>> Hi - what do you see if you monitor memory over time? You should see a 
>> typical saw tooth.
>> Markus
>>
>> -Original message-
>>> From:Tom Evans 
>>> Sent: Tuesday 21st April 2015 12:22
>>> To: solr-user@lucene.apache.org
>>> Subject: Confusing SOLR 5 memory usage
>>>
>>> Hi all
>>>
>>> I have two SOLR 5 servers, one is the master and one is the slave.
>>> They both have 12 cores, fully replicated and giving identical results
>>> when querying them. The only difference between configuration on the
>>> two servers is that one is set to slave from the other - identical
>>> core configs and solr.in.sh.
>>>
>>> They both run on identical VMs with 16GB of RAM. In solr.in.sh, we are
>>> setting the heap size identically:
>>>
>>> SOLR_JAVA_MEM="-Xms512m -Xmx7168m"
>>>
>>> The two servers are balanced behind haproxy, and identical numbers and
>>> types of queries flow to both servers. Indexing only happens once a
>>> day.
>>>
>>> When viewing the memory usage of the servers, the master server's JVM
>>> has 8.8GB RSS, but the slave only has 1.2GB RSS.
>>>
>>> Can someone hit me with the cluebat please? :)
>>>
>>> Cheers
>>>
>>> Tom
>>>


Re: Confusing SOLR 5 memory usage

2015-04-21 Thread Tom Evans
We monitor them with munin, so I have charts if attachments are
acceptable? Having said that, they have only been running for a day
with this memory allocation..

Describing them, the master consistently has 8GB used for apps, the
8GB used in cache, whilst the slave consistently only uses ~1.5GB for
apps, 14GB used in cache.

We are trying to use our SOLR servers to do a lot more facet queries,
previously we were mainly doing searches, and the
SolrPerformanceProblems wiki page mentions that faceting (amongst
others) require a lot of JVM heap, so I'm confused why it is not using
the heap we've allocated on one server, whilst it is on the other
server. Perhaps our master server needs even more heap?

Also, my infra guy is wondering why I asked him to add more memory to
the slave server, if it is "just" in cache, although I did try to
explain that ideally, I'd have even more in cache - we have about 35GB
of index data.

Cheers

Tom

On Tue, Apr 21, 2015 at 11:25 AM, Markus Jelsma
 wrote:
> Hi - what do you see if you monitor memory over time? You should see a 
> typical saw tooth.
> Markus
>
> -Original message-
>> From:Tom Evans 
>> Sent: Tuesday 21st April 2015 12:22
>> To: solr-user@lucene.apache.org
>> Subject: Confusing SOLR 5 memory usage
>>
>> Hi all
>>
>> I have two SOLR 5 servers, one is the master and one is the slave.
>> They both have 12 cores, fully replicated and giving identical results
>> when querying them. The only difference between configuration on the
>> two servers is that one is set to slave from the other - identical
>> core configs and solr.in.sh.
>>
>> They both run on identical VMs with 16GB of RAM. In solr.in.sh, we are
>> setting the heap size identically:
>>
>> SOLR_JAVA_MEM="-Xms512m -Xmx7168m"
>>
>> The two servers are balanced behind haproxy, and identical numbers and
>> types of queries flow to both servers. Indexing only happens once a
>> day.
>>
>> When viewing the memory usage of the servers, the master server's JVM
>> has 8.8GB RSS, but the slave only has 1.2GB RSS.
>>
>> Can someone hit me with the cluebat please? :)
>>
>> Cheers
>>
>> Tom
>>


RE: Confusing SOLR 5 memory usage

2015-04-21 Thread Markus Jelsma
Hi - what do you see if you monitor memory over time? You should see a typical 
saw tooth.
Markus 
 
-Original message-
> From:Tom Evans 
> Sent: Tuesday 21st April 2015 12:22
> To: solr-user@lucene.apache.org
> Subject: Confusing SOLR 5 memory usage
> 
> Hi all
> 
> I have two SOLR 5 servers, one is the master and one is the slave.
> They both have 12 cores, fully replicated and giving identical results
> when querying them. The only difference between configuration on the
> two servers is that one is set to slave from the other - identical
> core configs and solr.in.sh.
> 
> They both run on identical VMs with 16GB of RAM. In solr.in.sh, we are
> setting the heap size identically:
> 
> SOLR_JAVA_MEM="-Xms512m -Xmx7168m"
> 
> The two servers are balanced behind haproxy, and identical numbers and
> types of queries flow to both servers. Indexing only happens once a
> day.
> 
> When viewing the memory usage of the servers, the master server's JVM
> has 8.8GB RSS, but the slave only has 1.2GB RSS.
> 
> Can someone hit me with the cluebat please? :)
> 
> Cheers
> 
> Tom
> 


Confusing SOLR 5 memory usage

2015-04-21 Thread Tom Evans
Hi all

I have two SOLR 5 servers, one is the master and one is the slave.
They both have 12 cores, fully replicated and giving identical results
when querying them. The only difference between configuration on the
two servers is that one is set to slave from the other - identical
core configs and solr.in.sh.

They both run on identical VMs with 16GB of RAM. In solr.in.sh, we are
setting the heap size identically:

SOLR_JAVA_MEM="-Xms512m -Xmx7168m"

The two servers are balanced behind haproxy, and identical numbers and
types of queries flow to both servers. Indexing only happens once a
day.

When viewing the memory usage of the servers, the master server's JVM
has 8.8GB RSS, but the slave only has 1.2GB RSS.

Can someone hit me with the cluebat please? :)

Cheers

Tom


Re: High memory usage while querying with sort using cursor

2015-03-18 Thread Vaibhav Bhandari
Thanks Chris, that makes a lot of sense.



On Wed, Mar 18, 2015 at 3:16 PM, Chris Hostetter 
wrote:

>
> : A simple query on the collection: ../select?q=*:* works perfectly fine.
> :
> : But as soon as i add sorting, it crashes the nodes with OOM:
> : .../select?q=*:*&sort=unique_id asc&rows=0.
>
> if you don't have docValues="true" on your unique_id field, then sorting
> requires it to build up a large in-memory data structure (formerly known as
> "FieldCache", now just an on the fly DocValues structure)
>
> With explicit docValues constructed at index time, a lot of that data can
> just live in the operating system's filesystem cache, and lucene only has
> to load a small portion of it into the heap.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: High memory usage while querying with sort using cursor

2015-03-18 Thread Chris Hostetter

: A simple query on the collection: ../select?q=*:* works perfectly fine.
: 
: But as soon as i add sorting, it crashes the nodes with OOM:
: .../select?q=*:*&sort=unique_id asc&rows=0.

if you don't have docValues="true" on your unique_id field, then sorting 
requires it to build up a large in-memory data structure (formerly known as 
"FieldCache", now just an on the fly DocValues structure)

With explicit docValues constructed at index time, a lot of that data can 
just live in the operating system's filesystem cache, and lucene only has 
to load a small portion of it into the heap.
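
For reference, a minimal sketch of what enabling that looks like, assuming a
managed schema where the Schema API is available; the collection name and field
type are placeholders, and a full reindex is still required afterwards:

# equivalent schema.xml declaration:
#   <field name="unique_id" type="string" indexed="true" stored="true" docValues="true"/>
curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/mycollection/schema' \
  -d '{"replace-field":{"name":"unique_id","type":"string","indexed":true,"stored":true,"docValues":true}}'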



-Hoss
http://www.lucidworks.com/


High memory usage while querying with sort using cursor

2015-03-18 Thread Vaibhav Bhandari
Hi all,

My setup is as follows:

*Collection* size: 32GB, 2 shards, replication factor: 2 (~16GB on each
replica). Number of rows: 250million
4 *Solr* nodes: RAM: 30GB each. Heap size: 8GB. Version: 4.9.1

Besides the collection in question, the nodes have some other collections
present. The total size of all collections of each node is 30GB (which is
the same as the amount of RAM on them).

A simple query on the collection: ../select?q=*:* works perfectly fine.

But as soon as i add sorting, it crashes the nodes with OOM:
.../select?q=*:*&sort=unique_id asc&rows=0.

I have tried to disable filter-cache and query-result-cache. But that did
not help either.

Any ideas/suggestions?

Thanks,
Vaibhav


Re: Field collapsing memory usage

2015-01-23 Thread Toke Eskildsen
On Thu, 2015-01-22 at 22:52 +0100, Erick Erickson wrote:
> What do you think about folding this into the Solr (or Lucene?) code
> base? Or is it too specialized?

(writing under the assumption that DVEnabler actually works as it should
for everyone and not just us)

Right now it is an explicit tool. As such, users need to find it and
learn how to use it, which is a large barrier. Most of the time it is
easier just to re-index everything.

It seems to me that it should be possible to do seamlessly instead:
Simply change the schema and reload. Old segments would have emulated
DocValues (high speed, high memory overhead), new segments would have
pure DVs. An optimize would be optional, but highly recommended.

- Toke Eskildsen, State and University Library, Denmark




Re: Field collapsing memory usage

2015-01-22 Thread Erick Erickson
Toke:

What do you think about folding this into the Solr (or Lucene?) code
base? Or is it too specialized?

Not sure one way or the other, just askin'

Erick

On Thu, Jan 22, 2015 at 3:47 AM, Toke Eskildsen  
wrote:
> Norgorn [lsunnyd...@mail.ru] wrote:
>> Is there any way to make 'docValues="true"' without reindexing?
>
> Depends on how brave you are :-)
>
> We recently had the same need and made 
> https://github.com/netarchivesuite/dvenabler
> To my knowledge that is the only existing tool for that task and as we are the 
> only ones having used it, robustness is not guaranteed. Warnings aside, it 
> works without problems in our tests as well as the few real corpuses we have 
> tested on. It does use a fairly memory hungry structure during the 
> conversion. If the number of _unique_ values in your grouping field 
> approaches 1b, I loosely guess that you will need 40GB+ of heap. Do read 
> https://github.com/netarchivesuite/dvenabler/issues/14 if you want to try it.
>
> - Toke Eskildsen


RE: Field collapsing memory usage

2015-01-22 Thread Toke Eskildsen
Norgorn [lsunnyd...@mail.ru] wrote:
> Nice, thanks!
> If u'd like to, I'll write our results with that amazing util.

By all means, please do. Good as well as bad. Independent testing is needed to 
ensure proper working tools.

- Toke Eskildsen


RE: Field collapsing memory usage

2015-01-22 Thread Norgorn
Nice, thanks!
If u'd like to, I'll write our results with that amazing util.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-memory-usage-tp4181092p4181159.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Field collapsing memory usage

2015-01-22 Thread Toke Eskildsen
Norgorn [lsunnyd...@mail.ru] wrote:
> Is there any way to make 'docValues="true"' without reindexing?

Depends on how brave you are :-)

We recently had the same need and made 
https://github.com/netarchivesuite/dvenabler
To my knowledge that is the only existing tool for that task and as we are the 
only ones having used it, robustness is not guaranteed. Warnings aside, it 
works without problems in our tests as well as the few real corpuses we have 
tested on. It does use a fairly memory hungry structure during the conversion. 
If the number of _unique_ values in your grouping field approaches 1b, I 
loosely guess that you will need 40GB+ of heap. Do read 
https://github.com/netarchivesuite/dvenabler/issues/14 if you want to try it.

- Toke Eskildsen


RE: Field collapsing memory usage

2015-01-22 Thread Norgorn
Thank you for your answer.
We've found out that the problem was in our SOLR spec (heliosearch 0.08).
There are no crashes after changing to 4.10.3 (although there are a lot of
OOMs while handling queries, which is not really strange for 1.1 billion
documents).
Now we are going to try latest Heliosearch.

Is there any way to make 'docValues="true"' without reindexing?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-memory-usage-tp4181092p4181108.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Field collapsing memory usage

2015-01-21 Thread Toke Eskildsen
Norgorn [lsunnyd...@mail.ru] wrote:
> So, as we see, memory, used by first shard to group, wasn't released.
> Caches are already nearly zero.

It should be one or the other: Either the memory is released or there is 
something in the caches. Anyway, DocValues is the way to go, so ensure that it is 
turned on for your group field: We do grouping on indexes with 250M documents 
(and 200M+ unique values in the group field) without any significant memory 
overhead, using DocValues.

Caveat: If you ask for very large result sets, the memory usage will be high. 
But only temporarily.

- Toke Eskildsen


Field collapsing memory usage

2015-01-21 Thread Norgorn
We are trying to run SOLR with a big index, using as little RAM as possible.
Simple searches work nicely for our cases, but field collapsing (group=true)
queries fail with OOM.

Our setup is several shards per SOLR entity, each shard on its own HDD.
We've tried the same queries against one specific shard, and those queries
worked well (no OOMs).

Then we changed the shard being queried and measured RAM usage. We saw that
while only one shard was being queried, used RAM increased
significantly.

So, as we see, memory, used by first shard to group, wasn't released.
Caches are already nearly zero.

By changing shards, we've managed to make SOLR fall over.

My question is, why is it so? What do we need to do to release the memory, so
that in the end we can query shards alternately (because a parallel group query
fails nearly always)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-memory-usage-tp4181092.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-07 Thread Erick Erickson
And keep in mind that starving the OS of memory to
give it to the JVM is an anti-pattern, see Uwe's
excellent blog on MMapDirectory here:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Wed, Jan 7, 2015 at 5:55 AM, Shawn Heisey  wrote:
> On 1/6/2015 1:10 PM, Abhishek Sharma wrote:
>> *Q* - I am forced to set Java Xmx as high as 3.5g for my solr app.. If i
>> keep this low, my CPU hits 100% and response time for indexing increases a
>> lot.. And i have hit OOM Error as well when this value is low..
>>
>> Is this too high? If so, how can I reduce this?
>>
>> *Machine Details* 4 G RAM, SSD
>>
>> *Solr App Details* (Standalone solr app, no shards)
>>
>>1. num. of Solr Cores = 5
>>2. Index Size - 2 g
>>3. num. of Search Hits per sec - 10 [*IMP* - All search queries have
>>faceting..]
>>4. num. of times Re-Indexing per hour per core - 10 (it may happen at
>>the same time at a moment for all the 5 cores)
>>5. Query Result Cache, Document cache and Filter Cache are all default
>>size - 4 kb.
>>
>> *top* stats -
>>
>>   VIRT     RES    SHR  S  %CPU %MEM
>> 6446600 3.478g  18308 S 11.3 94.6
>>
>> *iotop* stats
>>
>>  DISK READ  DISK WRITE  SWAPIN IO>
>> 0-1200 K/s0-100 K/s  0  0-5%
>
> Your questions cannot be easily answered.  We can make guesses, but in
> the end, figuring out how much hardware and exactly what configs to use
> is something that only you can determine, by actually trying it:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> The following URL is the only general guideline I know of, and because
> of the problems mentioned on the blog post above, it's not all that
> helpful for specifics.  Full disclosure of my bias ... I wrote most of
> this wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Any recommendation we make will err on the side of caution, and may
> involve spending more money for your hardware than you intended.  I
> personally would not try to get a Solr install going on machines with
> only 4GB of RAM unless it was a VERY small index.  Your mentioned heap
> size of 3.5GB is quite small compared to what we normally see here.  My
> own production heaps for Solr are 6GB.
>
> Thanks,
> Shawn
>


Re: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-07 Thread Shawn Heisey
On 1/6/2015 1:10 PM, Abhishek Sharma wrote:
> *Q* - I am forced to set Java Xmx as high as 3.5g for my solr app.. If i
> keep this low, my CPU hits 100% and response time for indexing increases a
> lot.. And i have hit OOM Error as well when this value is low..
> 
> Is this too high? If so, how can I reduce this?
> 
> *Machine Details* 4 G RAM, SSD
> 
> *Solr App Details* (Standalone solr app, no shards)
> 
>1. num. of Solr Cores = 5
>2. Index Size - 2 g
>3. num. of Search Hits per sec - 10 [*IMP* - All search queries have
>faceting..]
>4. num. of times Re-Indexing per hour per core - 10 (it may happen at
>the same time at a moment for all the 5 cores)
>5. Query Result Cache, Document cache and Filter Cache are all default
>size - 4 kb.
> 
> *top* stats -
> 
>   VIRT     RES    SHR  S  %CPU %MEM
> 6446600 3.478g  18308 S 11.3 94.6
> 
> *iotop* stats
> 
>  DISK READ  DISK WRITE  SWAPIN IO>
> 0-1200 K/s0-100 K/s  0  0-5%

Your questions cannot be easily answered.  We can make guesses, but in
the end, figuring out how much hardware and exactly what configs to use
is something that only you can determine, by actually trying it:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The following URL is the only general guideline I know of, and because
of the problems mentioned on the blog post above, it's not all that
helpful for specifics.  Full disclosure of my bias ... I wrote most of
this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Any recommendation we make will err on the side of caution, and may
involve spending more money for your hardware than you intended.  I
personally would not try to get a Solr install going on machines with
only 4GB of RAM unless it was a VERY small index.  Your mentioned heap
size of 3.5GB is quite small compared to what we normally see here.  My
own production heaps for Solr are 6GB.

Thanks,
Shawn



RE: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-06 Thread Toke Eskildsen
Abhishek Sharma [abhishe...@unbxd.com] wrote:

> *Q* - I am forced to set Java Xmx as high as 3.5g for my solr app.. If i
> keep this low, my CPU hits 100% and response time for indexing increases a
> lot.. And i have hit OOM Error as well when this value is low..

[...]

>   2. Index Size - 2 g
>   3. num. of Search Hits per sec - 10 [*IMP* - All search queries have
>   faceting..]

Faceting is often the reason for high memory usage. If you are not already 
doing so, do enable DocValues for the fields you are faceting on. If you have a 
lot of unique values in your facets (millions), you might also consider 
limiting the amount of concurrent searches.

Still, 3.5GB heap seems like quite a bit for a 2GB index. How many documents do 
you have?

- Toke Eskildsen


Solr Memory Usage - How to reduce memory footprint for solr

2015-01-06 Thread Abhishek Sharma
*Q* - I am forced to set Java Xmx as high as 3.5g for my solr app.. If i
keep this low, my CPU hits 100% and response time for indexing increases a
lot.. And i have hit OOM Error as well when this value is low..

Is this too high? If so, how can I reduce this?

*Machine Details* 4 G RAM, SSD

*Solr App Details* (Standalone solr app, no shards)

   1. num. of Solr Cores = 5
   2. Index Size - 2 g
   3. num. of Search Hits per sec - 10 [*IMP* - All search queries have
   faceting..]
   4. num. of times Re-Indexing per hour per core - 10 (it may happen at
   the same time at a moment for all the 5 cores)
   5. Query Result Cache, Document cache and Filter Cache are all default
   size - 4 kb.

*top* stats -

  VIRT     RES    SHR  S  %CPU %MEM
6446600 3.478g  18308 S 11.3 94.6

*iotop* stats

 DISK READ  DISK WRITE  SWAPIN IO>
0-1200 K/s0-100 K/s  0  0-5%


Re: Solr Memory Usage

2014-10-30 Thread Toke Eskildsen
On Wed, 2014-10-29 at 23:37 +0100, Will Martin wrote:
> This command only touches OS level caches that hold pages destined for (or
> not) the swap cache. Its use means that disk will be hit on future requests,
> but in many instances the pages were headed for ejection anyway.
> 
> It does not have anything whatsoever to do with Solr caches.

If you re-read my post, you will see "the OS had to spend a lot of
resources just bookkeeping memory". OS, not JVM.

> It also is not fragmentation related; it is a result of the kernel
> managing virtual pages in an "as designed manner". The proper command
> is
> 
> #sync; echo 3 >/proc/sys/vm/drop_caches. 

I just talked with a Systems guy to verify what happened when we had
the problem:

- The machine spawned Xmx1g JVMs with Tika, each instance processing a 
  single 100M ARC file, sending the result to a shared Solr instance 
  and shutting down. 40 instances were running at all times, each 
  instance living for a little less than 3 minutes.
  Besides taking ~40GB of RAM in total, this also meant that about 10GB 
  of RAM was released and re-requested from the system each minute.
  I don't know how the memory mapping in Solr works with regard to
  re-use of existing allocations, so I can't say if Solr added to that
  number or not.

- The indexing speed deteriorated after some days, grinding down to 
  (loose guess) something like 1/4th of initial speed.

- Running top showed that the majority of time was spent in the kernel.

- Running "echo 3 >/proc/sys/vm/drop_caches" (I asked Systems explicitly
  about the integer and it was '3') brought the speed back to the 
  initial level. The temporary patch was to run it once every hour.

- Running top with the patch showed the vast majority of time was spent 
  in user space.

- Systems investigated and determined that "huge pages" were 
  automatically requested by processes on the machine, leading to 
  (virtual) memory fragmentation on the OS level. They used a tool in 
  'sysfsutils' (just relaying what they said here) to change the default
  from huge pages to small pages (or whatever the default is named).

- The disabling of huge pages made the problem go away and we no longer
  use the drop_caches-trick.

> http://linux.die.net/man/5/proc
> 
> I have encountered resistance on the use of this on long-running processes
> for years ... from people who don't even research the matter.

The resistance is natural: Although it might work to drop_cache, as it
did for us, it is still symptom treatment. Until the cause has been
isolated and determined to be practically unresolvable, the drop_cache
is a red flag.

Your undetermined core problem might not be the same as ours, but it is
simple to check: Watch kernel time percentage. If it rises over time,
try disabling huge pages.
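
A rough way to watch the kernel time and to toggle huge pages (command sketches
only; the sysfs path is the common location for transparent huge pages but
varies by distribution, and changing it needs root):

# watch the "sy" (kernel/system time) column over time
vmstat 5
# shows [always] / madvise / never
cat /sys/kernel/mm/transparent_hugepage/enabled
# disable THP for a quick test; not persistent across reboots
echo never > /sys/kernel/mm/transparent_hugepage/enabled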

- Toke Eskildsen, State and University Library, Denmark




Re: Solr Memory Usage

2014-10-29 Thread Shawn Heisey
On 10/29/2014 1:05 PM, Toke Eskildsen wrote:
> We did have some problems on a 256GB machine churning terabytes of data 
> through 40 concurrent Tika processes and into Solr. After some days, 
> performance got really bad. When we did a top, we noticed that most of the 
> time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The 
> drop_caches trick worked for us too. Our systems guys explained that it was 
> because of virtual memory space fragmentation, so the OS had to spend a lot 
> of resources just bookkeeping memory.

There's always at least one exception to any general advice, including
whatever I come up with!  It's really too bad that it didn't Just Work
(tm) for you.  Weird things can happen when you start down the path of
extreme scaling, though.

Thank you for exploring the bleeding edge for us!

Shawn



RE: Solr Memory Usage

2014-10-29 Thread Will Martin
Oops. My wording was poor. My reference to those who don't research the
matter was pointing at a large number of engineers I have worked with; not
this list.

-Original Message-
From: Will Martin [mailto:wmartin...@gmail.com] 
Sent: Wednesday, October 29, 2014 6:38 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Solr Memory Usage

This command only touches OS level caches that hold pages destined for (or
not) the swap cache. Its use means that disk will be hit on future requests,
but in many instances the pages were headed for ejection anyway.

It does not have anything whatsoever to do with Solr caches.  It also is not
fragmentation related; it is a result of the kernel managing virtual pages
in an "as designed manner". The proper command is

#sync; echo 3 >/proc/sys/vm/drop_caches. 

http://linux.die.net/man/5/proc

I have encountered resistance on the use of this on long-running processes
for years ... from people who don't even research the matter.



-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Wednesday, October 29, 2014 3:06 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Memory Usage

Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following 
> command to clear out the inactive memory.  It  is working as expected.
> Even though the index size of Cloud is 146GB, the used memory is always
below 55GB.
> Our response times are better and no errors/exceptions are thrown. 
> (This command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea,
but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data
through 40 concurrent Tika processes and into Solr. After some days,
performance got really bad. When we did a top, we noticed that most of the
time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The
drop_caches trick worked for us too. Our systems guys explained that it was
because of virtual memory space fragmentation, so the OS had to spend a lot
of resources just bookkeeping memory.

Try keeping an eye on the fraction of processing power spent on the kernel
from when you clear the cache until performance gets bad again. If it rises
drastically, you might have the same problem.

- Toke Eskildsen



RE: Solr Memory Usage

2014-10-29 Thread Will Martin
This command only touches OS level caches that hold pages destined for (or
not) the swap cache. Its use means that disk will be hit on future requests,
but in many instances the pages were headed for ejection anyway.

It does not have anything whatsoever to do with Solr caches.  It also is not
fragmentation related; it is a result of the kernel managing virtual pages
in an "as designed manner". The proper command is

#sync; echo 3 >/proc/sys/vm/drop_caches. 

http://linux.die.net/man/5/proc

I have encountered resistance on the use of this on long-running processes
for years ... from people who don't even research the matter.



-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Wednesday, October 29, 2014 3:06 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Memory Usage

Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following 
> command to clear out the inactive memory.  It  is working as expected.  
> Even though the index size of Cloud is 146GB, the used memory is always
below 55GB.
> Our response times are better and no errors/exceptions are thrown. 
> (This command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea,
but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data
through 40 concurrent Tika processes and into Solr. After some days,
performance got really bad. When we did a top, we noticed that most of the
time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The
drop_caches trick worked for us too. Our systems guys explained that it was
because of virtual memory space fragmentation, so the OS had to spend a lot
of resources just bookkeeping memory.

Try keeping an eye on the fraction of processing power spent on the kernel
from when you clear the cache until performance gets bad again. If it rises
drastically, you might have the same problem.

- Toke Eskildsen



RE: Solr Memory Usage

2014-10-29 Thread Toke Eskildsen
Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following command
> to clear out the inactive memory.  It  is working as expected.  Even though
> the index size of Cloud is 146GB, the used memory is always below 55GB.
> Our response times are better and no errors/exceptions are thrown. (This
> command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea, but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data through 
40 concurrent Tika processes and into Solr. After some days, performance got 
really bad. When we did a top, we noticed that most of the time was used in the 
kernel (the 'sy' on the '%Cpu(s):'-line). The drop_caches trick worked for us 
too. Our systems guys explained that it was because of virtual memory space 
fragmentation, so the OS had to spend a lot of resources just bookkeeping 
memory.

Try keeping an eye on the fraction of processing power spent on the kernel from 
when you clear the cache until performance gets bad again. If it rises 
drastically, you might have the same problem.

- Toke Eskildsen


Re: Solr Memory Usage

2014-10-29 Thread Shawn Heisey
On 10/29/2014 11:43 AM, Vijay Kokatnur wrote:
> I am observing some weird behavior with how Solr is using memory.  We are
> running both Solr and zookeeper on the same node.  We tested memory
> settings on Solr Cloud Setup of 1 shard with 146GB index size, and 2 Shard
> Solr setup with 44GB index size.  Both are running on similar beefy
> machines.
>
>  After running the setup for 3-4 days, I see that a lot of memory is
> inactive in all the nodes -
>
>  99052952  total memory
>  98606256  used memory
>  19143796  active memory
>  75063504  inactive memory
>
> And inactive memory is never reclaimed by the OS.  When total memory size
> is reached, latency and disk IO shoots up.  We observed this behavior in
> both Solr Cloud setup with 1 shard and Solr setup with 2 shards.

Where are these numbers coming from?  If they are coming from the
operating system and not Java, then you have nothing to worry about.

> For the Solr Cloud setup, we are running a cron job with following command
> to clear out the inactive memory.  It  is working as expected.  Even though
> the index size of Cloud is 146GB, the used memory is always below 55GB.
> Our response times are better and no errors/exceptions are thrown. (This
> command causes issue in 2 Shard setup)
>
> echo 3 > /proc/sys/vm/drop_caches

Don't do that.  You're throwing away almost every performance advantage
the operating system has to offer.  If this changes the numbers so they
look better to you, then I can almost guarantee you that you are not
having any actual problem, and that dropping the caches like this is
*hurting* performance, not helping it.

It's completely normal for a correctly functioning system to report an
extremely low amount of memory as free.  The operating system is using
the spare memory in your system as a filesystem cache, which makes
everything run a lot faster.  If a program needs more memory, the
operating system will instantly give up some of its disk cache in order
to satisfy the memory allocation.
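
A quick way to see this from the command line (output layout varies between
procps versions, so this is just an illustration):

  free -m
  # the 'free' column looks tiny on a healthy box; the memory counted
  # under buffers/cached is the OS disk cache, and it is handed back
  # instantly whenever a program actually asks for more.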

The "virtual memory" part of this blog post (which has direct relevance
for Solr) hopefully can explain it better than I can.  The entire blog
post is worth reading.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Thanks,
Shawn



Solr Memory Usage

2014-10-29 Thread Vijay Kokatnur
I am observing some weird behavior with how Solr is using memory.  We are
running both Solr and zookeeper on the same node.  We tested memory
settings on Solr Cloud Setup of 1 shard with 146GB index size, and 2 Shard
Solr setup with 44GB index size.  Both are running on similar beefy
machines.

 After running the setup for 3-4 days, I see that a lot of memory is
inactive in all the nodes -

 99052952  total memory
 98606256  used memory
 19143796  active memory
 75063504  inactive memory

And inactive memory is never reclaimed by the OS.  When total memory size
is reached, latency and disk IO shoots up.  We observed this behavior in
both Solr Cloud setup with 1 shard and Solr setup with 2 shards.

For the Solr Cloud setup, we are running a cron job with following command
to clear out the inactive memory.  It  is working as expected.  Even though
the index size of Cloud is 146GB, the used memory is always below 55GB.
Our response times are better and no errors/exceptions are thrown. (This
command causes issue in 2 Shard setup)

echo 3 > /proc/sys/vm/drop_caches

We have disabled the query, doc and solr caches in our setup.  Zookeeper is
using around 10GB of memory and we are not running any other process in
this system.

Has anyone faced this issue before?


RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
Hi

Thanks for this. I will investigate further after reading a number of your 
points in more detail; I do have a feeling they've set up too many entries in 
the filter cache (1000s), so I will revisit that.

Just a note on numbers: those were valid when I made the post, but obviously 
they change as the week progresses before a regular clean-up of content. 
Current numbers, for info (if it's at all relevant), from the index admin view 
on one of the 2 nodes are:

Last Modified:  18 minutes ago
Num Docs:       24590368
Max Doc:        29139255
Deleted Docs:   4548887
Version:        1297982
Segment Count:  28

          Version        Gen     Size
Master:   1412798583558  402364  52.98 GB

Top:
2996 tomcat6   20   0  189g  73g 1.5g S   15 58.7  58034:04 java

And the only GC option I can see that is on is "-XX:+UseConcMarkSweepGC"

Regarding the XY problem, you are very likely correct. Unfortunately I wasn't 
involved in the config, and I very much suspect that when it was done many of 
the defaults were used; then, if something didn't work or there was say an 
out-of-memory error, they just upped the heap to treat the symptom without 
investigating the cause. The luxury of having more than enough RAM, I guess!

I'm going to get some late-night downtime soon, at which point I'm hoping to 
change the heap size and GC settings and add the JMX config; it's not exposed 
to the internet, so the lack of authentication is fine.

Right off to do some reading!

Cheers

Si

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 08 October 2014 21:09
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/8/2014 4:02 AM, Simon Fairey wrote:
> I'm currently setting up jconsole but as I have to remotely monitor (no gui 
> capability on the server) I have to wait before I can restart solr with a JMX 
> port setup. In the meantime I looked at top and given the calculations you 
> said based on your top output and this top of my java process from the node 
> that handles the querying, the indexing node has a similar memory profile:
> 
> https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
> 
> It would seem I need a monstrously large heap in the 60GB region?
> 
> We do use a lot of navigators/filters so I have set the caches to be quite 
> large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably have 
more than 45GB of index data.  This might be a combination of old indexes and 
the active index.  Only the indexes (cores) that are being actively used need 
to be considered when trying to calculate the total RAM needed.  Other indexes 
will not affect performance, even though they increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data that 
has been opened on the disk.  This is not memory that's actually allocated.  
There's a very good reason that mmap has been the default in Lucene and Solr 
for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of index data 
on each node.  With those numbers and a conservative configuration, I would 
expect that you need about 4GB of heap, maybe as much as 8GB.  I cannot think 
of any reason that you would NEED a heap 60GB or larger.

Each field that you sort on, each field that you facet on with the default 
facet.method of fc, and each filter that you cache will use a large block of 
memory.  The size of that block of memory is almost exclusively determined by 
the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately 3MB -- 
one bit for every document.  I do not know how big each FieldCache entry is for 
a sort field and a facet field, but assume that they are probably larger than 
the 3MB entries on the filterCache.

I've got a filterCache sized at 64, with an autowarmCount of 4.  With larger 
autowarmCount values, I was seeing commits take 30 seconds or more, because 
each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a lot of 
memory with no benefit.  Large autowarmCount values are also rarely necessary.  
Every time a new searcher is opened by a commit, add up all your autowarmCount 
values and realize that the searcher likely needs to execute that many queries 
before it is available.

If you need to set up remote JMX so you can remotely connect jconsole, I have 
done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if you use 
that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.

Re: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Shawn Heisey
On 10/8/2014 4:02 AM, Simon Fairey wrote:
> I'm currently setting up jconsole but as I have to remotely monitor (no gui 
> capability on the server) I have to wait before I can restart solr with a JMX 
> port setup. In the meantime I looked at top and given the calculations you 
> said based on your top output and this top of my java process from the node 
> that handles the querying, the indexing node has a similar memory profile:
> 
> https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
> 
> It would seem I need a monstrously large heap in the 60GB region?
> 
> We do use a lot of navigators/filters so I have set the caches to be quite 
> large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably
have more than 45GB of index data.  This might be a combination of old
indexes and the active index.  Only the indexes (cores) that are being
actively used need to be considered when trying to calculate the total
RAM needed.  Other indexes will not affect performance, even though they
increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data
that has been opened on the disk.  This is not memory that's actually
allocated.  There's a very good reason that mmap has been the default in
Lucene and Solr for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of
index data on each node.  With those numbers and a conservative
configuration, I would expect that you need about 4GB of heap, maybe as
much as 8GB.  I cannot think of any reason that you would NEED a heap
60GB or larger.

Each field that you sort on, each field that you facet on with the
default facet.method of fc, and each filter that you cache will use a
large block of memory.  The size of that block of memory is almost
exclusively determined by the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately
3MB -- one bit for every document.  I do not know how big each
FieldCache entry is for a sort field and a facet field, but assume that
they are probably larger than the 3MB entries on the filterCache.
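
As a rough worked example of that "one bit per document" figure: 25,000,000
docs / 8 bits per byte is about 3,125,000 bytes, i.e. roughly 3MB per cached
filter, so a filterCache sized in the thousands could in the worst case pin a
few gigabytes of heap all by itself.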

I've got a filterCache sized at 64, with an autowarmCount of 4.  With
larger autowarmCount values, I was seeing commits take 30 seconds or
more, because each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a
lot of memory with no benefit.  Large autowarmCount values are also
rarely necessary.  Every time a new searcher is opened by a commit, add
up all your autowarmCount values and realize that the searcher likely
needs to execute that many queries before it is available.
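
For reference, those cache settings live in solrconfig.xml; mine look roughly
like this (class and initialSize are just the usual defaults, shown only to
make the snippet complete, not something I'm recommending specifically):

  <!-- size and autowarmCount as described above -->
  <filterCache class="solr.FastLRUCache"
               size="64"
               initialSize="64"
               autowarmCount="4"/>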

If you need to set up remote JMX so you can remotely connect jconsole, I
have done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if
you use that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.
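
The exact variable is in the script at that link, but the properties involved
are the standard JMX ones, along these lines (the port number here is just an
example, not what the script uses):

  # illustrative only -- pick a port and firewall it appropriately
  JMX_OPTS="-Dcom.sun.management.jmxremote \
            -Dcom.sun.management.jmxremote.port=18983 \
            -Dcom.sun.management.jmxremote.ssl=false \
            -Dcom.sun.management.jmxremote.authenticate=false"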

We might need to back up a little bit and start with the problem that
you are trying to figure out, not the numbers that are being reported.

http://people.apache.org/~hossman/#xyproblem

Your original note said that you're sanity checking.  Toward that end,
the only insane thing that jumps out at me is that your max heap is
*VERY* large, and you probably don't have the proper GC tuning.

My recommendations for initial action are to use -Xmx8g on the servlet
container startup and include the GC settings you can find on the wiki
pages I've given you.  It would be a very good idea to set up remote JMX
so you can use jconsole or jvisualvm remotely.
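
To make that concrete, here is a sketch of the kind of startup options
involved (the actual GC flags on the wiki pages may differ; these are just
commonly used CMS settings for Java 7, so check the pages rather than copying
this verbatim):

  -Xms8g -Xmx8g
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled
  -XX:CMSInitiatingOccupancyFraction=75
  -XX:+UseCMSInitiatingOccupancyOnly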

Thanks,
Shawn



RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
Hi

I'm currently setting up jconsole but as I have to remotely monitor (no gui 
capability on the server) I have to wait before I can restart solr with a JMX 
port setup. In the meantime I looked at top and given the calculations you said 
based on your top output and this top of my java process from the node that 
handles the querying, the indexing node has a similar memory profile:

https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0

It would seem I need a monstrously large heap in the 60GB region?

We do use a lot of navigators/filters so I have set the caches to be quite 
large for these, are these what are using up the memory?

Thanks

Si

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 06 October 2014 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/6/2014 9:24 AM, Simon Fairey wrote:
> I've inherited a Solr config and am doing some sanity checks before 
> making some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, 
> each node has 32 CPU cores and 132GB RAM, we index around 500k files a 
> day spread out over the day in batches every 10 minutes, a portion of 
> these are updates to existing content, maybe 5-10%. Currently 
> MergeFactor is set to 2 and commit settings are:
>
> 
>
> 6
>
> false
>
> 
>
> 
>
> 90
>
> 
>
> Currently each node has around 25M docs in with an index size of 45GB, 
> we prune the data every few weeks so it never gets much above 35M docs 
> per node.
>
> On reading I've seen a recommendation that we should be using 
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory.
> However currently the JVM is configured with -Xmx131072m, and for 
> MMapDirectory I've read you should use less memory for the JVM so 
> there is more available for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph from the Solr dashboard]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is 
> in use at the moment and the light grey is allocated as it was used 
> previously but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a 
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case 
> of changing the config to use MMapDirectory or are there things i need 
> to watch out for?
>

NRTCachingDirectoryFactory is a wrapper directory implementation: it puts a 
small caching layer for NRT indexing between the consumer (Solr in this case) 
and the wrapped Directory implementation.  The wrapped implementation is 
MMapDirectory, so you do not need to switch; you ARE already using MMap.
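
In solrconfig.xml that typically looks something like the stock example config
(shown here as a sketch; your file may name things slightly differently):

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>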

Attachments rarely make it to the list, and that has happened in this case, so 
I cannot see any of your pictures.  Instead, look at one of mine, and the 
output of a command from the same machine, running Solr
4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million 
documents.  I've also got 64GB of RAM.  The solr process shows a virtual memory 
size of 54GB, a resident size of 16GB, and a shared size of 11GB.  My max heap 
on this process is 6GB.  If you deduct the shared memory size from the resident 
size, you get about 5GB.  The admin dashboard for this machine says the current 
max heap size is 5.75GB, so that 5GB is pretty close to that, and probably 
matches up really well when you consider that the resident size may be 
considerably more than 16GB and the shared size may be just barely over 11GB.

My system has well over 9GB free memory and 44GB is being used for the OS disk 
cache.  This system is NOT facing memory pressure.  The index is well-cached 
and there is even memory that is not used *at all*.

With an index size of 45GB and 132GB of RAM, you're unlikely to be having 
problems with memory unless your heap size is *ENORMOUS*.  You
*should* have your garbage collection highly tuned, especially if your max 
heap is larger than 2 or 3GB.  I would guess that a 4 to 6GB heap is probably 
enough for your needs, unless you're doing a lot with facets, sorting, or 
Solr's caches, in which case you may need more.  Here's some info about heap 
requirements, 
followed by information about garbage collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Your automatic commit settings do not raise any red flags with me. 
Those are sensible settings.

Thanks,
Shawn



