Re: Continual garbage collection loop

2017-02-15 Thread Leon STRINGER
Thanks all who replied, lots of information to help us improve our use and
management of Solr!

> 
> On 15 February 2017 at 08:04 Michael Kuhlmann  wrote:
> 
> 
> The number of cores is not *that* important compared to the index
> size, but each core has its own memory overhead. For instance, caches
> are per core, so you have 36 individual caches of each type.
> 
> Best,
> Michael
> 
> Am 14.02.2017 um 16:39 schrieb Leon STRINGER:
> >> On 14 February 2017 at 14:44 Michael Kuhlmann  wrote:
> >>
> >>
> >> Wow, running 36 cores with only half a gigabyte of heap memory is
> >> *really* optimistic!
> >>
> >> I'd raise the heap size to some gigabytes at least and see how it's
> >> working then.
> >>
> > I'll try increasing the heap size and see if I get the problem again.
> >
> > Is the number of cores a big issue, as opposed to their size? Yes,
> > there are 36, but some relate to largely inactive web sites, so the
> > average size (assuming my "Master (Searching)" way of calculating
> > this is correct) is less than 4 MB. I naively assumed a heap
> > size-related issue would result from larger data sets.
> >
> > Thanks for your recommendation,
> >
> > Leon Stringer
> >
> 


Re: Continual garbage collection loop

2017-02-15 Thread Michael Kuhlmann
The number of cores is not *that* important compared to the index
size, but each core has its own memory overhead. For instance, caches
are per core, so you have 36 individual caches of each type.

Best,
Michael

Am 14.02.2017 um 16:39 schrieb Leon STRINGER:
>> On 14 February 2017 at 14:44 Michael Kuhlmann  wrote:
>>
>>
>> Wow, running 36 cores with only half a gigabyte of heap memory is
>> *really* optimistic!
>>
>> I'd raise the heap size to some gigabytes at least and see how it's
>> working then.
>>
> I'll try increasing the heap size and see if I get the problem again.
>
> Is the number of cores a big issue, as opposed to their size? Yes, there
> are 36, but some relate to largely inactive web sites, so the average size
> (assuming my "Master (Searching)" way of calculating this is correct) is
> less than 4 MB. I naively assumed a heap size-related issue would result
> from larger data sets.
>
> Thanks for your recommendation,
>
> Leon Stringer
>



Re: Continual garbage collection loop

2017-02-14 Thread Erick Erickson
Yeah, 512M is the default for Java, but Solr _really_ likes memory.

These two lines are "smoking guns"
Max heap after conc GC: 488.7M (99.6%)
Max heap after full GC: 490M (99.9%)

So what's happening (I think) is that you're spending a lot of cycles
recovering very little memory and then (probably) going straight back
into another GC cycle. Increasing memory will help a lot here.
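
A rough sanity check on those figures (together with the rest of the
GCViewer summary further down the thread) points the same way; the numbers
below just restate what was posted, and awk is only used as a calculator:

  # total run 3h9m26s = 11366 s, accumulated pauses 883.6 s, 476 full GCs
  awk 'BEGIN { printf "time spent paused: %.1f%%\n", 883.6 / 11366 * 100 }'
  # -> ~7.8%, which matches the reported 92.23% throughput
  awk 'BEGIN { printf "average gap between full GCs: %.0f s\n", 11366 / 476 }'
  # -> ~24 s, i.e. a full collection roughly every half minute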

And it doesn't really matter that "some cores are largely inactive". Once
an object is allocated on the heap (say a filterCache entry, underlying
caches etc) it stays there until there are no references to it, i.e. usually
when the core closes.
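
For illustration only: a core's cache objects become collectable once the
core is closed, which happens, for example, if it is unloaded through the
standard CoreAdmin API (host, port and core name below are assumptions):

  curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=some_inactive_core"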

Best,
Erick

On Tue, Feb 14, 2017 at 8:48 AM, Leon STRINGER
 wrote:
>>
>> On 14 February 2017 at 15:49 Walter Underwood 
>> wrote:
>>
>>
>> Yes, 512 MB is far too small. I’m surprised it even starts. We run with
>> 8 GB.
>>
>
> Thanks, in fairness 512 MB was the default and we're new to this. We'll look 
> at
> what we're allocating to Solr to tune this.
>
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>


Re: Continual garbage collection loop

2017-02-14 Thread Leon STRINGER
> 
> On 14 February 2017 at 15:49 Walter Underwood 
> wrote:
> 
> 
> Yes, 512 MB is far too small. I’m surprised it even starts. We run with
> 8 GB.
> 

Thanks, in fairness 512 MB was the default and we're new to this. We'll look at
what we're allocating to Solr to tune this.

> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>

Re: Continual garbage collection loop

2017-02-14 Thread Walter Underwood
Yes, 512 MB is far too small. I’m surprised it even starts. We run with 8 GB.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 14, 2017, at 7:39 AM, Leon STRINGER  wrote:
> 
>> 
>> On 14 February 2017 at 14:44 Michael Kuhlmann  wrote:
>> 
>> 
>> Wow, running 36 cores with only half a gigabyte of heap memory is
>> *really* optimistic!
>> 
>> I'd raise the heap size to some gigabytes at least and see how it's
>> working then.
>> 
> 
> I'll try increasing the heap size and see if I get the problem again.
> 
> Is the number of cores a big issue, as opposed to their size? Yes, there
> are 36, but some relate to largely inactive web sites, so the average size
> (assuming my "Master (Searching)" way of calculating this is correct) is
> less than 4 MB. I naively assumed a heap size-related issue would result
> from larger data sets.
> 
> Thanks for your recommendation,
> 
> Leon Stringer
> 



Re: Continual garbage collection loop

2017-02-14 Thread Leon STRINGER
> 
> On 14 February 2017 at 14:44 Michael Kuhlmann  wrote:
> 
> 
> Wow, running 36 cores with only half a gigabyte of heap memory is
> *really* optimistic!
> 
> I'd raise the heap size to some gigabytes at least and see how it's
> working then.
> 

I'll try increasing the heap size and see if I get the problem again.

Is the number of cores a big issue, as opposed to their size? Yes, there are
36, but some relate to largely inactive web sites, so the average size
(assuming my "Master (Searching)" way of calculating this is correct) is less
than 4 MB. I naively assumed a heap size-related issue would result from
larger data sets.

Thanks for your recommendation,

Leon Stringer

>

Re: Continual garbage collection loop

2017-02-14 Thread Michael Kuhlmann
Wow, running 36 cores with only half a gigabyte of heap memory is
*really* optimistic!

I'd raise the heap size to some gigabytes at least and see how it's
working then.

-Michael

Am 14.02.2017 um 15:23 schrieb Leon STRINGER:
> Further background on the environment:
>
> There are 36 cores, with a total size of 131 MB (based on the size reported by
> "Master (Searching)" in the web console).
>
> The Java memory parameters in use are: -Xms512m -Xmx512m.
>
>> On 14 February 2017 at 05:45 Erick Erickson 
>> wrote:
>>
>> GCViewer is a nifty tool for visualizing the GC activity BTW.
>>
> I don't know what I'm looking for but for a log covering a 3-hour period today
> the "Summary" tab says (typed manually, apologies for any mistakes):
>
> Total heap (usage / alloc. max): 490.7M (100.0%) / 490.7M
>
> Max heap after conc GC: 488.7M (99.6%)
>
> Max tenured after conc GC: 382M (99.5% / 77.9%)
>
> Max heap after full GC: 490M (99.9%)
>
> Freed Memory: 141,811.4M
>
> Freed Mem/Min: 748.554M/min
>
> Total Time: 3h9m26s
>
> Accumulated pauses: 883.6s
>
> Throughput: 92.23%
>
> Number of full gc pauses: 476
>
> Full GC Performance: 101.4M/s
>
> Number of gc pauses: 15153
>
> GC Performance: 245.5M/s
>
>
> "Memory" tab:
>
> Total heap (usage / alloc. max): 490.7M (100.0%) / 490.7M
>
> Tenured heap (usage / alloc. max): 384M (100.0%) / 384M
>
> Young heap (usage / alloc. max): 106.7M (100.0%) / 106.7M
>
> Perm heap (usage / alloc. max): 205.6M (17.0%) / 1,212M
>
> Max tenured after conc GC: 382M (99.5% / 77.9%)
>
> Avg tenured after conc GC: 247.5M (delta=17.612M)
>
> Max heap after conc GC: 488.7M (99.6%)
>
> Avg heap after conc GC: 252.6M (delta=35.751M)
>
> Max heap after full GC: 490M (99.9%)
>
> Avg heap after full GC: 379M (delta=72.917M)
>
> Avg after GC: 359.9M  (delta=40.965M)
>
> Freed by full GC: 47,692.8M (33.6%)
>
> Freed by GC: 94,118.7M (66.4%)
>
> Avg freed full GC: 100.2M/coll (delta=68.015M) [greyed]
>
> Avg freed GC: 6,360.3K/coll (delta=19.963M) [greyed]
>
> Avg rel inc after FGC: -199,298B/coll
>
> Avg rel inc after GC: 6,360.3K/coll (delta=19.963M)
>
> Slope full GC: -126,380B/s
>
> Slope GC: 14.317M/s
>
> InitiatingOccFraction (avg / max): 65.9% / 100.0%
>
> Avg promotion: 2,215.324K/coll (delta=6,904.174K) [greyed]
>
> Total promotion: 12,504.467M
>
>
> Can anyone shed any light on this? Is it a problem, or is this all normal?
>
> Thanks,
>
> Leon Stringer




Re: Continual garbage collection loop

2017-02-14 Thread Leon STRINGER
Further background on the environment:

There are 36 cores, with a total size of 131 MB (based on the size reported by
"Master (Searching)" in the web console).

The Java memory parameters in use are: -Xms512m -Xmx512m.
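
For our own notes (and in case it helps anyone else): it looks like in Solr
6.x the heap is normally set in bin/solr.in.sh or on the command line rather
than by editing -Xms/-Xmx directly. A minimal sketch, with an illustrative
2 GB value and the stock install layout assumed:

  SOLR_HEAP="2g"        # in solr.in.sh; replaces the -Xms512m -Xmx512m defaults
  # or as a one-off when starting:
  bin/solr start -m 2g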

> 
> On 14 February 2017 at 05:45 Erick Erickson 
> wrote:
> 
> GCViewer is a nifty tool for visualizing the GC activity BTW.
> 

I don't know what I'm looking for but for a log covering a 3-hour period today
the "Summary" tab says (typed manually, apologies for any mistakes):

Total heap (usage / alloc. max): 490.7M (100.0%) / 490.7M

Max heap after conc GC: 488.7M (99.6%)

Max tenured after conc GC: 382M (99.5% / 77.9%)

Max heap after full GC: 490M (99.9%)

Freed Memory: 141,811.4M

Freed Mem/Min: 748.554M/min

Total Time: 3h9m26s

Accumulated pauses: 883.6s

Throughput: 92.23%

Number of full gc pauses: 476

Full GC Performance: 101.4M/s

Number of gc pauses: 15153

GC Performance: 245.5M/s


"Memory" tab:

Total heap (usage / alloc. max): 490.7M (100.0%) / 490.7M

Tenured heap (usage / alloc. max): 384M (100.0%) / 384M

Young heap (usage / alloc. max): 106.7M (100.0%) / 106.7M

Perm heap (usage / alloc. max): 205.6M (17.0%) / 1,212M

Max tenured after conc GC: 382M (99.5% / 77.9%)

Avg tenured after conc GC: 247.5M (delta=17.612M)

Max heap after conc GC: 488.7M (99.6%)

Avg heap after conc GC: 252.6M (delta=35.751M)

Max heap after full GC: 490M (99.9%)

Avg heap after full GC: 379M (delta=72.917M)

Avg after GC: 359.9M  (delta=40.965M)

Freed by full GC: 47,692.8M (33.6%)

Freed by GC: 94,118.7M (66.4%)

Avg freed full GC: 100.2M/coll (delta=68.015M) [greyed]

Avg freed GC: 6,360.3K/coll (delta=19.963M) [greyed]

Avg rel inc after FGC: -199,298B/coll

Avg rel inc after GC: 6,360.3K/coll (delta=19.963M)

Slope full GC: -126,380B/s

Slope GC: 14.317M/s

InitiatingOccFraction (avg / max): 65.9% / 100.0%

Avg promotion: 2,215.324K/coll (delta=6,904.174K) [greyed]

Total promotion: 12,504.467M


Can anyone shed any light on this? Is it a problem, or is this all normal?

Thanks,

Leon Stringer

Re: Continual garbage collection loop

2017-02-14 Thread Leon STRINGER
Why is this a problem? CPU load and log file size.

The CPU load reported by top increases from its normal 0-1% resting state to
25-50% continually, and higher when searching, indexing, etc.

With 6.1.0 we just got a huge GC log file. With 6.4.1 the GC log is now cycled
before it grows too large, but it does mean the logs are cycled quickly (every
3 hours).
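
For anyone comparing setups, the rotation appears to come from the standard
HotSpot flags that Solr's start script passes (GC_LOG_OPTS in solr.in.sh); a
sketch with illustrative path and sizes, not our exact values:

  -Xloggc:/var/solr/logs/solr_gc.log
  -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=9
  -XX:GCLogFileSize=20M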

Operations seem much slower in this state but I haven't collected figures for
this.

When Solr gets into this state it stays that way until it's restarted.

Thanks,

Leon Stringer

> 
> On 14 February 2017 at 05:45 Erick Erickson 
> wrote:
> 
> 
> Why is this a problem? Are you seeing unacceptable slowness?
> It's fairly common for Java to do GC frequently; the problem happens
> when it uses stop-the-world GC. So unless you're seeing visibly
> slow performance, I'd say ignore it.
> 
> Curiously, increasing the Java heap a little bit sometimes helps
> as I've seen situations where the GC recovers so little memory that
> another GC cycle immediately occurs. That said I don't see evidence
> of this in what you showed.
> 
> GCViewer is a nifty tool for visualizing the GC activity BTW.
> 
> Best,
> Erick
> 
> On Mon, Feb 13, 2017 at 8:36 AM, Leon STRINGER
>  wrote:
> > Hi,
> >
> > I get an issue where, when I'm deleting and adding Solr cores, it
> > appears to go
> > into a loop increasing CPU load and continually (every 2 seconds)
> > logging to the
> > garbage collection log.
> >
> > I had this problem with 6.1.0 so we've just upgraded to 6.4.1 and the
> > issue
> > still occurs. The entries being logged every 2 seconds are below (hope
> > it's not
> > too verbose). Obviously this means the log gets big quickly.
> >
> > We can work around the issue by restarting Solr, but presumably something
> > has gone wrong. Can anyone suggest whether we're doing something
> > incorrectly to cause this, or whether it's an issue we can troubleshoot?
> >
> > Any advice gratefully received.
> >
> > On CentOS 7 with OpenJDK 1.8.0_91-b14.
> >
> > solr_gc.log.0.current logs the following every 2 seconds:
> >
> > 2017-02-13T16:19:11.230+: 5092.640: [GC (CMS Initial Mark) [1
> > CMS-initial-mark: 225270K(393216K)] 225280K(502464K), 0.0030517 secs]
> > [Times:
> > user=0.01 sys=0.00, real=0.01 secs]
> > 2017-02-13T16:19:11.234+: 5092.643: Total time for which application
> > threads
> > were stopped: 0.0033800 seconds, Stopping threads took: 0.473
> > seconds
> > 2017-02-13T16:19:11.234+: 5092.643: [CMS-concurrent-mark-start]
> > 2017-02-13T16:19:11.359+: 5092.769: [CMS-concurrent-mark:
> > 0.125/0.125 secs]
> > [Times: user=0.50 sys=0.00, real=0.12 secs]
> > 2017-02-13T16:19:11.359+: 5092.769: [CMS-concurrent-preclean-start]
> > 2017-02-13T16:19:11.361+: 5092.771: [CMS-concurrent-preclean:
> > 0.002/0.002
> > secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> > 2017-02-13T16:19:11.362+: 5092.771: [GC (CMS Final Remark) [YG
> > occupancy: 10
> > K (109248 K)]{Heap before GC invocations=3236 (full 1150):
> > par new generation total 109248K, used 10K [0xe000,
> > 0xe800, 0xe800)
> > eden space 87424K, 0% used [0xe000, 0xe0001020,
> > 0xe556)
> > from space 21824K, 0% used [0xe6ab, 0xe6ab1830,
> > 0xe800)
> > to space 21824K, 0% used [0xe556, 0xe556,
> > 0xe6ab)
> > concurrent mark-sweep generation total 393216K, used 225270K
> > [0xe800, 0x0001, 0x0001)
> > Metaspace used 176850K, capacity 179580K, committed 181092K, reserved
> > 1210368K
> > class space used 18794K, capacity 19506K, committed 19836K, reserved
> > 1048576K
> > 2017-02-13T16:19:11.362+: 5092.771: [GC (CMS Final Remark)
> > 2017-02-13T16:19:11.362+: 5092.771: [ParNew
> > Desired survivor size 20112992 bytes, new threshold 8 (max 8)
> > - age 2: 160 bytes, 160 total
> > - age 4: 32 bytes, 192 total
> > : 10K->6K(109248K), 0.0041872 secs] 225280K->225276K(502464K), 0.0042455
> > secs]
> > [Times: user=0.01 sys=0.00, real=0.01 secs]
> > Heap after GC invocations=3237 (full 1150):
> > par new generation total 109248K, used 6K [0xe000,
> > 0xe800, 0xe800)
> > eden space 87424K, 0% used [0xe000, 0xe000,
> > 0xe556)
> > from space 21824K, 0% used [0xe556, 0xe5561830,
> > 0xe6ab)
> > to space 21824K, 0% used [0xe6ab, 0xe6ab,
> > 0xe800)
> > concurrent mark-sweep 

Re: Continual garbage collection loop

2017-02-13 Thread Erick Erickson
Why is this a problem? Are you seeing unacceptable slowness?
It's fairly common for Java to do GC frequently; the problem happens
when it uses stop-the-world GC. So unless you're seeing visibly
slow performance, I'd say ignore it.

Curiously, increasing the Java heap a little bit sometimes helps
as I've seen situations where the GC recovers so little memory that
another GC cycle immediately occurs. That said I don't see evidence
of this in what you showed.

GCViewer is a nifty tool for visualizing the GC activity BTW.
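
Roughly, it's a standalone jar you point at the GC log; the jar name and
version below are assumptions, so use whatever the current release is:

  # opens the log in the GCViewer GUI (log path assumed; use your own)
  java -jar gcviewer-1.36.jar /var/solr/logs/solr_gc.log.0.current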

Best,
Erick

On Mon, Feb 13, 2017 at 8:36 AM, Leon STRINGER
 wrote:
> Hi,
>
> I get an issue where, when I'm deleting and adding Solr cores, it appears to 
> go
> into a loop increasing CPU load and continually (every 2 seconds) logging to 
> the
> garbage collection log.
>
> I had this problem with 6.1.0 so we've just upgraded to 6.4.1 and the issue
> still occurs. The entries being logged every 2 seconds are below (hope it's 
> not
> too verbose). Obviously this means the log gets big quickly.
>
> We can work around the issue by restarting Solr, but presumably something has
> gone wrong. Can anyone suggest whether we're doing something incorrectly to
> cause this, or whether it's an issue we can troubleshoot?
>
> Any advice gratefully received.
>
> On CentOS 7 with OpenJDK 1.8.0_91-b14.
>
> solr_gc.log.0.current logs the following every 2 seconds:
>
> 2017-02-13T16:19:11.230+: 5092.640: [GC (CMS Initial Mark) [1
> CMS-initial-mark: 225270K(393216K)] 225280K(502464K), 0.0030517 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2017-02-13T16:19:11.234+: 5092.643: Total time for which application 
> threads
> were stopped: 0.0033800 seconds, Stopping threads took: 0.473 seconds
> 2017-02-13T16:19:11.234+: 5092.643: [CMS-concurrent-mark-start]
> 2017-02-13T16:19:11.359+: 5092.769: [CMS-concurrent-mark: 0.125/0.125 
> secs]
> [Times: user=0.50 sys=0.00, real=0.12 secs]
> 2017-02-13T16:19:11.359+: 5092.769: [CMS-concurrent-preclean-start]
> 2017-02-13T16:19:11.361+: 5092.771: [CMS-concurrent-preclean: 0.002/0.002
> secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> 2017-02-13T16:19:11.362+: 5092.771: [GC (CMS Final Remark) [YG occupancy: 
> 10
> K (109248 K)]{Heap before GC invocations=3236 (full 1150):
> par new generation total 109248K, used 10K [0xe000,
> 0xe800, 0xe800)
> eden space 87424K, 0% used [0xe000, 0xe0001020,
> 0xe556)
> from space 21824K, 0% used [0xe6ab, 0xe6ab1830,
> 0xe800)
> to space 21824K, 0% used [0xe556, 0xe556,
> 0xe6ab)
> concurrent mark-sweep generation total 393216K, used 225270K
> [0xe800, 0x0001, 0x0001)
> Metaspace used 176850K, capacity 179580K, committed 181092K, reserved 1210368K
> class space used 18794K, capacity 19506K, committed 19836K, reserved 1048576K
> 2017-02-13T16:19:11.362+: 5092.771: [GC (CMS Final Remark)
> 2017-02-13T16:19:11.362+: 5092.771: [ParNew
> Desired survivor size 20112992 bytes, new threshold 8 (max 8)
> - age 2: 160 bytes, 160 total
> - age 4: 32 bytes, 192 total
> : 10K->6K(109248K), 0.0041872 secs] 225280K->225276K(502464K), 0.0042455 secs]
> [Times: user=0.01 sys=0.00, real=0.01 secs]
> Heap after GC invocations=3237 (full 1150):
> par new generation total 109248K, used 6K [0xe000,
> 0xe800, 0xe800)
> eden space 87424K, 0% used [0xe000, 0xe000,
> 0xe556)
> from space 21824K, 0% used [0xe556, 0xe5561830,
> 0xe6ab)
> to space 21824K, 0% used [0xe6ab, 0xe6ab,
> 0xe800)
> concurrent mark-sweep generation total 393216K, used 225270K
> [0xe800, 0x0001, 0x0001)
> Metaspace used 176850K, capacity 179580K, committed 181092K, reserved 1210368K
> class space used 18794K, capacity 19506K, committed 19836K, reserved 1048576K
> }
> 2017-02-13T16:19:11.366+: 5092.775: [Rescan (parallel) , 0.0018980
> secs]2017-02-13T16:19:11.368+: 5092.777: [weak refs processing, 0.0004940
> secs]2017-02-13T16:19:11.368+: 5092.778: [class unloading, 0.0580950
> secs]2017-02-13T16:19:11.426+: 5092.836: [scrub symbol table, 0.0110875
> secs]2017-02-13T16:19:11.438+: 5092.847: [scrub string table, 0.0019072
> secs][1 CMS-remark: 225270K(393216K)] 225276K(502464K), 0.0780250 secs] 
> [Times:
> user=0.09 sys=0.00, real=0.08 secs]
> 2017-02-13T16:19:11.440+: 5092.849: Total time for which application 
> threads
> were stopped: 0.0782677 seconds, Stopping threads took: 0.411 seconds
> 2017-02-13T16:19:11.440+: 5092.849: [CMS-concurrent-sweep-start]
> 2017-02-13T16:19:11.546+: 5092.955: [CMS-concurrent-sweep: 0.106/0.106 
> secs]
> [Times: user=0.11 sys=0.00, real=0.11 secs]
> 2017-02-13T16:19:11.546+: 5092.955: