Re: solr crashes

2018-12-10 Thread Shawn Heisey

On 12/3/2018 2:09 AM, Danilo Tomasoni wrote:
> Unfortunately in this scenario solr often crashes while performing a
> query, even with a single query and no other source of system load.


What do you mean by "crashes"?  Because what I think of as a crash is 
exceedingly rare with Java programs.  I won't say that Solr can't crash 
... but the only situations where I've seen it happen have been hardware 
problems, OS problems, or bugs in Java.


One thing that can happen that *looks* like a crash is what Solr is 
designed to do on the occurrence of an OutOfMemoryError, if it happens 
on an OS that's not Windows.  In that situation, Java has been 
configured to run a script that kills Solr and creates a log entry 
saying that it ran.  This hasn't been set up on Windows yet.
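
For reference, on a Linux install the bin/solr start script wires this up
with a JVM option along these lines (the install path and port below are
illustrative, not necessarily yours):

    -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"

The oom_solr.sh script kills the Solr process and leaves behind a log file
noting that it ran.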


The following URL was mentioned in one of the other thread replies.  
Here I have added an anchor to go to the part about the Java heap.  If 
what you are observing is that Solr dies, then very likely the OOM 
script is killing Solr due to Java running out of a resource.  This URL 
contains a section describing things that can consume a lot of heap 
memory.  Later there is another section about how to reduce heap 
requirements:


https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Note that an OutOfMemoryError can occur because of problems other than 
heap memory.  If Solr is being killed because of the OOME exception, you 
will need to find the reason for the OOME in one of the solr.log files.  
If it got logged, it might be in one of the files named something like 
solr.log.1 through solr.log.9 rather than solr.log itself.
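
A quick way to check all of the rotated logs at once, assuming the default
log directory of a service install:

    grep -l OutOfMemoryError /var/solr/logs/solr.log*

The OOM killer script mentioned above also writes its own small log in the
same directory, named something like solr_oom_killer-<port>-<timestamp>.log.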


Thanks,
Shawn



Re: solr crashes

2018-12-06 Thread Walter Underwood
> On Dec 6, 2018, at 12:59 AM, Bernd Fehling wrote:
> 
> Am 05.12.18 um 17:11 schrieb Walter Underwood:
>> I’ve never heard a recommendation to have three times as much RAM as the 
>> heap. That doesn’t make sense to me.
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM 
> 

Thanks. That does not say to use 3X the RAM. In fact, it recommends the
approach I gave: size the RAM by looking at the needs of Solr and other
programs. It does not mention the OS and file buffers, which is a serious
omission.

"Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java 
heap, and all other running programs require 4GB of memory, then an ideal 
memory size for that server is at least 12GB. […] It's very important to note 
here that there is no quick formula available for deciding the minimum amount 
of memory required for good performance."

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: solr crashes

2018-12-06 Thread Bernd Fehling




Am 05.12.18 um 17:11 schrieb Walter Underwood:

> I’ve never heard a recommendation to have three times as much RAM as the heap.
> That doesn’t make sense to me.


https://wiki.apache.org/solr/SolrPerformanceProblems#RAM



> […]





Re: solr crashes

2018-12-05 Thread Walter Underwood
But it is silly to base non-heap RAM on the size of the heap. Get the
RAM needed for the non-heap usage. That has nothing to do with the
size of the Java heap.

Non-heap RAM is mostly used for two things: other programs and
file buffers for the Solr indexes. Base the RAM needs on those.
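
As a rough illustration using the numbers from this thread (555 GB index,
the 24 GB heap suggested earlier, a couple of GB for the OS, nothing else
running on the box):

    ideal RAM ≈ 24 GB (heap) + 2 GB (OS/daemons) + 555 GB (index in page cache)
              ≈ 581 GB

    on a 30 GB box: 30 - 24 - 2 = 4 GB left to cache a 555 GB index (< 1%)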

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 5, 2018, at 10:01 AM, Gus Heck wrote:
> 
> 3x heap is larger than usual, but significant RAM beyond heap is a good
> idea if you can't fit the whole index in 31 GB of memory, since the OS will
> cache files in ram. […]
> 
> On Wed, Dec 5, 2018, 11:11 AM Walter Underwood wrote:
>> […]

Re: solr crashes

2018-12-05 Thread Gus Heck
3x heap is larger than usual, but significant RAM beyond the heap is a good
idea if you can't fit the whole index in memory, since the OS will cache
index files in RAM. Note also that heap settings from 32 GB up to about 45 GB
give you LESS usable heap than 31 GB, due to the larger pointer sizes needed
to track the bigger memory space. Typically 64 GB of RAM with a 31 GB heap is
a good start for decent-sized indexes; add more machines to get more
ram/heap/cpu relative to your data on disk and query load. Of course, test
and tune from there to find the ideal spec for your installation. Also,
larger heaps mean longer GC pauses.

That said, none of the RAM beyond the heap is likely to have much effect on
crashing once the OS and other processes on the box are happy.
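
The pointer-size effect Gus describes is the JVM's compressed ordinary
object pointers (compressed oops), which only apply to heaps below roughly
32 GB. You can check whether a given heap size still qualifies with
something like:

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops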

On Wed, Dec 5, 2018, 11:11 AM Walter Underwood wrote:
> I’ve never heard a recommendation to have three times as much RAM as the
> heap. That doesn’t make sense to me.
> […]

Re: solr crashes

2018-12-05 Thread Walter Underwood
I’ve never heard a recommendation to have three times as much RAM as the heap. 
That doesn’t make sense to me.

You might need 3X as much disk space as the index size.

For RAM, it is best to have the sum of:

* JVM heap
* A couple of gigabytes for OS and daemons
* RAM for other processes needed on the host (keep to a minimum)
* Enough RAM to hold the entire index

Clearly, you are not going to have enough RAM for a 555 gigabyte index. Well, 
Amazon does have a dozen instance types that can do that, but they are 
expensive.

A 24 GB heap on a 30 GB machine will be pretty tight.

Always set Xms (starting heap) to the same as Xmx (maximum heap). If you set it 
smaller, the JVM will keep increasing the heap until it hits the max before 
doing a full GC. It will always end up with the max setting, but it will have 
to do more work to get there. The setting for initial heap size is about the 
most useless thing in Java.
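
In a standard Solr install both values can be set together in solr.in.sh; a
minimal sketch, using the 16 GB heap mentioned elsewhere in this thread:

    # solr.in.sh -- SOLR_HEAP sets -Xms and -Xmx to the same value
    SOLR_HEAP="16g"

    # or spell the flags out explicitly:
    # SOLR_JAVA_MEM="-Xms16g -Xmx16g"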

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 4, 2018, at 6:06 AM, Bernd Fehling wrote:
> 
> Hi Danilo,
> 
> Full GC points out that you need more heap which also implies that you need 
> more RAM.
> Raise your heap to 24GB and your physical RAM to about 75GB or better 96GB.
> RAM should be about 3 to 4 times heap size.
> 
> Regards, Bernd
> 
> […]



Re: solr crashes

2018-12-04 Thread Bernd Fehling




Am 04.12.18 um 16:47 schrieb Danilo Tomasoni:

> Hello Bernd,
>
> Thanks for the suggestion,
>
> the problem is that we don't have 75 GB of RAM.
>
> Are you aware of any way to reduce solr memory usage?


Yes, remove all faceting, especially on fields with high cardinality.

Don't use huge synonym files, which build a synonyms FST, e.g. for a
SpellCheckComponent used for autocomplete suggestions (a thesaurus).
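
Another common heap consumer is the Solr caches; shrinking them in
solrconfig.xml is one more lever when RAM is tight. A sketch, with sizes
that are purely illustrative:

    <!-- solrconfig.xml: smaller caches trade some query speed for heap -->
    <filterCache class="solr.FastLRUCache" size="128" initialSize="64" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="128" initialSize="64" autowarmCount="0"/>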





> Thanks
>
> Danilo
>
> […]





Re: solr crashes

2018-12-04 Thread David Hastings
You can set the -Xms value lower on startup, but you're still going to run
into this issue. Really you just need to go buy more RAM; hardware is cheap,
so you may as well max out the memory sockets and get a couple of TB-sized
SSDs.

On Tue, Dec 4, 2018 at 10:47 AM Danilo Tomasoni wrote:
> the problem is that we don't have 75 GB of RAM.
> Are you aware of any way to reduce solr memory usage?
> […]


Re: solr crashes

2018-12-04 Thread Danilo Tomasoni

Hello Bernd,

Thanks for the suggestion,

the problem is that we don't have 75 GB of RAM.

Are you aware of any way to reduce solr memory usage?

Thanks

Danilo

On 04/12/18 15:06, Bernd Fehling wrote:

> Hi Danilo,
>
> Full GC points out that you need more heap which also implies that you
> need more RAM.
> Raise your heap to 24GB and your physical RAM to about 75GB or better
> 96GB.
>
> RAM should be about 3 to 4 times heap size.
>
> Regards, Bernd


> Am 04.12.18 um 13:37 schrieb Danilo Tomasoni:
> […]

--
Danilo Tomasoni
COSBI



Re: solr crashes

2018-12-04 Thread Bernd Fehling

Hi Danilo,

Full GCs point out that you need more heap, which also implies that you need
more RAM.
Raise your heap to 24GB and your physical RAM to about 75GB, or better, 96GB.
RAM should be about 3 to 4 times the heap size.

Regards, Bernd


Am 04.12.18 um 13:37 schrieb Danilo Tomasoni:

> […]



--
*
Bernd Fehling                   Bielefeld University Library
Dipl.-Inform. (FH)              LibTec - Library Technology
Universitätsstr. 25             and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060           bernd.fehling(at)uni-bielefeld.de
                                https://www.ub.uni-bielefeld.de/~befehl/

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: solr crashes

2018-12-04 Thread Danilo Tomasoni

Hello Bernd,

Here I list the extra info you requested:

- currently the virtual machine has 22GB of RAM and 16GB of heap

- my 40 million raw documents take about 1364GB on the filesystem (in XML format)

- my index optimized (1 segment, 0 deleted docs) takes about 555GB

- solr 7.3, openjdk 1.8.0_181

- GC logs are like

2018-12-03T07:40:22.302+0100: 28752.505: [Full GC (Allocation Failure) 
2018-12-03T07:40:22.302+0100: 28752.505: [CMS: 
12287999K->12287999K(12288000K), 13.6470083 secs] 
15701375K->15701373K(15701376K), [Metaspace: 37438K->37438K(1083392K)], 
13.6470726 secs] [Times: user=13.66 sys=0.00, real=13.64 secs]

Heap after GC invocations=2108 (full 1501):
 par new generation   total 3413376K, used 3413373K 
[0x0003d800, 0x0004d200, 0x0004d200)
  eden space 2730752K,  99% used [0x0003d800, 
0x00047eabfdc0, 0x00047eac)
  from space 682624K,  99% used [0x00047eac, 
0x0004a855f8a0, 0x0004a856)
  to   space 682624K,   0% used [0x0004a856, 
0x0004a856, 0x0004d200)
 concurrent mark-sweep generation total 12288000K, used 12287999K 
[0x0004d200, 0x0007c000, 0x0007c000)
 Metaspace   used 37438K, capacity 38438K, committed 38676K, 
reserved 1083392K
  class space    used 4257K, capacity 4521K, committed 4628K, reserved 
1048576K

}
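
Note that the log above shows the CMS old generation at 12287999K used out
of a 12288000K capacity even after a Full GC, i.e. the heap is genuinely
exhausted, not just fragmented. One way to watch this live while a query
runs (using Solr's process id):

    # the O column is old-generation occupancy in percent; sample every 5s
    jstat -gcutil <solr-pid> 5s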


Thank you for your help

Danilo


On 03/12/18 10:36, Bernd Fehling wrote:

> […]

--
Danilo Tomasoni
COSBI



Re: solr crashes

2018-12-03 Thread Bernd Fehling

Hi Danilo,

you have to give more info about your system and the config.

- 30gb RAM (physical RAM?), and how much heap do you have for Java?
- how large (in GByte) are your 40 million raw documents being indexed?
- how large is your index (in GByte) with 40 million docs indexed?
- which versions of Solr and Java?
- do you have Java garbage collection logs, and if so, what are they reporting?
- any Full GC in the GC logs?

Regards, Bernd


Am 03.12.18 um 10:09 schrieb Danilo Tomasoni:

> […]



solr crashes

2018-12-03 Thread Danilo Tomasoni

Hello all,

We have a configuration with a single node with 30gb of RAM.

We use it to index ~40 million documents.

We perform queries with the edismax parser that often contain edismax
subqueries with the syntax


'_query_:{!edismax mm=X v=$subqueryN}'

Often X == 1.

This solves the "too many boolean clauses" error we got when expanding the
query terms (often phrase queries) directly in the main query.
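
For readers unfamiliar with the trick: v=$subqueryN is Solr parameter
dereferencing, so each subquery's terms travel in a separate request
parameter instead of being inlined into q. A hypothetical request, with
invented field names and terms:

    q=_query_:{!edismax qf='title abstract' mm=1 v=$subquery1}
    subquery1="myocardial infarction" "heart attack" infarct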


Unfortunately in this scenario solr often crashes while performing a 
query, even with a single query and no other source of system load.



Do you have any idea of what's going on here?

Otherwise,

What kind of solr configuration parameters do you think I need to 
investigate first?


What kind of log lines should I search for to understand what's going on?


Thank you

Danilo

--
Danilo Tomasoni
COSBI

As for the European General Data Protection Regulation 2016/679 on the 
protection of natural persons with regard to the processing of personal data, 
we inform you that all the data we possess are object of treatment in the
respect of the normative provided for by the cited GDPR.

It is your right to be informed on which of your data are used and how; you may 
ask for their correction, cancellation or you may oppose to their use by 
written request sent by recorded delivery to The Microsoft Research – 
University of Trento Centre for Computational and Systems Biology Scarl, Piazza 
Manifattura 1, 38068 Rovereto (TN), Italy.



Re: SOlr crashes

2006-08-14 Thread Yonik Seeley

On 8/14/06, Chris Hostetter [EMAIL PROTECTED] wrote:


> Something else to consider is using the compound file format to reduce the
> number of files for your index.
>
> this is mentioned in the Lucene FAQ...


Yeah, although unless you have a *lot* of fields with norms, I'd
sooner reduce the mergeFactor and increase maxBufferedDocs to keep the
number of files under control.
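
In the solrconfig.xml of that era these knobs lived in the indexDefaults
block; a sketch with illustrative values:

    <indexDefaults>
      <!-- Hoss's suggestion: pack each segment's files into one .cfs file -->
      <useCompoundFile>true</useCompoundFile>
      <!-- lower mergeFactor keeps fewer segments (and files) on disk -->
      <mergeFactor>4</mergeFactor>
      <!-- buffer more docs in RAM before flushing a new segment -->
      <maxBufferedDocs>10000</maxBufferedDocs>
    </indexDefaults>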

-Yonik