Re: Query in quotes cannot find results

2020-06-29 Thread Erick Erickson
Looks like you’re removing stopwords. Stopwords cause issues like this with the 
positions being off.

It’s becoming more and more common to _NOT_ remove stopwords. Is that an option?
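
A minimal sketch of what that change looks like in the analysis chain (the field type, filter names and order here are illustrative, not the actual schema from the original post):

    <fieldType name="text_no_stop" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- no solr.StopFilterFactory: "the" keeps its position in the index -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- no solr.StopFilterFactory here either, so the quoted phrase lines up with the index -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>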



Best,
Erick

> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim  
> wrote:
> 
> Hi Shawn,
> Many thanks for the response, I checked the field and it is correct. Let's 
> call it _text_ to make it easier.
> I believe the parsing is also correct, please see below:
> - Query without quotes (works):
>"querystring":"expand the methods",
>"parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) 
> _text_:methods",
> 
> - Query with quotes (does not work):
>"querystring":"\"expand the methods\"",
>"parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, 
> _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
> 
> The document has text:
> "to expand the methods for mailing cancellation"
> 
> The analysis on this field shows that all words are present in the index and 
> the query, and the order is also correct, but the word "methods" is moved one 
> position; I guess that's why the result is not found.
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> 
> 
> -Original Message-
> From: Shawn Heisey 
> Sent: Monday, June 29, 2020 6:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>> The basic query q=expand the methods   <<< finds the document,
>> the query (in quotes) q="expand the methods"   <<< cannot find the document
>> 
> Am I doing something wrong, or is it a known bug (I saw similar issues 
> discussed in the past, but not for an exact-match query), and if so, what is 
> the Jira for it?
> 
> The most helpful information will come from running both queries with debug 
> enabled, so you can see how the query is parsed.  If you add a parameter 
> "debugQuery=true" to the URL, then the response should include the parsed 
> query.  Compare those, and see if you can tell what the differences are.
> 
> One of the most common problems for queries like this is that you're not 
> searching the field that you THINK you're searching.  I don't know whether 
> this is the problem, I just mention it because it is a common error.
> 
> Thanks,
> Shawn
> 
> 
> 



RE: Query in quotes cannot find results

2020-06-29 Thread Permakoff, Vadim
Hi Shawn,
Many thanks for the response, I checked the field and it is correct. Let's call 
it _text_ to make it easier.
I believe the parsing is also correct, please see below:
 - Query without quotes (works):
"querystring":"expand the methods",
"parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) 
_text_:methods",

 - Query with quotes (does not work):
"querystring":"\"expand the methods\"",
"parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, 
_text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",

The document has text:
"to expand the methods for mailing cancellation"

The analysis on this field shows that all words are present in the index and 
the query, and the order is also correct, but the word "methods" is moved one 
position; I guess that's why the result is not found.

Best Regards,
Vadim Permakoff




-Original Message-
From: Shawn Heisey 
Sent: Monday, June 29, 2020 6:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods   <<< finds the document,
> the query (in quotes) q="expand the methods"   <<< cannot find the document
>
> Am I doing something wrong, or is it a known bug (I saw similar issues 
> discussed in the past, but not for an exact-match query), and if so, what is 
> the Jira for it?

The most helpful information will come from running both queries with debug 
enabled, so you can see how the query is parsed.  If you add a parameter 
"debugQuery=true" to the URL, then the response should include the parsed 
query.  Compare those, and see if you can tell what the differences are.

One of the most common problems for queries like this is that you're not 
searching the field that you THINK you're searching.  I don't know whether this 
is the problem, I just mention it because it is a common error.

Thanks,
Shawn





Re: Query in quotes cannot find results

2020-06-29 Thread Shawn Heisey

On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:

The basic query q=expand the methods   <<< finds the document,
the query (in quotes) q="expand the methods"   <<< cannot find the document

Am I doing something wrong, or is it a known bug (I saw similar issues discussed 
in the past, but not for an exact-match query), and if so, what is the Jira for 
it?


The most helpful information will come from running both queries with 
debug enabled, so you can see how the query is parsed.  If you add a 
parameter "debugQuery=true" to the URL, then the response should include 
the parsed query.  Compare those, and see if you can tell what the 
differences are.
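
For example, with debug enabled the parsed query shows up under "debug" / 
"parsedquery" in the response (the core name and URL escaping here are 
illustrative):

    http://localhost:8983/solr/yourcore/select?q=%22expand+the+methods%22&debugQuery=true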


One of the most common problems for queries like this is that you're not 
searching the field that you THINK you're searching.  I don't know 
whether this is the problem, I just mention it because it is a common error.


Thanks,
Shawn


Query in quotes cannot find results

2020-06-29 Thread Permakoff, Vadim
Hi,
This might be a known issue, but I cannot find a reference for this specific case: 
searching for an exact (quoted) query with synonyms and stopwords.

I have a simple configuration for catch-all field:

[schema field and fieldType definitions stripped by the mailing-list archive]

The synonyms.txt file has only one line:
expand,blow up

The stopwords.txt file has only one line:
the

There is only one document:
{
   "id":"1",
"title":"to expand the methods for mailing cancellation"
}

Everything else is the default basic configuration. Tested with Solr 6.5.1 and 
Solr 8.5.2.
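
Since the schema snippet above did not survive the archive, here is a rough sketch 
of the kind of catch-all setup being described, modeled on the stock text_general 
type; the class names, filter order, and field names are assumptions, not the 
actual configuration:

    <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="*" dest="_text_"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>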

The basic query q=expand the methods   <<< finds the document,
the query (in quotes) q="expand the methods"   <<< cannot find the document

Am I doing something wrong, or is it a known bug (I saw similar issues discussed 
in the past, but not for an exact-match query), and if so, what is the Jira for 
it?

Best Regards,
Vadim Permakoff






Suggestion or recommendation for NRT

2020-06-29 Thread ramyogi
Hi,

We are using Solr 7.5.0, and we are testing a single collection for both
search and indexing.
Our collection was created with the indexConfig below. For indexing we use the
Kafka Connect Solr plugin with a commit every 5 minutes (Cloud SolrJ):
https://github.com/jcustenborder/kafka-connect-solr

Our collection has 30 shards and 3 replicas on EC2 nodes with plenty of RAM (90
nodes), and it is almost 2.5 TB in size. We can see a performance impact on search
requests while indexing is in progress. Are there any recommendations or
fine-tuning steps we should consider? Please provide any references that might help.



[indexConfig snippet stripped by the mailing-list archive; surviving values:
150, 8000, 100, 10, 10, ${solr.lock.type:native}, true]
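
For orientation, a generic skeleton of that section of solrconfig.xml looks roughly
like the following; the element names are standard, but the values are illustrative
and not the poster's actual settings:

    <indexConfig>
      <ramBufferSizeMB>100</ramBufferSizeMB>
      <maxBufferedDocs>-1</maxBufferedDocs>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicyFactory>
      <lockType>${solr.lock.type:native}</lockType>
      <infoStream>true</infoStream>
    </indexConfig>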







--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Mikhail Khludnev
Hello, Chris.
I suppose index-time analysis can yield these terms:
"paid","ms-reply-unpaid","ms-reply-paid", and thus let you avoid these
expensive wildcard queries. Here's why it's worth avoiding them:
https://www.slideshare.net/lucidworks/search-like-sql-mikhail-khludnev-epam

On Mon, Jun 29, 2020 at 6:17 PM Chris Dempsey  wrote:

> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
> I'm looking into options for optimizing something like this:
>
> > fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
> tag:*ms-reply-paid*
>
> It's probably not a surprise that we're seeing performance issues with
> something like this. My understanding is that using the wildcard on both
> ends forces a full-text index search. Something like the above can't take
> advantage of something like the ReverseWordFilter either. I believe
> constructing `n-grams` is an option (*at the expense of index size*) but is
> there anything I'm overlooking as a possible avenue to look into?
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
Really look at your cache size settings.

This is to eliminate this scenario:
- your cache sizes are very large
- when you looked and the memory was 9G, you also had a lot of cache entries
- there was a commit, which threw out the old cache and reduced your cache size

This is frankly kind of unlikely, but worth checking.

The other option is that you haven’t been hitting OOMs at all and that’s a complete
red herring. Let’s say that in actuality you only need an 8G heap or even smaller. By
over-allocating memory, garbage will simply accumulate for a long time, and when it
is eventually collected, _lots_ of memory will be collected at once.

Another rather unlikely scenario, but again worth checking.

Best,
Erick

> On Jun 29, 2020, at 3:27 PM, Ryan W  wrote:
> 
> On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson 
> wrote:
> 
>> ps aux | grep solr
>> 
> 
> [solr@faspbsy0002 database-backups]$ ps aux | grep solr
> solr  72072  1.6 33.4 22847816 10966476 ?   Sl   13:35   1:36 java
> -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
> -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
> -Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
> -Dsolr.install.dir=/opt/solr
> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> -jar start.jar --module=http
> 
> 
> 
>> should show you all the parameters Solr is running with, as would the
>> admin screen. You should see something like:
>> 
>> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>> 
>> And there should be some logs laying around if that was the case
>> similar to:
>> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
>> 
> 
> This log is not being written. From oom_solr.sh it does appear that a
> solr_oom_killer-$SOLR_PORT-$NOW.log should be written to the logs
> directory, but it isn't. There are some log files in /opt/solr/server/logs,
> and they are indeed being written to.  There are fresh entries in the logs,
> but no sign of any problem.  If I grep for oom in the logs directory, the
> only references I see are benign... just a few entries that list all the
> flags, and oom_solr.sh is among the settings visible in the entry.  And
> someone did a search for "Mushroom," so there's another instance of oom
> from that search.
> 
> 
> As for memory, It Depends (tm). There are configurations
>> you can make choices about that will affect the heap requirements.
>> You can’t really draw comparisons between different projects. Your
>> Drupal + Solr app has how many documents? Indexed how? Searched
>> how? .vs. this one.
>> 
>> The usual suspect for configuration settings that are responsible
>> include:
>> 
>> - filterCache size too large. Each filterCache entry is bounded by
>> maxDoc/8 bytes. I’ve seen people set this to over 1M…
>> 
>> - using non-docValues for fields used for sorting, grouping, function
>> queries
>> or faceting. Solr will uninvert the field on the heap, whereas if you have
>> specified docValues=true, the memory is out in OS memory space rather than
>> heap.
>> 
>> - People just putting too many docs in a collection in a single JVM in
>> aggregate.
>> All replicas in the same instance are using part of the heap.
>> 
>> - Having unnecessary options on your fields, although that’s more MMap
>> space than
>> heap.
>> 
>> The problem basically is that all of Solr’s access is essentially random,
>> so for
>> performance reasons lots of stuff has to be in memory.
>> 
>> That said, Solr hasn’t been as careful as it should be about using up
>> memory,
>> that’s ongoing.
>> 
>> If you really want to know what’s using up memory, throw a heap analysis
>> tool
>> at it. That’ll give you a clue what’s hogging memory and you can go from
>> there.
>> 
>>> On Jun 29, 2020, at 1:48 PM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>>> 
>>> little nit picky note here, use 31gb, never 32.
>>> 
>>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
>>> 
 It figures it would happen again a couple hours after I suggested the
>> issue
 might be resolved.  Just now, Solr stopped running.  I cleared the
>> cache in
 my app a couple times around the time that it happened, so perhaps that
>> was
 somehow too taxing for the server.  However, I've never allocated so
>> much
 RAM to a website before, so it's odd that I'm getting these failures.
>> My
 colleagues were astonished when I said p

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson 
wrote:

> ps aux | grep solr
>

[solr@faspbsy0002 database-backups]$ ps aux | grep solr
solr  72072  1.6 33.4 22847816 10966476 ?   Sl   13:35   1:36 java
-server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
-XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
-DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
-Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
-Dsolr.install.dir=/opt/solr
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
-Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-jar start.jar --module=http



> should show you all the parameters Solr is running with, as would the
> admin screen. You should see something like:
>
> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>
> And there should be some logs laying around if that was the case
> similar to:
> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
>

This log is not being written. From oom_solr.sh it does appear that a
solr_oom_killer-$SOLR_PORT-$NOW.log should be written to the logs
directory, but it isn't. There are some log files in /opt/solr/server/logs,
and they are indeed being written to.  There are fresh entries in the logs,
but no sign of any problem.  If I grep for oom in the logs directory, the
only references I see are benign... just a few entries that list all the
flags, and oom_solr.sh is among the settings visible in the entry.  And
someone did a search for "Mushroom," so there's another instance of oom
from that search.


As for memory, It Depends (tm). There are configurations
> you can make choices about that will affect the heap requirements.
> You can’t really draw comparisons between different projects. Your
> Drupal + Solr app has how many documents? Indexed how? Searched
> how? .vs. this one.
>
> The usual suspect for configuration settings that are responsible
> include:
>
> - filterCache size too large. Each filterCache entry is bounded by
> maxDoc/8 bytes. I’ve seen people set this to over 1M…
>
> - using non-docValues for fields used for sorting, grouping, function
> queries
> or faceting. Solr will uninvert the field on the heap, whereas if you have
> specified docValues=true, the memory is out in OS memory space rather than
> heap.
>
> - People just putting too many docs in a collection in a single JVM in
> aggregate.
> All replicas in the same instance are using part of the heap.
>
> - Having unnecessary options on your fields, although that’s more MMap
> space than
> heap.
>
> The problem basically is that all of Solr’s access is essentially random,
> so for
> performance reasons lots of stuff has to be in memory.
>
> That said, Solr hasn’t been as careful as it should be about using up
> memory,
> that’s ongoing.
>
> If you really want to know what’s using up memory, throw a heap analysis
> tool
> at it. That’ll give you a clue what’s hogging memory and you can go from
> there.
>
> > On Jun 29, 2020, at 1:48 PM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
> >
> > little nit picky note here, use 31gb, never 32.
> >
> > On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
> >
> >> It figures it would happen again a couple hours after I suggested the
> issue
> >> might be resolved.  Just now, Solr stopped running.  I cleared the
> cache in
> >> my app a couple times around the time that it happened, so perhaps that
> was
> >> somehow too taxing for the server.  However, I've never allocated so
> much
> >> RAM to a website before, so it's odd that I'm getting these failures.
> My
> >> colleagues were astonished when I said people on the solr-user list were
> >> telling me I might need 32GB just for solr.
> >>
> >> I manage another project that uses Drupal + Solr, and we have a total of
> >> 8GB of RAM on that server and Solr never, ever stops.  I've been
> managing
> >> that site for years and never seen a Solr outage.  On that project,
> >> Drupal + Solr is OK with 8GB, but somehow this other project needs 64
> GB or
> >> more?
> >>
> >> "The thing that’s unsettling about this is that assuming you were
> hitting
> >> OOMs, and were running the OOM-killer script, you _should_ have had very
> >> clear evidence that that was the cause."
> >>
> >> How do I know if I'm running the OOM-killer script?
> >>
> >> Thank you.
> >>
> >> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >>> The thing that’s unsettling about this is that assuming you were
> hitting
> >>> OOMs,
>

Re: How to determine why solr stops running?

2020-06-29 Thread Jörn Franke
Maybe you can identify some critical queries in the logfiles?

What is the total size of the index?

What client are you using on the web app side? Are you reusing clients or 
creating a new one for every query?

> Am 29.06.2020 um 21:14 schrieb Ryan W :
> 
> On Mon, Jun 29, 2020 at 1:49 PM David Hastings 
> wrote:
> 
>> little nit picky note here, use 31gb, never 32.
> 
> 
> Good to know.
> 
> Just now I got this output from bin/solr status:
> 
>  "solr_home":"/opt/solr/server/solr",
>  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy -
> 2019-05-28 23:37:48",
>  "startTime":"2020-06-29T17:35:13.966Z",
>  "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
>  "memory":"9.3 GB (%57.9) of 16 GB"}
> 
> That's the highest memory use I've seen.  Not sure if this indicates 16GB
> isn't enough.  Then I ran it again a couple minutes later and it was down
> to 598.3 MB.  I wonder what accounts for these wide swings.  I can't
> imagine if a few users are doing searches, suddenly it uses 9 GB of RAM.
> 
> 
>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
>> 
>>> It figures it would happen again a couple hours after I suggested the
>> issue
>>> might be resolved.  Just now, Solr stopped running.  I cleared the cache
>> in
>>> my app a couple times around the time that it happened, so perhaps that
>> was
>>> somehow too taxing for the server.  However, I've never allocated so much
>>> RAM to a website before, so it's odd that I'm getting these failures.  My
>>> colleagues were astonished when I said people on the solr-user list were
>>> telling me I might need 32GB just for solr.
>>> 
>>> I manage another project that uses Drupal + Solr, and we have a total of
>>> 8GB of RAM on that server and Solr never, ever stops.  I've been managing
>>> that site for years and never seen a Solr outage.  On that project,
>>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB
>> or
>>> more?
>>> 
>>> "The thing that’s unsettling about this is that assuming you were hitting
>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>> clear evidence that that was the cause."
>>> 
>>> How do I know if I'm running the OOM-killer script?
>>> 
>>> Thank you.
>>> 
>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson >> 
>>> wrote:
>>> 
 The thing that’s unsettling about this is that assuming you were
>> hitting
 OOMs,
 and were running the OOM-killer script, you _should_ have had very
>> clear
 evidence that that was the cause.
 
 If you were not running the killer script, the apologies for not asking
 about that
 in the first place. Java’s performance is unpredictable when OOMs
>> happen,
 which is the point of the killer script: at least Solr stops rather
>> than
>>> do
 something inexplicable.
 
 Best,
 Erick
 
> On Jun 29, 2020, at 11:52 AM, David Hastings <
 hastings.recurs...@gmail.com> wrote:
> 
> sometimes just throwing money/ram/ssd at the problem is just the best
> answer.
> 
> On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> 
>> Thanks everyone. Just to give an update on this issue, I bumped the
>>> RAM
>> available to Solr up to 16GB a couple weeks ago, and haven’t had any
>> problem since.
>> 
>> 
>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
>> hastings.recurs...@gmail.com>
>> wrote:
>> 
>>> me personally, around 290gb.  as much as we could shove into them
>>> 
>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
 erickerick...@gmail.com
>>> 
>>> wrote:
>>> 
 How much physical RAM? A rule of thumb is that you should allocate
>>> no
>>> more
 than 25-50 percent of the total physical RAM to Solr. That's
>> cumulative,
 i.e. the sum of the heap allocations across all your JVMs should
>> be
>> below
 that percentage. See Uwe Schindler's mmapdirectiry blog...
 
 Shot in the dark...
 
 On Tue, Jun 16, 2020, 11:51 David Hastings <
>> hastings.recurs...@gmail.com
 
 wrote:
 
> To add to this, i generally have solr start with this:
> -Xms31000m-Xmx31000m
> 
> and the only other thing that runs on them are maria db gallera
>> cluster
> nodes that are not in use (aside from replication)
> 
> the 31gb is not an accident either, you dont want 32gb.
> 
> 
> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <
>> apa...@elyograg.org
 
 wrote:
> 
>> On 6/11/2020 11:52 AM, Ryan W wrote:
 I will check "dmesg" first, to find out any hardware error
>>> message.
>> 
>> 
>> 
>>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
>> score 9
 or
>>> sacrifice child
>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>> total-vm:675

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
On Mon, Jun 29, 2020 at 1:49 PM David Hastings 
wrote:

> little nit picky note here, use 31gb, never 32.


Good to know.

Just now I got this output from bin/solr status:

  "solr_home":"/opt/solr/server/solr",
  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy -
2019-05-28 23:37:48",
  "startTime":"2020-06-29T17:35:13.966Z",
  "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
  "memory":"9.3 GB (%57.9) of 16 GB"}

That's the highest memory use I've seen.  Not sure if this indicates 16GB
isn't enough.  Then I ran it again a couple minutes later and it was down
to 598.3 MB.  I wonder what accounts for these wide swings.  I can't
imagine that a few users doing searches would suddenly make it use 9 GB of RAM.


On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
>
> > It figures it would happen again a couple hours after I suggested the
> issue
> > might be resolved.  Just now, Solr stopped running.  I cleared the cache
> in
> > my app a couple times around the time that it happened, so perhaps that
> was
> > somehow too taxing for the server.  However, I've never allocated so much
> > RAM to a website before, so it's odd that I'm getting these failures.  My
> > colleagues were astonished when I said people on the solr-user list were
> > telling me I might need 32GB just for solr.
> >
> > I manage another project that uses Drupal + Solr, and we have a total of
> > 8GB of RAM on that server and Solr never, ever stops.  I've been managing
> > that site for years and never seen a Solr outage.  On that project,
> > Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB
> or
> > more?
> >
> > "The thing that’s unsettling about this is that assuming you were hitting
> > OOMs, and were running the OOM-killer script, you _should_ have had very
> > clear evidence that that was the cause."
> >
> > How do I know if I'm running the OOM-killer script?
> >
> > Thank you.
> >
> > On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson  >
> > wrote:
> >
> > > The thing that’s unsettling about this is that assuming you were
> hitting
> > > OOMs,
> > > and were running the OOM-killer script, you _should_ have had very
> clear
> > > evidence that that was the cause.
> > >
> > > If you were not running the killer script, the apologies for not asking
> > > about that
> > > in the first place. Java’s performance is unpredictable when OOMs
> happen,
> > > which is the point of the killer script: at least Solr stops rather
> than
> > do
> > > something inexplicable.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Jun 29, 2020, at 11:52 AM, David Hastings <
> > > hastings.recurs...@gmail.com> wrote:
> > > >
> > > > sometimes just throwing money/ram/ssd at the problem is just the best
> > > > answer.
> > > >
> > > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> > > >
> > > >> Thanks everyone. Just to give an update on this issue, I bumped the
> > RAM
> > > >> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> > > >> problem since.
> > > >>
> > > >>
> > > >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> > > >> hastings.recurs...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> me personally, around 290gb.  as much as we could shove into them
> > > >>>
> > > >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
> > > erickerick...@gmail.com
> > > >>>
> > > >>> wrote:
> > > >>>
> > >  How much physical RAM? A rule of thumb is that you should allocate
> > no
> > > >>> more
> > >  than 25-50 percent of the total physical RAM to Solr. That's
> > > >> cumulative,
> > >  i.e. the sum of the heap allocations across all your JVMs should
> be
> > > >> below
> > >  that percentage. See Uwe Schindler's mmapdirectiry blog...
> > > 
> > >  Shot in the dark...
> > > 
> > >  On Tue, Jun 16, 2020, 11:51 David Hastings <
> > > >> hastings.recurs...@gmail.com
> > > 
> > >  wrote:
> > > 
> > > > To add to this, i generally have solr start with this:
> > > > -Xms31000m-Xmx31000m
> > > >
> > > > and the only other thing that runs on them are maria db gallera
> > > >> cluster
> > > > nodes that are not in use (aside from replication)
> > > >
> > > > the 31gb is not an accident either, you dont want 32gb.
> > > >
> > > >
> > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <
> apa...@elyograg.org
> > >
> > >  wrote:
> > > >
> > > >> On 6/11/2020 11:52 AM, Ryan W wrote:
> > >  I will check "dmesg" first, to find out any hardware error
> > > >>> message.
> > > >>
> > > >> 
> > > >>
> > > >>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
> > > >> score 9
> > >  or
> > > >>> sacrifice child
> > > >>> [1521232.782908] Killed process 117529 (httpd), UID 48,
> > > >> total-vm:675824kB,
> > > >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> > > >>>
> > > >>> Is this a relevant "Out of memory" message?  Does this suggest
> an
> > > >>> OOM
> > > >>> situatio

Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
ps aux | grep solr

should show you all the parameters Solr is running with, as would the
admin screen. You should see something like:

-XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh

And there should be some logs laying around if that was the case
similar to:
$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log

As for memory, It Depends (tm). There are configurations
you can make choices about that will affect the heap requirements.
You can’t really draw comparisons between different projects. Your
Drupal + Solr app has how many documents? Indexed how? Searched
how? .vs. this one.

The usual suspects for configuration settings that are responsible 
include:

- filterCache size too large. Each filterCache entry is bounded by
maxDoc/8 bytes. I’ve seen people set this to over 1M…

- using non-docValues for fields used for sorting, grouping, function queries
or faceting. Solr will uninvert the field on the heap, whereas if you have
specified docValues=true, the memory is out in OS memory space rather than heap.

- People just putting too many docs in a collection in a single JVM in 
aggregate.
All replicas in the same instance are using part of the heap.

- Having unnecessary options on your fields, although that’s more MMap space 
than
heap.

The problem basically is that all of Solr’s access is essentially random, so for
performance reasons lots of stuff has to be in memory.

That said, Solr hasn’t been as careful as it should be about using up memory,
that’s ongoing.

If you really want to know what’s using up memory, throw a heap analysis tool
at it. That’ll give you a clue what’s hogging memory and you can go from there.
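
As a rough worked example of the filterCache arithmetic above (numbers purely
illustrative): with maxDoc = 100,000,000, each cached filter bitset costs about
100,000,000 / 8 = 12.5 MB, so

    size=1,000      ->  roughly 12.5 GB of heap just for the filterCache
    size=1,000,000  ->  roughly 12.5 TB, which no heap can hold

which is why an oversized filterCache is such a common culprit.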

> On Jun 29, 2020, at 1:48 PM, David Hastings  
> wrote:
> 
> little nit picky note here, use 31gb, never 32.
> 
> On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
> 
>> It figures it would happen again a couple hours after I suggested the issue
>> might be resolved.  Just now, Solr stopped running.  I cleared the cache in
>> my app a couple times around the time that it happened, so perhaps that was
>> somehow too taxing for the server.  However, I've never allocated so much
>> RAM to a website before, so it's odd that I'm getting these failures.  My
>> colleagues were astonished when I said people on the solr-user list were
>> telling me I might need 32GB just for solr.
>> 
>> I manage another project that uses Drupal + Solr, and we have a total of
>> 8GB of RAM on that server and Solr never, ever stops.  I've been managing
>> that site for years and never seen a Solr outage.  On that project,
>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
>> more?
>> 
>> "The thing that’s unsettling about this is that assuming you were hitting
>> OOMs, and were running the OOM-killer script, you _should_ have had very
>> clear evidence that that was the cause."
>> 
>> How do I know if I'm running the OOM-killer script?
>> 
>> Thank you.
>> 
>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson 
>> wrote:
>> 
>>> The thing that’s unsettling about this is that assuming you were hitting
>>> OOMs,
>>> and were running the OOM-killer script, you _should_ have had very clear
>>> evidence that that was the cause.
>>> 
>>> If you were not running the killer script, the apologies for not asking
>>> about that
>>> in the first place. Java’s performance is unpredictable when OOMs happen,
>>> which is the point of the killer script: at least Solr stops rather than
>> do
>>> something inexplicable.
>>> 
>>> Best,
>>> Erick
>>> 
 On Jun 29, 2020, at 11:52 AM, David Hastings <
>>> hastings.recurs...@gmail.com> wrote:
 
 sometimes just throwing money/ram/ssd at the problem is just the best
 answer.
 
 On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
 
> Thanks everyone. Just to give an update on this issue, I bumped the
>> RAM
> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> problem since.
> 
> 
> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> hastings.recurs...@gmail.com>
> wrote:
> 
>> me personally, around 290gb.  as much as we could shove into them
>> 
>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
>>> erickerick...@gmail.com
>> 
>> wrote:
>> 
>>> How much physical RAM? A rule of thumb is that you should allocate
>> no
>> more
>>> than 25-50 percent of the total physical RAM to Solr. That's
> cumulative,
>>> i.e. the sum of the heap allocations across all your JVMs should be
> below
>>> that percentage. See Uwe Schindler's mmapdirectiry blog...
>>> 
>>> Shot in the dark...
>>> 
>>> On Tue, Jun 16, 2020, 11:51 David Hastings <
> hastings.recurs...@gmail.com
>>> 
>>> wrote:
>>> 
 To add to this, i generally have solr start with this:
 -Xms31000m-Xmx31000m
 
 and the only other thing that runs on them are maria db gallera
> cluster
 nodes that are not in use

Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
little nit-picky note here: use 31gb, never 32 (at 32gb the JVM can no longer use
compressed object pointers, so a 32gb heap effectively holds less than a 31gb one).

On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:

> It figures it would happen again a couple hours after I suggested the issue
> might be resolved.  Just now, Solr stopped running.  I cleared the cache in
> my app a couple times around the time that it happened, so perhaps that was
> somehow too taxing for the server.  However, I've never allocated so much
> RAM to a website before, so it's odd that I'm getting these failures.  My
> colleagues were astonished when I said people on the solr-user list were
> telling me I might need 32GB just for solr.
>
> I manage another project that uses Drupal + Solr, and we have a total of
> 8GB of RAM on that server and Solr never, ever stops.  I've been managing
> that site for years and never seen a Solr outage.  On that project,
> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
> more?
>
> "The thing that’s unsettling about this is that assuming you were hitting
> OOMs, and were running the OOM-killer script, you _should_ have had very
> clear evidence that that was the cause."
>
> How do I know if I'm running the OOM-killer script?
>
> Thank you.
>
> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson 
> wrote:
>
> > The thing that’s unsettling about this is that assuming you were hitting
> > OOMs,
> > and were running the OOM-killer script, you _should_ have had very clear
> > evidence that that was the cause.
> >
> > If you were not running the killer script, the apologies for not asking
> > about that
> > in the first place. Java’s performance is unpredictable when OOMs happen,
> > which is the point of the killer script: at least Solr stops rather than
> do
> > something inexplicable.
> >
> > Best,
> > Erick
> >
> > > On Jun 29, 2020, at 11:52 AM, David Hastings <
> > hastings.recurs...@gmail.com> wrote:
> > >
> > > sometimes just throwing money/ram/ssd at the problem is just the best
> > > answer.
> > >
> > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> > >
> > >> Thanks everyone. Just to give an update on this issue, I bumped the
> RAM
> > >> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> > >> problem since.
> > >>
> > >>
> > >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> > >> hastings.recurs...@gmail.com>
> > >> wrote:
> > >>
> > >>> me personally, around 290gb.  as much as we could shove into them
> > >>>
> > >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
> > erickerick...@gmail.com
> > >>>
> > >>> wrote:
> > >>>
> >  How much physical RAM? A rule of thumb is that you should allocate
> no
> > >>> more
> >  than 25-50 percent of the total physical RAM to Solr. That's
> > >> cumulative,
> >  i.e. the sum of the heap allocations across all your JVMs should be
> > >> below
> >  that percentage. See Uwe Schindler's mmapdirectiry blog...
> > 
> >  Shot in the dark...
> > 
> >  On Tue, Jun 16, 2020, 11:51 David Hastings <
> > >> hastings.recurs...@gmail.com
> > 
> >  wrote:
> > 
> > > To add to this, i generally have solr start with this:
> > > -Xms31000m-Xmx31000m
> > >
> > > and the only other thing that runs on them are maria db gallera
> > >> cluster
> > > nodes that are not in use (aside from replication)
> > >
> > > the 31gb is not an accident either, you dont want 32gb.
> > >
> > >
> > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey  >
> >  wrote:
> > >
> > >> On 6/11/2020 11:52 AM, Ryan W wrote:
> >  I will check "dmesg" first, to find out any hardware error
> > >>> message.
> > >>
> > >> 
> > >>
> > >>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
> > >> score 9
> >  or
> > >>> sacrifice child
> > >>> [1521232.782908] Killed process 117529 (httpd), UID 48,
> > >> total-vm:675824kB,
> > >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> > >>>
> > >>> Is this a relevant "Out of memory" message?  Does this suggest an
> > >>> OOM
> > >>> situation is the culprit?
> > >>
> > >> Because this was in the "dmesg" output, it indicates that it is
> the
> > >> operating system killing programs because the *system* doesn't
> have
> > >>> any
> > >> memory left.  It wasn't Java that did this, and it wasn't Solr
> that
> > >>> was
> > >> killed.  It very well could have been Solr that was killed at
> > >> another
> > >> time, though.
> > >>
> > >> The process that it killed this time is named httpd ... which is
> > >> most
> > >> likely the Apache webserver.  Because the UID is 48, this is
> > >> probably
> >  an
> > >> OS derived from Redhat, where the "apache" user has UID and GID 48
> > >> by
> > >> default.  Apache with its default config can be VERY memory hungry
> > >>> when
> > >> it gets busy.
> > >>
> > >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> > >>
> > >> This says that you started Solr with the 

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
It figures it would happen again a couple hours after I suggested the issue
might be resolved.  Just now, Solr stopped running.  I cleared the cache in
my app a couple times around the time that it happened, so perhaps that was
somehow too taxing for the server.  However, I've never allocated so much
RAM to a website before, so it's odd that I'm getting these failures.  My
colleagues were astonished when I said people on the solr-user list were
telling me I might need 32GB just for solr.

I manage another project that uses Drupal + Solr, and we have a total of
8GB of RAM on that server and Solr never, ever stops.  I've been managing
that site for years and never seen a Solr outage.  On that project,
Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
more?

"The thing that’s unsettling about this is that assuming you were hitting
OOMs, and were running the OOM-killer script, you _should_ have had very
clear evidence that that was the cause."

How do I know if I'm running the OOM-killer script?

Thank you.

On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson 
wrote:

> The thing that’s unsettling about this is that assuming you were hitting
> OOMs,
> and were running the OOM-killer script, you _should_ have had very clear
> evidence that that was the cause.
>
> If you were not running the killer script, the apologies for not asking
> about that
> in the first place. Java’s performance is unpredictable when OOMs happen,
> which is the point of the killer script: at least Solr stops rather than do
> something inexplicable.
>
> Best,
> Erick
>
> > On Jun 29, 2020, at 11:52 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
> >
> > sometimes just throwing money/ram/ssd at the problem is just the best
> > answer.
> >
> > On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> >
> >> Thanks everyone. Just to give an update on this issue, I bumped the RAM
> >> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> >> problem since.
> >>
> >>
> >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> >> hastings.recurs...@gmail.com>
> >> wrote:
> >>
> >>> me personally, around 290gb.  as much as we could shove into them
> >>>
> >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
> erickerick...@gmail.com
> >>>
> >>> wrote:
> >>>
>  How much physical RAM? A rule of thumb is that you should allocate no
> >>> more
>  than 25-50 percent of the total physical RAM to Solr. That's
> >> cumulative,
>  i.e. the sum of the heap allocations across all your JVMs should be
> >> below
>  that percentage. See Uwe Schindler's mmapdirectiry blog...
> 
>  Shot in the dark...
> 
>  On Tue, Jun 16, 2020, 11:51 David Hastings <
> >> hastings.recurs...@gmail.com
> 
>  wrote:
> 
> > To add to this, i generally have solr start with this:
> > -Xms31000m-Xmx31000m
> >
> > and the only other thing that runs on them are maria db gallera
> >> cluster
> > nodes that are not in use (aside from replication)
> >
> > the 31gb is not an accident either, you dont want 32gb.
> >
> >
> > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey 
>  wrote:
> >
> >> On 6/11/2020 11:52 AM, Ryan W wrote:
>  I will check "dmesg" first, to find out any hardware error
> >>> message.
> >>
> >> 
> >>
> >>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
> >> score 9
>  or
> >>> sacrifice child
> >>> [1521232.782908] Killed process 117529 (httpd), UID 48,
> >> total-vm:675824kB,
> >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> >>>
> >>> Is this a relevant "Out of memory" message?  Does this suggest an
> >>> OOM
> >>> situation is the culprit?
> >>
> >> Because this was in the "dmesg" output, it indicates that it is the
> >> operating system killing programs because the *system* doesn't have
> >>> any
> >> memory left.  It wasn't Java that did this, and it wasn't Solr that
> >>> was
> >> killed.  It very well could have been Solr that was killed at
> >> another
> >> time, though.
> >>
> >> The process that it killed this time is named httpd ... which is
> >> most
> >> likely the Apache webserver.  Because the UID is 48, this is
> >> probably
>  an
> >> OS derived from Redhat, where the "apache" user has UID and GID 48
> >> by
> >> default.  Apache with its default config can be VERY memory hungry
> >>> when
> >> it gets busy.
> >>
> >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> >>
> >> This says that you started Solr with the default 512MB heap.  Which
> >>> is
> >> VERY VERY small.  The default is small so that Solr will start on
> >> virtually any hardware.  Almost every user must increase the heap
> >>> size.
> >> And because the OS is killing processes, it is likely that the
> >> system
> >> does not have enough memory installed for what you have running on
> >>> it.
>

Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
The thing that’s unsettling about this is that assuming you were hitting OOMs,
and were running the OOM-killer script, you _should_ have had very clear
evidence that that was the cause.

If you were not running the killer script, then apologies for not asking about 
that
in the first place. Java’s performance is unpredictable when OOMs happen,
which is the point of the killer script: at least Solr stops rather than do
something inexplicable.

Best,
Erick

> On Jun 29, 2020, at 11:52 AM, David Hastings  
> wrote:
> 
> sometimes just throwing money/ram/ssd at the problem is just the best
> answer.
> 
> On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> 
>> Thanks everyone. Just to give an update on this issue, I bumped the RAM
>> available to Solr up to 16GB a couple weeks ago, and haven’t had any
>> problem since.
>> 
>> 
>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
>> hastings.recurs...@gmail.com>
>> wrote:
>> 
>>> me personally, around 290gb.  as much as we could shove into them
>>> 
>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson >> 
>>> wrote:
>>> 
 How much physical RAM? A rule of thumb is that you should allocate no
>>> more
 than 25-50 percent of the total physical RAM to Solr. That's
>> cumulative,
 i.e. the sum of the heap allocations across all your JVMs should be
>> below
 that percentage. See Uwe Schindler's mmapdirectiry blog...
 
 Shot in the dark...
 
 On Tue, Jun 16, 2020, 11:51 David Hastings <
>> hastings.recurs...@gmail.com
 
 wrote:
 
> To add to this, i generally have solr start with this:
> -Xms31000m-Xmx31000m
> 
> and the only other thing that runs on them are maria db gallera
>> cluster
> nodes that are not in use (aside from replication)
> 
> the 31gb is not an accident either, you dont want 32gb.
> 
> 
> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey 
 wrote:
> 
>> On 6/11/2020 11:52 AM, Ryan W wrote:
 I will check "dmesg" first, to find out any hardware error
>>> message.
>> 
>> 
>> 
>>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
>> score 9
 or
>>> sacrifice child
>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>> total-vm:675824kB,
>>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>>> 
>>> Is this a relevant "Out of memory" message?  Does this suggest an
>>> OOM
>>> situation is the culprit?
>> 
>> Because this was in the "dmesg" output, it indicates that it is the
>> operating system killing programs because the *system* doesn't have
>>> any
>> memory left.  It wasn't Java that did this, and it wasn't Solr that
>>> was
>> killed.  It very well could have been Solr that was killed at
>> another
>> time, though.
>> 
>> The process that it killed this time is named httpd ... which is
>> most
>> likely the Apache webserver.  Because the UID is 48, this is
>> probably
 an
>> OS derived from Redhat, where the "apache" user has UID and GID 48
>> by
>> default.  Apache with its default config can be VERY memory hungry
>>> when
>> it gets busy.
>> 
>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>> 
>> This says that you started Solr with the default 512MB heap.  Which
>>> is
>> VERY VERY small.  The default is small so that Solr will start on
>> virtually any hardware.  Almost every user must increase the heap
>>> size.
>> And because the OS is killing processes, it is likely that the
>> system
>> does not have enough memory installed for what you have running on
>>> it.
>> 
>> It is generally not a good idea to share the server hardware
>> between
>> Solr and other software, unless the system has a lot of spare
 resources,
>> memory in particular.
>> 
>> Thanks,
>> Shawn
>> 
> 
 
>>> 
>> 



Re: Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Erick Erickson
I was afraid of “totally arbitrary”

OK, this field type is going to surprise the heck out of you. Whitespace
tokenizer is really stupid. It’ll include punctuation for instance. Take
a look at the admin UI/analysis page and pick your field and put some
creative entries in and you’ll see what I mean.

So let’s get some use-cases in place. Can users enter tags like
blahms-reply-unpaidnonsense and expect to find it with *ms-reply-unpaid*?
Or is the entry something like
my dog has ms-reply-unpaid and is mangy
? If the latter, simple token searching will work fine, there’s no need for
wildcards at all.

FWIW,
Erick

> On Jun 29, 2020, at 11:46 AM, Chris Dempsey  wrote:
> 
> First off, thanks for taking a look, Erick! I see you helping lots of folks
> out here and I've learned a lot from your answers. Much appreciated!
> 
>> How regular are your patterns? Are they arbitrary?
> 
> Good question. :) That's data that I should have included in the initial
> post but both the values in the `tag` field and the search query itself are
> totally arbitrary (*i.e. user entered values*). I see where you're going if
> the set of either part was limited.
> 
>> What’s the field type anyway? Is this field tokenized?
> 
> [field and fieldType definitions stripped by the archive; surviving attributes:
> multiValued="true", positionIncrementGap="100" autoGeneratePhraseQueries="true",
> preserveOriginal="true", withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1"
> minTrailing="2" maxFractionAsterisk="0"]
> 
> On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson 
> wrote:
> 
>> How regular are your patterns? Are they arbitrary?
>> What I’m wondering is if you could shift your work to the
>> indexing end, perhaps even in an auxiliary field. Could you,
>> say, just index “paid”, “ms-reply-unpaid” etc.? Then there
>> are no wildcards at all. This is akin to “concept search”.
>> 
>> Otherwise ngramming is your best bet.
>> 
>> What’s the field type anyway? Is this field tokenized?
>> 
>> There are lots of options, but so much depends on whether
>> you can process the data such that you won’t need wildcards.
>> 
>> Best,
>> Erick
>> 
>>> On Jun 29, 2020, at 11:16 AM, Chris Dempsey  wrote:
>>> 
>>> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*)
>> but
>>> I'm looking into options for optimizing something like this:
>>> 
 fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
>>> tag:*ms-reply-paid*
>>> 
>>> It's probably not a surprise that we're seeing performance issues with
>>> something like this. My understanding is that using the wildcard on both
>>> ends forces a full-text index search. Something like the above can't take
>>> advantage of something like the ReverseWordFilter either. I believe
>>> constructing `n-grams` is an option (*at the expense of index size*) but
>> is
>>> there anything I'm overlooking as a possible avenue to look into?
>> 
>> 



Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
sometimes just throwing money/RAM/SSD at the problem is the best
answer.

On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:

> Thanks everyone. Just to give an update on this issue, I bumped the RAM
> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> problem since.
>
>
> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> hastings.recurs...@gmail.com>
> wrote:
>
> > me personally, around 290gb.  as much as we could shove into them
> >
> > On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson  >
> > wrote:
> >
> > > How much physical RAM? A rule of thumb is that you should allocate no
> > more
> > > than 25-50 percent of the total physical RAM to Solr. That's
> cumulative,
> > > i.e. the sum of the heap allocations across all your JVMs should be
> below
> > > that percentage. See Uwe Schindler's mmapdirectiry blog...
> > >
> > > Shot in the dark...
> > >
> > > On Tue, Jun 16, 2020, 11:51 David Hastings <
> hastings.recurs...@gmail.com
> > >
> > > wrote:
> > >
> > > > To add to this, i generally have solr start with this:
> > > > -Xms31000m-Xmx31000m
> > > >
> > > > and the only other thing that runs on them are maria db gallera
> cluster
> > > > nodes that are not in use (aside from replication)
> > > >
> > > > the 31gb is not an accident either, you dont want 32gb.
> > > >
> > > >
> > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey 
> > > wrote:
> > > >
> > > > > On 6/11/2020 11:52 AM, Ryan W wrote:
> > > > > >> I will check "dmesg" first, to find out any hardware error
> > message.
> > > > >
> > > > > 
> > > > >
> > > > > > [1521232.781801] Out of memory: Kill process 117529 (httpd)
> score 9
> > > or
> > > > > > sacrifice child
> > > > > > [1521232.782908] Killed process 117529 (httpd), UID 48,
> > > > > total-vm:675824kB,
> > > > > > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> > > > > >
> > > > > > Is this a relevant "Out of memory" message?  Does this suggest an
> > OOM
> > > > > > situation is the culprit?
> > > > >
> > > > > Because this was in the "dmesg" output, it indicates that it is the
> > > > > operating system killing programs because the *system* doesn't have
> > any
> > > > > memory left.  It wasn't Java that did this, and it wasn't Solr that
> > was
> > > > > killed.  It very well could have been Solr that was killed at
> another
> > > > > time, though.
> > > > >
> > > > > The process that it killed this time is named httpd ... which is
> most
> > > > > likely the Apache webserver.  Because the UID is 48, this is
> probably
> > > an
> > > > > OS derived from Redhat, where the "apache" user has UID and GID 48
> by
> > > > > default.  Apache with its default config can be VERY memory hungry
> > when
> > > > > it gets busy.
> > > > >
> > > > > > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> > > > >
> > > > > This says that you started Solr with the default 512MB heap.  Which
> > is
> > > > > VERY VERY small.  The default is small so that Solr will start on
> > > > > virtually any hardware.  Almost every user must increase the heap
> > size.
> > > > > And because the OS is killing processes, it is likely that the
> system
> > > > > does not have enough memory installed for what you have running on
> > it.
> > > > >
> > > > > It is generally not a good idea to share the server hardware
> between
> > > > > Solr and other software, unless the system has a lot of spare
> > > resources,
> > > > > memory in particular.
> > > > >
> > > > > Thanks,
> > > > > Shawn
> > > > >
> > > >
> > >
> >
>


Re: Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Chris Dempsey
First off, thanks for taking a look, Erick! I see you helping lots of folks
out here and I've learned a lot from your answers. Much appreciated!

> How regular are your patterns? Are they arbitrary?

Good question. :) That's data that I should have included in the initial
post but both the values in the `tag` field and the search query itself are
totally arbitrary (*i.e. user entered values*). I see where you're going if
the set of either part was limited.

> What’s the field type anyway? Is this field tokenized?

[field and fieldType definitions stripped by the mailing-list archive]

On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson 
wrote:

> How regular are your patterns? Are they arbitrary?
> What I’m wondering is if you could shift your work to the
> indexing end, perhaps even in an auxiliary field. Could you,
> say, just index “paid”, “ms-reply-unpaid” etc.? Then there
> are no wildcards at all. This is akin to “concept search”.
>
> Otherwise ngramming is your best bet.
>
> What’s the field type anyway? Is this field tokenized?
>
> There are lots of options, but so much depends on whether
> you can process the data such that you won’t need wildcards.
>
> Best,
> Erick
>
> > On Jun 29, 2020, at 11:16 AM, Chris Dempsey  wrote:
> >
> > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*)
> but
> > I'm looking into options for optimizing something like this:
> >
> >> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
> > tag:*ms-reply-paid*
> >
> > It's probably not a surprise that we're seeing performance issues with
> > something like this. My understanding is that using the wildcard on both
> > ends forces a full-text index search. Something like the above can't take
> > advantage of something like the ReverseWordFilter either. I believe
> > constructing `n-grams` is an option (*at the expense of index size*) but
> is
> > there anything I'm overlooking as a possible avenue to look into?
>
>


Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
Thanks everyone. Just to give an update on this issue, I bumped the RAM
available to Solr up to 16GB a couple weeks ago, and haven’t had any
problem since.


On Tue, Jun 16, 2020 at 1:00 PM David Hastings 
wrote:

> me personally, around 290gb.  as much as we could shove into them
>
> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson 
> wrote:
>
> > How much physical RAM? A rule of thumb is that you should allocate no
> more
> > than 25-50 percent of the total physical RAM to Solr. That's cumulative,
> > i.e. the sum of the heap allocations across all your JVMs should be below
> > that percentage. See Uwe Schindler's mmapdirectiry blog...
> >
> > Shot in the dark...
> >
> > On Tue, Jun 16, 2020, 11:51 David Hastings  >
> > wrote:
> >
> > > To add to this, i generally have solr start with this:
> > > -Xms31000m-Xmx31000m
> > >
> > > and the only other thing that runs on them are maria db gallera cluster
> > > nodes that are not in use (aside from replication)
> > >
> > > the 31gb is not an accident either, you dont want 32gb.
> > >
> > >
> > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey 
> > wrote:
> > >
> > > > On 6/11/2020 11:52 AM, Ryan W wrote:
> > > > >> I will check "dmesg" first, to find out any hardware error
> message.
> > > >
> > > > 
> > > >
> > > > > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9
> > or
> > > > > sacrifice child
> > > > > [1521232.782908] Killed process 117529 (httpd), UID 48,
> > > > total-vm:675824kB,
> > > > > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> > > > >
> > > > > Is this a relevant "Out of memory" message?  Does this suggest an
> OOM
> > > > > situation is the culprit?
> > > >
> > > > Because this was in the "dmesg" output, it indicates that it is the
> > > > operating system killing programs because the *system* doesn't have
> any
> > > > memory left.  It wasn't Java that did this, and it wasn't Solr that
> was
> > > > killed.  It very well could have been Solr that was killed at another
> > > > time, though.
> > > >
> > > > The process that it killed this time is named httpd ... which is most
> > > > likely the Apache webserver.  Because the UID is 48, this is probably
> > an
> > > > OS derived from Redhat, where the "apache" user has UID and GID 48 by
> > > > default.  Apache with its default config can be VERY memory hungry
> when
> > > > it gets busy.
> > > >
> > > > > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> > > >
> > > > This says that you started Solr with the default 512MB heap.  Which
> is
> > > > VERY VERY small.  The default is small so that Solr will start on
> > > > virtually any hardware.  Almost every user must increase the heap
> size.
> > > > And because the OS is killing processes, it is likely that the system
> > > > does not have enough memory installed for what you have running on
> it.
> > > >
> > > > It is generally not a good idea to share the server hardware between
> > > > Solr and other software, unless the system has a lot of spare
> > resources,
> > > > memory in particular.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > >
> >
>


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Bram Van Dam
On 28/06/2020 14:42, Erick Erickson wrote:
> We need to draw a sharp distinction between standalone “going away”
> in terms of our internal code and going away in terms of the user
> experience.

It'll be hard to make it completely transparent in terms of user
experience. For instance, there is currently no way to unload a core in
SolrCloud (without deleting it). I'm sure there are many other similar
gotchas.

 - Bram


Re: Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Erick Erickson
How regular are your patterns? Are they arbitrary? 
What I’m wondering is if you could shift your work to the
indexing end, perhaps even in an auxiliary field. Could you, 
say, just index “paid”, “ms-reply-unpaid” etc? Then there
are no wildcards at all. This is akin to “concept search”.

Otherwise ngramming is your best bet.

What’s the field type anyway? Is this field tokenized?

There are lots of options, but so much depends on whether
you can process the data such that you won’t need wildcards.

Best,
Erick

> On Jun 29, 2020, at 11:16 AM, Chris Dempsey  wrote:
> 
> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
> I'm looking into options for optimizing something like this:
> 
>> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
> tag:*ms-reply-paid*
> 
> It's probably not a surprise that we're seeing performance issues with
> something like this. My understanding is that using the wildcard on both
> ends forces a full-text index search. Something like the above can't take
> advantage of something like the ReversedWildcardFilter either. I believe
> constructing `n-grams` is an option (*at the expense of index size*) but is
> there anything I'm overlooking as a possible avenue to look into?
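
A minimal SolrJ sketch of the indexing-side approach described above: derive
exact-match values into an auxiliary field at index time so the fq can drop the
leading/trailing wildcards. The field name (tag_flags), the collection URL, and
the mapping rules are illustrative assumptions only, not anything from this thread.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TagFlagsIndexer {

  // Illustrative rules only; the real mapping depends on what the raw tags contain.
  // Checks are ordered so "ms-reply-unpaid" is not also counted as plain "paid".
  static List<String> tagFlags(String rawTag) {
    List<String> flags = new ArrayList<>();
    if (rawTag.contains("ms-reply-unpaid")) {
      flags.add("ms-reply-unpaid");
    } else if (rawTag.contains("ms-reply-paid")) {
      flags.add("ms-reply-paid");
    } else if (rawTag.contains("paid")) {
      flags.add("paid");
    }
    return flags;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical collection URL; adjust to your own setup.
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "42");
      String rawTag = "customer-ms-reply-paid-2020";
      doc.addField("tag", rawTag);
      // tag_flags is assumed to be a multiValued string field added for this purpose.
      for (String flag : tagFlags(rawTag)) {
        doc.addField("tag_flags", flag);
      }
      client.add(doc);
      client.commit();
    }
  }
}

With the auxiliary field in place, a filter such as fq=tag:*ms-reply-paid* becomes
fq=tag_flags:ms-reply-paid, a plain term lookup that needs no wildcard scan.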



Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Mark H. Wood
Wandering off topic, but still apropos Solr.

On Sun, Jun 28, 2020 at 12:14:56PM +0200, Ilan Ginzburg wrote:
> I disagree Ishan. We shouldn't get rid of standalone mode.
> I see three layers in Solr:
> 
>1. Lucene (the actual search libraries)
>2. The server infra ("standalone Solr" basically)
>3. Cluster management (SolrCloud)
> 
> There's value in using lower layers without higher ones.
> SolrCloud is a good solution for some use cases but there are others that
> need a search server and for which SolrCloud is not a good fit and will
> likely never be. If standalone mode is no longer available, such use cases
> will have to turn to something other than Solr (or fork and go their own
> way).

A data point:

While working to upgrade a dependent product from Solr 4 to Solr 7, I
came across a number of APIs which would have made things simpler,
neater and more reliable...except that they all are available *only*
is SolrCloud.  I eventually decided that asking thousands of sites to
run "degenerate" SolrCloud clusters (of a single instance, plus the ZK
stuff that most would find mysterious) was just not worth the gain.

So, my wish-list for Solr includes either (a) abolish standalone so
the decision is taken out of my hands, or (b) port some of the
cloud-only APIs back to the standalone layer.  I haven't spent a
moment's thought on how difficult either would be -- as I said, just a
wish.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Chris Dempsey
Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
I'm looking into options for optimizing something like this:

> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
tag:*ms-reply-paid*

It's probably not a surprise that we're seeing performance issues with
something like this. My understanding is that using the wildcard on both
ends forces a full-text index search. Something like the above can't take
advantage of something like the ReversedWildcardFilter either. I believe
constructing `n-grams` is an option (*at the expense of index size*) but is
there anything I'm overlooking as a possible avenue to look into?


Announcing ApacheCon @Home 2020

2020-06-29 Thread Rich Bowen

Hi, Apache enthusiast!

(You’re receiving this because you’re subscribed to one or more dev or 
user mailing lists for an Apache Software Foundation project.)


The ApacheCon Planners and the Apache Software Foundation are pleased to 
announce that ApacheCon @Home will be held online, September 29th 
through October 1st, 2020. We’ll be featuring content from dozens of our 
projects, as well as content about community, how Apache works, business 
models around Apache software, the legal aspects of open source, and 
many other topics.


Full details about the event, and registration, are available at 
https://apachecon.com/acah2020


Due to the confusion around how and where this event was going to be 
held, and in order to open up to presenters from around the world who 
may previously have been unable or unwilling to travel, we’ve reopened 
the Call For Presentations until July 13th. Submit your talks today at 
https://acna2020.jamhosted.net/


We hope to see you at the event!
Rich Bowen, VP Conferences, The Apache Software Foundation


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Jan Høydahl
Please start another thread to discuss removal of standalone mode, and stay 
on-topic in this one.


> On Jun 28, 2020, at 14:42, Erick Erickson  wrote:
> 
> We need to draw a sharp distinction between standalone “going away”
> in terms of our internal code and going away in terms of the user
> experience.
> 
> Usually when we’re talking about standalone going away, it’s the
> former. The assumption is that we’ll use an embedded ZK that
> fires up automatically so Solr behaves very similarly to how it
> behaves in the current standalone mode just without all the 
> if (zkHost == null) do_something else do_something_else
> 
> I wonder if the slickest way to use embedded ZK would be to
> populate the embedded ZK during core discovery
> 
> Erick
> 
> 
> 
>> On Jun 28, 2020, at 6:40 AM, Ishan Chattopadhyaya 
>>  wrote:
>> 
>> Cost of maintaining feature parity across the two modes is an overhead.
>> Security plugins, package manager (that doesn't work in standalone), UI,
>> etc. Our codebase is littered with checks to ascertain if we're zkAware.
>> There are massive benefits to maintainability if standalone mode were to go
>> away. Of course, provided all usecases that could be solved using
>> standalone can also be solved using SolrCloud. At that point, I'd love for
>> us to get rid of the term "SolrCloud".
>> 
>> On Sun, 28 Jun, 2020, 3:59 pm Ishan Chattopadhyaya, <
>> ichattopadhy...@gmail.com> wrote:
>> 
>>> I would like to know under which situations (except for the various bugs
>>> that will be fixed eventually) would a SolrCloud solution not suffice.
>>> AFAICT, pull replicas and tlog replicas can provide similar replication
>>> strategies commonly used with standalone Solr. I understand that running ZK
>>> is an overhead and SolrCloud isn't best written when it comes to handling
>>> ZK, but that can be improved.
>>> 
>>> And for those users who just want a single node Solr, they can just start
>>> Solr with embedded ZK. It won't practically make a difference.
>>> 
>>> On Sun, 28 Jun, 2020, 3:45 pm Ilan Ginzburg,  wrote:
>>> 
 I disagree Ishan. We shouldn't get rid of standalone mode.
 I see three layers in Solr:
 
  1. Lucene (the actual search libraries)
  2. The server infra ("standalone Solr" basically)
  3. Cluster management (SolrCloud)
 
 There's value in using lower layers without higher ones.
 SolrCloud is a good solution for some use cases but there are others that
 need a search server and for which SolrCloud is not a good fit and will
 likely never be. If standalone mode is no longer available, such use cases
 will have to turn to something other than Solr (or fork and go their own
 way).
 
 Ilan
 
 On Sat, Jun 27, 2020 at 9:39 PM Ishan Chattopadhyaya <
 ichattopadhy...@gmail.com> wrote:
 
> Rather than getting rid of the terminology, we should get rid of the
> standalone mode of Solr altogether. I totally understand that SolrCloud is
> broken in many ways today, but we should attempt to fix it and have it
 as
> the only mode in Solr.
> 
> On Wed, 24 Jun, 2020, 8:17 pm Mike Drob,  wrote:
> 
>> Bernd,
>> 
>> I appreciate that you are trying to examine this issue from multiple
> sides
>> and consider future implications, but I don’t think that is a stirring
>> argument. By analogy, if we are out of eggs and my wife asks me to go
 to
>> the store to get some, refusing to do so on the basis that she might
 call
>> me while I’m there and also ask me to get milk would not be
 reasonable.
>> 
>> What will come next may be an interesting question philosophically,
 but
> we
>> are not discussing abstract concepts here. There is a concrete issue
>> identified, and we’re soliciting input in how best to address it.
>> 
>> Thank you for the suggestion of "guide/follower"
>> 
>> Mike
>> 
>> On Wed, Jun 24, 2020 at 6:30 AM Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de> wrote:
>> 
>>> I'm following this thread now for a while and I can understand
>>> the wish to change some naming/wording/speech in one or the other
>>> programs but I always get back to the one question:
>>> "Is it the weapon which kills people or the hand controlled by
>>> the mind which fires the weapon?"
>>> 
>>> The thread started with slave - slavery, then turned over to master
>>> and followed by leader (for me as a german... you know).
>>> What will come next?
>>> 
>>> And moreover, we now discuss changes in the source code and
>>> due to this there need to be changes to the documentation.
>>> What about the books people wrote about these programs and source code,
>>> should we force these authors to rewrite their books?
>>> Maybe we should file a request to all web search engines to reject
>>> all stored content about these "banned" words?
>>> And contact

Re: solrj - get metrics from all nodes

2020-06-29 Thread Jan Høydahl
The admin UI does this by requesting &nodes=<node1>,<node2>,…
You will get one combined response, with each node's sub-response as key:value pairs.
The list of node names can be found under live_nodes in the CLUSTERSTATUS API.

Jan

> On Jun 27, 2020, at 02:09, ChienHuaWang  wrote:
> 
> For people who are also looking for the solution - you can append
> "node=node_name" to the metrics request to get data for a specific node.
> If anyone knows how to get the data for all the nodes together, please kindly
> share, thanks.
> 
> 
> Regards,
> Chien
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
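
A rough SolrJ sketch of the approach Jan describes: read live_nodes from
CLUSTERSTATUS, then issue one metrics request with a nodes parameter listing
them. The base URL and the group=jvm filter are placeholder assumptions, and
the nodes parameter is assumed to fan out exactly as described above.

import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class AllNodesMetrics {

  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {

      // 1. CLUSTERSTATUS lists the node names under cluster/live_nodes.
      CollectionAdminResponse status =
          CollectionAdminRequest.getClusterStatus().process(client);
      NamedList<Object> cluster = (NamedList<Object>) status.getResponse().get("cluster");
      List<String> liveNodes = (List<String>) cluster.get("live_nodes");

      // 2. One metrics call covering every node, as the admin UI does.
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("nodes", String.join(",", liveNodes));
      params.set("group", "jvm"); // optional: limit the response to one metric group
      NamedList<Object> metrics = client.request(
          new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params));

      // Assuming the shape Jan describes: one sub-response per node, keyed by node name.
      System.out.println(metrics);
    }
  }
}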