Re: Query in quotes cannot find results
Looks like you're removing stopwords. Stopwords cause issues like this, with the positions being off. It's becoming more and more common to _NOT_ remove stopwords; is that an option?

Best,
Erick

> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim wrote:
>
> Hi Shawn,
> Many thanks for the response, I checked the field and it is correct. Let's
> call it _text_ to make it easier.
> I believe the parsing is also correct, please see below:
>
> - Query without quotes (works):
> "querystring":"expand the methods",
> "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",
>
> - Query with quotes (does not work):
> "querystring":"\"expand the methods\"",
> "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
>
> The document has text:
> "to expand the methods for mailing cancellation"
>
> The analysis on this field shows that all words are present in the index and
> the query, and the order is also correct, but the word "methods" is moved one
> position; I guess that's why the result is not found.
>
> Best Regards,
> Vadim Permakoff
>
> -----Original Message-----
> From: Shawn Heisey
> Sent: Monday, June 29, 2020 6:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
>
> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>> The basic query q=expand the methods <<< finds the document,
>> the query (in quotes) q="expand the methods" <<< cannot find the document
>>
>> Am I doing something wrong, or is it a known bug (I saw similar issues
>> discussed in the past, but not for exact match queries), and if yes, what is
>> the Jira for it?
>
> The most helpful information will come from running both queries with debug
> enabled, so you can see how the query is parsed. If you add a parameter
> "debugQuery=true" to the URL, then the response should include the parsed
> query. Compare those, and see if you can tell what the differences are.
>
> One of the most common problems for queries like this is that you're not
> searching the field that you THINK you're searching. I don't know whether
> this is the problem, I just mention it because it is a common error.
>
> Thanks,
> Shawn
>
> This email is intended solely for the recipient. It may contain privileged,
> proprietary or confidential information or material. If you are not the
> intended recipient, please delete this email and any attachments and notify
> the sender of the error.
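The fix Erick is hinting at is to drop the StopFilter from the analyzer chain, so that "the" keeps its position and the quoted phrase can match with zero slop. A minimal sketch of such a fieldType (the name and surrounding attributes are illustrative, not from the original schema; synonyms are applied at query time only):

```xml
<!-- Illustrative fieldType with stopword removal disabled -->
<fieldType name="text_no_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With no StopFilter, "to expand the methods for mailing cancellation" indexes "the" at its real position, so the exact phrase "expand the methods" lines up term for term. A full reindex is required after changing the index-time analyzer.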
RE: Query in quotes cannot find results
Hi Shawn,
Many thanks for the response, I checked the field and it is correct. Let's call it _text_ to make it easier. I believe the parsing is also correct, please see below:

- Query without quotes (works):
"querystring":"expand the methods",
"parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",

- Query with quotes (does not work):
"querystring":"\"expand the methods\"",
"parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",

The document has text:
"to expand the methods for mailing cancellation"

The analysis on this field shows that all words are present in the index and the query, and the order is also correct, but the word "methods" is moved one position; I guess that's why the result is not found.

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Shawn Heisey
Sent: Monday, June 29, 2020 6:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods <<< finds the document,
> the query (in quotes) q="expand the methods" <<< cannot find the document
>
> Am I doing something wrong, or is it a known bug (I saw similar issues
> discussed in the past, but not for exact match queries), and if yes, what is
> the Jira for it?

The most helpful information will come from running both queries with debug enabled, so you can see how the query is parsed. If you add a parameter "debugQuery=true" to the URL, then the response should include the parsed query. Compare those, and see if you can tell what the differences are.

One of the most common problems for queries like this is that you're not searching the field that you THINK you're searching. I don't know whether this is the problem, I just mention it because it is a common error.

Thanks,
Shawn
Re: Query in quotes cannot find results
On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods <<< finds the document,
> the query (in quotes) q="expand the methods" <<< cannot find the document
>
> Am I doing something wrong, or is it known bug (I saw similar issues
> discussed in the past, but not for exact match query) and if yes - what is
> the Jira for it?

The most helpful information will come from running both queries with debug enabled, so you can see how the query is parsed. If you add a parameter "debugQuery=true" to the URL, then the response should include the parsed query. Compare those, and see if you can tell what the differences are.

One of the most common problems for queries like this is that you're not searching the field that you THINK you're searching. I don't know whether this is the problem, I just mention it because it is a common error.

Thanks,
Shawn
Query in quotes cannot find results
Hi,
This might be a known issue, but I cannot find a reference for this specific case: searching for an exact query with synonyms and stopwords. I have a simple configuration for a catch-all field.

The synonyms.txt file has only one line:

expand,blow up

The stopwords.txt file has only one line:

the

There is only one document:

{ "id":"1", "title":"to expand the methods for mailing cancellation" }

Everything else is the default basic configuration. Tested with Solr 6.5.1 and Solr 8.5.2.

The basic query q=expand the methods <<< finds the document,
the query (in quotes) q="expand the methods" <<< cannot find the document

Am I doing something wrong, or is it a known bug (I saw similar issues discussed in the past, but not for exact match queries), and if yes, what is the Jira for it?

Best Regards,
Vadim Permakoff
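The field configuration itself was stripped by the mailing list. Based on the description (stopwords plus synonyms on a catch-all text field), it was presumably something along the lines of the stock text_general type; a hedged reconstruction, not the poster's exact config:

```xml
<!-- Illustrative analyzer chain matching the described setup -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, the StopFilter removes "the" but leaves a position gap, and the SynonymGraphFilter turns the quoted query into the SpanNearQuery shown in the debug output; the combination is what leaves "methods" one position away from where the zero-slop span expects it.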
Suggestion or recommendation for NRT
Hi,
We are using Solr 7.5.0. We are testing one collection for both search and indexing. Our collection was created with the below indexConfig. For indexing we use a Kafka Connect plugin with a commit every 5 minutes (cloud SolrJ), as below:

https://github.com/jcustenborder/kafka-connect-solr

Our collection has 30 shards and 3 replicas on EC2 nodes with good RAM (90 nodes); it is almost 2.5 TB in size. We can see a performance impact on search requests while indexing is in progress. Are there any recommendations or fine-tuning steps to be considered? Please provide any references if available; that will help.

150 8000 100 10 10 ${solr.lock.type:native} true

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
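The indexConfig XML above lost its tags in transit, so only the values survive. The standard starting point for this kind of search-vs-indexing contention (general advice, not from this thread; values are illustrative and workload-dependent) is to decouple durability from visibility in solrconfig.xml, so hard commits never open a searcher and soft commits control freshness:

```xml
<!-- Hard commit: flush the transaction log to disk often, but don't open a new searcher -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls when new docs become searchable; longer intervals mean
     fewer searcher reopens and less cache churn while indexing -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```

Relying on server-side autoSoftCommit rather than explicit client commits every 5 minutes tends to smooth out search latency, since each commit that opens a searcher discards and rewarms the caches.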
Re: Prefix + Suffix Wildcards in Searches
Hello, Chris.
I suppose index-time analysis can yield these terms: "paid", "ms-reply-unpaid", "ms-reply-paid", and thus let you avoid these expensive wildcard queries. Here's why it's worth avoiding them:

https://www.slideshare.net/lucidworks/search-like-sql-mikhail-khludnev-epam

On Mon, Jun 29, 2020 at 6:17 PM Chris Dempsey wrote:

> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
> I'm looking into options for optimizing something like this:
>
> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid*
>
> It's probably not a surprise that we're seeing performance issues with
> something like this. My understanding is that using the wildcard on both
> ends forces a full-text index search. Something like the above can't take
> advantage of something like the ReversedWildcardFilter either. I believe
> constructing `n-grams` is an option (*at the expense of index size*), but is
> there anything I'm overlooking as a possible avenue to look into?

--
Sincerely yours
Mikhail Khludnev
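If n-grams turn out to be the route, a sketch of a side field analyzed with NGramFilterFactory at index time (field name and gram sizes are illustrative, not from the thread), so a substring like "paid" becomes a plain term query instead of a double-ended wildcard:

```xml
<!-- Hypothetical n-gram field: tag_ngram:paid matches any tag containing "paid" -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side: no n-gramming, the user's substring is looked up as-is -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off Chris already named applies: every tag value is expanded into all its 3-to-20 character substrings, so the index grows considerably, but each filter query becomes a cheap term lookup.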
Re: How to determine why solr stops running?
Really look at your cache size settings. This is to eliminate this scenario:
- your cache sizes are very large
- when you looked and the memory was 9G, you also had a lot of cache entries
- there was a commit, which threw out the old cache and reduced your cache size

This is frankly kind of unlikely, but worth checking.

The other option is that you haven't been hitting OOMs at all and that's a complete red herring. Let's say that in actuality you only need an 8G heap, or even smaller. By over-allocating memory, garbage will simply accumulate for a long time, and when it is eventually collected, _lots_ of memory will be collected. Another rather unlikely scenario, but again worth checking.

Best,
Erick

> On Jun 29, 2020, at 3:27 PM, Ryan W wrote:
>
> On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson wrote:
>
>> ps aux | grep solr
>
> [solr@faspbsy0002 database-backups]$ ps aux | grep solr
> solr 72072 1.6 33.4 22847816 10966476 ? Sl 13:35 1:36 java
> -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
> -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
> -Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
> -Dsolr.install.dir=/opt/solr
> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> -jar start.jar --module=http
>
>> should show you all the parameters Solr is running with, as would the
>> admin screen. You should see something like:
>>
>> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>>
>> And there should be some logs laying around if that was the case,
>> similar to:
>> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
>
> This log is not being written. Even though in oom_solr.sh it does appear a
> solr_oom_killer-$SOLR_PORT-$NOW.log should be written to the logs directory,
> it isn't. There are some log files in /opt/solr/server/logs, and they are
> indeed being written to. There are fresh entries in the logs, but no sign
> of any problem. If I grep for "oom" in the logs directory, the only
> references I see are benign... just a few entries that list all the flags,
> and oom_solr.sh is among the settings visible in the entry. And someone did
> a search for "Mushroom," so there's another instance of "oom" from that
> search.
>
>> As for memory, It Depends (tm). There are configurations
>> you can make choices about that will affect the heap requirements.
>> You can't really draw comparisons between different projects. Your
>> Drupal + Solr app has how many documents? Indexed how? Searched
>> how? vs. this one.
>>
>> The usual suspects for configuration settings that are responsible
>> include:
>>
>> - filterCache size too large. Each filterCache entry is bounded by
>> maxDoc/8 bytes. I've seen people set this to over 1M...
>>
>> - using non-docValues fields for sorting, grouping, function queries
>> or faceting. Solr will uninvert the field on the heap, whereas if you have
>> specified docValues=true, the memory is out in OS memory space rather than
>> on the heap.
>>
>> - people just putting too many docs in a collection in a single JVM in
>> aggregate. All replicas in the same instance are using part of the heap.
>>
>> - having unnecessary options on your fields, although that's more MMap
>> space than heap.
>>
>> The problem basically is that all of Solr's access is essentially random,
>> so for performance reasons lots of stuff has to be in memory.
>>
>> That said, Solr hasn't been as careful as it should be about using up
>> memory; that's ongoing.
>>
>> If you really want to know what's using up memory, throw a heap analysis
>> tool at it. That'll give you a clue what's hogging memory and you can go
>> from there.
>>
>>> On Jun 29, 2020, at 1:48 PM, David Hastings wrote:
>>>
>>> little nit picky note here, use 31gb, never 32.
>>>
>>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:
>>>
>>>> It figures it would happen again a couple hours after I suggested the
>>>> issue might be resolved. Just now, Solr stopped running. I cleared the
>>>> cache in my app a couple times around the time that it happened, so
>>>> perhaps that was somehow too taxing for the server. However, I've never
>>>> allocated so much RAM to a website before, so it's odd that I'm getting
>>>> these failures. My colleagues were astonished when I said people on the
>>>> solr-user list were telling me I might need 32GB just for solr.
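Erick's first two suspects translate into concrete settings. A sketch with illustrative values (FastLRUCache was the usual filterCache class in the 7.x era; the field name is hypothetical, not from this thread):

```xml
<!-- solrconfig.xml: a modest filterCache; each entry can cost up to maxDoc/8 bytes -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>

<!-- managed-schema: docValues=true keeps sort/facet/group data off-heap, in OS cache -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
```

With a 10M-document core, each filterCache entry can take roughly 10M/8 ≈ 1.25 MB, so a size of 512 bounds the cache at around 640 MB; a size over 1M, as Erick mentions seeing, would be catastrophic. Note that docValues must be set before indexing the field.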
Re: How to determine why solr stops running?
On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson wrote:

> ps aux | grep solr

[solr@faspbsy0002 database-backups]$ ps aux | grep solr
solr 72072 1.6 33.4 22847816 10966476 ? Sl 13:35 1:36 java
-server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
-XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
-DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
-Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
-Dsolr.install.dir=/opt/solr
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
-Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-jar start.jar --module=http

> should show you all the parameters Solr is running with, as would the
> admin screen. You should see something like:
>
> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>
> And there should be some logs laying around if that was the case,
> similar to:
> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log

This log is not being written. Even though in oom_solr.sh it does appear a
solr_oom_killer-$SOLR_PORT-$NOW.log should be written to the logs directory,
it isn't. There are some log files in /opt/solr/server/logs, and they are
indeed being written to. There are fresh entries in the logs, but no sign of
any problem. If I grep for "oom" in the logs directory, the only references I
see are benign... just a few entries that list all the flags, and oom_solr.sh
is among the settings visible in the entry. And someone did a search for
"Mushroom," so there's another instance of "oom" from that search.

> As for memory, It Depends (tm). There are configurations
> you can make choices about that will affect the heap requirements.
> You can't really draw comparisons between different projects. Your
> Drupal + Solr app has how many documents? Indexed how? Searched
> how? vs. this one.
>
> The usual suspects for configuration settings that are responsible
> include:
>
> - filterCache size too large. Each filterCache entry is bounded by
> maxDoc/8 bytes. I've seen people set this to over 1M...
>
> - using non-docValues fields for sorting, grouping, function queries
> or faceting. Solr will uninvert the field on the heap, whereas if you have
> specified docValues=true, the memory is out in OS memory space rather than
> on the heap.
>
> - people just putting too many docs in a collection in a single JVM in
> aggregate. All replicas in the same instance are using part of the heap.
>
> - having unnecessary options on your fields, although that's more MMap
> space than heap.
>
> The problem basically is that all of Solr's access is essentially random,
> so for performance reasons lots of stuff has to be in memory.
>
> That said, Solr hasn't been as careful as it should be about using up
> memory; that's ongoing.
>
> If you really want to know what's using up memory, throw a heap analysis
> tool at it. That'll give you a clue what's hogging memory and you can go
> from there.
>
>> On Jun 29, 2020, at 1:48 PM, David Hastings wrote:
>>
>> little nit picky note here, use 31gb, never 32.
>>
>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:
>>
>>> It figures it would happen again a couple hours after I suggested the
>>> issue might be resolved. Just now, Solr stopped running. I cleared the
>>> cache in my app a couple times around the time that it happened, so
>>> perhaps that was somehow too taxing for the server. However, I've never
>>> allocated so much RAM to a website before, so it's odd that I'm getting
>>> these failures. My colleagues were astonished when I said people on the
>>> solr-user list were telling me I might need 32GB just for solr.
>>>
>>> I manage another project that uses Drupal + Solr, and we have a total of
>>> 8GB of RAM on that server and Solr never, ever stops. I've been managing
>>> that site for years and never seen a Solr outage. On that project,
>>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB
>>> or more?
>>>
>>> "The thing that's unsettling about this is that assuming you were hitting
>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>> clear evidence that that was the cause."
>>>
>>> How do I know if I'm running the OOM-killer script?
>>>
>>> Thank you.
>>>
>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:
>>>
>>>> The thing that's unsettling about this is that assuming you were hitting
>>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>>> clear evidence that that was the cause.
Re: How to determine why solr stops running?
Maybe you can identify some critical queries in the logfiles? What is the total size of the index? What client are you using on the web app side? Are you reusing clients, or creating a new one for every query?

> Am 29.06.2020 um 21:14 schrieb Ryan W:
>
> On Mon, Jun 29, 2020 at 1:49 PM David Hastings wrote:
>
>> little nit picky note here, use 31gb, never 32.
>
> Good to know.
>
> Just now I got this output from bin/solr status:
>
> "solr_home":"/opt/solr/server/solr",
> "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48",
> "startTime":"2020-06-29T17:35:13.966Z",
> "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
> "memory":"9.3 GB (%57.9) of 16 GB"}
>
> That's the highest memory use I've seen. Not sure if this indicates 16GB
> isn't enough. Then I ran it again a couple minutes later and it was down to
> 598.3 MB. I wonder what accounts for these wide swings. I can't imagine
> that if a few users are doing searches, suddenly it uses 9 GB of RAM.
>
>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:
>>
>>> It figures it would happen again a couple hours after I suggested the
>>> issue might be resolved. Just now, Solr stopped running. I cleared the
>>> cache in my app a couple times around the time that it happened, so
>>> perhaps that was somehow too taxing for the server. However, I've never
>>> allocated so much RAM to a website before, so it's odd that I'm getting
>>> these failures. My colleagues were astonished when I said people on the
>>> solr-user list were telling me I might need 32GB just for solr.
>>>
>>> I manage another project that uses Drupal + Solr, and we have a total of
>>> 8GB of RAM on that server and Solr never, ever stops. I've been managing
>>> that site for years and never seen a Solr outage. On that project,
>>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB
>>> or more?
>>>
>>> "The thing that's unsettling about this is that assuming you were hitting
>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>> clear evidence that that was the cause."
>>>
>>> How do I know if I'm running the OOM-killer script?
>>>
>>> Thank you.
>>>
>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:
>>>
>>>> The thing that's unsettling about this is that assuming you were hitting
>>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>>> clear evidence that that was the cause.
>>>>
>>>> If you were not running the killer script, then apologies for not asking
>>>> about that in the first place. Java's performance is unpredictable when
>>>> OOMs happen, which is the point of the killer script: at least Solr stops
>>>> rather than do something inexplicable.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>> On Jun 29, 2020, at 11:52 AM, David Hastings wrote:
>>>>>
>>>>> sometimes just throwing money/ram/ssd at the problem is just the best
>>>>> answer.
>>>>>
>>>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote:
>>>>>
>>>>>> Thanks everyone. Just to give an update on this issue, I bumped the RAM
>>>>>> available to Solr up to 16GB a couple weeks ago, and haven't had any
>>>>>> problem since.
>>>>>>
>>>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <hastings.recurs...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> me personally, around 290gb. as much as we could shove into them
>>>>>>>
>>>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <erickerick...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How much physical RAM? A rule of thumb is that you should allocate no
>>>>>>>> more than 25-50 percent of the total physical RAM to Solr. That's
>>>>>>>> cumulative, i.e. the sum of the heap allocations across all your JVMs
>>>>>>>> should be below that percentage. See Uwe Schindler's MMapDirectory
>>>>>>>> blog...
>>>>>>>>
>>>>>>>> Shot in the dark...
>>>>>>>>
>>>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <hastings.recurs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> To add to this, i generally have solr start with this:
>>>>>>>>> -Xms31000m -Xmx31000m
>>>>>>>>>
>>>>>>>>> and the only other thing that runs on them are maria db galera
>>>>>>>>> cluster nodes that are not in use (aside from replication)
>>>>>>>>>
>>>>>>>>> the 31gb is not an accident either, you dont want 32gb.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <apa...@elyograg.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 6/11/2020 11:52 AM, Ryan W wrote:
>>>>>>>>>>> I will check "dmesg" first, to find out any hardware error message.
>>>>>>>>>>
>>>>>>>>>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9
>>>>>>>>>>> or sacrifice child
>>>>>>>>>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>>>>>>>>>>> total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
Re: How to determine why solr stops running?
On Mon, Jun 29, 2020 at 1:49 PM David Hastings wrote:

> little nit picky note here, use 31gb, never 32.

Good to know.

Just now I got this output from bin/solr status:

"solr_home":"/opt/solr/server/solr",
"version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48",
"startTime":"2020-06-29T17:35:13.966Z",
"uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
"memory":"9.3 GB (%57.9) of 16 GB"}

That's the highest memory use I've seen. Not sure if this indicates 16GB isn't enough. Then I ran it again a couple minutes later and it was down to 598.3 MB. I wonder what accounts for these wide swings. I can't imagine that if a few users are doing searches, suddenly it uses 9 GB of RAM.

On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:

> It figures it would happen again a couple hours after I suggested the issue
> might be resolved. Just now, Solr stopped running. I cleared the cache in
> my app a couple times around the time that it happened, so perhaps that was
> somehow too taxing for the server. However, I've never allocated so much
> RAM to a website before, so it's odd that I'm getting these failures. My
> colleagues were astonished when I said people on the solr-user list were
> telling me I might need 32GB just for solr.
>
> I manage another project that uses Drupal + Solr, and we have a total of
> 8GB of RAM on that server and Solr never, ever stops. I've been managing
> that site for years and never seen a Solr outage. On that project,
> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
> more?
>
> "The thing that's unsettling about this is that assuming you were hitting
> OOMs, and were running the OOM-killer script, you _should_ have had very
> clear evidence that that was the cause."
>
> How do I know if I'm running the OOM-killer script?
>
> Thank you.
>
> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:
>
>> The thing that's unsettling about this is that assuming you were hitting
>> OOMs, and were running the OOM-killer script, you _should_ have had very
>> clear evidence that that was the cause.
>>
>> If you were not running the killer script, then apologies for not asking
>> about that in the first place. Java's performance is unpredictable when
>> OOMs happen, which is the point of the killer script: at least Solr stops
>> rather than do something inexplicable.
>>
>> Best,
>> Erick
>>
>>> On Jun 29, 2020, at 11:52 AM, David Hastings wrote:
>>>
>>> sometimes just throwing money/ram/ssd at the problem is just the best
>>> answer.
>>>
>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote:
>>>
>>>> Thanks everyone. Just to give an update on this issue, I bumped the RAM
>>>> available to Solr up to 16GB a couple weeks ago, and haven't had any
>>>> problem since.
>>>>
>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <hastings.recurs...@gmail.com>
>>>> wrote:
>>>>
>>>>> me personally, around 290gb. as much as we could shove into them
>>>>>
>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> How much physical RAM? A rule of thumb is that you should allocate no
>>>>>> more than 25-50 percent of the total physical RAM to Solr. That's
>>>>>> cumulative, i.e. the sum of the heap allocations across all your JVMs
>>>>>> should be below that percentage. See Uwe Schindler's MMapDirectory
>>>>>> blog...
>>>>>>
>>>>>> Shot in the dark...
>>>>>>
>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <hastings.recurs...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> To add to this, i generally have solr start with this:
>>>>>>> -Xms31000m -Xmx31000m
>>>>>>>
>>>>>>> and the only other thing that runs on them are maria db galera cluster
>>>>>>> nodes that are not in use (aside from replication)
>>>>>>>
>>>>>>> the 31gb is not an accident either, you dont want 32gb.
>>>>>>>
>>>>>>> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <apa...@elyograg.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 6/11/2020 11:52 AM, Ryan W wrote:
>>>>>>>>> I will check "dmesg" first, to find out any hardware error message.
>>>>>>>>
>>>>>>>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9
>>>>>>>>> or sacrifice child
>>>>>>>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>>>>>>>>> total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>>>>>>>>>
>>>>>>>>> Is this a relevant "Out of memory" message? Does this suggest an OOM
>>>>>>>>> situation is the culprit?
Re: How to determine why solr stops running?
ps aux | grep solr

should show you all the parameters Solr is running with, as would the admin screen. You should see something like:

-XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh

And there should be some logs laying around if that was the case, similar to:

$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log

As for memory, It Depends (tm). There are configurations you can make choices about that will affect the heap requirements. You can't really draw comparisons between different projects. Your Drupal + Solr app has how many documents? Indexed how? Searched how? vs. this one.

The usual suspects for configuration settings that are responsible include:

- filterCache size too large. Each filterCache entry is bounded by maxDoc/8 bytes. I've seen people set this to over 1M...

- using non-docValues fields for sorting, grouping, function queries or faceting. Solr will uninvert the field on the heap, whereas if you have specified docValues=true, the memory is out in OS memory space rather than on the heap.

- people just putting too many docs in a collection in a single JVM in aggregate. All replicas in the same instance are using part of the heap.

- having unnecessary options on your fields, although that's more MMap space than heap.

The problem basically is that all of Solr's access is essentially random, so for performance reasons lots of stuff has to be in memory.

That said, Solr hasn't been as careful as it should be about using up memory; that's ongoing.

If you really want to know what's using up memory, throw a heap analysis tool at it. That'll give you a clue what's hogging memory and you can go from there.

> On Jun 29, 2020, at 1:48 PM, David Hastings wrote:
>
> little nit picky note here, use 31gb, never 32.
>
> On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:
>
>> It figures it would happen again a couple hours after I suggested the issue
>> might be resolved. Just now, Solr stopped running. I cleared the cache in
>> my app a couple times around the time that it happened, so perhaps that was
>> somehow too taxing for the server. However, I've never allocated so much
>> RAM to a website before, so it's odd that I'm getting these failures. My
>> colleagues were astonished when I said people on the solr-user list were
>> telling me I might need 32GB just for solr.
>>
>> I manage another project that uses Drupal + Solr, and we have a total of
>> 8GB of RAM on that server and Solr never, ever stops. I've been managing
>> that site for years and never seen a Solr outage. On that project,
>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
>> more?
>>
>> "The thing that's unsettling about this is that assuming you were hitting
>> OOMs, and were running the OOM-killer script, you _should_ have had very
>> clear evidence that that was the cause."
>>
>> How do I know if I'm running the OOM-killer script?
>>
>> Thank you.
>>
>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:
>>
>>> The thing that's unsettling about this is that assuming you were hitting
>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>> clear evidence that that was the cause.
>>>
>>> If you were not running the killer script, then apologies for not asking
>>> about that in the first place. Java's performance is unpredictable when
>>> OOMs happen, which is the point of the killer script: at least Solr stops
>>> rather than do something inexplicable.
>>>
>>> Best,
>>> Erick
>>>
>>>> On Jun 29, 2020, at 11:52 AM, David Hastings wrote:
>>>>
>>>> sometimes just throwing money/ram/ssd at the problem is just the best
>>>> answer.
>>>>
>>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote:
>>>>
>>>>> Thanks everyone. Just to give an update on this issue, I bumped the RAM
>>>>> available to Solr up to 16GB a couple weeks ago, and haven't had any
>>>>> problem since.
>>>>>
>>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <hastings.recurs...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> me personally, around 290gb. as much as we could shove into them
>>>>>>
>>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <erickerick...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> How much physical RAM? A rule of thumb is that you should allocate no
>>>>>>> more than 25-50 percent of the total physical RAM to Solr. That's
>>>>>>> cumulative, i.e. the sum of the heap allocations across all your JVMs
>>>>>>> should be below that percentage. See Uwe Schindler's MMapDirectory
>>>>>>> blog...
>>>>>>>
>>>>>>> Shot in the dark...
>>>>>>>
>>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <hastings.recurs...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To add to this, i generally have solr start with this:
>>>>>>>> -Xms31000m -Xmx31000m
>>>>>>>>
>>>>>>>> and the only other thing that runs on them are maria db galera cluster
>>>>>>>> nodes that are not in use (aside from replication)
Re: How to determine why solr stops running?
little nit picky note here, use 31gb, never 32. On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote: > It figures it would happen again a couple hours after I suggested the issue > might be resolved. Just now, Solr stopped running. I cleared the cache in > my app a couple times around the time that it happened, so perhaps that was > somehow too taxing for the server. However, I've never allocated so much > RAM to a website before, so it's odd that I'm getting these failures. My > colleagues were astonished when I said people on the solr-user list were > telling me I might need 32GB just for solr. > > I manage another project that uses Drupal + Solr, and we have a total of > 8GB of RAM on that server and Solr never, ever stops. I've been managing > that site for years and never seen a Solr outage. On that project, > Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or > more? > > "The thing that’s unsettling about this is that assuming you were hitting > OOMs, and were running the OOM-killer script, you _should_ have had very > clear evidence that that was the cause." > > How do I know if I'm running the OOM-killer script? > > Thank you. > > On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson > wrote: > > > The thing that’s unsettling about this is that assuming you were hitting > > OOMs, > > and were running the OOM-killer script, you _should_ have had very clear > > evidence that that was the cause. > > > > If you were not running the killer script, the apologies for not asking > > about that > > in the first place. Java’s performance is unpredictable when OOMs happen, > > which is the point of the killer script: at least Solr stops rather than > do > > something inexplicable. > > > > Best, > > Erick > > > > > On Jun 29, 2020, at 11:52 AM, David Hastings < > > hastings.recurs...@gmail.com> wrote: > > > > > > sometimes just throwing money/ram/ssd at the problem is just the best > > > answer. 
> > > > > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote: > > > > > >> Thanks everyone. Just to give an update on this issue, I bumped the > RAM > > >> available to Solr up to 16GB a couple weeks ago, and haven’t had any > > >> problem since. > > >> > > >> > > >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings < > > >> hastings.recurs...@gmail.com> > > >> wrote: > > >> > > >>> me personally, around 290gb. as much as we could shove into them > > >>> > > >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson < > > erickerick...@gmail.com > > >>> > > >>> wrote: > > >>> > > How much physical RAM? A rule of thumb is that you should allocate > no > > >>> more > > than 25-50 percent of the total physical RAM to Solr. That's > > >> cumulative, > > i.e. the sum of the heap allocations across all your JVMs should be > > >> below > > that percentage. See Uwe Schindler's mmapdirectiry blog... > > > > Shot in the dark... > > > > On Tue, Jun 16, 2020, 11:51 David Hastings < > > >> hastings.recurs...@gmail.com > > > > wrote: > > > > > To add to this, i generally have solr start with this: > > > -Xms31000m-Xmx31000m > > > > > > and the only other thing that runs on them are maria db gallera > > >> cluster > > > nodes that are not in use (aside from replication) > > > > > > the 31gb is not an accident either, you dont want 32gb. > > > > > > > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey > > > wrote: > > > > > >> On 6/11/2020 11:52 AM, Ryan W wrote: > > I will check "dmesg" first, to find out any hardware error > > >>> message. > > >> > > >> > > >> > > >>> [1521232.781801] Out of memory: Kill process 117529 (httpd) > > >> score 9 > > or > > >>> sacrifice child > > >>> [1521232.782908] Killed process 117529 (httpd), UID 48, > > >> total-vm:675824kB, > > >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB > > >>> > > >>> Is this a relevant "Out of memory" message? Does this suggest an > > >>> OOM > > >>> situation is the culprit? 
> > >> > > >> Because this was in the "dmesg" output, it indicates that it is > the > > >> operating system killing programs because the *system* doesn't > have > > >>> any > > >> memory left. It wasn't Java that did this, and it wasn't Solr > that > > >>> was > > >> killed. It very well could have been Solr that was killed at > > >> another > > >> time, though. > > >> > > >> The process that it killed this time is named httpd ... which is > > >> most > > >> likely the Apache webserver. Because the UID is 48, this is > > >> probably > > an > > >> OS derived from Redhat, where the "apache" user has UID and GID 48 > > >> by > > >> default. Apache with its default config can be VERY memory hungry > > >>> when > > >> it gets busy. > > >> > > >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 > > >> > > >> This says that you started Solr with the
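On the “use 31gb, never 32” nit above: HotSpot can use compressed ordinary object pointers (compressed oops) only while the max heap stays below the ~32 GB boundary, so a 32 GB heap addresses objects less efficiently than a ~31 GB one. A minimal sketch of how that advice usually lands in `solr.in.sh` (the variable name is Solr's standard one; the exact sizes are just this thread's suggestion, not a recommendation for every install):

```shell
# solr.in.sh (sketch): keep the heap under 32 GB so compressed oops
# stay enabled; identical Xms/Xmx avoids runtime heap resizing.
SOLR_JAVA_MEM="-Xms31g -Xmx31g"

# To confirm compressed oops are in effect for a given size (needs a JDK):
#   java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```

The 25-50% of physical RAM rule quoted in this thread still applies on top of this ceiling: the rest of RAM is left to the OS page cache for the index files.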
Re: How to determine why solr stops running?
It figures it would happen again a couple hours after I suggested the issue might be resolved. Just now, Solr stopped running. I cleared the cache in my app a couple times around the time that it happened, so perhaps that was somehow too taxing for the server. However, I've never allocated so much RAM to a website before, so it's odd that I'm getting these failures. My colleagues were astonished when I said people on the solr-user list were telling me I might need 32GB just for solr. I manage another project that uses Drupal + Solr, and we have a total of 8GB of RAM on that server and Solr never, ever stops. I've been managing that site for years and never seen a Solr outage. On that project, Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or more? "The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause." How do I know if I'm running the OOM-killer script? Thank you. On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote: > The thing that’s unsettling about this is that assuming you were hitting > OOMs, > and were running the OOM-killer script, you _should_ have had very clear > evidence that that was the cause. > > If you were not running the killer script, the apologies for not asking > about that > in the first place. Java’s performance is unpredictable when OOMs happen, > which is the point of the killer script: at least Solr stops rather than do > something inexplicable. > > Best, > Erick > > > On Jun 29, 2020, at 11:52 AM, David Hastings < > hastings.recurs...@gmail.com> wrote: > > > > sometimes just throwing money/ram/ssd at the problem is just the best > > answer. > > > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote: > > > >> Thanks everyone. Just to give an update on this issue, I bumped the RAM > >> available to Solr up to 16GB a couple weeks ago, and haven’t had any > >> problem since. 
> >> > >> > >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings < > >> hastings.recurs...@gmail.com> > >> wrote: > >> > >>> me personally, around 290gb. as much as we could shove into them > >>> > >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson < > erickerick...@gmail.com > >>> > >>> wrote: > >>> > How much physical RAM? A rule of thumb is that you should allocate no > >>> more > than 25-50 percent of the total physical RAM to Solr. That's > >> cumulative, > i.e. the sum of the heap allocations across all your JVMs should be > >> below > that percentage. See Uwe Schindler's mmapdirectiry blog... > > Shot in the dark... > > On Tue, Jun 16, 2020, 11:51 David Hastings < > >> hastings.recurs...@gmail.com > > wrote: > > > To add to this, i generally have solr start with this: > > -Xms31000m-Xmx31000m > > > > and the only other thing that runs on them are maria db gallera > >> cluster > > nodes that are not in use (aside from replication) > > > > the 31gb is not an accident either, you dont want 32gb. > > > > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey > wrote: > > > >> On 6/11/2020 11:52 AM, Ryan W wrote: > I will check "dmesg" first, to find out any hardware error > >>> message. > >> > >> > >> > >>> [1521232.781801] Out of memory: Kill process 117529 (httpd) > >> score 9 > or > >>> sacrifice child > >>> [1521232.782908] Killed process 117529 (httpd), UID 48, > >> total-vm:675824kB, > >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB > >>> > >>> Is this a relevant "Out of memory" message? Does this suggest an > >>> OOM > >>> situation is the culprit? > >> > >> Because this was in the "dmesg" output, it indicates that it is the > >> operating system killing programs because the *system* doesn't have > >>> any > >> memory left. It wasn't Java that did this, and it wasn't Solr that > >>> was > >> killed. It very well could have been Solr that was killed at > >> another > >> time, though. > >> > >> The process that it killed this time is named httpd ... 
which is > >> most > >> likely the Apache webserver. Because the UID is 48, this is > >> probably > an > >> OS derived from Redhat, where the "apache" user has UID and GID 48 > >> by > >> default. Apache with its default config can be VERY memory hungry > >>> when > >> it gets busy. > >> > >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 > >> > >> This says that you started Solr with the default 512MB heap. Which > >>> is > >> VERY VERY small. The default is small so that Solr will start on > >> virtually any hardware. Almost every user must increase the heap > >>> size. > >> And because the OS is killing processes, it is likely that the > >> system > >> does not have enough memory installed for what you have running on > >>> it. >
Re: How to determine why solr stops running?
The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause. If you were not running the killer script, the apologies for not asking about that in the first place. Java’s performance is unpredictable when OOMs happen, which is the point of the killer script: at least Solr stops rather than do something inexplicable. Best, Erick > On Jun 29, 2020, at 11:52 AM, David Hastings > wrote: > > sometimes just throwing money/ram/ssd at the problem is just the best > answer. > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote: > >> Thanks everyone. Just to give an update on this issue, I bumped the RAM >> available to Solr up to 16GB a couple weeks ago, and haven’t had any >> problem since. >> >> >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings < >> hastings.recurs...@gmail.com> >> wrote: >> >>> me personally, around 290gb. as much as we could shove into them >>> >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson >> >>> wrote: >>> How much physical RAM? A rule of thumb is that you should allocate no >>> more than 25-50 percent of the total physical RAM to Solr. That's >> cumulative, i.e. the sum of the heap allocations across all your JVMs should be >> below that percentage. See Uwe Schindler's mmapdirectiry blog... Shot in the dark... On Tue, Jun 16, 2020, 11:51 David Hastings < >> hastings.recurs...@gmail.com wrote: > To add to this, i generally have solr start with this: > -Xms31000m-Xmx31000m > > and the only other thing that runs on them are maria db gallera >> cluster > nodes that are not in use (aside from replication) > > the 31gb is not an accident either, you dont want 32gb. > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey wrote: > >> On 6/11/2020 11:52 AM, Ryan W wrote: I will check "dmesg" first, to find out any hardware error >>> message. 
>> >> >> >>> [1521232.781801] Out of memory: Kill process 117529 (httpd) >> score 9 or >>> sacrifice child >>> [1521232.782908] Killed process 117529 (httpd), UID 48, >> total-vm:675824kB, >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB >>> >>> Is this a relevant "Out of memory" message? Does this suggest an >>> OOM >>> situation is the culprit? >> >> Because this was in the "dmesg" output, it indicates that it is the >> operating system killing programs because the *system* doesn't have >>> any >> memory left. It wasn't Java that did this, and it wasn't Solr that >>> was >> killed. It very well could have been Solr that was killed at >> another >> time, though. >> >> The process that it killed this time is named httpd ... which is >> most >> likely the Apache webserver. Because the UID is 48, this is >> probably an >> OS derived from Redhat, where the "apache" user has UID and GID 48 >> by >> default. Apache with its default config can be VERY memory hungry >>> when >> it gets busy. >> >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 >> >> This says that you started Solr with the default 512MB heap. Which >>> is >> VERY VERY small. The default is small so that Solr will start on >> virtually any hardware. Almost every user must increase the heap >>> size. >> And because the OS is killing processes, it is likely that the >> system >> does not have enough memory installed for what you have running on >>> it. >> >> It is generally not a good idea to share the server hardware >> between >> Solr and other software, unless the system has a lot of spare resources, >> memory in particular. >> >> Thanks, >> Shawn >> > >>> >>
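To answer the earlier “how do I know if the OS killed Solr” question concretely, the dmesg check Shawn walks through can be scripted. A self-contained sketch (the log lines below are the sample from this thread written to a temp file; on a live box you would pipe `dmesg` itself through the same grep):

```shell
# Save the sample kernel log lines from this thread, then search them
# the same way you would search real `dmesg` output.
cat > /tmp/dmesg_sample.txt <<'EOF'
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB
[1521250.000000] usb 1-1: new high-speed USB device
EOF

# Any match means the *operating system* ran out of memory and killed a
# process; the name in parentheses tells you whether the victim was Solr
# (a java process) or something else, like httpd here.
grep -iE 'out of memory|killed process' /tmp/dmesg_sample.txt
```

This is distinct from Solr's own OOM-killer script, which fires on a Java-level OutOfMemoryError and leaves its trace in Solr's logs directory rather than in the kernel log.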
Re: Prefix + Suffix Wildcards in Searches
I was afraid of “totally arbitrary” OK, this field type is going to surprise the heck out of you. Whitespace tokenizer is really stupid. It’ll include punctuation for instance. Take a look at the admin UI/analysis page and pick your field and put some creative entries in and you’ll see what I mean. So let’s get some use-cases in place. Can users enter tags like blahms-reply-unpaidnonsense and expect to find it with *ms-reply-unpaid*? Or is the entry something like my dog has ms-reply-unpaid and is mangy ? If the latter, simple token searching will work fine, there’s no need for wildcards at all. FWIW, Erick > On Jun 29, 2020, at 11:46 AM, Chris Dempsey wrote: > > First off, thanks for taking a look, Erick! I see you helping lots of folks > out here and I've learned a lot from your answers. Much appreciated! > >> How regular are your patterns? Are they arbitrary? > > Good question. :) That's data that I should have included in the initial > post but both the values in the `tag` field and the search query itself are > totally arbitrary (*i.e. user entered values*). I see where you're going if > the set of either part was limited. > >> What’s the field type anyway? Is this field tokenized? > > multiValued="true"/> > > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > > > > preserveOriginal="true" /> > > > withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" > maxFractionAsterisk="0"/> > > > > > > > > On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson > wrote: > >> How regular are your patterns? Are they arbitrary? >> What I’m wondering is if you could shift your work the the >> indexing end, perhaps even in an auxiliary field. Could you, >> say, just index “paid”, “ms-reply-unpaid” etc? Then there >> are no wildcards at all. This akin to “concept search”. >> >> Otherwise ngramming is your best bet. >> >> What’s the field type anyway? Is this field tokenized? 
>> >> There are lots of options, but s much depends on whether >> you can process the data such that you won’t need wildcards. >> >> Best, >> Erick >> >>> On Jun 29, 2020, at 11:16 AM, Chris Dempsey wrote: >>> >>> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) >> but >>> I'm looking into options for optimizing something like this: >>> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR >>> tag:*ms-reply-paid* >>> >>> It's probably not a surprise that we're seeing performance issues with >>> something like this. My understanding is that using the wildcard on both >>> ends forces a full-text index search. Something like the above can't take >>> advantage of something like the ReverseWordFilter either. I believe >>> constructing `n-grams` is an option (*at the expense of index size*) but >> is >>> there anything I'm overlooking as a possible avenue to look into? >> >>
Re: How to determine why solr stops running?
sometimes just throwing money/ram/ssd at the problem is just the best answer. On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote: > Thanks everyone. Just to give an update on this issue, I bumped the RAM > available to Solr up to 16GB a couple weeks ago, and haven’t had any > problem since. > > > On Tue, Jun 16, 2020 at 1:00 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > me personally, around 290gb. as much as we could shove into them > > > > On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson > > > wrote: > > > > > How much physical RAM? A rule of thumb is that you should allocate no > > more > > > than 25-50 percent of the total physical RAM to Solr. That's > cumulative, > > > i.e. the sum of the heap allocations across all your JVMs should be > below > > > that percentage. See Uwe Schindler's mmapdirectiry blog... > > > > > > Shot in the dark... > > > > > > On Tue, Jun 16, 2020, 11:51 David Hastings < > hastings.recurs...@gmail.com > > > > > > wrote: > > > > > > > To add to this, i generally have solr start with this: > > > > -Xms31000m-Xmx31000m > > > > > > > > and the only other thing that runs on them are maria db gallera > cluster > > > > nodes that are not in use (aside from replication) > > > > > > > > the 31gb is not an accident either, you dont want 32gb. > > > > > > > > > > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey > > > wrote: > > > > > > > > > On 6/11/2020 11:52 AM, Ryan W wrote: > > > > > >> I will check "dmesg" first, to find out any hardware error > > message. > > > > > > > > > > > > > > > > > > > > > [1521232.781801] Out of memory: Kill process 117529 (httpd) > score 9 > > > or > > > > > > sacrifice child > > > > > > [1521232.782908] Killed process 117529 (httpd), UID 48, > > > > > total-vm:675824kB, > > > > > > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB > > > > > > > > > > > > Is this a relevant "Out of memory" message? Does this suggest an > > OOM > > > > > > situation is the culprit? 
> > > > > > > > > > Because this was in the "dmesg" output, it indicates that it is the > > > > > operating system killing programs because the *system* doesn't have > > any > > > > > memory left. It wasn't Java that did this, and it wasn't Solr that > > was > > > > > killed. It very well could have been Solr that was killed at > another > > > > > time, though. > > > > > > > > > > The process that it killed this time is named httpd ... which is > most > > > > > likely the Apache webserver. Because the UID is 48, this is > probably > > > an > > > > > OS derived from Redhat, where the "apache" user has UID and GID 48 > by > > > > > default. Apache with its default config can be VERY memory hungry > > when > > > > > it gets busy. > > > > > > > > > > > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 > > > > > > > > > > This says that you started Solr with the default 512MB heap. Which > > is > > > > > VERY VERY small. The default is small so that Solr will start on > > > > > virtually any hardware. Almost every user must increase the heap > > size. > > > > > And because the OS is killing processes, it is likely that the > system > > > > > does not have enough memory installed for what you have running on > > it. > > > > > > > > > > It is generally not a good idea to share the server hardware > between > > > > > Solr and other software, unless the system has a lot of spare > > > resources, > > > > > memory in particular. > > > > > > > > > > Thanks, > > > > > Shawn > > > > > > > > > > > > > > >
Re: Prefix + Suffix Wildcards in Searches
First off, thanks for taking a look, Erick! I see you helping lots of folks out here and I've learned a lot from your answers. Much appreciated! > How regular are your patterns? Are they arbitrary? Good question. :) That's data that I should have included in the initial post but both the values in the `tag` field and the search query itself are totally arbitrary (*i.e. user entered values*). I see where you're going if the set of either part was limited. > What’s the field type anyway? Is this field tokenized? On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson wrote: > How regular are your patterns? Are they arbitrary? > What I’m wondering is if you could shift your work the the > indexing end, perhaps even in an auxiliary field. Could you, > say, just index “paid”, “ms-reply-unpaid” etc? Then there > are no wildcards at all. This akin to “concept search”. > > Otherwise ngramming is your best bet. > > What’s the field type anyway? Is this field tokenized? > > There are lots of options, but s much depends on whether > you can process the data such that you won’t need wildcards. > > Best, > Erick > > > On Jun 29, 2020, at 11:16 AM, Chris Dempsey wrote: > > > > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) > but > > I'm looking into options for optimizing something like this: > > > >> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR > > tag:*ms-reply-paid* > > > > It's probably not a surprise that we're seeing performance issues with > > something like this. My understanding is that using the wildcard on both > > ends forces a full-text index search. Something like the above can't take > > advantage of something like the ReverseWordFilter either. I believe > > constructing `n-grams` is an option (*at the expense of index size*) but > is > > there anything I'm overlooking as a possible avenue to look into? > >
Re: How to determine why solr stops running?
Thanks everyone. Just to give an update on this issue, I bumped the RAM available to Solr up to 16GB a couple weeks ago, and haven’t had any problem since. On Tue, Jun 16, 2020 at 1:00 PM David Hastings wrote: > me personally, around 290gb. as much as we could shove into them > > On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson > wrote: > > > How much physical RAM? A rule of thumb is that you should allocate no > more > > than 25-50 percent of the total physical RAM to Solr. That's cumulative, > > i.e. the sum of the heap allocations across all your JVMs should be below > > that percentage. See Uwe Schindler's mmapdirectiry blog... > > > > Shot in the dark... > > > > On Tue, Jun 16, 2020, 11:51 David Hastings > > > wrote: > > > > > To add to this, i generally have solr start with this: > > > -Xms31000m-Xmx31000m > > > > > > and the only other thing that runs on them are maria db gallera cluster > > > nodes that are not in use (aside from replication) > > > > > > the 31gb is not an accident either, you dont want 32gb. > > > > > > > > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey > > wrote: > > > > > > > On 6/11/2020 11:52 AM, Ryan W wrote: > > > > >> I will check "dmesg" first, to find out any hardware error > message. > > > > > > > > > > > > > > > > > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 > > or > > > > > sacrifice child > > > > > [1521232.782908] Killed process 117529 (httpd), UID 48, > > > > total-vm:675824kB, > > > > > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB > > > > > > > > > > Is this a relevant "Out of memory" message? Does this suggest an > OOM > > > > > situation is the culprit? > > > > > > > > Because this was in the "dmesg" output, it indicates that it is the > > > > operating system killing programs because the *system* doesn't have > any > > > > memory left. It wasn't Java that did this, and it wasn't Solr that > was > > > > killed. It very well could have been Solr that was killed at another > > > > time, though. 
> > > > > > > > The process that it killed this time is named httpd ... which is most > > > > likely the Apache webserver. Because the UID is 48, this is probably > > an > > > > OS derived from Redhat, where the "apache" user has UID and GID 48 by > > > > default. Apache with its default config can be VERY memory hungry > when > > > > it gets busy. > > > > > > > > > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 > > > > > > > > This says that you started Solr with the default 512MB heap. Which > is > > > > VERY VERY small. The default is small so that Solr will start on > > > > virtually any hardware. Almost every user must increase the heap > size. > > > > And because the OS is killing processes, it is likely that the system > > > > does not have enough memory installed for what you have running on > it. > > > > > > > > It is generally not a good idea to share the server hardware between > > > > Solr and other software, unless the system has a lot of spare > > resources, > > > > memory in particular. > > > > > > > > Thanks, > > > > Shawn > > > > > > > > > >
Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr
On 28/06/2020 14:42, Erick Erickson wrote: > We need to draw a sharp distinction between standalone “going away” > in terms of our internal code and going away in terms of the user > experience. It'll be hard to make it completely transparent in terms of user experience. For instance, there is currently no way to unload a core in SolrCloud (without deleting it). I'm sure there are many other similar gotchas. - Bram
Re: Prefix + Suffix Wildcards in Searches
How regular are your patterns? Are they arbitrary? What I’m wondering is if you could shift your work to the indexing end, perhaps even in an auxiliary field. Could you, say, just index “paid”, “ms-reply-unpaid” etc? Then there are no wildcards at all. This is akin to “concept search”. Otherwise ngramming is your best bet. What’s the field type anyway? Is this field tokenized? There are lots of options, but so much depends on whether you can process the data such that you won’t need wildcards. Best, Erick > On Jun 29, 2020, at 11:16 AM, Chris Dempsey wrote: > > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but > I'm looking into options for optimizing something like this: > >> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR > tag:*ms-reply-paid* > > It's probably not a surprise that we're seeing performance issues with > something like this. My understanding is that using the wildcard on both > ends forces a full-text index search. Something like the above can't take > advantage of something like the ReverseWordFilter either. I believe > constructing `n-grams` is an option (*at the expense of index size*) but is > there anything I'm overlooking as a possible avenue to look into?
Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr
Wandering off topic, but still apropos Solr. On Sun, Jun 28, 2020 at 12:14:56PM +0200, Ilan Ginzburg wrote: > I disagree Ishan. We shouldn't get rid of standalone mode. > I see three layers in Solr: > >1. Lucene (the actual search libraries) >2. The server infra ("standalone Solr" basically) >3. Cluster management (SolrCloud) > > There's value in using lower layers without higher ones. > SolrCloud is a good solution for some use cases but there are others that > need a search server and for which SolrCloud is not a good fit and will > likely never be. If standalone mode is no longer available, such use cases > will have to turn to something other than Solr (or fork and go their own > way). A data point: While working to upgrade a dependent product from Solr 4 to Solr 7, I came across a number of APIs which would have made things simpler, neater and more reliable...except that they all are available *only* in SolrCloud. I eventually decided that asking thousands of sites to run "degenerate" SolrCloud clusters (of a single instance, plus the ZK stuff that most would find mysterious) was just not worth the gain. So, my wish-list for Solr includes either (a) abolish standalone so the decision is taken out of my hands, or (b) port some of the cloud-only APIs back to the standalone layer. I haven't spent a moment's thought on how difficult either would be -- as I said, just a wish. -- Mark H. Wood Lead Technology Analyst University Library Indiana University - Purdue University Indianapolis 755 W. Michigan Street Indianapolis, IN 46202 317-274-0749 www.ulib.iupui.edu
Prefix + Suffix Wildcards in Searches
Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but I'm looking into options for optimizing something like this: > fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid* It's probably not a surprise that we're seeing performance issues with something like this. My understanding is that using the wildcard on both ends forces a full-text index search. Something like the above can't take advantage of something like the ReverseWordFilter either. I believe constructing `n-grams` is an option (*at the expense of index size*) but is there anything I'm overlooking as a possible avenue to look into?
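To make the n-gram trade-off concrete: an n-gram filter (likely `NGramFilterFactory` with `minGramSize`/`maxGramSize`, though the exact analyzer choice is an assumption here) breaks each tag into fixed-size substrings at index time, so an infix pattern like `*ms-reply-unpaid*` becomes lookups of ordinary indexed terms instead of a scan over every term in the field. A rough illustration of what would get indexed for one tag, using plain bash string slicing to stand in for the analyzer (gram size 3 is an arbitrary choice):

```shell
# Emit the 3-character grams of one tag -- roughly the extra terms an
# n-gram filter with minGramSize=3 maxGramSize=3 would index for it.
tag="ms-reply-unpaid"
for ((i = 0; i <= ${#tag} - 3; i++)); do
  printf '%s\n' "${tag:i:3}"
done
```

A query for `*unpaid*` then only needs the grams of "unpaid" to match in order, which the index resolves with cheap term lookups; the cost is the index-size growth already noted, since every tag contributes length-minus-two extra terms at gram size 3.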
Announcing ApacheCon @Home 2020
Hi, Apache enthusiast! (You’re receiving this because you’re subscribed to one or more dev or user mailing lists for an Apache Software Foundation project.) The ApacheCon Planners and the Apache Software Foundation are pleased to announce that ApacheCon @Home will be held online, September 29th through October 1st, 2020. We’ll be featuring content from dozens of our projects, as well as content about community, how Apache works, business models around Apache software, the legal aspects of open source, and many other topics. Full details about the event, and registration, are available at https://apachecon.com/acah2020 Due to the confusion around how and where this event was going to be held, and in order to open up to presenters from around the world who may previously have been unable or unwilling to travel, we’ve reopened the Call For Presentations until July 13th. Submit your talks today at https://acna2020.jamhosted.net/ We hope to see you at the event! Rich Bowen, VP Conferences, The Apache Software Foundation
Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr
Please start another thread to discuss removal of standalone mode, and stay on-topic in this one. > On Jun 28, 2020, at 14:42, Erick Erickson wrote: > > We need to draw a sharp distinction between standalone “going away” > in terms of our internal code and going away in terms of the user > experience. > > Usually when we’re talking about standalone going away, it’s the > former. The assumption is that we’ll use an embedded ZK that > fires up automatically so Solr behaves very similarly to how it > behaves in the current standalone mode just without all the > if (zkHost == null) do_something else do_something_else > > I wonder if the slickest way to use embedded ZK would be to > populate the embedded ZK during core discovery > > Erick > > > >> On Jun 28, 2020, at 6:40 AM, Ishan Chattopadhyaya >> wrote: >> >> Cost of maintaining feature parity across the two modes is an overhead. >> Security plugins, package manager (that doesn't work in standalone), UI, >> etc. Our codebase is littered with checks to ascertain if we're zkAware. >> There are massive benefits to maintainability if standalone mode were to go >> away. Of course, provided all usecases that could be solved using >> standalone can also be solved using SolrCloud. At that point, I'd love for >> us to get rid of the term "SolrCloud". >> >> On Sun, 28 Jun, 2020, 3:59 pm Ishan Chattopadhyaya, < >> ichattopadhy...@gmail.com> wrote: >> >>> I would like to know under which situations (except for the various bugs >>> that will be fixed eventually) would a SolrCloud solution not suffice. >>> AFAICT, pull replicas and tlog replicas can provide similar replication >>> strategies commonly used with standalone Solr. I understand that running ZK >>> is an overhead and SolrCloud isn't best written when it comes to handling >>> ZK, but that can be improved. >>> >>> And for those users who just want a single node Solr, they can just start >>> Solr with embedded ZK. It won't practically make difference.
>>> >>> On Sun, 28 Jun, 2020, 3:45 pm Ilan Ginzburg, wrote: >>> I disagree Ishan. We shouldn't get rid of standalone mode. I see three layers in Solr: 1. Lucene (the actual search libraries) 2. The server infra ("standalone Solr" basically) 3. Cluster management (SolrCloud) There's value in using lower layers without higher ones. SolrCloud is a good solution for some use cases but there are others that need a search server and for which SolrCloud is not a good fit and will likely never be. If standalone mode is no longer available, such use cases will have to turn to something other than Solr (or fork and go their own way). Ilan On Sat, Jun 27, 2020 at 9:39 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Rather than getting rid of the terminology, we should get rid of the > standalone mode Solr altogether. I totally understand that SolrCloud is > broken in many ways today, but we should attempt to fix it and have it as > the only mode in Solr. > > On Wed, 24 Jun, 2020, 8:17 pm Mike Drob, wrote: > >> Brend, >> >> I appreciate that you are trying to examine this issue from multiple > sides >> and consider future implications, but I don’t think that is a stirring >> argument. By analogy, if we are out of eggs and my wife asks me to go to >> the store to get some, refusing to do so on the basis that she might call >> me while I’m there and also ask me to get milk would not be reasonable. >> >> What will come next may be an interesting question philosophically, but > we >> are not discussing abstract concepts here. There is a concrete issue >> identified, and we’re soliciting input in how best to address it. 
>> >> Thank you for the suggestion of "guide/follower" >> >> Mike >> >> On Wed, Jun 24, 2020 at 6:30 AM Bernd Fehling < >> bernd.fehl...@uni-bielefeld.de> wrote: >> >>> I'm following this thread now for a while and I can understand >>> the wish to change some naming/wording/speech in one or the other >>> programs but I always get back to the one question: >>> "Is it the weapon which kills people or the hand controlled by >>> the mind which fires the weapon?" >>> >>> The thread started with slave - slavery, then turned over to master >>> and followed by leader (for me as a german... you know). >>> What will come next? >>> >>> And more over, we now discuss about changes in the source code and >>> due to this there need to be changes to the documentation. >>> What about the books people wrote about this programs and source code, >>> should we force this authors to rewrite their books? >>> May be we should file a request to all web search engines to reject >>> all stored content about these "banned" words? >>> And contact
Re: solrj - get metrics from all nodes
The admin UI does this by requesting &nodes=,,… You will get a master response with each sub response as key:value pairs. The list of node_names can be found in live_nodes in the CLUSTERSTATUS API. Jan > On Jun 27, 2020, at 02:09, ChienHuaWang wrote: > > For people who are also looking for the solution - you can append > "node=node_name" in the metrics request to get specific data of a node. > If anyone knows how to get the data of all the nodes together, please kindly > share, thanks. > > > Regards, > Chien > > > > -- > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
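A sketch of what Jan describes, with made-up node names (the node list would come from the live_nodes section of a CLUSTERSTATUS response; the comma-separated nodes parameter to /solr/admin/metrics is what the admin UI sends, per the message above, and the metric prefix is an arbitrary example):

```shell
# Hypothetical node names, in the host:port_solr form used by live_nodes.
nodes="host1:8983_solr,host2:8983_solr"

# One request fans out to every listed node; the response is keyed by node.
url="http://host1:8983/solr/admin/metrics?nodes=${nodes}&prefix=CONTAINER.fs"
echo "$url"
```

Feeding the printed URL to curl (not shown here, since it needs a running cluster) should return one JSON document with a sub-response per node name.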