BinaryResponseWriter fetches unnecessary fields?
Hi all,

We observe that Solr query time increases significantly with the number of rows requested, even when all we retrieve for each document is just fl=id,score. We debugged a bit and saw that most of the increased time was spent in BinaryResponseWriter, converting the Lucene document into a SolrDocument. Inside convertLuceneDocToSolrDoc():

https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182

for (IndexableField f : doc.getFields())

I am a bit puzzled why we need to iterate through all the fields in the document. Why can't we just iterate through the requested fields in fl? Specifically:

https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156

If we change

sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema())

to

sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)

and just iterate through fnames in convertLuceneDocToSolrDoc(), there is a significant performance boost in our case: the query-time increase from rows=128 to rows=500 is much smaller.

Am I missing something here?

Thanks,
Wei
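A minimal sketch of the change being proposed, using plain maps to stand in for Lucene's Document and Solr's SolrDocument (the extra fnames parameter and the map-based types are illustrative assumptions about what such a patch could look like, not the actual Solr API):

```java
import java.util.*;

public class DocConversionSketch {
    // Convert only the requested fields, instead of iterating every stored field.
    static Map<String, Object> convertDoc(Map<String, Object> luceneDoc, Set<String> fnames) {
        Map<String, Object> solrDoc = new LinkedHashMap<>();
        if (fnames == null) {
            // Current behavior: walk every stored field in the document.
            solrDoc.putAll(luceneDoc);
        } else {
            // Proposed behavior: only touch the fields requested via fl.
            for (String f : fnames) {
                Object v = luceneDoc.get(f);
                if (v != null) solrDoc.put(f, v);
            }
        }
        return solrDoc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("title", "a long stored title");
        doc.put("body", "a very large stored body field");
        // fl=id only: the large stored fields are never copied.
        System.out.println(convertDoc(doc, Set.of("id")));
    }
}
```

The point of the sketch is that with fl=id,score the conversion cost becomes proportional to the number of requested fields rather than the number of stored fields per document, which matches the reported speedup at higher rows values.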
Re: Strange Alias behavior
It seems like a useful feature, especially for migrating from standalone to SolrCloud, at least if the precedence of alias over collection is defined and enforced.

On Fri, Jan 19, 2018 at 5:01 PM, Shawn Heisey wrote:
> On 1/19/2018 3:53 PM, Webster Homer wrote:
>> I created the alias with an existing collection name because our code base
>> which was created with stand alone solr was a pain to change. I did test
>> that the alias took precedence over the collection, when I did a search.
>
> The ability to create aliases and collections with the same name is viewed
> as a bug by some, and probably will be removed in a future version.
>
> https://issues.apache.org/jira/browse/SOLR-11488
>
> It doesn't really make sense to have an alias with the same name as a
> collection, and the behavior is probably undefined.
>
> Thanks,
> Shawn

--
This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: Issue with solr.HTMLStripCharFilterFactory
On 1/19/2018 11:56 AM, Fiz Ahmed wrote:
> But When I Query in Solr Admin.. I am still getting the Search results
> with Html Tags in it.

Search results will always contain the actual content that was indexed. Analysis only happens to indexed data and/or queries, not stored data. This is how Solr and Lucene have *always* worked. It's not new behavior.

To achieve what you want, you will either need to use an update processor, or you'll need to adjust your indexing program to make the changes before it sends the data to Solr.

If you choose the update processor route, there is a built-in processor that has the same behavior as the HTML filter you are using. Note that if you use that update processor, you won't need the HTML filter in the analyzer for the affected fields, because the HTML will be gone before the analysis runs.

https://lucene.apache.org/solr/6_6_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html

You can always write a custom processor if you wish. A custom processor might be required if you want your stored data to undergo some very extensive transformation. Here's the documentation on update processors:

https://lucene.apache.org/solr/guide/6_6/update-request-processors.html

Thanks,
Shawn
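The stored-vs-indexed distinction Shawn describes can be sketched as follows (the regex tag stripper is a crude stand-in; Solr's HTMLStripCharFilter and HTMLStripFieldUpdateProcessorFactory are far more robust than this):

```java
public class StripBeforeIndexing {
    // Crude illustration only: real HTML stripping must handle entities,
    // comments, scripts, etc., which Solr's char filter does.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String raw = "<b>Ipad</b> runs <i>fast</i>";

        // A char filter only affects the analyzed (indexed) form; the stored
        // value returned by queries is whatever was originally sent to Solr.
        String stored = raw;                 // analysis never touches this
        String indexedText = stripTags(raw); // what the analyzer sees

        // An update processor instead rewrites the field BEFORE storage,
        // so both the stored and indexed values end up tag-free.
        String processedStored = stripTags(raw);

        System.out.println("stored:  " + stored);
        System.out.println("indexed: " + indexedText);
        System.out.println("via update processor: " + processedStored);
    }
}
```

This is why re-indexing with the char filter alone never changes what the query response shows: only the update-processor (or client-side) route changes the stored value.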
Re: Strange Alias behavior
On 1/19/2018 3:53 PM, Webster Homer wrote:
> I created the alias with an existing collection name because our code base
> which was created with stand alone solr was a pain to change. I did test
> that the alias took precedence over the collection, when I did a search.

The ability to create aliases and collections with the same name is viewed as a bug by some, and probably will be removed in a future version.

https://issues.apache.org/jira/browse/SOLR-11488

It doesn't really make sense to have an alias with the same name as a collection, and the behavior is probably undefined.

Thanks,
Shawn
Re: Strange Alias behavior
I created the alias with an existing collection name because our code base, which was created with standalone Solr, was a pain to change. I did test that the alias took precedence over the collection when I did a search.

On Fri, Jan 19, 2018 at 4:22 PM, Wenjie Zhang (Jack) <wenjiezhang2...@gmail.com> wrote:
> Why would you create an alias with an existing collection name?
>
> Sent from my iPhone
>
>> On Jan 19, 2018, at 14:14, Webster Homer wrote:
>>
>> I just discovered some odd behavior with aliases.
>>
>> We are in the process of converting over to use aliases in solrcloud. We
>> have a number of collections that applications have referenced the
>> collections from when we used standalone solr. So we created alias names to
>> match the name that the java applications already used.
>>
>> We still have collections that have the name of the alias.
>>
>> We also decided to create new aliases for use in our ETL process.
>> I have 3 collections that have the same configset which is named
>> b2b-catalog-material:
>> collection 1: b2b-catalog-material
>> collection 2: b2b-catalog-material-180117
>> collection 3: b2b-catalog-material-180117T
>>
>> When the alias b2b-catalog-material-etl is pointed at b2b-catalog-material,
>> and the alias b2b-catalog-material is pointed to b2b-catalog-material-180117,
>> and we do a data load to b2b-catalog-material-etl,
>> we see data being added to both b2b-catalog-material and
>> b2b-catalog-material-180117.
>>
>> When I delete the alias b2b-catalog-material, the data stops loading
>> into the collection b2b-catalog-material-180117.
>>
>> So it seems that alias resolution is somewhat recursive. I'm surprised that
>> both collections were being updated.
>>
>> Is this the intended behavior for aliases? I don't remember seeing this
>> documented. This was on a solrcloud running solr 7.2.
>>
>> I haven't checked this in Solr 7.2, but when I created a new collection and
>> then pointed the alias to it and did a search, no data was returned because
>> there was none to return. So this indicates to me that aliases behave
>> differently if we're writing to them or reading from them.
Re: Strange Alias behavior
Why would you create an alias with an existing collection name?

Sent from my iPhone

> On Jan 19, 2018, at 14:14, Webster Homer wrote:
>
> I just discovered some odd behavior with aliases.
>
> We are in the process of converting over to use aliases in solrcloud. We
> have a number of collections that applications have referenced the
> collections from when we used standalone solr. So we created alias names to
> match the name that the java applications already used.
>
> We still have collections that have the name of the alias.
>
> We also decided to create new aliases for use in our ETL process.
> I have 3 collections that have the same configset which is named
> b2b-catalog-material:
> collection 1: b2b-catalog-material
> collection 2: b2b-catalog-material-180117
> collection 3: b2b-catalog-material-180117T
>
> When the alias b2b-catalog-material-etl is pointed at b2b-catalog-material,
> and the alias b2b-catalog-material is pointed to b2b-catalog-material-180117,
> and we do a data load to b2b-catalog-material-etl,
> we see data being added to both b2b-catalog-material and
> b2b-catalog-material-180117.
>
> When I delete the alias b2b-catalog-material, the data stops loading
> into the collection b2b-catalog-material-180117.
>
> So it seems that alias resolution is somewhat recursive. I'm surprised that
> both collections were being updated.
>
> Is this the intended behavior for aliases? I don't remember seeing this
> documented. This was on a solrcloud running solr 7.2.
>
> I haven't checked this in Solr 7.2, but when I created a new collection and
> then pointed the alias to it and did a search, no data was returned because
> there was none to return. So this indicates to me that aliases behave
> differently if we're writing to them or reading from them.
Re: SOLR Data Backup
Another option is to have CDCR enabled for Solr and replicate your data to another Solr cluster continuously.

BTW, why do we not recommend having Solr as a source of truth?

On Thu, Jan 18, 2018 at 4:08 AM, Florian Gleixner wrote:
> On 18.01.2018 at 10:21, Wael Kader wrote:
>> Hello,
>>
>> What's the best way to do a backup of the SOLR data?
>> I have a single node solr server and I want to always keep a copy of the
>> data I have.
>>
>> Is replication an option for what I want?
>>
>> I would like to get some tutorials and papers if possible on the method
>> that should be used, in case it's backup or replication or anything else.
>
> The reference manual will help you:
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
Re: Preserve order during indexing
DB order isn't generally defined unless you are using an explicit "order by" on your select; default behavior would vary by database type and even release of the database. You can index the fields that you would "order by" in the db, and sort on those fields in Solr.

On Thu, Jan 18, 2018 at 10:17 PM, jagdish vasani wrote:
> Hi Ashish,
> I think it's not possible; Solr creates an inverted index. But you can get
> documents by sorting orders: give sort= asc/desc.
>
> Thanks,
> JagdishVasani
>
> On 19-Jan-2018 9:22 am, "Aashish Agarwal" wrote:
>> Hi,
>>
>> I need to index documents in solr so that they are stored in the same order as
>> present in the database, i.e. *:* gives results in db order. Is it possible?
>>
>> Thanks,
>> Aashish
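The workaround suggested above can be sketched as assigning an explicit sequence field at index time and sorting on it at query time (the field name order_seq and the map-based documents are illustrative assumptions, not a Solr API):

```java
import java.util.*;

public class PreserveDbOrder {
    public static void main(String[] args) {
        // At index time, tag each document with its position in the DB result set.
        List<Map<String, Object>> docs = new ArrayList<>();
        String[] ids = {"row-a", "row-b", "row-c"};
        for (int i = 0; i < ids.length; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", ids[i]);
            doc.put("order_seq", i); // hypothetical sequence field
            docs.add(doc);
        }

        // A *:* query has no defined order; simulate that by shuffling...
        Collections.shuffle(docs, new Random(42));

        // ...then recover DB order with the equivalent of sort=order_seq asc.
        docs.sort(Comparator.comparingInt(d -> (int) d.get("order_seq")));
        for (Map<String, Object> d : docs) System.out.println(d.get("id"));
    }
}
```

The same idea applies server-side: populate the sequence column in your SELECT, index it as a numeric field, and pass sort=order_seq asc in the Solr query.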
Strange Alias behavior
I just discovered some odd behavior with aliases.

We are in the process of converting over to use aliases in solrcloud. We have a number of collections that applications have referenced the collections from when we used standalone solr. So we created alias names to match the name that the java applications already used.

We still have collections that have the name of the alias.

We also decided to create new aliases for use in our ETL process. I have 3 collections that have the same configset, which is named b2b-catalog-material:

collection 1: b2b-catalog-material
collection 2: b2b-catalog-material-180117
collection 3: b2b-catalog-material-180117T

When the alias b2b-catalog-material-etl is pointed at b2b-catalog-material, and the alias b2b-catalog-material is pointed to b2b-catalog-material-180117, and we do a data load to b2b-catalog-material-etl, we see data being added to both b2b-catalog-material and b2b-catalog-material-180117.

When I delete the alias b2b-catalog-material, the data stops loading into the collection b2b-catalog-material-180117.

So it seems that alias resolution is somewhat recursive. I'm surprised that both collections were being updated.

Is this the intended behavior for aliases? I don't remember seeing this documented. This was on a solrcloud running solr 7.2.

I haven't checked this in Solr 7.2, but when I created a new collection and then pointed the alias to it and did a search, no data was returned because there was none to return. So this indicates to me that aliases behave differently if we're writing to them or reading from them.
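The behavior reported in this thread can be modeled as alias resolution that keeps chasing alias-to-alias links, with an update reaching every collection name touched along the chain (a toy model of what the poster observed, assuming Solr-like semantics; this is not the actual SolrCloud implementation):

```java
import java.util.*;

public class AliasChainSketch {
    // alias name -> target (which may itself be an alias)
    static Map<String, String> aliases = new HashMap<>();

    // Resolve recursively, collecting every name reached along the chain.
    static List<String> resolve(String name) {
        List<String> reached = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String cur = name;
        while (aliases.containsKey(cur) && seen.add(cur)) { // seen guards cycles
            cur = aliases.get(cur);
            reached.add(cur);
        }
        if (reached.isEmpty()) reached.add(cur); // plain collection, no alias
        return reached;
    }

    public static void main(String[] args) {
        aliases.put("b2b-catalog-material-etl", "b2b-catalog-material");
        aliases.put("b2b-catalog-material", "b2b-catalog-material-180117");

        // Writing to the ETL alias touches BOTH names in the chain:
        System.out.println(resolve("b2b-catalog-material-etl"));

        // After deleting the inner alias, only the bare collection is reached:
        aliases.remove("b2b-catalog-material");
        System.out.println(resolve("b2b-catalog-material-etl"));
    }
}
```

Under this model, "b2b-catalog-material" is simultaneously a collection (which receives the write) and an alias (which forwards it), reproducing the double-write the poster saw, and deleting that alias collapses the chain to a single target.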
Re: Adding a child doc incrementally
Restriction to a single shard seems like a big limitation for us. Also, I was hoping that this was something Solr provided out of the box (like https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates).

Something like:

{
  "id": "parents-id",
  "price": {"set": 99},
  "popularity": {"inc": 20},
  "children": {"add": {child document(s)}}
}

or something like:

{
  "id": "child-id",
  "parentId": "parents-id",
  ... normal fields of the child ...
  "operationType": "add | delete"
}

In both cases, if Solr could just look at the parent's ID, route the document to the correct shard, and add the child to the parent to create the full nested document (as in block join), that would be ideal.

Thanks,
SG

On Wed, Jan 17, 2018 at 9:58 PM, Gus Heck wrote:
> If the document routing can be arranged such that the children and the
> parent are always co-located in the same shard, and share an identifier,
> the graph query can pull back the parent plus any arbitrary number of
> "children" that have been added at any time in any order. In this scheme
> "children" are just things that match your graph query:
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-GraphQueryParser
>
> However, if your query has to cross shards, that won't work (yet:
> https://issues.apache.org/jira/browse/SOLR-11384).
>
> More info here:
> https://www.slideshare.net/lucidworks/solr-graph-query-presented-by-kevin-watters-kmw-technology
>
> On Mon, Jan 15, 2018 at 2:09 PM, S G wrote:
>> Hi,
>>
>> We have a use-case where a single document can contain thousands of child
>> documents. However, I could not find any way to do it incrementally.
>> The only way is to read the full document from Solr, add the new child
>> document to it, and then re-index the full document with all of its child
>> documents again. This causes a lot of reads from Solr just to form the
>> document with one extra child.
>>
>> Ideally, I would have liked to only send the parent-ID and the
>> child-document as part of an "incremental update" command to Solr.
>>
>> Is there a way to incrementally add a child document to a parent document?
>>
>> Thanks,
>> SG
>
> --
> http://www.the111shift.com
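The co-location requirement Gus describes can be sketched as routing every child by the parent's ID, so a hash-based router always lands parent and children on the same shard (a simplified model in the spirit of Solr's compositeId routing, not the actual implementation):

```java
public class CoLocationSketch {
    static int numShards = 4;

    // Route a document to a shard by hashing its routing key.
    static int shardFor(String routeKey) {
        return Math.floorMod(routeKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        String parentId = "parents-id";
        int parentShard = shardFor(parentId);

        String[] childIds = {"child-1", "child-2", "child-3"};
        for (String childId : childIds) {
            // Graph-query scheme: children use the SHARED parent id as routing key,
            // so they are guaranteed to land on the parent's shard.
            boolean coLocated = shardFor(parentId) == parentShard;
            // Default scheme: routing on the child's own id may land anywhere.
            int byOwnId = shardFor(childId);
            System.out.println(childId + " byParentKey=" + coLocated
                + " byOwnIdShard=" + byOwnId);
        }
    }
}
```

With co-location guaranteed, children can be added one at a time in any order, and a graph (or block-join style) query on the shared identifier reassembles the family at query time.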
RE: Solr Replication being flaky (6.2.0)
Working on that now to see if it helps us out. The Solr process is NOT dying at all. Searches are still working as expected, but since we load balance requests, if the master/slave are out of sync the search results vary.

The advice is MUCH appreciated!

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, January 19, 2018 1:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

On 1/19/2018 11:27 AM, Shawn Heisey wrote:
> On 1/19/2018 8:54 AM, Pouliot, Scott wrote:
>> I do have a ticket in with our systems team to up the file handlers
>> since I am seeing the "Too many files open" error on occasion on our
>> prod servers. Is this the setting you're referring to? Found we
>> were set to 1024 using the "ulimit" command.
>
> No, but that often needs increasing too. I think you need to increase
> the process limit even if that's not the cause of this particular problem.

Had another thought. Either of these limits can cause completely unpredictable problems with Solr. The open file limit could be the reason for these issues, even if you're not actually hitting the process limit. As I mentioned before, I would expect a process limit to cause Solr to kill itself, and your other messages don't mention problems like that.

The scale of your Solr installation indicates that you should greatly increase both limits on all of your Solr servers.

Thanks,
Shawn
Issue with solr.HTMLStripCharFilterFactory
Hi Solr Experts,

I am using the HTMLStripCharFilterFactory for removing tags in the Body element. Body contains data like Ipad

I made changes in the managed schema.

---

I restarted Solr and indexed again. But when I query in the Solr Admin, I am still getting the search results with HTML tags in them:

"body":"Practically everytime I log onto Mogran, suddenly I see it running

*Please let me know what the issue might be… Am I missing anything?*

Thanks,
Fiz
Re: Solr Replication being flaky (6.2.0)
On 1/19/2018 11:27 AM, Shawn Heisey wrote:
> On 1/19/2018 8:54 AM, Pouliot, Scott wrote:
>> I do have a ticket in with our systems team to up the file handlers since I
>> am seeing the "Too many files open" error on occasion on our prod servers.
>> Is this the setting you're referring to? Found we were set to 1024 using
>> the "ulimit" command.
>
> No, but that often needs increasing too. I think you need to increase the
> process limit even if that's not the cause of this particular problem.

Had another thought. Either of these limits can cause completely unpredictable problems with Solr. The open file limit could be the reason for these issues, even if you're not actually hitting the process limit. As I mentioned before, I would expect a process limit to cause Solr to kill itself, and your other messages don't mention problems like that.

The scale of your Solr installation indicates that you should greatly increase both limits on all of your Solr servers.

Thanks,
Shawn
Re: Solr Replication being flaky (6.2.0)
On 1/19/2018 8:54 AM, Pouliot, Scott wrote:
> I do have a ticket in with our systems team to up the file handlers since I
> am seeing the "Too many files open" error on occasion on our prod servers.
> Is this the setting you're referring to? Found we were set to 1024 using
> the "ulimit" command.

No, but that often needs increasing too. I think you need to increase the process limit even if that's not the cause of this particular problem.

Sounds like you're running on Linux, though ulimit is probably available on other platforms too. If it's Linux, generally you must increase both the number of processes and the open file limit in /etc/security/limits.conf. Trying to use the ulimit command generally doesn't work because the kernel has hard limits configured that ulimit can't budge. If it's not Linux, then you'll need to consult with an expert in the OS you're running.

Again, assuming Linux, in the output of "ulimit -a" the value I'm talking about is the "-u" value -- "max user processes". The following are the additions that I typically make to /etc/security/limits.conf, to increase both the open file limit and the process limit for the solr user:

solr hard nproc 61440
solr soft nproc 40960
solr hard nofile 65535
solr soft nofile 49151

Are you running into problems where Solr just disappears? I would expect a process limit to generate OutOfMemoryError exceptions. When Solr is started with the included shell script, unless it's running with the foreground option, OOME will kill the Solr process. We have issues to bring the OOME death option to running in the foreground, as well as when running on Windows.

Thanks,
Shawn
RE: Solr Replication being flaky (6.2.0)
That's evidence enough for me to beat on our systems guys to get these file handles upped and cross my fingers then!

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, January 19, 2018 1:18 PM
To: solr-user
Subject: Re: Solr Replication being flaky (6.2.0)

"Could be", certainly. "Definitely is" is iffier ;)...

But the statement "If we restart the Solr service or optimize the core it seems to kick back in again.", especially the "optimize" bit (which, by the way, you should do only if you have the capability of doing it periodically [1]) is some evidence that this may be in the vicinity. One of the effects of an optimize is to merge your segment files from N to 1. So say you have 10 segments. Each one of those may consist of 10-15 individual files, all of which are held open. So you'd go from 150 open file handles to 15.

[1] https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Fri, Jan 19, 2018 at 9:32 AM, Pouliot, Scott wrote:
> Erick,
>
> Thanks! Could these settings be toying with replication? Solr itself seems
> to be working like a champ, except when things get out of sync.
>
> Scott
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, January 19, 2018 12:27 PM
> To: solr-user
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> Scott:
>
> We usually recommend setting files and processes very, very high. Like 65K
> high. Or unlimited if you can.
>
> Plus max user processes should also be bumped very high as well, like 65K as
> well.
>
> Plus max memory and virtual memory should be unlimited.
>
> We've included warnings at startup for open files and processes, see
> SOLR-11703
>
> Best,
> Erick
>
> On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott wrote:
>> I do have a ticket in with our systems team to up the file handlers since I
>> am seeing the "Too many files open" error on occasion on our prod servers.
>> Is this the setting you're referring to? Found we were set to 1024 using
>> the "ulimit" command.
>>
>> -----Original Message-----
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, January 19, 2018 10:48 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Replication being flaky (6.2.0)
>>
>> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>>> So we're running Solr in a Master/Slave configuration (1 of each) and it
>>> seems that the replication stalls or stops functioning every now and again.
>>> If we restart the Solr service or optimize the core it seems to kick back
>>> in again.
>>>
>>> Anyone have any idea what might be causing this? We do have a good amount
>>> of cores on each server (@150 or so), but I have heard reports of a LOT
>>> more than that in use.
>>
>> Have you increased the number of processes that the user running Solr is
>> allowed to start? Most operating systems limit the number of
>> threads/processes a user can start to a low value like 1024. With 150
>> cores, particularly with background tasks like replication configured,
>> chances are that Solr is going to need to start a lot of threads. This is
>> an OS setting that a lot of Solr admins end up needing to increase.
>>
>> I ran into the process limit on my servers and I don't have anywhere near
>> 150 cores.
>>
>> The fact that restarting Solr gets it working again (at least
>> temporarily) would fit with a process limit being the problem. I'm not
>> guaranteeing that this is the problem, only saying that it fits.
>>
>> Thanks,
>> Shawn
Re: Solr Replication being flaky (6.2.0)
"Could be", certainly. "Definitely is" is iffier ;)... But the statement "If we restart the Solr service or optimize the core it seems to kick back in again.", especially the "optimize" bit (which, by the way you should do only if you have the capability of doing it periodically [1]) is some evidence that this may be in the vicinity. One of the effects of an optimize is to merge your segments files from N to 1. So say you have 10 segments. Each one of those may consist of 10-15 individual files, all of which are held open. So you'd go from 150 open file handles to 15.. https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ Best, Erick On Fri, Jan 19, 2018 at 9:32 AM, Pouliot, Scottwrote: > Erick, > > Thanks! Could these settings be toying with replication? Solr itself seems > to be working like a champ, except when things get out of sync. > > Scott > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, January 19, 2018 12:27 PM > To: solr-user > Subject: Re: Solr Replication being flaky (6.2.0) > > Scott: > > We usually recommend setting files and processes very, very high. Like 65K > high. Or unlimited if you can. > > Plus max user processes should also be bumped very high as well, like 65K as > well. > > Plus max memory and virtual memory should be unlimited. > > We've included warnings at startup for open files and processes, see > SOLR-11703 > > Best, > Erick > > On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott > wrote: >> I do have a ticket in with our systems team to up the file handlers since I >> am seeing the "Too many files open" error on occasion on our prod servers. >> Is this the setting you're referring to? Found we were set to to 1024 using >> the "Ulimit" command. 
>> >> -Original Message- >> From: Shawn Heisey [mailto:apa...@elyograg.org] >> Sent: Friday, January 19, 2018 10:48 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr Replication being flaky (6.2.0) >> >> On 1/19/2018 7:50 AM, Pouliot, Scott wrote: >>> So we're running Solr in a Master/Slave configuration (1 of each) and it >>> seems that the replication stalls or stops functioning every now and again. >>> If we restart the Solr service or optimize the core it seems to kick back >>> in again. >>> >>> Anyone have any idea what might be causing this? We do have a good amount >>> of cores on each server (@150 or so), but I have heard reports of a LOT >>> more than that in use. >> >> Have you increased the number of processes that the user running Solr is >> allowed to start? Most operating systems limit the number of >> threads/processes a user can start to a low value like 1024. With 150 >> cores, particularly with background tasks like replication configured, >> chances are that Solr is going to need to start a lot of threads. This is >> an OS setting that a lot of Solr admins end up needing to increase. >> >> I ran into the process limit on my servers and I don't have anywhere near >> 150 cores. >> >> The fact that restarting Solr gets it working again (at least >> temporarily) would fit with a process limit being the problem. I'm not >> guaranteeing that this is the problem, only saying that it fits. >> >> Thanks, >> Shawn
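Erick's arithmetic above can be made concrete (the 10-15 files per segment is his rough estimate from the thread, not a fixed Lucene property):

```java
public class SegmentHandleMath {
    public static void main(String[] args) {
        int segments = 10;
        int filesPerSegment = 15;   // rough upper estimate from the thread
        int cores = 150;            // cores reported per server in this thread

        // Open file handles for one core, before and after an optimize to 1 segment.
        int before = segments * filesPerSegment;
        int after = 1 * filesPerSegment;
        System.out.println("per core before optimize: " + before);
        System.out.println("per core after optimize:  " + after);

        // Across ~150 cores the total dwarfs a default ulimit of 1024.
        System.out.println("all cores before optimize: " + cores * before);
    }
}
```

This is why an optimize temporarily "fixes" the replication problem: it collapses the per-core handle count by an order of magnitude, dropping the server back under the open-file limit until segments accumulate again.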
RE: Solr Replication being flaky (6.2.0)
Erick,

Thanks! Could these settings be toying with replication? Solr itself seems to be working like a champ, except when things get out of sync.

Scott

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, January 19, 2018 12:27 PM
To: solr-user
Subject: Re: Solr Replication being flaky (6.2.0)

Scott:

We usually recommend setting files and processes very, very high. Like 65K high. Or unlimited if you can.

Plus max user processes should also be bumped very high as well, like 65K as well.

Plus max memory and virtual memory should be unlimited.

We've included warnings at startup for open files and processes, see SOLR-11703

Best,
Erick

On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott wrote:
> I do have a ticket in with our systems team to up the file handlers since I
> am seeing the "Too many files open" error on occasion on our prod servers.
> Is this the setting you're referring to? Found we were set to 1024 using
> the "ulimit" command.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, January 19, 2018 10:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>> So we're running Solr in a Master/Slave configuration (1 of each) and it
>> seems that the replication stalls or stops functioning every now and again.
>> If we restart the Solr service or optimize the core it seems to kick back
>> in again.
>>
>> Anyone have any idea what might be causing this? We do have a good amount
>> of cores on each server (@150 or so), but I have heard reports of a LOT
>> more than that in use.
>
> Have you increased the number of processes that the user running Solr is
> allowed to start? Most operating systems limit the number of
> threads/processes a user can start to a low value like 1024. With 150
> cores, particularly with background tasks like replication configured,
> chances are that Solr is going to need to start a lot of threads. This is
> an OS setting that a lot of Solr admins end up needing to increase.
>
> I ran into the process limit on my servers and I don't have anywhere near
> 150 cores.
>
> The fact that restarting Solr gets it working again (at least
> temporarily) would fit with a process limit being the problem. I'm not
> guaranteeing that this is the problem, only saying that it fits.
>
> Thanks,
> Shawn
Re: Solr Replication being flaky (6.2.0)
Scott:

We usually recommend setting files and processes very, very high.  Like 65K high.  Or unlimited if you can.  Plus max user processes should also be bumped very high, like 65K as well.  Plus max memory and virtual memory should be unlimited.  We've included warnings at startup for open files and processes, see SOLR-11703.

Best,
Erick

On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott wrote:
> I do have a ticket in with our systems team to up the file handlers since I
> am seeing the "Too many files open" error on occasion on our prod servers.
> Is this the setting you're referring to?  Found we were set to 1024 using
> the "ulimit" command.
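The quickest check is `ulimit -a` as the user running Solr (limits are per-user, so check under that account, not root). A minimal Python sketch of the same check is below; the 65000 floor mirrors Erick's 65K suggestion and is an assumption, not an official Solr requirement:

```python
import resource

# Erick's suggested floor for open files and user processes (assumption).
RECOMMENDED = 65000

def needs_raise(soft_limit, recommended=RECOMMENDED):
    """True if a soft limit is below the recommended floor.

    resource.RLIM_INFINITY means 'unlimited', which is always fine.
    """
    return soft_limit != resource.RLIM_INFINITY and soft_limit < recommended

def report():
    # RLIMIT_NPROC covers threads as well as processes on Linux,
    # which is why Solr's many background threads count against it.
    for name, rlimit in [("open files", resource.RLIMIT_NOFILE),
                         ("processes/threads", resource.RLIMIT_NPROC)]:
        soft, hard = resource.getrlimit(rlimit)
        status = "RAISE" if needs_raise(soft) else "ok"
        print(f"{name}: soft={soft} hard={hard} -> {status}")

if __name__ == "__main__":
    report()
```

Run it as the Solr user; a default soft limit of 1024, as Scott found, would be flagged. Persistent changes usually go in /etc/security/limits.conf or the service unit, depending on the OS.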
RE: Solr Replication being flaky (6.2.0)
I do have a ticket in with our systems team to up the file handlers since I am seeing the "Too many files open" error on occasion on our prod servers.  Is this the setting you're referring to?  Found we were set to 1024 using the "ulimit" command.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, January 19, 2018 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)
Re: Solr Replication being flaky (6.2.0)
On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
> So we're running Solr in a Master/Slave configuration (1 of each) and it
> seems that the replication stalls or stops functioning every now and again.
> If we restart the Solr service or optimize the core it seems to kick back
> in again.
>
> Anyone have any idea what might be causing this?  We do have a good amount
> of cores on each server (@150 or so), but I have heard reports of a LOT
> more than that in use.

Have you increased the number of processes that the user running Solr is allowed to start?  Most operating systems limit the number of threads/processes a user can start to a low value like 1024.  With 150 cores, particularly with background tasks like replication configured, chances are that Solr is going to need to start a lot of threads.  This is an OS setting that a lot of Solr admins end up needing to increase.

I ran into the process limit on my servers and I don't have anywhere near 150 cores.

The fact that restarting Solr gets it working again (at least temporarily) would fit with a process limit being the problem.  I'm not guaranteeing that this is the problem, only saying that it fits.

Thanks,
Shawn
RE: Solr Replication being flaky (6.2.0)
I'm at the point now where I may end up writing a script to compare master/slave nightly...and trigger an optimize or solr restart if there are any differences.  Of course I have to check 150+ cores...but it could be done.  I'm just hoping I don't need to go that route.

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Friday, January 19, 2018 10:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

This happens to me quite often as well.  Generally on the replication admin screen it will say it's downloading a file, but be at 0 or a VERY small KB/sec.  Then after a restart of the slave it's back to downloading at 30 to 100 MB/sec.  Would be curious if there actually is a solution to this aside from checking every day if the core replicated.  I'm on Solr 5.x by the way.

-Dave
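The nightly comparison Scott describes can lean on the ReplicationHandler itself: `/replication?command=indexversion` returns the index version and generation for a core, and a slave is caught up when both match the master's. A minimal sketch, assuming hypothetical host names and that each core exposes the standard /replication handler:

```python
import json
import urllib.request

# Hypothetical base URLs -- substitute your own master/slave hosts.
MASTER = "http://master:8080/solr"
SLAVE = "http://slave:8080/solr"

def index_version(base_url, core):
    """Ask a core's replication handler for its index version and generation."""
    url = f"{base_url}/{core}/replication?command=indexversion&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def in_sync(master_info, slave_info):
    """A slave is in sync when generation and indexversion both match."""
    return (master_info["generation"] == slave_info["generation"]
            and master_info["indexversion"] == slave_info["indexversion"])

def stale_cores(cores):
    """Return the subset of cores whose slave lags the master."""
    return [core for core in cores
            if not in_sync(index_version(MASTER, core),
                           index_version(SLAVE, core))]
```

A cron job could feed `stale_cores()` the 150-core list and only restart or force-fetch (`command=fetchindex`) the cores that actually lag, rather than optimizing everything.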
Re: Solr Replication being flaky (6.2.0)
This happens to me quite often as well.  Generally on the replication admin screen it will say it's downloading a file, but be at 0 or a VERY small KB/sec.  Then after a restart of the slave it's back to downloading at 30 to 100 MB/sec.  Would be curious if there actually is a solution to this aside from checking every day if the core replicated.  I'm on Solr 5.x by the way.

-Dave

On Fri, Jan 19, 2018 at 9:50 AM, Pouliot, Scott <scott.poul...@peoplefluent.com> wrote:
> So we're running Solr in a Master/Slave configuration (1 of each) and it
> seems that the replication stalls or stops functioning every now and
> again.  If we restart the Solr service or optimize the core it seems to
> kick back in again.
>
> Anyone have any idea what might be causing this?  We do have a good amount
> of cores on each server (@150 or so), but I have heard reports of a LOT
> more than that in use.
Solr Replication being flaky (6.2.0)
So we're running Solr in a Master/Slave configuration (1 of each) and it seems that the replication stalls or stops functioning every now and again.  If we restart the Solr service or optimize the core it seems to kick back in again.

Anyone have any idea what might be causing this?  We do have a good amount of cores on each server (@150 or so), but I have heard reports of a LOT more than that in use.

Here is our master config:

  startup
  commit
  00:00:10
  1

And our slave config:

  http://server1:8080/solr/${solr.core.name}
  00:00:45
  solr-data-config.xml
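The list archiver stripped the XML element names from Scott's configs, leaving only the values. For context, a typical master/slave ReplicationHandler setup in solrconfig.xml that would produce those values might look like the sketch below; the element names are a hedged reconstruction (in particular, which setting the surviving "1" and "solr-data-config.xml" belonged to is a guess), not Scott's actual file:

```xml
<!-- Master (reconstruction; element names are assumptions) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="commitReserveDuration">00:00:10</str>
    <!-- the surviving "1" plausibly maps to numberToKeep or maxNumberOfBackups -->
    <str name="numberToKeep">1</str>
    <!-- confFiles normally lives on the master; solr-data-config.xml fits here -->
    <str name="confFiles">solr-data-config.xml</str>
  </lst>
</requestHandler>

<!-- Slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1:8080/solr/${solr.core.name}</str>
    <str name="pollInterval">00:00:45</str>
  </lst>
</requestHandler>
```

A 45-second poll interval across ~150 cores means the slave fires a lot of concurrent poll/fetch threads, which is consistent with Shawn's process-limit theory.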