Re: SOLR Cloud - Full index replication
Thanks Erick! We are using Solr version 7.0.1. Are there any disadvantages if we increase the peer sync size to 1000? We have analysed the GC logs but have not seen long GC pauses so far. We tried to find the reason for the full sync, but nothing more informative turned up; we have, however, seen many log entries reading "No registered leader was found after waiting for 4000ms", followed by the full index replication.

Thanks,
Doss.

On Sun, Dec 30, 2018 at 8:49 AM Erick Erickson wrote:
> No. There's a "peer sync" that will try to update from the leader's
> transaction log if (and only if) the replica has fallen behind. By
> "fallen behind" I mean it was unable to accept any updates for
> some period of time. The default peer sync size is 100 docs;
> you can make it larger, see numRecordsToKeep here:
> http://lucene.apache.org/solr/guide/7_6/updatehandlers-in-solrconfig.html
>
> Some observations though:
> 12G heap for 250G of index on disk _may_ work, but I'd be looking at
> the GC characteristics, particularly stop-the-world pauses.
>
> Your hard commit interval looks too long. I'd shorten it to < 1 minute
> with openSearcher=false. See:
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> I'd concentrate on _why_ the replica goes into recovery in the first
> place. You say you're on 7.x, but which release? Starting in 7.3 the
> recovery logic was pretty thoroughly reworked, so the exact 7.x version
> is important to know.
>
> The Solr logs should give you some idea of _why_ the replica
> goes into recovery; concentrate on the replica that goes into
> recovery and the corresponding leader's log.
>
> Best,
> Erick
>
> On Sat, Dec 29, 2018 at 6:23 PM Doss wrote:
> >
> > We are using a 3-node Solr (64GB RAM / 8 CPU / 12GB heap) cloud setup
> > with version 7.x. We have 3 indexes/collections on each node; index size
> > is about 250GB. NRT with 5 sec soft / 10 min hard commit. Sometimes, on
> > any one node, we see a full index replication start running. Is there any
> > configuration which forces Solr to replicate fully, for example if a node
> > sees a 100/200 update difference with the leader? - Thanks.
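For reference, the settings Erick points at live in the `<updateLog>` and `<autoCommit>` sections of solrconfig.xml. A sketch with illustrative values (1000 records kept for peer sync, hard commit every minute without opening a searcher); the exact numbers are assumptions, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <!-- keep more records per transaction log so peer sync can catch a
         replica up without falling back to full index replication -->
    <int name="numRecordsToKeep">1000</int>
    <int name="maxNumLogsToKeep">10</int>
  </updateLog>
  <autoCommit>
    <!-- hard commit every minute, as Erick suggests -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

The trade-off of a larger numRecordsToKeep is more disk used by transaction logs and slower log replay on restart.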
Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content
These texts are likely from the original EML file data, but they are not visible when the EML file is opened in Microsoft Outlook. I have already applied the HTMLStripFieldUpdateProcessorFactory in solrconfig.xml, but these texts are still showing up in the index. Below is my configuration.

content_tcs

Regards,
Edwin

On Mon, 31 Dec 2018 at 11:29, Alexandre Rafalovitch wrote:
> Specifically, a custom Update Request Processor chain can be used before
> indexing, probably with HTMLStripFieldUpdateProcessorFactory.
>
> Regards,
> Alex
>
> On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore wrote:
> >
> > Hi,
> >
> > I think this kind of text manipulation should be done before indexing.
> > If you have font-size / font-family in your text, very likely you're
> > indexing HTML with CSS. If I'm right, you're just entering a hell of
> > words that should be removed from your text.
> >
> > On the other hand, if you have to do this at index time, a quick and
> > dirty solution is using the pattern-replace filter:
> > https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
> >
> > Ciao,
> > Vincenzo
> >
> > --
> > mobile: 3498513251
> > skype: free.dev
> >
> > > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo wrote:
> > >
> > > Hi,
> > >
> > > I noticed that during the indexing of EML files, there are words like
> > > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> > > content as well.
> > >
> > > Would like to check: how are we able to remove those words during the
> > > indexing?
> > >
> > > I am using Solr 7.5.0
> > >
> > > Regards,
> > > Edwin
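Under the assumption that Edwin's field is named content_tcs (the only fragment of his configuration that survived), a chain of the kind Alex describes might look like the sketch below. One caveat worth checking: if the EML body was already flattened to plain text by the extraction step (e.g. Tika) before reaching Solr, there are no tags left for the HTML stripper to act on, which would explain why the CSS fragments survive.

```xml
<!-- Sketch: strip HTML markup from one field before it is indexed.
     Field and chain names are assumptions for illustration. -->
<updateRequestProcessorChain name="strip-html">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">content_tcs</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- The chain does nothing unless update requests actually use it -->
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">strip-html</str>
  </lst>
</initParams>
```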
Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content
Specifically, a custom Update Request Processor chain can be used before indexing, probably with HTMLStripFieldUpdateProcessorFactory.

Regards,
Alex

On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore wrote:
> Hi,
>
> I think this kind of text manipulation should be done before indexing.
> If you have font-size / font-family in your text, very likely you're
> indexing HTML with CSS. If I'm right, you're just entering a hell of
> words that should be removed from your text.
>
> On the other hand, if you have to do this at index time, a quick and
> dirty solution is using the pattern-replace filter:
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
>
> Ciao,
> Vincenzo
>
> --
> mobile: 3498513251
> skype: free.dev
>
> > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > I noticed that during the indexing of EML files, there are words like
> > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> > content as well.
> >
> > Would like to check: how are we able to remove those words during the
> > indexing?
> >
> > I am using Solr 7.5.0
> >
> > Regards,
> > Edwin
Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content
Hi,

I think this kind of text manipulation should be done before indexing. If you have font-size / font-family in your text, very likely you're indexing HTML with CSS. If I'm right, you're just entering a hell of words that should be removed from your text.

On the other hand, if you have to do this at index time, a quick and dirty solution is using the pattern-replace filter:

https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

> On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
> I noticed that during the indexing of EML files, there are words like
> "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> content as well.
>
> Would like to check: how are we able to remove those words during the
> indexing?
>
> I am using Solr 7.5.0
>
> Regards,
> Edwin
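A sketch of the quick-and-dirty pattern-replace approach Vincenzo mentions, here as a char filter in a field type's analyzer. The regex is illustrative only: it catches bare `font-size:` / `font-family:` declarations and nothing else, so a real deployment would need a broader pattern or the pre-indexing cleanup he recommends instead:

```xml
<!-- Sketch: field type that drops simple CSS declarations before
     tokenizing. Field type name and regex are illustrative. -->
<fieldType name="text_nocss" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)(font-size|font-family)\s*:\s*[^;]*;?"
                replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that an analyzer-level filter only keeps the fragments out of the token stream; the stored field value (what search results display) is unchanged, which is why the update-processor route is usually preferred.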
Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content
Hi,

I noticed that during the indexing of EML files, there are words like "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the content as well.

Would like to check: how are we able to remove those words during the indexing?

I am using Solr 7.5.0

Regards,
Edwin
Re: PC hang while running Solr cloud instance?
1. Each PC? How many are you talking about?
2. Why are you using shards?

On Dec 30, 2018, at 4:11 PM, John Milton <johnmilton@gmail.com> wrote:

Wishing you all a happy new year.

Hi,

I am running my Solr cloud instance 7.5 on Windows. It has 100 shards with a replication factor of 4. My PC is hanging, and CPU and memory usage are at 95%. Each PC has 16 GB of RAM. The PCs are idle at the moment, with no indexing or searching happening, but Task Manager still shows 95% usage of CPU and memory. How can I solve this problem?

Thanks,
John Milton
Identifying product name and other details from search string
Is there any way to identify the product name and other details from a search string in Solr or Java?

For example:

1. Input string: "wound type cartridge filter size 20 * 4 Inch for RO plant"
   Output:
   Product: cartridge filter for RO plant
   Size: 20 * 4 inch

2. Input string: "WD 40 rust removing spray Container of 100 ml"
   Output:
   Product: Rust removing spray
   Size: 100 ml
   Model: WD 40

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
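Solr has no built-in entity extraction of this kind; it is usually done before the query reaches Solr, either with hand-written rules or an NER model. A minimal rule-based sketch follows; every pattern and name in it is hypothetical and tuned only to the two sample queries above, not a general solution:

```python
import re

# Hypothetical extractor: pull size and model out of the query first,
# then treat whatever text remains as the product description.
SIZE_RE = re.compile(r"\b(\d+(?:\s*\*\s*\d+)?\s*(?:inch|ml|mm|cm|l)\b)", re.I)
MODEL_RE = re.compile(r"\b([A-Z]{2,3}\s?\d{1,4})\b")  # e.g. "WD 40"

def extract(query: str) -> dict:
    result = {}
    m = SIZE_RE.search(query)
    if m:
        result["size"] = m.group(1)
        query = query.replace(m.group(1), " ", 1)
    m = MODEL_RE.search(query)
    if m:
        result["model"] = m.group(1)
        query = query.replace(m.group(1), " ", 1)
    # Leftover words, whitespace-normalized, stand in for the product phrase.
    result["product"] = " ".join(query.split())
    return result

print(extract("wound type cartridge filter size 20 * 4 Inch for RO plant"))
print(extract("WD 40 rust removing spray Container of 100 ml"))
```

In practice this breaks down quickly on real queries; a trained sequence tagger (or Solr's OpenNLP integration feeding a custom update processor) scales better than regex rules.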
PC hang while running Solr cloud instance?
Wishing you all a happy new year.

Hi,

I am running my Solr cloud instance 7.5 on Windows. It has 100 shards with a replication factor of 4. My PC is hanging, and CPU and memory usage are at 95%. Each PC has 16 GB of RAM. The PCs are idle at the moment, with no indexing or searching happening, but Task Manager still shows 95% usage of CPU and memory. How can I solve this problem?

Thanks,
John Milton
How to archive Solr cloud and delete the data?
Hi Solr Team,

I want to archive my Solr data. Is there any API available to archive data? I planned to read data month by month and store it in another collection, but this plan takes a long time, much like adding new data and indexing it afresh.

Also, when I delete the archived data from the main collection, the disk size does not change; after deletion the data directory size is the same. Only the deleted-documents count is updated in the admin GUI. When I googled this, somebody said that, based on the merge policy, deleted documents are only removed from disk once they reach 50%. I'm not clear on this. How can I delete documents and reclaim the space they occupied? Which is the best way to archive data?

Thanks,
Rekha K
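On the disk-space question: deleted documents are only purged when the segments containing them are merged, which the merge policy does on its own schedule. If you can afford the extra I/O, a merge can be requested explicitly on the update handler; a request sketch (the collection name is hypothetical):

```
# Rewrite only segments that contain deletes:
curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'
```

expungeDeletes is cheaper than a full optimize/forceMerge but still rewrites large segments, so it is best run during quiet periods.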
Re: RuleBasedAuthorizationPlugin configuration
Hi,

After reading the log file more carefully, here is my understanding. The request

http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json

reports this in the log:

2018-12-30 12:24:52.102 INFO (qtp1731656333-20) [ x:biblio] o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context : userPrincipal: [[principal: 2]] type: [READ], collections: [], Path: [/select] path : /select params : q=*:*&indent=on&wt=json

"collections" is empty, so it looks like "/select" is not collection specific, and so it is not possible to define read access by collection. Can someone confirm?

Regards,
Dominique

On Fri, Dec 21, 2018 at 10:46, Dominique Bejean wrote:
> Hi,
>
> I am trying to configure the security.json file in order to define the
> following users and permissions:
>
>   - user "admin" with all permissions on all collections
>   - user "read" with read permissions on all collections
>   - user "1" with only read permissions on the biblio collection
>   - user "2" with only read permissions on the personnes collection
>
> Here is my security.json file:
>
> {
>   "authentication": {
>     "blockUnknown": true,
>     "class": "solr.BasicAuthPlugin",
>     "credentials": {
>       "admin": "4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0= 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>       "read": "azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>       "1": "azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>       "2": "azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
>     "": {"v": 0}},
>   "authorization": {
>     "class": "solr.RuleBasedAuthorizationPlugin",
>     "permissions": [
>       {"name": "all", "role": "admin", "index": 1},
>       {"name": "read-biblio", "path": "/select",
>        "role": ["admin", "read", "r1"], "collection": "biblio", "index": 2},
>       {"name": "read-personnes", "path": "/select",
>        "role": ["admin", "read", "r2"], "collection": "personnes", "index": 3},
>       {"name": "read", "collection": "*",
>        "role": ["admin", "read"], "index": 4}],
>     "user-role": {
>       "admin": "admin",
>       "read": "read",
>       "1": "r1",
>       "2": "r2"}
>   }
> }
>
> I get 403 errors for user 1 on biblio and user 2 on personnes while using
> the "/select" request handler. However, according to the r1 and r2 roles
> and the permission order, the access should be allowed.
>
> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in
> order to test these exact same permissions and roles. checkRules reports
> that access is allowed!
>
> I don't understand where the problem is. Any ideas?
>
> Regards,
> Dominique
Re: Reload synonyms without reloading the multiple collections
Sorry, I see that it may have been confusing. My webapp calls the reload of all the affected collections (about a dozen of them) sequentially using the Collections API. Ideally I would be able to write some QueryTimeSynonymFilterFactory that would, periodically or when told, reload the synonyms file from ZK, which is what the system edits when a user changes some synonyms.

I understand that a collection needs to be reloaded if the synonyms were used at indexing time, but this is not my case. The managed API is in the same situation: basically it does what I am doing on my own right now. In the end, there has to be a reload of the affected collections.

Regards,
Simón

On Sun, Dec 30, 2018 at 5:01 AM Shawn Heisey wrote:
> On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:
> > The problem is that when the user changes the synonyms, it automatically
> > triggers a sequential reload of all the Collections.
>
> What exactly is being done when you say "the user changes the
> synonyms"? Just uploading a new synonyms definition file to ZooKeeper
> would *NOT* result in a reload of *ANY* collection. As far as I am
> aware, collection reloads only happen when they are explicitly
> requested. Usage of the managed APIs to change aspects of the schema
> could cause a reload, but it's only going to happen on the collection
> where the API is used, not all collections.
>
> Basically, I cannot imagine any situation that would cause a reload of
> all collections, other than explicitly asking Solr to do those reloads.
>
> Thanks,
> Shawn
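For completeness, the sequential reload Simón describes is one Collections API call per collection; a request sketch (host and collection names hypothetical):

```
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection2'
```

Each RELOAD reopens the cores of that collection with the current configset from ZooKeeper, which is what makes the edited synonyms file take effect for query-time analysis.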