Re: SOLR Cloud - Full index replication

2018-12-30 Thread Doss
Thanks Erick!

We are using SOLR version 7.0.1.

Are there any disadvantages if we increase the peer sync size to 1000?

We have analysed the GC logs but we have not seen long GC pauses so far.

We tried to find the reason for the full sync, but found nothing more
informative; we have seen many log entries reading "No registered leader was
found after waiting for 4000ms" followed by the full index replication.

Thanks,
Doss.


On Sun, Dec 30, 2018 at 8:49 AM Erick Erickson 
wrote:

> No. There's a "peer sync" that will try to update from the leader's
> transaction log if (and only if) the replica has fallen behind. By
> "fallen behind" I mean it was unable to accept any updates for
> some period of time. The default peer sync size is 100 docs,
> you can make it larger; see numRecordsToKeep here:
> http://lucene.apache.org/solr/guide/7_6/updatehandlers-in-solrconfig.html
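> For reference, numRecordsToKeep lives on the <updateLog> in solrconfig.xml;
> a sketch raising it to 1000 (the other values shown are the defaults):

```xml
<!-- solrconfig.xml: keep more transaction-log records per core so peer
     sync can catch up a replica that fell further behind (default 100) -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1000</int>
  <int name="maxNumLogsToKeep">10</int>
</updateLog>
```

> The trade-off is a larger transaction log on disk and slower restarts,
> since uncommitted log records are replayed on startup.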
>
> Some observations though:
> 12G heap for 250G of index on disk _may_ work, but I'd be looking at
> the GC characteristics, particularly stop-the-world pauses.
>
> Your hard commit interval looks too long. I'd shorten it to < 1 minute
> with openSearcher=false. See:
>
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
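> A sketch of the corresponding solrconfig.xml settings (times are
> illustrative, matching the 5 sec soft commit / sub-minute hard commit
> suggestion):

```xml
<!-- hard commit every 60s, flushing to disk and truncating the tlog,
     without opening a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit every 5s for NRT visibility -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```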
>
> I'd concentrate on _why_ the replica goes into recovery in the first
> place. You say you're on 7x, which one? Starting in 7.3 the recovery
> logic was pretty thoroughly reworked, so _which_ 7x version is
> important to know.
>
> The Solr logs should give you some idea of _why_ the replica
> goes into recovery, concentrate on the replica that goes into
> recovery and the corresponding leader's log.
>
> Best,
> Erick
>
> On Sat, Dec 29, 2018 at 6:23 PM Doss  wrote:
> >
> > we are using a 3 node Solr cloud setup (64GB RAM / 8 CPU / 12GB heap) with
> > version 7.x. We have 3 indexes/collections on each node. Index size is
> > about 250GB, NRT with 5 sec soft / 10 min hard commit. Sometimes on one
> > node we see a full index replication start running. Is there any
> > configuration that forces Solr to replicate fully, e.g. if a node sees a
> > 100/200 update difference with the leader? - Thanks.
>


Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-30 Thread Zheng Lin Edwin Yeo
These texts are likely from the original EML file data, but they are not
visible in the content when the EML file is opened in Microsoft Outlook.

I have already applied the HTMLStripFieldUpdateProcessorFactory in
solrconfig.xml, but these texts are still showing up in the index. Below is
my configuration.

  content_tcs
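A typical HTMLStripFieldUpdateProcessorFactory chain looks like the sketch
below; the chain name and the surrounding processors are assumptions, only the
field name content_tcs comes from the configuration above:

```xml
<updateRequestProcessorChain name="html-strip">
  <!-- strip HTML/CSS markup from the named field before indexing -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">content_tcs</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note that the chain only runs if the update request selects it (e.g.
update.chain=html-strip, or default="true" on the chain), and it can only
strip markup that actually reaches Solr as markup; if the EML content is
extracted to plain text first, the CSS fragments arrive as ordinary text and
pass through untouched.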
Regards,
Edwin

On Mon, 31 Dec 2018 at 11:29, Alexandre Rafalovitch 
wrote:

> Specifically, a custom Update Request Processor chain can be used before
> indexing, probably with HTMLStripFieldUpdateProcessorFactory.
> Regards,
>  Alex
>
> On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore wrote:
> > Hi,
> >
> > I think this kind of text manipulation should be done before indexing; if
> > you have font-size/font-family in your text, very likely you’re indexing
> > HTML with CSS.
> > If I’m right, you’re just entering a hell of words that would all need to
> > be removed from your text.
> >
> > On the other hand, if you have to do this at index time, a quick and
> dirty
> > solution is using the pattern-replace filter.
> >
> >
> >
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
> >
> > Ciao,
> > Vincenzo
> >
> > --
> > mobile: 3498513251
> > skype: free.dev
> >
> > > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo 
> > wrote:
> > >
> > > Hi,
> > >
> > > I noticed that during the indexing of EML files, there are words like
> > > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> > > content as well.
> > >
> > > I would like to check: how are we able to remove those words during
> > > indexing?
> > >
> > > I am using Solr 7.5.0
> > >
> > > Regards,
> > > Edwin
> >
>


Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-30 Thread Alexandre Rafalovitch
Specifically, a custom Update Request Processor chain can be used before
indexing, probably with HTMLStripFieldUpdateProcessorFactory.
Regards,
 Alex

On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore wrote:

> Hi,
>
> I think this kind of text manipulation should be done before indexing; if
> you have font-size/font-family in your text, very likely you’re indexing
> HTML with CSS.
> If I’m right, you’re just entering a hell of words that would all need to be
> removed from your text.
>
> On the other hand, if you have to do this at index time, a quick and dirty
> solution is using the pattern-replace filter.
>
>
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
>
> Ciao,
> Vincenzo
>
> --
> mobile: 3498513251
> skype: free.dev
>
> > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > I noticed that during the indexing of EML files, there are words like
> > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> > content as well.
> >
> > I would like to check: how are we able to remove those words during
> > indexing?
> >
> > I am using Solr 7.5.0
> >
> > Regards,
> > Edwin
>


Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-30 Thread Vincenzo D'Amore
Hi,

I think this kind of text manipulation should be done before indexing; if you
have font-size/font-family in your text, very likely you’re indexing HTML
with CSS.
If I’m right, you’re just entering a hell of words that would all need to be
removed from your text.

On the other hand, if you have to do this at index time, a quick and dirty 
solution is using the pattern-replace filter. 

https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
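A sketch of a field type doing this (the pattern is illustrative and matches
only the properties mentioned above; a char filter before the tokenizer is
usually a better fit for stripping markup than the token-level
pattern-replace filter, which only sees one token at a time):

```xml
<fieldType name="text_nocss" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- remove inline CSS declarations before tokenizing;
         extend the pattern for other properties as needed -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(FONT-SIZE|FONT-FAMILY):\s*[^;]+;?"
                replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Remember that analysis-time stripping only affects the indexed terms; the
stored content returned to clients is unchanged.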

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

> On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> I noticed that during the indexing of EML files, there are words like
> "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> content as well.
>
> I would like to check: how are we able to remove those words during
> indexing?
> 
> I am using Solr 7.5.0
> 
> Regards,
> Edwin


Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-30 Thread Zheng Lin Edwin Yeo
Hi,

I noticed that during the indexing of EML files, there are words like
"*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
content as well.

I would like to check: how are we able to remove those words during
indexing?

I am using Solr 7.5.0

Regards,
Edwin


Re: PC hang while running Solr cloud instance?

2018-12-30 Thread David Hastings
1. Each PC? How many are you talking about?
2. Why are you using shards?

On Dec 30, 2018, at 4:11 PM, John Milton <johnmilton@gmail.com> wrote:

Wishing you all a happy new year.

Hi,

I run my Solr Cloud 7.5 instance on Windows. It has 100 shards with 4
replicas each.

My PC is hanging; CPU and memory usage is at 95%.
Each PC has 16 GB of RAM.
The PCs are idle at the moment; no indexing or searching is happening, but
Task Manager still shows 95% CPU and memory usage.

How to solve this problem?

Thanks,
John Milton


Identifying product name and other details from search string

2018-12-30 Thread UsesRN
Is there any way to identify the product name and other details from a search
string in Solr or Java?

For example:

1. Input String: "wound type cartridge filter size 20 * 4 Inch for RO plant"

   Output:
   Product: cartridge filter for RO plant
   Size: 20 * 4 inch

2. Input String: "WD 40 rust removing spray Container of 100 ml"

   Output:
   Product: Rust removing spray
   Size: 100 ml
   Model: WD 40



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


PC hang while running Solr cloud instance?

2018-12-30 Thread John Milton
Wishing you all a happy new year.

Hi,

I run my Solr Cloud 7.5 instance on Windows. It has 100 shards with 4
replicas each.

My PC is hanging; CPU and memory usage is at 95%.
Each PC has 16 GB of RAM.
The PCs are idle at the moment; no indexing or searching is happening, but
Task Manager still shows 95% CPU and memory usage.

How to solve this problem?

Thanks,
John Milton


How to archive Solr cloud and delete the data?

2018-12-30 Thread Rekha
Hi Solr Team,

I want to archive my Solr data. Is there any API available to archive data?

I planned to read the data month by month and store it in another collection,
but this plan takes a long time, much like adding new data and indexing it
from scratch.

Also, when I delete the archived data from the main collection, the disk size
does not change; after the deletion the data directory size is the same, and
only the deleted-documents count is updated in the admin GUI. When I googled
this, somebody said that, based on the merge policy, deleted documents are
only removed from disk once they reach 50%. I am not clear on this. How can I
delete documents and reclaim the space they occupied?

Which is the best way to archive data?

Thanks,
Rekha K
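For context on the disk-size question above: deleted documents are only
purged when their segments are merged. A commit with expungeDeletes=true asks
Solr to merge away segments containing deletions (the collection name below
is a placeholder):

```shell
# Placeholder collection name; this can trigger heavy merge I/O on a
# large index, so run it during a quiet period.
curl "http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true"
```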

Re: RuleBasedAuthorizationPlugin configuration

2018-12-30 Thread Dominique Bejean
Hi,

After reading more carefully the log file, here is my understanding.

The request

http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json

reports this in the log:

2018-12-30 12:24:52.102 INFO  (qtp1731656333-20) [   x:biblio]
o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context :
userPrincipal: [[principal: 2]] type: [READ], collections: [], Path:
[/select] path : /select params : q=*:*&indent=on&wt=json

collections is empty, so it looks like "/select" is not collection-specific,
and so it is not possible to define read access by collection.

Can someone confirm ?

Regards

Dominique





On Fri, Dec 21, 2018 at 10:46 AM, Dominique Bejean
wrote:

> Hi,
>
> I am trying to configure security.json file, in order to define the
> following users and permissions :
>
>- user "admin" with all permissions on all collections
>- user "read" with read  permissions  on all collections
>- user "1" with only read  permissions  on biblio collection
>- user "2" with only read  permissions  on personnes collection
>
> Here is my security.json file
>
> {
>   "authentication":{
> "blockUnknown":true,
> "class":"solr.BasicAuthPlugin",
> "credentials":{
>   "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
> 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>   "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>   "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>   "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
> "":{"v":0}},
>   "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[
>   {
> "name":"all",
> "role":"admin",
> "index":1},
>   {
> "name":"read-biblio",
> "path":"/select",
> "role":["admin","read","r1"],
> "collection":"biblio",
> "index":2},
>   {
> "name":"read-personnes",
> "path":"/select",
> "role":["admin","read","r2"],
> "collection":"personnes",
> "index":3},
>  {
> "name":"read",
> "collection":"*",
> "role":["admin","read"],
> "index":4}],
> "user-role":{
>   "admin":"admin",
>   "read":"read",
>   "1":"r1",
>   "2":"r2"}
>   }
> }
>
>
> I get 403 errors for user 1 on biblio and user 2 on personnes while
> using the "/select" requestHandler. However, according to the r1 and r2
> roles and the permissions order, the access should be allowed.
>
> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in order
> to test these exact same permissions and roles. checkRules reports access
> is allowed !!!
>
> I don't understand where the problem is. Any ideas?
>
> Regards
>
> Dominique
>


Re: Reload synonyms without reloading the multiple collections

2018-12-30 Thread Simón de Frosterus Pokrzywnicki
Sorry, I see that it may have been confusing.

My webapp calls the reload of all the affected Collections (about a dozen
of them) in sequential mode using the Collections API.

Ideally I would be able to write some QueryTimeSynonymFilterFactory that
would periodically, or when told, reload the synonyms file from ZK, which is
what the system edits when a user changes some synonyms.

I understand that a Collection needs to be reloaded if the synonyms were to
be used at index time, but this is not my case.

The managed API is in the same situation: basically it does what I am doing
on my own right now. In the end, there has to be a reload of the affected
Collections.
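For reference, the sequential reload described above is the Collections API
RELOAD action, one call per affected collection (the collection name is a
placeholder):

```shell
# Reloads the cores of the named collection so they pick up new config
# (including synonym files) from ZooKeeper
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
```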

Regards,
Simón

On Sun, Dec 30, 2018 at 5:01 AM Shawn Heisey  wrote:

> On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:
> > The problem is that when the user changes the synonyms, it automatically
> > triggers a sequential reload of all the Collections.
>
> What exactly is being done when you say "the user changes the
> synonyms"?  Just uploading a new synonyms definition file to zookeeper
> would *NOT* result in a reload of *ANY* collection.  As far as I am
> aware, collection reloads only happen when they are explicitly
> requested.  Usage of the managed APIs to change aspects of the schema
> could cause a reload, but it's only going to happen on the collection
> where the API is used, not all collections.
>
> Basically, I cannot imagine any situation that would cause a reload of
> all collections, other than explicitly asking Solr to do those reloads.
>
> Thanks,
> Shawn
>
>