Re: Rule-Based permissions for cores

2020-08-31 Thread Dominique Bejean
Hi,

This looks like an issue I opened a long time ago:
https://issues.apache.org/jira/browse/SOLR-13097

Regards

Dominique


On Mon, Aug 31, 2020 at 11:02 PM, Thomas Corthals wrote:

> Hi,
>
> I'm trying to configure the Rule-Based Authorization Plugin in Solr 8.4.0
> in standalone mode. My goal is to limit a user's access to one or more
> designated cores. My security.json looks like this:
>
> {
>   "authentication":{
>     "blockUnknown":true,
>     "class":"solr.BasicAuthPlugin",
>     "credentials":{
>       "solr":"...",
>       "user1":"...",
>       "user2":"..."},
>     "realm":"Solr",
>     "forwardCredentials":false,
>     "":{"v":0}},
>   "authorization":{
>     "class":"solr.RuleBasedAuthorizationPlugin",
>     "permissions":[
>       {
>         "name":"security-edit",
>         "role":"admin",
>         "index":1},
>       {
>         "name":"read",
>         "collection":"core1",
>         "role":"role1",
>         "index":2},
>       {
>         "name":"read",
>         "collection":"core2",
>         "role":"role2",
>         "index":3},
>       {
>         "name":"all",
>         "role":"admin",
>         "index":4}],
>     "user-role":{
>       "solr":"admin",
>       "user1":"role1",
>       "user2":"role2"},
>     "":{"v":0}}}
>
> With this setup, I'm unable to read from any of the cores with either user.
> If I "delete-permission":4 both users can read from either core, not just
> "their" core.
>
> I have tried custom permissions like this to no avail:
> {"name": "access-core1", "collection": "core1", "role": "role1"},
> {"name": "access-core2", "collection": "core2", "role": "role2"},
> {"name": "all", "role": "admin"}
>
> Is it possible to do this for cores? Or am I out of luck because I'm not
> using collections?
>
> Regards
>
> Thomas
>


Rule-Based permissions for cores

2020-08-31 Thread Thomas Corthals
Hi,

I'm trying to configure the Rule-Based Authorization Plugin in Solr 8.4.0
in standalone mode. My goal is to limit a user's access to one or more
designated cores. My security.json looks like this:

{
  "authentication":{
    "blockUnknown":true,
    "class":"solr.BasicAuthPlugin",
    "credentials":{
      "solr":"...",
      "user1":"...",
      "user2":"..."},
    "realm":"Solr",
    "forwardCredentials":false,
    "":{"v":0}},
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "permissions":[
      {
        "name":"security-edit",
        "role":"admin",
        "index":1},
      {
        "name":"read",
        "collection":"core1",
        "role":"role1",
        "index":2},
      {
        "name":"read",
        "collection":"core2",
        "role":"role2",
        "index":3},
      {
        "name":"all",
        "role":"admin",
        "index":4}],
    "user-role":{
      "solr":"admin",
      "user1":"role1",
      "user2":"role2"},
    "":{"v":0}}}

With this setup, I'm unable to read from any of the cores with either user.
If I "delete-permission":4 both users can read from either core, not just
"their" core.

I have tried custom permissions like this to no avail:
{"name": "access-core1", "collection": "core1", "role": "role1"},
{"name": "access-core2", "collection": "core2", "role": "role2"},
{"name": "all", "role": "admin"}

Is it possible to do this for cores? Or am I out of luck because I'm not
using collections?

Regards

Thomas


Child query with negative filter return zero documents

2020-08-31 Thread Marvin Bredal Lillehaug
Hi!
We have documents with one level of child documents.
One use case we have is returning (or getting stats for) child documents,
filtering by field values on both the child and the parent.

This works for a «must» filter on parent:
q=*:*
fq=doc_type:child
fq=child_field:child_field_value
fq={!child of=doc_type:parent}parent_field:parent_field_value
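
As a single request that is, roughly (core name is a placeholder):

curl http://localhost:8983/solr/mycore/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=doc_type:child' \
  --data-urlencode 'fq=child_field:child_field_value' \
  --data-urlencode 'fq={!child of=doc_type:parent}parent_field:parent_field_value'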

But a «must not» filter on parent returns zero documents:
fq={!child of=doc_type:parent}-parent_field:parent_field_value

Rewriting to:
fq=-{!child of=doc_type:parent}parent_field:parent_field_value
seems to work.

Should these two be equivalent?

A concrete example of a filter on a parent field: enum_attr_2021:2738
(an integer field). Stepping through query parsing:
«{!child of=docType:object}-enum_attr_2021:2738» is parsed to
«ToChildBlockJoinQuery (+(-enum_attr_2021:[2738 TO 2738]))»

«-{!child of=docType:object}enum_attr_2021:2738» is parsed to
«-ToChildBlockJoinQuery (+enum_attr_2021:[2738 TO 2738])»


SOLR-9327 (https://issues.apache.org/jira/browse/SOLR-9327) describes
similar behaviour for graph queries.


-- 
Kind regards,
Marvin B. Lillehaug


What is the Best way to block certain types of queries/ query patterns in Solr?

2020-08-31 Thread Mark Robinson
Hi,
I had come across a mail (from Oct 2019) which suggested that the best way is
to handle it before it reaches Solr. I was curious whether:
   1. a Jetty query filter can be used (came across something like
that; need to check)
   2. there are any new features in Solr itself (like in a request handler, or
in solrconfig, schema, etc.); see the sketch below for the kind of thing I mean
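
To illustrate what I mean for (2): pinning invariants on a request handler via
the Config API, so clients can't override expensive parameters. A sketch (core
name and values are just placeholders, and note this pins parameters rather
than matching arbitrary query patterns):

curl http://localhost:8983/solr/mycore/config \
  -H 'Content-Type: application/json' \
  -d '{"update-requesthandler": {
        "name": "/select",
        "class": "solr.SearchHandler",
        "invariants": {"rows": 100}
      }}'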

Thanks!
Mark


Solr waits too long for connect after editing solr.in.sh

2020-08-31 Thread maciejpregiel
Good afternoon,
In the Solr 8.6 Guide, in the chapter "Securing Solr", there is a section
"Enable IP Access Control". It says I can uncomment and edit the solr.in.sh
variables SOLR_IP_WHITELIST and SOLR_IP_BLACKLIST.
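For example, I set something like this (addresses are just examples):

SOLR_IP_WHITELIST="192.168.0.10, 192.168.0.0/24"
SOLR_IP_BLACKLIST="192.168.0.3"
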
When I did this, launching the local Solr server takes so long (over 30
seconds) that connecting fails.
How can I fix this problem?
Best Regards,
Maciej Pregiel


Re: How to Prevent Recovery?

2020-08-31 Thread Dominique Bejean
Hi,

Even if it is not the root cause, I suggest trying to respect some basic
best practices, and so not having "2 Zk running on the
same nodes where Solr is running". Maybe you can achieve this by just
stopping these 2 Zk (and moving them later). Did you increase
ZK_CLIENT_TIMEOUT?
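For instance, in solr.in.sh (the value is in milliseconds; 30000 is just an
illustrative value):

ZK_CLIENT_TIMEOUT=30000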

Did you check your GC logs? Any consecutive full GCs? How big is your Solr
heap size? Not too big?

The last time I saw such long commits, it was due to slow segment merges
related to docValues and dynamic fields. Are you intensively using dynamic
fields with docValues?

Can you enable Lucene's detailed debug information
(<infoStream>true</infoStream> in solrconfig.xml)?
https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings

With this Lucene debug information enabled, are there any lines like these in
your logs?

2020-05-03 16:22:38.139 INFO  (qtp1837543557-787) [   x:###]
o.a.s.u.LoggingInfoStream [MS][qtp1837543557-787]: too many merges;
stalling...
2020-05-03 16:24:58.318 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2020-05-03 16:24:59.005 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many
merges; stalling...
2020-05-03 16:31:31.402 INFO  (Lucene Merge Thread #55) [   x:###]
o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to
merge doc values [464265 docs]


Regards

Dominique





On Sun, Aug 30, 2020 at 8:44 PM, Anshuman Singh wrote:

> Hi,
>
> I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas
> using the ADDREPLICA API and then deleting the NRT replicas.
> But now these replicas are going into recovery even more frequently during
> indexing. The same errors are observed.
> Also, commit is taking a lot of time compared to NRT replicas.
> Can this be because most of the index is on disk and not in RAM, so that
> copying the index from the leader causes high disk utilisation and poor
> performance?
> Do I need to tweak the autocommit settings? Right now it is 30 seconds max
> time and 100k max docs.
>
> Regards,
> Anshuman
>
> On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson wrote:
>
> > Commits should absolutely not be taking that much time, that’s where I’d
> > focus first.
> >
> > Some sneaky places things go wonky:
> > 1> you have a suggester configured that builds whenever there’s a commit.
> > 2> you send commits from the client
> > 3> you’re optimizing on commit
> > 4> you have too much data for your hardware
> >
> > My guess though is that the root cause of your recovery is that the
> > followers
> > get backed up. If there are enough merge threads running, the
> > next update can block until at least one is done. Then the scenario
> > goes something like this:
> >
> > leader sends doc to follower
> > follower does not index the document in time
> > leader puts follower into “leader initiated recovery”.
> >
> > So one thing to look for, if that scenario is correct, is whether there
> > are messages in your logs with "leader-initiated recovery". I’d personally
> > grep my logs with
> >
> > grep initiated logfile | grep recovery | grep leader
> >
> > ‘cause I never remember whether that’s the exact form. If it is this, you
> > can
> > lengthen the timeouts, look particularly for:
> > • distribUpdateConnTimeout
> > • distribUpdateSoTimeout
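> >
> > For instance, as a sketch (the default solr.xml maps these settings to
> > system properties of the same name; values are illustrative, in
> > milliseconds):
> >
> > SOLR_OPTS="$SOLR_OPTS -DdistribUpdateConnTimeout=120000 -DdistribUpdateSoTimeout=600000"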
> >
> > All that said, your symptoms are consistent with a lot of merging going
> > on. With NRT
> > nodes, all replicas do all indexing and thus merging. Have you considered
> > using TLOG/PULL replicas? In your case they could even all be TLOG
> > replicas. In that
> > case, only the leader does the indexing, the other TLOG replicas of a
> > shard just stuff
> > the documents into their local tlogs without indexing at all.
> >
> > Speaking of which, you could reduce some of the disk pressure if you can
> > put your
> > tlogs on another drive, don’t know if that’s possible. Ditto the Solr
> logs.
> >
> > Beyond that, it may be a matter of increasing the hardware. You’re really
> > indexing
> > 120K records/second ((1 leader + 2 followers) * 40K/sec).
> >
> > Best,
> > Erick
> >
> > > On Aug 25, 2020, at 12:02 PM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster
> > > with 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 Zk, with 2
> > > running on the same nodes where Solr is running. Our use case requires
> > > continuous ingestions (updates mostly). If we ingest at 40k records per
> > > sec, after 10-15 mins some replicas go into recovery, with the errors
> > > observed given at the end. We also observed high CPU during these
> > > ingestions (60-70%) and disks frequently reaching 100% utilization.
> > >
> > > We know our hardware is limited but this system will be