Re: Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread vishal patel
What GC are you using? -- G1GC

Sent from Outlook


From: Walter Underwood 
Sent: Thursday, February 13, 2020 11:09 AM
To: solr-user@lucene.apache.org 
Subject: Re: Replica is going into recovery in Solr 6.1.0

Your JVM had very bad GC trouble. The 5 second GCs were enough to cause 
problems. The one minute GC is really, really bad. I’m not surprised the 
replica went down.

Look at the graphs for memory usage in the new and old spaces. It looks like it 
ran out. Maybe the heap is too small, but it might be something else.
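
A quick way to watch those spaces live, assuming the standard JDK tools are installed on the box (the PID placeholder is hypothetical):

    jstat -gcutil <solr-pid> 1000

That prints eden/survivor/old/metaspace utilization and GC counts once per second, which makes an exhausted old generation easy to spot.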

What GC are you using?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 12, 2020, at 8:16 PM, vishal patel  
> wrote:
>
> Is there anyone looking at this?
>
> Sent from Outlook
> 
> From: vishal patel 
> Sent: Wednesday, February 12, 2020 3:45 PM
> To: solr-user@lucene.apache.org 
> Subject: Replica is going into recovery in Solr 6.1.0
>
> I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards
> and each shard has 1 replica. Suddenly one replica went into recovery mode
> and requests became slow in production.
> I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
> time, along with multiple other minor GC pauses.
>
> My logs :
> https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
> https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing
>
> I do not know why the long GC pauses happened. Our platform performs heavy
> searching and indexing.
> Do long GC pauses happen because of searching or indexing?
> If a GC pause is long, why does the replica go into recovery? Can we set the
> waiting time of an update request?
> What is the minimum GC pause time that triggers recovery mode?
>
> Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310
>
> Regards,
> Vishal Patel
>
> Sent from Outlook



Re: Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread vishal patel
My configuration:

-XX:+AggressiveOpts -XX:ConcGCThreads=12 -XX:G1HeapRegionSize=33554432 
-XX:G1ReservePercent=20 -XX:InitialHeapSize=68719476736 
-XX:InitiatingHeapOccupancyPercent=10 -XX:+ManagementServer 
-XX:MaxHeapSize=68719476736 -XX:ParallelGCThreads=36 
-XX:+ParallelRefProcEnabled -XX:PrintFLSStatistics=1 -XX:+PrintGC 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution 
-XX:ThreadStackSize=256 -XX:+UseG1GC -XX:-UseLargePages 
-XX:-UseLargePagesIndividualAllocation -XX:+UseStringDeduplication

Sent from Outlook

From: Rajdeep Sahoo 
Sent: Thursday, February 13, 2020 10:03 AM
To: solr-user@lucene.apache.org 
Subject: Re: Replica is going into recovery in Solr 6.1.0

What is your memory configuration?

On Thu, 13 Feb, 2020, 9:46 AM vishal patel, 
wrote:

> Is there anyone looking at this?
>
> Sent from Outlook
> 
> From: vishal patel 
> Sent: Wednesday, February 12, 2020 3:45 PM
> To: solr-user@lucene.apache.org 
> Subject: Replica is going into recovery in Solr 6.1.0
>
> I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards
> and each shard has 1 replica. Suddenly one replica went into recovery mode
> and requests became slow in production.
> I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
> time, along with multiple other minor GC pauses.
>
> My logs :
> https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
> https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing
>
> I do not know why the long GC pauses happened. Our platform performs heavy
> searching and indexing.
> Do long GC pauses happen because of searching or indexing?
> If a GC pause is long, why does the replica go into recovery? Can we set the
> waiting time of an update request?
> What is the minimum GC pause time that triggers recovery mode?
>
> Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310
>
> Regards,
> Vishal Patel
>
> Sent from Outlook
>


Re: Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread Walter Underwood
Your JVM had very bad GC trouble. The 5 second GCs were enough to cause 
problems. The one minute GC is really, really bad. I’m not surprised the 
replica went down.

Look at the graphs for memory usage in the new and old spaces. It looks like it 
ran out. Maybe the heap is too small, but it might be something else.

What GC are you using?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 12, 2020, at 8:16 PM, vishal patel  
> wrote:
> 
> Is there anyone looking at this?
> 
> Sent from Outlook
> 
> From: vishal patel 
> Sent: Wednesday, February 12, 2020 3:45 PM
> To: solr-user@lucene.apache.org 
> Subject: Replica is going into recovery in Solr 6.1.0
> 
> I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards
> and each shard has 1 replica. Suddenly one replica went into recovery mode
> and requests became slow in production.
> I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
> time, along with multiple other minor GC pauses.
>
> My logs :
> https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
> https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing
>
> I do not know why the long GC pauses happened. Our platform performs heavy
> searching and indexing.
> Do long GC pauses happen because of searching or indexing?
> If a GC pause is long, why does the replica go into recovery? Can we set the
> waiting time of an update request?
> What is the minimum GC pause time that triggers recovery mode?
>
> Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310
> 
> Regards,
> Vishal Patel
> 
> Sent from Outlook



Alfresco SOLR4 Wild card search not working if minimum 1 or 2 characters used for search

2020-02-12 Thread Vignesh Sabapathi
Hi Team,

I am using SOLR4 (4.10.3), which comes along with Alfresco, with no
customizations in SOLR and all default configurations.

I have a field called fieldOne:72,73,74

If I search for the query @fieldOne:*72*, it does not produce any result; the
same happens if I search for @fieldOne:*7*

But if I use 3 characters, then it works: @fieldOne:*72,*

I would like to know where this is configured, so I can update the setting to
allow the search to fetch results with a minimum of 2 characters.


Re: Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread Rajdeep Sahoo
What is your memory configuration?

On Thu, 13 Feb, 2020, 9:46 AM vishal patel, 
wrote:

> Is there anyone looking at this?
>
> Sent from Outlook
> 
> From: vishal patel 
> Sent: Wednesday, February 12, 2020 3:45 PM
> To: solr-user@lucene.apache.org 
> Subject: Replica is going into recovery in Solr 6.1.0
>
> I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards
> and each shard has 1 replica. Suddenly one replica went into recovery mode
> and requests became slow in production.
> I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
> time, along with multiple other minor GC pauses.
>
> My logs :
> https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
> https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing
>
> I do not know why the long GC pauses happened. Our platform performs heavy
> searching and indexing.
> Do long GC pauses happen because of searching or indexing?
> If a GC pause is long, why does the replica go into recovery? Can we set the
> waiting time of an update request?
> What is the minimum GC pause time that triggers recovery mode?
>
> Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310
>
> Regards,
> Vishal Patel
>
> Sent from Outlook
>


Re: [External] Re: per-field count of documents matched?

2020-02-12 Thread Susmit
I used the JSON Facet API for a similar requirement. It can ignore filters from
the main query if needed and roll up the hit counts to any field.
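
A sketch of what that looks like (hypothetical collection and field names; the tag/excludeTags pair is how a facet domain ignores a tagged filter from the main query):

curl http://localhost:8983/solr/people/query -d '
{
  "query": "stone",
  "filter": "{!tag=T}content_type:person",
  "facet": {
    "lastNameHits": { "type": "query", "q": "LastName:stone", "domain": { "excludeTags": "T" } },
    "streetHits":   { "type": "query", "q": "Street:stone" }
  }
}'

Each query facet reports its own count, so one request yields a per-field tally.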


> On Feb 11, 2020, at 6:19 PM, Fischer, Stephen 
>  wrote:
> 
> Thanks very much!   By the way, we are using eDisMax, and the queries our UI 
> supports don't include fancy Booleans, so your ideas just might work
> 
> Thanks again,
> Steve
> 
> -Original Message-
> From: Erick Erickson  
> Sent: Tuesday, February 11, 2020 7:16 PM
> To: solr-user@lucene.apache.org
> Subject: [External] Re: per-field count of documents matched?
> 
> Hmmm, you could do a facet query (or a series of them):
> facet.query=LastName:stone&facet.query=Street:stone etc….. That’d
> automatically tally only for the docs that match.
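>
> For example (a sketch; the collection name is made up):
>
> curl 'http://localhost:8983/solr/people/select?q=stone&rows=0&facet=true&facet.query=LastName:stone&facet.query=Street:stone'
>
> Each facet.query comes back with its own count under facet_counts.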
> 
> You could also consider a custom search component. For the exact case you 
> describe, it’s actually fairly simple. The postings list has, for each term, 
> the list of docs that contain it (internal Lucene doc ID). So I might have 
> for field LastName:
> stone -> 1,73,100…
> 
> for field Street:
> stone-> 264,933…
> 
> So it’s simply a matter of, for each term, and each doc the overall query 
> matches go down the list of docs and add them up.
> 
> However… I’m not sure you’d get what you want in either case. Consider a 
> query (A AND B) OR (C AND D). And let’s say doc1 contains A in LastName, and 
> C,D in Street. Should A be counted in the LastName tally for this doc?
> 
> I suppose you could put the full query in the facet.query above. I’m still 
> not sure it’s what you need, since I’m not sure what "per-field count of 
> documents that match” means in your application…
> 
> Best,
> Erick
> 
>> On Feb 11, 2020, at 6:15 PM, Fischer, Stephen 
>>  wrote:
>> 
>> Hi wise Solr experts,
>> 
>> For our scientific use-case we want to show users a per-field count of 
>> documents that match that field.
>> 
>> We'd like to do this efficiently because we might return up to a million
>> documents.
>> 
>> For example, if we had documents describing People, and a query of, 
>> say, "Stone" we might want to show
>> 
>> Fields matched:
>> Last name:  145
>> Street: 431
>> Favorite rock band:  13
>> Home exterior: 2340
>> 
>> Is there an efficient way to do this?
>> 
>> So far, we're trying to leverage highlighting.   But it seems very slow.
>> 
>> Thanks
> 


Re: Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread vishal patel
Is there anyone looking at this?

Sent from Outlook

From: vishal patel 
Sent: Wednesday, February 12, 2020 3:45 PM
To: solr-user@lucene.apache.org 
Subject: Replica is going into recovery in Solr 6.1.0

I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards and
each shard has 1 replica. Suddenly one replica went into recovery mode and
requests became slow in production.
I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
time, along with multiple other minor GC pauses.

My logs :
https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing

I do not know why the long GC pauses happened. Our platform performs heavy
searching and indexing.
Do long GC pauses happen because of searching or indexing?
If a GC pause is long, why does the replica go into recovery? Can we set the
waiting time of an update request?
What is the minimum GC pause time that triggers recovery mode?

Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310

Regards,
Vishal Patel

Sent from Outlook


Zookeeper upgrade required with Solr upgrade?

2020-02-12 Thread Rahul Goswami
Hello,
We are running a SolrCloud (7.2.1) cluster and upgrading to Solr 7.7.2. We run
a separate multi-node ZooKeeper ensemble, which currently runs ZooKeeper 3.4.10.
Is it also required to upgrade ZooKeeper (to 3.4.14, as per CHANGES.txt for
Solr 7.7.2) along with Solr?

I tried a few basic update requests on a 2-node SolrCloud cluster with the
older (3.4.10) ZooKeeper and it seemed to work fine. But I just want to know
if there are any caveats I should be aware of.

Thanks,
Rahul


mapping and tuning payloads in Solr 8

2020-02-12 Thread Burgmans, Tom
Hi all,

In our Solr 6 setup we use string payloads to boost certain tokens (URIs).
These strings are mapped to floats via a schema parameter "PayloadMapping",
which can be read out in our custom WKSimilarity class (extending
TFIDFSimilarity).

[similarity configuration excerpt - the XML tags were stripped in archiving, leaving only the parameter values:]

  0.4
  0.4
  0.5
  0
  0.0
  10.0
  3.0
  1.0
  isAbout=15.0,coversFiscalPeriod=10.0,type=5.0,hasTheme=5.0,subject=4.0,mentions=2.0,creator=2.0


The reason for this indirection is convenience: by storing payload strings
instead of floats, we could change and tune the boosts easily by updating the
schema, without having to change the content set.
Inside WKSimilarity each payload string is mapped to its corresponding boost
value, and the final boost is applied via the scorePayload method (where we can
tune the boost curve via some additional schema parameters). This works well
in Solr 6.

The problem: we are about to migrate to Solr 8, and after LUCENE-8014 it is no
longer possible to override the scorePayload method in WKSimilarity (it was
removed from TFIDFSimilarity). I wonder what alternatives there are for mapping
payload strings to floats and using them in a tunable formula for boosting.
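
One direction that might work (a sketch, not verified against our setup; it assumes the boosts can be re-indexed as numeric payloads with DelimitedPayloadTokenFilterFactory, moving the string-to-float mapping to index time):

  <fieldType name="payload_floats" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
    </analyzer>
  </fieldType>

At query time the payload_score parser can then apply them, e.g. q={!payload_score f=uris func=max includeSpanScore=true}isAbout, and the tunable curve could live in a function query around payload() instead of in scorePayload.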

Thanks,
Tom Burgmans


Slow queries and facets

2020-02-12 Thread Rudenko, Artur
Hello everyone,
I am currently investigating a performance issue in our environment:
20M large PARENT documents and 800M nested small CHILD documents.
The system inserts about 400K PARENT documents and 16M CHILD documents per day.
(We have currently stopped inserting calls in order to investigate the
performance issue.)
This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM
each, 24 GB allocated to Solr) and a single collection (32 shards, replication
factor 2).

We experience generally slow queries (about 4-7 seconds) and slow facet times.
The query below runs in about 14-16 seconds (we have to use limit:-1 due to a
business case - the cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
"Chart_01_Bins":{
type:terms,
field:groupIds,
mincount:1,
limit:-1,
numBuckets:true,
missing:false,
refine:true,
facet:{

min_score_avg:"avg(min_score)",

max_score_avg:"avg(max_score)",

avg_score_avg:"avg(avg_score)"
}
},
"Chart_01_FIELD_NOT_EXISTS":{
type:query,
q:"-groupIds:[* TO *]",
facet:{
min_score_avg:"avg(min_score)",
max_score_avg:"avg(max_score)",
avg_score_avg:"avg(avg_score)"
}
}
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
"Chart_01_Bins":{
type:terms,
field:groupIds,
mincount:1,
limit:-1,
numBuckets:true,
missing:false,
refine:true
}
}
&rows=0

Schema relevant fields:

[field and fieldType definitions stripped in archiving]
Any suggestions on how to proceed with the investigation?

Right now we are trying to figure out whether using a single shard on each
machine will help.
Artur Rudenko
Analytics Developer
Customer Engagement Solutions, VERINT
T +972.74.747.2536 | M +972.52.425.4686





Possible performance bug - JSON facet - numBuckets:true

2020-02-12 Thread Rudenko, Artur
Hello everyone,
I am currently investigating a performance issue in our environment, and it
looks like we have found a performance bug.
Our environment:
20M large PARENT documents and 800M nested small CHILD documents.
The system inserts about 400K PARENT documents and 16M CHILD documents per day.
(We have currently stopped inserting calls in order to investigate the
performance issue.)
This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM
each, 24 GB allocated to Solr) and a single collection (32 shards, replication
factor 2).

The query below runs in about 14-16 seconds (we have to use limit:-1 due to a
business case - the cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
"Chart_01_Bins":{
type:terms,
field:groupIds,
mincount:1,
limit:-1,
numBuckets:true,
missing:false,
refine:true,
facet:{

min_score_avg:"avg(min_score)",

max_score_avg:"avg(max_score)",

avg_score_avg:"avg(avg_score)"
}
},
"Chart_01_FIELD_NOT_EXISTS":{
type:query,
q:"-groupIds:[* TO *]",
facet:{
min_score_avg:"avg(min_score)",
max_score_avg:"avg(max_score)",
avg_score_avg:"avg(avg_score)"
}
}
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
"Chart_01_Bins":{
type:terms,
field:groupIds,
mincount:1,
limit:-1,
numBuckets:true,
missing:false,
refine:true
}
}
&rows=0

Schema relevant fields:

[field and fieldType definitions stripped in archiving]

I noticed that when we set numBuckets:false, the result returns faster
(1.5-3.5 seconds less) - that looks like a performance bug:
the limit is -1, which means all buckets, so adding significant time to the
overall time just to get the number of buckets, when we will get all of them
anyway, doesn't seem right.

Any thoughts?


Thanks
Artur Rudenko




Re: REINDEXCOLLECTION fatal error in DaemonStream

2020-02-12 Thread Karl Stoney
Hmm, interestingly this happened when I set an `fq`
(*,old_version:_version_,old_lmake:L_MAKE,old_lmodel:L_MODEL) which I pulled
from our old DataImportHandler. Removing that, it worked fine.
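
For reference, the request shape being used (a sketch with the collection names from the log below; note the value above looks like an fl list rather than an fq):

curl 'http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION&name=at-uk-001&target=at-uk-002&fl=*,old_version:_version_,old_lmake:L_MAKE,old_lmodel:L_MODEL'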

From: Karl Stoney 
Sent: 12 February 2020 17:20
To: solr-user@lucene.apache.org 
Subject: REINDEXCOLLECTION fatal error in DaemonStream

Hey folks,
Trying out the REINDEXCOLLECTION but getting the following error:

Anyone seen it before?


17:14:09.610 [DaemonStream-at-uk-002-88-thread-1-processing-n:solr-0.search-solr.dev.k8.atcloud.io:80_solr x:at-uk-001_shard1_replica_n1 c:at-uk-001 s:shard1 r:core_node2] ERROR org.apache.solr.client.solrj.io.stream.DaemonStream - Fatal Error in DaemonStream:at-uk-002
java.lang.NullPointerException: null
    at org.apache.solr.client.solrj.io.stream.TopicStream.read(TopicStream.java:380) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.PushBackStream.read(PushBackStream.java:88) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.UpdateStream.read(UpdateStream.java:111) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.CommitStream.read(CommitStream.java:116) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.stream(DaemonStream.java:338) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.run(DaemonStream.java:319) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]


REINDEXCOLLECTION fatal error in DaemonStream

2020-02-12 Thread Karl Stoney
Hey folks,
Trying out the REINDEXCOLLECTION but getting the following error:

Anyone seen it before?


17:14:09.610 [DaemonStream-at-uk-002-88-thread-1-processing-n:solr-0.search-solr.dev.k8.atcloud.io:80_solr x:at-uk-001_shard1_replica_n1 c:at-uk-001 s:shard1 r:core_node2] ERROR org.apache.solr.client.solrj.io.stream.DaemonStream - Fatal Error in DaemonStream:at-uk-002
java.lang.NullPointerException: null
    at org.apache.solr.client.solrj.io.stream.TopicStream.read(TopicStream.java:380) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.PushBackStream.read(PushBackStream.java:88) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.UpdateStream.read(UpdateStream.java:111) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.CommitStream.read(CommitStream.java:116) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.stream(DaemonStream.java:338) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.run(DaemonStream.java:319) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) ~[solr-solrj-8.4.2-SNAPSHOT.jar:8.4.2-SNAPSHOT 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d - root - 2020-02-11 19:56:06]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]


Solr Commit Not Working on Update

2020-02-12 Thread logancraft
So I am trying to do a partial update to a document in Solr, but it will not
commit!

So this is the original doc I am trying to update with 11 votes.

{
  "doc":
  {
    "id":"location_23_deal_51",
    "deal_id":"deal_51",
    "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
    "rating":"3",
    "votes":"11",
    "deal_restaurant":"Moe's BBQ",
    "deal_type":"Kids",
    "content_type":"deal",
    "day":"Sunday",
    "_version_":1658287992903565312
  }
}

So I can run the command without the commit and it works like below:

curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json' -d
'[{"id":"location_23_deal_51","votes":{"set":23}}]' -H
'Content-type:application/json'


And then when I run a get command, it returns the right result.

curl http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51

{
  "doc":
  {
    "id":"location_23_deal_51",
    "deal_id":"deal_51",
    "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
    "rating":"3",
    "votes":"23",
    "deal_restaurant":"Moe's BBQ",
    "deal_type":"Kids",
    "content_type":"deal",
    "day":"Sunday",
    "_version_":1658297071939092480
  }
}


The result above is what I want to be able to commit, but when I run the
command with commit=true it does not work, as below.

curl
'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d
'[{"id":"location_23_deal_51","votes":"23"}]' -H
'Content-type:application/json'

And when I run the get command, I get the wrong result.

curl http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51

{
  "doc":
  {
    "id":"location_23_deal_51",
    "deal_id":"deal_51",
    "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
    "rating":"3",
    "votes":"11",
    "deal_restaurant":"Moe's BBQ",
    "deal_type":"Kids",
    "content_type":"deal",
    "day":"Sunday",
    "_version_":1658287992903565312
  }
}


I have tried a lot of different request paths, like /update/json and /update,
but they all stop working when I add the commit=true parameter to the query
string.
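
For reference, the form I expected to work combines the atomic-update syntax from the successful request with the commit parameter (an untested sketch; note the {"set":23} wrapper, which the commit=true attempt above omits - without it the request replaces the whole document instead of doing an atomic update):

curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d
'[{"id":"location_23_deal_51","votes":{"set":23}}]' -H
'Content-type:application/json'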

Any ideas will be much appreciated!




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Mongolian language in Solr

2020-02-12 Thread Samir Joshi
Hi,

Is it possible to get Mongolian language support in Solr indexing?

Regards,

Samir Joshi

VFS GLOBAL
EST. 2001 | Partnering Governments. Providing Solutions.

10th Floor, Tower A, Urmi Estate, 95, Ganpatrao Kadam Marg, Lower Parel (W), 
Mumbai 400 013, India
Mob: +91 9987550070 | sami...@vfsglobal.com | 
www.vfsglobal.com






Solr Commit Not Working

2020-02-12 Thread logancraft
So I am trying to do a partial update to a document in Solr, but it will not
commit!

So this is the original doc I am trying to update with 11 votes.

{
  "doc":
  {
"id":"location_23_deal_51",
"deal_id":"deal_51",
"deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
"rating":"3",
"votes":"11",
"deal_restaurant":"Moe's BBQ",
"deal_type":"Kids",
"content_type":"deal",
"day":"Sunday",
"_version_":1658287992903565312
  }
}

So I can run the command without the commit and it works like below:

curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json' -d
'[{"id":"location_23_deal_51","votes":{"set":23}}]' -H
'Content-type:application/json'


And then when I run a get command, it returns the right result.

curl
http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51

{
  "doc":
  {
"id":"location_23_deal_51",
"deal_id":"deal_51",
"deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
"rating":"3",
"votes":"23",
"deal_restaurant":"Moe's BBQ",
"deal_type":"Kids",
"content_type":"deal",
"day":"Sunday",
"_version_":1658297071939092480
  }
}


The result above is what I want to be able to commit, but when I run the
command with commit=true it does not work, as below.

curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true'
-d '[{"id":"location_23_deal_51","votes":"23"}]' -H
'Content-type:application/json'

And when I run the get command, I get the wrong result.

curl
http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51

{
  "doc":
  {
"id":"location_23_deal_51",
"deal_id":"deal_51",
"deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
"rating":"3",
"votes":"11",
"deal_restaurant":"Moe's BBQ",
"deal_type":"Kids",
"content_type":"deal",
"day":"Sunday",
"_version_":1658287992903565312
  }
}


I have tried a lot of different request paths, like /update/json and /update,
but they all stop working when I add the commit=true parameter to the query
string.

Any ideas will be much appreciated!











--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


no servers hosting shard: shard2

2020-02-12 Thread Yogesh Chaudhari
Hi All,

Currently we are using Solr 5.2.1 on our production server and want to upgrade
to Solr 7.7.2. We have been using Solr 5.2.1 for the last 5 years and have
millions of documents on the production server. We have a Solr Cloud setup with
2 shards and 3 replicas on the production server.

I have upgraded Solr 5.2.1 to Solr 6.6.6; it upgraded successfully on my local
machine.

Now I am trying to upgrade Solr 6.6.6 to Solr 7.7.2. I have upgraded all 6 Solr
instances, one at a time, to Solr 7.7.2, and I am getting the error below. One
shard (with 3 replicas) upgraded successfully, but the other shard is giving an
error (see the stack trace below). Though one shard upgraded, I cannot do
anything with it. I think the issue is due to old indexes or documents.



org.apache.solr.common.SolrException: no servers hosting shard: shard2
at 
org.apache.solr.handler.component.HttpShardHandler.prepDistributed(HttpShardHandler.java:463)
at 
org.apache.solr.handler.component.SearchHandler.getAndPrepShardHandler(SearchHandler.java:226)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:267)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)


Thanks,

Yogesh Chaudhari


Re: wildcards match end-of-word?

2020-02-12 Thread Erick Erickson
Steve:

You _really_ want to get acquainted with the admin UI/Analysis page ;). Choose 
a core/collection and you should see the choice. It shows you exactly what 
transformations your data goes through. If you hover over the light gray pairs 
of letters, you’ll get a tooltip showing you what part of your analysis chain 
is responsible for a particular change. I un-check the “verbose” box 95% of the 
time BTW.

The critical bit is that what comes out of the end of the analysis pipe are the 
tokens that are actually _in_ the index. From there, problems like this make 
more sense.

My bet is that, as Walter says, you have a stemmer in the analysis chain and 
the actual token in the index is “kinas” so of course “kinase*” won’t be found. 
By adding OR kinase to the query, that token is stemmed to “kinas” and matches.

Also, adding &debug=query to your URL will show you what the query looks like
after parsing and analysis - also a major tool for figuring out what’s really
happening.

Wildcards are not stemmed, which can lead to surprising results. There’s no 
perfect answer here. Let’s claim wildcards _were_ stemmed. Then you’d have to 
try to explain why “running*” returned a doc with only “run” or “runner” or 
“runs” or... in it, but searching for “runnin*” did not, due to the stemmer not
recognizing it as a stemmable word.

Finally, one of my personal hot buttons is wildcards in general. They’re very 
often over-used because people are used to simple search capabilities. 
Something about “if your only tool is a hammer, every problem looks like a 
nail”. That gets into training users too though...
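
If matching the un-stemmed word really is a requirement, one common mitigation (a sketch, assuming a stemmer is indeed in your chain) is to index both the original and the stemmed token:

  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

KeywordRepeatFilterFactory emits each token twice, once marked as a keyword so the stemmer leaves it alone, and RemoveDuplicatesTokenFilterFactory collapses the pair when stemming changed nothing. Then “kinase*” and “kinase” both hit the original token.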

Best,
Erick

> On Feb 11, 2020, at 9:24 PM, Fischer, Stephen 
>  wrote:
> 
> Hi,
> 
> I am a solr newbie.  I was surprised to discover that a search for kinase* 
> returned fewer results than kinase.
> 
> Then I read the wildcard 
> documentation,
>  and saw why.  kinase* will not match the word "kinase".
> 
> Our end-users won't expect this behavior.  Presumably the solution would be 
> for them (actually us, on their behalf), to use kinase* OR kinase.
> 
> But that is kind of a hack.
> 
> Is there a way we can configure solr to have wildcards match on end-of-word?
> 
> Thanks,
> Steve



RE: [External] Re: wildcards match end-of-word?

2020-02-12 Thread Fischer, Stephen
Thanks *very much* for replying.  (You're right, I missed the "zero or more," 
having focused only on the examples in the doc.  Oops).

New discovery.  kin*ase returns 0 hits.   Below I show the debug output and the 
pertinent parts of the schema.   Maybe you can spot my problem?

{
  "responseHeader":{
"status":0,
"QTime":2,
"params":{
  "q":"kin*ase",
  "defType":"edismax",
  "debug":"all",
  "qf":"TEXT__gene_product",
  "fl":"id,document-type,TEXT__gene_product,score",
  "stopwords":"true"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "debug":{
"rawquerystring":"kin*ase",
"querystring":"kin*ase",
"parsedquery":"+DisjunctionMaxQuery((TEXT__gene_product:kin*ase))",
"parsedquery_toString":"+(TEXT__gene_product:kin*ase)",
"explain":{},
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,
"parsed_boost_queries":[],
"boostfuncs":null,
"timing":{
  "time":2.0,
  "prepare":{
"time":1.0,
"query":{
  "time":1.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":1.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}




  










  
  









  




-Original Message-
From: Walter Underwood  
Sent: Wednesday, February 12, 2020 12:31 AM
To: solr-user@lucene.apache.org
Subject: [External] Re: wildcards match end-of-word?

“kinase*” does match “kinase”. On the page you linked to, it defines “*” as 
matching "Multiple characters (matches zero or more sequential characters)”.

If it is not matching, you may be using a stemmer on that field or doing some 
other processing that changes the tokens.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2020, at 6:24 PM, Fischer, Stephen 
>  wrote:
> 
> Hi,
> 
> I am a solr newbie.  I was surprised to discover that a search for kinase* 
> returned fewer results than kinase.
> 
> Then I read the wildcard 
> documentation,
>  and saw why.  kinase* will not match the word "kinase".
> 
> Our end-users won't expect this behavior.  Presumably the solution would be 
> for them (actually us, on their behalf), to use kinase* OR kinase.
> 
> But that is kind of a hack.
> 
> Is there a way we can configure solr to have wildcards match on end-of-word?
> 
> Thanks,
> Steve



Re: Bug? Documents not visible after sucessful commit - chaos testing

2020-02-12 Thread Michael Frank
Hi Group, Hi Chris,
---
We found the Issue and a Workaround
---

We traced the problem down to DistributedUpdateProcessor.doLocalCommit(),
which is *silently* dropping all commits while the replica is currently
inactive and replaying; it returns immediately and still reports status=0.
This behaviour is inconsistent with the documented behaviour of a commit's
"waitSearcher" parameter, which "[..] Blocks until a new searcher is opened
and registered as the main query searcher, making the changes visible."
https://lucene.apache.org/solr/guide/7_7/uploading-data-with-index-handlers.html#xml-update-commands

The issue we have is the "silent" part. If, upon receiving a commit request,
the replica
  - would either wait to become healthy and then commit and return, honoring
waitSearcher=true (which is what we expected from reading the documentation)
  - or would at least behave consistently with all other UpdateRequests and
report back the achieved replication factor in the "rf" response parameter,
we could easily detect the degraded cluster state in the client and keep
retrying the commit until "rf" matches the number of replicas.

We think this is a bug (silently dropping commits even if the client requested
waitSearcher), or at least a missing feature (commits being the only
UpdateRequests not reporting the achieved RF), which should be worth a JIRA
ticket.
While I personally would prefer that commits not be dropped, we are fine as
long as the client can detect this behaviour, e.g. with an "rf" response
parameter in CommitResponse.

Client Side Workaround
==
Define a custom updateRequestProcessorChain only for commits, bypassing the
DistributedUpdateProcessor [the chain XML was stripped in archiving; a minimal
chain matching this description would be]:

  <updateRequestProcessorChain name="custom-force-commit-chain">
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

..and manually send the following commit to each replica of the target
collection using the custom chain:

  UpdateRequest commit = new UpdateRequest();
  commit.setAction(ACTION.COMMIT, true, true, softCommit);
  commit.getParams().set(CommonParams.DISTRIB, false); // we are the aggregator and take care of distribution
  commit.getParams().set(UpdateParams.UPDATE_CHAIN, "custom-force-commit-chain");

By bypassing the DistributedUpdateProcessor we achieve the desired behaviour:
either a successful commit OR a detectable error in the client, on which we can
retry. Nice thing: most errors are already detected by HttpSolrClient, which
automatically retries on stale state.

All thrown together (not including the
retryOnException(() -> forceCommitAllReplicas(col, false)) part):

  private void forceCommitAllReplicas(String collection, boolean softCommit) throws IOException {
      ZkStateReader zkStateReader = solrClientCache.getCloudSolrClient(zkHost).getZkStateReader();
      // collect the core URL of every replica of every slice of the target collection
      Set<String> replicaRoutes = Arrays.stream(CloudSolrStream.getSlices(collection, zkStateReader, true))
              .flatMap(s -> s.getReplicas().stream())
              .map(ZkCoreNodeProps::getCoreUrl)
              .collect(Collectors.toSet());

      UpdateRequest commit = new UpdateRequest();
      commit.setAction(ACTION.COMMIT, true /*waitFlush*/, true /*waitSearcher*/, softCommit);
      commit.getParams().set(CommonParams.DISTRIB, false); // we are the aggregator and take care of distribution
      commit.getParams().set(UpdateParams.UPDATE_CHAIN, "custom-force-commit-chain");

      int achievedRF = replicaRoutes.stream()
              .map(solrClientCache::getHttpSolrClient)
              .mapToInt(client -> {
                  try {
                      UpdateResponse resp = commit.process(client);
                      return resp.getStatus() == 0 ? 1 : 0; // count toward the achieved replication factor
                  } catch (Exception e) {
                      return 0; // the commit did not succeed on this replica
                  }
              })
              .sum();

      if (achievedRF < replicaRoutes.size()) {
          throw new SolrException(ErrorCode.INVALID_STATE,
                  "Cluster is in degraded state - not all replicas acknowledged the commit. "
                          + achievedRF + "/" + replicaRoutes.size());
      }
  }

Cheers
Michael


Am Do., 6. Feb. 2020 um 12:18 Uhr schrieb Michael Frank <
frank.michael.busin...@gmail.com>:

> Hi Chris,
> thank you for your detailed answer!
>
> We are aware that Solr Cloud is eventually consistent and in our
> application that's fine in most cases.
> However, what is really important for us is that we get a "Read Your
> Writes" for a clear point in time - which, in our understanding, should be
> after hard commits with waitSearcher=true return successfully from all
> replicas. Is that correct?
> The client that indexes new documents performs a hard commit with
> waitSearcher=true, and after that succeeds we expect the documents to
> be visible on all replicas.
> This 

Replica is going into recovery in Solr 6.1.0

2020-02-12 Thread vishal patel
I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards and
each shard has 1 replica. Suddenly one replica went into recovery mode and
requests became slow in production.
I found that the maximum minor GC pause time was 1 min 6 sec 800 ms at that
time, along with multiple other minor GC pauses.

My logs :
https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing

I do not know why the long GC pauses happened. Our platform performs heavy
searching and indexing.
Do long GC pauses happen because of searching or indexing?
If a GC pause is long, why does the replica go into recovery? Can we set the
waiting time of an update request?
What is the minimum GC pause time that triggers recovery mode?

Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310

Regards,
Vishal Patel

Sent from Outlook