Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-03 Thread yaswanth kumar
Hi Franke,

I suspect it's because of the certificate encryption? But I will wait for
you to confirm the same. We are generating certs with RSA 2048 and finally
combining them into a single JKS, and that's what we are referring to as
both the keystore and truststore. Let me know if that doesn't work or if
there is a standard procedure for creating these certs.
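
For reference, the rough procedure we follow looks like this (the alias,
DN, SAN and passwords below are placeholders):

keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
  -keystore solr-keystore.jks -storepass secret -keypass secret \
  -dname "CN=solr-node01,OU=IT" \
  -ext SAN=dns:solr-node01,ip:127.0.0.1

and that single JKS is then pointed to by both the keystore and truststore
settings.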

Thanks,

On Wed, Jun 3, 2020 at 8:25 AM yaswanth kumar  wrote:

> thanks Franke,
>
> I now made use of the default jetty-ssl.xml that comes with the solr
> package, but the issue is still happening when I try to push data to a
> non-leader node.
>
> Do you still think it's something to do with the configuration?
>
> Thanks,
>
> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke  wrote:
>
>> Why in the jetty-ssl.xml?
>>
>> Should this not be configured in the solr.in.sh?
>>
>> > Am 03.06.2020 um 00:38 schrieb yaswanth kumar :
>> >
>> > Thanks Franke, but yes, for all these questions I did configure it
>> > properly. I made sure to include
>> >
>> > <Set name="KeyStoreType"><Property name="solr.jetty.keystore.type"
>> > default="JKS"/></Set>
>> > <Set name="TrustStoreType"><Property name="solr.jetty.truststore.type"
>> > default="JKS"/></Set>
>> > in the jetty-ssl.xml along with the path keystore and truststore.
>> >
>> > Also I have made sure that the truststore exists on all nodes and also I am
>> > using the same file for both keystore and truststore as below
>> > > > default="./etc/solr-keystore.jks"/>
>> >  > > name="solr.jetty.keystore.password" default=""/>
>> >  > > default="./etc/solr-keystore.jks"/>
>> >  > > name="solr.jetty.truststore.password" default=""/>
>> >
>> > also urlScheme for ZK is set to https
>> >
>> >
>> > Also, the main error that I posted is the one that I am seeing as a
>> > return response, whereas the below one is what I see in the solr logs
>> >
>> > 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
>> > r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
>> > null:org.apache.solr.update.processor.Distr$
>> >at
>> >
>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
>> >at
>> >
>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
>> >at
>> >
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>> >at
>> >
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>> >at
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>> >at
>> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>> >at
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>> >at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>> >at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>> >at
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>> >at
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>> >at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>> >at
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>> >at
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> >at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>> >
>> >
>> > One strange observation is that when I hit the update API on the leader
>> > node it works without any error, and if I then immediately hit a
>> > non-leader it works fine (only once or twice), but if I keep hitting this
>> > node again and again it then throws the above error, and once the error
>> > starts happening, it is consistent.
>> >
>> > Please let me know if you need more information or if I am missing
>> > something else.

SolrSlf4jReporter, MDC information not set if num collections > coreLoadThreads

2020-06-03 Thread Marvin Bredal Lillehaug
Hi!
We just started using SolrSlf4jReporter to get hold of metrics.
In Solr 8.5.2 there is an issue: when the number of cores is larger than 3
(the default value of coreLoadThreads), the logged metrics for some cores are
missing all MDC variables for the core.

There have been some changes concerning MDCLoggingContext by David Smiley in
«SOLR-14351 Harden MDCLoggingContext.clear depth tracking»; I haven't
managed to check if this fixes it.

What happens in 8.5.2 is that MDCLoggingContext.clear() is called at the
end of CoreContainer.createFromDescriptor(...); clear only decrements
CALL_DEPTH rather than resetting it, so CALL_DEPTH can be left with a value
larger than 0 when the next core is created.

Maybe MDCLoggingContext.reset() should be used instead
of MDCLoggingContext.clear() for an 8.5.x fix?

The workaround to set a larger value of coreLoadThreads does work.
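
For anyone else hitting this, the workaround is a single setting in
solr.xml (the value 8 below is just an example; it only needs to exceed the
number of cores):

<solr>
  <!-- default is 3; raise it above the number of cores -->
  <int name="coreLoadThreads">8</int>
</solr>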

-- 
Kind regards,
Marvin B. Lillehaug


Re: Insert documents to a particular shard

2020-06-03 Thread sambasivarao giddaluri
Thanks Jörn for your suggestions,

It was a sample schema, but each document_type will have more fields.
1) Yes, I have explored graph traversal (gatherNodes) using streaming
expressions, but we found a few issues
ex:  get the parent doc based on a grandchild doc filter
Graph Traversal -
{!graph from=parentId to=parentId traversalFilter='document_type:parent'
returnRoot=false}(name:David AND document_type:grandchild)
This request gives all the fields of the parent doc, but with gatherNodes I
can gather only a single field of the parent doc and then have to query
again to get all the fields. We are also looking for pagination, which
streaming expressions do not support.


2) I tried document routing with the implicit router and it might work for
us, but I have to explore more on what happens when we split the shards.
ex: curl 'localhost:8983/solr/admin/collections?action=CREATE&name=family&router.name=implicit&router.field=rfield&collection.configName=base-config&shards=shard1,shard2&maxShardsPerNode=2&numShards=1&replicationFactor=2'

   - when inserting the parent doc I can randomly pick one of the shards
   (shard1 or shard2) for the rfield
   - while inserting any child doc or grandchild doc I use the parent doc's
   rfield to keep them in the same shard (see the sketch below).
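
As a concrete sketch of the second point (the field values are made up):

curl 'http://localhost:8983/solr/family/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"Id":"5","document_type":"child","parentId":"1","rfield":"shard1","name":"Anna"}]'

With router.name=implicit and router.field=rfield, this document is stored
on shard1, next to its parent.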

Regards
sam


On Tue, Jun 2, 2020 at 10:35 PM Jörn Franke  wrote:

> Hint: you can easily try out streaming expressions in the admin UI
>
> > Am 03.06.2020 um 07:32 schrieb Jörn Franke :
> >
> > 
> > You are trying to achieve data locality by having parents and children
> in the same shard?
> > Does document routing address it?
> >
> >
> https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing
> >
> >
> > On a side note, I don’t know your complete use case, but have you
> explored streaming expressions for graph traversal?
> >
> > https://lucene.apache.org/solr/guide/8_5/graph-traversal.html
> >
> >
> >>> Am 03.06.2020 um 00:37 schrieb sambasivarao giddaluri <
> sambasiva.giddal...@gmail.com>:
> >>>
> >> Hi All,
> >> I am running solr in cloud mode locally with 2 shards and 2 replicas on
> >> ports 8983 and 7574, and am figuring out how to insert a document into a
> >> particular shard. I read about implicit and composite routing but I don't
> >> think it will work for my use case.
> >>
> >> shard1 :  http://192.168.0.112:8983/family_shard1_replica_n1
> >> http://192.168.0.112:7574/family_shard1_replica_n2
> >>
> >> shard2:   http://192.168.0.112:8983/family_shard2_replica_n3
> >> http://192.168.0.112:7574/family_shard2_replica_n4
> >>
> >> we have documents with parent child relationship but flatten out with 2
> >> levels down and reference to each other.
> >> family schema documents:
> >> {
> >> "Id":"1"
> >> "document_type":"parent"
> >> "name":"John"
> >> }
> >> {
> >> "Id":"2"
> >> "document_type":"child"
> >> "parentId":"1"
> >> "name":"Rodney"
> >> }
> >> {
> >> "Id":"3"
> >> "document_type":"child"
> >> "parentId":"1"
> >> "name":"George"
> >> }
> >> {
> >> "Id":"4"
> >> "document_type":"grandchild"
> >> "parentId":"1",
> >> "childIdId":"2"
> >> "name":"David"
> >> }
> >> we have complex queries to get data based on the graph query parser, and
> >> as the graph query parser does not work on SolrCloud with multiple
> >> shards, I was trying to develop a logic like: whenever a document gets
> >> inserted or updated, make sure it gets saved in the same shard where the
> >> parent doc is stored; that way the graph query works because all the
> >> family information will be in the same shard.
> >> Approach :
> >> 1) If a new child/grandchild is getting inserted, then get the parent
> >> doc's shard details, add them to the document in a field (ex:
> >> parentshard), and save the doc in that shard.
> >> 2) If a document is getting updated, check if the parentshard field
> >> exists; if so, update the doc in the same shard.
> >> But all these check conditions will increase response time. Currently our
> >> development is done in cloud mode with a single shard, using solrj to
> >> save the data.
> >> Also I am unable to figure out the query to update a doc on a particular
> >> shard.
> >>
> >> Any suggestions will help .
> >>
> >> Thanks in Advance
> >> sam
>


Re: Multiple Solr instances using same ZooKeepers

2020-06-03 Thread Walter Underwood
If your clusters are able to use the same Zookeeper, then they are in the same 
data center (or AWS region), so you should not need CDCR. That is for clusters 
in different data centers. Also, CDCR has some known problems.

What are you trying to solve with CDCR? There may be a better way to solve it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 2, 2020, at 6:35 AM, Gell-Holleron, Daniel 
>  wrote:
> 
> Many thanks for this information! 
> 
> 
> -Original Message-
> From: Colvin Cowie  
> Sent: 02 June 2020 09:46
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple Solr instances using same ZooKeepers
> 
> You can specify a different "chroot" directory path in zookeeper for each 
> cloud 
> https://lucene.apache.org/solr/guide/8_5/setting-up-an-external-zookeeper-ensemble.html#using-a-chroot
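> 
> For example (hosts are placeholders), create a chroot per cloud and point
> each cloud at its own:
> 
>   bin/solr zk mkroot /cluster-a -z zk1:2181,zk2:2181,zk3:2181
>   bin/solr zk mkroot /cluster-b -z zk1:2181,zk2:2181,zk3:2181
>   bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181/cluster-a"
>   bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181/cluster-b"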
> 
> On Tue, 2 Jun 2020 at 09:33, Gell-Holleron, Daniel < 
> daniel.gell-holle...@gb.unisys.com> wrote:
> 
>> Hi there,
>> 
>> We are in the process of deploying Solr Cloud with CDCR.
>> 
>> I would like to know if multiple instances of Solr (4 Solr servers for 
>> one instance, 4 for another instance) can use the same ZooKeeper servers?
>> 
>> This would prevent us from needing multiple ZooKeepers servers to 
>> serve each instance of Solr.
>> 
>> Regards,
>> 
>> Daniel
>> 
>> 



Re: which terms are used at the matched document?

2020-06-03 Thread Mikhail Khludnev
This is the matching term: "weight(content:banka in 7179)".

On Wed, Jun 3, 2020 at 5:15 PM Serkan KAZANCI  wrote:

> Dobry den Mikhail,
>
> So I searched for "banka", which means "bank" in my language. Below are
> highlighted fragments of a matched document. You can see from the mark
> tags that the terms "Bankalar", "banka", "bankaya", "bankalar" exist in
> the document:
>
>
> "highlighting":{
> "/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm
> ":{
>   "content":["Anlamda Bankalar Arası Mevduat
> Sayılamayacağı ) \n\n • TÜRKİYE'DEKİ BİR BANKANIN
> YURTDIŞINDAKİ BANKAYA PARA YATIRMASI ( Banka ve
> Sigorta ",
> "anlamında kurulmuş bir banka olarak
> değerlendirilmesine ve davacı Banka tarafından yurt dışındaki
> bankaya yatırılan mevduatın da bankalar arası
> mevduat ",
> "anlamında kurulmuş bir banka olarak
> değerlendirilmesine ve davacı Banka tarafından yurt dışındaki
> bankaya yatırılan mevduatın da bankalar arası
> mevduat "]},
>
>
>
> Below is the debug/explain part of the response for the same document. How
> or where should I read the matched variations of the term "banka"?
> ("Bankalar", "bankaya")
>
>
> "explain":{
>   "/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm
> ":{
> "match":true,
> "value":2.6295655,
> "description":"max of:",
> "details":[{
> "match":true,
> "value":2.6295655,
> "description":"weight(content:banka in 7179)
> [SchemaSimilarity], result of:",
> "details":[{
> "match":true,
> "value":2.6295655,
> "description":"score(freq=58.0), computed as boost * idf *
> tf from:",
> "details":[{
> "match":true,
> "value":2.6807382,
> "description":"idf, computed as log(1 + (N - n + 0.5)
> / (n + 0.5)) from:",
> "details":[{
> "match":true,
> "value":3361,
> "description":"n, number of documents containing
> term"},
>   {
> "match":true,
> "value":49063,
> "description":"N, total number of documents with
> field"}]},
>   {
> "match":true,
> "value":0.980911,
> "description":"tf, computed as freq / (freq + k1 * (1
> - b + b * dl / avgdl)) from:",
> "details":[{
> "match":true,
> "value":58.0,
> "description":"freq, occurrences of term within
> document"},
>   {
> "match":true,
> "value":1.2,
> "description":"k1, term saturation parameter"},
>   {
> "match":true,
> "value":0.75,
> "description":"b, length normalization parameter"},
>   {
> "match":true,
> "value":664.0,
> "description":"dl, length of field (approximate)"},
>   {
> "match":true,
> "value":721.1222,
> "description":"avgdl, average length of
> field"}]}]}]}]},
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Wednesday, June 3, 2020 4:39 PM
> To: solr-user
> Subject: Re: which terms are used at the matched document?
>
> Hi,
> debugQuery response contains matched terms as well. It's just a little bit
> hard to read.
>
> On Wed, Jun 3, 2020 at 3:55 PM Serkan KAZANCI 
> wrote:
>
> > Hi,
> >
> >
> >
> > Is it possible to retrieve the terms that are used to match the document?
> > (Keyword term itself, stemmed versions of term, term matched from
> > synonyms.txt)
> >
> >
> >
> > Example:  search keyword "heaven"
> >
> >
> >
> > Found in document1 via "heavens" and "heaven", found in document2 via
> > "heavenly" , found in document3 via "paradise" (because of synonyms.txt)
> >
> >
> >
> > I looked into debug mode but I believe it returns information about the
> > ranking calculation.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Serkan
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>

-- 
Sincerely yours
Mikhail Khludnev


RE: Autoscaling using SolrCloud8.5 on AWS EKS - issue with Node Added trigger

2020-06-03 Thread Mangla,Kirti
Hi,

Looking for help on this issue.
Has anyone faced this problem?

Thanks,
Kirti Mangla
Software Engineer- Gartner Digital Markets - GetApp
Two Horizon Center, Golf Course Road, Gurgaon, India
Direct:  +91124-4795963

From: Mangla,Kirti
Sent: Wednesday, June 3, 2020 12:29 AM
To: solr-user@lucene.apache.org
Subject: Autoscaling using SolrCloud8.5 on AWS EKS - issue with Node Added 
trigger

Hi,

I have been trying to enable autoscaling on SolrCloud 8.5, with Node Added 
trigger and Node Lost trigger. The SolrCloud is running on AWS EKS pods, with 2 
nodes minimum.
I have added NodeAddedTrigger. My autoscaling API response looks as in the 
attached file.
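
For reference, the trigger was added with a call of this shape (the exact
waitFor value we used may differ):

curl -X POST http://solr-service:8983/api/cluster/autoscaling \
  -H 'Content-Type: application/json' -d '{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "preferredOperation": "addreplica",
    "enabled": true
  }
}'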

Whenever I scale up the SolrCloud replicas on EKS, new nodes are added to the 
cluster but the Node Added trigger throws the error below:

org.apache.solr.common.SolrException: Unexpected 
exception while processing event: {
"id":"c889e6ef3b34eTcc9nazth0kbod28rj2zc84n0b",
"source":"node_added_trigger",
"eventTime":3527913768203086,
"eventType":"NODEADDED",
"properties":{
"eventTimes":[3527913768203086],
"preferredOperation":"addreplica",
"_enqueue_time_":3527918773192489,
"nodeNames":["solrcloud-2.solrcluster:8983_solr"],
"replicaType":"NRT"}}
at 
org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:161)
at 
org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:326)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown 
Source)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: 
org.apache.solr.common.SolrException: Error getting remote info
at 
org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78)
at 
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
at 
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Row.(Row.java:71)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:575)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
at 
org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
at 
org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
at 
org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
at 
org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:90)
... 7 more
Caused by: org.apache.solr.common.SolrException: Error 
getting remote info
at 
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:364)
at 
org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
... 17 more
Caused by: org.apache.solr.common.SolrException: Could not 
get remote info after many retries on NoHttpResponseException
at 
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:335)
... 18 more
 

Looking for help on the subject.
Please let me know for doubts.

Thanks,
Kirti Mangla




curl -s -u solr: http://solr-service:8983/api/cluster/autoscaling
{
  "responseHeader":{
"status":0,
"QTime":0},
  "cluster-preferences":[{
  "minimize":"cores",
  "precision":1}
,{
  "maximize":"freedisk"}],
  "cluster-policy":[{
  "replica":"2",

Re: Periodically 100% cpu and high load/IO

2020-06-03 Thread Marvin Bredal Lillehaug
Yes, there is light/moderate indexing most of the time.
The setup has NRT replicas. And the shards are around 45GB each.
Index merging has been the hypothesis for some time, but we haven't dared
to activate info stream logging.
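
If we do enable it, as far as I understand it is just this in
solrconfig.xml (it is very verbose, which is why we have held off):

<indexConfig>
  <!-- dumps low-level IndexWriter flush/merge activity to the log -->
  <infoStream>true</infoStream>
</indexConfig>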

On Wed, Jun 3, 2020 at 2:34 PM Erick Erickson 
wrote:

> One possibility is merging index segments. When this happens, are you
> actively indexing? And are these NRT replicas or TLOG/PULL? If the latter,
> are your TLOG leaders on the affected machines?
>
> Best,
> Erick
>
> > On Jun 3, 2020, at 3:57 AM, Marvin Bredal Lillehaug <
> marvin.lilleh...@gmail.com> wrote:
> >
> > Hi,
> > We have a cluster with five Solr(8.5.1, Java 11) nodes, and sometimes one
> > or two nodes has Solr running with 100% cpu on all cores, «load» over
> 400,
> > and high IO. It usually lasts five to ten minutes, and the node is hardly
> > responding.
> > Does anyone have any experience with this type of behaviour? Is there any
> > logging other than infostream that could give any information?
> >
> > We managed to trigger a thread dump,
> >
> >> java.base@11.0.6
> >>
> /java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112)
> >> org.apache.lucene.util.IOUtils.fsync(IOUtils.java:483)
> >> org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:331)
> >> org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:286)
> >>
> >>
> org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:158)
> >>
> >>
> org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68)
> >> org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4805)
> >>
> >>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3277)
> >>
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3445)
> >> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410)
> >>
> >>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:678)
> >>
> >>
> org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:636)
> >>
> >>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:337)
> >> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:318)
> >
> >
> > But not sure if this is from the incident or just right after. It seems
> > strange that a fsync should behave like this.
> >
> > Swappiness is set to default for RHEL 7 (Ops have resisted turning it
> off)
> >
> > --
> > Kind regards,
> > Marvin B. Lillehaug
>
>

-- 
Kind regards,
Marvin B. Lillehaug


RE: which terms are used at the matched document?

2020-06-03 Thread Serkan KAZANCI
Dobry den Mikhail,

So I searched for "banka" which means "bank" at my language. Below is 
highlighted fragments of a matched document. You can see from mark tags that 
"Bankalar", "banka", "bankaya", "bankalar" terms exist in document,


"highlighting":{
"/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm":{
  "content":["Anlamda Bankalar Arası Mevduat Sayılamayacağı ) 
\n\n • TÜRKİYE'DEKİ BİR BANKANIN YURTDIŞINDAKİ 
BANKAYA PARA YATIRMASI ( Banka ve Sigorta ",
"anlamında kurulmuş bir banka olarak değerlendirilmesine 
ve davacı Banka tarafından yurt dışındaki bankaya 
yatırılan mevduatın da bankalar arası mevduat ",
"anlamında kurulmuş bir banka olarak değerlendirilmesine 
ve davacı Banka tarafından yurt dışındaki bankaya 
yatırılan mevduatın da bankalar arası mevduat "]},



Below is the debug/explain part of the response for the same document. How or 
where should I read the matched variations of the term "banka"? ("Bankalar", 
"bankaya")


"explain":{
  "/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm":{
"match":true,
"value":2.6295655,
"description":"max of:",
"details":[{
"match":true,
"value":2.6295655,
"description":"weight(content:banka in 7179) [SchemaSimilarity], 
result of:",
"details":[{
"match":true,
"value":2.6295655,
"description":"score(freq=58.0), computed as boost * idf * tf 
from:",
"details":[{
"match":true,
"value":2.6807382,
"description":"idf, computed as log(1 + (N - n + 0.5) / (n 
+ 0.5)) from:",
"details":[{
"match":true,
"value":3361,
"description":"n, number of documents containing term"},
  {
"match":true,
"value":49063,
"description":"N, total number of documents with 
field"}]},
  {
"match":true,
"value":0.980911,
"description":"tf, computed as freq / (freq + k1 * (1 - b + 
b * dl / avgdl)) from:",
"details":[{
"match":true,
"value":58.0,
"description":"freq, occurrences of term within 
document"},
  {
"match":true,
"value":1.2,
"description":"k1, term saturation parameter"},
  {
"match":true,
"value":0.75,
"description":"b, length normalization parameter"},
  {
"match":true,
"value":664.0,
"description":"dl, length of field (approximate)"},
  {
"match":true,
"value":721.1222,
"description":"avgdl, average length of field"}]}]}]}]},


-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Wednesday, June 3, 2020 4:39 PM
To: solr-user
Subject: Re: which terms are used at the matched document?

Hi,
debugQuery response contains matched terms as well. It's just a little bit
hard to read.

On Wed, Jun 3, 2020 at 3:55 PM Serkan KAZANCI  wrote:

> Hi,
>
>
>
> Is it possible to retrieve the terms that are used to match the document?
> (Keyword term itself, stemmed versions of term, term matched from
> synonyms.txt)
>
>
>
> Example:  search keyword "heaven"
>
>
>
> Found in document1 via "heavens" and "heaven", found in document2 via
> "heavenly" , found in document3 via "paradise" (because of synonyms.txt)
>
>
>
> I looked into debug mode but I believe it returns information about the
> ranking calculation.
>
>
>
> Thanks,
>
>
>
> Serkan
>
>
>
>
>
>
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev



solrj - get metrics from all nodes

2020-06-03 Thread lstusr 5u93n4
Hi All,

I'm attempting to connect to the metrics api in solrj to query metrics from
my cluster. Using the CloudSolrClient, I get routed to one node, and get
metrics only from that node.

I'm building my request like this:

GenericSolrRequest req = new GenericSolrRequest(METHOD.GET,
"/admin/metrics", new MapSolrParams(params));

NamedList<Object> resp = getCloudSolrClient().request(req);

And this returns metrics only from the node that gets selected by the
LbHttpClient (I think).

Is there an easy way to query all of the nodes for their metrics in solrj?
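
The only workaround I've come up with is looping over the live nodes myself
and hitting each one directly; a rough sketch (the "host:port_solr"
node-name parsing is assumed, and exception handling is elided):

Set<String> liveNodes =
    getCloudSolrClient().getClusterStateProvider().getLiveNodes();
for (String node : liveNodes) {
  // live_nodes entries look like "10.0.0.1:8983_solr"
  String baseUrl = "http://" + node.replace("_solr", "/solr");
  try (HttpSolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
    GenericSolrRequest req = new GenericSolrRequest(METHOD.GET,
        "/admin/metrics", new MapSolrParams(params));
    NamedList<Object> resp = client.request(req);
    // resp now holds this node's metrics only
  }
}

but it feels like there should be something built in.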

Kyle


Re: backup compression

2020-06-03 Thread Jan Høydahl
I see from the original issue https://issues.apache.org/jira/browse/SOLR-5750 
that backup compression was thought of but not implemented.
I don’t see any open JIRAs for it either. Would be great to have a ‘compress’ 
option to the command, and have restore auto-detect whether the backup is 
compressed.
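
Until something like that exists, compressing after the fact is the usual
workaround, e.g. (paths are placeholders):

tar -czf mybackup.tar.gz -C /backups mybackup_snapshot

and untar it again before calling RESTORE.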

Jan

> 3. jun. 2020 kl. 15:19 skrev Gell-Holleron, Daniel 
> :
> 
> Hi there,
> 
> I wanted to know whether, as part of a backup (action=BACKUP), compression 
> can be used as part of the command?
> 
> Going forward, as more data is pumped into Solr, the backups are going to be 
> very large.
> 
> Aside from applying compression to the folder the backup gets written to, is 
> there a compress command that can be used when calling a backup? I cannot see 
> anything in the Solr guide to suggest that there is.
> 
> Thanks,
> 
> Daniel
> 



Re: which terms are used at the matched document?

2020-06-03 Thread Mikhail Khludnev
Hi,
debugQuery response contains matched terms as well. It's just a little bit
hard to read.
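
E.g. (collection name and query are placeholders):

curl 'http://localhost:8983/solr/mycollection/select?q=heaven&debugQuery=true'

then look under debug > explain for entries like "weight(content:heaven in
42)"; the part after the field name is the indexed term that matched that
document.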

On Wed, Jun 3, 2020 at 3:55 PM Serkan KAZANCI  wrote:

> Hi,
>
>
>
> Is it possible to retrieve the terms that are used to match the document?
> (Keyword term itself, stemmed versions of term, term matched from
> synonyms.txt)
>
>
>
> Example:  search keyword "heaven"
>
>
>
> Found in document1 via "heavens" and "heaven", found in document2 via
> "heavenly" , found in document3 via "paradise" (because of synonyms.txt)
>
>
>
> I looked into debug mode but I believe it returns information about the
> ranking calculation.
>
>
>
> Thanks,
>
>
>
> Serkan
>
>
>
>
>
>
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


backup compression

2020-06-03 Thread Gell-Holleron, Daniel
Hi there,

I wanted to know whether, as part of a backup (action=BACKUP), compression can 
be used as part of the command?

Going forward, as more data is pumped into Solr, the backups are going to be 
very large.

Aside from applying compression to the folder the backup gets written to, is 
there a compress command that can be used when calling a backup? I cannot see 
anything in the Solr guide to suggest that there is.
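
For reference, the kind of call I am making (collection name and location
are placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=nightly&collection=mycollection&location=/backups'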

Thanks,

Daniel



which terms are used at the matched document?

2020-06-03 Thread Serkan KAZANCI
Hi,

Is it possible to retrieve the terms that are used to match the document?
(Keyword term itself, stemmed versions of term, term matched from
synonyms.txt)

Example:  search keyword "heaven"

Found in document1 via "heavens" and "heaven", found in document2 via
"heavenly", found in document3 via "paradise" (because of synonyms.txt)

I looked into debug mode but I believe it returns information about the
ranking calculation.

Thanks,

Serkan



Re: Periodically 100% cpu and high load/IO

2020-06-03 Thread Erick Erickson
One possibility is merging index segments. When this happens, are you actively 
indexing? And are these NRT replicas or TLOG/PULL? If the latter, are your TLOG 
leaders on the affected machines?

Best,
Erick

> On Jun 3, 2020, at 3:57 AM, Marvin Bredal Lillehaug 
>  wrote:
> 
> Hi,
> We have a cluster with five Solr(8.5.1, Java 11) nodes, and sometimes one
> or two nodes has Solr running with 100% cpu on all cores, «load» over 400,
> and high IO. It usually lasts five to ten minutes, and the node is hardly
> responding.
> Does anyone have any experience with this type of behaviour? Is there any
> logging other than infostream that could give any information?
> 
> We managed to trigger a thread dump,
> 
>> java.base@11.0.6
>> /java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112)
>> org.apache.lucene.util.IOUtils.fsync(IOUtils.java:483)
>> org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:331)
>> org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:286)
>> 
>> org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:158)
>> 
>> org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68)
>> org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4805)
>> 
>> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3277)
>> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3445)
>> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410)
>> 
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:678)
>> 
>> org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:636)
>> 
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:337)
>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:318)
> 
> 
> But not sure if this is from the incident or just right after. It seems
> strange that a fsync should behave like this.
> 
> Swappiness is set to default for RHEL 7 (Ops have resisted turning it off)
> 
> -- 
> Kind regards,
> Marvin B. Lillehaug



Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-03 Thread yaswanth kumar
thanks Franke,

I now made use of the default jetty-ssl.xml that comes with the solr
package, but the issue is still happening when I try to push data to a
non-leader node.

Do you still think it's something to do with the configuration?

Thanks,

On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke  wrote:

> Why in the jetty-ssl.xml?
>
> Should this not be configured in the solr.in.sh?
>
> > Am 03.06.2020 um 00:38 schrieb yaswanth kumar :
> >
> > Thanks Franke, but yes, for all these questions I did configure it
> > properly. I made sure to include
> >
> >  > default="JKS"/>
> >   > default="JKS"/>
> > in the jetty-ssl.xml along with the path keystore and truststore.
> >
> > Also I have made sure that the truststore exists on all nodes and also I am
> > using the same file for both keystore and truststore as below
> >  > default="./etc/solr-keystore.jks"/>
> >   > name="solr.jetty.keystore.password" default=""/>
> >   > default="./etc/solr-keystore.jks"/>
> >   > name="solr.jetty.truststore.password" default=""/>
> >
> > also urlScheme for ZK is set to https
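> > (for reference, we set it with something along these lines, the zkhost
> > being ours: server/scripts/cloud-scripts/zkcli.sh -zkhost zk01:2181 \
> >   -cmd clusterprop -name urlScheme -val https)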
> >
> >
> > Also, the main error that I posted is the one that I am seeing as a return
> > response, whereas the below one is what I see in the solr logs
> >
> > 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
> > r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
> > null:org.apache.solr.update.processor.Distr$
> >at
> >
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
> >at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
> >at
> >
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
> >at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
> >at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
> >at
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> >at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> >at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> >at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> >at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> >at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> >
> >
> > One strange observation is that when I hit the update API on the leader
> > node it works without any error, and if I then immediately hit a
> > non-leader it works fine (only once or twice), but if I keep hitting this
> > node again and again it then throws the above error, and once the error
> > starts happening, it is consistent.
> >
> > Please let me know if you need more information or if I am missing
> > something else
> >
> > Thanks,
> >
> >> On Tue, Jun 2, 2020 at 4:59 PM Jörn Franke 
> wrote:
> >>
> >> Have you looked in the logfiles?
> >>
> >> Keystore Type correctly defined  on all nodes?
> >>
> >> Have you configured the truststore on all nodes correctly?
> >>
> >> Have you set clusterprop urlScheme to htttps in ZK?
> >>
> >>
> >>
> https://lucene.apache.org/solr/guide/7_5/enabling-ssl.html#configure-zookeeper
> >>
> >>
> >>
>  Am 02.06.2020 um 18:57 schrieb yaswanth kumar  >:
> >>>
> >>> team, can someone help me on the 

Bi-Directional CDCR

2020-06-03 Thread Gell-Holleron, Daniel
Hi there,

I need some advice on how Bi-Directional CDCR is properly configured.

I've created a collection on Site A (3 Solr nodes, 5 ZooKeepers). I've also 
created a collection on Site B (3 Solr nodes, 5 ZooKeepers). These both have 
the same number of shards (not sure if that is a factor or not?)

I've configured the solrconfig.xml file as below on SiteA. I've then done the 
same on SiteB, where zkHosts are siteA's and the source and target have 
switched around. Once these were done I then ran the config update to ZooKeeper 
on both sites.


<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">siteA-zook01:2181,siteA-zook02:2181,siteA-zook03:2181,siteA-zook04:2181,siteA-zook05:2181</str>
    <str name="source">siteACollection</str>
    <str name="target">siteBCollection</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

After this I then did the following:


  *   Start ZooKeeper on Site A
  *   Start ZooKeeper on Site B
  *   Start SolrCloud on Site A
  *   Start SolrCloud on Site B
  *   I then activated the CDCR on Site A and Site B using the CDCR API
  *   I then disabled the buffer on Site A and Site B using the Disable Buffer 
API (both calls are sketched below)
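
The CDCR API calls referred to above are of this form (host and collection
names are ours; the same is run against Site B's collection):

curl 'http://siteA-solr01:8983/solr/siteACollection/cdcr?action=START'
curl 'http://siteA-solr01:8983/solr/siteACollection/cdcr?action=DISABLEBUFFER'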

When started up, all documents on Site A appeared to synchronize across to Site 
B and their corresponding shards. However, when I create a new document, it is 
indexed on Site A but won't replicate to Site B. Site B will, however, 
recognize that the number of documents isn't current.

Not sure if I have missed something along the way here? I'm using Solr version 
7.7.1 on a Windows Server 2016 OS.

Thanks,

Daniel



Re: Not all EML files are indexing during indexing

2020-06-03 Thread Charlie Hull
I think the OP is indexing flat files, not web pages (but otherwise, I 
agree with you that Scrapy is great - I know some of the people behind 
it too and they're a good bunch).
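
Something along these lines is what I have in mind; everything here
(paths, endpoints, field names) is assumed, so treat it as a sketch:

import pathlib
import requests

TIKA_URL = "http://localhost:9998/tika"   # a running tika-server instance
SOLR_URL = "http://localhost:8983/solr/mail/update/json/docs"  # collection 'mail'

for path in pathlib.Path("/data/eml").rglob("*.eml"):
    raw = path.read_bytes()
    # tika-server returns extracted plain text for a PUT with Accept: text/plain
    text = requests.put(TIKA_URL, data=raw,
                        headers={"Accept": "text/plain"}).text
    doc = {"id": str(path), "content": text}
    resp = requests.post(SOLR_URL, json=doc, params={"commit": "false"})
    resp.raise_for_status()  # surface per-file failures instead of failing silently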


Charlie

On 02/06/2020 16:41, Walter Underwood wrote:
>> On Jun 2, 2020, at 7:40 AM, Charlie Hull wrote:
>>
>> If it was me I'd probably build a standalone indexer script in Python that
>> did the file handling, called out to a separate Tika service for
>> extraction, posted to Solr.
>
> I would do the same thing, and I would base that script on Scrapy
> (https://scrapy.org/). I worked on a Python-based web spider for about ten
> years.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)




--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Periodically 100% cpu and high load/IO

2020-06-03 Thread Marvin Bredal Lillehaug
Hi,
We have a cluster with five Solr (8.5.1, Java 11) nodes, and sometimes one
or two nodes have Solr running with 100% cpu on all cores, «load» over 400,
and high IO. It usually lasts five to ten minutes, and the node is hardly
responding.
Does anyone have any experience with this type of behaviour? Is there any
logging other than infostream that could give any information?

We managed to trigger a thread dump,

> java.base@11.0.6
> /java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112)
> org.apache.lucene.util.IOUtils.fsync(IOUtils.java:483)
> org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:331)
> org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:286)
>
> org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:158)
>
> org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68)
> org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4805)
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3277)
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3445)
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410)
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:678)
>
> org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:636)
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:337)
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:318)


But not sure if this is from the incident or just right after. It seems
strange that a fsync should behave like this.

Swappiness is set to default for RHEL 7 (Ops have resisted turning it off)

-- 
Kind regards,
Marvin B. Lillehaug