Re: regarding exposing merge metrics

2018-01-08 Thread Shalin Shekhar Mangar
The merge metrics were enabled by default in 6.4 but they were turned
off in 6.4.2 because of large performance degradations. For more
details, see https://issues.apache.org/jira/browse/SOLR-10130

On Tue, Jan 9, 2018 at 9:11 AM, S G  wrote:
> Yes, this is actually confusing and the documentation (
> https://lucene.apache.org/solr/guide/7_2/metrics-reporting.html) does not
> help either:
>
> *Index Merge Metrics* : These metrics are collected in respective
> registries for each core (e.g., solr.core.collection1….), under the INDEX
> category.
> Basic metrics are always collected - collection of additional metrics can
> be turned on using boolean parameters in the /config/indexConfig/metrics.
>
> However, we do not see the merge-metrics being collected if the above
> config is absent. So what basic metrics are always collected for merge?
> And why do the merge metrics require an additional config while most of the
> others are reported directly?
>
> Thanks
> SG
>
>
>
> On Mon, Jan 8, 2018 at 2:02 PM, suresh pendap 
> wrote:
>
>> Hi,
>> I am following the instructions from
>> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
>>  in order to expose the Index merge related metrics.
>>
>> The document says that we have to add the below snippet in order to expose
>> the merge metrics
>>
>> <config>
>>   ...
>>   <indexConfig>
>>     <metrics>
>>       <majorMergeDocs>524288</majorMergeDocs>
>>       <bool name="mergeDetails">true</bool>
>>     </metrics>
>>     ...
>>   </indexConfig>
>> ...
>> </config>
>>
>>
>>
>> I would like to know why these metrics are not exposed by default, just like
>> all the other metrics?
>>
>> Is there any performance overhead that we should be concerned about?
>>
>> If there was no particular reason, then can we expose them by default?
>>
>>
>>
>> Regards
>> Suresh
>>



-- 
Regards,
Shalin Shekhar Mangar.


Indexing of MD5 and SHA256 fields

2018-01-08 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 7.2.0.

I would like to check: is there a way to index the MD5 and SHA256
fields that are extracted by Tika for EML files?

Example:
X-TIKA:digest:MD5: 0035b9e8a14bb8ca5dd3c6b63a74a31d
X-TIKA:digest:SHA256:
6183ed712e318b8674da5d685be02a331feee0416f900c728aa543c0679685cc

I have set up the attr_* dynamic field in the schema, but these two fields are
not captured.

Regards,
Edwin
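
A hedged sketch of one way to capture those Tika metadata names explicitly when
posting through the extracting handler. The handler path, document id, target
field names, and file below are assumptions for illustration, not from the
thread:

  curl "http://localhost:8983/solr/collection1/update/extract?literal.id=mail1&uprefix=attr_&fmap.X-TIKA:digest:MD5=md5_s&fmap.X-TIKA:digest:SHA256=sha256_s&commit=true" -F "file=@message.eml"

fmap.<source>=<dest> renames a Tika-produced field before indexing, and uprefix
catches remaining unknown fields; if the digest fields still don't show up, the
digest computation itself may not be enabled in the Tika parse.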


Re: In-place update vs Atomic updates

2018-01-08 Thread kshitij tyagi
Hi Shawn,

Thanks for the information,

1. Does an in-place update open a new searcher by itself or not?
2. As the entire segment is rewritten, does that mean frequent in-place
updates are expensive, since each in-place update rewrites the entire
segment again? Correct me here if my understanding is not correct.

Thanks,
Kshitij

On Mon, Jan 8, 2018 at 9:19 PM, Shawn Heisey  wrote:

> On 1/8/2018 4:05 AM, kshitij tyagi wrote:
>
>> What are the major differences between atomic and in-place updates, I have
>> gone through the documentation but it does not give detail internal
>> information.
>>
>
> Atomic updates are nearly identical to simple indexing, except that the
> existing document is read from the index to populate a new document along
> with whatever updates were requested, then the new document is indexed and
> the old one is deleted.
>
> 1. Does doing an in-place update prevent a solr cache burst or not? What are
>> the benefits of using in-place updates?
>>
>
> In-place updates are only possible on a field where only docValues is
> true.  The settings for things like indexed and stored must be false.
>
> An in-place update finds the segment containing the document and writes a
> whole new file containing the value of every document in the segment for
> the updated field.  If the segment contains ten million documents, then
> information for ten million values will be written for a single document
> update.
>
> I want to update one of the fields of the document but I do not want to
>> burst my cache.
>>
>
> When the index changes for ANY reason, no matter how the change is
> accomplished, caches must be thrown away when a new searcher is built.
> Lucene and Solr have no way of knowing that a change doesn't affect some
> cache entries, so the only thing it can do is assume that all the
> information in the cache is now invalid.  What you are asking for here is
> not possible at the moment, and chances are that if code was written to do
> it, that it would be far slower than simply invalidating the caches and
> doing autowarming.
>
> Thanks,
> Shawn
>
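
For reference, a minimal SolrJ sketch of the partial-update request the thread
is discussing. The collection URL and field names are assumptions; only the id
and the changed field are sent, and the same "set" request becomes an in-place
update when the target field is docValues-only:

  import java.util.Collections;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class PartialUpdateExample {
    public static void main(String[] args) throws Exception {
      SolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc1");
      // {"set": 42} marks this as a partial update rather than a full reindex
      doc.addField("popularity", Collections.singletonMap("set", 42));
      client.add(doc);
      client.commit();
      client.close();
    }
  }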


Re: Profanity

2018-01-08 Thread John Blythe
Gladly. Good luck!

On Mon, Jan 8, 2018 at 8:27 PM Sadiki Latty  wrote:

> Thanks for the feedback John,
>
> This is a genius idea if I don’t want to create my own processor. I could
> simply check that field for data for my reports. Either the field will have
> data or it won’t.
>
> Thanks
>
> Sid
>
> Sent from my iPhone
>
> > On Jan 8, 2018, at 4:38 PM, John Blythe  wrote:
> >
> > you could use the keepwords functionality. have a field that only keeps
> > profanity and then you can query against that field having its default
> > value vs. profane text
> >
> > --
> > John Blythe
> >
> >> On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty  wrote:
> >>
> >> Hey
> >>
> >> I would like to find a solution to flag (at index-time) profanity.
> >> Optimally, it would function similar to stopwords in the sense that I can
> >> have a predefined list that is read, and if a token is on the list that
> >> document is 'flagged' in a different field. Does anyone know of a solution
> >> (outside of configuring my own)? If none exists and I end up configuring
> >> my own, would I be doing this in the update processor phase? I am still
> >> fairly new to Solr, but from what I've read, that seems to be the best
> >> place to look.
> >>
> >>
> >> Thanks,
> >>
> >> Sid
> >>
>
-- 
John Blythe


Re: regarding exposing merge metrics

2018-01-08 Thread S G
Yes, this is actually confusing and the documentation (
https://lucene.apache.org/solr/guide/7_2/metrics-reporting.html) does not
help either:

*Index Merge Metrics* : These metrics are collected in respective
registries for each core (e.g., solr.core.collection1….), under the INDEX
category.
Basic metrics are always collected - collection of additional metrics can
be turned on using boolean parameters in the /config/indexConfig/metrics.

However, we do not see the merge-metrics being collected if the above
config is absent. So what basic metrics are always collected for merge?
And why do the merge metrics require an additional config while most of the
others are reported directly?

Thanks
SG



On Mon, Jan 8, 2018 at 2:02 PM, suresh pendap 
wrote:

> Hi,
> I am following the instructions from
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
>  in order to expose the Index merge related metrics.
>
> The document says that we have to add the below snippet in order to expose
> the merge metrics
>
> <config>
>   ...
>   <indexConfig>
>     <metrics>
>       <majorMergeDocs>524288</majorMergeDocs>
>       <bool name="mergeDetails">true</bool>
>     </metrics>
>     ...
>   </indexConfig>
> ...
> </config>
>
>
>
> I would like to know why these metrics are not exposed by default, just like
> all the other metrics?
>
> Is there any performance overhead that we should be concerned about?
>
> If there was no particular reason, then can we expose them by default?
>
>
>
> Regards
> Suresh
>


Re: Profanity

2018-01-08 Thread Sadiki Latty
Thanks for the feedback John,

This is a genius idea if I don’t want to create my own processor. I could 
simply check that field for data for my reports. Either the field will have 
data or it won’t. 

Thanks

Sid

Sent from my iPhone

> On Jan 8, 2018, at 4:38 PM, John Blythe  wrote:
> 
> you could use the keepwords functionality. have a field that only keeps
> profanity and then you can query against that field having its default
> value vs. profane text
> 
> --
> John Blythe
> 
>> On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty  wrote:
>> 
>> Hey
>> 
>> I would like to find a solution to flag (at index-time) profanity.
>> Optimally, it would function similar to stopwords in the sense that I can
>> have a predefined list that is read, and if a token is on the list that
>> document is 'flagged' in a different field. Does anyone know of a solution
>> (outside of configuring my own)? If none exists and I end up configuring my
>> own, would I be doing this in the update processor phase? I am still fairly
>> new to Solr, but from what I've read, that seems to be the best place to
>> look.
>> 
>> 
>> Thanks,
>> 
>> Sid
>> 


Re: Profanity

2018-01-08 Thread Sadiki Latty
Thanks a lot guys. Multilingual will also be a hurdle, tbh. The data will only
be coming from two languages, but it will prove to be potentially challenging
nonetheless: French and English, so “merde” will be making that list. This
requirement is in itself an edge case for my project, so ML may be overkill,
hence why I was thinking of the list. The data being inserted is from sources
that we have “control” over. This requirement is simply for the worst-case
scenario that we miss something. We might also want to allow this profanity,
which is why we need to flag it rather than strip it out altogether.

This provides me with great direction.

Sent from my iPhone

> On Jan 8, 2018, at 5:17 PM, Markus Jelsma  wrote:
> 
> Indeed, hence the small suggestion to use ML for this instead of a dumb set 
> of terms, which is useless in almost any real solution. We have had very good 
> results with SVM's for text processing, although in the end it depends on 
> your input data, and the care for selecting edge cases.
> 
> Regards,
> Markus
> 
> -Original message-
>> From:Davis, Daniel (NIH/NLM) [C] 
>> Sent: Monday 8th January 2018 23:12
>> To: solr-user@lucene.apache.org
>> Subject: RE: Profanity
>> 
>> Fun topic. Same complicated issues as normal search:
>> 
>> Multilingual support?  Is "Merde" profanity too, or just in French?
>> Multi-word synonyms?   Does "God Damn" become "goddamn", or do you treat
>> "Damn" and "God damn" the same because you drop "God"? Is "Merde Alors"
>> the same as "Merde", or again a multi-word synonym?
>> 
>> -Original Message-
>> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
>> Sent: Monday, January 8, 2018 4:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Profanity
>> 
>> Yes, an UpdateRequestProcessor is the API to implement for these sorts of 
>> requirements. In the URP you have access to a SolrDocument object that 
>> carries the input data. You can inspect the fields, and add, remove or 
>> modify fields if you want, or discard the input altogether.
>> 
>> So, check your text input field for 'profanity' and set another boolean
>> field depending on whether it matches. Whether you use a list of words - or
>> an SVM or another machine learning algorithm - to detect profanity is up to you.
>> 
>> Cheers,
>> Markus
>>   
>> -Original message-
>>> From:Sadiki Latty 
>>> Sent: Monday 8th January 2018 22:12
>>> To: solr-user@lucene.apache.org
>>> Subject: Profanity
>>> 
>>> Hey
>>> 
>>> I would like to find a solution to flag (at index-time) profanity.
>>> Optimally, it would function similar to stopwords in the sense that I can
>>> have a predefined list that is read, and if a token is on the list that
>>> document is 'flagged' in a different field. Does anyone know of a solution
>>> (outside of configuring my own)? If none exists and I end up configuring my
>>> own, would I be doing this in the update processor phase? I am still fairly
>>> new to Solr, but from what I've read, that seems to be the best place to
>>> look.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Sid
>>> 
>> 


Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
OK, just restarting all the Solr nodes did fix it. Since they are in production,
I was hesitant to do that.


From: Petersen, Robert (Contr) 
Sent: Monday, January 8, 2018 12:34:28 PM
To: solr-user@lucene.apache.org
Subject: solr 5.4.1 leader issue

Hi, I've got two out of my three servers thinking they are replicas on one
shard and getting exceptions. I'm wondering what the easiest way to fix this
is. Can I just restart ZooKeeper across the servers? Here are the exceptions:


TY

Robi


ERROR
null
RecoveryStrategy
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:607)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:364)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://10.209.55.10:8983/solr: We are not the leader
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:285)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:281)
... 5 more
(and on the one everyone thinks is the leader)
Error while trying to recover. 
core=custsearch_shard3_replica3:org.apache.solr.common.SolrException: Cloud 
state still says we are leader.
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




This communication is confidential. Frontier only sends and receives email on 
the basis of the terms set out at http://www.frontier.com/email_disclaimer.


Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
Perhaps I didn't explain well: three nodes are live. Two are in recovering
mode, the exception being that they can't get to the leader because the leader
replies that it is not the leader. The dashboard shows it as the leader, but it
thinks it isn't. The exceptions are below... Do I have to just restart the Solr
instances, the ZooKeeper instances, both, or is there another, better way
without restarting everything?


Thx

Robi


From: Petersen, Robert (Contr) 
Sent: Monday, January 8, 2018 12:34:28 PM
To: solr-user@lucene.apache.org
Subject: solr 5.4.1 leader issue

Hi, I've got two out of my three servers thinking they are replicas on one
shard and getting exceptions. I'm wondering what the easiest way to fix this
is. Can I just restart ZooKeeper across the servers? Here are the exceptions:


TY

Robi


ERROR
null
RecoveryStrategy
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:607)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:364)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://10.209.55.10:8983/solr: We are not the leader
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:285)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:281)
... 5 more
(and on the one everyone thinks is the leader)
Error while trying to recover. 
core=custsearch_shard3_replica3:org.apache.solr.common.SolrException: Cloud 
state still says we are leader.
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




This communication is confidential. Frontier only sends and receives email on 
the basis of the terms set out at http://www.frontier.com/email_disclaimer.


RE: Profanity

2018-01-08 Thread Markus Jelsma
Indeed, hence the small suggestion to use ML for this instead of a dumb set of
terms, which is useless in almost any real solution. We have had very good
results with SVMs for text processing, although in the end it depends on your
input data and the care taken in selecting edge cases.

Regards,
Markus
 
-Original message-
> From:Davis, Daniel (NIH/NLM) [C] 
> Sent: Monday 8th January 2018 23:12
> To: solr-user@lucene.apache.org
> Subject: RE: Profanity
> 
> Fun topic. Same complicated issues as normal search:
> 
> Multilingual support?  Is "Merde" profanity too, or just in French?
> Multi-word synonyms?   Does "God Damn" become "goddamn", or do you treat
> "Damn" and "God damn" the same because you drop "God"? Is "Merde Alors"
> the same as "Merde", or again a multi-word synonym?
> 
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
> Sent: Monday, January 8, 2018 4:42 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Profanity
> 
> Yes, an UpdateRequestProcessor is the API to implement for these sorts of 
> requirements. In the URP you have access to a SolrDocument object that 
> carries the input data. You can inspect the fields, and add, remove or modify 
> fields if you want, or discard the input altogether.
> 
> So, check your text input field for 'profanity' and set another boolean field
> depending on whether it matches. Whether you use a list of words - or an SVM or
> another machine learning algorithm - to detect profanity is up to you.
> 
> Cheers,
> Markus
>  
> -Original message-
> > From:Sadiki Latty 
> > Sent: Monday 8th January 2018 22:12
> > To: solr-user@lucene.apache.org
> > Subject: Profanity
> > 
> > Hey
> > 
> > I would like to find a solution to flag (at index-time) profanity.
> > Optimally, it would function similar to stopwords in the sense that I can
> > have a predefined list that is read, and if a token is on the list that
> > document is 'flagged' in a different field. Does anyone know of a solution
> > (outside of configuring my own)? If none exists and I end up configuring my
> > own, would I be doing this in the update processor phase? I am still fairly
> > new to Solr, but from what I've read, that seems to be the best place to
> > look.
> > 
> > 
> > Thanks,
> > 
> > Sid
> > 
> 


RE: Profanity

2018-01-08 Thread Davis, Daniel (NIH/NLM) [C]
Fun topic. Same complicated issues as normal search:

Multilingual support?  Is "Merde" profanity too, or just in French?
Multi-word synonyms?   Does "God Damn" become "goddamn", or do you treat
"Damn" and "God damn" the same because you drop "God"? Is "Merde Alors"
the same as "Merde", or again a multi-word synonym?

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Monday, January 8, 2018 4:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Profanity

Yes, an UpdateRequestProcessor is the API to implement for these sorts of 
requirements. In the URP you have access to a SolrDocument object that carries 
the input data. You can inspect the fields, and add, remove or modify fields if 
you want, or discard the input altogether.

So, check your text input field for 'profanity' and set another boolean field
depending on whether it matches. Whether you use a list of words - or an SVM or
another machine learning algorithm - to detect profanity is up to you.

Cheers,
Markus
 
-Original message-
> From:Sadiki Latty 
> Sent: Monday 8th January 2018 22:12
> To: solr-user@lucene.apache.org
> Subject: Profanity
> 
> Hey
> 
> I would like to find a solution to flag (at index-time) profanity. Optimally,
> it would function similar to stopwords in the sense that I can have a
> predefined list that is read, and if a token is on the list that document is
> 'flagged' in a different field. Does anyone know of a solution (outside of
> configuring my own)? If none exists and I end up configuring my own, would I
> be doing this in the update processor phase? I am still fairly new to Solr,
> but from what I've read, that seems to be the best place to look.
> 
> 
> Thanks,
> 
> Sid
> 


regarding exposing merge metrics

2018-01-08 Thread suresh pendap
Hi,
I am following the instructions from
https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
 in order to expose the Index merge related metrics.

The document says that we have to add the below snippet in order to expose
the merge metrics:

<config>
  ...
  <indexConfig>
    <metrics>
      <majorMergeDocs>524288</majorMergeDocs>
      <bool name="mergeDetails">true</bool>
    </metrics>
    ...
  </indexConfig>
...
</config>



I would like to know why these metrics are not exposed by default, just like
all the other metrics?

Is there any performance overhead that we should be concerned about?

If there was no particular reason, then can we expose them by default?



Regards
Suresh


RE: Profanity

2018-01-08 Thread Markus Jelsma
Yes, an UpdateRequestProcessor is the API to implement for these sorts of 
requirements. In the URP you have access to a SolrDocument object that carries 
the input data. You can inspect the fields, and add, remove or modify fields if 
you want, or discard the input altogether.

So, check your text input field for 'profanity' and set another boolean field
depending on whether it matches. Whether you use a list of words - or an SVM or
another machine learning algorithm - to detect profanity is up to you.

Cheers,
Markus
 
-Original message-
> From:Sadiki Latty 
> Sent: Monday 8th January 2018 22:12
> To: solr-user@lucene.apache.org
> Subject: Profanity
> 
> Hey
> 
> I would like to find a solution to flag (at index-time) profanity. Optimally,
> it would function similar to stopwords in the sense that I can have a
> predefined list that is read, and if a token is on the list that document is
> 'flagged' in a different field. Does anyone know of a solution (outside of
> configuring my own)? If none exists and I end up configuring my own, would I
> be doing this in the update processor phase? I am still fairly new to Solr,
> but from what I've read, that seems to be the best place to look.
> 
> 
> Thanks,
> 
> Sid
> 
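
For readers landing on this thread, a minimal sketch of the
UpdateRequestProcessor approach Markus describes, using the word-list variant.
The class, the field names ("text", "has_profanity"), and the word list are
assumptions for illustration, not anything shipped with Solr:

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Locale;
  import java.util.Set;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class ProfanityFlagProcessorFactory extends UpdateRequestProcessorFactory {

    // Hypothetical word list; in practice load it from a resource file.
    private static final Set<String> BAD_WORDS =
        new HashSet<>(Arrays.asList("merde", "damn"));

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
        SolrQueryResponse rsp, UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Object text = doc.getFieldValue("text");   // assumed input field
          boolean flagged = false;
          if (text != null) {
            for (String token : text.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
              if (BAD_WORDS.contains(token)) { flagged = true; break; }
            }
          }
          doc.setField("has_profanity", flagged);    // assumed flag field
          super.processAdd(cmd);                     // continue down the chain
        }
      };
    }
  }

The factory would then be referenced from an updateRequestProcessorChain in
solrconfig.xml ahead of RunUpdateProcessorFactory.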


Re: Profanity

2018-01-08 Thread John Blythe
You could use the keepwords functionality: have a field that only keeps
profanity, and then you can query against that field having its default
value vs. profane text (see the sketch after this message).

--
John Blythe

On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty  wrote:

> Hey
>
> I would like to find a solution to flag (at index-time) profanity.
> Optimally, it would function similar to stopwords in the sense that I can
> have a predefined list that is read, and if a token is on the list that
> document is 'flagged' in a different field. Does anyone know of a solution
> (outside of configuring my own)? If none exists and I end up configuring my
> own, would I be doing this in the update processor phase? I am still fairly
> new to Solr, but from what I've read, that seems to be the best place to
> look.
>
>
> Thanks,
>
> Sid
>
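
A minimal schema sketch of the keepwords idea. The field names, the source
field "text", and the profanity.txt file are assumptions for illustration;
KeepWordFilterFactory discards every token not on the list, so the "profanity"
field ends up empty for clean documents:

  <fieldType name="profanity_terms" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeepWordFilterFactory" words="profanity.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>
  <field name="profanity" type="profanity_terms" indexed="true" stored="false"/>
  <copyField source="text" dest="profanity"/>

A query like -profanity:[* TO *] would then select only the clean documents.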


Profanity

2018-01-08 Thread Sadiki Latty
Hey

I would like to find a solution to flag (at index-time) profanity. Optimally,
it would function similar to stopwords in the sense that I can have a
predefined list that is read, and if a token is on the list that document is
'flagged' in a different field. Does anyone know of a solution (outside of
configuring my own)? If none exists and I end up configuring my own, would I
be doing this in the update processor phase? I am still fairly new to Solr,
but from what I've read, that seems to be the best place to look.


Thanks,

Sid


Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
I'm on zookeeper 3.4.8


From: Petersen, Robert (Contr) 
Sent: Monday, January 8, 2018 12:34:28 PM
To: solr-user@lucene.apache.org
Subject: solr 5.4.1 leader issue

Hi, I've got two out of my three servers thinking they are replicas on one
shard and getting exceptions. I'm wondering what the easiest way to fix this
is. Can I just restart ZooKeeper across the servers? Here are the exceptions:


TY

Robi


ERROR
null
RecoveryStrategy
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:607)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:364)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://10.209.55.10:8983/solr: We are not the leader
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:285)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:281)
... 5 more
(and on the one everyone thinks is the leader)
Error while trying to recover. 
core=custsearch_shard3_replica3:org.apache.solr.common.SolrException: Cloud 
state still says we are leader.
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




This communication is confidential. Frontier only sends and receives email on 
the basis of the terms set out at http://www.frontier.com/email_disclaimer.


solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
Hi, I've got two out of my three servers thinking they are replicas on one
shard and getting exceptions. I'm wondering what the easiest way to fix this
is. Can I just restart ZooKeeper across the servers? Here are the exceptions:


TY

Robi


ERROR
null
RecoveryStrategy
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
Error while trying to recover. 
core=custsearch_shard3_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://x.x.x.x:8983/solr: We are not the leader
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:607)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:364)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://10.209.55.10:8983/solr: We are not the leader
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:285)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:281)
... 5 more
(and on the one everyone thinks is the leader)
Error while trying to recover. 
core=custsearch_shard3_replica3:org.apache.solr.common.SolrException: Cloud 
state still says we are leader.
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:226)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




This communication is confidential. Frontier only sends and receives email on 
the basis of the terms set out at http://www.frontier.com/email_disclaimer.


SSL configuration with Master/Slave

2018-01-08 Thread Sundaram, Dinesh
Team,

I'm facing an SSL issue while configuring master/slave replication. The master
runs fine alone with SSL, and the slave runs fine alone with SSL, but I get an
SSL exception during the sync-up. It gives the error below. I believe we need
to trust the target server at the source. Can you give me the steps to allow
inbound calls at the source JVM? FYI, the same sync-up works fine via http.

2018-01-08 13:57:06.735 WARN  (qtp33524623-16) [c:dm-global s:shard1 
r:core_node2 x:dm-global_shard1_replica_n1] o.a.s.h.ReplicationHandler 
Exception while invoking 'details' method for replication on master
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: 
https://test21.mastercard.int:8983/solr/dm-global_shard1_replica_n1
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:640)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:253)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:242)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1823)
at 
org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:954)
at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:332)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at 
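
The PKIX error at the bottom of the trace means the JVM making the call does
not trust the peer's certificate. A hedged sketch of the usual remedy - the
alias, file, and store names below are assumptions: export the master's
certificate and import it into a truststore the slave's Solr JVM is started
with, then point solr.in.sh at it:

  keytool -exportcert -alias solr-ssl -keystore master-keystore.jks -file master.crt
  keytool -importcert -trustcacerts -alias solr-master -file master.crt \
    -keystore slave-truststore.jks

  # in solr.in.sh (paths and password are assumptions):
  SOLR_SSL_TRUST_STORE=/path/to/slave-truststore.jks
  SOLR_SSL_TRUST_STORE_PASSWORD=secret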

Re: Limit search queries only to pull replicas

2018-01-08 Thread Tomas Fernandez Lobbe
This feature is not currently supported. I was thinking of implementing it by
extending the work done in SOLR-10880. I still haven't had time to work on it,
though. There is a patch for SOLR-10880 that doesn't implement support for
replica types, but it could be used as a base.

Tomás

> On Jan 8, 2018, at 12:04 AM, Ere Maijala  wrote:
> 
> Server load alone doesn't always indicate the server's ability to serve 
> queries. Memory and cache state are important too, and they're not as easy to 
> monitor. Additionally, server load at any single point in time or a short 
> term average is not indicative of the server's ability to handle search 
> requests if indexing happens in short but intense bursts.
> 
> It can also complicate things if there are more than one Solr instance 
> running on a single server.
> 
> I'm definitely not against intelligent routing. In many cases it makes 
> perfect sense, and I'd still like to use it, just limited to the pull 
> replicas.
> 
> --Ere
> 
> Erick Erickson wrote on 5.1.2018 at 19.03:
>> Actually, I think a much better option is to route queries to server load.
>> The theory of preferring pull replicas to leaders would be that the leader
>> will be doing the indexing work and the pull replicas would be doing less
>> work therefore serving queries faster. But that's a fragile assumption.
>> Let's say indexing stops totally. Now your leader is sitting there idle
>> when it could be serving queries.
>> The autoscaling work will allow for more intelligent routing, you can
>> monitor the CPU load on your servers and if the leader has some spare
>> cycles use them .vs. crudely routing all queries to pull replicas (or tlog
>> replicas for that matter). NOTE: I don't know whether this is being
>> actively worked on or not, but seems a logical extension of the increased
>> monitoring capabilities being put in place for autoscaling, but I'd rather
>> see effort put in there than support routing based solely on a node's type.
>> Best,
>> Erick
>> On Fri, Jan 5, 2018 at 7:51 AM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>> It is interesting that ES had similar feature to prefer primary/replica
>>> but it deprecating that and will remove it - could not find explanation why.
>>> 
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 5 Jan 2018, at 15:22, Ere Maijala  wrote:
 
 Hi,
 
 It would be really nice to have a server-side option, though. Not
>>> everyone uses Solrj, and a typical fairly dummy client just queries the
>>> server without any understanding about shards etc. Solr could be clever
>>> enough to not forward the query to NRT shards when configured to prefer
>>> PULL shards and they're available. Maybe it could be something similar to
>>> the preferLocalShards parameter, like "preferShardTypes=TLOG,PULL".
 
 --Ere
 
> Emir Arnautović wrote on 14.12.2017 at 11.41:
> Hi Stanislav,
> I don’t think that there is a built in feature to do this, but that
>>> sounds like nice feature of Solrj - maybe you should check if available.
>>> You can implement it outside of Solrj - check cluster state to see which
>>> shards are available and send queries only to pull replicas.
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> On 14 Dec 2017, at 09:58, Stanislav Sandalnikov <
>>> s.sandalni...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> We have a Solr 7.1 setup with SolrCloud where we have multiple shards
>>> on one server (for indexing) each shard has a pull replica on other servers.
>> 
>> What are the possible ways to limit search request only to pull type
>>> replicase?
>> At the moment the only solution I found is to append shards parameter
>>> to each query, but if new shards added later it requires to change
>>> solrconfig. Is it the only way to do this?
>> 
>> Thank you
>> 
>> Regards
>> Stanislav
>> 
 
 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland
>>> 
>>> 
> 
> -- 
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
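
For reference, the workaround Stanislav describes - pinning queries to the pull
replicas via an explicit shards parameter - looks roughly like this (the host,
collection, and core names are assumptions):

  http://host1:8983/solr/mycoll/select?q=*:*&shards=host2:8983/solr/mycoll_shard1_replica_p2,host3:8983/solr/mycoll_shard2_replica_p4

The downside, as noted in the thread, is that the list must be updated by hand
whenever shards or replicas change.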



Re: Newbie Question

2018-01-08 Thread Deepak Goel
Got it . Thank You for your help



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 8, 2018 at 11:48 PM, Deepak Goel  wrote:

> *Is this right?*
>
> SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/shakespeare/select").build();
>
> SolrQuery query = new SolrQuery();
> query.setQuery("henry");
> query.setFields("text_entry");
> query.setStart(0);
>
> queryResponse = client.query(query);
>
> *This is still returning NULL*
>
>
>
> 
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Mon, Jan 8, 2018 at 10:55 PM, Alexandre Rafalovitch  > wrote:
>
>> I think you are missing /query handler endpoint in the URL. Plus actual
>> search parameters.
>>
>> You may try using the admin UI to build your queries first.
>>
>> Regards,
>> Alex
>>
>> On Jan 8, 2018 12:23 PM, "Deepak Goel"  wrote:
>>
>> > Hello
>> >
>> > *I am trying to search for documents in my collection (Shakespeare). The
>> > code is as follows:*
>> >
>> > SolrClient client = new HttpSolrClient.Builder("
>> > http://localhost:8983/solr/shakespeare").build();
>> >
>> > SolrDocument doc = client.getById("2");
>> > *However this does not return any document. What mistake am I making?*
>> >
>> > Thank You
>> > Deepak
>> >
>> > Deepak
>> > "Please stop cruelty to Animals, help by becoming a Vegan"
>> > +91 73500 12833
>> > deic...@gmail.com
>> >
>> > Facebook: https://www.facebook.com/deicool
>> > LinkedIn: www.linkedin.com/in/deicool
>> >
>> > "Plant a Tree, Go Green"
>> >
>> >
>>
>
>
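
For reference, a minimal sketch of a working variant of the code in this
thread, assuming a core named "shakespeare" with a text_entry field. The base
URL stops at the core; SolrJ appends the /select handler itself:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class ShakespeareQuery {
    public static void main(String[] args) throws Exception {
      SolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/shakespeare").build();
      SolrQuery query = new SolrQuery("text_entry:henry"); // search a specific field
      query.setFields("text_entry");
      query.setStart(0);
      QueryResponse response = client.query(query);        // goes to /select
      for (SolrDocument doc : response.getResults()) {
        System.out.println(doc.getFieldValue("text_entry"));
      }
      client.close();
    }
  }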


Re: Newbie Question

2018-01-08 Thread Deepak Goel
*Is this right?*

SolrClient client = new HttpSolrClient.Builder("
http://localhost:8983/solr/shakespeare/select").build();

SolrQuery query = new SolrQuery();
query.setQuery("henry");
query.setFields("text_entry");
query.setStart(0);

queryResponse = client.query(query);

*This is still returning NULL*






Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 8, 2018 at 10:55 PM, Alexandre Rafalovitch 
wrote:

> I think you are missing /query handler endpoint in the URL. Plus actual
> search parameters.
>
> You may try using the admin UI to build your queries first.
>
> Regards,
> Alex
>
> On Jan 8, 2018 12:23 PM, "Deepak Goel"  wrote:
>
> > Hello
> >
> > *I am trying to search for documents in my collection (Shakespeare). The
> > code is as follows:*
> >
> > SolrClient client = new HttpSolrClient.Builder("
> > http://localhost:8983/solr/shakespeare").build();
> >
> > SolrDocument doc = client.getById("2");
> > *However this does not return any document. What mistake am I making?*
> >
> > Thank You
> > Deepak
> >
> > Deepak
> > "Please stop cruelty to Animals, help by becoming a Vegan"
> > +91 73500 12833
> > deic...@gmail.com
> >
> > Facebook: https://www.facebook.com/deicool
> > LinkedIn: www.linkedin.com/in/deicool
> >
> > "Plant a Tree, Go Green"
> >
> >
>


Re: Newbie Question

2018-01-08 Thread Alexandre Rafalovitch
I think you are missing /query handler endpoint in the URL. Plus actual
search parameters.

You may try using the admin UI to build your queries first.

Regards,
Alex

On Jan 8, 2018 12:23 PM, "Deepak Goel"  wrote:

> Hello
>
> *I am trying to search for documents in my collection (Shakespeare). The
> code is as follows:*
>
> SolrClient client = new HttpSolrClient.Builder("
> http://localhost:8983/solr/shakespeare").build();
>
> SolrDocument doc = client.getById("2");
> *However this does not return any document. What mistake am I making?*
>
> Thank You
> Deepak
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
>


Newbie Question

2018-01-08 Thread Deepak Goel
Hello

*I am trying to search for documents in my collection (Shakespeare). The
code is as follows:*

SolrClient client = new HttpSolrClient.Builder("
http://localhost:8983/solr/shakespeare").build();

SolrDocument doc = client.getById("2");
*However this does not return any document. What mistake am I making?*

Thank You
Deepak

Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"




Re: docValues with stored and useDocValuesAsStored

2018-01-08 Thread Shalin Shekhar Mangar
Hi Bernd,

If Solr can fetch a field from both stored and docValues, then it
chooses docValues only if the field is single-valued and that allows
Solr to avoid accessing the stored document altogether for *all*
fields to be returned. Otherwise stored values are preferred. This has
been the behavior since 7.1.

On Mon, Jan 8, 2018 at 2:25 PM, Bernd Fehling
 wrote:
> What is the precedence when docValues with stored=true is used?
> e.g.
> <field name="..." type="..." indexed="true" stored="true" docValues="true" />
>
> My guess, because useDocValuesAsStored=true is the default, is that stored=true
> is ignored and the values are pulled from docValues.
>
> And only if useDocValuesAsStored=false is explicitly used does stored=true come
> into play.
>
> Or, in short: useDocValuesAsStored=true (the default) has precedence over
> stored=true.
> Is this right?
>
> Regards
> Bernd



-- 
Regards,
Shalin Shekhar Mangar.


Re: In-place update vs Atomic updates

2018-01-08 Thread Shawn Heisey

On 1/8/2018 4:05 AM, kshitij tyagi wrote:

> What are the major differences between atomic and in-place updates? I have
> gone through the documentation but it does not give detailed internal
> information.


Atomic updates are nearly identical to simple indexing, except that the 
existing document is read from the index to populate a new document 
along with whatever updates were requested, then the new document is 
indexed and the old one is deleted.



> 1. Does doing an in-place update prevent a solr cache burst or not? What are
> the benefits of using in-place updates?


In-place updates are only possible on a field where only docValues is 
true.  The settings for things like indexed and stored must be false.


An in-place update finds the segment containing the document and writes 
a whole new file containing the value of every document in the segment 
for the updated field.  If the segment contains ten million documents, 
then information for ten million values will be written for a single 
document update.



> I want to update one of the fields of the document but I do not want to
> burst my cache.


When the index changes for ANY reason, no matter how the change is 
accomplished, caches must be thrown away when a new searcher is built. 
Lucene and Solr have no way of knowing that a change doesn't affect some 
cache entries, so the only thing it can do is assume that all the 
information in the cache is now invalid.  What you are asking for here 
is not possible at the moment, and chances are that if code was written 
to do it, that it would be far slower than simply invalidating the 
caches and doing autowarming.


Thanks,
Shawn
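
A minimal schema sketch of a field that meets the in-place constraints Shawn
lists above - docValues only, neither indexed nor stored (the field name and
type are assumptions):

  <field name="popularity" type="pint" indexed="false" stored="false" docValues="true"/>

An atomic "set" on such a field can then be applied in place without
re-indexing the whole document.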


Re: SolrJ with Async Http Client

2018-01-08 Thread Emir Arnautović
Not sure if it aligns with your expectations, but here is something that is
declared as an “async solr client”: https://github.com/inoio/solrs


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Jan 2018, at 16:31, RAUNAK AGRAWAL  wrote:
> 
> Hi Guys,
> 
> I am trying to write a fully async service where solr calls are also async.
> Just wondering, has anyone tried calling solr in non-blocking mode, or is
> there a way to do it? I have come across one such project
> but am wondering if there is anything provided
> by solrj?
> 
> Thanks
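
Absent a native non-blocking mode in SolrJ, one hedged workaround is to wrap
the blocking client in a CompletableFuture on a dedicated executor - a sketch
under that assumption (URL and pool size are illustrative), not a SolrJ
feature:

  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.CompletionException;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class AsyncSolrWrapper {
    public static void main(String[] args) {
      ExecutorService executor = Executors.newFixedThreadPool(4);
      SolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
      CompletableFuture<QueryResponse> future = CompletableFuture.supplyAsync(() -> {
        try {
          return client.query(new SolrQuery("*:*")); // blocking call off the caller thread
        } catch (Exception e) {
          throw new CompletionException(e);
        }
      }, executor);
      future.thenAccept(rsp -> System.out.println(rsp.getResults().getNumFound()));
      future.join();      // demo only: wait so the JVM doesn't exit early
      executor.shutdown();
    }
  }

This still ties up a thread per in-flight request; the solrs project linked
above aims to provide true async I/O instead.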



Re: Personalized search parameters

2018-01-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I'm assuming that you are writing the cosine similarity and you have two
vectors containing the pairs <term, tf-idf>. The two vectors could have
different sizes because they only contain the terms that have tf-idf != 0.
If you want to compute cosine similarity between the two lists, you just have to
consider the pairs that appear in **both of the vectors**, because otherwise, if a
term doesn't appear in one of the two, the product is going to be 0, so it will
not contribute to the final score.

(Really old) Example: 
https://github.com/diegoceccarelli/dexter/blob/fb4bbcb27a13da2665f3c19d6c75bfc4f5778440/dexter-core/src/main/java/it/cnr/isti/hpc/dexter/lucene/LuceneHelper.java#L386
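
A small self-contained sketch of that sparse dot product - maps from term to
tf-idf weight stand in for the vectors, and the names are illustrative:

  import java.util.Map;

  public class SparseCosine {
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
      double dot = 0;
      // Only terms present in both maps contribute; a missing term has weight 0.
      for (Map.Entry<String, Double> e : a.entrySet()) {
        Double w = b.get(e.getKey());
        if (w != null) dot += e.getValue() * w;
      }
      double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
      double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
      return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }
  }

Because the norms are computed per vector and the dot product only touches
shared terms, the two sparse vectors never need to be padded to equal size.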


From: solr-user@lucene.apache.org At: 01/06/18 17:24:07To:  
solr-user@lucene.apache.org
Subject: Re: Personalized search parameters

Don't we need vectors of the same size to calculate the cosine similarity?
Maybe I missed something, but following that example it looks like I have to
manually recreate the sparse vectors, because the term vector of a document
should (I may be wrong) contain only the terms that appear in that document.
Am I wrong?

Given that, I assumed (and that example goes in that direction) that we have
to manually create the sparse vectors by first collecting all the terms and
then calculating the tf-idf frequency for each term in each document.
That's what I did, and I obtained vectors of the same dimension for each
document; I was just wondering if there was a better optimized way to obtain
those sparse vectors.


--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




In-place update vs Atomic updates

2018-01-08 Thread kshitij tyagi
Hi,

What are the major differences between atomic and in-place updates? I have
gone through the documentation but it does not give detailed internal
information.

1. Does doing an in-place update prevent a solr cache burst or not? What are
the benefits of using in-place updates?

I want to update one of the fields of the document but I do not want to
burst my cache.

What is the best approach to achieve the same.

Thanks,
Kshitij


docValues with stored and useDocValuesAsStored

2018-01-08 Thread Bernd Fehling
What is the precedence when docValues with stored=true is used?
e.g.
<field name="..." type="..." indexed="true" stored="true" docValues="true" />

My guess, because useDocValuesAsStored=true is the default, is that stored=true
is ignored and the values are pulled from docValues.

And only if useDocValuesAsStored=false is explicitly used does stored=true come
into play.

Or, in short: useDocValuesAsStored=true (the default) has precedence over
stored=true.
Is this right?

Regards
Bernd


Re: Limit search queries only to pull replicas

2018-01-08 Thread Ere Maijala
Server load alone doesn't always indicate the server's ability to serve 
queries. Memory and cache state are important too, and they're not as 
easy to monitor. Additionally, server load at any single point in time 
or a short term average is not indicative of the server's ability to 
handle search requests if indexing happens in short but intense bursts.


It can also complicate things if there are more than one Solr instance 
running on a single server.


I'm definitely not against intelligent routing. In many cases it makes 
perfect sense, and I'd still like to use it, just limited to the pull 
replicas.


--Ere

Erick Erickson wrote on 5.1.2018 at 19.03:

Actually, I think a much better option is to route queries to server load.

The theory of preferring pull replicas to leaders would be that the leader
will be doing the indexing work and the pull replicas would be doing less
work therefore serving queries faster. But that's a fragile assumption.
Let's say indexing stops totally. Now your leader is sitting there idle
when it could be serving queries.

The autoscaling work will allow for more intelligent routing, you can
monitor the CPU load on your servers and if the leader has some spare
cycles use them .vs. crudely routing all queries to pull replicas (or tlog
replicas for that matter). NOTE: I don't know whether this is being
actively worked on or not, but seems a logical extension of the increased
monitoring capabilities being put in place for autoscaling, but I'd rather
see effort put in there than support routing based solely on a node's type.

Best,
Erick

On Fri, Jan 5, 2018 at 7:51 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:


It is interesting that ES had similar feature to prefer primary/replica
but it deprecating that and will remove it - could not find explanation why.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




On 5 Jan 2018, at 15:22, Ere Maijala  wrote:

Hi,

It would be really nice to have a server-side option, though. Not

everyone uses Solrj, and a typical fairly dummy client just queries the
server without any understanding about shards etc. Solr could be clever
enough to not forward the query to NRT shards when configured to prefer
PULL shards and they're available. Maybe it could be something similar to
the preferLocalShards parameter, like "preferShardTypes=TLOG,PULL".


--Ere

Emir Arnautović wrote on 14.12.2017 at 11.41:

Hi Stanislav,
I don’t think that there is a built in feature to do this, but that

sounds like nice feature of Solrj - maybe you should check if available.
You can implement it outside of Solrj - check cluster state to see which
shards are available and send queries only to pull replicas.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On 14 Dec 2017, at 09:58, Stanislav Sandalnikov <

s.sandalni...@gmail.com> wrote:


Hi,

We have a Solr 7.1 setup with SolrCloud where we have multiple shards

on one server (for indexing) each shard has a pull replica on other servers.


What are the possible ways to limit search request only to pull type

replicase?

At the moment the only solution I found is to append shards parameter

to each query, but if new shards added later it requires to change
solrconfig. Is it the only way to do this?


Thank you

Regards
Stanislav



--
Ere Maijala
Kansalliskirjasto / The National Library of Finland







--
Ere Maijala
Kansalliskirjasto / The National Library of Finland