StopWords behavior with phrases

2019-05-21 Thread Ashish Bisht
Hi,

We query Solr as below:

*q="market and cloud" OR (market and cloud)=AND=edismax*

Our intent is to look for results with both a phrase match and an AND query
together, where Solr itself takes care of relevancy.

But due to the presence of a stopword in the phrase query, a gap is left, which
gives different results compared with the keyword query "market cloud".

"parsedquery_toString":"+(+(content:\"market ? cloud\" |
search_field:\"market ? cloud\"))",

There are suggestions to create a separate field without stopwords for phrase
queries, but then we won't be able to achieve both the phrase match and the AND
match in a single request.

Is there any way the ? can be removed from the phrase, or any other suggestion
for our requirement?

Please suggest.
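
One possible client-side workaround, sketched here under assumptions: strip the
stopwords from the text before building the query, so the phrase clause carries
no position gap, and keep the AND clause in the same request. The stopword list
and the helper name build_phrase_and_query are hypothetical; a real list would
have to mirror the stopwords file used by the field's analyzer.

```python
# Hypothetical stopword list; it should mirror the analyzer's stopwords file.
STOPWORDS = {"and", "or", "of", "the", "a", "an"}


def build_phrase_and_query(text, stopwords=STOPWORDS):
    """Return '"t1 t2" OR (t1 AND t2)' with stopwords removed from both clauses."""
    terms = [t for t in text.split() if t.lower() not in stopwords]
    if not terms:
        return ""
    phrase = '"' + " ".join(terms) + '"'
    conjunction = "(" + " AND ".join(terms) + ")"
    return phrase + " OR " + conjunction


print(build_phrase_and_query("market and cloud"))
# prints: "market cloud" OR (market AND cloud)
```

If the index-side analyzer also leaves a gap where the stopword was, the phrase
clause may need a slop (for example "market cloud"~1) to match text such as
"market and cloud".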

Regards
Ashish





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Spellcheck Collations Phrase based instead of AND

2019-05-13 Thread Ashish Bisht
Hi,


Below is a sample of the collations returned during spellcheck:

"collation",{
  "collationQuery":"smart connected factory",
  "hits":109,
  "misspellingsAndCorrections":[
    "smart","smart",
    "connected","connected",
    "fator","factory"]},
"collation",{
  "collationQuery":"smart connected faster",
  "hits":325,
  "misspellingsAndCorrections":[
    "smart","smart",
    "connected","connected",
    "fator","faster"]},
"collation",{
  "collationQuery":"sparc connected factory",
  "hits":14,
  "misspellingsAndCorrections":[
    "smart","sparc",
    "connected","connected",
    "fator","factory"]},

The hits in the collationQuery are based on an AND between the keywords.

Is it possible to get the collations sorted based on phrase matches instead of
AND?
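
As a sketch of a client-side alternative (not a Solr feature), the collations
could be re-ordered after the response arrives by running each collationQuery
as a quoted phrase and sorting on that hit count. The helper name and the
count_phrase_hits callable are hypothetical, and the phrase counts below are
made up for illustration.

```python
def sort_collations_by_phrase_hits(collation_queries, count_phrase_hits):
    """Order collation queries by the hit count of their quoted (phrase) form.

    count_phrase_hits is a caller-supplied function (hypothetical) that would
    run the quoted query against Solr and return numFound.
    """
    scored = [(count_phrase_hits('"' + q + '"'), q) for q in collation_queries]
    # list.sort is stable, so ties keep the order Solr returned them in.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored]


# Stubbed phrase hit counts; these numbers are invented for the example.
fake_counts = {
    '"smart connected factory"': 40,
    '"smart connected faster"': 3,
    '"sparc connected factory"': 0,
}

ordered = sort_collations_by_phrase_hits(
    ["smart connected faster", "smart connected factory", "sparc connected factory"],
    fake_counts.get,
)
print(ordered)
```

This costs one extra query per collation, so it only makes sense for a small
number of collations.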

Regards
Ashish





Stopwords param of edismax parser not working

2019-03-28 Thread Ashish Bisht
Hi,

We are trying to disable stopword removal during query analysis using the
edismax parser's stopwords parameter. The documentation says:

*stopwords
A Boolean parameter indicating if the StopFilterFactory configured in the
query analyzer should be respected when parsing the query. If this is set to
false, then the StopFilterFactory in the query analyzer is ignored.*

https://lucene.apache.org/solr/guide/7_3/the-extended-dismax-query-parser.html


But it seems like it's not working.

http://Box-1:8983/solr/SalesCentralDev_4/select?q=internet of
things=0=edismax=search_field
content*=false*=true


"parsedquery":"+(DisjunctionMaxQuery((content:internet |
search_field:internet)) DisjunctionMaxQuery((content:thing |
search_field:thing)))",
  *  "parsedquery_toString":"+((content:internet | search_field:internet)
(content:thing | search_field:thing))",*


Are we missing something here?
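
For reference, a request like the one above could be assembled as follows. This
is only a sketch: the archive stripped the parameter names from the URL, so
rows, defType, qf, stopwords and debugQuery are assumed here, and
build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_select_url(base, q, qf, stopwords):
    """Build an edismax /select URL with the stopwords flag set explicitly."""
    # Parameter names are assumptions; the archived URL lost them.
    params = {
        "q": q,
        "rows": 0,
        "defType": "edismax",
        "qf": qf,
        "stopwords": "true" if stopwords else "false",
        "debugQuery": "true",
    }
    return base + "/select?" + urlencode(params)


url = build_select_url(
    "http://Box-1:8983/solr/SalesCentralDev_4",
    "internet of things",
    "search_field content",
    stopwords=False,
)
print(url)
```

Building the URL with urlencode also makes sure the spaces in q and qf are
encoded, which rules out a malformed request as the cause.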





Re: Spellchecker -File based vs Index based

2019-03-19 Thread Ashish Bisht
The spellcheck configuration is the default one:


solr.FileBasedSpellChecker
file
spellings.txt
UTF-8
./spellcheckerFile




  default
  jkdefault
  file
  on
  true
  10
  5
  5
  true
  10
  true
  10
  5


Also, the words are present in the file. For example, the word "things", which
is being corrected, is present in the file, and the suggestions related to it
are present too.

I don't want suggestions for correct words (of, things). Is there any problem
with the request? I tried two combinations.

1./spell?spellcheck.q=intnet of
things=true=AND=spellcontent=file

2./spell?q=intnet of
things=edismax=spellcontent=json=0&=true=file=AND

Please suggest







Behavior of Function Query

2019-03-18 Thread Ashish Bisht
Please see the requests and responses below.

http://Sol:8983/solr/SCSpell/select?q="*internet of
things*"=edismax=spellcontent=json=1=score,internet_of_things:query({!edismax
v='"*internet of things*"'}),instant_of_things:query({!edismax v='"instant
of things"'})


The response contains the score from the function query:

 "fl":"score,internet_of_things:query({!edismax v='\"internet of
things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})",
  "rows":"1",
  "wt":"json"}},
  "response":{"numFound":851,"start":0,"maxScore":7.6176834,"docs":[
  {
"score":7.6176834,
   * "internet_of_things":7.6176834*}]
  }}


But if q is changed in the same request, it doesn't give the function query score:

http://Sol-1:8983/solr/SCSpell/select?q="*wall
street*"=edismax=spellcontent=json=1=score,internet_of_things:query({!edismax
v='"*internet of things*"'}),instant_of_things:query({!edismax v='"instant
of things"'})

   "q":"\"wall street\"",
  "defType":"edismax",
  "qf":"spellcontent",
  "fl":"score,internet_of_things:query({!edismax v='\"internet of
things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})",
  "rows":"1",
  "wt":"json"}},
  "response":{"numFound":46,"start":0,"maxScore":15.670144,"docs":[
  {
"score":15.670144}]
  }}


Why does the function query's score stop being returned when q is different?
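
One thing worth checking: the query() function only yields a value for
documents that match its subquery, which would also explain why
instant_of_things is missing from the first response. As a sketch, assuming the
optional default argument of query(subquery, default) applies here, the fl
parameter could be built so that non-matching documents get 0 instead of no
field. The helper name score_field is hypothetical.

```python
def score_field(alias, phrase, default=None):
    """Build an fl pseudo-field returning the edismax score of a quoted phrase."""
    subquery = "{!edismax v='\"" + phrase + "\"'}"
    if default is None:
        return alias + ":query(" + subquery + ")"
    # Second argument of query(): value for documents not matching the subquery.
    return alias + ":query(" + subquery + "," + str(default) + ")"


fl = ",".join([
    "score",
    score_field("internet_of_things", "internet of things", 0),
    score_field("instant_of_things", "instant of things", 0),
])
print(fl)
```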





Different behavior when using function queries

2019-03-18 Thread Ashish Bisht
Can someone please explain the behavior below? For different q parameters the
function query response differs, although the function queries are the same.

http://:8983/solr/SCSpell/select?q="*market
place*"=edismax=spellcontent=json=1=internet_of_things:if(exists(query({!edismax
v='"internet of
things"'})),true,false),instant_of_things:if(exists(query({!edismax
v='"instant of things"'})),true,false)

The response contains function query results:

 "response":{"numFound":80,"start":0,"docs":[
  {
"internet_of_things":false,
"instant_of_things":false}]
  }}

whereas for a different q:

http://:8983/solr/SCSpell/select?q="*intent of
things*"=edismax=spellcontent=json=1=internet_of_things:if(exists(query({!edismax
v='"internet of
things"'})),true,false),instant_of_things:if(exists(query({!edismax
v='"instant of things"'})),true,false)

The response does not contain function query results:

"response":{"numFound":0,"start":0,"docs":[]
  }}


From the results, it looks like function queries don't work if q itself doesn't
yield any results.





Spellchecker -File based vs Index based

2019-03-18 Thread Ashish Bisht
Hi,

I am seeing a difference between the file-based and index-based spellcheck
implementations.

Using index based:
http://:8983/solr/SCSpell/spell?q=*intnet of
things*=edismax=spellcontent=json=0=true=*default*=AND


  "suggestions":[
  "intnet",{
"numFound":10,
"startOffset":0,
"endOffset":6,
"origFreq


Suggestions get built only for the wrong word.


But while using file based, they get built for the right words too, which
messes up the collations:

http://:8983/solr/SCSpell/spell?q=intnet%20of%20things=edismax=spellcontent=json=0&=true=*file*=AND

 "suggestion":["*internet*",
  "contnet",
  "intel",
  "intent",
  "intert",
  "intelect",
  "intended",
  "intented",
  "interest",
  "botnets"]},
  "*of*",{
"numFound":8,
"startOffset":7,
"endOffset":9,
"suggestion":["ofc",
  "off",
  "ohf",
 .
  "soft"]},
 "*things*",{
"numFound":10,
"startOffset":10,
"endOffset":16,
"suggestion":["thing",
  "brings",
  "think",
  "thinkers",
  .



Is there any property in the file-based spellchecker that I can use to fix this?





Re: Relevancy Score Calculation

2019-02-11 Thread Ashish Bisht
Thanks. I agree.

Regards
Ashish





Relevancy Score Calculation

2019-02-03 Thread Ashish Bisht
Hi,

Currently the score is calculated based on "Max Doc" instead of "Num Docs". Is
it possible to change it to "Num Docs" (i.e. without deleted docs)? Will it
require a code change or some config change?
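
To illustrate why deleted documents matter, here is a minimal sketch of the idf
term as computed by Lucene's BM25 similarity, assuming the default BM25
similarity is in use. The document counts below are invented; the point is only
that statistics which still include deleted documents give a different idf, and
therefore a different score, for the same term.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf term of Lucene's BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Invented numbers: the same term on an index without deleted docs, and on
# one whose statistics still count not-yet-merged deleted docs.
clean = bm25_idf(doc_freq=100, doc_count=30000)
skewed = bm25_idf(doc_freq=130, doc_count=33000)

print(round(clean, 4), round(skewed, 4))
```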

Regards
Ashish





Re: Solr relevancy score different on replicated nodes

2019-02-03 Thread Ashish Bisht
Thanks Erick and everyone. We are checking on the stats cache.

I noticed stats skew again and optimized the index to correct it. As per the
documents

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
and 
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

I wanted to check on the points below, considering we want the stats skew to be
corrected.

1. Once optimized, the single segment won't be naturally merged easily. As we
might be doing a manual optimize every time, I expect that at some point in
the future we will have a single large segment. What impact is this large
segment going to have?
Our index has ~30k documents, i.e. files with content (segment size <1 GB as of
now).

2. Do you recommend going for optimize in these situations? It will probably
be done only when there is stats skew. Is it safe?

Regards
Ashish

 






Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick, 

Our business wanted the score not to be based entirely on the default relevancy
algorithm, but on a mix of Solr relevancy and user metrics (80% + 20%).

Each result doc's score is calculated against the max score as a fraction of
80. The remaining 20 comes from user metrics.

Finally, the sort happens on the new score.

But say we got the first page correctly, and the request for the second page
goes to the other replica, where the max score is different. The UI may then
show a wrong sort order compared with the first page. For example, the last
value of page 1 is 70 and the first value of the second page can be 72, i.e.
distorted sorting.
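
The effect can be sketched as follows. This is an illustration only: the
function name, the assumption that the user metric is already scaled to [0, 1],
and all the numbers are hypothetical. The same document with the same raw score
gets a different blended score depending on which replica's max score it is
normalized against.

```python
def blended_score(solr_score, max_score, user_metric):
    """80 points from normalized Solr relevancy, 20 points from the user metric.

    user_metric is assumed to be in the range [0, 1].
    """
    return 80.0 * solr_score / max_score + 20.0 * user_metric


# Same document and raw score, normalized against two different max scores.
on_replica_a = blended_score(solr_score=6.0, max_score=8.0, user_metric=0.5)
on_replica_b = blended_score(solr_score=6.0, max_score=7.5, user_metric=0.5)

print(on_replica_a, on_replica_b)
# prints: 70.0 74.0
```

Because the normalization depends on the per-replica max score, a document at
the top of page 2 can outrank the last document of page 1.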

On top of that, we are not using pagination but an infinite scroll, which makes
it more noticeable.

Please suggest. 

Regards
Ashish










Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick,

To test this scenario I added the replica again, and for a few days I have been
monitoring metrics like Num Docs, Max Doc and Deleted Docs from the *Overview*
section of the core. I checked the *Segments Info* section too. Everything
looks in sync.

http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/
http://:8983/solr/#/MyTestCollection_*4_shard1_replica_n7*/

If they go out of sync in the future, I just wanted to confirm whether this is
a bug, although you mentioned:

*bq. Shouldn't both replica and leader come to same state 
after this much long period. 

No. After that long, the docs will be the same, all the docs 
present on one replica will be present and searchable on 
the other. However, they will be in different segments so the 
"stats skew" will remain. *


We need these scores, so as a temporary solution we could monitor these metrics
for any issues and take action (either optimize or delete and re-add the
replica) accordingly. Does that make sense?





Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Ashish Bisht
Hi Erick,

Your statement "*At best, I've seen UIs where they display, say, 1 to 5
stars that are just showing the percentile that the particular doc had
_relative to the max score*" is something we are trying to achieve, but we
are dealing in percentages rather than stars (ratings).

The change in MaxScore per node is messing it up.

I was wondering if it is possible to make one complete request (for a term) go
through one replica, i.e. if we could tell the client which replica served the
first request, so that subsequent paginated requests go through that replica
until the keyword is changed. Do you think it is possible or a good idea? If
yes, is there a way in Solr to know which replica served a request?

Regards
Ashish






Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Ashish Bisht
Thank you Erick for explaining. 

In my scenario, I stopped indexing and updates too and waited for 1 day. I
restarted Solr too. Shouldn't both the replica and the leader come to the same
state after such a long period? As you said this gets corrected by segment
merging, I hope it is an internal process and no manual activity is required.

For us the score matters, as we are using it to display some scenarios on
search and it gave changing values. As of now we depend on a single
shard-replica, but in the future we might need more replicas.
Will scheduling indexing and updates outside peak query hours help?

I have tried the exact cache while debugging the score difference during
sharding. It didn't help much. Anyhow, that's a different topic.

Thanks again, 

Regards
Ashish Bisht







Re: Solr relevancy score different on replicated nodes

2019-01-06 Thread Ashish Bisht
Hi Erick,

Thank you for the details, but it doesn't look like a time difference in
autocommit caused this issue. As I said, if I run a retrieve-all query or a
keyword query on both servers, they return the correct number of docs; it's
just the relevancy score that takes different values.

I waited for a brief period and the discrepancy was still there (with no
indexing going on either). So I went ahead and deleted the follower node
(thinking the leader replica should be in the correct state). After adding the
new replica again, the issue is not appearing.

We will monitor whether it appears again in the future.

Regards
Ashish





Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Ashish Bisht
Hi Erick, 

As I mentioned in my update, I am not facing this problem in a new collection.

As per 3), I can try deleting a replica and adding it again, but the confusion
is which one of the two I should delete (I am wondering which replica is giving
the correct score for the query).

Both replicas give the same number of docs for a match-all query. It's strange
that docCount and docFreq differ in the query explain output.

Regards
Ashish


