Re: Full-text search for Solr manual

2019-11-12 Thread Luke Miller
Thanks for your response, Alex.

Unfortunately the Solr source does not ship with the source of the manual.
(Directory /docs only contains a link to the online manual.)

 

Google search with domain limitation does not give any results, as mentioned
in my initial post. Any other limitation does not filter for a specific
version. E.g.
https://www.google.de/search?q="Solr%20Ref%20Guide%208.3"%20site:https://lucene.apache.org/%20luceneMatchVersion

 

I ended up downloading the whole documentation manually:

wget --timeout=1 --tries=5 --cut-dirs=3 -mkpnp -nH -P solr-8.3 -e robots=off https://lucene.apache.org/solr/guide/8_3/

 

And then I have to grep. A plain PDF file would be so much more convenient!
Of course a Solr-enabled search for the online manual would work as well.
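
For anyone who ends up with the same local mirror, the grep step could be sketched roughly like this in Python. It is only a minimal sketch: the function name grep_mirror and the choice to scan only .html files are my own assumptions, and solr-8.3 is simply the directory the wget command above writes to.

```python
import os


def grep_mirror(root, term, extensions=(".html",)):
    """Return (path, line_number, line) for every line under root containing term."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Assumption: only the HTML pages of the mirrored guide are of interest.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if term in line:
                        hits.append((path, line_number, line.strip()))
    return hits
```

Calling grep_mirror("solr-8.3", "luceneMatchVersion") would then list every matching line together with its file and line number.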

 

Thanks,

Julian

 

 

>Grep on the source of the manual (which ships with Solr source).

> 

>Google search with domain or keywords limitations.

> 

>Online copy searching is not powered by Solr yet. Yes, we are aware of the

>irony and are discussing it.

> 

>Regards,

>Aled

> 

>On Tue, Nov 12, 2019, 1:25 AM Luke Miller wrote:

> 

>> Hi,

>> 

>> 

>> 

>> I just noticed that since Solr 8.2 the Apache Solr Reference Guide is not

>> available anymore as PDF.

>> 

>> 

>> 

>> Is there a way to perform a full-text search using the HTML manual? E.g.

>> I'd like to find every hit for "luceneMatchVersion".

>> 

>> 

>> 

>> *   Using the integrated "Page title lookup." does not find anything

>>     (sure, it only looks up page titles).

>> *   Google does not return anything either searching for:

>> site:https://lucene.apache.org/solr/guide/8_3/ luceneMatchVersion

>> 

>> 

>> 

>> Is there another search method I missed?

>> 

>> 

>> 

>> Thanks.

>> 

>> 

 



Solr 8.2 indexing issues

2019-11-12 Thread Sujatha Arun
We recently migrated from 6.6.2 to 8.2. We are seeing issues with indexing
where the leader and the replica document counts do not match. We get
different results every time we do a *:* search.

The only issue we see in the logs is the one described in Jira issue SOLR-13293.

Has anybody seen similar issues?

Thanks


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
> : I am going to adjust my schema, re-index, and try again. See if that
> : doesn't fix this problem. I didn't know that having the uniqueKey be a
> : textField was a bad idea.
>
>
> https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey
>
> "The fieldType of uniqueKey must not be analyzed"
>
> (hence my comment about "possible, but hard to get right" ... you can use
> something like the KeywordTokenizer, but at that point you might as well
> use StrField except in some really esoteric special situations)
>
>
Good news. I added a field called ID, and made it string. Then I deleted
documents, re-indexed my data, and tried the search again.

Now solrResults size and numFound size are exactly the same.

Thanks for your help.

Rhys


Re: using fq means no results

2019-11-12 Thread Erik Hatcher
To add bq in there makes it query parser specific.  But I’m being pedantic 
since most folks are using edismax where that applies (along with a bunch of 
other params that would also deserve mention, like boost and bf).  q and fq, 
agreed for the explanation.  bq mentioned only if specifics and siblings 
described too :)

> On Nov 12, 2019, at 12:16, Walter Underwood  wrote:
> 
> I explain it this way:
> 
> * fq: filtering
> * q: filtering and scoring
> * bq: scoring
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Nov 12, 2019, at 9:08 AM, Erik Hatcher  wrote:
>> 
>>> On Nov 12, 2019, at 12:01 PM, rhys J  wrote:
>>> 
>>> On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
>>> wrote:
>>> 
>>>> fq is a filter query, and thus narrows the result set provided by the q
>>>> down to what also matches all specified fq's.
>>>> 
>>> So this can be used instead of scoring? Or alongside scoring?
>> 
>> That's right.   Only `q` (and its query parser associated params) are used 
>> for scoring.   fq's narrow the result set, but don't influence score.
>> 
>>    Erik
>> 
> 


Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter


: > whoa... that's not normal .. what *exactly* does the fieldType declaration
: > (with all analyzers) look like, and what does the  declaration
: > look like?
: >
: >
: 
: 
: 

NOTE: "text_general" != "text_gen_sort"

Assuming your "text_general" declaration looks like it does in the 
_default config set, then using that for uniqueKey or sorting is definitely 
not a good idea.

If you were *actually* using SortableTextField for your uniqueKeyField ... 
well, that should be ok to *sort* on, but i still wouldn't suggest using 
it as a uniqueKey field ... honestly not sure what behavior that might 
have with things like deleteById, etc...


: I am going to adjust my schema, re-index, and try again. See if that
: doesn't fix this problem. I didn't know that having the uniqueKey be a
: textField was a bad idea.

https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey

"The fieldType of uniqueKey must not be analyzed"

(hence my comment about "possible, but hard to get right" ... you can use 
something like the KeywordTokenizer, but at that point you might as well 
use StrField except in some really esoteric special situations)



-Hoss
http://www.lucidworks.com/


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter 
wrote:

>
> : > a) What is the fieldType of the uniqueKey field in use?
> : >
> :
> : It is a textField
>
> whoa... that's not normal .. what *exactly* does the fieldType declaration
> (with all analyzers) look like, and what does the  declaration
> look like?
>
>




  
  
  


  
  
  
  

  



> you should really never use TextField for a uniqueKey ... it's possible,
> but incredibly tricky to get "right".
>
>
I am going to adjust my schema, re-index, and try again. See if that
doesn't fix this problem. I didn't know that having the uniqueKey be a
textField was a bad idea.


> Independent from that, "sorting" on a TextField doesn't always do what you
> might think (again: depending on the analysis in use)
>
> With a cursorMark you have other factors to consider: i bet what's
> happening is that the post-analysis terms for your docs result in
> duplicate values, so the cursorMark is skipping all docs that have the
> same (post analysis) sort value ... this could also manifest itself in
> other weird ways, like trying to deleteById.
>
> Step #1: switch to using a simple StrField for your uniqueKey field and
> see if that solves all your problems.
>
>
Thanks, doing this now.

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter


: > a) What is the fieldType of the uniqueKey field in use?
: >
: 
: It is a textField

whoa... that's not normal .. what *exactly* does the fieldType declaration 
(with all analyzers) look like, and what does the  declaration 
look like?

you should really never use TextField for a uniqueKey ... it's possible, 
but incredibly tricky to get "right".

Independent from that, "sorting" on a TextField doesn't always do what you 
might think (again: depending on the analysis in use)

With a cursorMark you have other factors to consider: i bet what's 
happening is that the post-analysis terms for your docs result in 
duplicate values, so the cursorMark is skipping all docs that have the 
same (post analysis) sort value ... this could also manifest itself in 
other weird ways, like trying to deleteById.

Step #1: switch to using a simple StrField for your uniqueKey field and 
see if that solves all your problems.
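
To illustrate the kind of skipping described above, here is a toy model in Python. It is not Solr's implementation, only a sketch under the assumption that a cursor resumes strictly after the last sort value it returned; the function name and the sample values are made up for illustration.

```python
def page_with_cursor(sort_values, rows):
    """Toy cursor: each page holds up to `rows` values strictly greater than the last one seen."""
    ordered = sorted(sort_values)
    fetched = []
    last = None
    while True:
        remaining = [v for v in ordered if last is None or v > last]
        page = remaining[:rows]
        if not page:
            break
        fetched.extend(page)
        last = page[-1]  # the next page resumes strictly after this value
    return fetched


# Hypothetical ids: unique raw values vs. colliding post-analysis values.
unique_ids = ["doc1", "doc2", "doc3", "doc4", "doc5"]
analyzed_ids = ["doc", "doc", "doc", "other", "other"]

print(len(page_with_cursor(unique_ids, 2)))    # every doc is visited
print(len(page_with_cursor(analyzed_ids, 2)))  # one "doc" is skipped at the page boundary
```

With unique sort values every document is returned exactly once; with colliding values, any document that shares the boundary value of a page is never visited, so the total fetched falls short of numFound.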


-Hoss
http://www.lucidworks.com/


Re: using fq means no results

2019-11-12 Thread Walter Underwood
I explain it this way:

* fq: filtering
* q: filtering and scoring
* bq: scoring
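
The same three roles can be shown with a toy model in Python. This is only a sketch of the semantics listed above, not how Solr evaluates queries; the search function, the sample documents and the score values are all invented for illustration.

```python
def search(docs, q, fq=(), bq=()):
    """Toy model: q filters and scores, each fq only filters, each bq only adds score."""
    results = []
    for doc in docs:
        score = q(doc)
        if score <= 0:
            continue  # q decides which docs match at all
        if not all(f(doc) for f in fq):
            continue  # fq narrows the set without touching the score
        score += sum(b(doc) for b in bq)  # bq changes the score, never the set
        results.append((doc["id"], score))
    return sorted(results, key=lambda item: item[1], reverse=True)


docs = [
    {"id": "a", "title": "solr guide", "type": "pdf"},
    {"id": "b", "title": "solr guide", "type": "html"},
    {"id": "c", "title": "lucene", "type": "html"},
]
matches_solr = lambda d: 1.0 if "solr" in d["title"] else 0.0
is_html = lambda d: d["type"] == "html"
prefer_b = lambda d: 0.5 if d["id"] == "b" else 0.0

print(search(docs, matches_solr))
print(search(docs, matches_solr, fq=[is_html]))
print(search(docs, matches_solr, bq=[prefer_b]))
```

The fq call drops document a but leaves the score of b unchanged, while the bq call keeps both documents and only reorders them.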

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 12, 2019, at 9:08 AM, Erik Hatcher  wrote:
> 
> 
> 
>> On Nov 12, 2019, at 12:01 PM, rhys J  wrote:
>> 
>> On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
>> wrote:
>> 
>>> fq is a filter query, and thus narrows the result set provided by the q
>>> down to what also matches all specified fq's.
>>> 
>>> 
>> So this can be used instead of scoring? Or alongside scoring?
> 
> That's right.   Only `q` (and its query parser associated params) are used 
> for scoring.   fq's narrow the result set, but don't influence score.
> 
>   Erik
> 



Re: using fq means no results

2019-11-12 Thread Erik Hatcher



> On Nov 12, 2019, at 12:01 PM, rhys J  wrote:
> 
> On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
> wrote:
> 
>> fq is a filter query, and thus narrows the result set provided by the q
>> down to what also matches all specified fq's.
>> 
>> 
> So this can be used instead of scoring? Or alongside scoring?

That's right.   Only `q` (and its query parser associated params) are used for 
scoring.   fq's narrow the result set, but don't influence score.

Erik



Re: using fq means no results

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
wrote:

> fq is a filter query, and thus narrows the result set provided by the q
> down to what also matches all specified fq's.
>
>
So this can be used instead of scoring? Or alongside scoring?


> You gave it a query, "clt_ref_no", which literally looks for that string
> in your default field.   Looking at your q parameter, clt_ref_no looks like
> a field name, and your fq should probably also have a value for that field
> (say fq=clt_ref_no:owl-2924-8)
>
> Use debug=true to see how your q and fq's are parsed, and that
> should shed some light on the issue.
>
>
Thank you for your help!

Rhys


Re: using fq means no results

2019-11-12 Thread Erik Hatcher
fq is a filter query, and thus narrows the result set provided by the q down to 
what also matches all specified fq's.

You gave it a query, "clt_ref_no", which literally looks for that string in 
your default field.   Looking at your q parameter, clt_ref_no looks like a 
field name, and your fq should probably also have a value for that field (say 
fq=clt_ref_no:owl-2924-8)

Use debug=true to see how your q and fq's are parsed, and that should 
shed some light on the issue.
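
A request along those lines could be assembled like this in Python. This is a minimal sketch: the helper name build_select_url and the localhost base URL are placeholders of mine, and the fq value assumes the field is clt_ref_no (as in the q parameter) with the usual field:value syntax.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, fq=(), debug=False):
    """Build a /select URL with one q, any number of fq clauses and optional debug output."""
    params = [("q", q)]
    params.extend(("fq", clause) for clause in fq)  # each fq is sent as its own parameter
    if debug:
        params.append(("debug", "true"))
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/debt",  # placeholder host, not the real server
    q="clt_ref_no:owl-2924-8",
    fq=["clt_ref_no:owl-2924-8"],
    debug=True,
)
print(url)
```

Letting urlencode do the escaping also avoids hand-editing characters such as ":" and spaces in the browser address bar.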

Erik


> On Nov 12, 2019, at 11:33 AM, rhys J  wrote:
> 
> If I do this query in the browser:
> 
> http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8
> 
> I get 84662 results.
> 
> If I do this query:
> 
> http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8&fq=clt_ref_no
> 
> I get 0 results.
> 
> Why does using fq do this?
> 
> What am I missing in my query?
> 
> Thanks,
> 
> Rhys



Re: sort by score in join with geodist()

2019-11-12 Thread Vasily Ogar
Thank you for the advice; now it is working as expected. Do you happen to know
how to integrate this with dismax?

On Tue, Nov 12, 2019 at 6:10 PM Mikhail Khludnev  wrote:

> tl;dr:
> I noticed the func under fq, which makes no sense. Only q or sort yield scores.
>
> On Tue, Nov 12, 2019 at 6:43 PM Vasily Ogar  wrote:
>
> > First of all, thank you for your help.
> > Now it doesn't show any errors, but somehow score is based on the title
> > and description but not on the geodist.

using fq means no results

2019-11-12 Thread rhys J
If I do this query in the browser:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8

I get 84662 results.

If I do this query:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8&fq=clt_ref_no

I get 0 results.

Why does using fq do this?

What am I missing in my query?

Thanks,

Rhys


Re: sort by score in join with geodist()

2019-11-12 Thread Mikhail Khludnev
tl;dr:
I noticed the func under fq, which makes no sense. Only q or sort yield scores.

On Tue, Nov 12, 2019 at 6:43 PM Vasily Ogar  wrote:

> First of all, thank you for your help.
> Now it doesn't show any errors, but somehow score is based on the title and
> description but not on the geodist.

Re: sort by score in join with geodist()

2019-11-12 Thread Vasily Ogar
First of all, thank you for your help.
Now it doesn't show any errors, but somehow score is based on the title and
description but not on the geodist.
"params":{
  "hl":"on",
  "pt":"54.6973867999,25.22481530046",
  "fl":"score,*,store:[subquery fromIndex=stores]",
  "store.rows":"1",
  "fq":"{!join from=site_id to=site_id fromIndex=stores score=max}{!func}geodist()",
  "store.sort":"geodist() asc",
  "hl.simple.pre":"",
  "store.q":"{!terms f=site_id v=$row.site_id}",
  "store.sfield":"coordinates",
  "hl.fl":"title description",
  "group.field":"site_id",
  "_":"1573559644298",
  "group":"true",
  "store.fq":"{!geofilt}",
  "d":"100",
  "{!geofilt}":"",
  "group.limit":"2",
  "store.d":"100",
  "store.pt":"54.6973867999,25.22481530046",
  "store.fl":"*,score",
  "sort":"score desc",
  "sfield":"coordinates",
  "q":"title:\"iphone xr 64gb\"",
  "group.main":"true",
  "hl.simple.post":"",
  "debugQuery":"on"}

Here is debug:
"debug":{ "rawquerystring":"title:\"iphone xr 64gb\"",
"querystring":"title:\"iphone
xr 64gb\"", "parsedquery":"PhraseQuery(title:\"iphon xr 64gb\")", "
parsedquery_toString":"title:\"iphon xr 64gb\"", "explain":{ "product:
https://www.ideal.lt/iphone/iphone-xr/iphone-xr-64gb-yellow":"\n3.9714882 =
weight(title:\"iphon xr 64gb\" in 568) [SchemaSimilarity], result of:\n
3.9714882 = score(freq=1.0), product of:\n 6.8681135 = idf, sum of:\n
1.1837479 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 459
= n, number of documents containing term\n 1500 = N, total number of
documents with field\n 3.3156862 = idf, computed as log(1 + (N - n + 0.5) /
(n + 0.5)) from:\n 54 = n, number of documents containing term\n 1500 = N,
total number of documents with field\n 2.3686793 = idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:\n 140 = n, number of documents containing
term\n 1500 = N, total number of documents with field\n 0.5782502 = tf,
computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
normalization parameter\n 4.0 = dl, length of field\n 8.384666 = avgdl,
average length of field\n", "product:
https://www.ideal.lt/iphone/iphone-xr/iphone-xr-64gb-white":"\n3.9714882 =
weight(title:\"iphon xr 64gb\" in 569) [SchemaSimilarity], result of:\n
3.9714882 = score(freq=1.0), product of:\n 6.8681135 = idf, sum of:\n
1.1837479 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 459
= n, number of documents containing term\n 1500 = N, total number of
documents with field\n 3.3156862 = idf, computed as log(1 + (N - n + 0.5) /
(n + 0.5)) from:\n 54 = n, number of documents containing term\n 1500 = N,
total number of documents with field\n 2.3686793 = idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:\n 140 = n, number of documents containing
term\n 1500 = N, total number of documents with field\n 0.5782502 = tf,
computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
normalization parameter\n 4.0 = dl, length of field\n 8.384666 = avgdl,
average length of field\n", "product:
https://istore.lt/iphone-xr-64gb-blue.html":"\n3.9714882 =
weight(title:\"iphon xr 64gb\" in 28) [SchemaSimilarity], result of:\n
3.9714882 = score(freq=1.0), product of:\n 6.8681135 = idf, sum of:\n
1.1837479 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 459
= n, number of documents containing term\n 1500 = N, total number of
documents with field\n 3.3156862 = idf, computed as log(1 + (N - n + 0.5) /
(n + 0.5)) from:\n 54 = n, number of documents containing term\n 1500 = N,
total number of documents with field\n 2.3686793 = idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:\n 140 = n, number of documents containing
term\n 1500 = N, total number of documents with field\n 0.5782502 = tf,
computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
normalization parameter\n 4.0 = dl, length of field\n 8.384666 = avgdl,
average length of field\n", "product:
https://istore.lt/iphone-xr-64gb-coral.html":"\n3.9714882 =
weight(title:\"iphon xr 64gb\" in 29) [SchemaSimilarity], result of:\n
3.9714882 = score(freq=1.0), product of:\n 6.8681135 = idf, sum of:\n
1.1837479 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 459
= n, number of documents containing term\n 1500 = N, total number of
documents with field\n 3.3156862 = idf, computed as log(1 + (N - n + 0.5) /
(n + 0.5)) from:\n 54 = n, number of documents containing term\n 1500 = N,
total number of documents with field\n 2.3686793 = idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:\n 140 = n, number of documents containing
term\n 1500 = N, total number of documents with field\n 0.5782502 = tf,
computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
normalization parameter\n 4.0 = dl, length of 

Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter 
wrote:

>
> Based on the info provided, it's hard to be certain, but reading between
> the lines here are the assumptions i'm making...
>
> 1) your core name is "dbtr"
> 2) the uniqueId field for the "dbtr" core is "debtor_id"
>
> ..are those assumptions correct?
>

Yes they are. Sorry I didn't provide that from the beginning.


> Two key pieces of information that don't seem to be assumable from the
> info you've provided:
>
> a) What is the fieldType of the uniqueKey field in use?
>

It is a textField


> b) how are you determining that "The numFound: 35008"
>
>
I do a preliminary query to the solr core and print out the numFound from
this:

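 # Preliminary query; assumes $ua is an LWP::UserAgent and decode_json is from JSON.
 my $solrResponse = $ua->post( $solrURI );

 # Use the public accessor instead of the internal {_content} hash key.
 my $decoded = decode_json( $solrResponse->content );
 my $numFound = $decoded->{response}{numFound};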


> ...
>
> You show the code that prints out "size of solrResults: 22006" but nothing
> in your code ever prints $numFound.  there is a snippet of code at the top
>

I am printing numFound every time it loops. This should remain constant,
because it is the total of all documents found. It's not really necessary
that I am printing it.

The number of docs is the size that I also print, and that is 1000 every
time, until the last little bit, and then it is 6 docs found.


> of your perl logic that seems disconnected from the rest of the code which
> makes me think that before you do anything with a cursor you are already
> parsing some *other* query response to get $numFound that way...
>
>
I am running this query first, to get the cursor set:

"http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id
asc=debt_id: 608384 OR debt_id: 393291=*"

This sets the cursor, and then returns a cursorMark that I start using in
order to grab 1000 documents at a time.



> ...what exactly does all the code *before* this look like? what is the
> request that you are using to get that initial '$solrResponse' that you
> are parsing to extract '$numFound'  are you sure it's exactly the same as
> the query whose cursor you are iterating over?
>
>
query from before the loop:

"http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id
asc=debt_id: 608384 OR debt_id: 393291=*"

query in the loop:

http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id+asc=debt_id:
608384 OR debt_id: 393291=AoElMTg1MzE=

I do have some logic to make sure i grab the first 1000 from the first
query, but other than that, it's a simple loop.
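
For comparison, a loop of this kind might look like the following Python sketch (not the Perl actually in use here). cursorMark and nextCursorMark are Solr's own parameter and response names; fetch_page is a hypothetical callable that sends the parameters to the /select handler and returns the decoded JSON, and iterate_cursor is just an illustrative name.

```python
def iterate_cursor(fetch_page, q, sort, rows=1000):
    """Collect all docs for q by following nextCursorMark until it stops changing."""
    cursor = "*"  # the initial cursorMark
    docs = []
    while True:
        response = fetch_page(
            {"q": q, "sort": sort, "rows": rows, "cursorMark": cursor}
        )
        docs.extend(response["response"]["docs"])
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means there is nothing left to fetch
        cursor = next_cursor
    return docs
```

Because the first request already uses cursorMark=*, no separate logic is needed to pick up the first 1000 documents, and the length of the returned list can be compared directly with numFound.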


> It looks like you are (also) extracting 'my $numFound =
> $decoded->{response}{numFound};' on every (cursor) request ... what do you
> get if you add this to your cursor loop...
>
>    print STDERR "numFound = $numFound at '$cursor'";
>
numFound is always 35008 because that is how many total documents are
found. The number of docs in the response is the number that I care about,
because that shows me how many came back for this slice.


> ...because unless documents are being added/deleted as you iterate over
> the cursor, the numFound value should be consistent on each request.
>
>
numFound is consistently 35008.

Thanks

Rhys


Re: Does Solr replicate data securely

2019-11-12 Thread Pushkar Raste
Hi,
What about a master/slave setup? If I enable SSL there, would the segment and
config files be copied using TLS?

On Sat, Nov 9, 2019 at 3:31 PM Jan Høydahl  wrote:

> You choose. If you use solr cloud and have enabled ssl in your cluster,
> then all requests including replication will be secure (https). So it is
> still tcp but using TLS :)
>
> Jan Høydahl
>
> > 6. nov. 2019 kl. 00:03 skrev Pushkar Raste :
> >
> > Hi,
> > When slaves/pull replicas copy index files from the master, is that done
> > using a secure protocol or just over tcp?
> > --
> > — Pushkar Raste
>


Re: sort by score in join with geodist()

2019-11-12 Thread Mikhail Khludnev
Hello,
It seems I have exceeded my limit of careless replies on the mailing list.
I'd rather start with this:
q={!join from=site_id to=site_id fromIndex=stores
score=max}+{!geofilt}
{!func}geodist()=coordinates=54.6973867999,25.22481530046=10
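
As a rough illustration, such a request could be URL-encoded like this in Python. This is only a sketch: the archive has stripped the parameter names from the line above, so sfield, pt and d (the standard spatial parameters) are my assumption of what was meant, and the helper name is made up.

```python
from urllib.parse import urlencode

# The join query as suggested above; spatial settings go in separate request parameters.
JOIN_QUERY = (
    "{!join from=site_id to=site_id fromIndex=stores score=max}"
    "+{!geofilt} {!func}geodist()"
)


def build_join_geodist_params(point, distance_km, sfield="coordinates"):
    """Return an encoded query string; sfield, pt and d are assumed parameter names."""
    return urlencode(
        {"q": JOIN_QUERY, "sfield": sfield, "pt": point, "d": distance_km}
    )


print(build_join_geodist_params("54.6973867999,25.22481530046", 10))
```

Encoding matters here because the literal "+" and the spaces in the local-params syntax would otherwise be misread when pasted straight into a URL.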


On Mon, Nov 11, 2019 at 11:11 PM Mikhail Khludnev  wrote:

> Is it something like  https://issues.apache.org/jira/browse/SOLR-10673 ?
>
> On Mon, Nov 11, 2019 at 3:47 PM Vasily Ogar  wrote:
>
>> It shows nothing because I got an error
>> "metadata":[ "error-class","org.apache.solr.common.SolrException",
>> "root-error-class","org.apache.solr.search.SyntaxError"],
>> "msg":"org.apache.solr.search.SyntaxError:
>> geodist - not enough parameters:[]",
>>
>> If I set the parameters, then I get another error
>> "metadata":[ "error-class","org.apache.solr.common.SolrException",
>> "root-error-class","org.apache.solr.common.SolrException"], "msg":"A
>> ValueSource isn't directly available from this field. Instead try a query
>> using the distance as the score.",
>>
>> On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev  wrote:
>>
>> > Hello, Vasily.
>> > Why not? What have you got in debugQuery=true?
>> >
>> > On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar 
>> wrote:
>> >
>> > > Hello,
>> > > Is it possible to sort by score in join by geodist()? For instance,
>> > > something like this
>> > > q={!join from=site_id to=site_id fromIndex=stores score=max}
>> > > +{!func}geodist() +{!geofilt sfield=coordinates
>> > > pt=54.6973867999,25.22481530046 d=10}
>> > > sort=score desc
>> > > Thank you
>> > >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> >
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-12 Thread Guilherme Viteri
What I can't understand is:
if I search for the exact term "Immunoregulatory interactions between a Lymphoid
and a non-Lymphoid cell", it doesn't work, but if I search for "Immunoregulatory
interactions between a Lymphoid and non-Lymphoid cell" (without the second "a"),
then it works.
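
One known way for stopwords to produce zero hits in edismax is sketched below as a toy model. It is not Solr's code and may not be what is happening here. It only assumes that (a) one clause is built per query token across all qf fields, (b) a field whose analyzer removes stopwords contributes nothing for "a" or "and", and (c) mm=100% makes every remaining clause mandatory. The field setup (name drops stopwords, dbId keeps them) and the sample document are hypothetical.

```python
STOPWORDS = {"a", "and"}

# Hypothetical analysis setup: "name" has a stopword filter, "dbId" does not.
FIELD_DROPS_STOPWORDS = {"name": True, "dbId": False}


def analyze(field, token):
    """Return the analyzed token for a field, or None if the field drops it."""
    token = token.lower()
    if FIELD_DROPS_STOPWORDS[field] and token in STOPWORDS:
        return None
    return token


def required_clauses(query, qf):
    """Build one clause per query token that survives in at least one qf field."""
    clauses = []
    for token in query.split():
        surviving = {}
        for field in qf:
            term = analyze(field, token)
            if term is not None:
                surviving[field] = term
        if surviving:
            clauses.append(surviving)
    return clauses


def matches(doc, query, qf):
    """Model mm=100%: every clause must match in at least one of its fields."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause.items())
        for clause in required_clauses(query, qf)
    )


doc = {"name": {"lymphoid", "cell"}, "dbId": {"198933"}}
query = "a Lymphoid and a cell"

print(len(required_clauses(query, ["name"])))          # 2 clauses
print(matches(doc, query, ["name"]))                   # True
print(len(required_clauses(query, ["name", "dbId"])))  # 5 clauses
print(matches(doc, query, ["name", "dbId"]))           # False
```

In this model the stopwords vanish when only name is searched, but once dbId is added each "a" and "and" becomes a mandatory clause that only dbId can satisfy, and an id field never contains those words. That mirrors the 23-versus-0 pattern in the summary quoted below, under the stated assumptions.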

> On 11 Nov 2019, at 12:24, Guilherme Viteri  wrote:
> 
> Thanks
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
> Yes. The way we've been using them has always made sense.
> 
>> If q.alt is giving you responses, it's confirmed that your stopwords filter
>> is working as expected. The problem definitely lies in the configuration of
>> edismax.
> I see.
> 
>> *Let me explain again:* In your solrconfig.xml, look at your /search
> OK, using q now: I removed all qf, performed the search, and got 23 results,
> with the one I really want at the top.
> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I
> don't get anything (which makes sense). However, if I query name_exact, I get
> the 23 results again, and unfortunately if I query stId^1.0 name_exact^10.0 I
> still don't get any results.
> 
> In summary
> - without qf - 23 results
> - dbId - 0 results
> - name_exact - 16 results
> - name - 23 results
> - dbId^1.0 name_exact^10.0 - 0 results
> - 0 results if any other field (stId, dbId (key)) is added on top of the
>   name fields (name, name_exact, etc.).
> 
> Definitely lost here! :-/
> 
> 
>> On 11 Nov 2019, at 07:59, Paras Lehana  wrote:
>> 
>> Hi
>> 
>> So I don't think removing it completely is the way to go from the scenario
>>> we have
>> 
>> 
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>> 
>> 
>> Quite a considerable increase
>> 
>> 
>> If q.alt is giving you responses, it's confirmed that your stopwords filter
>> is working as expected. The problem definitely lies in the configuration of
>> edismax.
>> 
>> 
>> 
>>> I am sorry but I didn't understand what do you want me to do exactly with
>>> the lst (??) and qf and bf.
>> 
>> 
>> What combinations did you try? I was referring to the field-level boosting
>> you have applied in edismax config.
>> 
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>> request handler. There are many qf and some bq boosts. I want you to remove
>> all of these, check response again (with q now) and keep on adding them
>> again (one by one) while looking for when the numFound drastically changes.
>> 
>> On Fri, 8 Nov 2019 at 23:47, David Hastings 
>> wrote:
>> 
>>> I use 3 word shingles with stopwords for my MLT ML trainer that worked
>>> pretty well for such a solution, but for a full index the size became
>>> prohibitive
>>> 
>>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
>>> wrote:
>>> 
>>>> If we had IDF for phrases, they would be super effective. The 2X weight is
>>>> a hack that mostly works.
>>>>
>>>> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>>
>>>>> On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>
>>>>> the pf and qf fields are REALLY nice for this
>>>>>
>>>>> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>
>>>>>> I always enable phrase searching in edismax for exactly this reason.
>>>>>>
>>>>>> Something like:
>>>>>>
>>>>>> title^16 keywords^8 text^2
>>>>>>
>>>>>> To deal with concepts in queries, a classifier and/or named entity
>>>>>> extractor can be helpful. If you have a list of concepts (“controlled
>>>>>> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
>>>>>> term can be queried against the field matching that vocabulary.
>>>>>>
>>>>>> This is how LinkedIn separates people, companies, and places, for example.
>>>>>>
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>
>>>>>>> On Nov 8, 2019, at 10:48 AM, Erick Erickson wrote:
>>>>>>>
>>>>>>> Look at the “mm” parameter, try setting it to 100%. Although that’s not
>>>>>>> entirely likely to do what you want either since virtually every doc will
>>>>>>> have “a” in it. But at least you’d get docs that have both terms.
>>>>>>>
>>>>>>> You may also be able to search for things like “Lamin A” _only as a
>>>>>>> phrase_ and have some luck. But this is a gnarly problem in general. Some
>>>>>>> people have been able to substitute synonyms and/or shingles to make this
>>>>>>> work at the expense of a larger index.
>>>>>>>
>>>>>>> This is a generic problem with context. “Lamin A” is really a “concept”,
>>>>>>> not just two words that happen to be near each other. Searching as a phrase
>>>>>>> is
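
The edismax settings mentioned in the quoted thread above (mm, qf, pf) can be sketched as request parameters. This is only an illustration: it assumes the surviving string "title^16 keywords^8 text^2" was the pf value, and the qf weights (half of pf, following the "2X weight" remark) are a guess rather than something stated in the thread. The helper name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query, mm="100%"):
    """Build edismax request parameters with phrase boosting enabled."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Assumed per-term field weights: half of the phrase weights below.
        "qf": "title^8 keywords^4 text^1",
        # Phrase fields: documents where the whole query occurs as a phrase
        # in these fields get an extra boost.
        "pf": "title^16 keywords^8 text^2",
        # Minimum-should-match; "100%" requires every clause to match.
        "mm": mm,
    }


params = edismax_params("Lamin A")
query_string = urlencode(params)
decoded = parse_qs(query_string)
print(query_string)
```

Keeping mm as an argument makes it easy to compare result counts at 100% against a looser setting while leaving the boosts unchanged.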

Re: $deleteDocByQuery is not working for me

2019-11-12 Thread Paresh
Hi,

I was able to get it done in the following way -







Thanks,
Paresh



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: $deleteDocByQuery and $deleteDocByID

2019-11-12 Thread Paresh
Hi,

I was able to get it done in the following way -







Thanks,
Paresh




