Spell check with data from database and not from English dictionary

2020-01-22 Thread seeteshh
Hello all,

Can the spell check feature be configured with words/data fetched from a
database and not from the English dictionary?
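One common approach (a sketch, not a confirmed answer from this thread): index the database terms into a Solr field first (for example via DataImportHandler), then point Solr's spellcheck component at that field instead of a file-based dictionary. The field name `db_terms` below is illustrative:

```xml
<!-- solrconfig.xml (sketch): build the spellcheck dictionary from an
     indexed field rather than a static dictionary file. "db_terms" is
     a hypothetical field populated from the database. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">db_terms</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```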

Regards,

Seetesh Hindlekar





Early termination in Lucene 8

2020-01-22 Thread Wei
Hi,

I am excited to see that Lucene 8 introduced BlockMax WAND as a major speed
improvement: https://issues.apache.org/jira/browse/LUCENE-8135. My question
is, how does it interact with facet requests, given that numFound won't be
exact? I did some searching but haven't found any documentation on this. Any
pointer is greatly appreciated.
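For context, a sketch of how the inexact count surfaces in the Lucene 8 API (a `searcher` and `query` are assumed to exist; whether Solr skips WAND when facets are requested is not documented in this thread):

```java
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.TotalHits;

// With totalHitsThreshold = 1000, BlockMax WAND may skip non-competitive
// docs once 1000 hits have been counted, so totalHits becomes a lower bound.
TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
searcher.search(query, collector);
TotalHits total = collector.topDocs().totalHits;
if (total.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) {
  // The count is approximate: "at least total.value" matches.
}
```

Facet counting generally has to visit every matching document, which is why an engine would have to fall back to exact counting (relation EQUAL_TO) when facets are requested.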

Best,
Wei


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
Cool. Glad to help. :)

Cheers,
Edward

On Wed, Jan 22, 2020 at 16:44, Arnold Bronley
wrote:

> I knew about the + and other signs and their connections to MUST and other
> operators. What I did not understand was why it was not adding parentheses
> around the expression. In your first reply you mentioned that -  'roughly,
> a builder for each query enclosed in "parenthesis"' - that was the key
> point I was missing.
>
> On Wed, Jan 22, 2020 at 2:40 PM Arnold Bronley 
> wrote:
>
> > Thanks, Edward. This was the exact answer I was looking for :)
> >
> > On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro  >
> > wrote:
> >
> >> If you are using Lucene's BooleanQueryBuilder then you need to do
> nesting
> >> of your queries (roughly, a builder for each query enclosed in
> >> "parenthesis").
> >>
> >> A query like (text:child AND text:toys) OR age:12 would be:
> >>
> >> Query query1 = new TermQuery(new Term("text", "toys"));
> >> Query query2 = new TermQuery(new Term("text", "children"));
> >> Query query3 = new TermQuery(new Term("age", "12"));
> >>
> >> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
> >> andBuilder.add(query1, BooleanClause.Occur.MUST);
> >> andBuilder.add(query2, BooleanClause.Occur.MUST);
> >>
> >> BooleanQuery.Builder builder = new BooleanQuery.Builder();
> >> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
> >> builder.add(query3, BooleanClause.Occur.SHOULD);
> >>
> >> BooleanQuery booleanQuery = builder.build();
> >>
> >> This booleanQuery.toString() will be:
> >>
> >> (+text:toys +text:children) age:12
> >>
> >> That is the parsing of "(text:child AND text:toys) OR age:12"
> >>
> >>
> >> Edward
> >>
> >> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley  >
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > BooleanQueryBuilder is not adding parenthesis around the query. It
> >> > only adds + sign at the start of the query but not the parentheses
> >> around
> >> > the query. Why is that? How should I add it?
> >> >
> >> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
> >>
> >
>


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It's hard to predict whether it will be faster to read the docValues files or
to uninvert the field ad hoc and read it from the heap. Only a test can judge that.

On Wed, Jan 22, 2020 at 11:08 PM kumar gaurav  wrote:

> HI Mikhail
>
> for example :- 6GB index size (Parent-child documents)
> indexing in 12 hours interval .
>
> need to use uniqueBlock for json facet for child faceting .
>
> Should i use docValues="true" for _root_  field   ?
>
> Thanks .
>
> regards
> Kumar Gaurav
>
>
>
> On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev  wrote:
>
> > It depends from env.
> >
> > On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:
> >
> > > Hi Everyone
> > >
> > > Should i use docValues="true" for _root_  field to improve nested child
> > > json.facet performance  ? i am using uniqueBlock() .
> > >
> > >
> > > Thanks in advance .
> > >
> > > regards
> > > Kumar Gaurav
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


QParser does not retain double quotes

2020-01-22 Thread Arnold Bronley
Hi,

I have the following code that does some parsing with the QParser plugin. I noticed
that it does not retain the double quotes in filterQueryString. How
should I make it retain the double quotes?

QParser.getParser(filterQueryString, null, req).getQuery();

filterQueryString passed = id:"x:1234"
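One workaround sketch (not a confirmed fix from this thread): the quotes are consumed by the parser itself — `id:"x:1234"` parses to a term query on `x:1234`, and the parsed Query's toString() does not re-emit them. If the goal is only to keep the embedded colon from being treated as a field separator, escaping with SolrJ's ClientUtils is an alternative to quoting:

```java
import org.apache.solr.client.solrj.util.ClientUtils;

// Escape special characters (here the inner ':') instead of quoting.
// "x:1234" becomes "x\:1234", so the whole string stays one term.
String value = ClientUtils.escapeQueryChars("x:1234");
String filterQueryString = "id:" + value;
```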


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi Mikhail

For example: 6GB index size (parent-child documents),
indexing at a 12-hour interval.

I need to use uniqueBlock in a json facet for child faceting.

Should I use docValues="true" for the _root_ field?

Thanks.

regards
Kumar Gaurav



On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev  wrote:

> It depends from env.
>
> On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:
>
> > Hi Everyone
> >
> > Should i use docValues="true" for _root_  field to improve nested child
> > json.facet performance  ? i am using uniqueBlock() .
> >
> >
> > Thanks in advance .
> >
> > regards
> > Kumar Gaurav
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It depends on the environment.

On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:

> Hi Everyone
>
> Should i use docValues="true" for _root_  field to improve nested child
> json.facet performance  ? i am using uniqueBlock() .
>
>
> Thanks in advance .
>
> regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
I knew about the + and other signs and their connections to MUST and other
operators. What I did not understand was why it was not adding parentheses
around the expression. In your first reply you mentioned that -  'roughly,
a builder for each query enclosed in "parenthesis"' - that was the key
point I was missing.

On Wed, Jan 22, 2020 at 2:40 PM Arnold Bronley 
wrote:

> Thanks, Edward. This was the exact answer I was looking for :)
>
> On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro 
> wrote:
>
>> If you are using Lucene's BooleanQueryBuilder then you need to do nesting
>> of your queries (roughly, a builder for each query enclosed in
>> "parenthesis").
>>
>> A query like (text:child AND text:toys) OR age:12 would be:
>>
>> Query query1 = new TermQuery(new Term("text", "toys"));
>> Query query2 = new TermQuery(new Term("text", "children"));
>> Query query3 = new TermQuery(new Term("age", "12"));
>>
>> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
>> andBuilder.add(query1, BooleanClause.Occur.MUST);
>> andBuilder.add(query2, BooleanClause.Occur.MUST);
>>
>> BooleanQuery.Builder builder = new BooleanQuery.Builder();
>> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
>> builder.add(query3, BooleanClause.Occur.SHOULD);
>>
>> BooleanQuery booleanQuery = builder.build();
>>
>> This booleanQuery.toString() will be:
>>
>> (+text:toys +text:children) age:12
>>
>> That is the parsing of "(text:child AND text:toys) OR age:12"
>>
>>
>> Edward
>>
>> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
>> wrote:
>> >
>> > Hi,
>> >
>> > BooleanQueryBuilder is not adding parenthesis around the query. It
>> > only adds + sign at the start of the query but not the parentheses
>> around
>> > the query. Why is that? How should I add it?
>> >
>> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
>>
>


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
Thanks, Edward. This was the exact answer I was looking for :)

On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro 
wrote:

> If you are using Lucene's BooleanQueryBuilder then you need to do nesting
> of your queries (roughly, a builder for each query enclosed in
> "parenthesis").
>
> A query like (text:child AND text:toys) OR age:12 would be:
>
> Query query1 = new TermQuery(new Term("text", "toys"));
> Query query2 = new TermQuery(new Term("text", "children"));
> Query query3 = new TermQuery(new Term("age", "12"));
>
> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
> andBuilder.add(query1, BooleanClause.Occur.MUST);
> andBuilder.add(query2, BooleanClause.Occur.MUST);
>
> BooleanQuery.Builder builder = new BooleanQuery.Builder();
> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
> builder.add(query3, BooleanClause.Occur.SHOULD);
>
> BooleanQuery booleanQuery = builder.build();
>
> This booleanQuery.toString() will be:
>
> (+text:toys +text:children) age:12
>
> That is the parsing of "(text:child AND text:toys) OR age:12"
>
>
> Edward
>
> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > BooleanQueryBuilder is not adding parenthesis around the query. It
> > only adds + sign at the start of the query but not the parentheses around
> > the query. Why is that? How should I add it?
> >
> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
>


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
Oh, you asked about the meaning of the plus sign too.

Well, I recommend reading a book* or any tutorial, but in short: the clauses
of boolean queries have three occurrence types, SHOULD, MUST and MUST_NOT,
which roughly translate to OR, AND, and NOT, respectively.

The plus sign means MUST, the minus sign means MUST_NOT, and the absence of
both means SHOULD (the clause may or may not match).

For example:

- a query like "text:(toys AND child)" will be translated as "+text:toys
+text:child" (both terms are required to match)

- a query like "text:(toys OR child)" will be translated as "text:toys
text:child" (both terms or only one term can match, that is more or less
equivalent to OR);

- a query like "text:toys NOT text:child" will be translated as "text:toys
-text:child" (try to match text:toys, but also remove the docs that match
text:child from the result set);

* = Lucene book, Solr book and Relevant Search book are excellent resources!
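As a sketch, the third example above expressed with the same builder API as in the earlier message (the toString() rendering shown is the standard Lucene one):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// "text:toys NOT text:child" as explicit boolean clauses:
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(new TermQuery(new Term("text", "toys")), BooleanClause.Occur.SHOULD);
builder.add(new TermQuery(new Term("text", "child")), BooleanClause.Occur.MUST_NOT);
// builder.build().toString() renders as: text:toys -text:child
```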

Edward

On Wed, Jan 22, 2020 at 15:07, Edward Ribeiro
wrote:

> If you are using Lucene's BooleanQueryBuilder then you need to do nesting
> of your queries (roughly, a builder for each query enclosed in
> "parenthesis").
>
> A query like (text:child AND text:toys) OR age:12 would be:
>
> Query query1 = new TermQuery(new Term("text", "toys"));
> Query query2 = new TermQuery(new Term("text", "children"));
> Query query3 = new TermQuery(new Term("age", "12"));
>
> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
> andBuilder.add(query1, BooleanClause.Occur.MUST);
> andBuilder.add(query2, BooleanClause.Occur.MUST);
>
> BooleanQuery.Builder builder = new BooleanQuery.Builder();
> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
> builder.add(query3, BooleanClause.Occur.SHOULD);
>
> BooleanQuery booleanQuery = builder.build();
>
> This booleanQuery.toString() will be:
>
> (+text:toys +text:children) age:12
>
> That is the parsing of "(text:child AND text:toys) OR age:12"
>
>
> Edward
>
> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > BooleanQueryBuilder is not adding parenthesis around the query. It
> > only adds + sign at the start of the query but not the parentheses around
> > the query. Why is that? How should I add it?
> >
> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
>


Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Also,

it does not look like the box is slow, because for the following query the
prepare time is 3 ms but the facet time is 84 ms on the same box. Don't know
why the prepare time was huge for that example :(

debug:
{

   - rawquerystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - querystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - parsedquery:
   "AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
+store_873:1) #color_refine:Blue #size_refine:L))"
   ,
   - parsedquery_toString:
   "ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
#color_refine:Blue #size_refine:L)"
   ,
   - explain:
   {
  - 1729659: "
  2.0 = Score based on 2 child docs in range from 5103808 to
5104159, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4059732)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4059732)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1730320: "
  2.0 = Score based on 1 child docs in range from 5099889 to
5100070, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4055914)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4055914)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1730721: "
  2.0 = Score based on 4 child docs in range from 5097552 to
5097808, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4053487)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(size_refine:L in 4053487)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = idf(docFreq, docCount)
  1 = docFreq, number of documents containing term
  1 = docCount, total number of documents with field
1.0 = tf(freq=1.0), with freq of:
  1.0 = freq, occurrences of term within document
1.0 = fieldNorm ",
  - 1759239: "
  2.0 = Score based on 1 child docs in range from 5061166 to
5061231, best match:
2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
1.0 = store_873:1 0.0 = match on required clause, product of:
0.0 = # clause
1.0 = weight(color_refine:Blue in 4017096)
[DisabledStatisticsSimilarity], result of:
  1.0 = score(freq=1.0), product of:
1.0 = 

Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi Everyone

Should I use docValues="true" for the _root_ field to improve nested child
json.facet performance? I am using uniqueBlock().


Thanks in advance .

regards
Kumar Gaurav


Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Lots of thanks Mikhail.

Also, can you please answer: should I use docValues="true" for the _root_
field to improve this json.facet performance?

On Wed, Jan 22, 2020 at 11:42 PM Mikhail Khludnev  wrote:

> Initial request refers unknown (to me) query parser  {!simpleFilter, I
> can't comment on it.
> Parsing queries took in millis: - time: 261, usually prepare for query
> takes a moment. I suspect the box is really slow per se or encounter heavy
> load.
> And then facets took about 6 times more  - facet_module: {   - time: 1122,
> that a reasonable ratio.
> I also notice limit: -1 that's really expensive usually. If tweaking can't
> help, only profiling might give a clue.
> Note: in 8.5 there will be uniqueBlockQuery() operation, which is expected
> to be faster than uniqueBlock()
>
> On Wed, Jan 22, 2020 at 5:36 PM kumar gaurav  wrote:
>
> > HI Mikhail
> >
> > Here is full debug log . Please have a look .
> >
> > debug:
> > {
> >
> >- rawquerystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- querystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- parsedquery:
> >"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
> > +(store_873:1)^0.0) #(filter(color_refine:Black)
> > filter(color_refine:Blue"
> >,
> >- parsedquery_toString:
> >"ToParentBlockJoinQuery (+(+docType:sku +(store_873:1)^0.0)
> > #(filter(color_refine:Black) filter(color_refine:Blue)))"
> >,
> >- explain:
> >{
> >   - 5172: "
> >   1.0 = Score based on 240 child docs in range from 2572484 to
> > 2573162, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 5178: "
> >   1.0 = Score based on 304 child docs in range from 2571860 to
> > 2572404, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 9301: "
> >   1.0 = Score based on 93 child docs in range from 710150 to
> > 710796, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 118561: "
> >   1.0 = Score based on 177 child docs in range from 5728215 to
> > 5728505, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 266659: "
> >   1.0 = Score based on 89 child docs in range from 5368923 to
> > 5369396, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 323407: "
> >   1.0 = Score based on 321 child docs in range from 4807493 to
> > 4808441, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 381312: "
> >   1.0 = Score based on 232 child docs in range from 2660717 to
> > 2661101, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 851246: "
> >   1.0 = Score based on 61 child docs in range from 730259 to
> > 730562, best match:
> > 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> > 0.0 = ConstantScore(store_873:1)^0.0
> >   0.0 = match on required clause, product of:   0.0 = #
> clause
> > 0.0 = sum of: 0.0 =
> > ConstantScore(BitSetDocTopFilter)^0.0 "
> >   ,
> >   - 1564330: "
> >   1.0 = Score based on 12 child docs in range from 6831792 to
> > 6832154, best match:
> > 1.0 = sum 

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The initial request refers to a query parser unknown to me, {!simpleFilter, so
I can't comment on it.
Parsing the queries took 261 ms (- time: 261); usually preparing a query
takes a moment, so I suspect the box is really slow per se or under heavy
load.
The facets then took about 6 times longer (- facet_module: {   - time: 1122),
which is a reasonable ratio.
I also notice limit: -1, which is usually really expensive. If tweaking can't
help, only profiling might give a clue.
Note: in 8.5 there will be a uniqueBlockQuery() operation, which is expected
to be faster than uniqueBlock()
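As an illustration of the limit tweak, a hedged JSON facet sketch: the facet name "colors" and the docType:product parent filter are assumptions, while color_refine and uniqueBlock(_root_) come from the debug output in this thread:

```json
{
  "colors": {
    "type": "terms",
    "field": "color_refine",
    "limit": 100,
    "domain": { "blockChildren": "docType:product" },
    "facet": { "productCount": "uniqueBlock(_root_)" }
  }
}
```

Bounding limit (here to 100) avoids materializing every bucket, which is the usually-expensive case with limit: -1.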

On Wed, Jan 22, 2020 at 5:36 PM kumar gaurav  wrote:

> HI Mikhail
>
> Here is full debug log . Please have a look .
>
> debug:
> {
>
>- rawquerystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- querystring:
>"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
>- parsedquery:
>"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
> +(store_873:1)^0.0) #(filter(color_refine:Black)
> filter(color_refine:Blue"
>,
>- parsedquery_toString:
>"ToParentBlockJoinQuery (+(+docType:sku +(store_873:1)^0.0)
> #(filter(color_refine:Black) filter(color_refine:Blue)))"
>,
>- explain:
>{
>   - 5172: "
>   1.0 = Score based on 240 child docs in range from 2572484 to
> 2573162, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 5178: "
>   1.0 = Score based on 304 child docs in range from 2571860 to
> 2572404, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 9301: "
>   1.0 = Score based on 93 child docs in range from 710150 to
> 710796, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 118561: "
>   1.0 = Score based on 177 child docs in range from 5728215 to
> 5728505, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 266659: "
>   1.0 = Score based on 89 child docs in range from 5368923 to
> 5369396, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 323407: "
>   1.0 = Score based on 321 child docs in range from 4807493 to
> 4808441, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 381312: "
>   1.0 = Score based on 232 child docs in range from 2660717 to
> 2661101, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 851246: "
>   1.0 = Score based on 61 child docs in range from 730259 to
> 730562, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 1564330: "
>   1.0 = Score based on 12 child docs in range from 6831792 to
> 6832154, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
> 0.0 = ConstantScore(store_873:1)^0.0
>   0.0 = match on required clause, product of:   0.0 = # clause
> 0.0 = sum of: 0.0 =
> ConstantScore(BitSetDocTopFilter)^0.0 "
>   ,
>   - 1695762: "
>   1.0 = Score based on 157 child docs in range from 5155397 to
> 5156414, best match:
> 1.0 = sum of: 1.0 = sum of:   1.0 = 

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
If you are using Lucene's BooleanQueryBuilder then you need to do nesting
of your queries (roughly, a builder for each query enclosed in
"parenthesis").

A query like (text:child AND text:toys) OR age:12 would be:

Query query1 = new TermQuery(new Term("text", "toys"));
Query query2 = new TermQuery(new Term("text", "children"));
Query query3 = new TermQuery(new Term("age", "12"));

BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
andBuilder.add(query1, BooleanClause.Occur.MUST);
andBuilder.add(query2, BooleanClause.Occur.MUST);

BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
builder.add(query3, BooleanClause.Occur.SHOULD);

BooleanQuery booleanQuery = builder.build();

This booleanQuery.toString() will be:

(+text:toys +text:children) age:12

That is the parsing of "(text:child AND text:toys) OR age:12"


Edward

On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
wrote:
>
> Hi,
>
> BooleanQueryBuilder is not adding parenthesis around the query. It
> only adds + sign at the start of the query but not the parentheses around
> the query. Why is that? How should I add it?
>
> booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)


Apache Solr HTTP health endpoint for blackbox_exporter probings

2020-01-22 Thread Daniel Trüssel

Hey

With DuckDuckGo I found no HTTP health endpoint for Solr.

I use https://github.com/prometheus/blackbox_exporter to probe our apps. 
JMX_exporter is not an option, I need to use blackbox.


Please point me in the right direction.
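Not an authoritative answer, but a sketch: Solr's ping handler (/solr/&lt;corename&gt;/admin/ping) returns HTTP 200 when the core is healthy, and recent SolrCloud versions also expose a node-level health check (/solr/admin/info/health, if I recall correctly). Either can be probed with blackbox_exporter's http prober; the module below is illustrative, and "mycore" is a placeholder core name:

```yaml
# blackbox_exporter module (sketch)
modules:
  solr_ping:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200]
# Probe target (on the Prometheus scrape-config side):
#   http://solr-host:8983/solr/mycore/admin/ping
```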

kind regards

Daniel



Re: Lucene query to Solr query

2020-01-22 Thread Edward Ribeiro
equivalent to "+(topics:29)^2 (topics:38)^3 +(-id:41135)", I mean. :)

Edward

On Wed, Jan 22, 2020 at 1:51 PM Edward Ribeiro 
wrote:

> Hi,
>
> A more or less equivalent query (using Solr's LuceneQParser) to
> "topics:29^2 AND (-id:41135) topics:38^3" would be:
>
> topics:29^2 AND (-id:41135) topics:38^3
>
> Edward
>
> On Mon, Jan 20, 2020 at 1:10 AM Arnold Bronley 
> wrote:
>
>> Hi,
>>
>> I have a Lucene query as follows (toString representation of Lucene's
>> Query object):
>>
>> +(topics:29)^2 (topics:38)^3 +(-id:41135)
>>
>> It works fine when I am using it as a lucene query in
>> SolrIndexSearcher.getDocList function.
>>
>> However, now I want to use it as a Solr query and query against a
>> collection. I tried to use the as-is representation from Lucene query
>> object's toString method but it does not work. How should I proceed?
>>
>


Re: Lucene query to Solr query

2020-01-22 Thread Edward Ribeiro
Hi,

A more or less equivalent query (using Solr's LuceneQParser) to
"topics:29^2 AND (-id:41135) topics:38^3" would be:

topics:29^2 AND (-id:41135) topics:38^3

Edward

On Mon, Jan 20, 2020 at 1:10 AM Arnold Bronley 
wrote:

> Hi,
>
> I have a Lucene query as follows (toString representation of Lucene's
> Query object):
>
> +(topics:29)^2 (topics:38)^3 +(-id:41135)
>
> It works fine when I am using it as a lucene query in
> SolrIndexSearcher.getDocList function.
>
> However, now I want to use it as a Solr query and query against a
> collection. I tried to use the as-is representation from Lucene query
> object's toString method but it does not work. How should I proceed?
>


Re: Is it possible to add stemming in a text_exact field

2020-01-22 Thread Edward Ribeiro
Hi,

One possible solution would be to create a second field (e.g.,
text_general) that uses DefaultTokenizer, or other tokenizer that breaks
the string into tokens, and use a copyField to copy the content from
text_exact to text_general. Then, you can use edismax parser to search both
fields, but giving text_exact a higher boost (qf=text_exact^5
text_general). In this case, both fields should be indexed, but only one
needs to be stored.
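A sketch of the schema side of this suggestion; field and type names are illustrative (only text_exact comes from the thread), and stemming only happens if the text_general field type actually includes a stemming filter (e.g. solr.EnglishMinimalStemFilterFactory):

```xml
<!-- schema.xml (sketch): add a tokenized, stemmed sibling field
     and copy the exact-match field's content into it -->
<field name="text_general" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="text_exact" dest="text_general"/>
```

Query side, with edismax, something like: defType=edismax&amp;qf=text_exact^5 text_general&amp;q=restaurants dubai — the stemmed field then lets "restaurants" match "restaurant" while exact matches still score highest.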

Edward

On Wed, Jan 22, 2020 at 10:34 AM Dhanesh Radhakrishnan 
wrote:

> Hello,
> I'm facing an issue with stemming.
> My search query is "restaurant dubai" and returns results.
> If I search "restaurants dubai" it returns no data.
>
> How to stem this keyword "restaurant dubai" with "restaurants dubai" ?
>
> I'm using a text exact field for search.
>
>  multiValued="true" omitNorms="false" omitTermFreqAndPositions="false"/>
>
> Here is the field definition
>
>  positionIncrementGap="100">
> 
>
>
>
>
> 
> 
>   
>   
>   
>   
>
> 
>
> Is there any solutions without changing the tokenizer class.
>
>
>
>
> Dhanesh S.R
>
> --
> IMPORTANT: This is an e-mail from HiFX IT Media Services Pvt. Ltd. Its
> content are confidential to the intended recipient. If you are not the
> intended recipient, be advised that you have received this e-mail in error
> and that any use, dissemination, forwarding, printing or copying of this
> e-mail is strictly prohibited. It may not be disclosed to or used by
> anyone
> other than its intended recipient, nor may it be copied in any way. If
> received in error, please email a reply to the sender, then delete it from
> your system.
>
> Although this e-mail has been scanned for viruses, HiFX
> cannot ultimately accept any responsibility for viruses and it is your
> responsibility to scan attachments (if any).
>
> ​Before you print this email
> or attachments, please consider the negative environmental impacts
> associated with printing.
>


Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Hi Mikhail

Here is the full debug log. Please have a look.

debug:
{

   - rawquerystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - querystring:
   "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
   - parsedquery:
   "AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku
+(store_873:1)^0.0) #(filter(color_refine:Black)
filter(color_refine:Blue"
   ,
   - parsedquery_toString:
   "ToParentBlockJoinQuery (+(+docType:sku +(store_873:1)^0.0)
#(filter(color_refine:Black) filter(color_refine:Blue)))"
   ,
   - explain:
   {
  - 5172: "
  1.0 = Score based on 240 child docs in range from 2572484 to
2573162, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 5178: "
  1.0 = Score based on 304 child docs in range from 2571860 to
2572404, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 9301: "
  1.0 = Score based on 93 child docs in range from 710150 to
710796, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 118561: "
  1.0 = Score based on 177 child docs in range from 5728215 to
5728505, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 266659: "
  1.0 = Score based on 89 child docs in range from 5368923 to
5369396, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 323407: "
  1.0 = Score based on 321 child docs in range from 4807493 to
4808441, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 381312: "
  1.0 = Score based on 232 child docs in range from 2660717 to
2661101, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 851246: "
  1.0 = Score based on 61 child docs in range from 730259 to
730562, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 1564330: "
  1.0 = Score based on 12 child docs in range from 6831792 to
6832154, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 1695762: "
  1.0 = Score based on 157 child docs in range from 5155397 to
5156414, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 1728758: "
  1.0 = Score based on 4 child docs in range from 5108617 to
5108632, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 = match on required clause, product of:   0.0 = # clause
0.0 = sum of: 0.0 = ConstantScore(BitSetDocTopFilter)^0.0 "
  ,
  - 1730721: "
  1.0 = Score based on 34 child docs in range from 5097552 to
5097808, best match:
1.0 = sum of: 1.0 = sum of:   1.0 = docType:sku
0.0 = ConstantScore(store_873:1)^0.0
  0.0 

Re: Solr 7.7 heap space is getting full

2020-01-22 Thread Michael Gibney
Rajdeep, you say that "suddenly" heap space is getting full ... does
this mean that some variant of this configuration was working for you
at some point, or just that the failure happens quickly?

If heap space and faceting are indeed the bottleneck, you might make
sure that you have docValues enabled for your facet field fieldTypes,
and perhaps set uninvertible=false.
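For reference, a facet field configured along those lines might look like this in the schema (the field and type names here are illustrative, not taken from the thread):

```xml
<!-- String facet field: docValues="true" keeps faceting off the heap-based
     field cache; uninvertible="false" makes queries fail fast instead of
     silently uninverting the field in memory. -->
<fieldType name="string_facet" class="solr.StrField" sortMissingLast="true"
           docValues="true" uninvertible="false"/>
<field name="category" type="string_facet" indexed="true" stored="false"
       multiValued="true"/>
```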

I'm not seeing where large numbers of facets initially came from in
this thread? But on that topic this is perhaps relevant, regarding the
potential utility of a facet cache:
https://issues.apache.org/jira/browse/SOLR-13807

Michael

On Wed, Jan 22, 2020 at 7:16 AM Toke Eskildsen  wrote:
>
> On Sun, 2020-01-19 at 21:19 -0500, Mehai, Lotfi wrote:
> > I had a similar issue with a large number of facets. There is no way
> > (at least that I know of) you can get an acceptable response time from
> > a search engine with a high number of facets.
>
> Just for the record then it is doable under specific circumstances
> (static single-shard index, only String fields, Solr 4 with patch,
> fixed list of facet fields):
> https://sbdevel.wordpress.com/2013/03/20/over-9000-facet-fields/
>
> More usable for the current case would be to play with facet.threads
> and throw hardware with many CPU-cores after the problem.
>
> - Toke Eskildsen, Royal Danish Library
>
>
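For reference, facet.threads mentioned above is an ordinary request parameter; a request using it could look like the following (the core path, field names, and thread count are illustrative):

```
/select?q=*:*&facet=true&facet.field=brand&facet.field=color&facet.threads=8
```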


Is it possible to add stemming in a text_exact field

2020-01-22 Thread Dhanesh Radhakrishnan
Hello,
I'm facing an issue with stemming.
My search query is "restaurant dubai" and it returns results.
If I search "restaurants dubai" it returns no data.

How can I stem "restaurants dubai" so that it matches "restaurant dubai"?

I'm using a text exact field for search.



Here is the field definition



   
   
   
   


  
  
  
  
   


Are there any solutions that don't require changing the tokenizer class?
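For context: the plural handling being asked about is roughly what Lucene's EnglishMinimalStemFilter implements; it is a token filter added after the tokenizer in the analyzer chain, so the tokenizer class itself stays unchanged. A Python sketch of that rule, for illustration only:

```python
def minimal_stem(word: str) -> str:
    """Rough port of Lucene's EnglishMinimalStemmer plural rule: strip a
    trailing 's' from likely plurals, leaving short words and words
    ending in -ss or -us unchanged."""
    if len(word) < 3 or not word.endswith("s"):
        return word
    if word[-2] in ("s", "u"):  # e.g. "glass", "status" stay as-is
        return word
    return word[:-1]

# "restaurants" and "restaurant" now reduce to the same token:
print(minimal_stem("restaurants"))  # restaurant
print(minimal_stem("dubai"))        # dubai (unchanged)
```

With a filter like this in both the index-time and query-time analyzers, "restaurants dubai" and "restaurant dubai" analyze to the same tokens.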




Dhanesh S.R



Re: regarding Extracting text from Images

2020-01-22 Thread Steve Ge
In my experience, enabling Tika at the server level can use up heap space under
a high volume of extraction and bring down Solr entirely, likely due to the
garbage collector not being able to keep up with the load; even tuning the
garbage collector didn't resolve the problem completely. Not recommended.
Steve  
 
  On Wed, Oct 23, 2019 at 7:08 PM, suresh pendap wrote: 
  Hi Alex,
Thanks for your reply. How do we integrate Tesseract with Solr? Do we have
to implement a custom update processor or extend the
ExtractingRequestHandler?

Regards
Suresh

On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch 
wrote:

> I believe Tika, which powers this, can do so with extra libraries
> (Tesseract?), but Solr does not bundle those extras.
>
> In any case, you may want to run Tika externally to avoid the
> conversion/extraction process be a burden to Solr itself.
>
> Regards,
>      Alex
>
> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, 
> wrote:
>
> > Hello,
> > I am reading the Solr documentation about integration with Tika and Solr
> > Cell framework over here
> >
> >
> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> >
> > I would like to know if the Solr Cell framework can also be used to
> > extract text from image files?
> >
> > Regards
> > Suresh
> >
>
  


Query Regarding SOLR cross collection join

2020-01-22 Thread Doss
Hi,

SOLR version 8.3.1 (10 nodes), zookeeper ensemble (3 nodes)

One of our use cases requires joins; we are joining two large indexes. As
required by Solr, one index (2 GB) has one shard and 10 replicas, and the
other has 10 shards (40 GB/shard).

The query takes too much time, sometimes minutes. How can we improve this?

debugQuery produces one or more of these blocks, based on the number of shards (I believe):

"time":303442,
"fromSetSize":0,
"toSetSize":81653955,
"fromTermCount":0,
"fromTermTotalDf":0,
"fromTermDirectCount":0,
"fromTermHits":0,
"fromTermHitsTotalDf":0,
"toTermHits":0,
"toTermHitsTotalDf":0,
"toTermDirectCount":0,
"smallSetsDeferred":0,
"toSetDocsAdded":0},

Here, what does toSetSize mean? Does it read 81 MB of data from the
index? How can we reduce this?

I read somewhere that the score join parser would be faster, but for me it
produces no results. I am using string-type fields for from and to.


Thanks!
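For reference, the score join parser is selected by adding a score local parameter to the standard join syntax; a sketch along these lines, where the collection, field, and query values are placeholders:

```
q={!join fromIndex=smallCollection from=join_key to=join_key score=none}some_field:value
```

Note that any score value (even none) routes the query through the score-join implementation, which expects the from-side collection to be single-sharded and have a replica on every node hosting the to-side collection, which appears to match the setup described above.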


Re: SolrCloud upgrade concern

2020-01-22 Thread Jason Gerlowski
Hi Arnold,

The stability and complexity issues Mark highlighted in his post
aren't just imagined - there are real, sometimes serious, bugs in
SolrCloud features.  But at the same time there are many many stable
deployments out there where SolrCloud is a real success story for
users.  Small example, I work at a company (Lucidworks) where our main
product (Fusion) is built heavily on top of SolrCloud and we see it
deployed successfully every day.

In no way am I trying to minimize Mark's concerns (or David's).  There
are stability bugs.  But the extent to which those need affect you
depends a lot on what your deployment looks like.  How many nodes?
How many collections?  How tightly are you trying to squeeze your
hardware?  Is your network flaky?  Are you looking to use any of
SolrCloud's newer, less stable features like CDCR, etc.?

Is SolrCloud better for you than Master/Slave?  It depends on what
you're hoping to gain by a move to SolrCloud, and on your answers to
some of the questions above.  I would be leery of following any
recommendations that are made without regard for your reason for
switching or your deployment details.  Those things are always the
biggest driver in terms of success.

Good luck making your decision!

Best,

Jason


null:org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /roles.json

2020-01-22 Thread sotna
Hi.

We have SolrCloud enabled on our production environment (2 Solr nodes [16 GB RAM
each] and 3 ZooKeeper nodes, each hosted on a separate server).

Occasionally Solr loses its connection to ZooKeeper and search stops working.
After we restart all ZooKeeper nodes at once, it starts working again.

In the Solr logs I can find the following errors:

1/21/2020, 7:48:01 PM
ERROR true
OverseerTaskProcessor
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /overseer_elect/leader
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
    at 
org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:339)
    at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:339)
    at 
org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:387)
    at 
org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:193)
    at java.lang.Thread.run(Unknown Source)

1/21/2020, 7:52:38 PM
ERROR true
HttpSolrCall
null:org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /roles.json
null:org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /roles.json
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at 
org.apache.solr.common.cloud.SolrZkClient.lambda$exists$3(SolrZkClient.java:315)
    at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:315)
    at 
org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:74)
    at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$20(CollectionsHandler.java:682)
    at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:957)
    at 
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:237)
    at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:224)
    at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
    at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:735)
    at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:716)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:497)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
    at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
    at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
    at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at 
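ConnectionLoss errors like the ones above are often a symptom of the Solr side exceeding its ZooKeeper session timeout (for example during long GC pauses) rather than of ZooKeeper itself failing. One setting commonly checked in that situation is the client timeout in solr.xml; the value below is illustrative, not a recommendation:

```xml
<solr>
  <solrcloud>
    <!-- How long Solr may be unresponsive before its ZooKeeper session expires -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```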

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The screenshot didn't come through the list. That excerpt doesn't have any
informative numbers.

On Tue, Jan 21, 2020 at 5:18 PM kumar gaurav  wrote:

> Hi Mikhail
>
> Thanks for your reply. Please help me with this.
>
> Followings are the screenshot:-
>
> [image: image.png]
>
>
> [image: image.png]
>
>
> json facet debug Output:-
>
> json:
> {
>
>- facet:
>{
>   - color_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "color_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   - size_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "size_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   }
>
> }
>
>
>
> regards
> Kumar Gaurav
>
>
> On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:
>
>> Hi.
>> Can you share debugQuery=true output?
>>
>> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>>
>> > Hi,
>> >
>> > I have a parent/child query in which I have used JSON faceting for child
>> > faceting, like the following:
>> >
>> > qt=/dismax
>> > matchAllQueryRef1=+(+({!query v=$cq}))
>> > sq=+{!lucene v=$matchAllQueryRef1}
>> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
>> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
>> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
>> > qcolor_refine1=Blue
>> > qcolor_refine2=Other clrs
>> > cq=+{!simpleFilter v=docType:sku}
>> > pq=docType:(product)
>> > facet=true
>> > facet.mincount=1
>> > facet.limit=-1
>> > facet.missing=false
>> > json.facet= {color_refine:{
>> > domain:{
>> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
>> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>> >},
>> > type:terms,
>> > field:color_refine,
>> > limit:-1,
>> > facet:{productsCount:"uniqueBlock(_root_)"}}}
>> >
>> > schema :-
>> > > > multiValued="true" docValues="true"/>
>> >
>> > I have observed that the JSON facets are slow. They are taking much more
>> > time than expected.
>> > Can anyone please check this query, especially the child.fq and json.facet
>> > parts?
>> >
>> > Please help me with this.
>> >
>> > Thanks & regards
>> > Kumar Gaurav
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev
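As an aside, when a json.facet parameter grows to this size it can help to build it programmatically and serialize it rather than maintaining it by hand. A minimal sketch, reusing the facet structure from the query above:

```python
import json

def child_terms_facet(field: str, exclude_tag: str) -> dict:
    """Terms facet over a child-document field, mirroring the structure used
    above: the domain re-applies the child filters (minus this facet's own
    tag) and counts distinct parents per bucket via uniqueBlock."""
    return {
        "domain": {
            "filter": [
                f"{{!filters param=$child.fq excludeTags={exclude_tag} v=$sq}}",
                "{!child of=$pq filters=$fq}docType:(product)",
            ]
        },
        "type": "terms",
        "field": field,
        "limit": -1,
        "facet": {"productsCount": "uniqueBlock(_root_)"},
    }

# Serialize to the string passed as the json.facet request parameter:
json_facet = json.dumps(
    {"color_refine": child_terms_facet("color_refine", "rcolor_refine")}
)
print(json_facet)
```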


Re: Solr 7.7 heap space is getting full

2020-01-22 Thread Toke Eskildsen
On Sun, 2020-01-19 at 21:19 -0500, Mehai, Lotfi wrote:
> I had a similar issue with a large number of facets. There is no way
> (at least that I know of) you can get an acceptable response time from
> a search engine with a high number of facets.

Just for the record then it is doable under specific circumstances
(static single-shard index, only String fields, Solr 4 with patch,
fixed list of facet fields):
https://sbdevel.wordpress.com/2013/03/20/over-9000-facet-fields/

More usable for the current case would be to play with facet.threads
and throw hardware with many CPU-cores after the problem.

- Toke Eskildsen, Royal Danish Library




Re: regarding Extracting text from Images

2020-01-22 Thread Retro
Good day,
We solved the problem. Here is what was used and changed:
In our installation we used Tesseract 3.05, Tika 1.17, and Solr 7.4 (we
actually had Tika version 1.17, not 1.18).
1. Changed the OCR output from HOCR to TXT  >>> 
in the file parseContext.xml.
2. Had to start Solr as the root user.
Tesseract 4.1.1 is not compatible with Tika 1.17, so we will upgrade Solr to
version 7.7 and Tika to version 1.19, and will try to install Tesseract 4.1.1.
 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html