Re: Change field to DocValues

2021-02-17 Thread Mahmoud Almokadem
That's right, I want to avoid a complete reindexing process.
But should I create another field with the docValues property or change the
current field directly?

Can I use streaming expressions to update the whole index or should I
select and update using batches?


Thanks,
Mahmoud


On Wed, Feb 17, 2021 at 4:51 PM xiefengchang 
wrote:

> Hi:
> I think you are just trying to avoid complete re-index right?
> why don't you take a look at this:
> https://lucene.apache.org/solr/guide/8_0/updating-parts-of-documents.html
>
> At 2021-02-17 21:14:11, "Mahmoud Almokadem" 
> wrote:
> >Hello,
> >
> >I have an integer field on an index with billions of documents and need to do
> >facets on this field. Unfortunately the field doesn't have the docValues
> >property, so the FieldCache will be used and consume a lot of memory.
> >
> >What is the best way to change the field to be docValues supported?
> >
> >Regards,
> >Mahmoud
>
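
For illustration, a rough sketch of the "select and update in batches" route using
atomic updates (the collection name "products" and the field "popularity" are
hypothetical). Each atomic update rewrites the matching document from its stored
fields, so the documents pick up the changed schema without re-feeding them from
the original source; this only works if all other fields are stored or have
docValues:

  curl -X POST 'http://localhost:8983/solr/products/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id": "1", "popularity": {"set": 10}},
         {"id": "2", "popularity": {"set": 20}}]'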


Re: Change field to DocValues

2021-02-17 Thread xiefengchang
Hi:
I think you are just trying to avoid a complete re-index, right?
Why don't you take a look at this:
https://lucene.apache.org/solr/guide/8_0/updating-parts-of-documents.html

At 2021-02-17 21:14:11, "Mahmoud Almokadem"  wrote:
>Hello,
>
>I have an integer field on an index with billions of documents and need to do
>facets on this field. Unfortunately the field doesn't have the docValues
>property, so the FieldCache will be used and consume a lot of memory.
>
>What is the best way to change the field to be docValues supported?
>
>Regards,
>Mahmoud


Change field to DocValues

2021-02-17 Thread Mahmoud Almokadem
Hello,

I have an integer field on an index with billions of documents and need to do
facets on this field. Unfortunately the field doesn't have the docValues
property, so the FieldCache will be used and consume a lot of memory.

What is the best way to change the field to be docValues supported?

Regards,
Mahmoud


Re: docValues usage

2020-11-04 Thread Wei
And in the case of both stored=true and docValues=true, will Solr 8.x
choose the optimal approach by itself?

On Wed, Nov 4, 2020 at 9:15 AM Wei  wrote:

> Thanks Erick. As indexed is not necessary,  and docValues is more
> efficient than stored fields for function queries, so  we shall go with the
> following:
>
>   3) indexed=false,  stored=false,  docValues=true.
>
> Is my understanding correct?
>
> Best,
> Wei
>
> On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
> wrote:
>
>> You don’t need to index the field for function queries, see:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>>
>> Function queries, as opposed to sorting, faceting and grouping are
>> evaluated at search time where the
>> search process is already parked on the document anyway, so answering the
>> question “for doc X, what
>> is the value of field Y” to compute the score. DocValues are still more
>> efficient I think, although I
>> haven’t measured explicitly...
>>
>> For sorting, faceting and grouping, it’s a much different story. Take
>> sorting. You have to ask
>> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
>> docX. Doc Z is long gone
>> and getting the value for field Y much more expensive.
>>
>> Also, docValues will not increase memory requirements _unless used_.
>> Otherwise they’ll
>> just sit there on disk. They will certainly increase disk space whether
>> used or not.
>>
>> And _not_ using docValues when you facet, group or sort will also
>> _certainly_ increase
>> your heap requirements since the docValues structure must be built on the
>> heap rather
>> than be in MMapDirectory space.
>>
>> Best,
>> Erick
>>
>>
>> > On Nov 4, 2020, at 5:32 AM, uyilmaz 
>> wrote:
>> >
>> > Hi,
>> >
>> > I'm by no means expert on this so if anyone sees a mistake please
>> correct me.
>> >
>> > I think you need to index this field, since boost functions are added
>> to the query as optional clauses (
>> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
>> It's like boosting a regular field by putting ^2 next to it in a query.
>> Storing or enabling docValues will unnecessarily consume space/memory.
>> >
>> > On Tue, 3 Nov 2020 16:10:50 -0800
>> > Wei  wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a couple of primitive single value numeric type fields,  their
>> >> values are used in boosting functions, but not used in sort/facet. or
>> in
>> >> returned response.   Should I use docValues for them in the schema?  I
>> can
>> >> think of the following options:
>> >>
>> >> 1)   indexed=true,  stored=true, docValues=false
>> >> 2)   indexed=true, stored=false, docValues=true
>> >> 3)   indexed=false,  stored=false,  docValues=true
>> >>
>> >> What would be the performance implications for these options?
>> >>
>> >> Best,
>> >> Wei
>> >
>> >
>> > --
>> > uyilmaz 
>>
>>


Re: docValues usage

2020-11-04 Thread Wei
Thanks Erick. As indexed is not necessary, and docValues is more efficient
than stored fields for function queries, we shall go with the
following:

  3) indexed=false,  stored=false,  docValues=true.

Is my understanding correct?

Best,
Wei

On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
wrote:

> You don’t need to index the field for function queries, see:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>
> Function queries, as opposed to sorting, faceting and grouping are
> evaluated at search time where the
> search process is already parked on the document anyway, so answering the
> question “for doc X, what
> is the value of field Y” to compute the score. DocValues are still more
> efficient I think, although I
> haven’t measured explicitly...
>
> For sorting, faceting and grouping, it’s a much different story. Take
> sorting. You have to ask
> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
> docX. Doc Z is long gone
> and getting the value for field Y much more expensive.
>
> Also, docValues will not increase memory requirements _unless used_.
> Otherwise they’ll
> just sit there on disk. They will certainly increase disk space whether
> used or not.
>
> And _not_ using docValues when you facet, group or sort will also
> _certainly_ increase
> your heap requirements since the docValues structure must be built on the
> heap rather
> than be in MMapDirectory space.
>
> Best,
> Erick
>
>
> > On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> >
> > Hi,
> >
> > I'm by no means expert on this so if anyone sees a mistake please
> correct me.
> >
> > I think you need to index this field, since boost functions are added to
> the query as optional clauses (
> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
> It's like boosting a regular field by putting ^2 next to it in a query.
> Storing or enabling docValues will unnecessarily consume space/memory.
> >
> > On Tue, 3 Nov 2020 16:10:50 -0800
> > Wei  wrote:
> >
> >> Hi,
> >>
> >> I have a couple of primitive single value numeric type fields,  their
> >> values are used in boosting functions, but not used in sort/facet. or in
> >> returned response.   Should I use docValues for them in the schema?  I
> can
> >> think of the following options:
> >>
> >> 1)   indexed=true,  stored=true, docValues=false
> >> 2)   indexed=true, stored=false, docValues=true
> >> 3)   indexed=false,  stored=false,  docValues=true
> >>
> >> What would be the performance implications for these options?
> >>
> >> Best,
> >> Wei
> >
> >
> > --
> > uyilmaz 
>
>


Re: docValues usage

2020-11-04 Thread Erick Erickson
You don’t need to index the field for function queries, see: 
https://lucene.apache.org/solr/guide/8_6/docvalues.html.

Function queries, as opposed to sorting, faceting and grouping, are evaluated at
search time, where the search process is already parked on the document anyway, so
answering the question “for doc X, what is the value of field Y” to compute the
score is cheap. DocValues are still more efficient I think, although I
haven’t measured explicitly...

For sorting, faceting and grouping, it’s a much different story. Take sorting.
You have to ask “for field Y, what’s the value in docX and docZ?”. Say you’re
parked on docX. Doc Z is long gone,
and getting the value for field Y is much more expensive.

Also, docValues will not increase memory requirements _unless used_. Otherwise 
they’ll
just sit there on disk. They will certainly increase disk space whether used or 
not.

And _not_ using docValues when you facet, group or sort will also _certainly_ 
increase
your heap requirements since the docValues structure must be built on the heap 
rather
than be in MMapDirectory space.

Best,
Erick


> On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> 
> Hi,
> 
> I'm by no means expert on this so if anyone sees a mistake please correct me.
> 
> I think you need to index this field, since boost functions are added to the 
> query as optional clauses 
> (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
>  It's like boosting a regular field by putting ^2 next to it in a query. 
> Storing or enabling docValues will unnecessarily consume space/memory.
> 
> On Tue, 3 Nov 2020 16:10:50 -0800
> Wei  wrote:
> 
>> Hi,
>> 
>> I have a couple of primitive single value numeric type fields,  their
>> values are used in boosting functions, but not used in sort/facet. or in
>> returned response.   Should I use docValues for them in the schema?  I can
>> think of the following options:
>> 
>> 1)   indexed=true,  stored=true, docValues=false
>> 2)   indexed=true, stored=false, docValues=true
>> 3)   indexed=false,  stored=false,  docValues=true
>> 
>> What would be the performance implications for these options?
>> 
>> Best,
>> Wei
> 
> 
> -- 
> uyilmaz 



Re: when to use stored over docValues and useDocValuesAsStored

2020-11-04 Thread Erick Erickson


> On Nov 4, 2020, at 6:43 AM, uyilmaz  wrote:
> 
> Hi,
> 
> I heavily use streaming expressions and facets, or export large amounts of 
> data from Solr to Spark to make analyses.
> 
> Please correct me if I know wrong:
> 
> + requesting a non-docValues field in a response causes whole document to be 
> decompressed and read from disk

non-docValues fields don’t work at all for many stream sources, IIRC only the
Topic Stream will work with stored values. The read/decompress/extract cycle 
would be unacceptable performance-wise for large data sets otherwise.

> + streaming expressions and export handler requires every field read to have 
> docValues

Pretty much.

> 
> - docValues increases index size, therefore memory requirement, stored only 
> uses disk space

Yes. 

> - stored preserves order of multivalued fields

Yes.

> 
> It seems stored is only useful when I have a multivalued field that I care 
> about the index-time order of things, and since I will be using the export 
> handler, it will use docValues anyways and lose the order.

Yes.

> 
> So is there any case that I need stored=true?

Not for export outside of the Topic Stream as above. stored=true is there for 
things like showing the user the original input and highlighting.

> 
> Best,
> ufuk
> 
> -- 
> uyilmaz 



when to use stored over docValues and useDocValuesAsStored

2020-11-04 Thread uyilmaz
Hi,

I heavily use streaming expressions and facets, or export large amounts of data
from Solr to Spark for analysis.

Please correct me if I know wrong:

+ requesting a non-docValues field in a response causes the whole document to be
decompressed and read from disk
+ streaming expressions and the export handler require every field read to have
docValues

- docValues increases index size, and therefore the memory requirement; stored only
uses disk space
- stored preserves the order of multivalued fields

It seems stored is only useful when I have a multivalued field whose index-time
order I care about, and since I will be using the export handler, it will use
docValues anyway and lose the order.

So is there any case that I need stored=true?

Best,
ufuk

-- 
uyilmaz 
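
For reference, a minimal sketch of the export path discussed above (collection and
field names are hypothetical); with the /export handler, every field named in fl
and sort must have docValues:

  curl 'http://localhost:8983/solr/mycollection/export?q=*:*&sort=id+asc&fl=id,amount_l'

or, as a streaming expression against the same handler:

  search(mycollection, q="*:*", fl="id,amount_l", sort="id asc", qt="/export")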


Re: docValues usage

2020-11-04 Thread uyilmaz
Hi,

I'm by no means an expert on this, so if anyone sees a mistake please correct me.

I think you need to index this field, since boost functions are added to the 
query as optional clauses 
(https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
 It's like boosting a regular field by putting ^2 next to it in a query. 
Storing or enabling docValues will unnecessarily consume space/memory.

On Tue, 3 Nov 2020 16:10:50 -0800
Wei  wrote:

> Hi,
> 
> I have a couple of primitive single value numeric type fields,  their
> values are used in boosting functions, but not used in sort/facet. or in
> returned response.   Should I use docValues for them in the schema?  I can
> think of the following options:
> 
>  1)   indexed=true,  stored=true, docValues=false
>  2)   indexed=true, stored=false, docValues=true
>  3)   indexed=false,  stored=false,  docValues=true
> 
> What would be the performance implications for these options?
> 
> Best,
> Wei


-- 
uyilmaz 
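
As a concrete sketch of the kind of boost being discussed (collection and field
names are hypothetical): with edismax, bf adds the value of a function to the
score, and field(popularity) reads the per-document value, from docValues when the
field has them:

  curl 'http://localhost:8983/solr/mycollection/select?q=phone&defType=edismax&qf=title&bf=field(popularity)'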


docValues usage

2020-11-03 Thread Wei
Hi,

I have a couple of primitive single-value numeric fields; their
values are used in boosting functions, but not used in sort/facet or in the
returned response. Should I use docValues for them in the schema? I can
think of the following options:

 1)   indexed=true,  stored=true, docValues=false
 2)   indexed=true, stored=false, docValues=true
 3)   indexed=false,  stored=false,  docValues=true

What would be the performance implications for these options?

Best,
Wei
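
A schema sketch of the three alternatives above for a hypothetical field (the pint
type is assumed); these are alternative definitions of the same field, not three
separate fields:

  <!-- 1) searchable and stored, no docValues -->
  <field name="popularity" type="pint" indexed="true"  stored="true"  docValues="false"/>

  <!-- 2) searchable, value read from docValues -->
  <field name="popularity" type="pint" indexed="true"  stored="false" docValues="true"/>

  <!-- 3) not searchable, value read from docValues (enough for function queries) -->
  <field name="popularity" type="pint" indexed="false" stored="false" docValues="true"/>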


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Sorry, correction, taking "the" time

On Mon, 19 Oct 2020 22:18:30 +0300
uyilmaz  wrote:

> Thanks for taking time to write a detailed answer.
> 
> We use Solr to both store our data and to perform aggregations, using 
> faceting or streaming expressions. When required analysis is too complex to 
> do in Solr, we export large query results from Solr to a more capable 
> analysis tool.
> 
> So I guess all fields need to be docValues="true", because export handler and 
> streaming both require fields to have docValues, and even if I won't use a 
> field in queries or facets, it should be available to read in the result set.
> Fields that won't be searched or faceted can be (indexed=false stored=false 
> docValues=true) right?
> 
> --uyilmaz
> 
> 
> On Mon, 19 Oct 2020 14:14:27 -0400
> Michael Gibney  wrote:
> 
> > As you've observed, it is indeed possible to facet on fields with
> > docValues=true, indexed=false; but in almost all cases you should
> > probably set indexed=true. 1. for distributed facet count refinement,
> > the "indexed" approach is used to look up counts by value; 2. assuming
> > you're wanting to do something usual, e.g. allow users to apply
> > filters based on facet counts, the filter application would use the
> > "indexed" approach as well. Where indexed=false, if either filtering
> > or distributed refinement is attempted, I'm not 100% sure what
> > happens. It might fail, or lead to inconsistent results, or attempt to
> > look up results via the equivalent of a "table scan" over docValues (I
> > think the last of these is what actually happens, fwiw) ... but none
> > of these options is likely desirable.
> > 
> > Michael
> > 
> > On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> > >
> > > Thanks! This also contributed to my confusion:
> > >
> > > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> > >
> > > "If you want Solr to perform both analysis (for searching) and faceting 
> > > on the full literal strings, use the copyField directive in your Schema 
> > > to create two versions of the field: one Text and one String. Make sure 
> > > both are indexed="true"."
> > >
> > > On Mon, 19 Oct 2020 13:08:00 -0400
> > > Alexandre Rafalovitch  wrote:
> > >
> > > > I think this is all explained quite well in the Ref Guide:
> > > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > > >
> > > > DocValues is a different way to index/store values. Faceting is a
> > > > primary use case where docValues are better than what 'indexed=true'
> > > > gives you.
> > > >
> > > > Regards,
> > > >Alex.
> > > >
> > > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  
> > > > wrote:
> > > > >
> > > > >
> > > > > Hey all,
> > > > >
> > > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > > mistake) we can facet on fields marked as both indexed and stored 
> > > > > being false:
> > > > >
> > > > > <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
> > > > >
> > > > > I'm surprised by this, I thought I would need to index it. Can you
> > > > > confirm this?
> > > > >
> > > > > Regards
> > > > >
> > > > > --
> > > > > uyilmaz 
> > >
> > >
> > > --
> > > uyilmaz 
> 
> 
> -- 
> uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks for taking time to write a detailed answer.

We use Solr to both store our data and to perform aggregations, using faceting 
or streaming expressions. When the required analysis is too complex to do in Solr,
we export large query results from Solr to a more capable analysis tool.

So I guess all fields need to be docValues="true", because export handler and 
streaming both require fields to have docValues, and even if I won't use a 
field in queries or facets, it should be available to read in the result set.
Fields that won't be searched or faceted can be (indexed=false stored=false 
docValues=true) right?

--uyilmaz


On Mon, 19 Oct 2020 14:14:27 -0400
Michael Gibney  wrote:

> As you've observed, it is indeed possible to facet on fields with
> docValues=true, indexed=false; but in almost all cases you should
> probably set indexed=true. 1. for distributed facet count refinement,
> the "indexed" approach is used to look up counts by value; 2. assuming
> you're wanting to do something usual, e.g. allow users to apply
> filters based on facet counts, the filter application would use the
> "indexed" approach as well. Where indexed=false, if either filtering
> or distributed refinement is attempted, I'm not 100% sure what
> happens. It might fail, or lead to inconsistent results, or attempt to
> look up results via the equivalent of a "table scan" over docValues (I
> think the last of these is what actually happens, fwiw) ... but none
> of these options is likely desirable.
> 
> Michael
> 
> On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> >
> > Thanks! This also contributed to my confusion:
> >
> > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> >
> > "If you want Solr to perform both analysis (for searching) and faceting on 
> > the full literal strings, use the copyField directive in your Schema to 
> > create two versions of the field: one Text and one String. Make sure both 
> > are indexed="true"."
> >
> > On Mon, 19 Oct 2020 13:08:00 -0400
> > Alexandre Rafalovitch  wrote:
> >
> > > I think this is all explained quite well in the Ref Guide:
> > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > >
> > > DocValues is a different way to index/store values. Faceting is a
> > > primary use case where docValues are better than what 'indexed=true'
> > > gives you.
> > >
> > > Regards,
> > >Alex.
> > >
> > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > > >
> > > >
> > > > Hey all,
> > > >
> > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > mistake) we can facet on fields marked as both indexed and stored being 
> > > > false:
> > > >
> > > > <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
> > > >
> > > > I'm surprised by this, I thought I would need to index it. Can you
> > > > confirm this?
> > > >
> > > > Regards
> > > >
> > > > --
> > > > uyilmaz 
> >
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Walter Underwood
Hmm. Fields used for faceting will also be used for filtering, which is a kind
of search. Are docValues OK for filtering? I expect they might be slow the
first time, then cached.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 19, 2020, at 11:15 AM, Erick Erickson  wrote:
> 
> uyilmaz:
> 
> Hmm, that _is_ confusing. And inaccurate.
> 
> In this context, it should read something like
> 
> The Text field should have indexed="true" docValues=“false" if used for 
> searching 
> but not faceting and the String field should have indexed="false" 
> docValues=“true"
> if used for faceting but not searching.
> 
> I’ll fix this, thanks for pointing this out.
> 
> Erick
> 
>> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
>> 
>> Thanks! This also contributed to my confusion:
>> 
>> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>> 
>> "If you want Solr to perform both analysis (for searching) and faceting on 
>> the full literal strings, use the copyField directive in your Schema to 
>> create two versions of the field: one Text and one String. Make sure both 
>> are indexed="true"."
>> 
>> On Mon, 19 Oct 2020 13:08:00 -0400
>> Alexandre Rafalovitch  wrote:
>> 
>>> I think this is all explained quite well in the Ref Guide:
>>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>>> 
>>> DocValues is a different way to index/store values. Faceting is a
>>> primary use case where docValues are better than what 'indexed=true'
>>> gives you.
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>>>> 
>>>> 
>>>> Hey all,
>>>> 
>>>> From my little experiments, I see that (if I didn't make a stupid mistake) 
>>>> we can facet on fields marked as both indexed and stored being false:
>>>> 
>>>> <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
>>>> 
>>>> I'm surprised by this, I thought I would need to index it. Can you confirm
>>>> this?
>>>> 
>>>> Regards
>>>> 
>>>> --
>>>> uyilmaz 
>> 
>> 
>> -- 
>> uyilmaz 
> 



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Erick Erickson
uyilmaz:

Hmm, that _is_ confusing. And inaccurate.

In this context, it should read something like

The Text field should have indexed="true" docValues="false" if used for searching
but not faceting, and the String field should have indexed="false" docValues="true"
if used for faceting but not searching.

I’ll fix this, thanks for pointing this out.

Erick

> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
> 
> Thanks! This also contributed to my confusion:
> 
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> 
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
> 
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
> 
>> I think this is all explained quite well in the Ref Guide:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>> 
>> DocValues is a different way to index/store values. Faceting is a
>> primary use case where docValues are better than what 'indexed=true'
>> gives you.
>> 
>> Regards,
>>   Alex.
>> 
>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>>> 
>>> 
>>> Hey all,
>>> 
>>> From my little experiments, I see that (if I didn't make a stupid mistake) 
>>> we can facet on fields marked as both indexed and stored being false:
>>> 
>>> <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
>>> 
>>> I'm surprised by this, I thought I would need to index it. Can you confirm
>>> this?
>>> 
>>> Regards
>>> 
>>> --
>>> uyilmaz 
> 
> 
> -- 
> uyilmaz 
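
A schema sketch of the corrected guidance (field names are hypothetical; text_general
and string are the stock field types): the analyzed Text field carries the searching,
and the String copy carries the faceting via docValues:

  <field name="title"     type="text_general" indexed="true"  stored="true"  docValues="false"/>
  <field name="title_str" type="string"       indexed="false" stored="false" docValues="true"/>
  <copyField source="title" dest="title_str"/>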



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Michael Gibney
As you've observed, it is indeed possible to facet on fields with
docValues=true, indexed=false; but in almost all cases you should
probably set indexed=true. 1. for distributed facet count refinement,
the "indexed" approach is used to look up counts by value; 2. assuming
you're wanting to do something usual, e.g. allow users to apply
filters based on facet counts, the filter application would use the
"indexed" approach as well. Where indexed=false, if either filtering
or distributed refinement is attempted, I'm not 100% sure what
happens. It might fail, or lead to inconsistent results, or attempt to
look up results via the equivalent of a "table scan" over docValues (I
think the last of these is what actually happens, fwiw) ... but none
of these options is likely desirable.

Michael

On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
>
> Thanks! This also contributed to my confusion:
>
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
>
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
>
> > I think this is all explained quite well in the Ref Guide:
> > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> >
> > DocValues is a different way to index/store values. Faceting is a
> > primary use case where docValues are better than what 'indexed=true'
> > gives you.
> >
> > Regards,
> >Alex.
> >
> > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > >
> > >
> > > Hey all,
> > >
> > > From my little experiments, I see that (if I didn't make a stupid 
> > > mistake) we can facet on fields marked as both indexed and stored being 
> > > false:
> > >
> > > <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
> > >
> > > I'm surprised by this, I thought I would need to index it. Can you confirm
> > > this?
> > >
> > > Regards
> > >
> > > --
> > > uyilmaz 
>
>
> --
> uyilmaz 
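
Following Michael's advice, a sketch of a facet field that will also be filtered on
(field name is hypothetical); indexed="true" keeps filter queries and distributed
refinement on the fast lookup path:

  <field name="category" type="string" indexed="true" stored="false" docValues="true"/>

  ...&facet=true&facet.field=category&fq=category:books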


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks! This also contributed to my confusion:

https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters

"If you want Solr to perform both analysis (for searching) and faceting on the 
full literal strings, use the copyField directive in your Schema to create two 
versions of the field: one Text and one String. Make sure both are 
indexed="true"."

On Mon, 19 Oct 2020 13:08:00 -0400
Alexandre Rafalovitch  wrote:

> I think this is all explained quite well in the Ref Guide:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html
> 
> DocValues is a different way to index/store values. Faceting is a
> primary use case where docValues are better than what 'indexed=true'
> gives you.
> 
> Regards,
>Alex.
> 
> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> >
> >
> > Hey all,
> >
> > From my little experiments, I see that (if I didn't make a stupid mistake) 
> > we can facet on fields marked as both indexed and stored being false:
> >
> > <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
> >
> > I'm surprised by this, I thought I would need to index it. Can you confirm
> > this?
> >
> > Regards
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Alexandre Rafalovitch
I think this is all explained quite well in the Ref Guide:
https://lucene.apache.org/solr/guide/8_6/docvalues.html

DocValues is a different way to index/store values. Faceting is a
primary use case where docValues are better than what 'indexed=true'
gives you.

Regards,
   Alex.

On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>
>
> Hey all,
>
> From my little experiments, I see that (if I didn't make a stupid mistake) we 
> can facet on fields marked as both indexed and stored being false:
>
> <field name="..." type="..." indexed="false" stored="false" docValues="true"/>
>
> I'm surprised by this, I thought I would need to index it. Can you confirm
> this?
>
> Regards
>
> --
> uyilmaz 


Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz


Hey all,

From my little experiments, I see that (if I didn't make a stupid mistake) we
can facet on fields marked as both indexed and stored being false:

<field name="..." type="..." indexed="false" stored="false" docValues="true"/>

I'm surprised by this, I thought I would need to index it. Can you confirm this?

Regards

-- 
uyilmaz 
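
For illustration, a term facet request of the kind described, against a hypothetical
docValues-only field named "category":

  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=category'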


Re: facets & docValues

2020-05-07 Thread Joel Bernstein
You can be pretty sure that adding static warming queries will improve your
performance following soft commits. But opening new searchers every 2
seconds may be too fast to allow for warming, so you may need to adjust. As
a general rule you cannot open searchers faster than you can warm them.

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, May 5, 2020 at 5:54 PM Revas  wrote:

> Hi joel, No, we have not, we have softCommit requirement of 2 secs.
>
> On Tue, May 5, 2020 at 3:31 PM Joel Bernstein  wrote:
>
> > Have you configured static warming queries for the facets? This will warm
> > the cache structures for the facet fields. You just want to make sure your
> > commits are spaced far enough apart that the warming completes before a new
> > searcher starts warming.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Mon, May 4, 2020 at 10:27 AM Revas  wrote:
> >
> > > Hi Erick, Thanks for the explanation and advise. With facet queries,
> does
> > > doc Values help at all ?
> > >
> > > 1) indexed=true, docValues=true =>  all facets
> > >
> > > 2)
> > >
> > >-  indexed=true , docValues=true => only for subfacets
> > >    - indexed=true, docValues=false => facet query
> > >- docValues=true, indexed=false=> term facets
> > >
> > >
> > >
> > > In case of 1 above, => Indexing slowed considerably. over all facet
> > > performance improved many fold
> > > In case of  2=>  over all performance showed only slight
> > > improvement
> > >
> > > Does that mean turning on docValues even for facet query helps improve
> > the
> > > performance,  fetching from docValues for facet query is faster than
> > > fetching from stored fields ?
> > >
> > > Thanks
> > >
> > >
> > > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > DocValues should help when faceting over fields, i.e.
> facet.field=blah.
> > > >
> > > > I would expect docValues to help with sub facets and, but don’t know
> > > > the code well enough to say definitely one way or the other.
> > > >
> > > > The empirical approach would be to set “uninvertible=true” (Solr 7.6)
> > and
> > > > turn docValues off. What that means is that if any operation tries to
> > > > uninvert
> > > > the index on the Java heap, you’ll get an exception like:
> > > > "can not sort on a field w/o docValues unless it is indexed=true
> > > > uninvertible=true and the type supports Uninversion:”
> > > >
> > > > See SOLR-12962
> > > >
> > > > Speed is only one issue. The entire point of docValues is to not
> > > “uninvert”
> > > > the field on the heap. This used to lead to very significant memory
> > > > pressure. So when turning docValues off, you run the risk of
> > > > reverting back to the old behavior and having unexpected memory
> > > > consumption, not to mention slowdowns when the uninversion
> > > > takes place.
> > > >
> > > > Also, unless your documents are very large, this is a tiny corpus. It
> > can
> > > > be
> > > > quite hard to get realistic numbers, the signal gets lost in the
> noise.
> > > >
> > > > You should only shard when your individual query times exceed your
> > > > requirement. Say you have a 95%tile requirement of 1 second response
> > > time.
> > > >
> > > > Let’s further say that you can meet that requirement with 50
> > > > queries/second,
> > > > but when you get to 75 queries/second your response time exceeds your
> > > > requirements. Do NOT shard at this point. Add another replica
> instead.
> > > > Sharding adds inevitable overhead and should only be considered when
> > > > you can’t get adequate response time even under fairly light query
> > loads
> > > > as a general rule.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > > > >
> > > > > Hi Erick, You are correct, we have only about 1.8M documents so far
> > and
> > > > > turning on the indexing on the facet fields helped improve the
> > timings
> > > of
> > > > > the fac

Re: facets & docValues

2020-05-05 Thread Revas
Hi Joel, no, we have not; we have a softCommit requirement of 2 secs.

On Tue, May 5, 2020 at 3:31 PM Joel Bernstein  wrote:

> Have you configured static warming queries for the facets? This will warm
> the cache structures for the facet fields. You just want to make sure your
> commits are spaced far enough apart that the warming completes before a new
> searcher starts warming.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, May 4, 2020 at 10:27 AM Revas  wrote:
>
> > Hi Erick, Thanks for the explanation and advise. With facet queries, does
> > doc Values help at all ?
> >
> > 1) indexed=true, docValues=true =>  all facets
> >
> > 2)
> >
> >    -  indexed=true , docValues=true => only for subfacets
> >    - indexed=true, docValues=false => facet query
> >- docValues=true, indexed=false=> term facets
> >
> >
> >
> > In case of 1 above, => Indexing slowed considerably. over all facet
> > performance improved many fold
> > In case of  2=>  over all performance showed only slight
> > improvement
> >
> > Does that mean turning on docValues even for facet query helps improve
> the
> > performance,  fetching from docValues for facet query is faster than
> > fetching from stored fields ?
> >
> > Thanks
> >
> >
> > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
> > wrote:
> >
> > > DocValues should help when faceting over fields, i.e. facet.field=blah.
> > >
> > > I would expect docValues to help with sub facets and, but don’t know
> > > the code well enough to say definitely one way or the other.
> > >
> > > The empirical approach would be to set “uninvertible=true” (Solr 7.6)
> and
> > > turn docValues off. What that means is that if any operation tries to
> > > uninvert
> > > the index on the Java heap, you’ll get an exception like:
> > > "can not sort on a field w/o docValues unless it is indexed=true
> > > uninvertible=true and the type supports Uninversion:”
> > >
> > > See SOLR-12962
> > >
> > > Speed is only one issue. The entire point of docValues is to not
> > “uninvert”
> > > the field on the heap. This used to lead to very significant memory
> > > pressure. So when turning docValues off, you run the risk of
> > > reverting back to the old behavior and having unexpected memory
> > > consumption, not to mention slowdowns when the uninversion
> > > takes place.
> > >
> > > Also, unless your documents are very large, this is a tiny corpus. It
> can
> > > be
> > > quite hard to get realistic numbers, the signal gets lost in the noise.
> > >
> > > You should only shard when your individual query times exceed your
> > > requirement. Say you have a 95%tile requirement of 1 second response
> > time.
> > >
> > > Let’s further say that you can meet that requirement with 50
> > > queries/second,
> > > but when you get to 75 queries/second your response time exceeds your
> > > requirements. Do NOT shard at this point. Add another replica instead.
> > > Sharding adds inevitable overhead and should only be considered when
> > > you can’t get adequate response time even under fairly light query
> loads
> > > as a general rule.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > > >
> > > > Hi Erick, You are correct, we have only about 1.8M documents so far
> and
> > > > turning on the indexing on the facet fields helped improve the
> timings
> > of
> > > > the facet query a lot which has (sub facets and facet queries). So
> does
> > > > docValues help at all for sub facets and facet query, our tests
> > > > revealed further query time improvement when we turned off the
> > docValues.
> > > > is that the right approach?
> > > >
> > > > Currently we have only 1 shard and  we are thinking of scaling by
> > > > increasing the number of shards when we see a deterioration on query
> > > time.
> > > > Any suggestions?
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > > >>
> > >

Re: facets & docValues

2020-05-05 Thread Joel Bernstein
Have you configured static warming queries for the facets? This will warm
the cache structures for the facet fields. You just want to make sure your
commits are spaced far enough apart that the warming completes before a new
searcher starts warming.


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, May 4, 2020 at 10:27 AM Revas  wrote:

> Hi Erick, Thanks for the explanation and advise. With facet queries, does
> doc Values help at all ?
>
> 1) indexed=true, docValues=true =>  all facets
>
> 2)
>
>-  indexed=true , docValues=true => only for subfacets
>    - indexed=true, docValues=false => facet query
>- docValues=true, indexed=false=> term facets
>
>
>
> In case of 1 above, => Indexing slowed considerably. over all facet
> performance improved many fold
> In case of  2=>  over all performance showed only slight
> improvement
>
> Does that mean turning on docValues even for facet query helps improve the
> performance,  fetching from docValues for facet query is faster than
> fetching from stored fields ?
>
> Thanks
>
>
> On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
> wrote:
>
> > DocValues should help when faceting over fields, i.e. facet.field=blah.
> >
> > I would expect docValues to help with sub facets and, but don’t know
> > the code well enough to say definitely one way or the other.
> >
> > The empirical approach would be to set “uninvertible=true” (Solr 7.6) and
> > turn docValues off. What that means is that if any operation tries to
> > uninvert
> > the index on the Java heap, you’ll get an exception like:
> > "can not sort on a field w/o docValues unless it is indexed=true
> > uninvertible=true and the type supports Uninversion:”
> >
> > See SOLR-12962
> >
> > Speed is only one issue. The entire point of docValues is to not
> “uninvert”
> > the field on the heap. This used to lead to very significant memory
> > pressure. So when turning docValues off, you run the risk of
> > reverting back to the old behavior and having unexpected memory
> > consumption, not to mention slowdowns when the uninversion
> > takes place.
> >
> > Also, unless your documents are very large, this is a tiny corpus. It can
> > be
> > quite hard to get realistic numbers, the signal gets lost in the noise.
> >
> > You should only shard when your individual query times exceed your
> > requirement. Say you have a 95%tile requirement of 1 second response
> time.
> >
> > Let’s further say that you can meet that requirement with 50
> > queries/second,
> > but when you get to 75 queries/second your response time exceeds your
> > requirements. Do NOT shard at this point. Add another replica instead.
> > Sharding adds inevitable overhead and should only be considered when
> > you can’t get adequate response time even under fairly light query loads
> > as a general rule.
> >
> > Best,
> > Erick
> >
> > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > >
> > > Hi Erick, You are correct, we have only about 1.8M documents so far and
> > > turning on the indexing on the facet fields helped improve the timings
> of
> > > the facet query a lot which has (sub facets and facet queries). So does
> > > docValues help at all for sub facets and facet query, our tests
> > > revealed further query time improvement when we turned off the
> docValues.
> > > is that the right approach?
> > >
> > > Currently we have only 1 shard and  we are thinking of scaling by
> > > increasing the number of shards when we see a deterioration on query
> > time.
> > > Any suggestions?
> > >
> > > Thanks.
> > >
> > >
> > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > >>
> > >> I think the key is the facet queries. Now, I’m talking from
> > >> theory rather than diving into the code, but querying on
> > >> a docValues=true, indexed=false field is really doing a
> > >> search. And searching on a field like that is effectively
> > >> analogous to a table scan. Even if somehow an internal
> > >> structure would be constructed to deal with it, it would
> > >> probably be on the heap, where you don’t want it.
> > >>
> > >> So the test would be to take the queries out and measure
> > >> performance, but I think that’s the root issue here.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> > >>>
> > >>> We have faceting fields that have been defined as indexed=false,
> > >>> stored=false and docValues=true
> > >>>
> > >>> However we use a lot of subfacets  using  json facets and facet
> ranges
> > >>> using facet.queries. We see that after every soft-commit our
> > performance
> > >>> worsens and performs ideal between commits
> > >>>
> > >>> how is that docValue fields are affected by soft-commit and do we
> need
> > to
> > >>> enable indexing if we use subfacets and facet query to improve
> > >> performance?
> > >>>
> > >>> Tha
> > >>
> > >>
> >
> >
>
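
A minimal sketch of the static warming Joel describes, placed in the <query> section
of solrconfig.xml (the facet field is hypothetical); the listed query runs against
every new searcher before it starts serving requests:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>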


Re: facets & docValues

2020-05-04 Thread Revas
Hi Erick, thanks for the explanation and advice. With facet queries, do
docValues help at all?

1) indexed=true, docValues=true => all facets

2)

   - indexed=true, docValues=true => only for subfacets
   - indexed=true, docValues=false => facet query
   - docValues=true, indexed=false => term facets

In case of 1 above => indexing slowed considerably, but overall facet
performance improved many fold.
In case of 2 => overall performance showed only a slight
improvement.

Does that mean turning on docValues even for facet queries helps improve the
performance, i.e. fetching from docValues for a facet query is faster than
fetching from stored fields?

Thanks


On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
wrote:

> DocValues should help when faceting over fields, i.e. facet.field=blah.
>
> I would expect docValues to help with sub facets and, but don’t know
> the code well enough to say definitely one way or the other.
>
> The empirical approach would be to set “uninvertible=true” (Solr 7.6) and
> turn docValues off. What that means is that if any operation tries to
> uninvert
> the index on the Java heap, you’ll get an exception like:
> "can not sort on a field w/o docValues unless it is indexed=true
> uninvertible=true and the type supports Uninversion:”
>
> See SOLR-12962
>
> Speed is only one issue. The entire point of docValues is to not “uninvert”
> the field on the heap. This used to lead to very significant memory
> pressure. So when turning docValues off, you run the risk of
> reverting back to the old behavior and having unexpected memory
> consumption, not to mention slowdowns when the uninversion
> takes place.
>
> Also, unless your documents are very large, this is a tiny corpus. It can
> be
> quite hard to get realistic numbers, the signal gets lost in the noise.
>
> You should only shard when your individual query times exceed your
> requirement. Say you have a 95%tile requirement of 1 second response time.
>
> Let’s further say that you can meet that requirement with 50
> queries/second,
> but when you get to 75 queries/second your response time exceeds your
> requirements. Do NOT shard at this point. Add another replica instead.
> Sharding adds inevitable overhead and should only be considered when
> you can’t get adequate response time even under fairly light query loads
> as a general rule.
>
> Best,
> Erick
>
> > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> >
> > Hi Erick, You are correct, we have only about 1.8M documents so far and
> > turning on the indexing on the facet fields helped improve the timings of
> > the facet query a lot which has (sub facets and facet queries). So does
> > docValues help at all for sub facets and facet query, our tests
> > revealed further query time improvement when we turned off the docValues.
> > is that the right approach?
> >
> > Currently we have only 1 shard and  we are thinking of scaling by
> > increasing the number of shards when we see a deterioration on query
> time.
> > Any suggestions?
> >
> > Thanks.
> >
> >
> > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson 
> > wrote:
> >
> >> In a word, “yes”. I also suspect your corpus isn’t very big.
> >>
> >> I think the key is the facet queries. Now, I’m talking from
> >> theory rather than diving into the code, but querying on
> >> a docValues=true, indexed=false field is really doing a
> >> search. And searching on a field like that is effectively
> >> analogous to a table scan. Even if somehow an internal
> >> structure would be constructed to deal with it, it would
> >> probably be on the heap, where you don’t want it.
> >>
> >> So the test would be to take the queries out and measure
> >> performance, but I think that’s the root issue here.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> >>>
> >>> We have faceting fields that have been defined as indexed=false,
> >>> stored=false and docValues=true
> >>>
> >>> However we use a lot of subfacets  using  json facets and facet ranges
> >>> using facet.queries. We see that after every soft-commit our
> performance
> >>> worsens and performs ideal between commits
> >>>
> >>> how is that docValue fields are affected by soft-commit and do we need
> to
> >>> enable indexing if we use subfacets and facet query to improve
> >> performance?
> >>>
> >>> Tha
> >>
> >>
>
>


Re: facets & docValues

2020-04-16 Thread Erick Erickson
DocValues should help when faceting over fields, i.e. facet.field=blah.

I would expect docValues to help with sub facets, but don’t know
the code well enough to say definitely one way or the other.

The empirical approach would be to set “uninvertible=true” (Solr 7.6) and
turn docValues off. What that means is that if any operation tries to uninvert
the index on the Java heap, you’ll get an exception like:
"can not sort on a field w/o docValues unless it is indexed=true 
uninvertible=true and the type supports Uninversion:”

See SOLR-12962

Speed is only one issue. The entire point of docValues is to not “uninvert”
the field on the heap. This used to lead to very significant memory
pressure. So when turning docValues off, you run the risk of 
reverting back to the old behavior and having unexpected memory
consumption, not to mention slowdowns when the uninversion
takes place.

Also, unless your documents are very large, this is a tiny corpus. It can be
quite hard to get realistic numbers, the signal gets lost in the noise.

You should only shard when your individual query times exceed your
requirement. Say you have a 95%tile requirement of 1 second response time.

Let’s further say that you can meet that requirement with 50 queries/second,
but when you get to 75 queries/second your response time exceeds your 
requirements. Do NOT shard at this point. Add another replica instead.
Sharding adds inevitable overhead and should only be considered when
you can’t get adequate response time even under fairly light query loads
as a general rule.

Best,
Erick

> On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> 
> Hi Erick, You are correct, we have only about 1.8M documents so far and
> turning on the indexing on the facet fields helped improve the timings of
> the facet query a lot which has (sub facets and facet queries). So does
> docValues help at all for sub facets and facet query, our tests
> revealed further query time improvement when we turned off the docValues.
> is that the right approach?
> 
> Currently we have only 1 shard and  we are thinking of scaling by
> increasing the number of shards when we see a deterioration on query time.
> Any suggestions?
> 
> Thanks.
> 
> 
> On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson 
> wrote:
> 
>> In a word, “yes”. I also suspect your corpus isn’t very big.
>> 
>> I think the key is the facet queries. Now, I’m talking from
>> theory rather than diving into the code, but querying on
>> a docValues=true, indexed=false field is really doing a
>> search. And searching on a field like that is effectively
>> analogous to a table scan. Even if somehow an internal
>> structure would be constructed to deal with it, it would
>> probably be on the heap, where you don’t want it.
>> 
>> So the test would be to take the queries out and measure
>> performance, but I think that’s the root issue here.
>> 
>> Best,
>> Erick
>> 
>>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
>>> 
>>> We have faceting fields that have been defined as indexed=false,
>>> stored=false and docValues=true
>>> 
>>> However we use a lot of subfacets  using  json facets and facet ranges
>>> using facet.queries. We see that after every soft-commit our performance
>>> worsens and performs ideal between commits
>>> 
>>> how is that docValue fields are affected by soft-commit and do we need to
>>> enable indexing if we use subfacets and facet query to improve
>> performance?
>>> 
>>> Tha
>> 
>> 



Re: facets & docValues

2020-04-16 Thread Revas
Hi Erick, you are correct, we have only about 1.8M documents so far, and
turning on indexing for the facet fields helped improve the timings of
the facet query (which has sub facets and facet queries) a lot. So do
docValues help at all for sub facets and facet queries? Our tests
revealed further query-time improvement when we turned off docValues.
Is that the right approach?

Currently we have only 1 shard, and we are thinking of scaling by
increasing the number of shards when we see a deterioration in query time.
Any suggestions?

Thanks.


On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson 
wrote:

> In a word, “yes”. I also suspect your corpus isn’t very big.
>
> I think the key is the facet queries. Now, I’m talking from
> theory rather than diving into the code, but querying on
> a docValues=true, indexed=false field is really doing a
> search. And searching on a field like that is effectively
> analogous to a table scan. Even if somehow an internal
> structure would be constructed to deal with it, it would
> probably be on the heap, where you don’t want it.
>
> So the test would be to take the queries out and measure
> performance, but I think that’s the root issue here.
>
> Best,
> Erick
>
> > On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> >
> > We have faceting fields that have been defined as indexed=false,
> > stored=false and docValues=true
> >
> > However we use a lot of subfacets  using  json facets and facet ranges
> > using facet.queries. We see that after every soft-commit our performance
> > worsens and performs ideal between commits
> >
> > how is that docValue fields are affected by soft-commit and do we need to
> > enable indexing if we use subfacets and facet query to improve
> performance?
> >
> > Tha
>
>


Re: facets & docValues

2020-04-15 Thread Erick Erickson
In a word, “yes”. I also suspect your corpus isn’t very big.

I think the key is the facet queries. Now, I’m talking from
theory rather than diving into the code, but querying on
a docValues=true, indexed=false field is really doing a
search. And searching on a field like that is effectively
analogous to a table scan. Even if somehow an internal
structure would be constructed to deal with it, it would 
probably be on the heap, where you don’t want it.

So the test would be to take the queries out and measure
performance, but I think that’s the root issue here.

Best,
Erick

> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> 
> We have faceting fields that have been defined as indexed=false,
> stored=false and docValues=true
> 
> However we use a lot of subfacets  using  json facets and facet ranges
> using facet.queries. We see that after every soft-commit our performance
> worsens and performs ideal between commits
> 
> how is that docValue fields are affected by soft-commit and do we need to
> enable indexing if we use subfacets and facet query to improve performance?
> 
> Tha



facets & docValues

2020-04-14 Thread Revas
We have faceting fields that have been defined as indexed=false,
stored=false and docValues=true.

However, we use a lot of subfacets using json facets, and facet ranges
using facet.queries. We see that after every soft commit our performance
worsens, while it is ideal between commits.

How is it that docValues fields are affected by soft commits, and do we need to
enable indexing if we use subfacets and facet queries to improve performance?

Thanks


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It's hard to predict whether it will be faster to read the docValues files or
to uninvert the field ad hoc and read it from the heap. Only a test can judge
that.
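
The request to benchmark would look something like this; the child field and
the parent filter below are only placeholders:

# child field "color" and parent filter "content_type:parent" are placeholders
curl http://localhost:8983/solr/mycollection/query -H 'Content-type:application/json' -d '{
  "query": "*:*",
  "facet": {
    "colors": {
      "type": "terms",
      "field": "color",
      "domain": { "blockChildren": "content_type:parent" },
      "facet": { "parents": "uniqueBlock(_root_)" }
    }
  }
}'

Compare QTime on the same data with and without docValues on _root_.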

On Wed, Jan 22, 2020 at 11:08 PM kumar gaurav  wrote:

> HI Mikhail
>
> for example :- 6GB index size (Parent-child documents)
> indexing in 12 hours interval .
>
> need to use uniqueBlock for json facet for child faceting .
>
> Should i use docValues="true" for _root_  field   ?
>
> Thanks .
>
> regards
> Kumar Gaurav
>
>
>
> On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev  wrote:
>
> > It depends from env.
> >
> > On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:
> >
> > > Hi Everyone
> > >
> > > Should i use docValues="true" for _root_  field to improve nested child
> > > json.facet performance  ? i am using uniqueBlock() .
> > >
> > >
> > > Thanks in advance .
> > >
> > > regards
> > > Kumar Gaurav
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi Mikhail,

For example: a 6GB index (parent-child documents), reindexed at a 12-hour
interval.

I need to use uniqueBlock in a JSON facet for child faceting.

Should I use docValues="true" for the _root_ field?

Thanks.

regards
Kumar Gaurav



On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev  wrote:

> It depends from env.
>
> On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:
>
> > Hi Everyone
> >
> > Should i use docValues="true" for _root_  field to improve nested child
> > json.facet performance  ? i am using uniqueBlock() .
> >
> >
> > Thanks in advance .
> >
> > regards
> > Kumar Gaurav
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It depends on the environment.

On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:

> Hi Everyone
>
> Should i use docValues="true" for _root_  field to improve nested child
> json.facet performance  ? i am using uniqueBlock() .
>
>
> Thanks in advance .
>
> regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev


Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi everyone,

Should I use docValues="true" for the _root_ field to improve nested child
json.facet performance? I am using uniqueBlock().


Thanks in advance .

regards
Kumar Gaurav


Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Erick Erickson
GCEasy works fine. GCViewer is something you can run on your local machine;
if you have very large GC logs, uploading them can take quite a while.

The next step, if you can’t find anything satisfactory, is to put a profiler on
the running Solr instance, which will tell you where the time is being spent.

Do note that indexing is an I/O-intensive operation, especially when segments
are being merged, so if you were swapping I’d expect I/O to go from merely
very high to extremely high….

Good luck!

> On Dec 11, 2019, at 8:13 AM, Paras Lehana  wrote:
> 
> Hi Erick,
> 
> You're right - IO was extraordinarily high. But something odd happened. To
> actually build a relation, I tried different heap sizes with default
> solrconfig.xml values as you recommended.
> 
>   1. Increased RAM to 4G, speed 8500k.
>   2. Decreased to 2G, back to old 65k.
>   3. Increased back to 4G, speed 50k
>   4. Decreased to 3G, speed 50k
>   5. Increased to 10G, speed 8500k.
> 
> The speed is 1 min average after the indexing is started. With last 10G, as
> (maybe) expected, I got java.lang.NullPointerException at
> org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument
> before committing. I'm not getting the faster speeds with any of the heap
> sizes now. I will continue digging in deeper and in the meantime, I will be
> getting the 24G RAM. Currently giving Solr 6G heap (speed is 55k - too
> low).
> 
> After making the progress, this may be a step backward but I do believe I
> will take 2 steps forward soon. All credits to you. Getting into GC logs
> now. I'm a newbie here - know about GC theory but have never analyzed
> those. What tool do you prefer? I'm planning to use GCeasy for uploading
> the solr current gc log.
> 
> On Wed, 11 Dec 2019 at 18:21, Erick Erickson 
> wrote:
> 
>> I doubt GC alone would make nearly that difference. More likely
>> it’s I/O interacting with MMapDirectory. Lucene uses OS memory
>> space for much of its index, i.e. the RAM left over
>> after that used for the running Solr process (and any other
>> processes of course). See:
>> 
>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> 
>> So if you, you don’t leave much OS memory space for Lucene’s
>> use via MMap, that can lead to swapping. My bet is that was
>> what was happening, and your CPU utilization was low; Lucene and
>> thus Solr was spending all its time waiting around for I/O. If that theory
>> is true, your disk I/O should have been much higher before you reduced
>> your heap.
>> 
>> IOW, I claim if you left the java heap at 12G and increased the physical
>> memory to 24G you’d see an identical (or nearly) speedup. GC for a 12G
>> heap is rarely a bottleneck. That said you want to use as little heap for
>> your Java process as possible, but if you reduce it too much you wind up
>> with other problems. OOM for one, and I’ve also seen GC take an inordinate
>> amount of time when it’s _barely_ enough to run. You hit a GC that
>> recovers,
>> say, 10M of heap which is barely enough to continue for a few milliseconds
>> and hits another GC….. As you can tell, “this is more art than science”…
>> 
>> Glad to hear you’re making progress!
>> Erick
>> 
>>> On Dec 11, 2019, at 5:06 AM, Paras Lehana 
>> wrote:
>>> 
>>> Just to update, I kept the defaults. The indexing got only a little boost
>>> though I have decided to continue with the defaults and do incremental
>>> experiments only. To my surprise, our development server had only 12GB
>> RAM,
>>> of which 8G was allocated to Java. Because I could not increase the RAM,
>> I
>>> tried decreasing it to 4G and guess what! My indexing speed got a boost
>> of
>>> over *50x*. Erick, thanks for helping. I think I should do more homework
>>> about GCs also. Your GC guess seems to be valid. I have raised the
>> request
>>> to increase RAM on the development to 24GB.
>>> 
>>> On Mon, 9 Dec 2019 at 20:23, Erick Erickson 
>> wrote:
>>> 
 Note that that article is from 2011. That was in the Solr 3x days when
 many, many, many things were different. There was no SolrCloud for
 instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
 books. Which is, actually, not “normal” indexing at all as most Solr
 indexes are much smaller documents. Books are a perfectly reasonable
 use-case of course, but have a whole bunch of special requirements.
 
 get-by-id should be very efficient, _except_ that the longer you spend
 before opening a new searcher, the larger the internal data buffers
 supporting get-by-id need to be.
 
 Anyway, best of luck
 Erick
 
> On Dec 9, 2019, at 1:05 AM, Paras Lehana 
 wrote:
> 
> Hi Erick,
> 
> I have reverted back to original values and yes, I did see
>> improvement. I
> will collect more stats. *Thank you for helping. :)*
> 
> Also, here is the reference article that I had referred for changing
> values:
> 
 
>> https://www.ha

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Paras Lehana
Hi Erick,

You're right - I/O was extraordinarily high. But something odd happened. To
establish a correlation, I tried different heap sizes with the default
solrconfig.xml values, as you recommended.

   1. Increased the heap to 4G: speed 8500k.
   2. Decreased to 2G: back to the old 65k.
   3. Increased back to 4G: speed 50k.
   4. Decreased to 3G: speed 50k.
   5. Increased to 10G: speed 8500k.

The speed is a one-minute average taken after indexing starts. With the last
10G run, as (maybe) expected, I got a java.lang.NullPointerException at
org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument
before committing. I'm not getting the faster speeds with any of the heap
sizes now. I will continue digging deeper and, in the meantime, I will be
getting the 24G of RAM. Currently I'm giving Solr a 6G heap (speed is 55k -
too low).

After the earlier progress this may feel like a step backward, but I do
believe I will take two steps forward soon. All credit to you. Getting into
GC logs now. I'm a newbie here - I know GC theory but have never analyzed the
logs. What tool do you prefer? I'm planning to upload the current Solr GC log
to GCeasy.
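
For the record, I'll grab the logs from the default location (this assumes a
stock install where SOLR_LOGS_DIR is server/logs):

# default GC log location for a stock install
ls server/logs/solr_gc.log*
# bundle them up if the upload is large
tar czf solr_gc_logs.tgz server/logs/solr_gc.log*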

On Wed, 11 Dec 2019 at 18:21, Erick Erickson 
wrote:

> I doubt GC alone would make nearly that difference. More likely
> it’s I/O interacting with MMapDirectory. Lucene uses OS memory
> space for much of its index, i.e. the RAM left over
> after that used for the running Solr process (and any other
> processes of course). See:
>
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> So if you, you don’t leave much OS memory space for Lucene’s
> use via MMap, that can lead to swapping. My bet is that was
> what was happening, and your CPU utilization was low; Lucene and
> thus Solr was spending all its time waiting around for I/O. If that theory
> is true, your disk I/O should have been much higher before you reduced
> your heap.
>
> IOW, I claim if you left the java heap at 12G and increased the physical
> memory to 24G you’d see an identical (or nearly) speedup. GC for a 12G
> heap is rarely a bottleneck. That said you want to use as little heap for
> your Java process as possible, but if you reduce it too much you wind up
> with other problems. OOM for one, and I’ve also seen GC take an inordinate
> amount of time when it’s _barely_ enough to run. You hit a GC that
> recovers,
> say, 10M of heap which is barely enough to continue for a few milliseconds
> and hits another GC….. As you can tell, “this is more art than science”…
>
> Glad to hear you’re making progress!
> Erick
>
> > On Dec 11, 2019, at 5:06 AM, Paras Lehana 
> wrote:
> >
> > Just to update, I kept the defaults. The indexing got only a little boost
> > though I have decided to continue with the defaults and do incremental
> > experiments only. To my surprise, our development server had only 12GB
> RAM,
> > of which 8G was allocated to Java. Because I could not increase the RAM,
> I
> > tried decreasing it to 4G and guess what! My indexing speed got a boost
> of
> > over *50x*. Erick, thanks for helping. I think I should do more homework
> > about GCs also. Your GC guess seems to be valid. I have raised the
> request
> > to increase RAM on the development to 24GB.
> >
> > On Mon, 9 Dec 2019 at 20:23, Erick Erickson 
> wrote:
> >
> >> Note that that article is from 2011. That was in the Solr 3x days when
> >> many, many, many things were different. There was no SolrCloud for
> >> instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
> >> books. Which is, actually, not “normal” indexing at all as most Solr
> >> indexes are much smaller documents. Books are a perfectly reasonable
> >> use-case of course, but have a whole bunch of special requirements.
> >>
> >> get-by-id should be very efficient, _except_ that the longer you spend
> >> before opening a new searcher, the larger the internal data buffers
> >> supporting get-by-id need to be.
> >>
> >> Anyway, best of luck
> >> Erick
> >>
> >>> On Dec 9, 2019, at 1:05 AM, Paras Lehana 
> >> wrote:
> >>>
> >>> Hi Erick,
> >>>
> >>> I have reverted back to original values and yes, I did see
> improvement. I
> >>> will collect more stats. *Thank you for helping. :)*
> >>>
> >>> Also, here is the reference article that I had referred for changing
> >>> values:
> >>>
> >>
> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
> >>>
> >>> The article was perhaps for normal indexing and thus, suggested
> >> increasing
> >>> mergeFactor and then finally optimizing. In my case, a large number of
> >>> segments could have impacted get-by-id of atomic updates? Just being
> >>> curious.
> >>>
> >>> On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
> >>> wrote:
> >>>
>  Hey Erick,
> 
>  We have just upgraded to 8.3 before starting the indexing. We were on
> >> 6.6
>  before that.
> 
>  Thank you for your continued support and resources. Again, I have
> >> already
>  taken

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Erick Erickson
I doubt GC alone would make nearly that difference. More likely
it’s I/O interacting with MMapDirectory. Lucene uses OS memory
space for much of its index, i.e. the RAM left over
after that used for the running Solr process (and any other
processes of course). See:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

So if you don’t leave much OS memory space for Lucene’s
use via MMap, that can lead to swapping. My bet is that was
what was happening, and your CPU utilization was low; Lucene and
thus Solr was spending all its time waiting around for I/O. If that theory
is true, your disk I/O should have been much higher before you reduced
your heap.

IOW, I claim if you left the java heap at 12G and increased the physical
memory to 24G you’d see an identical (or nearly) speedup. GC for a 12G
heap is rarely a bottleneck. That said you want to use as little heap for
your Java process as possible, but if you reduce it too much you wind up
with other problems. OOM for one, and I’ve also seen GC take an inordinate
amount of time when it’s _barely_ enough to run. You hit a GC that recovers,
say, 10M of heap which is barely enough to continue for a few milliseconds
and hits another GC….. As you can tell, “this is more art than science”…
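
To make that concrete - a starting point only, the right number depends on
your machine - on the 24G box you're planning, something like a 6G heap
leaves most of the physical RAM to the OS page cache for MMapDirectory:

# in solr.in.sh:
SOLR_HEAP="6g"
# or equivalently at start time:
bin/solr start -m 6g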

Glad to hear you’re making progress!
Erick

> On Dec 11, 2019, at 5:06 AM, Paras Lehana  wrote:
> 
> Just to update, I kept the defaults. The indexing got only a little boost
> though I have decided to continue with the defaults and do incremental
> experiments only. To my surprise, our development server had only 12GB RAM,
> of which 8G was allocated to Java. Because I could not increase the RAM, I
> tried decreasing it to 4G and guess what! My indexing speed got a boost of
> over *50x*. Erick, thanks for helping. I think I should do more homework
> about GCs also. Your GC guess seems to be valid. I have raised the request
> to increase RAM on the development to 24GB.
> 
> On Mon, 9 Dec 2019 at 20:23, Erick Erickson  wrote:
> 
>> Note that that article is from 2011. That was in the Solr 3x days when
>> many, many, many things were different. There was no SolrCloud for
>> instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
>> books. Which is, actually, not “normal” indexing at all as most Solr
>> indexes are much smaller documents. Books are a perfectly reasonable
>> use-case of course, but have a whole bunch of special requirements.
>> 
>> get-by-id should be very efficient, _except_ that the longer you spend
>> before opening a new searcher, the larger the internal data buffers
>> supporting get-by-id need to be.
>> 
>> Anyway, best of luck
>> Erick
>> 
>>> On Dec 9, 2019, at 1:05 AM, Paras Lehana 
>> wrote:
>>> 
>>> Hi Erick,
>>> 
>>> I have reverted back to original values and yes, I did see improvement. I
>>> will collect more stats. *Thank you for helping. :)*
>>> 
>>> Also, here is the reference article that I had referred for changing
>>> values:
>>> 
>> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
>>> 
>>> The article was perhaps for normal indexing and thus, suggested
>> increasing
>>> mergeFactor and then finally optimizing. In my case, a large number of
>>> segments could have impacted get-by-id of atomic updates? Just being
>>> curious.
>>> 
>>> On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
>>> wrote:
>>> 
 Hey Erick,
 
 We have just upgraded to 8.3 before starting the indexing. We were on
>> 6.6
 before that.
 
 Thank you for your continued support and resources. Again, I have
>> already
 taken your suggestion to start afresh and that's what I'm going to do.
 Don't get me wrong but I have been just asking doubts. I will surely get
 back with my experience after performing the full indexing.
 
 Thanks again! :)
 
 On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
 wrote:
 
> Nothing implicitly handles optimization, you must continue to do that
> externally.
> 
> Until you get to the bottom of your indexing slowdown, I wouldn’t
>> bother
> with it at all, trying to do all these things at once is what lead to
>> your
> problem in the first place, please change one thing at a time. You say:
> 
> “For a full indexing, optimizations occurred 30 times between batches”.
> 
> This is horrible. I’m not sure what version of Solr you’re using. If
>> it’s
> 7.4 or earlier, this means the the entire index was rewritten 30 times.
> The first time it would condense all segments into a single segment, or
> 1/30 of the total. The second time it would rewrite all that, 2/30 of
>> the
> index into a new segment. The third time 3/30. And so on.
> 
> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was
>> over
> 5G. But still.
> 
> See:
> 
>> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
>

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Paras Lehana
Just to update: I kept the defaults. The indexing got only a little boost,
but I have decided to continue with the defaults and do incremental
experiments only. To my surprise, our development server had only 12GB of RAM,
of which 8G was allocated to Java. Because I could not increase the RAM, I
tried decreasing the heap to 4G and, guess what, my indexing speed got a boost
of over *50x*. Erick, thanks for helping. I think I should do more homework
on GC as well. Your GC guess seems to be valid. I have raised a request
to increase the RAM on the development server to 24GB.

On Mon, 9 Dec 2019 at 20:23, Erick Erickson  wrote:

> Note that that article is from 2011. That was in the Solr 3x days when
> many, many, many things were different. There was no SolrCloud for
> instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
> books. Which is, actually, not “normal” indexing at all as most Solr
> indexes are much smaller documents. Books are a perfectly reasonable
> use-case of course, but have a whole bunch of special requirements.
>
> get-by-id should be very efficient, _except_ that the longer you spend
> before opening a new searcher, the larger the internal data buffers
> supporting get-by-id need to be.
>
> Anyway, best of luck
> Erick
>
> > On Dec 9, 2019, at 1:05 AM, Paras Lehana 
> wrote:
> >
> > Hi Erick,
> >
> > I have reverted back to original values and yes, I did see improvement. I
> > will collect more stats. *Thank you for helping. :)*
> >
> > Also, here is the reference article that I had referred for changing
> > values:
> >
> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
> >
> > The article was perhaps for normal indexing and thus, suggested
> increasing
> > mergeFactor and then finally optimizing. In my case, a large number of
> > segments could have impacted get-by-id of atomic updates? Just being
> > curious.
> >
> > On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
> > wrote:
> >
> >> Hey Erick,
> >>
> >> We have just upgraded to 8.3 before starting the indexing. We were on
> 6.6
> >> before that.
> >>
> >> Thank you for your continued support and resources. Again, I have
> already
> >> taken your suggestion to start afresh and that's what I'm going to do.
> >> Don't get me wrong but I have been just asking doubts. I will surely get
> >> back with my experience after performing the full indexing.
> >>
> >> Thanks again! :)
> >>
> >> On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
> >> wrote:
> >>
> >>> Nothing implicitly handles optimization, you must continue to do that
> >>> externally.
> >>>
> >>> Until you get to the bottom of your indexing slowdown, I wouldn’t
> bother
> >>> with it at all, trying to do all these things at once is what lead to
> your
> >>> problem in the first place, please change one thing at a time. You say:
> >>>
> >>> “For a full indexing, optimizations occurred 30 times between batches”.
> >>>
> >>> This is horrible. I’m not sure what version of Solr you’re using. If
> it’s
> >>> 7.4 or earlier, this means the the entire index was rewritten 30 times.
> >>> The first time it would condense all segments into a single segment, or
> >>> 1/30 of the total. The second time it would rewrite all that, 2/30 of
> the
> >>> index into a new segment. The third time 3/30. And so on.
> >>>
> >>> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was
> over
> >>> 5G. But still.
> >>>
> >>> See:
> >>>
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> >>> for 7.4 and earlier,
> >>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> for
> >>> 7.5 and later
> >>>
> >>> Eventually you can optimize by sending in an http or curl request like
> >>> this:
> >>> ../solr/collection/update?optimize=true
> >>>
> >>> You also changed to using StandardDirectory. The default has heuristics
> >>> built in
> >>> to choose the best directory implementation.
> >>>
> >>> I can’t emphasize enough that you’re changing lots of things at one
> time.
> >>> I
> >>> _strongly_ urge you to go back to the standard setup, make _no_
> >>> modifications
> >>> and change things one at a time. Some very bright people have done a
> lot
> >>> of work to try to make Lucene/Solr work well.
> >>>
> >>> Make one change at a time. Measure. If that change isn’t helpful, undo
> it
> >>> and
> >>> move to the next one. You’re trying to second-guess the Lucene/Solr
> >>> developers who have years of understanding how this all works. Assume
> they
> >>> picked reasonable options for defaults and that Lucene/Solr performs
> >>> reasonably
> >>> well. When I get unexplainably poor results, I usually assume it was
> the
> >>> last
> >>> thing I changed….
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>
> >>>
> >>>
>  On Dec 6, 2019, at 1:31 AM, Paras Lehana 
> >>> wrote:
> 
>  Hi Erick,
> 
>  I believed optimizing explicitly merges segments and that's why I was
>  expecting it to giv

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-09 Thread Erick Erickson
Note that that article is from 2011. That was in the Solr 3x days when many, 
many, many things were different. There was no SolrCloud for instance. Plus 
Tom’s problem space is indexing _books_. Whole, complete books. Which is,
actually, not “normal” indexing at all, as most Solr indexes hold much smaller
documents. Books are a perfectly reasonable use case of course, but they come
with a whole bunch of special requirements.

get-by-id should be very efficient, _except_ that the longer you spend before 
opening a new searcher, the larger the internal data buffers supporting 
get-by-id need to be.

Anyway, best of luck
Erick

> On Dec 9, 2019, at 1:05 AM, Paras Lehana  wrote:
> 
> Hi Erick,
> 
> I have reverted back to original values and yes, I did see improvement. I
> will collect more stats. *Thank you for helping. :)*
> 
> Also, here is the reference article that I had referred for changing
> values:
> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
> 
> The article was perhaps for normal indexing and thus, suggested increasing
> mergeFactor and then finally optimizing. In my case, a large number of
> segments could have impacted get-by-id of atomic updates? Just being
> curious.
> 
> On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
> wrote:
> 
>> Hey Erick,
>> 
>> We have just upgraded to 8.3 before starting the indexing. We were on 6.6
>> before that.
>> 
>> Thank you for your continued support and resources. Again, I have already
>> taken your suggestion to start afresh and that's what I'm going to do.
>> Don't get me wrong but I have been just asking doubts. I will surely get
>> back with my experience after performing the full indexing.
>> 
>> Thanks again! :)
>> 
>> On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
>> wrote:
>> 
>>> Nothing implicitly handles optimization, you must continue to do that
>>> externally.
>>> 
>>> Until you get to the bottom of your indexing slowdown, I wouldn’t bother
>>> with it at all, trying to do all these things at once is what lead to your
>>> problem in the first place, please change one thing at a time. You say:
>>> 
>>> “For a full indexing, optimizations occurred 30 times between batches”.
>>> 
>>> This is horrible. I’m not sure what version of Solr you’re using. If it’s
>>> 7.4 or earlier, this means the the entire index was rewritten 30 times.
>>> The first time it would condense all segments into a single segment, or
>>> 1/30 of the total. The second time it would rewrite all that, 2/30 of the
>>> index into a new segment. The third time 3/30. And so on.
>>> 
>>> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
>>> 5G. But still.
>>> 
>>> See:
>>> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
>>> for 7.4 and earlier,
>>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for
>>> 7.5 and later
>>> 
>>> Eventually you can optimize by sending in an http or curl request like
>>> this:
>>> ../solr/collection/update?optimize=true
>>> 
>>> You also changed to using StandardDirectory. The default has heuristics
>>> built in
>>> to choose the best directory implementation.
>>> 
>>> I can’t emphasize enough that you’re changing lots of things at one time.
>>> I
>>> _strongly_ urge you to go back to the standard setup, make _no_
>>> modifications
>>> and change things one at a time. Some very bright people have done a lot
>>> of work to try to make Lucene/Solr work well.
>>> 
>>> Make one change at a time. Measure. If that change isn’t helpful, undo it
>>> and
>>> move to the next one. You’re trying to second-guess the Lucene/Solr
>>> developers who have years of understanding how this all works. Assume they
>>> picked reasonable options for defaults and that Lucene/Solr performs
>>> reasonably
>>> well. When I get unexplainably poor results, I usually assume it was the
>>> last
>>> thing I changed….
>>> 
>>> Best,
>>> Erick
>>> 
>>> 
>>> 
>>> 
 On Dec 6, 2019, at 1:31 AM, Paras Lehana 
>>> wrote:
 
 Hi Erick,
 
 I believed optimizing explicitly merges segments and that's why I was
 expecting it to give performance boost. I know that optimizations should
 not be done very frequently. For a full indexing, optimizations
>>> occurred 30
 times between batches. I take your suggestion to undo all the changes
>>> and
 that's what I'm going to do. I mentioned about the optimizations giving
>>> an
 indexing boost (for sometime) only to support your point of my
>>> mergePolicy
 backfiring. I will certainly read again about the merge process.
 
 Taking your suggestions - so, commits would be handled by autoCommit.
>>> What
 implicitly handles optimizations? I think the merge policy or is there
>>> any
 other setting I'm missing?
 
 I'm indexing via Curl API on the same server. The Current Speed of curl
>>> is
 only 50k (down from 1300k in the first batch). I think - as the curl is
>

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-08 Thread Paras Lehana
Hi Erick,

I have reverted to the original values and, yes, I did see an improvement. I
will collect more stats. *Thank you for helping. :)*

Also, here is the reference article I had consulted before changing the
values:
https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1

The article was perhaps about normal indexing and thus suggested increasing
mergeFactor and optimizing at the very end. In my case, could a large number
of segments have impacted the get-by-id step of atomic updates? Just being
curious.

On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
wrote:

> Hey Erick,
>
> We have just upgraded to 8.3 before starting the indexing. We were on 6.6
> before that.
>
> Thank you for your continued support and resources. Again, I have already
> taken your suggestion to start afresh and that's what I'm going to do.
> Don't get me wrong but I have been just asking doubts. I will surely get
> back with my experience after performing the full indexing.
>
> Thanks again! :)
>
> On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
> wrote:
>
>> Nothing implicitly handles optimization, you must continue to do that
>> externally.
>>
>> Until you get to the bottom of your indexing slowdown, I wouldn’t bother
>> with it at all, trying to do all these things at once is what lead to your
>> problem in the first place, please change one thing at a time. You say:
>>
>> “For a full indexing, optimizations occurred 30 times between batches”.
>>
>> This is horrible. I’m not sure what version of Solr you’re using. If it’s
>> 7.4 or earlier, this means the the entire index was rewritten 30 times.
>> The first time it would condense all segments into a single segment, or
>> 1/30 of the total. The second time it would rewrite all that, 2/30 of the
>> index into a new segment. The third time 3/30. And so on.
>>
>> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
>> 5G. But still.
>>
>> See:
>> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
>> for 7.4 and earlier,
>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for
>> 7.5 and later
>>
>> Eventually you can optimize by sending in an http or curl request like
>> this:
>> ../solr/collection/update?optimize=true
>>
>> You also changed to using StandardDirectory. The default has heuristics
>> built in
>> to choose the best directory implementation.
>>
>> I can’t emphasize enough that you’re changing lots of things at one time.
>> I
>> _strongly_ urge you to go back to the standard setup, make _no_
>> modifications
>> and change things one at a time. Some very bright people have done a lot
>> of work to try to make Lucene/Solr work well.
>>
>> Make one change at a time. Measure. If that change isn’t helpful, undo it
>> and
>> move to the next one. You’re trying to second-guess the Lucene/Solr
>> developers who have years of understanding how this all works. Assume they
>> picked reasonable options for defaults and that Lucene/Solr performs
>> reasonably
>> well. When I get unexplainably poor results, I usually assume it was the
>> last
>> thing I changed….
>>
>> Best,
>> Erick
>>
>>
>>
>>
>> > On Dec 6, 2019, at 1:31 AM, Paras Lehana 
>> wrote:
>> >
>> > Hi Erick,
>> >
>> > I believed optimizing explicitly merges segments and that's why I was
>> > expecting it to give performance boost. I know that optimizations should
>> > not be done very frequently. For a full indexing, optimizations
>> occurred 30
>> > times between batches. I take your suggestion to undo all the changes
>> and
>> > that's what I'm going to do. I mentioned about the optimizations giving
>> an
>> > indexing boost (for sometime) only to support your point of my
>> mergePolicy
>> > backfiring. I will certainly read again about the merge process.
>> >
>> > Taking your suggestions - so, commits would be handled by autoCommit.
>> What
>> > implicitly handles optimizations? I think the merge policy or is there
>> any
>> > other setting I'm missing?
>> >
>> > I'm indexing via Curl API on the same server. The Current Speed of curl
>> is
>> > only 50k (down from 1300k in the first batch). I think - as the curl is
>> > transmitting the XML, the documents are getting indexing. Because then
>> only
>> > would speed be so low. I don't think that the whole XML is taking the
>> > memory - I remember I had to change the curl options to get rid of the
>> > transmission error for large files.
>> >
>> > This is my curl request:
>> >
>> > curl 'http://localhost:$port/solr/product/update?commit=true'  -T
>> > batch1.xml -X POST -H 'Content-type:text/xml
>> >
>> > Although, we had been doing this since ages - I think I should now
>> consider
>> > using the solr post service (since the indexing files stays on the same
>> > server) or using Solarium (we use PHP to make XMLs).
>> >
>> > On Thu, 5 Dec 2019 at 20:00, Erick Erickson 
>> wrote:
>> >
>> >>> I think I should have also done optimize between batches, no?
>> >>
>> >> No, 

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-06 Thread Paras Lehana
Hey Erick,

We upgraded to 8.3 just before starting this indexing run. We were on 6.6
before that.

Thank you for your continued support and the resources. Again, I have already
taken your suggestion to start afresh, and that's what I'm going to do.
Don't get me wrong - I have just been asking questions. I will surely get
back with my experience after performing the full indexing.

Thanks again! :)

On Fri, 6 Dec 2019 at 18:48, Erick Erickson  wrote:

> Nothing implicitly handles optimization, you must continue to do that
> externally.
>
> Until you get to the bottom of your indexing slowdown, I wouldn’t bother
> with it at all, trying to do all these things at once is what lead to your
> problem in the first place, please change one thing at a time. You say:
>
> “For a full indexing, optimizations occurred 30 times between batches”.
>
> This is horrible. I’m not sure what version of Solr you’re using. If it’s
> 7.4 or earlier, this means the the entire index was rewritten 30 times.
> The first time it would condense all segments into a single segment, or
> 1/30 of the total. The second time it would rewrite all that, 2/30 of the
> index into a new segment. The third time 3/30. And so on.
>
> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
> 5G. But still.
>
> See:
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> for 7.4 and earlier,
> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for
> 7.5 and later
>
> Eventually you can optimize by sending in an http or curl request like
> this:
> ../solr/collection/update?optimize=true
>
> You also changed to using StandardDirectory. The default has heuristics
> built in
> to choose the best directory implementation.
>
> I can’t emphasize enough that you’re changing lots of things at one time. I
> _strongly_ urge you to go back to the standard setup, make _no_
> modifications
> and change things one at a time. Some very bright people have done a lot
> of work to try to make Lucene/Solr work well.
>
> Make one change at a time. Measure. If that change isn’t helpful, undo it
> and
> move to the next one. You’re trying to second-guess the Lucene/Solr
> developers who have years of understanding how this all works. Assume they
> picked reasonable options for defaults and that Lucene/Solr performs
> reasonably
> well. When I get unexplainably poor results, I usually assume it was the
> last
> thing I changed….
>
> Best,
> Erick
>
>
>
>
> > On Dec 6, 2019, at 1:31 AM, Paras Lehana 
> wrote:
> >
> > Hi Erick,
> >
> > I believed optimizing explicitly merges segments and that's why I was
> > expecting it to give performance boost. I know that optimizations should
> > not be done very frequently. For a full indexing, optimizations occurred
> 30
> > times between batches. I take your suggestion to undo all the changes and
> > that's what I'm going to do. I mentioned about the optimizations giving
> an
> > indexing boost (for sometime) only to support your point of my
> mergePolicy
> > backfiring. I will certainly read again about the merge process.
> >
> > Taking your suggestions - so, commits would be handled by autoCommit.
> What
> > implicitly handles optimizations? I think the merge policy or is there
> any
> > other setting I'm missing?
> >
> > I'm indexing via Curl API on the same server. The Current Speed of curl
> is
> > only 50k (down from 1300k in the first batch). I think - as the curl is
> > transmitting the XML, the documents are getting indexing. Because then
> only
> > would speed be so low. I don't think that the whole XML is taking the
> > memory - I remember I had to change the curl options to get rid of the
> > transmission error for large files.
> >
> > This is my curl request:
> >
> > curl 'http://localhost:$port/solr/product/update?commit=true'  -T
> > batch1.xml -X POST -H 'Content-type:text/xml
> >
> > Although, we had been doing this since ages - I think I should now
> consider
> > using the solr post service (since the indexing files stays on the same
> > server) or using Solarium (we use PHP to make XMLs).
> >
> > On Thu, 5 Dec 2019 at 20:00, Erick Erickson 
> wrote:
> >
> >>> I think I should have also done optimize between batches, no?
> >>
> >> No, no, no, no. Absolutely not. Never. Never, never, never between
> batches.
> >> I don’t  recommend optimizing at _all_ unless there are demonstrable
> >> improvements.
> >>
> >> Please don’t take this the wrong way, the whole merge process is really
> >> hard to get your head around. But the very fact that you’d suggest
> >> optimizing between batches shows that the entire merge process is
> >> opaque to you. I’ve seen many people just start changing things and
> >> get themselves into a bad place, then try to change more things to get
> >> out of that hole. Rinse. Repeat.
> >>
> >> I _strongly_ recommend that you undo all your changes. Neither
> >> commit nor optimize from outside Solr. Set your autocommit
> >> settings to somethi

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-06 Thread Erick Erickson
Nothing implicitly handles optimization, you must continue to do that
externally.

Until you get to the bottom of your indexing slowdown, I wouldn’t bother
with it at all; trying to do all these things at once is what led to your
problem in the first place, so please change one thing at a time. You say:

“For a full indexing, optimizations occurred 30 times between batches”.

This is horrible. I’m not sure what version of Solr you’re using. If it’s
7.4 or earlier, this means the entire index was rewritten 30 times.
The first time it would condense all segments into a single segment, or
1/30 of the total. The second time it would rewrite all that, 2/30 of the
index into a new segment. The third time 3/30. And so on.

If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
5G. But still.

See: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/ 
for 7.4 and earlier,
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for 7.5 and 
later

Eventually you can optimize by sending in an http or curl request like this:
../solr/collection/update?optimize=true

You also changed to using StandardDirectory. The default has heuristics built in
to choose the best directory implementation.

I can’t emphasize enough that you’re changing lots of things at one time. I
_strongly_ urge you to go back to the standard setup, make _no_ modifications
and change things one at a time. Some very bright people have done a lot
of work to try to make Lucene/Solr work well.

Make one change at a time. Measure. If that change isn’t helpful, undo it and
move to the next one. You’re trying to second-guess the Lucene/Solr
developers who have years of understanding how this all works. Assume they
picked reasonable options for defaults and that Lucene/Solr performs reasonably
well. When I get unexplainably poor results, I usually assume it was the last 
thing I changed….

Best,
Erick




> On Dec 6, 2019, at 1:31 AM, Paras Lehana  wrote:
> 
> Hi Erick,
> 
> I believed optimizing explicitly merges segments and that's why I was
> expecting it to give performance boost. I know that optimizations should
> not be done very frequently. For a full indexing, optimizations occurred 30
> times between batches. I take your suggestion to undo all the changes and
> that's what I'm going to do. I mentioned about the optimizations giving an
> indexing boost (for sometime) only to support your point of my mergePolicy
> backfiring. I will certainly read again about the merge process.
> 
> Taking your suggestions - so, commits would be handled by autoCommit. What
> implicitly handles optimizations? I think the merge policy or is there any
> other setting I'm missing?
> 
> I'm indexing via Curl API on the same server. The Current Speed of curl is
> only 50k (down from 1300k in the first batch). I think - as the curl is
> transmitting the XML, the documents are getting indexing. Because then only
> would speed be so low. I don't think that the whole XML is taking the
> memory - I remember I had to change the curl options to get rid of the
> transmission error for large files.
> 
> This is my curl request:
> 
> curl 'http://localhost:$port/solr/product/update?commit=true'  -T
> batch1.xml -X POST -H 'Content-type:text/xml
> 
> Although, we had been doing this since ages - I think I should now consider
> using the solr post service (since the indexing files stays on the same
> server) or using Solarium (we use PHP to make XMLs).
> 
> On Thu, 5 Dec 2019 at 20:00, Erick Erickson  wrote:
> 
>>> I think I should have also done optimize between batches, no?
>> 
>> No, no, no, no. Absolutely not. Never. Never, never, never between batches.
>> I don’t  recommend optimizing at _all_ unless there are demonstrable
>> improvements.
>> 
>> Please don’t take this the wrong way, the whole merge process is really
>> hard to get your head around. But the very fact that you’d suggest
>> optimizing between batches shows that the entire merge process is
>> opaque to you. I’ve seen many people just start changing things and
>> get themselves into a bad place, then try to change more things to get
>> out of that hole. Rinse. Repeat.
>> 
>> I _strongly_ recommend that you undo all your changes. Neither
>> commit nor optimize from outside Solr. Set your autocommit
>> settings to something like 5 minutes with openSearcher=true.
>> Set all autowarm counts in your caches in solrconfig.xml to 0,
>> especially filterCache and queryResultCache.
>> 
>> Do not set soft commit at all, leave it at -1.
>> 
>> Repeat do _not_ commit or optimize from the client! Just let your
>> autocommit settings do the commits.
>> 
>> It’s also pushing things to send 5M docs in a single XML packet.
>> That all has to be held in memory and then indexed, adding to
>> pressure on the heap. I usually index from SolrJ in batches
>> of 1,000. See:
>> https://lucidworks.com/post/indexing-with-solrj/
>> 
>> Simply put, your slowdown should not be happening. 

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-05 Thread Paras Lehana
Hi Erick,

I believed optimizing explicitly merges segments, and that's why I was
expecting it to give a performance boost. I know that optimizations should
not be done very frequently. For a full indexing run, optimizations occurred 30
times between batches. I take your suggestion to undo all the changes, and
that's what I'm going to do. I mentioned the optimizations giving an
indexing boost (for some time) only to support your point about my mergePolicy
backfiring. I will certainly read up on the merge process again.

Taking your suggestions - so commits would be handled by autoCommit. What
implicitly handles optimizations? Is it the merge policy, or is there any
other setting I'm missing?

I'm indexing via curl on the same server. The current speed of curl is
only 50k (down from 1300k in the first batch). I think the documents are
being indexed while curl is transmitting the XML - only then would the
speed be so low. I don't think the whole XML is being held in memory; I
remember I had to change the curl options to get rid of the transmission
error for large files.

This is my curl request:

curl 'http://localhost:$port/solr/product/update?commit=true' -T
batch1.xml -X POST -H 'Content-type:text/xml'

Although we have been doing this for ages, I think I should now consider
using the Solr post tool (since the indexing files stay on the same
server) or using Solarium (we use PHP to make the XMLs).
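
Something like this, I suppose - same core and file as in the curl command
above (bin/post talks to port 8983 by default; there is a -p option
otherwise):

# posts batch1.xml to the product core via the same /update handler
bin/post -c product batch1.xml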

On Thu, 5 Dec 2019 at 20:00, Erick Erickson  wrote:

> >  I think I should have also done optimize between batches, no?
>
> No, no, no, no. Absolutely not. Never. Never, never, never between batches.
> I don’t  recommend optimizing at _all_ unless there are demonstrable
> improvements.
>
> Please don’t take this the wrong way, the whole merge process is really
> hard to get your head around. But the very fact that you’d suggest
> optimizing between batches shows that the entire merge process is
> opaque to you. I’ve seen many people just start changing things and
> get themselves into a bad place, then try to change more things to get
> out of that hole. Rinse. Repeat.
>
> I _strongly_ recommend that you undo all your changes. Neither
> commit nor optimize from outside Solr. Set your autocommit
> settings to something like 5 minutes with openSearcher=true.
> Set all autowarm counts in your caches in solrconfig.xml to 0,
> especially filterCache and queryResultCache.
>
> Do not set soft commit at all, leave it at -1.
>
> Repeat do _not_ commit or optimize from the client! Just let your
> autocommit settings do the commits.
>
> It’s also pushing things to send 5M docs in a single XML packet.
> That all has to be held in memory and then indexed, adding to
> pressure on the heap. I usually index from SolrJ in batches
> of 1,000. See:
> https://lucidworks.com/post/indexing-with-solrj/
>
> Simply put, your slowdown should not be happening. I strongly
> believe that it’s something in your environment, most likely
> 1> your changes eventually shoot you in the foot OR
> 2> you are running in too little memory and eventually GC is killing you.
> Really, analyze your GC logs. OR
> 3> you are running on underpowered hardware which just can’t take the load
> OR
> 4> something else in your environment
>
> I’ve never heard of a Solr installation with such a massive slowdown during
> indexing that was fixed by tweaking things like the merge policy etc.
>
> Best,
> Erick
>
>
> > On Dec 5, 2019, at 12:57 AM, Paras Lehana 
> wrote:
> >
> > Hey Erick,
> >
> > This is a huge red flag to me: "(but I could only test for the first few
> >> thousand documents”.
> >
> >
> > Yup, that's probably where the culprit lies. I could only test for the
> > starting batch because I had to wait for a day to actually compare. I
> > tweaked the merge values and kept whatever gave a speed boost. My first
> > batch of 5 million docs took only 40 minutes (atomic updates included)
> and
> > the last batch of 5 million took more than 18 hours. If this is an issue
> of
> > mergePolicy, I think I should have also done optimize between batches,
> no?
> > I remember, when I indexed a single XML of 80 million after optimizing
> the
> > core already indexed with 30 XMLs of 5 million each, I could post 80
> > million in a day only.
> >
> >
> >
> >> The indexing rate you’re seeing is abysmal unless these are _huge_
> >> documents
> >
> >
> > Documents only contain the suggestion name, possible titles,
> > phonetics/spellcheck/synonym fields and numerical fields for boosting.
> They
> > are far smaller than what a Search Document would contain. Auto-Suggest
> is
> > only concerned about suggestions so you can guess how simple the
> documents
> > would be.
> >
> >
> > Some data is held on the heap and some in the OS RAM due to MMapDirectory
> >
> >
> > I'm using StandardDirectory (which will make Solr choose the right
> > implementation). Also, planning to read more about these (looking forward
> > to use MMap). Thanks for the article!
> >
> >
> > 

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-05 Thread Erick Erickson
>  I think I should have also done optimize between batches, no?

No, no, no, no. Absolutely not. Never. Never, never, never between batches.
I don’t  recommend optimizing at _all_ unless there are demonstrable
improvements.

Please don’t take this the wrong way, the whole merge process is really
hard to get your head around. But the very fact that you’d suggest
optimizing between batches shows that the entire merge process is
opaque to you. I’ve seen many people just start changing things and
get themselves into a bad place, then try to change more things to get
out of that hole. Rinse. Repeat.

I _strongly_ recommend that you undo all your changes. Neither
commit nor optimize from outside Solr. Set your autocommit
settings to something like 5 minutes with openSearcher=true.
Set all autowarm counts in your caches in solrconfig.xml to 0,
especially filterCache and queryResultCache.

Do not set soft commit at all, leave it at -1.

Repeat do _not_ commit or optimize from the client! Just let your
autocommit settings do the commits.
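
If you'd rather not hand-edit solrconfig.xml, the Config API can set those
commit values - a sketch against your "product" core (adjust host and port),
with a 5-minute hard commit that opens a searcher and soft commit disabled:

curl http://localhost:8983/solr/product/config -H 'Content-type:application/json' -d '{
  "set-property": {
    "updateHandler.autoCommit.maxTime": 300000,
    "updateHandler.autoCommit.openSearcher": true,
    "updateHandler.autoSoftCommit.maxTime": -1
  }
}'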

It’s also pushing things to send 5M docs in a single XML packet.
That all has to be held in memory and then indexed, adding to
pressure on the heap. I usually index from SolrJ in batches
of 1,000. See:
https://lucidworks.com/post/indexing-with-solrj/

Simply put, your slowdown should not be happening. I strongly
believe that it’s something in your environment, most likely
1> your changes eventually shoot you in the foot OR
2> you are running in too little memory and eventually GC is killing you. 
Really, analyze your GC logs. OR
3> you are running on underpowered hardware which just can’t take the load OR
4> something else in your environment

I’ve never heard of a Solr installation with such a massive slowdown during
indexing that was fixed by tweaking things like the merge policy etc.

Best,
Erick


> On Dec 5, 2019, at 12:57 AM, Paras Lehana  wrote:
> 
> Hey Erick,
> 
> This is a huge red flag to me: "(but I could only test for the first few
>> thousand documents”.
> 
> 
> Yup, that's probably where the culprit lies. I could only test for the
> starting batch because I had to wait for a day to actually compare. I
> tweaked the merge values and kept whatever gave a speed boost. My first
> batch of 5 million docs took only 40 minutes (atomic updates included) and
> the last batch of 5 million took more than 18 hours. If this is an issue of
> mergePolicy, I think I should have also done optimize between batches, no?
> I remember, when I indexed a single XML of 80 million after optimizing the
> core already indexed with 30 XMLs of 5 million each, I could post 80
> million in a day only.
> 
> 
> 
>> The indexing rate you’re seeing is abysmal unless these are _huge_
>> documents
> 
> 
> Documents only contain the suggestion name, possible titles,
> phonetics/spellcheck/synonym fields and numerical fields for boosting. They
> are far smaller than what a Search Document would contain. Auto-Suggest is
> only concerned about suggestions so you can guess how simple the documents
> would be.
> 
> 
> Some data is held on the heap and some in the OS RAM due to MMapDirectory
> 
> 
> I'm using StandardDirectory (which will make Solr choose the right
> implementation). Also, planning to read more about these (looking forward
> to use MMap). Thanks for the article!
> 
> 
> You're right. I should change one thing at a time. Let me experiment and
> then I will summarize here what I tried. Thank you for your responses. :)
> 
> On Wed, 4 Dec 2019 at 20:31, Erick Erickson  wrote:
> 
>> This is a huge red flag to me: "(but I could only test for the first few
>> thousand documents”
>> 
>> You’re probably right that that would speed things up, but pretty soon
>> when you’re indexing
>> your entire corpus there are lots of other considerations.
>> 
>> The indexing rate you’re seeing is abysmal unless these are _huge_
>> documents, but you
>> indicate that at the start you’re getting 1,400 docs/second so I don’t
>> think the complexity
>> of the docs is the issue here.
>> 
>> Do note that when we’re throwing RAM figures out, we need to draw a sharp
>> distinction
>> between Java heap and total RAM. Some data is held on the heap and some in
>> the OS
>> RAM due to MMapDirectory, see Uwe’s excellent article:
>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> 
>> Uwe recommends about 25% of your available physical RAM be allocated to
>> Java as
>> a starting point. Your particular Solr installation may need a larger
>> percent, IDK.
>> 
>> But basically I’d go back to all default settings and change one thing at
>> a time.
>> First, I’d look at GC performance. Is it taking all your CPU? In which
>> case you probably need to
>> increase your heap. I pick this first because it’s very common that this
>> is a root cause.
>> 
>> Next, I’d put a profiler on it to see exactly where I’m spending time.
>> Otherwise you wind
>> up making random changes and hoping one of them works.
>> 
>>

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-04 Thread Paras Lehana
Hey Erick,

This is a huge red flag to me: "(but I could only test for the first few
> thousand documents”.


Yup, that's probably where the culprit lies. I could only test on the
starting batch because I had to wait a day to actually compare. I
tweaked the merge values and kept whatever gave a speed boost. My first
batch of 5 million docs took only 40 minutes (atomic updates included), and
the last batch of 5 million took more than 18 hours. If this is a mergePolicy
issue, should I also have done an optimize between batches? I remember that
when I indexed a single XML of 80 million docs, after optimizing a core
already indexed with 30 XMLs of 5 million each, posting the 80 million took
only about a day.



> The indexing rate you’re seeing is abysmal unless these are _huge_
> documents


Documents only contain the suggestion name, possible titles,
phonetics/spellcheck/synonym fields and numerical fields for boosting. They
are far smaller than what a Search Document would contain. Auto-Suggest is
only concerned about suggestions so you can guess how simple the documents
would be.


Some data is held on the heap and some in the OS RAM due to MMapDirectory


I'm using StandardDirectory (which will make Solr choose the right
implementation). Also, I'm planning to read more about these (looking forward
to using MMap). Thanks for the article!


You're right. I should change one thing at a time. Let me experiment and
then I will summarize here what I tried. Thank you for your responses. :)

On Wed, 4 Dec 2019 at 20:31, Erick Erickson  wrote:

> This is a huge red flag to me: "(but I could only test for the first few
> thousand documents”
>
> You’re probably right that that would speed things up, but pretty soon
> when you’re indexing
> your entire corpus there are lots of other considerations.
>
> The indexing rate you’re seeing is abysmal unless these are _huge_
> documents, but you
> indicate that at the start you’re getting 1,400 docs/second so I don’t
> think the complexity
> of the docs is the issue here.
>
> Do note that when we’re throwing RAM figures out, we need to draw a sharp
> distinction
> between Java heap and total RAM. Some data is held on the heap and some in
> the OS
> RAM due to MMapDirectory, see Uwe’s excellent article:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Uwe recommends about 25% of your available physical RAM be allocated to
> Java as
> a starting point. Your particular Solr installation may need a larger
> percent, IDK.
>
> But basically I’d go back to all default settings and change one thing at
> a time.
> First, I’d look at GC performance. Is it taking all your CPU? In which
> case you probably need to
> increase your heap. I pick this first because it’s very common that this
> is a root cause.
>
> Next, I’d put a profiler on it to see exactly where I’m spending time.
> Otherwise you wind
> up making random changes and hoping one of them works.
>
> Best,
> Erick
>
> > On Dec 4, 2019, at 3:21 AM, Paras Lehana 
> wrote:
> >
> > (but I could only test for the first few
> > thousand documents
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-04 Thread Erick Erickson
This is a huge red flag to me: "(but I could only test for the first few 
thousand documents”

You’re probably right that that would speed things up, but pretty soon when 
you’re indexing
your entire corpus there are lots of other considerations.

The indexing rate you’re seeing is abysmal unless these are _huge_ documents, 
but you
indicate that at the start you’re getting 1,400 docs/second so I don’t think 
the complexity
of the docs is the issue here.

Do note that when we’re throwing RAM figures out, we need to draw a sharp 
distinction
between Java heap and total RAM. Some data is held on the heap and some in the 
OS
RAM due to MMapDirectory, see Uwe’s excellent article:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe recommends about 25% of your available physical RAM be allocated to Java as
a starting point. Your particular Solr installation may need a larger percent, 
IDK.

But basically I’d go back to all default settings and change one thing at a 
time.
First, I’d look at GC performance. Is it taking all your CPU? In which case you 
probably need to 
increase your heap. I pick this first because it’s very common that this is a 
root cause.

Next, I’d put a profiler on it to see exactly where I’m spending time. 
Otherwise you wind
up making random changes and hoping one of them works.

Best,
Erick

> On Dec 4, 2019, at 3:21 AM, Paras Lehana  wrote:
> 
> (but I could only test for the first few
> thousand documents



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-04 Thread Paras Lehana
 50M docs.
>
> So here’s what I’d do:
> 1> go back to the defaults for TieredMergePolicy and RamBufferSizeMB
> 2> measure first, tweak later. Analyze your GC logs to see whether
>  you’re taking an inordinate amount of time doing GC coincident with
>  your slowness. If so, adjust your heap.
> 3> If it’s not GC, put a profiler on it and find out where, exactly, you’re
>  spending your time.
>
> Best,
> Erick
>
>
> > We occasionally reindex whole data to our Auto-Suggest corpus. Total
> > documents to be indexed are around 250 million while, due to atomic
> > updates, total unique documents after full indexing converges to 60
> > million.
> >
> > We have to atomically index documents to store different names for the
> same
> > product (like "bag" and "bags"), to increase demand and to store the
> months
> > they were searched for in the past. One approach could be to calculate
> all
> > this beforehand and then index normally to Solr (non-atomic).
> >
> > Once the atomic updates process over 50 million documents, the speed of
> > indexing drops down to more than 10x of initial speed.
> >
> > As what I have learnt, atomic updates fetch the matching document by
> > uniqueKey and then does the normal index using the information in the
> > fetched document. Is this actually taking time? As the number of
> documents
> > increases, Solr might be taking time to fetch the stored document.
> >
> > But shouldn't the fetch by uniqueKey take O(1) time? If this really
> impacts
> > the fetch, can we use docValues for the field id (uniqueKey)? Our field
> is
> > of type string.
> >
> >
> >
> > I'm pasting my config lines that may impact this:
> >
> >
> --
> >
> > -Xmx8g -Xms8g
> >
> >  required="true"
> > omitNorms="false" multiValued="false" />
> > id
> >
> > 2000
> >
> >  class="org.apache.solr.index.TieredMergePolicyFactory">
> > 50
> > 50
> > 150
> > 
> >
> > 
> >10
> >12
> >false
> > 
> >
> >
> --
> >
> >
> >
> > A normal indexing that should take less than 1 day actually takes over 5
> > days with atomic updates. Any experience or suggestion will help. How do
> > expedite your indexing process specifically atomic updates? I know this
> > might have been asked so many times and I have actually read/implemented
> > all of the recommendations. My question is specific to Atomic Updates and
> > if something exclusive to Atomic Updates can make it faster.
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> > Noida, UP, IN - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-03 Thread Erick Erickson
Do you have empirical evidence that all these parameter changes are doing you 
any good?

The first thing I note is that 8G for a 250M document index is a red flag. If 
you’re running on
a larger machine, I’d increase that to 16G as a test. I’ve seen GC start to 
take up more and
more CPU as you get closer to the max, sometimes to the point of having a 90% or
more of the CPU consumed by GC.

The second thing is you have no searchers being opened. Solr has to keep 
certain in-memory
structures in place to support Real Time Get that only gets reclaimed when a new
searcher is opened. Perhaps that’s chewing up memory and getting to a tipping 
point.

Why did you increase RamBufferSizeMB?  I’ve rarely found much increase in 
throughput
over the default 100M. It's probably not very useful anyway, since your autocommit
limits mean that unless you're using that full 2G within 100,000 docs or within 2
minutes, it won't be used up anyway.

The third thing is that you have changed the TieredMergePolicy extensively. When
background merges kick in, they’ll be HUGE. Further, the settings will probably
cause you to have a lot of segments, which is not ideal.

Fourth, why do you think the lookup of the uniqueKey has anything to do with
your slowdown? If I'm reading this right, you do atomic updates on 50M docs
_then_ things get slow. If it was a uniqueKey lookup I should think it'd
be a problem for the first 50M docs.

So here’s what I’d do:
1> go back to the defaults for TieredMergePolicy and RamBufferSizeMB
2> measure first, tweak later. Analyze your GC logs to see whether
 you’re taking an inordinate amount of time doing GC coincident with
 your slowness. If so, adjust your heap.
3> If it’s not GC, put a profiler on it and find out where, exactly, you’re
 spending your time.
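
For 1> above, "defaults" basically means removing the mergePolicyFactory block
entirely (the stock TieredMergePolicy settings then apply on their own) and
dropping the RAM buffer back down, roughly this in solrconfig.xml (100 is the
shipped default as far as I remember):

  <ramBufferSizeMB>100</ramBufferSizeMB>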

Best,
Erick


> We occasionally reindex whole data to our Auto-Suggest corpus. Total
> documents to be indexed are around 250 million while, due to atomic
> updates, total unique documents after full indexing converges to 60
> million.
> 
> We have to atomically index documents to store different names for the same
> product (like "bag" and "bags"), to increase demand and to store the months
> they were searched for in the past. One approach could be to calculate all
> this beforehand and then index normally to Solr (non-atomic).
> 
> Once the atomic updates process over 50 million documents, the speed of
> indexing drops down to more than 10x of initial speed.
> 
> As what I have learnt, atomic updates fetch the matching document by
> uniqueKey and then does the normal index using the information in the
> fetched document. Is this actually taking time? As the number of documents
> increases, Solr might be taking time to fetch the stored document.
> 
> But shouldn't the fetch by uniqueKey take O(1) time? If this really impacts
> the fetch, can we use docValues for the field id (uniqueKey)? Our field is
> of type string.
> 
> 
> 
> I'm pasting my config lines that may impact this:
> 
> --
> 
> -Xmx8g -Xms8g
> 
>  omitNorms="false" multiValued="false" />
> id
> 
> 2000
> 
> 
> 50
> 50
> 150
> 
> 
> 
>10
>12
>false
> 
> 
> --
> 
> 
> 
> A normal indexing that should take less than 1 day actually takes over 5
> days with atomic updates. Any experience or suggestion will help. How do
> expedite your indexing process specifically atomic updates? I know this
> might have been asked so many times and I have actually read/implemented
> all of the recommendations. My question is specific to Atomic Updates and
> if something exclusive to Atomic Updates can make it faster.
> 
> 
> -- 
> -- 
> Regards,
> 
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
> 
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
> 
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
> 



[Q] Faster Atomic Updates - use docValues?

2019-12-03 Thread Paras Lehana
Hi Community,

We occasionally reindex whole data to our Auto-Suggest corpus. Total
documents to be indexed are around 250 million while, due to atomic
updates, total unique documents after full indexing converges to 60
million.

We have to atomically index documents to store different names for the same
product (like "bag" and "bags"), to increase demand and to store the months
they were searched for in the past. One approach could be to calculate all
this beforehand and then index normally to Solr (non-atomic).
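
To give an idea, each atomic update we send looks roughly like this (field and
core names here are illustrative, not our exact schema):

  curl -H 'Content-Type: application/json' \
    'http://localhost:8983/solr/autosuggest/update' -d '
  [{"id": "bag",
    "names": {"add": "bags"},
    "demand": {"inc": 1},
    "months": {"add": "2019-11"}}]'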

Once the atomic updates process over 50 million documents, the speed of
indexing drops down to more than 10x of initial speed.

As far as I have learnt, atomic updates fetch the matching document by
uniqueKey and then do a normal index using the information in the
fetched document. Is this fetch what is actually taking the time? As the number
of documents increases, Solr might be taking longer to fetch the stored document.

But shouldn't the fetch by uniqueKey take O(1) time? If this really impacts
the fetch, can we use docValues for the field id (uniqueKey)? Our field is
of type string.
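
In other words, is it worth changing the definition to something like this
(just a sketch of the id field with docValues turned on):

  <field name="id" type="string" indexed="true" stored="true" required="true"
         multiValued="false" docValues="true" />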



I'm pasting my config lines that may impact this:

--

-Xmx8g -Xms8g


id

2000


 50
 50
150
 


10
12
false


--



A normal indexing run that should take less than 1 day actually takes over 5
days with atomic updates. Any experience or suggestion will help. How do you
expedite your indexing process, specifically atomic updates? I know this
might have been asked many times and I have actually read/implemented
all of the recommendations. My question is specific to atomic updates and
whether something exclusive to atomic updates can make it faster.


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 7.2.1 - unexpected docvalues type

2019-11-11 Thread Antony Alphonse
Thank you both. I will look into the options.

-AA

On Mon, Nov 11, 2019 at 6:05 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Antony,
> Like Erick explained, you still have to preprocess your field in order to
> be able to use doc values. What you can do is use update request processor
> chain and have all the logic in Solr. Here is blog post explaining how it
> could work:
> https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html <
> https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html>
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 10 Nov 2019, at 15:54, Erick Erickson 
> wrote:
> >
> > So “lowercase” is, indeed, a solr.TextField, which is ineligible for
> docValues. Given that definition, the difference will be that a “string”
> type is totally un-analyzed, so the values that go into the index and the
> query itself will be case-sensitive. You’ll have to pre-process both to do
> the right thing.
> >
> >> On Nov 9, 2019, at 6:15 PM, Antony Alphonse 
> wrote:
> >>
> >> Hi Shawn,
> >>
> >> Thank you. I switched the fieldType=string and it worked. I might have
> to
> >> check on the use-case to see if "string" will work for us.
> >>
> >> I have noted the "lowercase" field type which I believe is similar to
> the
> >> one in schema ver 1.6.
> >>
> >>
> >>  >>   positionIncrementGap="100">
> >>   
> >>>> class="solr.KeywordTokenizerFactory" />
> >>class="solr.LowerCaseFilterFactory"
> >> />
> >>   
> >>   
> >>
> >> Thanks,
> >> Antony
> >>
> >> On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
> >> wrote:
> >>
> >>> We can’t answer whether you should change the field type for two
> reasons:
> >>>
> >>> 1> It depends on your use case.
> >>> 2> we don’t know what the field type “lowercase” does. It’s composed
> of an
> >>> analysis chain that you may have changed. And whatever config you are
> using
> >>> may have changed with different releases of Solr.
> >>>
> >>> Grouping is generally done on a docValues-eligible field type. AFAIK,
> >>> “lowercase” is a solr-text based field so is ineligible for docValues.
> I’ve
> >>> got to guess here, but I’d suggest you start with a fieldType of
> “string”,
> >>> and enable docValues on it.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>
> >>>
> >>>> On Nov 9, 2019, at 12:54 AM, Antony Alphonse <
> antonyaugus...@gmail.com>
> >>> wrote:
> >>>>
> >>>>>
> >>>>> Hi Shawn,
> >>>>>
> >>>>
> >>>> I will try that solution. Also I had to mention that the queries that
> >>> fail
> >>>> with this error has the "group.field":"lowercase". Should I change the
> >>>> field type?
> >>>>
> >>>> Thanks,
> >>>> Antony
> >>>
> >>>
> >
>
>


Re: Solr 7.2.1 - unexpected docvalues type

2019-11-11 Thread Emir Arnautović
Hi Antony,
Like Erick explained, you still have to preprocess your field in order to be 
able to use doc values. What you can do is use update request processor chain 
and have all the logic in Solr. Here is blog post explaining how it could work: 
https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html 
<https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html>
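
A minimal sketch of that idea (field and script names are placeholders; the
blog post has the full story): clone the analysed field into a plain string
field and lower-case it in a small script before it hits the index:

  <updateRequestProcessorChain name="lowercase-dv">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">title</str>
      <str name="dest">title_lc</str>
    </processor>
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">lowercase-title.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Here title_lc would be a docValues-enabled string field, the script just
lower-cases its values, and you point update.chain (or your update handler) at
the chain.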

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 10 Nov 2019, at 15:54, Erick Erickson  wrote:
> 
> So “lowercase” is, indeed, a solr.TextField, which is ineligible for 
> docValues. Given that definition, the difference will be that a “string” type 
> is totally un-analyzed, so the values that go into the index and the query 
> itself will be case-sensitive. You’ll have to pre-process both to do the 
> right thing.
> 
>> On Nov 9, 2019, at 6:15 PM, Antony Alphonse  wrote:
>> 
>> Hi Shawn,
>> 
>> Thank you. I switched the fieldType=string and it worked. I might have to
>> check on the use-case to see if "string" will work for us.
>> 
>> I have noted the "lowercase" field type which I believe is similar to the
>> one in schema ver 1.6.
>> 
>> 
>> >   positionIncrementGap="100">
>>   
>>   > class="solr.KeywordTokenizerFactory" />
>>   > />
>>   
>>   
>> 
>> Thanks,
>> Antony
>> 
>> On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
>> wrote:
>> 
>>> We can’t answer whether you should change the field type for two reasons:
>>> 
>>> 1> It depends on your use case.
>>> 2> we don’t know what the field type “lowercase” does. It’s composed of an
>>> analysis chain that you may have changed. And whatever config you are using
>>> may have changed with different releases of Solr.
>>> 
>>> Grouping is generally done on a docValues-eligible field type. AFAIK,
>>> “lowercase” is a solr-text based field so is ineligible for docValues. I’ve
>>> got to guess here, but I’d suggest you start with a fieldType of “string”,
>>> and enable docValues on it.
>>> 
>>> Best,
>>> Erick
>>> 
>>> 
>>> 
>>>> On Nov 9, 2019, at 12:54 AM, Antony Alphonse 
>>> wrote:
>>>> 
>>>>> 
>>>>> Hi Shawn,
>>>>> 
>>>> 
>>>> I will try that solution. Also I had to mention that the queries that
>>> fail
>>>> with this error has the "group.field":"lowercase". Should I change the
>>>> field type?
>>>> 
>>>> Thanks,
>>>> Antony
>>> 
>>> 
> 



Re: Solr 7.2.1 - unexpected docvalues type

2019-11-10 Thread Erick Erickson
So “lowercase” is, indeed, a solr.TextField, which is ineligible for docValues. 
Given that definition, the difference will be that a “string” type is totally 
un-analyzed, so the values that go into the index and the query itself will be 
case-sensitive. You’ll have to pre-process both to do the right thing.

> On Nov 9, 2019, at 6:15 PM, Antony Alphonse  wrote:
> 
> Hi Shawn,
> 
> Thank you. I switched the fieldType=string and it worked. I might have to
> check on the use-case to see if "string" will work for us.
> 
> I have noted the "lowercase" field type which I believe is similar to the
> one in schema ver 1.6.
> 
> 
>  positionIncrementGap="100">
>
> class="solr.KeywordTokenizerFactory" />
> />
>
>
> 
> Thanks,
> Antony
> 
> On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
> wrote:
> 
>> We can’t answer whether you should change the field type for two reasons:
>> 
>> 1> It depends on your use case.
>> 2> we don’t know what the field type “lowercase” does. It’s composed of an
>> analysis chain that you may have changed. And whatever config you are using
>> may have changed with different releases of Solr.
>> 
>> Grouping is generally done on a docValues-eligible field type. AFAIK,
>> “lowercase” is a solr-text based field so is ineligible for docValues. I’ve
>> got to guess here, but I’d suggest you start with a fieldType of “string”,
>> and enable docValues on it.
>> 
>> Best,
>> Erick
>> 
>> 
>> 
>>> On Nov 9, 2019, at 12:54 AM, Antony Alphonse 
>> wrote:
>>> 
>>>> 
>>>> Hi Shawn,
>>>> 
>>> 
>>> I will try that solution. Also I had to mention that the queries that
>> fail
>>> with this error has the "group.field":"lowercase". Should I change the
>>> field type?
>>> 
>>> Thanks,
>>> Antony
>> 
>> 



Re: Solr 7.2.1 - unexpected docvalues type

2019-11-09 Thread Antony Alphonse
Hi Shawn,

Thank you. I switched the fieldType=string and it worked. I might have to
check on the use-case to see if "string" will work for us.

I have noted the "lowercase" field type which I believe is similar to the
one in schema ver 1.6:

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Thanks,
Antony

On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
wrote:

> We can’t answer whether you should change the field type for two reasons:
>
> 1> It depends on your use case.
> 2> we don’t know what the field type “lowercase” does. It’s composed of an
> analysis chain that you may have changed. And whatever config you are using
> may have changed with different releases of Solr.
>
> Grouping is generally done on a docValues-eligible field type. AFAIK,
> “lowercase” is a solr-text based field so is ineligible for docValues. I’ve
> got to guess here, but I’d suggest you start with a fieldType of “string”,
> and enable docValues on it.
>
> Best,
> Erick
>
>
>
> > On Nov 9, 2019, at 12:54 AM, Antony Alphonse 
> wrote:
> >
> >>
> >> Hi Shawn,
> >>
> >
> > I will try that solution. Also I had to mention that the queries that
> fail
> > with this error has the "group.field":"lowercase". Should I change the
> > field type?
> >
> > Thanks,
> > Antony
>
>


Re: Solr 7.2.1 - unexpected docvalues type

2019-11-09 Thread Erick Erickson
We can’t answer whether you should change the field type for two reasons:

1> It depends on your use case. 
2> we don’t know what the field type “lowercase” does. It’s composed of an 
analysis chain that you may have changed. And whatever config you are using may 
have changed with different releases of Solr.

Grouping is generally done on a docValues-eligible field type. AFAIK, 
“lowercase” is a solr-text based field so is ineligible for docValues. I’ve got 
to guess here, but I’d suggest you start with a fieldType of “string”, and 
enable docValues on it.
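
Something like this, roughly (only the docValues part is the point; the other
attributes are whatever you already have):

  <field name="lowercase" type="string" indexed="true" stored="false" docValues="true"/>

with the caveat that you'd have to lower-case the values yourself before they
are sent to Solr, since "string" gets no analysis.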

Best,
Erick



> On Nov 9, 2019, at 12:54 AM, Antony Alphonse  wrote:
> 
>> 
>> Hi Shawn,
>> 
> 
> I will try that solution. Also I had to mention that the queries that fail
> with this error has the "group.field":"lowercase". Should I change the
> field type?
> 
> Thanks,
> Antony



Re: Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Antony Alphonse
>
> Hi Shawn,
>

I will try that solution. Also, I should mention that the queries that fail
with this error have "group.field":"lowercase". Should I change the
field type?

Thanks,
Antony


Re: Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Shawn Heisey

On 11/8/2019 5:31 PM, Antony Alphonse wrote:

I shared the collection and re-indexed the data with the same schema. But
one of the field is throwing the below error. Any suggestions?



ERROR (qtp672320506-32) [c: s:shard3 r:core_node01 x:_shard3_replica_n69]
o.a.s.h.RequestHandlerBase java.lang.IllegalStateException: unexpected
docvalues type SORTED_SET for field 'lowercase' (expected=SORTED). Re-index
with correct docvalues type.


This error means that part of the index was created with one definition 
for the field in question, then the schema was changed in an 
incompatible way, and additional indexing was attempted.


The solution to this particular error is to completely delete the index 
directories that make up the collection, reload it, and then build it 
from scratch again.  The error happens at the Lucene level and the only 
way to fix it is to completely delete the index.  You could do it by 
creating an entirely new collection.
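
With the Collections API that last option would be something like (names are
placeholders):

  http://host:8983/solr/admin/collections?action=DELETE&name=mycollection
  http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=myconfig

and then reindex everything into the new collection.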


Thanks,
Shawn


Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Antony Alphonse
Hi,

I shared the collection and re-indexed the data with the same schema, but
one of the fields is throwing the error below. Any suggestions?





ERROR (qtp672320506-32) [c: s:shard3 r:core_node01 x:_shard3_replica_n69]
o.a.s.h.RequestHandlerBase java.lang.IllegalStateException: unexpected
docvalues type SORTED_SET for field 'lowercase' (expected=SORTED). Re-index
with correct docvalues type.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:340)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:392)
at
org.apache.lucene.search.grouping.TermGroupSelector.setNextReader(TermGroupSelector.java:56)
at
org.apache.lucene.search.grouping.FirstPassGroupingCollector.doSetNextReader(FirstPassGroupingCollector.java:350)
at
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
at
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
at
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:651)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
at
org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:239)
at
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:162)
at
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchFirstPhase(QueryComponent.java:1279)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:360)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at
org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)

Thanks!!


Re: In Place Updates: Can we filter on fields with only docValues="true"

2019-09-15 Thread Erick Erickson
Filtering is really searching. As Shawn says, you _might_ get away with it in 
some circumstances, but it’s not something I’d recommend.

Here’s the problem: For most searches, you’re trying to ask “for term X, what 
docs contain it?”. That’s exactly what the inverted index is for, it’s an 
ordered list of terms, each term has the list of documents it appears in.

DocValues is the exact opposite. It answers “For doc X, what is the value of 
field Y?”. When _searching_ on a DV only field, think “table scan” in DB terms.

Pick a field with high cardinality. Worst-case, every doc has a unique value 
and try searching on that. If it’s fast, then I need to go into the code and 
understand why it’s not doing what I expect ;).

I’ll add parenthetically that 100M docs with 100 shards seems excessively 
sharded. Perhaps you have so many fields that that’s warranted, but it seems 
high. My rule-of-thumb starting place is 50M docs/shard. Admittedly that can be 
low or high, I’ve seen 300M docs fit in 12G and 10M docs strain 31G. You might 
try testing a node to destruction, see: 
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

> On Sep 14, 2019, at 7:54 PM, Shawn Heisey  wrote:
> 
> On 9/14/2019 4:29 PM, Mikhail Khludnev wrote:
>> Shawn, would you mind to provide some numbers?
>> I'm experimenting with lucene 8.0.0.
>> I have 100 shard index of 100M docs with 2000 docVals only updateable
>> fields. Searching for such field turns to be blazingly fast
>> $ curl 'localhost:39200/books/_search?pretty&size=20' -d '
> 
> I have no idea how to read the json you've pasted.  Neither that or the URLs 
> look like Solr.
> 
>> I've just updated this field in this particular doc. Other 245K of 100M
>> docs has 1 in it
>> $ curl -H 'Content-Type:application/json'
> 
> 
> 
>> It's dv field without index
>> $ curl -s
>> 'localhost:39200/books/_mapping/field/subscription_0x1?pretty&include_defaults=true'
> 
> What's the cardinality of the field you're searching on?  If it's small, then 
> even an inefficient search will be fast.  Try on a field with millions or 
> billions of possible values.
> 
> Thanks,
> Shawn



Re: In Place Updates: Can we filter on fields with only docValues="true"

2019-09-14 Thread Shawn Heisey

On 9/14/2019 4:29 PM, Mikhail Khludnev wrote:

Shawn, would you mind to provide some numbers?
I'm experimenting with lucene 8.0.0.
I have 100 shard index of 100M docs with 2000 docVals only updateable
fields. Searching for such field turns to be blazingly fast
$ curl 'localhost:39200/books/_search?pretty&size=20' -d '


I have no idea how to read the json you've pasted.  Neither that nor the
URLs look like Solr.



I've just updated this field in this particular doc. Other 245K of 100M
docs has 1 in it

$ curl -H 'Content-Type:application/json'





It's dv field without index

$ curl -s
'localhost:39200/books/_mapping/field/subscription_0x1?pretty&include_defaults=true'


What's the cardinality of the field you're searching on?  If it's small, 
then even an inefficient search will be fast.  Try on a field with 
millions or billions of possible values.


Thanks,
Shawn


Re: In Place Updates: Can we filter on fields with only docValues="true"

2019-09-14 Thread Mikhail Khludnev
Shawn, would you mind to provide some numbers?
I'm experimenting with Lucene 8.0.0.
I have a 100-shard index of 100M docs with 2000 docValues-only updateable
fields. Searching on such a field turns out to be blazingly fast:
$ curl 'localhost:39200/books/_search?pretty&size=20' -d '
{"query": {"bool": {"filter": {"range": {"subscription_0x1": {"lte": 666,
"gte": 666}}'
{
  "took" : 148,
  "timed_out" : false,
  "_shards" : {"total" : 100,"successful" : 100,"skipped" : 0,
  "failed" : 0
  },
  "hits" : {
"total" : {  "value" : 1,  "relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
  {
"_index" : "books",
"_type" : "books",
"_id" : "28113070",
"_score" : 0.0
  }
]
  }
}

I've just updated this field in this particular doc. The other 245K of the 100M
docs have 1 in it:

$ curl -H 'Content-Type:application/json'
'localhost:39200/books/_search?pretty&size=20' -d '
{"track_total_hits": true, "query": {"bool": {"filter": {"range":
{"subscription_0x1": {"lte": 1, "gte":1}}'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
"total" : 100,
"successful" : 100,
"skipped" : 0,
"failed" : 0
  },
  "hits" : {
"total" : {
  "value" : 245335,
  "relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
  {
"_index" : "books",
"_type" : "books",
"_id" : "30155366",
"_score" : 0.0
  },

It's dv field without index

$ curl -s
'localhost:39200/books/_mapping/field/subscription_0x1?pretty&include_defaults=true'
{
  "books" : {
"mappings" : {
  "subscription_0x1" : {
"full_name" : "subscription_0x1",
"mapping" : {
  "subscription_0x1" : {
"type" : "integer",
"boost" : 1.0,
"index" : false,
"store" : false,
"doc_values" : true,
"term_vector" : "no",
"norms" : false,
"eager_global_ordinals" : false,
"similarity" : "BM25",
"ignore_malformed" : false,
"coerce" : true,
"null_value" : null
  }
}
  }
}
  }
}



On Tue, Sep 10, 2019 at 4:55 PM Shawn Heisey  wrote:

> On 9/10/2019 7:15 AM, Doss wrote:
> > 4 to 5 million documents.
> >
> > For an NTR index, we need a field to be updated very frequently and
> filter
> > results based on it. Will In-Place updates help us?
> >
> >  > docValues="true" />
>
> Although you CAN search on docValues-only fields, the performance is
> terrible.  So the answer I have for you is "maybe, but you won't like
> it."  For good filtering performance, you need the field to be indexed.
> Which means you can't do in-place updates.
>
> Thanks,
> Shawn
>


-- 
Sincerely yours
Mikhail Khludnev


Re: In Place Updates: Can we filter on fields with only docValues="true"

2019-09-10 Thread Shawn Heisey

On 9/10/2019 7:15 AM, Doss wrote:

4 to 5 million documents.

For an NTR index, we need a field to be updated very frequently and filter
results based on it. Will In-Place updates help us?




Although you CAN search on docValues-only fields, the performance is 
terrible.  So the answer I have for you is "maybe, but you won't like 
it."  For good filtering performance, you need the field to be indexed. 
Which means you can't do in-place updates.


Thanks,
Shawn


Re: In Place Updates: Can we filter on fields with only docValues="true"

2019-09-10 Thread Mikhail Khludnev
It's worth a try. I know of folks who have built NRT systems on it. One
thing, and I might be wrong, but "pint" means point fields, which are hardly
compatible with in-place updates. It should be the simplest numeric type; if you
can debug Solr, check that it creates NumericDocValues, not sorted ones.
Those are updateable in place.
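
For reference, the shape of an in-place-updatable field per the ref guide (a
sketch; the field name is made up, and do verify the DocValues type as I said):

  <field name="status" type="pint" indexed="false" stored="false"
         docValues="true" multiValued="false" />

i.e. single-valued, numeric, docValues only, neither indexed nor stored.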

On Tue, Sep 10, 2019 at 4:15 PM Doss  wrote:

> Hi,
>
> 4 to 5 million documents.
>
> For an NTR index, we need a field to be updated very frequently and filter
> results based on it. Will In-Place updates help us?
>
>  docValues="true" />
>
>
> Thanks,
> Doss.
>


-- 
Sincerely yours
Mikhail Khludnev


In Place Updates: Can we filter on fields with only docValues="true"

2019-09-10 Thread Doss
Hi,

4 to 5 million documents.

For an NRT index, we need a field to be updated very frequently and to filter
results based on it. Will in-place updates help us?




Thanks,
Doss.


Re: Are docValues useful for FilterQueries?

2019-07-08 Thread Erick Erickson
DocValues are irrelevant for scoring. Here’s the way I think of it.

When querying (and thus scoring), you have a term X. I need to know
> what docs does it appear in?
> how many docs does it appear in?
> how often does the term appear in the entire corpus?

These are questions the inverted index (indexed=“true”) was designed
to answer.

For faceting, sorting and grouping, I want to know for a _document_,
what value appears in field Y. This is what docValues does much more
efficiently.
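
So for a field that only ever gets range-filtered, the thing that matters is
indexed="true"; docValues on top of that is optional insurance for later
faceting or sorting. Something like (a sketch; the name and type are
assumptions, since the original field definition didn't come through in the
mail):

  <field name="popularity" type="pfloat" indexed="true" stored="false" docValues="true"/>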

Best,
Erick
> On Jul 8, 2019, at 5:36 PM, Ashwin Ramesh  wrote:
> 
> Hi everybody,
> 
> I can't find concrete evidence whether docValues are indeed useful for
> filter queries. One example of a field:
> 
>  required="false" multiValued="false" />
> 
> This field will have a value between 0-1 The only usecase for this
> field is to filter on a range / subset of values. There will be no scoring
> / querying on this field. Is this a good usecase for docValues? Regards, Ash
> 
> -- 
> *P.S. We've launched a new blog to share the latest ideas and case studies 
> from our team. Check it out here: product.canva.com 
> <https://product.canva.com/>. ***
> ** <https://www.canva.com/>Empowering the 
> world to design
> Also, we're hiring. Apply here! 
> <https://about.canva.com/careers/>



Are docValues useful for FilterQueries?

2019-07-08 Thread Ashwin Ramesh
Hi everybody,

I can't find concrete evidence whether docValues are indeed useful for
filter queries. One example of a field:



This field will have a value between 0-1. The only use case for this
field is to filter on a range / subset of values. There will be no scoring
/ querying on this field. Is this a good use case for docValues?

Regards,
Ash

-- 
*P.S. We've launched a new blog to share the latest ideas and case studies 
from our team. Check it out here: product.canva.com 
<https://product.canva.com/>. ***
** <https://www.canva.com/>Empowering the 
world to design
Also, we're hiring. Apply here! 
<https://about.canva.com/careers/>








Re: Enabling/disabling docValues

2019-06-11 Thread John Davis
There is no way to match case-insensitively without a TextField plus no
tokenization. It's a long-standing limitation that you cannot apply any
analyzers to str fields.

Thanks for pointing out the re-index page; I've seen it. However, sometimes
it is hard to re-index in a reasonable amount of time and resources, and if
we empower power users to understand the system better, it will help them make
more informed tradeoffs.

On Tue, Jun 11, 2019 at 6:52 AM Gus Heck  wrote:

> On Mon, Jun 10, 2019 at 10:53 PM John Davis 
> wrote:
>
> > You have made many assumptions which might not always be realistic a)
> > TextField is always tokenized
>
>
> Well, you could of course change configuration or code to do something else
> but this would be a very odd and misleading thing to do and we would expect
> you to have mentioned it.
>
>
> > b) Users care about precise counts and
>
>
> This is indeed use case dependent if you are talking about approximately
> correct (150 vs 152 etc), but it's pretty reasonable to say that gross
> errors (75 vs 153 or 0 vs 5 etc) more or less make faceting pointless.
>
>
> > c) Users have the luxury or ability to do a full re-index anytime.
>
>
> This is a state of affairs we consistently advise against. The reason we
> give the advice is precisely because one cannot change the schema out from
> under an existing index safely without rewriting the index. Without
> extremely careful design on your side (not using certain features and high
> storage requirements), your index will not retain enough information to
> re-remake itself. Therefore, it is a long standing bad practice to not have
> a separate canonical copy of the data and a means to re-index it (or a
> design where only the very most recent data is important, and a copy of
> that). There is a whole page dedicated to reindexing in the ref guide:
> https://lucene.apache.org/solr/guide/8_0/reindexing.html Here's a relevant
> bit from the current version:
>
> `There is no process in Solr for programmatically reindexing data. When we
> say "reindex", we mean, literally, "index it again". However you got the
> data into the index the first time, you will run that process again. It is
> strongly recommended that Solr users index their data in a repeatable,
> consistent way, so that the process can be easily repeated when the need
> for reindexing arises.`
>
>
> The ref guide has lots of nice info, maybe you should read it rather than
> snubbing one of the nicest and most knowledgeable committers on the project
> (who is helping you for free) by haughtily saying you'll go ask someone
> else... And if you've been left with this situation (no ability to reindex)
> by your predecessor you have our deepest sympathies, but it still doesn't
> change the fact that you need break it to management the your predecessor
> has lost the data required to maintain the system and you still need
> re-index whatever you can salvage somehow, or start fresh.
>
> When Erick is saying you shouldn't be asking that question... >90% of the
> time you really shouldn't be, and if you do pursue it, you'll just waste a
> lot of your own time.
>
>
> > On Mon, Jun 10, 2019 at 10:55 AM Erick Erickson  >
> > wrote:
> >
> > > bq. Does lucene look at %docs in each state, or the first doc or
> > something
> > > else?
> > >
> > > Frankly I don’t care since no matter what, the results of faceting
> mixed
> > > definitions is not useful.
> > >
> > > tl;dr;
> > >
> > > “When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it
> > > means just what I choose it to mean — neither more nor less.’
> > >
> > > So “undefined" in this case means “I don’t see any value at all in
> > chasing
> > > that info down” ;).
> > >
> > > Changing from regular text to SortableText means that the results will
> be
> > > inaccurate no matter what. For example, I have a doc with the value “my
> > dog
> > > has fleas”. When NOT using SortableText, there are multiple tokens so
> > facet
> > > counts would be:
> > >
> > > my (1)
> > > dog (1)
> > > has (1)
> > > fleas (1)
> > >
> > > But for SortableText will be:
> > >
> > > my dog has fleas (1)
> > >
> > > Consider doc1 with “my dog has fleas” and doc2 with “my cat has fleas”.
> > > doc1 was  indexed before switching to SortableText and doc2 after.
> > > Presumably  the output you want is:
> > >
> > > my dog has fleas (1)
> > > my ca

Re: Enabling/disabling docValues

2019-06-11 Thread Gus Heck
On Mon, Jun 10, 2019 at 10:53 PM John Davis 
wrote:

> You have made many assumptions which might not always be realistic a)
> TextField is always tokenized


Well, you could of course change configuration or code to do something else
but this would be a very odd and misleading thing to do and we would expect
you to have mentioned it.


> b) Users care about precise counts and


This is indeed use case dependent if you are talking about approximately
correct (150 vs 152 etc), but it's pretty reasonable to say that gross
errors (75 vs 153 or 0 vs 5 etc) more or less make faceting pointless.


> c) Users have the luxury or ability to do a full re-index anytime.


This is a state of affairs we consistently advise against. The reason we
give the advice is precisely because one cannot change the schema out from
under an existing index safely without rewriting the index. Without
extremely careful design on your side (not using certain features and high
storage requirements), your index will not retain enough information to
remake itself. Therefore, it is a long-standing bad practice to not have
a separate canonical copy of the data and a means to re-index it (or a
design where only the very most recent data is important, and a copy of
that). There is a whole page dedicated to reindexing in the ref guide:
https://lucene.apache.org/solr/guide/8_0/reindexing.html Here's a relevant
bit from the current version:

`There is no process in Solr for programmatically reindexing data. When we
say "reindex", we mean, literally, "index it again". However you got the
data into the index the first time, you will run that process again. It is
strongly recommended that Solr users index their data in a repeatable,
consistent way, so that the process can be easily repeated when the need
for reindexing arises.`


The ref guide has lots of nice info, maybe you should read it rather than
snubbing one of the nicest and most knowledgeable committers on the project
(who is helping you for free) by haughtily saying you'll go ask someone
else... And if you've been left with this situation (no ability to reindex)
by your predecessor you have our deepest sympathies, but it still doesn't
change the fact that you need break it to management the your predecessor
has lost the data required to maintain the system and you still need
re-index whatever you can salvage somehow, or start fresh.

When Erick is saying you shouldn't be asking that question... >90% of the
time you really shouldn't be, and if you do pursue it, you'll just waste a
lot of your own time.


> On Mon, Jun 10, 2019 at 10:55 AM Erick Erickson 
> wrote:
>
> > bq. Does lucene look at %docs in each state, or the first doc or
> something
> > else?
> >
> > Frankly I don’t care since no matter what, the results of faceting mixed
> > definitions is not useful.
> >
> > tl;dr;
> >
> > “When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it
> > means just what I choose it to mean — neither more nor less.’
> >
> > So “undefined" in this case means “I don’t see any value at all in
> chasing
> > that info down” ;).
> >
> > Changing from regular text to SortableText means that the results will be
> > inaccurate no matter what. For example, I have a doc with the value “my
> dog
> > has fleas”. When NOT using SortableText, there are multiple tokens so
> facet
> > counts would be:
> >
> > my (1)
> > dog (1)
> > has (1)
> > fleas (1)
> >
> > But for SortableText will be:
> >
> > my dog has fleas (1)
> >
> > Consider doc1 with “my dog has fleas” and doc2 with “my cat has fleas”.
> > doc1 was  indexed before switching to SortableText and doc2 after.
> > Presumably  the output you want is:
> >
> > my dog has fleas (1)
> > my cat has fleas (1)
> >
> > But you can’t get that output.  There are three cases:
> >
> > 1> Lucene treats all documents as SortableText, faceting on the docValues
> > parts. No facets on doc1
> >
> > my  cat has fleas (1)
> >
> > 2> Lucene treats all documents as tokenized, faceting on each individual
> > token. Faceting is performed on the tokenized content of both,  docValues
> > in doc2  ignored
> >
> > my  (2)
> > dog (1)
> > has (2)
> > fleas (2)
> > cat (1)
> >
> >
> > 3> Lucene does the best it can, faceting on the tokens for docs without
> > SortableText and docValues if the doc was indexed with Sortable text.
> doc1
> > faceted on tokenized, doc2 on docValues
> >
> > my  (1)
> > dog (1)
> > has (1)
> > fleas (1)
> > my cat has fleas (1)
> >
> > Since none of those

Re: Enabling/disabling docValues

2019-06-10 Thread John Davis
You have made many assumptions which might not always be realistic: a)
TextField is always tokenized, b) users care about precise counts, and c)
users have the luxury or ability to do a full re-index anytime. These are
real issues and there is no black/white solution. I will ask the Lucene folks
about the actual implementation.

On Mon, Jun 10, 2019 at 10:55 AM Erick Erickson 
wrote:

> bq. Does lucene look at %docs in each state, or the first doc or something
> else?
>
> Frankly I don’t care since no matter what, the results of faceting mixed
> definitions is not useful.
>
> tl;dr;
>
> “When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it
> means just what I choose it to mean — neither more nor less.’
>
> So “undefined" in this case means “I don’t see any value at all in chasing
> that info down” ;).
>
> Changing from regular text to SortableText means that the results will be
> inaccurate no matter what. For example, I have a doc with the value “my dog
> has fleas”. When NOT using SortableText, there are multiple tokens so facet
> counts would be:
>
> my (1)
> dog (1)
> has (1)
> fleas (1)
>
> But for SortableText will be:
>
> my dog has fleas (1)
>
> Consider doc1 with “my dog has fleas” and doc2 with “my cat has fleas”.
> doc1 was  indexed before switching to SortableText and doc2 after.
> Presumably  the output you want is:
>
> my dog has fleas (1)
> my cat has fleas (1)
>
> But you can’t get that output.  There are three cases:
>
> 1> Lucene treats all documents as SortableText, faceting on the docValues
> parts. No facets on doc1
>
> my  cat has fleas (1)
>
> 2> Lucene treats all documents as tokenized, faceting on each individual
> token. Faceting is performed on the tokenized content of both,  docValues
> in doc2  ignored
>
> my  (2)
> dog (1)
> has (2)
> fleas (2)
> cat (1)
>
>
> 3> Lucene does the best it can, faceting on the tokens for docs without
> SortableText and docValues if the doc was indexed with Sortable text. doc1
> faceted on tokenized, doc2 on docValues
>
> my  (1)
> dog (1)
> has (1)
> fleas (1)
> my cat has fleas (1)
>
> Since none of those cases is what I want, there’s no point I can see in
> chasing down what actually happens….
>
> Best,
> Erick
>
> P.S. I _think_ Lucene tries to use the definition from the first segment,
> but since whether the lists of segments to be  merged don’t look at the
> field definitions at all. Whether the first segment in the list has
> SortableText or not will not be predictable in a general way even within a
> single run.
>
>
> > On Jun 9, 2019, at 6:53 PM, John Davis 
> wrote:
> >
> > Understood, however code is rarely random/undefined. Does lucene look at
> %
> > docs in each state, or the first doc or something else?
> >
> > On Sun, Jun 9, 2019 at 1:58 PM Erick Erickson 
> > wrote:
> >
> >> It’s basically undefined. When segments are merged that have dissimilar
> >> definitions like this what can Lucene do? Consider:
> >>
> >> Faceting on a text (not sortable) means that each individual token in
> the
> >> index is uninverted on the Java heap and the facets are computed for
> each
> >> individual term.
> >>
> >> Faceting on a SortableText field just has a single term per document,
> and
> >> that in the docValues structures as opposed to the inverted index.
> >>
> >> Now you change the value and start indexing. At some point a segment
> >> containing no docValues is merged with a segment containing docValues
> for
> >> the field. The resulting mixed segment is in this state. If you facet on
> >> the field, should the docs without docValues have each individual term
> >> counted? Or just the SortableText values in the docValues structure?
> >> Neither one is right.
> >>
> >> Also remember that Lucene has no notion of schema. That’s entirely
> imposed
> >> on Lucene by Solr carefully constructing low-level analysis chains.
> >>
> >> So I’d _strongly_ recommend you re-index your corpus to a new collection
> >> with the current definition, then perhaps use CREATEALIAS to seamlessly
> >> switch.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Jun 9, 2019, at 12:50 PM, John Davis 
> >> wrote:
> >>>
> >>> Hi there,
> >>> We recently changed a field from TextField + no docValues to
> >>> SortableTextField which has docValues enabled by default. Once I did
> >> this I
> >>> do not see any facet values for the field. I know that once all the
> docs
> >>> are re-indexed facets should work again, however can someone clarify
> the
> >>> current logic of lucene/solr how facets will be computed when schema is
> >>> changed from no docValues to docValues and vice-versa?
> >>>
> >>> 1. Until ALL the docs are re-indexed, no facets will be returned?
> >>> 2. Once certain fraction of docs are re-indexed, those facets will be
> >>> returned?
> >>> 3. Something else?
> >>>
> >>>
> >>> Varun
> >>
> >>
>
>


Re: Enabling/disabling docValues

2019-06-10 Thread Erick Erickson
bq. Does lucene look at %docs in each state, or the first doc or something else?

Frankly I don’t care since no matter what, the results of faceting mixed 
definitions is not useful.

tl;dr;

“When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means 
just what I choose it to mean — neither more nor less.’

So “undefined" in this case means “I don’t see any value at all in chasing that 
info down” ;).

Changing from regular text to SortableText means that the results will be 
inaccurate no matter what. For example, I have a doc with the value “my dog has 
fleas”. When NOT using SortableText, there are multiple tokens so facet counts 
would be:

my (1)
dog (1)
has (1)
fleas (1)

But for SortableText will be:

my dog has fleas (1)

Consider doc1 with “my dog has fleas” and doc2 with “my cat has fleas”. doc1 
was  indexed before switching to SortableText and doc2 after. Presumably  the 
output you want is:

my dog has fleas (1)
my cat has fleas (1)

But you can’t get that output.  There are three cases:

1> Lucene treats all documents as SortableText, faceting on the docValues 
parts. No facets on doc1

my  cat has fleas (1) 

2> Lucene treats all documents as tokenized, faceting on each individual token. 
Faceting is performed on the tokenized content of both,  docValues in doc2  
ignored

my  (2)
dog (1)
has (2)
fleas (2)
cat (1)


3> Lucene does the best it can, faceting on the tokens for docs without 
SortableText and docValues if the doc was indexed with Sortable text. doc1 
faceted on tokenized, doc2 on docValues

my  (1)
dog (1)
has (1)
fleas (1)
my cat has fleas (1)

Since none of those cases is what I want, there’s no point I can see in chasing 
down what actually happens….

Best,
Erick

P.S. I _think_ Lucene tries to use the definition from the first segment, but
the choice of which segments get merged doesn't look at the field
definitions at all, so whether the first segment in the list has SortableText or
not will not be predictable in a general way, even within a single run.


> On Jun 9, 2019, at 6:53 PM, John Davis  wrote:
> 
> Understood, however code is rarely random/undefined. Does lucene look at %
> docs in each state, or the first doc or something else?
> 
> On Sun, Jun 9, 2019 at 1:58 PM Erick Erickson 
> wrote:
> 
>> It’s basically undefined. When segments are merged that have dissimilar
>> definitions like this what can Lucene do? Consider:
>> 
>> Faceting on a text (not sortable) means that each individual token in the
>> index is uninverted on the Java heap and the facets are computed for each
>> individual term.
>> 
>> Faceting on a SortableText field just has a single term per document, and
>> that in the docValues structures as opposed to the inverted index.
>> 
>> Now you change the value and start indexing. At some point a segment
>> containing no docValues is merged with a segment containing docValues for
>> the field. The resulting mixed segment is in this state. If you facet on
>> the field, should the docs without docValues have each individual term
>> counted? Or just the SortableText values in the docValues structure?
>> Neither one is right.
>> 
>> Also remember that Lucene has no notion of schema. That’s entirely imposed
>> on Lucene by Solr carefully constructing low-level analysis chains.
>> 
>> So I’d _strongly_ recommend you re-index your corpus to a new collection
>> with the current definition, then perhaps use CREATEALIAS to seamlessly
>> switch.
>> 
>> Best,
>> Erick
>> 
>>> On Jun 9, 2019, at 12:50 PM, John Davis 
>> wrote:
>>> 
>>> Hi there,
>>> We recently changed a field from TextField + no docValues to
>>> SortableTextField which has docValues enabled by default. Once I did
>> this I
>>> do not see any facet values for the field. I know that once all the docs
>>> are re-indexed facets should work again, however can someone clarify the
>>> current logic of lucene/solr how facets will be computed when schema is
>>> changed from no docValues to docValues and vice-versa?
>>> 
>>> 1. Until ALL the docs are re-indexed, no facets will be returned?
>>> 2. Once certain fraction of docs are re-indexed, those facets will be
>>> returned?
>>> 3. Something else?
>>> 
>>> 
>>> Varun
>> 
>> 



Re: Enabling/disabling docValues

2019-06-09 Thread John Davis
Understood, however code is rarely random/undefined. Does lucene look at %
docs in each state, or the first doc or something else?

On Sun, Jun 9, 2019 at 1:58 PM Erick Erickson 
wrote:

> It’s basically undefined. When segments are merged that have dissimilar
> definitions like this what can Lucene do? Consider:
>
> Faceting on a text (not sortable) means that each individual token in the
> index is uninverted on the Java heap and the facets are computed for each
> individual term.
>
> Faceting on a SortableText field just has a single term per document, and
> that in the docValues structures as opposed to the inverted index.
>
> Now you change the value and start indexing. At some point a segment
> containing no docValues is merged with a segment containing docValues for
> the field. The resulting mixed segment is in this state. If you facet on
> the field, should the docs without docValues have each individual term
> counted? Or just the SortableText values in the docValues structure?
> Neither one is right.
>
> Also remember that Lucene has no notion of schema. That’s entirely imposed
> on Lucene by Solr carefully constructing low-level analysis chains.
>
> So I’d _strongly_ recommend you re-index your corpus to a new collection
> with the current definition, then perhaps use CREATEALIAS to seamlessly
> switch.
>
> Best,
> Erick
>
> > On Jun 9, 2019, at 12:50 PM, John Davis 
> wrote:
> >
> > Hi there,
> > We recently changed a field from TextField + no docValues to
> > SortableTextField which has docValues enabled by default. Once I did
> this I
> > do not see any facet values for the field. I know that once all the docs
> > are re-indexed facets should work again, however can someone clarify the
> > current logic of lucene/solr how facets will be computed when schema is
> > changed from no docValues to docValues and vice-versa?
> >
> > 1. Until ALL the docs are re-indexed, no facets will be returned?
> > 2. Once certain fraction of docs are re-indexed, those facets will be
> > returned?
> > 3. Something else?
> >
> >
> > Varun
>
>


Re: Enabling/disabling docValues

2019-06-09 Thread Erick Erickson
It’s basically undefined. When segments are merged that have dissimilar 
definitions like this what can Lucene do? Consider:

Faceting on a text (not sortable) means that each individual token in the index 
is uninverted on the Java heap and the facets are computed for each individual 
term.

Faceting on a SortableText field just has a single term per document, and that 
in the docValues structures as opposed to the inverted index.

Now you change the value and start indexing. At some point a segment containing 
no docValues is merged with a segment containing docValues for the field. The 
resulting mixed segment is in this state. If you facet on the field, should the 
docs without docValues have each individual term counted? Or just the 
SortableText values in the docValues structure? Neither one is right.

Also remember that Lucene has no notion of schema. That’s entirely imposed on 
Lucene by Solr carefully constructing low-level analysis chains.

So I’d _strongly_ recommend you re-index your corpus to a new collection with 
the current definition, then perhaps use CREATEALIAS to seamlessly switch.

Best,
Erick

> On Jun 9, 2019, at 12:50 PM, John Davis  wrote:
> 
> Hi there,
> We recently changed a field from TextField + no docValues to
> SortableTextField which has docValues enabled by default. Once I did this I
> do not see any facet values for the field. I know that once all the docs
> are re-indexed facets should work again, however can someone clarify the
> current logic of lucene/solr how facets will be computed when schema is
> changed from no docValues to docValues and vice-versa?
> 
> 1. Until ALL the docs are re-indexed, no facets will be returned?
> 2. Once certain fraction of docs are re-indexed, those facets will be
> returned?
> 3. Something else?
> 
> 
> Varun
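
A minimal sketch of the reindex-and-switch approach described above, using the
Collections API (collection, alias and configset names are placeholders):

# 1. Create a new collection that uses the corrected schema/configset
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=products_v2&numShards=2&replicationFactor=2&collection.configName=products_conf_v2'

# 2. Reindex every document into products_v2 from the system of record

# 3. Atomically point the alias that clients query at the new collection
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2'

Queries against the "products" alias then hit the freshly built collection
without any client-side change.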



Enabling/disabling docValues

2019-06-09 Thread John Davis
Hi there,
We recently changed a field from TextField + no docValues to
SortableTextField which has docValues enabled by default. Once I did this I
do not see any facet values for the field. I know that once all the docs
are re-indexed facets should work again, however can someone clarify the
current logic of lucene/solr how facets will be computed when schema is
changed from no docValues to docValues and vice-versa?

1. Until ALL the docs are re-indexed, no facets will be returned?
2. Once certain fraction of docs are re-indexed, those facets will be
returned?
3. Something else?


Varun


Re: Slow faceting performance on a docValues field

2019-05-10 Thread gulats
Maybe quite late to the party, but for the benefit of future readers:
experimenting with facet.range.method might be helpful (for Solr versions
6 and above), as it allows docValues to be used for range faceting as well.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
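
A hedged example of what that looks like in practice (collection and field
names are made up); facet.range.method=dv asks Solr to compute the range facet
from docValues instead of per-range filter queries:

curl 'http://localhost:8983/solr/products/select?q=*:*&rows=0&facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=100&facet.range.method=dv'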


Re: Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct

2019-04-10 Thread Erick Erickson
"Re-index with correct docvalues”. I.e. define weight to have docValues=true in 
your schema. WARNING: you have to totally get rid of your current data, I’d 
recommend starting with a new collection.

> On Apr 10, 2019, at 12:21 AM, Alex Broitman  
> wrote:
> 
> We got the Solr exception when searching in Solr:
>  
> SolrNet.Exceptions.SolrConnectionException:  encoding="UTF-8"?>
> 
> true name="status">500160 name="hl">true name="fl">vid:def(rid,id),name,nls_NAME___en-us,nls_NAME_NLS_KEY,txt_display_name,sysid  name="hl.requireFieldMatch">true0 name="hl.usePhraseHighlighter">truegid:(0 
> 21)-(+type:3 -recipients:5164077)-disabled_types:(16 
> 1024 2048){!acls user="5164077" gid="21" group="34" pcid="6" 
> ecid="174"}20 name="version">2.2+(Dashboard Dashboard*) name="defType">edismaxDashboard name="qf">name nls_NAME___en-ustrue name="boost">product(sum(1,product(norm(acl_i),termfreq(acl_i,5164077))),if(exists(weight),weight,1))  name="hl.fl">sysid1 name="spellcheck.collate">true name="msg">unexpected docvalues type NUMERIC for field 'weight' (expected one 
> of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with 
> correct docvalues type. name="trace">java.lang.IllegalStateException: unexpected docvalues type 
> NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, 
> SORTED_NUMERIC, SORTED_SET]). Re-index with correct docvalues type.
> at 
> org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
> at 
> org.apache.lucene.index.DocValues.getDocsWithField(DocValues.java:324)
> at 
> org.apache.lucene.queries.function.valuesource.FloatFieldSource.getValues(FloatFieldSource.java:56)
> at 
> org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:48)
> at 
> org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:35)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.getValues(IfFunction.java:47)
> at 
> org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
> at 
> org.apache.lucene.queries.function.BoostedQuery$CustomScorer.<init>(BoostedQuery.java:124)
> at 
> org.apache.lucene.queries.function.BoostedQuery$CustomScorer.<init>(BoostedQuery.java:114)
> at 
> org.apache.lucene.queries.function.BoostedQuery$BoostedWeight.scorer(BoostedQuery.java:98)
> at 
> org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
> at 
> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
> at 
> org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
> at 
> org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
> at 
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:217)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1582)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1399)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:566)
> at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:545)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> at 
> org.eclipse.jetty.servlet.S
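
A sketch of the schema change Erick describes, assuming the default Solr 7
pfloat type (adjust to whatever type weight actually uses):

<field name="weight" type="pfloat" indexed="false" stored="false" docValues="true"/>

With docValues enabled, function queries such as if(exists(weight),weight,1)
can read the value directly; as noted above, the existing data must be wiped
and reindexed (ideally into a fresh collection) for the change to take effect.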

Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct doc

2019-04-10 Thread Alex Broitman
We got the Solr exception when searching in Solr:

SolrNet.Exceptions.SolrConnectionException: 

true500160truevid:def(rid,id),name,nls_NAME___en-us,nls_NAME_NLS_KEY,txt_display_name,sysidtrue0truegid:(0 
21)-(+type:3 -recipients:5164077)-disabled_types:(16 1024 
2048){!acls user="5164077" gid="21" group="34" pcid="6" 
ecid="174"}202.2+(Dashboard Dashboard*)edismaxDashboardname nls_NAME___en-ustrueproduct(sum(1,product(norm(acl_i),termfreq(acl_i,5164077))),if(exists(weight),weight,1))sysid1trueunexpected docvalues type NUMERIC for field 'weight' (expected one 
of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with 
correct docvalues type.java.lang.IllegalStateException: 
unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, 
NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct docvalues 
type.
at 
org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
at 
org.apache.lucene.index.DocValues.getDocsWithField(DocValues.java:324)
at 
org.apache.lucene.queries.function.valuesource.FloatFieldSource.getValues(FloatFieldSource.java:56)
at 
org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:48)
at 
org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:35)
at 
org.apache.lucene.queries.function.valuesource.IfFunction.getValues(IfFunction.java:47)
at 
org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
at 
org.apache.lucene.queries.function.BoostedQuery$CustomScorer.<init>(BoostedQuery.java:124)
at 
org.apache.lucene.queries.function.BoostedQuery$CustomScorer.<init>(BoostedQuery.java:114)
at 
org.apache.lucene.queries.function.BoostedQuery$BoostedWeight.scorer(BoostedQuery.java:98)
at 
org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at 
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at 
org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
at 
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at 
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:217)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1582)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1399)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:566)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:545)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:

Re: DocValues or stored fields to enable atomic updates

2019-04-05 Thread Emir Arnautović
Hi Andreas,
Stored values are compressed, so they should take less disk. I am thinking that
docValues might perform better when it comes to executing atomic updates.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Apr 2019, at 12:54, Andreas Hubold  wrote:
> 
> Hi,
> 
> I have a question on schema design: If a single-valued StrField is just used 
> for filtering results by exact value (indexed=true) and its value isn't 
> needed in the search result and not for sorting, faceting or highlighting - 
> should I use docValues=true or stored=true to enable atomic updates? Or even 
> both? I understand that either docValues or stored fields are needed for 
> atomic updates but which of the two would perform better / consume less 
> resources in this scenario?
> 
> Thank you.
> 
> Best regards,
> Andreas
> 
> 
> 
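
For reference, a minimal atomic-update request looks like the following (JSON
update syntax; collection, id and field names are placeholders):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update?commit=true' \
  -d '[{"id":"doc1","category":{"set":"books"}}]'

To apply the "set", Solr reconstructs the whole document from stored and/or
docValues fields, which is why every field has to be recoverable from one of
the two for atomic updates to work.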



DocValues or stored fields to enable atomic updates

2019-04-05 Thread Andreas Hubold

Hi,

I have a question on schema design: if a single-valued StrField is just
used for filtering results by exact value (indexed=true) and its value
isn't needed in the search result, nor for sorting, faceting or
highlighting - should I use docValues=true or stored=true to enable
atomic updates? Or even both? I understand that either docValues or
stored fields are needed for atomic updates, but which of the two would
perform better / consume fewer resources in this scenario?


Thank you.

Best regards,
Andreas





Re: Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread Erick Erickson
Looks like: https://issues.apache.org/jira/browse/SOLR-11728

> On Apr 3, 2019, at 1:09 AM, JiaJun Zhu  wrote:
> 
> Hello,
> 
> 
> I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform a
> group facet on an IntPointField. Debugging into the source code, the cause is
> that internally the docvalue type for PointField is "NUMERIC" (single value)
> or "SORTED_NUMERIC" (multi value), while the TermGroupFacetCollector class
> requires that the facet field have a "SORTED" or "SORTED_SET" docvalue type:
> https://github.com/apache/lucene-solr/blob/2480b74887eff01f729d62a57b415d772f947c91/lucene/grouping/src/java/org/apache/lucene/search/grouping/TermGroupFacetCollector.java#L313
> 
> When I change the schema for all int fields to TrieIntField, the group facet
> then works, since internally the docvalue type for TrieField is SORTED (single
> value) or SORTED_SET (multi value).
> 
> Given that TrieField is deprecated in Solr 7, can someone help with this
> grouping facet issue for PointField? I also commented on this issue in
> SOLR-7495.
> 
> 
> Thanks.
> 
> 
> 
> Best regards,
> 
> JiaJun
> Manager Technology
> Alexander Street, a ProQuest Company
> No. 201 NingXia Road, Room 6J Shanghai China P.R.
> 200063



Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread JiaJun Zhu
Hello,


I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform a
group facet on an IntPointField. Debugging into the source code, the cause is
that internally the docvalue type for PointField is "NUMERIC" (single value) or
"SORTED_NUMERIC" (multi value), while the TermGroupFacetCollector class
requires that the facet field have a "SORTED" or "SORTED_SET" docvalue type:
https://github.com/apache/lucene-solr/blob/2480b74887eff01f729d62a57b415d772f947c91/lucene/grouping/src/java/org/apache/lucene/search/grouping/TermGroupFacetCollector.java#L313

When I change the schema for all int fields to TrieIntField, the group facet
then works, since internally the docvalue type for TrieField is SORTED (single
value) or SORTED_SET (multi value).

Given that TrieField is deprecated in Solr 7, can someone help with this
grouping facet issue for PointField? I also commented on this issue in
SOLR-7495.


Thanks.



Best regards,

JiaJun
Manager Technology
Alexander Street, a ProQuest Company
No. 201 NingXia Road, Room 6J Shanghai China P.R.
200063


Re: uniqueKey and docValues?

2018-11-23 Thread Mikhail Khludnev
It makes sense to have docValues=true for _root_ for uniqueBlock().

On Thu, Nov 22, 2018 at 6:44 PM Vincenzo D'Amore  wrote:

> Hi guys, this is an interesting thread.
>
> Looking at schema.xml I found having uniqueKey (type="string") configured
> as docValues="true" but, I also found that _root_ is
> configured docValues="false"
>
> Are there any drawbacks to having _root_ with docValues="false"?
>
>
> On Thu, Nov 22, 2018 at 12:28 AM Erick Erickson 
> wrote:
>
> > In  SolrCloud there are a couple of places where it might be useful.
> > First pass each replica collects the top N ids for the aggregator to
> > sort. If the uniqueKey isn't DV, it  needs to either decompress it off
> > disk or build a structure on heap if it's not DV. Last I knew anyway.
> >
> > Best,
> > Erick
> > On Wed, Nov 21, 2018 at 12:04 PM Walter Underwood  >
> > wrote:
> > >
> > > Is it a good idea to store the uniqueKey as docValues? A great idea? A
> > maybe or maybe not idea?
> > >
> > > It looks like it will speed up export and streaming. Otherwise, I can’t
> > find anything the docs pro or con.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> >
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev
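
A rough illustration of the uniqueBlock() point, using the JSON Facet API
(collection and field names are invented); _root_ is read for every bucket, so
it benefits from docValues:

curl 'http://localhost:8983/solr/products/select' -d 'q=*:*' -d 'rows=0' \
  -d 'json.facet={skus_by_color:{type:terms,field:color_s,facet:{products:"uniqueBlock(_root_)"}}}'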


Re: uniqueKey and docValues?

2018-11-22 Thread Walter Underwood
/export needs all fields to be docValues. If you are going to export, including 
the id seems like a good idea.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 22, 2018, at 6:51 PM, Erick Erickson  wrote:
> 
> I doubt it matters. The only point for docValues is to speed up
> situations where you want to answer the question "for docX, what is
> the value of fieldY"? Unless you're doing something interesting with
> the _route_ field, it's only used to, well, route documents at index
> time. By "interesting", I'm talking grouping, faceting and sorting.
> Are you doing any of those things on the _route_ field? If not you
> might as well save the disk space by leaving docValues=false.
> 
> Best,
> Erick
> On Thu, Nov 22, 2018 at 7:44 AM Vincenzo D'Amore  wrote:
>> 
>> Hi guys, this is an interesting thread.
>> 
>> Looking at schema.xml I found having uniqueKey (type="string") configured
>> as docValues="true" but, I also found that _root_ is
>> configured docValues="false"
>> 
>> Are there any drawbacks to having _root_ with docValues="false"?
>> 
>> 
>> On Thu, Nov 22, 2018 at 12:28 AM Erick Erickson 
>> wrote:
>> 
>>> In  SolrCloud there are a couple of places where it might be useful.
>>> First pass each replica collects the top N ids for the aggregator to
>>> sort. If the uniqueKey isn't DV, it  needs to either decompress it off
>>> disk or build a structure on heap if it's not DV. Last I knew anyway.
>>> 
>>> Best,
>>> Erick
>>> On Wed, Nov 21, 2018 at 12:04 PM Walter Underwood 
>>> wrote:
>>>> 
>>>> Is it a good idea to store the uniqueKey as docValues? A great idea? A
>>> maybe or maybe not idea?
>>>> 
>>>> It looks like it will speed up export and streaming. Otherwise, I can’t
>>> find anything the docs pro or con.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>> 
>> 
>> 
>> --
>> Vincenzo D'Amore
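
A concrete (made-up) example of such an export; every field named in fl and
sort here must have docValues, including id:

curl 'http://localhost:8983/solr/products/export?q=*:*&sort=id+asc&fl=id,price'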



Re: uniqueKey and docValues?

2018-11-22 Thread Erick Erickson
I doubt it matters. The only point for docValues is to speed up
situations where you want to answer the question "for docX, what is
the value of fieldY"? Unless you're doing something interesting with
the _route_ field, it's only used to, well, route documents at index
time. By "interesting", I'm talking grouping, faceting and sorting.
Are you doing any of those things on the _route_ field? If not you
might as well save the disk space by leaving docValues=false.

Best,
Erick
On Thu, Nov 22, 2018 at 7:44 AM Vincenzo D'Amore  wrote:
>
> Hi guys, this is an interesting thread.
>
> Looking at schema.xml I found having uniqueKey (type="string") configured
> as docValues="true" but, I also found that _root_ is
> configured docValues="false"
>
> Are there any drawbacks to having _root_ with docValues="false"?
>
>
> On Thu, Nov 22, 2018 at 12:28 AM Erick Erickson 
> wrote:
>
> > In  SolrCloud there are a couple of places where it might be useful.
> > First pass each replica collects the top N ids for the aggregator to
> > sort. If the uniqueKey isn't DV, it  needs to either decompress it off
> > disk or build a structure on heap if it's not DV. Last I knew anyway.
> >
> > Best,
> > Erick
> > On Wed, Nov 21, 2018 at 12:04 PM Walter Underwood 
> > wrote:
> > >
> > > Is it a good idea to store the uniqueKey as docValues? A great idea? A
> > maybe or maybe not idea?
> > >
> > > It looks like it will speed up export and streaming. Otherwise, I can’t
> > find anything the docs pro or con.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> >
>
>
> --
> Vincenzo D'Amore


Re: uniqueKey and docValues?

2018-11-22 Thread Vincenzo D'Amore
Hi guys, this is an interesting thread.

Looking at schema.xml, I found that uniqueKey (type="string") is configured
with docValues="true", but I also found that _root_ is
configured with docValues="false".

Are there any drawbacks to having _root_ with docValues="false"?


On Thu, Nov 22, 2018 at 12:28 AM Erick Erickson 
wrote:

> In  SolrCloud there are a couple of places where it might be useful.
> First pass each replica collects the top N ids for the aggregator to
> sort. If the uniqueKey isn't DV, it  needs to either decompress it off
> disk or build a structure on heap if it's not DV. Last I knew anyway.
>
> Best,
> Erick
> On Wed, Nov 21, 2018 at 12:04 PM Walter Underwood 
> wrote:
> >
> > Is it a good idea to store the uniqueKey as docValues? A great idea? A
> maybe or maybe not idea?
> >
> > It looks like it will speed up export and streaming. Otherwise, I can’t
> find anything the docs pro or con.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
>


-- 
Vincenzo D'Amore


Re: uniqueKey and docValues?

2018-11-21 Thread Erick Erickson
In SolrCloud there are a couple of places where it might be useful.
First pass, each replica collects the top N ids for the aggregator to
sort. If the uniqueKey isn't DV, it needs to either decompress it off
disk or build a structure on heap if it's not DV. Last I knew anyway.

Best,
Erick
On Wed, Nov 21, 2018 at 12:04 PM Walter Underwood  wrote:
>
> Is it a good idea to store the uniqueKey as docValues? A great idea? A maybe 
> or maybe not idea?
>
> It looks like it will speed up export and streaming. Otherwise, I can’t find 
> anything the docs pro or con.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>


uniqueKey and docValues?

2018-11-21 Thread Walter Underwood
Is it a good idea to store the uniqueKey as docValues? A great idea? A maybe or 
maybe not idea?

It looks like it will speed up export and streaming. Otherwise, I can’t find 
anything the docs pro or con.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
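
For concreteness, a uniqueKey declaration with docValues enabled might look
like this (a sketch only; stored="true" is kept as well, which is the common
setup for the uniqueKey):

<field name="id" type="string" indexed="true" stored="true" docValues="true" required="true"/>
<uniqueKey>id</uniqueKey>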



Re: Able to search with indexed=false and docvalues=true

2018-11-21 Thread Toke Eskildsen
On Tue, 2018-11-20 at 21:17 -0700, Shawn Heisey wrote:
> Maybe the error condition should be related to a new schema
> property, something like allowQueryOnDocValues.  This would default
> to true with current schema versions and false in the next schema
> version, which I think is 1.7.  Then a user could choose to allow
> them on a field-by-field basis, by reading documentation that
> outlines the severe performance disadvantages.

I support the idea of making such choices explicit and your suggestion
sounds like a sensible way to do it.


I have toyed with a related idea: Issue a query with debug=sanity and
get a report from checks on both the underlying index and the issued
query for indicators of problems: 
https://github.com/tokee/lucene-solr/issues/54

- Toke Eskildsen, Royal Danish Library




Re: Able to search with indexed=false and docvalues=true

2018-11-20 Thread Shawn Heisey

On 11/20/2018 8:18 PM, Rahul Goswami wrote:

Erick and Toke,

Thank you for the replies. I am surprised there already isn’t a JIRA for
this. In my opinion, this should be an error condition on search or
alternatively should simply be giving zero results. That would be a defined
behavior as opposed to now, where the searches are not particularly
functional for any industry size load anyway.


It wouldn't be a good idea to turn that into an error condition, at 
least not in any 7.x version.  There could be a lot of users out there 
who are unknowingly relying on that functionality, and would be very 
surprised to find their index doesn't work any more when they upgrade.  
It's slow, but maybe they have very small indexes.


Maybe the error condition should be related to a new schema property, 
something like allowQueryOnDocValues.  This would default to true with 
current schema versions and false in the next schema version, which I 
think is 1.7.  Then a user could choose to allow them on a 
field-by-field basis, by reading documentation that outlines the severe 
performance disadvantages.


Thanks,
Shawn



Re: Able to search with indexed=false and docvalues=true

2018-11-20 Thread Rahul Goswami
Erick and Toke,

Thank you for the replies. I am surprised there already isn’t a JIRA for
this. In my opinion, this should be an error condition on search or
alternatively should simply be giving zero results. That would be a defined
behavior as opposed to now, where the searches are not particularly
functional for any industry size load anyway.

Thanks,
Rahul

On Tue, Nov 20, 2018 at 3:37 AM Toke Eskildsen  wrote:

> On Mon, 2018-11-19 at 22:19 -0500, Rahul Goswami wrote:
> > I am using SolrCloud 7.2.1. My understanding is that setting
> > docvalues=true would optimize faceting, grouping and sorting; but for
> > a field to be searchable it needs to be indexed=true.
>
> Erick explained the search thing, so I'll just note that faceting on a
> DocValues=true indexed=false field on a multi-shard index also has a
> performance penalty as the field will be slow-searched (using the
> DocValues) in the secondary fine-counting phase.
>
> - Toke Eskildsen, Royal Danish Library
>
>
>


Re: Able to search with indexed=false and docvalues=true

2018-11-20 Thread Toke Eskildsen
On Mon, 2018-11-19 at 22:19 -0500, Rahul Goswami wrote:
> I am using SolrCloud 7.2.1. My understanding is that setting
> docvalues=true would optimize faceting, grouping and sorting; but for
> a field to be searchable it needs to be indexed=true.

Erick explained the search thing, so I'll just note that faceting on a
DocValues=true indexed=false field on a multi-shard index also has a
performance penalty as the field will be slow-searched (using the
DocValues) in the secondary fine-counting phase.

- Toke Eskildsen, Royal Danish Library




Re: Able to search with indexed=false and docvalues=true

2018-11-19 Thread Erick Erickson
I've noticed this  too, but I think it's more a side effect than
something usable for the reasons you outlined. Searching a docValues
field is akin to a "table scan", the uninverted structure is totally
unsuited for searching. It works, but as you've found out it's
unusably  slow for any decent sized corpus. Whether it should throw
some kind of error or not is worth discussing.

Erick
On Mon, Nov 19, 2018 at 7:20 PM Rahul Goswami  wrote:
>
> I am using SolrCloud 7.2.1. My understanding is that setting docvalues=true
> would optimize faceting, grouping and sorting; but for a field to be
> searchable it needs to be indexed=true. However I was dumbfounded today
> when I executed a successful search on a field with below configuration:
>  docValues="true"/>
> However the searches don't always complete and often time out.
>
> My question is...
> Is searching on docValues=true and indexed=false fields supported? If yes,
> in which cases?
> What are the pitfalls (as I see that searches, although sometimes
> successful are atrociously slow and quite often time out)?
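
To make the trade-off concrete, a field that must be both filtered on and
faceted on would normally carry both properties (the name and type below are
illustrative):

<field name="status" type="string" indexed="true" stored="false" docValues="true"/>

indexed="true" provides the inverted index used for fast searching and
filtering, while docValues="true" provides the column-oriented structure used
for faceting, grouping and sorting; indexed="false" is what forces the slow
full-scan behaviour described above.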


Able to search with indexed=false and docvalues=true

2018-11-19 Thread Rahul Goswami
I am using SolrCloud 7.2.1. My understanding is that setting docvalues=true
would optimize faceting, grouping and sorting; but for a field to be
searchable it needs to be indexed=true. However I was dumbfounded today
when I executed a successful search on a field with below configuration:

However the searches don't always complete and often time out.

My question is...
Is searching on docValues=true and indexed=false fields supported? If yes,
in which cases?
What are the pitfalls (as I see that searches, although sometimes
successful are atrociously slow and quite often time out)?


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
You should until this is resolved. The original purpose of that JIRA
doesn't apply any longer, i.e. the speedup aspects, since those have been
taken care of.
On Tue, Nov 6, 2018 at 3:50 PM Wei  wrote:
>
> Also I notice this issue is still open:
> https://issues.apache.org/jira/browse/SOLR-10816
> Does that mean we still need to have stored=true for uniqueKey?
>
> On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:
>
> > I see there is also a docValuesFormat option, what's the default for this
> > setting? Performance wise is it good to set docValuesFormat="Memory" ?
> >
> > Best,
> > Wei
> >
> >
> > On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> > wrote:
> >
> >> Yes, "the most efficient possible" is associated with that JIRA, so only
> >> in 7x.
> >>
> >> "Does this still hold if whole index is loaded into memory?"
> >> The decompression part yes, the disk seek part no. And it's also
> >> sensitive to whether the documentCache already has the document.
> >>
> >> I'd also make the uniqueKey and the _version_ fields docValues.
> >>
> >> Best,
> >> Erick
> >> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> >> >
> >> > Thanks Yasufumi and Erick.
> >> >
> >> > ---. 2. "it depends". Solr  will try to do the most efficient thing
> >> > possible. If _all_ the fields are docValues, it will return the stored
> >> > values from the docValues  structure.
> >> >
> >> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> >> Does
> >> > this mean "Solr  will try to do the most efficient thing possible" only
> >> > working for 7.x?  Is the behavior available for 6.6?
> >> >
> >> > -- This prevents a disk seek and  decompress cycle.
> >> >
> >> > Does this still hold if whole index is loaded into memory?  Also for the
> >> > benefit of performance improvement,  does the uniqueKey field need to be
> >> > always docValues? Since it is used in the first phase of distributed
> >> > search.
> >> >
> >> > Thanks,
> >> > Wei
> >> >
> >> >
> >> >
> >> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> >> > wrote:
> >> >
> >> > > 2. "it depends". Solr  will try to do the most efficient thing
> >> > > possible. If _all_ the fields are docValues, it will return the stored
> >> > > values from the docValues  structure. This prevents a disk seek and
> >> > > decompress cycle.
> >> > >
> >> > > However, if even one field is docValues=false Solr will by default
> >> > > return the stored values. For the multiValued case, you can explicitly
> >> > > tell Solr to return the docValues field.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> >> > >  wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
> >> so
> >> > > there
> >> > > > > is no need to explicitly set it in schema.xml?
> >> > > >
> >> > > > Yes.
> >> > > >
> >> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> >> will
> >> > > Solr
> >> > > > > retrieve id from docValues instead of stored field?
> >> > > >
> >> > > > No.
> >> > > > AFAIK, if you define both docValues="true" and stored="true" in your
> >> > > > schema,
> >> > > > Solr tries to retrieve stored value.
> >> > > > (Except using streaming expressions or /export handler etc...
> >> > > > See:
> >> > > >
> >> > >
> >> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> >> > > > )
> >> > > >
> >> > > > Thanks,
> >> > > > Yasufumi
> >> > > >
> >> > > >
> >> > > > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I have a few questions about using the useDocValuesAsStored
> >> option to
> >> > > > > retrieve field from docValues:
> >> > > > >
> >> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
> >> so
> >> > > there
> >> > > > > is no need to explicitly set it in schema.xml?
> >> > > > >
> >> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> >> will
> >> > > Solr
> >> > > > > retrieve id from docValues instead of stored field? if fl= id,
> >> title,
> >> > > > > score,   both id and title are single value field:
> >> > > > >
> >> > > > >>> > > > > docValues="true" required="true"/>
> >> > > > >
> >> > > > >   >> > > > > docValues="true" required="true"/>
> >> > > > >
> >> > > > >   Do I need to have all fields stored="false" docValues="true" to
> >> make
> >> > > solr
> >> > > > > retrieve from docValues only? I am using Solr 6.6.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Wei
> >> > > > >
> >> > >
> >>
> >


Re: Retrieve field from docValues

2018-11-06 Thread Wei
Also I notice this issue is still open:
https://issues.apache.org/jira/browse/SOLR-10816
Does that mean we still need to have stored=true for uniqueKey?

On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:

> I see there is also a docValuesFormat option, what's the default for this
> setting? Performance wise is it good to set docValuesFormat="Memory" ?
>
> Best,
> Wei
>
>
> On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> wrote:
>
>> Yes, "the most efficient possible" is associated with that JIRA, so only
>> in 7x.
>>
>> "Does this still hold if whole index is loaded into memory?"
>> The decompression part yes, the disk seek part no. And it's also
>> sensitive to whether the documentCache already has the document.
>>
>> I'd also make the uniqueKey and the _version_ fields docValues.
>>
>> Best,
>> Erick
>> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
>> >
>> > Thanks Yasufumi and Erick.
>> >
>> > ---. 2. "it depends". Solr  will try to do the most efficient thing
>> > possible. If _all_ the fields are docValues, it will return the stored
>> > values from the docValues  structure.
>> >
>> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
>> Does
>> > this mean "Solr  will try to do the most efficient thing possible" only
>> > working for 7.x?  Is the behavior available for 6.6?
>> >
>> > -- This prevents a disk seek and  decompress cycle.
>> >
>> > Does this still hold if whole index is loaded into memory?  Also for the
>> > benefit of performance improvement,  does the uniqueKey field need to be
>> > always docValues? Since it is used in the first phase of distributed
>> > search.
>> >
>> > Thanks,
>> > Wei
>> >
>> >
>> >
>> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
>> > wrote:
>> >
>> > > 2. "it depends". Solr  will try to do the most efficient thing
>> > > possible. If _all_ the fields are docValues, it will return the stored
>> > > values from the docValues  structure. This prevents a disk seek and
>> > > decompress cycle.
>> > >
>> > > However, if even one field is docValues=false Solr will by default
>> > > return the stored values. For the multiValued case, you can explicitly
>> > > tell Solr to return the docValues field.
>> > >
>> > > Best,
>> > > Erick
>> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
>> > >  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
>> so
>> > > there
>> > > > > is no need to explicitly set it in schema.xml?
>> > > >
>> > > > Yes.
>> > > >
>> > > > > 2.  With useDocValuesAsStored=true and the following definition,
>> will
>> > > Solr
>> > > > > retrieve id from docValues instead of stored field?
>> > > >
>> > > > No.
>> > > > AFAIK, if you define both docValues="true" and stored="true" in your
>> > > > schema,
>> > > > Solr tries to retrieve stored value.
>> > > > (Except using streaming expressions or /export handler etc...
>> > > > See:
>> > > >
>> > >
>> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
>> > > > )
>> > > >
>> > > > Thanks,
>> > > > Yasufumi
>> > > >
>> > > >
>> > > > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I have a few questions about using the useDocValuesAsStored
>> option to
>> > > > > retrieve field from docValues:
>> > > > >
>> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
>> so
>> > > there
>> > > > > is no need to explicitly set it in schema.xml?
>> > > > >
>> > > > > 2.  With useDocValuesAsStored=true and the following definition,
>> will
>> > > Solr
>> > > > > retrieve id from docValues instead of stored field? if fl= id,
>> title,
>> > > > > score,   both id and title are single value field:
>> > > > >
>> > > > >   > > > > > docValues="true" required="true"/>
>> > > > >
>> > > > >  > > > > > docValues="true" required="true"/>
>> > > > >
>> > > > >   Do I need to have all fields stored="false" docValues="true" to
>> make
>> > > solr
>> > > > > retrieve from docValues only? I am using Solr 6.6.
>> > > > >
>> > > > > Thanks,
>> > > > > Wei
>> > > > >
>> > >
>>
>


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
docValuesFormat="Memory" has been deprecated, so you shouldn't use it.
On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:
>
> I see there is also a docValuesFormat option, what's the default for this
> setting? Performance wise is it good to set docValuesFormat="Memory" ?
>
> Best,
> Wei
>
>
> On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> wrote:
>
> > Yes, "the most efficient possible" is associated with that JIRA, so only
> > in 7x.
> >
> > "Does this still hold if whole index is loaded into memory?"
> > The decompression part yes, the disk seek part no. And it's also
> > sensitive to whether the documentCache already has the document.
> >
> > I'd also make the uniqueKey and the _version_ fields docValues.
> >
> > Best,
> > Erick
> > On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> > >
> > > Thanks Yasufumi and Erick.
> > >
> > > ---. 2. "it depends". Solr  will try to do the most efficient thing
> > > possible. If _all_ the fields are docValues, it will return the stored
> > > values from the docValues  structure.
> > >
> > > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> > Does
> > > this mean "Solr  will try to do the most efficient thing possible" only
> > > working for 7.x?  Is the behavior available for 6.6?
> > >
> > > -- This prevents a disk seek and  decompress cycle.
> > >
> > > Does this still hold if whole index is loaded into memory?  Also for the
> > > benefit of performance improvement,  does the uniqueKey field need to be
> > > always docValues? Since it is used in the first phase of distributed
> > > search.
> > >
> > > Thanks,
> > > Wei
> > >
> > >
> > >
> > > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> > > wrote:
> > >
> > > > 2. "it depends". Solr  will try to do the most efficient thing
> > > > possible. If _all_ the fields are docValues, it will return the stored
> > > > values from the docValues  structure. This prevents a disk seek and
> > > > decompress cycle.
> > > >
> > > > However, if even one field is docValues=false Solr will by default
> > > > return the stored values. For the multiValued case, you can explicitly
> > > > tell Solr to return the docValues field.
> > > >
> > > > Best,
> > > > Erick
> > > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > > there
> > > > > > is no need to explicitly set it in schema.xml?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > 2.  With useDocValuesAsStored=true and the following definition,
> > will
> > > > Solr
> > > > > > retrieve id from docValues instead of stored field?
> > > > >
> > > > > No.
> > > > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > > > schema,
> > > > > Solr tries to retrieve stored value.
> > > > > (Except using streaming expressions or /export handler etc...
> > > > > See:
> > > > >
> > > >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > > > )
> > > > >
> > > > > Thanks,
> > > > > Yasufumi
> > > > >
> > > > >
> > > > > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a few questions about using the useDocValuesAsStored option
> > to
> > > > > > retrieve field from docValues:
> > > > > >
> > > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > > there
> > > > > > is no need to explicitly set it in schema.xml?
> > > > > >
> > > > > > 2.  With useDocValuesAsStored=true and the following definition,
> > will
> > > > Solr
> > > > > > retrieve id from docValues instead of stored field? if fl= id,
> > title,
> > > > > > score,   both id and title are single value field:
> > > > > >
> > > > > >> > > > > docValues="true" required="true"/>
> > > > > >
> > > > > >   > > > > > docValues="true" required="true"/>
> > > > > >
> > > > > >   Do I need to have all fields stored="false" docValues="true" to
> > make
> > > > solr
> > > > > > retrieve from docValues only? I am using Solr 6.6.
> > > > > >
> > > > > > Thanks,
> > > > > > Wei
> > > > > >
> > > >
> >


Re: Retrieve field from docValues

2018-11-06 Thread Wei
I see there is also a docValuesFormat option, what's the default for this
setting? Performance wise is it good to set docValuesFormat="Memory" ?

Best,
Wei


On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
wrote:

> Yes, "the most efficient possible" is associated with that JIRA, so only
> in 7x.
>
> "Does this still hold if whole index is loaded into memory?"
> The decompression part yes, the disk seek part no. And it's also
> sensitive to whether the documentCache already has the document.
>
> I'd also make the uniqueKey and the _version_ fields docValues.
>
> Best,
> Erick
> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> >
> > Thanks Yasufumi and Erick.
> >
> > ---. 2. "it depends". Solr  will try to do the most efficient thing
> > possible. If _all_ the fields are docValues, it will return the stored
> > values from the docValues  structure.
> >
> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> Does
> > this mean "Solr  will try to do the most efficient thing possible" only
> > working for 7.x?  Is the behavior available for 6.6?
> >
> > -- This prevents a disk seek and  decompress cycle.
> >
> > Does this still hold if whole index is loaded into memory?  Also for the
> > benefit of performance improvement,  does the uniqueKey field need to be
> > always docValues? Since it is used in the first phase of distributed
> > search.
> >
> > Thanks,
> > Wei
> >
> >
> >
> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> > wrote:
> >
> > > 2. "it depends". Solr  will try to do the most efficient thing
> > > possible. If _all_ the fields are docValues, it will return the stored
> > > values from the docValues  structure. This prevents a disk seek and
> > > decompress cycle.
> > >
> > > However, if even one field is docValues=false Solr will by default
> > > return the stored values. For the multiValued case, you can explicitly
> > > tell Solr to return the docValues field.
> > >
> > > Best,
> > > Erick
> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > there
> > > > > is no need to explicitly set it in schema.xml?
> > > >
> > > > Yes.
> > > >
> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> will
> > > Solr
> > > > > retrieve id from docValues instead of stored field?
> > > >
> > > > No.
> > > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > > schema,
> > > > Solr tries to retrieve stored value.
> > > > (Except using streaming expressions or /export handler etc...
> > > > See:
> > > >
> > >
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > > )
> > > >
> > > > Thanks,
> > > > Yasufumi
> > > >
> > > >
> > > > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a few questions about using the useDocValuesAsStored option
> to
> > > > > retrieve field from docValues:
> > > > >
> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > there
> > > > > is no need to explicitly set it in schema.xml?
> > > > >
> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> will
> > > Solr
> > > > > retrieve id from docValues instead of stored field? if fl= id,
> title,
> > > > > score,   both id and title are single value field:
> > > > >
> > > > >> > > > docValues="true" required="true"/>
> > > > >
> > > > >   > > > > docValues="true" required="true"/>
> > > > >
> > > > >   Do I need to have all fields stored="false" docValues="true" to
> make
> > > solr
> > > > > retrieve from docValues only? I am using Solr 6.6.
> > > > >
> > > > > Thanks,
> > > > > Wei
> > > > >
> > >
>


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
Yes, "the most efficient possible" is associated with that JIRA, so only in 7x.

"Does this still hold if whole index is loaded into memory?"
The decompression part yes, the disk seek part no. And it's also
sensitive to whether the documentCache already has the document.

I'd also make the uniqueKey and the _version_ fields docValues.

Best,
Erick
On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
>
> Thanks Yasufumi and Erick.
>
> ---. 2. "it depends". Solr  will try to do the most efficient thing
> possible. If _all_ the fields are docValues, it will return the stored
> values from the docValues  structure.
>
> I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344  Does
> this mean "Solr  will try to do the most efficient thing possible" only
> working for 7.x?  Is the behavior available for 6.6?
>
> -- This prevents a disk seek and  decompress cycle.
>
> Does this still hold if whole index is loaded into memory?  Also for the
> benefit of performance improvement,  does the uniqueKey field need to be
> always docValues? Since it is used in the first phase of distributed
> search.
>
> Thanks,
> Wei
>
>
>
> On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> wrote:
>
> > 2. "it depends". Solr  will try to do the most efficient thing
> > possible. If _all_ the fields are docValues, it will return the stored
> > values from the docValues  structure. This prevents a disk seek and
> > decompress cycle.
> >
> > However, if even one field is docValues=false Solr will by default
> > return the stored values. For the multiValued case, you can explicitly
> > tell Solr to return the docValues field.
> >
> > Best,
> > Erick
> > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> >  wrote:
> > >
> > > Hi,
> > >
> > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > there
> > > > is no need to explicitly set it in schema.xml?
> > >
> > > Yes.
> > >
> > > > 2.  With useDocValuesAsStored=true and the following definition, will
> > Solr
> > > > retrieve id from docValues instead of stored field?
> > >
> > > No.
> > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > schema,
> > > Solr tries to retrieve stored value.
> > > (Except using streaming expressions or /export handler etc...
> > > See:
> > >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > )
> > >
> > > Thanks,
> > > Yasufumi
> > >
> > >
> > > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
> > >
> > > > Hi,
> > > >
> > > > I have a few questions about using the useDocValuesAsStored option to
> > > > retrieve field from docValues:
> > > >
> > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > there
> > > > is no need to explicitly set it in schema.xml?
> > > >
> > > > 2.  With useDocValuesAsStored=true and the following definition, will
> > Solr
> > > > retrieve id from docValues instead of stored field? if fl= id, title,
> > > > score,   both id and title are single value field:
> > > >
> > > >> > > docValues="true" required="true"/>
> > > >
> > > >   > > > docValues="true" required="true"/>
> > > >
> > > >   Do I need to have all fields stored="false" docValues="true" to make
> > solr
> > > > retrieve from docValues only? I am using Solr 6.6.
> > > >
> > > > Thanks,
> > > > Wei
> > > >
> >


Re: Retrieve field from docValues

2018-11-06 Thread Wei
Thanks Yasufumi and Erick.

---. 2. "it depends". Solr  will try to do the most efficient thing
possible. If _all_ the fields are docValues, it will return the stored
values from the docValues  structure.

I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344  Does
this mean "Solr  will try to do the most efficient thing possible" only
working for 7.x?  Is the behavior available for 6.6?

-- This prevents a disk seek and  decompress cycle.

Does this still hold if whole index is loaded into memory?  Also for the
benefit of performance improvement,  does the uniqueKey field need to be
always docValues? Since it is used in the first phase of distributed
search.

Thanks,
Wei



On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
wrote:

> 2. "it depends". Solr  will try to do the most efficient thing
> possible. If _all_ the fields are docValues, it will return the stored
> values from the docValues  structure. This prevents a disk seek and
> decompress cycle.
>
> However, if even one field is docValues=false Solr will by default
> return the stored values. For the multiValued case, you can explicitly
> tell Solr to return the docValues field.
>
> Best,
> Erick
> On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
>  wrote:
> >
> > Hi,
> >
> > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> there
> > > is no need to explicitly set it in schema.xml?
> >
> > Yes.
> >
> > > 2.  With useDocValuesAsStored=true and the following definition, will
> Solr
> > > retrieve id from docValues instead of stored field?
> >
> > No.
> > AFAIK, if you define both docValues="true" and stored="true" in your
> > schema,
> > Solr tries to retrieve stored value.
> > (Except using streaming expressions or /export handler etc...
> > See:
> >
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > )
> >
> > Thanks,
> > Yasufumi
> >
> >
> > On Tue, Nov 6, 2018 at 9:54 AM, Wei :
> >
> > > Hi,
> > >
> > > I have a few questions about using the useDocValuesAsStored option to
> > > retrieve field from docValues:
> > >
> > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> there
> > > is no need to explicitly set it in schema.xml?
> > >
> > > 2.  With useDocValuesAsStored=true and the following definition, will
> Solr
> > > retrieve id from docValues instead of stored field? if fl= id, title,
> > > score,   both id and title are single value field:
> > >
> > >> > docValues="true" required="true"/>
> > >
> > >   > > docValues="true" required="true"/>
> > >
> > >   Do I need to have all fields stored="false" docValues="true" to make
> solr
> > > retrieve from docValues only? I am using Solr 6.6.
> > >
> > > Thanks,
> > > Wei
> > >
>


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
2. "it depends". Solr  will try to do the most efficient thing
possible. If _all_ the fields are docValues, it will return the stored
values from the docValues  structure. This prevents a disk seek and
decompress cycle.

However, if even one field is docValues=false Solr will by default
return the stored values. For the multiValued case, you can explicitly
tell Solr to return the docValues field.

Best,
Erick
On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
 wrote:
>
> Hi,
>
> > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> > is no need to explicitly set it in schema.xml?
>
> Yes.
>
> > 2.  With useDocValuesAsStored=true and the following definition, will Solr
> > retrieve id from docValues instead of stored field?
>
> No.
> AFAIK, if you define both docValues="true" and stored="true" in your
> schema,
> Solr tries to retrieve stored value.
> (Except using streaming expressions or /export handler etc...
> See:
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> )
>
> Thanks,
> Yasufumi
>
>
> On Tue, Nov 6, 2018 at 9:54 AM, Wei :
>
> > Hi,
> >
> > I have a few questions about using the useDocValuesAsStored option to
> > retrieve field from docValues:
> >
> > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> > is no need to explicitly set it in schema.xml?
> >
> > 2.  With useDocValuesAsStored=true and the following definition, will Solr
> > retrieve id from docValues instead of stored field? if fl= id, title,
> > score,   both id and title are single value field:
> >
> >    > docValues="true" required="true"/>
> >
> >   > docValues="true" required="true"/>
> >
> >   Do I need to have all fields stored="false" docValues="true" to make solr
> > retrieve from docValues only? I am using Solr 6.6.
> >
> > Thanks,
> > Wei
> >


Re: Retrieve field from docValues

2018-11-06 Thread Yasufumi Mizoguchi
Hi,

> 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> is no need to explicitly set it in schema.xml?

Yes.

> 2.  With useDocValuesAsStored=true and the following definition, will Solr
> retrieve id from docValues instead of stored field?

No.
AFAIK, if you define both docValues="true" and stored="true" in your
schema,
Solr tries to retrieve stored value.
(Except using streaming expressions or /export handler etc...
See:
https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
)

Thanks,
Yasufumi


On Tue, Nov 6, 2018 at 9:54 AM, Wei :

> Hi,
>
> I have a few questions about using the useDocValuesAsStored option to
> retrieve field from docValues:
>
> 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> is no need to explicitly set it in schema.xml?
>
> 2.  With useDocValuesAsStored=true and the following definition, will Solr
> retrieve id from docValues instead of stored field? if fl= id, title,
> score,   both id and title are single value field:
>
>docValues="true" required="true"/>
>
>   docValues="true" required="true"/>
>
>   Do I need to have all fields stored="false" docValues="true" to make solr
> retrieve from docValues only? I am using Solr 6.6.
>
> Thanks,
> Wei
>


Retrieve field from docValues

2018-11-05 Thread Wei
Hi,

I have a few questions about using the useDocValuesAsStored option to
retrieve field from docValues:

1. For schema version 1.6, useDocValuesAsStored=true is default, so there
is no need to explicitly set it in schema.xml?

2.  With useDocValuesAsStored=true and the following definition, will Solr
retrieve id from docValues instead of stored field? if fl= id, title,
score,   both id and title are single value field:

  

 

  Do I need to have all fields stored="false" docValues="true" to make solr
retrieve from docValues only? I am using Solr 6.6.

Thanks,
Wei
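
The field definitions in the message above did not survive the archive
formatting; a hedged sketch of the setup being asked about, with assumed
types and names:

<field name="id"    type="string" indexed="true" stored="false" docValues="true" required="true"/>
<field name="title" type="string" indexed="true" stored="false" docValues="true"/>

curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,title,score'

With schema version 1.6 or later, useDocValuesAsStored defaults to true, so
fields declared stored="false" docValues="true" are still returned in fl, with
the values read from the docValues structure.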


Re: Highlighting is not working with docValues only String field

2018-08-13 Thread Karthik Ramachandran
I have opened JIRA https://issues.apache.org/jira/browse/SOLR-12663


On Sat, Aug 11, 2018 at 8:59 PM Erick Erickson 
wrote:

> I can see why it wouldn't and also why it could/should. I also wonder about
> SortableTextField, perhaps mention that too.
>
> Seems worth a JIRA to me if there isn't one already
>
> On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
> kramachand...@commvault.com> wrote:
>
> > We are using Solr 7.2.1, highlighting is not working with docValues only
> > String field.
> >
> > Should I open a JIRA for this?
> >
> > Schema:
> > 
> >   id
> >   
> >> required="true"/>
> >> stored="true"/>
> >> stored="false"/>
> >   
> > 
> >
> > Data:
> > [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> > 2"},{"id":3,"name":"Testing line 3"}]
> >
> > Query:
> >
> >
> http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1
> >
> > Response:
> > {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing
> line
> > 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> > 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> > 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> > line 1"]},"2":{"name":["Testing line
> > 2"]},"3":{"name":["Testing line 3"]}}}
> >
> >
> > With Thanks & Regards
> > Karthik Ramachandran
> > P Please don't print this e-mail unless you really need to
> >
> > ***Legal Disclaimer***
> > "This communication may contain confidential and privileged material for
> > the
> > sole use of the intended recipient. Any unauthorized review, use or
> > distribution
> > by others is strictly prohibited. If you have received the message by
> > mistake,
> > please advise the sender by reply email and delete the message. Thank
> you."
> > **
> >
>


-- 
With Thanks & Regards
Karthik Ramachandran

P Please don't print this e-mail unless you really need to


Re: Highlighting is not working with docValues only String field

2018-08-11 Thread Erick Erickson
I can see why it wouldn't and also why it could/should. I also wonder about
SortableTextField, perhaps mention that too.

Seems worth a JIRA to me if there isn't one already

On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
kramachand...@commvault.com> wrote:

> We are using Solr 7.2.1, highlighting is not working with docValues only
> String field.
>
> Should I open a JIRA for this?
>
> Schema:
> 
>   id
>   
>required="true"/>
>stored="true"/>
>stored="false"/>
>   
> 
>
> Data:
> [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> 2"},{"id":3,"name":"Testing line 3"}]
>
> Query:
>
> http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1
>
> Response:
> {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line
> 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> line 1"]},"2":{"name":["Testing line
> 2"]},"3":{"name":["Testing line 3"]}}}
>
>
> With Thanks & Regards
> Karthik Ramachandran
> P Please don't print this e-mail unless you really need to
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
>


Highlighting is not working with docValues only String field

2018-08-10 Thread Karthik Ramachandran
We are using Solr 7.2.1, highlighting is not working with docValues only String 
field.

Should I open a JIRA for this?

Schema:

  id
  
  
  
  
  


Data:
[{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line 
2"},{"id":3,"name":"Testing line 3"}]

Query:
http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1

Response:
{"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line 
1","name1":"Testing line 1"},{"id":"2","name":"Testing line 2","name1":"Testing 
line 2"},{"id":"3","name":"Testing line 3","name1":"Testing line 
3"}]},"highlighting":{"1":{"name":["Testing line 
1"]},"2":{"name":["Testing line 2"]},"3":{"name":["Testing 
line 3"]}}}


With Thanks & Regards
Karthik Ramachandran
P Please don't print this e-mail unless you really need to

***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**
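
The schema snippet and the highlighting markup above were partly stripped by
the mail archive. Based on the surviving fragments, the schema presumably
looked roughly like the following; the field types are assumptions:

<schema name="test" version="1.6">
  <uniqueKey>id</uniqueKey>
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="name"  type="string" indexed="true" stored="true"/>
  <field name="name1" type="string" indexed="true" stored="false" docValues="true"/>
</schema>

The reported behaviour is that name (stored) is highlighted while name1
(docValues only, stored="false") is silently missing from the highlighting
section, which is what SOLR-12663 was opened for.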

