Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erik Hatcher




> On Jul 22, 2020, at 08:52, raj.yadav  wrote:
> 
> Erik Hatcher-4 wrote
>> Wouldn’t a “string” field be as good, if not better, for this use case?
> 
> What is the rationale behind this type change to 'string'? How will it speed
> up search/filtering? Won't it increase the index size, since in general a
> string type takes more storage space than an int (not sure what the case is
> in Lucene)?

You tell me? ;) Easy enough to try in your environment, I imagine, with both 
field types in parallel in the same collection.

As I understand it (in regards to Erick’s points), range queries aren’t being 
used here.  

Erik
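
One way such a side-by-side trial might be set up, sketched in Python: build a Schema API request body that adds a second, string-typed field and copies `rules` into it. The field name `rules_s`, the `string` type name (as in Solr's default configset), and the attribute values chosen here are assumptions for illustration, not something stated in this thread.

```python
import json


def parallel_string_field_payload(source: str, target: str) -> str:
    """Build a Schema API body that adds a string copy of an existing field.

    `target` is a hypothetical field name; adjust the attributes to your schema.
    """
    payload = {
        # Add a multi-valued string field that is searchable but not stored.
        "add-field": {
            "name": target,
            "type": "string",
            "indexed": True,
            "stored": False,
            "docValues": False,
            "multiValued": True,
        },
        # Copy every incoming value of the source field into the new field.
        "add-copy-field": {"source": source, "dest": target},
    }
    return json.dumps(payload, indent=2)


print(parallel_string_field_payload("rules", "rules_s"))
```

The resulting JSON would be POSTed to the collection's /schema endpoint, followed by a full reindex. Index size and the latency of fq=rules_s:203 could then be compared against the existing field.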


Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erick Erickson

pints:
1> take up less space (IIRC)
2> are better for range queries.

Best,
Erick
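
For readers unsure of the distinction, here is a small Python sketch of the two kinds of filter clause being contrasted: a single-value lookup, as used in this thread, and a range query, where point fields are said to do better. The helper names are made up for illustration; only the query syntax itself is Solr's.

```python
def term_filter(field: str, value: int) -> str:
    """Single-value filter clause, e.g. rules:203."""
    return f"{field}:{value}"


def range_filter(field: str, low: int, high: int) -> str:
    """Inclusive range filter clause, e.g. rules:[200 TO 300]."""
    return f"{field}:[{low} TO {high}]"


print(term_filter("rules", 203))
print(range_filter("rules", 200, 300))
```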


Re: Best field definition which is only use for filter query.

2020-07-22 Thread raj.yadav

Erik Hatcher-4 wrote
> Wouldn’t a “string” field be as good, if not better, for this use case?

What is the rationale behind this type change to 'string'? How will it speed
up search/filtering? Won't it increase the index size, since in general a
string type takes more storage space than an int (not sure what the case is
in Lucene)?

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Best field definition which is only use for filter query.

2020-07-22 Thread raj.yadav

Erick Erickson wrote
> Also, the default pint type is not as efficient for single-value searches
> like this, the trie fields are better. Trie support will be kept until
> there’s a good alternative for the single-value lookup with pint.
> 
> So for what you’re doing, I’d change to TrieInt, docValues=false,
> indexed=true.

 
So we should use the TrieInt type for single-value searches (on both
single-valued and multivalued fields)? Please correct me if I'm wrong.

Also, in what scenarios should we prefer pint over TrieInt, from both a
document search (indexed) and retrieval (stored) latency point of view (not
from a sorting or faceting point of view)? Is there any documentation that
compares these two field types?

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erik Hatcher

Wouldn’t a “string” field be as good, if not better, for this use case?

> On Jul 22, 2020, at 08:02, Erick Erickson  wrote:
> 
> fq clauses are just like the q clause except for two things:
> 1> no scoring is done
> 2> the entire result set _can_ be stored in the filterCache.
> 
> so if a value isn’t indexed, it can’t be used in either an fq or q clause.
> 
> The thread you reference is under the assumption (and this is the default in 
> some versions of Solr) that docValues=true. And yes, that will be very, very 
> slow. Think “table scan”.
> 
> Also, the default pint type is not as efficient for single-value searches 
> like this, the trie fields are better. Trie support will be kept until 
> there’s a good alternative for the single-value lookup with pint.
> 
> So for what you’re doing, I’d change to TrieInt, docValues=false, indexed=true. 
> If you have neither docValues=true nor indexed=true, the query won’t work at 
> all. You’ll have to adequately size your hardware if index size is a concern.
> 
> Best,
> Erick
> 

Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erick Erickson

fq clauses are just like the q clause except for two things:
1> no scoring is done
2> the entire result set _can_ be stored in the filterCache.

so if a value isn’t indexed, it can’t be used in either an fq or q clause.

The thread you reference is under the assumption (and this is the default in 
some versions of Solr) that docValues=true. And yes, that will be very, very 
slow. Think “table scan”.

Also, the default pint type is not as efficient for single-value searches like 
this, the trie fields are better. Trie support will be kept until there’s a 
good alternative for the single-value lookup with pint.

So for what you’re doing, I’d change to TrieInt, docValues=false, indexed=true. 
If you have neither docValues=true nor indexed=true, the query won’t work at all. 
You’ll have to adequately size your hardware if index size is a concern.

Best,
Erick
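
To make the "table scan" remark concrete, here is a deliberately simplified Python model of the two access paths. It is only a toy under stated assumptions: real Lucene uses on-disk postings, BKD trees for point fields, and columnar docValues, none of which are modelled here, and the function names are invented for illustration.

```python
from collections import defaultdict

# Three toy documents, each with a multi-valued `rules` field.
docs = {
    0: [203, 7843, 43],
    1: [283, 6603],
    2: [203, 83],
}

# Roughly what indexed=true buys: a map from each value to the documents
# that contain it, built once at index time.
inverted = defaultdict(list)
for doc_id, rules in docs.items():
    for value in rules:
        inverted[value].append(doc_id)


def filter_indexed(value: int) -> list:
    """Single lookup; cost does not grow with the number of documents."""
    return inverted.get(value, [])


def filter_docvalues_only(value: int) -> list:
    """Per-document access: every document is visited and inspected."""
    return [doc_id for doc_id, rules in docs.items() if value in rules]


# Both paths return the same documents; only the amount of work differs.
assert filter_indexed(203) == filter_docvalues_only(203) == [0, 2]
```

The second function touches every document on every query, which is why filtering on a field that only has docValues degrades as the collection grows.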


Best field definition which is only use for filter query.

2020-07-22 Thread Raj Yadav

Below is the sample document





*{"filedA": 1,"filedB": "","filedC": "Sher","filedD":
"random","rules":[203,7843,43,283,6603,83,513,5303,243,103,323,163,403,363,5333,2483,313,703,523,503,563,8543,1003,483,1083,2043,6523,603,963,683,5353,763,443,643,743,723,1123,843,1243,1663,1803,1403,1783,7563,3843,1843,1523,1203,1563,1703,1883,8913,1923,1323,5313,1623,1963,2033,2763,2623,2083,2123,2143,123,2183,2333,8183,7323,2323,7243,2313,2463,2423,2383,5833,2343,2503,2663,8263,3083,2683,2543,8313,2883,2923,3043,2703,3243,3123,2263,3003,2393,3203,3163,6243,3283,3443,3343,3403,1913,3323,3483,3603,3723,3763,8333,3563,863,3683,3643,3523,3803,8323,3883,4003,3923,4043,4173,1163,2963,1743,6593,4083,4103,4143,1363,3983,4183,4223,6623,4383,1443,4303,4263,4403,4423,4283,4343,5043,4923,4983,4993,6633,4503,5843,8073,4663]}*
As you can see we have 5 fields and one of the field names is "rules".
Field Definition:


The only operation that we do on this field is filtering.
example: => fq=rules:203
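
For illustration, a minimal Python sketch of how such a request might be assembled. The base URL, the collection name `mycollection`, and the `rows` value are placeholders, not details from our setup.

```python
from urllib.parse import urlencode


def build_filter_query(base_url: str, collection: str, rule_id: int) -> str:
    """Return a Solr /select URL that filters on a single `rules` value."""
    params = [
        ("q", "*:*"),                # main query: match all documents
        ("fq", f"rules:{rule_id}"),  # filter query on the `rules` field
        ("rows", "10"),              # placeholder page size
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_filter_query("http://localhost:8983/solr", "mycollection", 203)
print(url)
```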

*Problems:*
1. The `rules` field is marked indexed="true", and it consumes a large
percentage of the total index size.
2. A large share of our document update requests are mainly for this
(`rules`) field.

If I mark this field `indexed=false` (the pint field type has
docValues=true by default)
**
Then the following thread suggests that filter operations (which are also
a kind of search operation) will be very slow:
https://lucene.472066.n3.nabble.com/Facet-performance-problem-td4375925.html

Is there a way to avoid indexed=true for the `rules` field without hurting
our search (filtering) performance? Or is there any other solution that would
reduce our total index size without increasing search (filter) latency?

Regards,
Raj
