Re: Best field definition which is only use for filter query.
> On Jul 22, 2020, at 08:52, raj.yadav wrote:
>
> Erik Hatcher-4 wrote
>> Wouldn’t a “string” field be as good, if not better, for this use case?
>
> What is the rationale behind this type change to 'string'? How will it speed
> up search/filtering? Will it not increase the index size? In general, a
> string type takes more storage space than an int (not sure what the case is
> in Lucene).

You tell me? ;) Easy enough to try in your environment, I imagine, in parallel in the same collection index.

As I understand it (in regard to Erick’s points), range queries aren’t being used here.

Erik
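One way to run the side-by-side trial suggested above is to index the same values into a second, string-typed field and compare index size and filter latency. The fragment below is only a sketch: the field name `rules_str`, the `stored="false"` setting, and the explicit `docValues="false"` are assumptions for illustration and do not come from the thread.

```xml
<!-- "string" is the conventional name for solr.StrField in Solr's stock schemas -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical parallel field: indexed only, no docValues, not stored (assumed settings) -->
<field name="rules_str" type="string" indexed="true" stored="false"
       docValues="false" multiValued="true"/>

<!-- Copy every incoming value of "rules" into the string field at index time -->
<copyField source="rules" dest="rules_str"/>
```

After a full reindex, `fq=rules:203` and `fq=rules_str:203` can be timed against each other. Note that repeated identical fq clauses may be answered from the filterCache, so vary the filter value when measuring.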
Re: Best field definition which is only use for filter query.
pints
1> take up less space (IIRC)
2> are better for range queries.

Best,
Erick

> On Jul 22, 2020, at 8:49 AM, raj.yadav wrote:
>
> Erik Hatcher-4 wrote
>> Wouldn’t a “string” field be as good, if not better, for this use case?
>
> What is the rationale behind this type change to 'string'? How will it speed
> up search/filtering? Will it not increase the index size? In general, a
> string type takes more storage space than an int (not sure what the case is
> in Lucene).
Re: Best field definition which is only use for filter query.
Erik Hatcher-4 wrote
> Wouldn’t a “string” field be as good, if not better, for this use case?

What is the rationale behind this type change to 'string'? How will it speed up search/filtering? Will it not increase the index size? In general, a string type takes more storage space than an int (not sure what the case is in Lucene).

Regards,
Raj

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Best field definition which is only use for filter query.
Erick Erickson wrote
> Also, the default pint type is not as efficient for single-value searches
> like this, the trie fields are better. Trie support will be kept until
> there’s a good alternative for the single-value lookup with pint.
>
> So for what you’re doing, I’d change to TrieInt, docValues=false,
> indexed=true.

So, we should use the TrieInt type for single-value searches (on both single-valued and multivalued fields). Please correct me if I'm wrong.

Also, in what scenarios should we prefer pint over TrieInt, from both a document search (index) and retrieval (stored) latency point of view (not from a sorting or faceting point of view)? Is there any documentation that compares these two field types?

Regards,
Raj

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Best field definition which is only use for filter query.
Wouldn’t a “string” field be as good, if not better, for this use case?

> On Jul 22, 2020, at 08:02, Erick Erickson wrote:
>
> Also, the default pint type is not as efficient for single-value searches
> like this, the trie fields are better. Trie support will be kept until
> there’s a good alternative for the single-value lookup with pint.
>
>> On Jul 22, 2020, at 7:18 AM, Raj Yadav wrote:
>>
>> The only operation that we do on this field is filtering.
>> example: => fq=rules:203
Re: Best field definition which is only use for filter query.
fq clauses are just like the q clause except for two things:
1> no scoring is done
2> the entire result set _can_ be stored in the filterCache.

So if a value isn’t indexed, it can’t be used in either an fq or q clause.

The thread you reference is under the assumption (and this is the default in some versions of Solr) that docValues=true. And yes, that will be very, very slow. Think “table scan”.

Also, the default pint type is not as efficient for single-value searches like this; the trie fields are better. Trie support will be kept until there’s a good alternative for the single-value lookup with pint.

So for what you’re doing, I’d change to TrieInt, docValues=false, indexed=true. If you have neither docValues=true nor indexed=true, the query won’t work at all. You’ll have to adequately size your hardware if index size is a concern.

Best,
Erick

> On Jul 22, 2020, at 7:18 AM, Raj Yadav wrote:
>
> The only operation that we do on this field is filtering.
> example: => fq=rules:203
>
> Is there a way to not keep indexed=true for `rules` field and still does
> not impact our search(filtering performance). Or any other solution which
> can help in reducing our total index size and also does not increase
> search(filter) latency
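As an illustration of the suggestion above (TrieInt, docValues=false, indexed=true), a schema fragment might look like the following. This is a sketch under assumptions, not the poster's actual schema: the type name `tint_exact`, `precisionStep="0"`, and `stored="false"` are illustrative choices. `solr.TrieIntField` has been deprecated since Solr 7, so check that your version still ships it.

```xml
<!-- Hypothetical type name. precisionStep="0" indexes one term per value,
     which suits exact-match filters; it is not meant for range queries. -->
<fieldType name="tint_exact" class="solr.TrieIntField"
           precisionStep="0" positionIncrementGap="0"/>

<!-- indexed="true" so fq=rules:203 can use the inverted index;
     docValues="false" as suggested; stored="false" is an assumption -->
<field name="rules" type="tint_exact" indexed="true" docValues="false"
       stored="false" multiValued="true"/>
```

Changing the type of an existing field requires a full reindex, and a filter such as `fq=rules:203` can only be answered if the field keeps either `indexed="true"` or `docValues="true"`.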
Best field definition which is only use for filter query.
Below is the sample document:

{"filedA": 1,"filedB": "","filedC": "Sher","filedD": "random","rules":[203,7843,43,283,6603,83,513,5303,243,103,323,163,403,363,5333,2483,313,703,523,503,563,8543,1003,483,1083,2043,6523,603,963,683,5353,763,443,643,743,723,1123,843,1243,1663,1803,1403,1783,7563,3843,1843,1523,1203,1563,1703,1883,8913,1923,1323,5313,1623,1963,2033,2763,2623,2083,2123,2143,123,2183,2333,8183,7323,2323,7243,2313,2463,2423,2383,5833,2343,2503,2663,8263,3083,2683,2543,8313,2883,2923,3043,2703,3243,3123,2263,3003,2393,3203,3163,6243,3283,3443,3343,3403,1913,3323,3483,3603,3723,3763,8333,3563,863,3683,3643,3523,3803,8323,3883,4003,3923,4043,4173,1163,2963,1743,6593,4083,4103,4143,1363,3983,4183,4223,6623,4383,1443,4303,4263,4403,4423,4283,4343,5043,4923,4983,4993,6633,4503,5843,8073,4663]}

As you can see, we have 5 fields, and one of the field names is "rules".

Field Definition:
multiValued="true">

The only operation that we do on this field is filtering.
Example: fq=rules:203

*Problems:*
1. For the `rules` field we have marked indexed="true", and it is consuming a large percentage of the total index size.
2. A large chunk of our document update requests are mainly for this (`rules`) field.

If I mark `indexed=false` for this field (by default the pint field type has docValues=true):
multiValued="true">
then the following thread suggests that the filter operation (which is also a kind of search operation) will be very slow:
https://lucene.472066.n3.nabble.com/Facet-performance-problem-td4375925.html

Is there a way to not keep indexed=true for the `rules` field without impacting our search (filtering) performance? Or is there any other solution that can help reduce our total index size without increasing search (filter) latency?

Regards,
Raj