Re: Sorting on multiValued fields via function query
Was there a solution here? Is there a ticket related to the sort=max(FIELD) solution? -brian -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-on-multiValued-fields-via-function-query-tp2681833p3340145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting on multiValued fields via function query
+1 for both Chris's and Yonik's comments.

On Thu, Mar 17, 2011 at 3:19 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Thu, Mar 17, 2011 at 2:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
>> As the code stands now: we fail fast and let the person building the
>> index make a decision.
>
> Indexing two fields when one could work is unfortunate though. I think what we should support (eventually) is a max() function that also works on a multi-valued field and selects the maximum value (i.e. it will simply bypass the check for multi-valued fields). Then one can use sort-by-function to do
> sort=max(author) asc
>
> -Yonik http://lucidimagination.com
Re: Sorting on multiValued fields via function query
On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
> Also... if lucene is already capable of sorting on a multi-valued field by choosing the largest value (largest vs. smallest is presumably just arbitrary there), there is presumably no performance implication to choosing the smallest instead of the largest. It just chooses the largest, according to Yonik.

It's a little more complicated than that. It's not so much an explicit feature in lucene, but just what naturally happens when building the field cache via uninverting an indexed field. It's pretty much this:

  for every term in the field:
    for every document that matches that term:
      value[document] = term

And since terms are iterated from smallest to largest (and no, you can't reverse this), larger values end up overwriting smaller values. There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try to detect this multi-valued case by checking the number of values set in the whole array. This was unreliable though, and the check was discarded.

-Yonik http://lucidimagination.com
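The loop described in this message can be sketched in a few lines of Python. This is only an illustration of the overwrite effect, not Lucene's FieldCache code: the `uninvert` function, the `postings` dictionary and the sample author names are all made up for the example.

```python
def uninvert(postings):
    """Build a one-value-per-document map from term -> document-id postings."""
    value = {}
    # Terms are visited in sorted order, smallest first.
    for term in sorted(postings):
        for doc in postings[term]:
            # A later (larger) term overwrites whatever an earlier term stored.
            value[doc] = term
    return value


# Hypothetical index: documents 0 and 1 each carry two values in the field.
postings = {
    "adams": [0, 1],
    "baker": [1],
    "clark": [0, 2],
}
field_cache = uninvert(postings)
```

Because each document has exactly one slot, the last term written for it wins, so document 0 ends up with "clark" and document 1 with "baker". Picking the smallest value instead would need either reverse term iteration or a "write only if empty" check, which is why it is not a trivial change.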
Re: Sorting on multiValued fields via function query
Here is a workaround. Stick the high value and low value into other fields. Use those fields for sorting.

Bill Bell
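A minimal sketch of this workaround, assuming documents are plain dictionaries prepared by the indexing client before they are sent to Solr. The helper name `add_sort_fields` and the `_min` / `_max` field-name suffixes are hypothetical; the extra fields would also need to be declared as single-valued, indexed fields in the schema.

```python
def add_sort_fields(doc, field):
    """Return a copy of doc with single-valued min/max companions for field."""
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        # Precompute both extremes once, at index time.
        enriched[field + "_min"] = min(values)
        enriched[field + "_max"] = max(values)
    return enriched


doc = {"id": "1", "author": ["clark", "adams", "baker"]}
indexed = add_sort_fields(doc, "author")
```

With fields like these in place, a request could sort with `sort=author_min asc` or `sort=author_max desc` and never touch the multi-valued field at query time.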
Re: Sorting on multiValued fields via function query
By the way, this could be done automatically by Solr or Lucene behind the scenes.

Bill Bell
Re: Sorting on multiValued fields via function query
Aha, oh well, not quite as good/flexible as I hoped.

Still, if lucene is now behaving somewhat more predictably/rationally when sorting on multi-valued fields, then I think, in response to your other email on a similar thread, perhaps SOLR-2339 is now a mistake.

When lucene was returning completely unpredictable results (and even sometimes crashing entirely) when sorting on a multi-valued field, it made a lot of sense for Solr to prevent you from doing that, which is I think what SOLR-2339 does? So I don't think it was necessarily a mistake in that context.

But if lucene now can sort a multi-valued field without crashing when there are 'too many' unique values, and with easily described and predictable semantics (use the minimal value in the multi-valued field as sort key), then it probably makes more sense for Solr to let you do that if you really want to, give you enough rope to hang yourself.

Jonathan
Re: Sorting on multiValued fields via function query
: But if lucene now can sort a multi-valued field without crashing when there
: are 'too many' unique values, and with easily described and predictable
: semantics (use the minimal value in the multi-valued field as sort key) --
: then it probably makes more sense for Solr to let you do that if you really
: want to, give you enough rope to hang yourself.

(Clarification: it's the *maximal* value that gets used by lucene in that situation.)

I disagree. If we do what you describe, we'd be relying on users to recognize when the sort logic is silently doing something tricky under the covers, to make a conscious decision as to whether that is what they want, and if not, to change their indexing to account for it. That seems like a recipe for confusion and unexpected behavior.

With SOLR-2339 in place, we tell users explicitly and up front that what they are attempting to do cannot work as specified, and we force them to decide in advance how they want to deal with it: by indexing either the lowest value or the highest value (or both, in distinct fields).

As the code stands now: we fail fast and let the person building the index make a decision. If we silently sort on the maximal value, we leave a nasty headache for people who don't realize they are misusing a multiValued field and then wonder why some sorts don't do what they expect in some situations.

Bottom line: from day 1, we have always documented that sorting on multiValued fields (or fields that produced more than one term per document) didn't work. If people didn't notice that documentation, they aren't likely to notice any documentation that says it will sort on the maximal value either. SOLR-2339 may introduce a pain point for people upgrading, but it introduces it early and loudly, not quietly at some arbitrary moment in the future when they're beating their heads against a desk wondering why some sort isn't working the way they expect it to because they added some more values to a few documents.

-Hoss
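The fail-fast position argued above can be illustrated with a small sketch. This is not the SOLR-2339 patch: the `schema` dictionary, the `SortError` class, the `check_sortable` function and the error text are all invented here to show the idea of rejecting the request up front instead of silently sorting on some value.

```python
class SortError(ValueError):
    """Raised when a sort is requested on a field that cannot support it."""


def check_sortable(schema, field):
    """Return field if it may be used for sorting, otherwise raise SortError."""
    spec = schema[field]
    if spec.get("multiValued", False):
        # Refuse loudly at request time rather than pick a value silently.
        raise SortError("cannot sort on multi-valued field: " + field)
    return field


# Hypothetical schema: one single-valued and one multi-valued field.
schema = {
    "title": {"multiValued": False},
    "author": {"multiValued": True},
}
```

Calling `check_sortable(schema, "title")` returns the field name, while `check_sortable(schema, "author")` raises immediately, which pushes the min-or-max decision back to whoever builds the index.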
Re: Sorting on multiValued fields via function query
On Thu, Mar 17, 2011 at 2:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> As the code stands now: we fail fast and let the person building the
> index make a decision.

Indexing two fields when one could work is unfortunate though. I think what we should support (eventually) is a max() function that also works on a multi-valued field and selects the maximum value (i.e. it will simply bypass the check for multi-valued fields). Then one can use sort-by-function to do
sort=max(author) asc

-Yonik http://lucidimagination.com
Re: Sorting on multiValued fields via function query
Hi David,

It did seem to work correctly for me: we had it running on our production indexes for some time and we never noticed any strange sorting behavior. However, many of our multiValued fields are single valued for the majority of documents in our index, so we may not have noticed the incorrect sorting behaviors.

Regardless, I understand the reasoning behind the restriction; I'm interested in getting around it by using a functionQuery to reduce multiValued fields to a single value. It sounds like this isn't possible, is that correct? Ideally I'd like to sort by the maximum value on descending sorts and the minimum value on ascending sorts. Is there any movement towards implementing this sort of behavior?

Best,
-Harish
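The ordering asked for here (maximum value for descending sorts, minimum value for ascending sorts) can be stated precisely with a short in-memory sketch. It only pins down the intended semantics and says nothing about how Solr or Lucene would implement it. The `sort_docs` function and the sample `price` data are hypothetical, and placing documents without a value last is an assumption of this example.

```python
def sort_docs(docs, field, descending=False):
    """Sort by the max value for descending order, the min value for ascending."""
    pick = max if descending else min
    with_values = [d for d in docs if d.get(field)]
    without_values = [d for d in docs if not d.get(field)]
    ordered = sorted(with_values, key=lambda d: pick(d[field]), reverse=descending)
    # Assumption: documents with no value for the field sort last either way.
    return ordered + without_values


docs = [
    {"id": "a", "price": [5, 40]},
    {"id": "b", "price": [10]},
    {"id": "c", "price": [1, 7]},
]
ascending_ids = [d["id"] for d in sort_docs(docs, "price")]
descending_ids = [d["id"] for d in sort_docs(docs, "price", descending=True)]
```

Note that the two orderings are not mirror images of each other: ascending gives c, a, b (by minimum) while descending gives a, b, c (by maximum), which is exactly why a single stored sort value cannot serve both directions.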
Re: Sorting on multiValued fields via function query
Heh heh, you say "it worked correctly for me" yet you didn't actually have multi-valued data ;-) Funny.

The only solution right now is to store the max and min into indexed single-valued fields at index time. This is pretty straightforward to do. Even if/when Solr supports sorting on a multi-valued field, I doubt it would perform as well as what I suggest.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
Re: Sorting on multiValued fields via function query
: However, many of our multiValued fields are single valued for the majority
: of documents in our index so we may not have noticed the incorrect sorting
: behaviors.

That would make sense ... if you use a multiValued field as if it were single valued, you would never encounter a problem. If you had *some* multivalued fields, your results would be sorted extremely arbitrarily for those docs that did have multiple values, unless you had more distinct values than you had documents, at which point you would get a hard crash at query time.

: Regardless, I understand the reasoning behind the restriction, I'm
: interested in getting around it by using a functionQuery to reduce
: multiValued fields to a single value. It sounds like this isn't possible,

I don't think we have any functions that do that. Functions are composed of value sources, which may be composed of other value sources, but ultimately the data comes from somewhere, and in every case I can think of (except for constant values) that data comes from the FieldCache, the same FieldCache used for sorting. I don't think there are any value sources that will let you specify a multiValued field and then pick one of those values based on a rule/function ... even the PolyFields used for spatial search work by using multiple field names under the covers (N distinct field names for an N-dimensional space).

: is that correct? Ideally I'd like to sort by the maximum value on
: descending sorts and the minimum value on ascending sorts. Is there any
: movement towards implementing this sort of behavior?

This is a fairly classic use case for just having multiple fields. Even if the logic were implemented to support this at query time, it could never be faster than sorting on a single-valued field that you populate with the min/max at indexing time. The mantra of fast I/R is that if you can precompute it independently of the individual search criteria, you should (it's the whole foundation for why the inverted index exists).

-Hoss
Re: Sorting on multiValued fields via function query
On Wed, Mar 16, 2011 at 5:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> : However, many of our multiValued fields are single valued for the majority
> : of documents in our index so we may not have noticed the incorrect sorting
> : behaviors.
>
> that would make sense ... if you use a multiValued field as if it were single valued, you would never encounter a problem. if you had *some* multivalued fields your results would be sorted extremely arbitrarily for those docs that did have multiple values, unless you had more distinct values than you had documents, at which point you would get a hard crash at query time.

AFAIK, not any more. Since that behavior was very unreliable, it has been removed, and you can reliably sort by any multi-valued field in lucene (with the sort order being defined by the largest value if there are multiple).

-Yonik http://lucidimagination.com
Re: Sorting on multiValued fields via function query
Huh, so lucene is actually doing what has been commonly described as impossible in Solr?

But is Solr trunk, as the OP seemed to report, still not aware of this and raising an error on a sort on a multi-valued field, instead of just saying, okay, we'll pass it to lucene anyway and go with lucene's approach to sorting on a multi-valued field (that is, apparently, using the largest value)? If so... that kind of sounds like a bug/misfeature, yes, no?

Also... if lucene is already capable of sorting on a multi-valued field by choosing the largest value (largest vs. smallest is presumably just arbitrary there), there is presumably no performance implication to choosing the smallest instead of the largest. It just chooses the largest, according to Yonik.

So... if someone patched lucene so that whether it chose the largest or smallest in that case was a parameter passed in (probably not a large patch, since lucene, says Yonik, has already been enhanced to always choose the largest), and then patched Solr to take a param and pass it to Lucene for this purpose (which presumably also wouldn't be a large patch if lucene supported it), then we'd have the feature the OP asked for.

Based on Yonik's description (assuming I understand correctly and he's correct), it doesn't sound like a lot of code. But it's still beyond my unfamiliar-with-lucene-code-not-so-great-at-java abilities, nor do I have the interest for my own app needs at the moment. But if the OP or someone else has both, it sounds like a plausible feature?
Re: Sorting on multiValued fields via function query
I agree with this, and it is even needed for function sorting on multivalued fields. See the geohash patch for one way to deal with multivalued fields on distance. Not ideal, but it works efficiently.

Bill Bell
Re: Sorting on multiValued fields via function query
Hi Harish.

Did sorting on multiValued fields actually work correctly for you before? I'd be surprised if so. I could be wrong, but I think you previously always got the sorting effects of whatever was the last indexed value. It is indeed the case that the FieldCache only supports up to one indexed value per field. Recently Hoss added sanity checks that you are seeing the results of: https://issues.apache.org/jira/browse/SOLR-2339

You might want to comment on that issue with proof (e.g. a simple test) that it worked before but not now.

~ David
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book