Re: Sorting on mutivalued fields still impossible?
Hi Jack,

thank you for the hint. Since I already have a SolrJ client to do the preprocessing, mapping to sort fields isn't my problem. I will try to explain it better in my reply to Erick.

Uwe
(Sorry for the late reaction)

On 30.08.2012 16:04, Jack Krupansky wrote:
> You can also use a Field Mutating Update Processor to do a smart copy of a multi-valued field to a sortable single-valued field. See:
> http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors
>
> For example, you can use the maximum value via MaxFieldValueUpdateProcessorFactory. See:
> http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html
>
> Which value of a multi-valued field do you wish to sort by?
>
> -- Jack Krupansky
Re: Sorting on mutivalued fields still impossible?
On 31.08.2012 13:35, Erick Erickson wrote:
> ... what would the correct behavior be for sorting on a multivalued field

Hi Erick,

in general you are right: with multivalued fields, the question is which value serves as the reference. But there are thousands of cases where this question is answered implicitly. See my example:

    ...sort=max(datefield) desc

It is obvious that the newest date should win. I see no reason why simple filters like max can't handle multivalued fields.

Now, four months later, I still wonder why there is no pluggable function to map multivalued fields to a single value, e.g.

    ...sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc...

Uwe
(Sorry for the late reaction)
Re: Sorting on mutivalued fields still impossible?
If the multiple-to-one mapping is stable (e.g. independent of the query), why not implement it as a custom update.chain processor with a copy to a separate field? There are already a couple of implementations under FieldValueMutatingUpdateProcessor (first, last, max, min).

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Jan 7, 2013 at 8:19 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:
> On 31.08.2012 13:35, Erick Erickson wrote:
>> ... what would the correct behavior be for sorting on a multivalued field
>
> Hi Erick,
>
> in general you are right: with multivalued fields, the question is which value serves as the reference. But there are thousands of cases where this question is answered implicitly. See my example:
>
>     ...sort=max(datefield) desc
>
> It is obvious that the newest date should win. I see no reason why simple filters like max can't handle multivalued fields.
>
> Now, four months later, I still wonder why there is no pluggable function to map multivalued fields to a single value, e.g.
>
>     ...sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc...
>
> Uwe
> (Sorry for the late reaction)
Re: Sorting on mutivalued fields still impossible?
Hi,

as I just wrote in my reply to the similar suggestion from Jack, I'm not looking for a way to preprocess my data. My question is: why do I need two redundant fields ('date_max' and 'date_min' for 'date') to sort on a multivalued field? For me it's just a waste of space, poisoning the fieldcache.

There is also another class of problems where a filter function like 'mapMultipleToOne' may be helpful. In the thread 'theory of sets' (on this list) I described a hack using the function strdist, a custom class, and the mapping of multiple values to a CSV list in a single-valued field.

Uwe

On 07.01.2013 14:54, Alexandre Rafalovitch wrote:
> If the multiple-to-one mapping is stable (e.g. independent of the query), why not implement it as a custom update.chain processor with a copy to a separate field? There are already a couple of implementations under FieldValueMutatingUpdateProcessor (first, last, max, min).
>
> Regards,
> Alex.
Re: Sorting on mutivalued fields still impossible?
: My question is, why do I need two redundant fields to sort a multivalued field
: ('date_max' and 'date_min' for 'date')
: For me it's just a waste of space, poisoning the fieldcache.

How do two fields poison the fieldcache? If there were a function that could find the min or max value of a multi-valued field, it would need to construct an UnInvertedField of all N of the field values of each doc in order to find the min/max at query time -- by pre-computing a min_field and max_field at indexing time you only need FieldCaches for those 2 fields (where 2 <= N, and N may be very big).

Generally speaking: most Solr use cases are willing to pay a slightly higher indexing cost (time/CPU) to have faster searches -- which answers your earlier question...

: Now four months later I still wonder why there is no pluggable function
: to map multivalued fields into a single value.

...because no one has written/contributed these functions (because most people would rather pay that cost at indexing time).

-Hoss
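The index-time approach described above might look roughly like this on the client side. This is only a minimal Python sketch, assuming documents are plain dicts before they are sent to Solr; the function name `add_sort_fields` is hypothetical, the `_min`/`_max` suffixes simply follow the 'date_min'/'date_max' naming used in this thread, and no Solr API is involved.

```python
from datetime import date


def add_sort_fields(doc, field):
    """Return a copy of doc with <field>_min and <field>_max added.

    Hypothetical helper: documents without any value for the field
    are returned without the extra sort fields.
    """
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        enriched[f"{field}_min"] = min(values)
        enriched[f"{field}_max"] = max(values)
    return enriched


# Made-up sample documents with a multi-valued "date" field.
docs = [
    {"id": "1", "date": [date(2012, 8, 30), date(2013, 1, 7)]},
    {"id": "2", "date": [date(2012, 9, 5)]},
    {"id": "3"},
]

indexed = [add_sort_fields(d, "date") for d in docs]

# Rough equivalent of sort=date_max desc, restricted to docs that have the field.
newest_first = sorted(
    (d for d in indexed if "date_max" in d),
    key=lambda d: d["date_max"],
    reverse=True,
)
print([d["id"] for d in newest_first])
```

The min/max work is done once per document when it is indexed, so a sort only has to read one precomputed value per document instead of scanning all N values at query time.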
Re: Sorting on mutivalued fields still impossible?
On Fri, 2012-09-07 at 06:55 +0200, Erick Erickson wrote:
> I may prefer the first, and you may prefer the second. Neither is necessarily more correct IMO, it depends on the problem space. Choosing either one will be unpopular with anyone who likes the other

Sorry, I did not make myself clear: if we decide that there are only a few obvious (that's a loaded word, maybe "common"?) solutions, my idea was to implement them all, especially if they can be reduced to the same underlying algorithm with a few tweaks for each case.

> And I suspect that 99 times out of 100, someone wanting to sort on fields with multiple tokens hasn't thought the problem through carefully.

That might very well be the case. I must admit that I have mostly seen the issue as "User asks for X, how do we implement X?" instead of "User asks for X, would the user be better off with Y?".

> And duplicate entries in the result set gets ugly. Say a user sorts on a field containing 10,000 tokens. Now one doc is repeated 10,000 times in the result set. How many docs are set for numFound? Faceting? Grouping?

I don't see the difference between 2 and 10,000 tokens for this, but I concede that there is no clear answer and that choosing by setup would require the user to have a fairly deep understanding.

I accept that there is no clear need for the functionality at this point in time and will defer hacking on it.

Thank you for your input,
Toke Eskildsen
Re: Sorting on mutivalued fields still impossible?
And you've illustrated my viewpoint, I think, by saying "two obvious choices". I may prefer the first, and you may prefer the second. Neither is necessarily more correct IMO; it depends on the problem space. Choosing either one will be unpopular with anyone who likes the other.

And I suspect that 99 times out of 100, someone wanting to sort on fields with multiple tokens hasn't thought the problem through carefully. So I favor forcing the person with the use case where this is actually _desired_ behavior to do the implementation work, rather than having to deal with surprising orderings.

And duplicate entries in the result set get ugly. Say a user sorts on a field containing 10,000 tokens. Now one doc is repeated 10,000 times in the result set. How many docs are set for numFound? Faceting? Grouping?

I think your first option is at least easy to explain, but I don't see it as compelling enough to put the work into it, although I confess I don't know the guts well enough to say how much work it would take to find the first (and last, don't forget specifying desc) token for each doc.

Anyway, that's my story and I'm sticking to it (grin)...

Best
Erick

On Wed, Sep 5, 2012 at 12:54 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
> On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
>> Imagine you have two entries, aardvark and emu in your multiValued field. How should that document sort relative to another doc with camel and zebra? Any heuristic you apply will be wrong for someone else
>
> I see two obvious choices here:
>
> 1) Sort by the value that is ordered first by the comparator function.
>
>    Doc1: aardvark, (emu)
>    Doc2: camel, (zebra)
>
> This is what Uwe wants to do and it is normally done by preprocessing and collapsing to a single value. It could be implemented with an ordered multi-valued field cache by comparing on the first (or last, in the case of reverse sort) entry for each matching document.
>
> 2) Make duplicate entries in the result set, one for each value.
>
>    Doc1: aardvark, (emu)
>    Doc2: camel, (zebra)
>    Doc1: (aardvark), emu
>    Doc2: (camel), zebra
>
> I have a hard time coming up with a real-world use case for this. It could be implemented by using a multi-valued field cache as above and putting the same document ID into the sliding window sorter once for each field value.
>
> Collapsing this into a single algorithm: Step through all IDs. For each ID, give access to the list of field values and provide a callback for adding one or more (value, ID)-pairs to the sliding window sorter.
>
> Are there some other realistic heuristics that I have missed?
Re: Sorting on mutivalued fields still impossible?
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
> Imagine you have two entries, aardvark and emu in your multiValued field. How should that document sort relative to another doc with camel and zebra? Any heuristic you apply will be wrong for someone else

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.

   Doc1: aardvark, (emu)
   Doc2: camel, (zebra)

This is what Uwe wants to do and it is normally done by preprocessing and collapsing to a single value. It could be implemented with an ordered multi-valued field cache by comparing on the first (or last, in the case of reverse sort) entry for each matching document.

2) Make duplicate entries in the result set, one for each value.

   Doc1: aardvark, (emu)
   Doc2: camel, (zebra)
   Doc1: (aardvark), emu
   Doc2: (camel), zebra

I have a hard time coming up with a real-world use case for this. It could be implemented by using a multi-valued field cache as above and putting the same document ID into the sliding window sorter once for each field value.

Collapsing this into a single algorithm: Step through all IDs. For each ID, give access to the list of field values and provide a callback for adding one or more (value, ID)-pairs to the sliding window sorter.

Are there some other realistic heuristics that I have missed?
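The single algorithm described above can be sketched roughly as follows. This is an illustrative Python sketch, not Solr or Lucene code: the names `top_n`, `first_value`, and `every_value` are hypothetical, a plain dict stands in for the multi-valued field cache, and a small bounded sorted list stands in for the sliding window sorter. Only ascending order is covered.

```python
import bisect


def first_value(values, doc_id, emit):
    """Choice 1: one entry per document, keyed by its smallest value."""
    if values:
        emit(min(values), doc_id)


def every_value(values, doc_id, emit):
    """Choice 2: one entry per value, so a document may appear repeatedly."""
    for value in values:
        emit(value, doc_id)


def top_n(field_cache, strategy, n):
    """Step through all IDs and keep the n smallest (value, ID) pairs."""
    window = []  # stand-in for the sliding window sorter, kept sorted

    def emit(value, doc_id):
        # Callback handed to the strategy: insert in order, then trim to n.
        bisect.insort(window, (value, doc_id))
        if len(window) > n:
            window.pop()

    for doc_id, values in field_cache.items():
        strategy(values, doc_id, emit)
    return window


# The example documents from this thread.
field_cache = {"Doc1": ["aardvark", "emu"], "Doc2": ["camel", "zebra"]}

by_first = top_n(field_cache, first_value, 10)
by_every = top_n(field_cache, every_value, 10)
print(by_first)
print(by_every)
```

The two choices differ only in the callback strategy passed to `top_n`; a descending sort for choice 1 would emit the largest value per document and keep the largest pairs instead.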
Re: Sorting on mutivalued fields still impossible?
In addition to Jack's comment, what would the correct behavior be for sorting on a multivalued field? The reason this is disallowed is that there is no correct behavior in the general case.

Imagine you have two entries, aardvark and emu, in your multiValued field. How should that document sort relative to another doc with camel and zebra? Any heuristic you apply will be wrong for someone else.

Best
Erick

On Thu, Aug 30, 2012 at 1:57 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:
> Hi,
>
> just to be sure: is there still no way to sort by multivalued fields?
>
>     ...sort=max(datefield) desc
>
> Is there no smarter option than creating additional single-valued fields just for sorting, e.g. datefield_max and datefield_min?
>
> Uwe
Re: Sorting on mutivalued fields still impossible?
You can also use a Field Mutating Update Processor to do a smart copy of a multi-valued field to a sortable single-valued field. See:
http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors

For example, you can use the maximum value via MaxFieldValueUpdateProcessorFactory. See:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html

Which value of a multi-valued field do you wish to sort by?

-- Jack Krupansky

-----Original Message-----
From: Uwe Reh
Sent: Thursday, August 30, 2012 1:57 AM
To: solr-user@lucene.apache.org
Subject: Sorting on mutivalued fields still impossible?

> Hi,
>
> just to be sure: is there still no way to sort by multivalued fields?
>
>     ...sort=max(datefield) desc
>
> Is there no smarter option than creating additional single-valued fields just for sorting, e.g. datefield_max and datefield_min?
>
> Uwe