Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh

Hi Jack,

thank you for the hint.
Since I already have a SolrJ client to do the preprocessing, mapping to 
sort fields isn't my problem. I will try to explain it better in my reply 
to Erick.


Uwe
(Sorry for the late reaction)


On 30.08.2012 16:04, Jack Krupansky wrote:

> You can also use a Field Mutating Update Processor to do a smart
> copy of a multi-valued field to a sortable single-valued field.
>
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors
>
> Such as using the maximum value via MaxFieldValueUpdateProcessorFactory.
>
> See:
> http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html
>
> Which value of a multi-valued field do you wish to sort by?
>
> -- Jack Krupansky




Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh

On 31.08.2012 13:35, Erick Erickson wrote:

> ... what would the correct behavior
> be for sorting on a multivalued field


Hi Erick,

in general you are right: the question with multivalued fields is which 
value serves as the reference. But there are thousands of cases where this 
question is answered implicitly. See my example ...sort=max(datefield) 
desc. It is obvious that the newest date should win. I see no 
reason why simple filters like max can't handle multivalued fields.


Now, four months later, I still wonder why there is no pluggable 
function to map multivalued fields to a single value.

e.g. ...sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc...

Uwe
(Sorry for the late reaction)




Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Alexandre Rafalovitch
If the multiple-to-one mapping is stable (e.g. independent of the
query), why not implement it as a custom update.chain processor with a copy
to a separate field? There are already a couple of implementations
under FieldValueMutatingUpdateProcessor (first, last, max, min).

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jan 7, 2013 at 8:19 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:

> On 31.08.2012 13:35, Erick Erickson wrote:
>
>> ... what would the correct behavior
>> be for sorting on a multivalued field
>
> Hi Erick,
>
> in general you are right: the question with multivalued fields is which
> value serves as the reference. But there are thousands of cases where this
> question is answered implicitly. See my example ...sort=max(datefield)
> desc. It is obvious that the newest date should win. I see no reason
> why simple filters like max can't handle multivalued fields.
>
> Now, four months later, I still wonder why there is no pluggable
> function to map multivalued fields to a single value.
> e.g. ...sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc...
>
> Uwe
> (Sorry for the late reaction)





Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh

Hi,

as I just wrote in my reply to the similar suggestion from Jack, 
I'm not looking for a way to preprocess my data.

My question is: why do I need two redundant fields to sort on a multivalued 
field ('date_max' and 'date_min' for 'date')?

For me it's just a waste of space, poisoning the fieldcache.

There is also another class of problems where a filter function like 
'mapMultipleToOne' may be helpful. In the thread 'theory of sets' (this 
list) I described a hack using the function strdist, a class of my own, and 
the mapping of multiple values as a CSV list in a single-valued field.


Uwe




On 07.01.2013 14:54, Alexandre Rafalovitch wrote:

> If the multiple-to-one mapping is stable (e.g. independent of the
> query), why not implement it as a custom update.chain processor with a copy
> to a separate field? There are already a couple of implementations
> under FieldValueMutatingUpdateProcessor (first, last, max, min).
>
> Regards,
> Alex.





Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Chris Hostetter

: My question is, why do i need two redundant fields to sort a multivalued field
: ('date_max' and 'date_min' for 'date')
: For me it's just a waste of space, poisoning the fieldcache.

how do two fields poison the fieldcache? ... if there was a function 
that could find the min or max value of a multi-valued field, it would 
need to construct an UnInvertedField of all N of the field values of each 
doc in order to find the min/max at query time -- by pre-computing a 
min_field and max_field at indexing time you only need FieldCaches for 
those 2 fields (where 2 <= N, and N may be very big)
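
The trade-off can be pictured with a toy model in Python. This is only a sketch under loud assumptions: it does not reproduce how Solr's FieldCache or UnInvertedField work internally, it merely counts how many field values each approach has to look at. The function names, the sample documents and the `date_max` field are made up for illustration.

```python
def sort_at_query_time(docs, field):
    """Sort newest-first by max(field); every value of every doc is read."""
    values_read = sum(len(doc[field]) for doc in docs)
    ordered = sorted(docs, key=lambda doc: max(doc[field]), reverse=True)
    return [doc["id"] for doc in ordered], values_read


def sort_on_precomputed(docs, sort_field):
    """Sort newest-first on a single-valued field filled at index time."""
    values_read = len(docs)  # exactly one value per document
    ordered = sorted(docs, key=lambda doc: doc[sort_field], reverse=True)
    return [doc["id"] for doc in ordered], values_read


# Hypothetical documents; ISO date strings compare chronologically.
docs = [
    {"id": "A", "date": ["2010-01-15", "2012-08-30", "2011-05-01"]},
    {"id": "B", "date": ["2012-12-24"]},
    {"id": "C", "date": ["2009-03-03", "2009-04-04"]},
]

# "Index time": pay the cost once and store the result with the document.
for doc in docs:
    doc["date_max"] = max(doc["date"])

query_time_order, query_time_reads = sort_at_query_time(docs, "date")
index_time_order, index_time_reads = sort_on_precomputed(docs, "date_max")
```

Both calls return the same order; the difference is that the query-time variant touches all values on every search, while the precomputed variant touches one value per document.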

Generally speaking: most solr use cases are willing to pay a slightly 
higher indexing cost (time/cpu) to have faster searches -- which answers 
your earlier question...

> Now, four months later, I still wonder why there is no pluggable 
> function to map multivalued fields to a single value.

...because no one has written/contributed these functions (because most 
people would rather pay that cost at indexing time)



-Hoss


Re: Sorting on mutivalued fields still impossible?

2012-09-10 Thread Toke Eskildsen
On Fri, 2012-09-07 at 06:55 +0200, Erick Erickson wrote:
> I may prefer the first, and you may prefer the second. Neither is
> necessarily more correct IMO, it depends on the problem
> space. Choosing either one will be unpopular with anyone
> who likes the other

Sorry, I did not make myself clear: If we decide that there are only a
few "obvious" (that's a loaded word. Maybe "common"?) solutions, my idea
was to implement them all. Especially if they can be reduced to the same
underlying algorithm with a few tweaks for each case.

> And I suspect that 99 times out of 100, someone wanting to sort on
> fields with multiple tokens hasn't thought the problem through
> carefully.

That might very well be the case. I must admit that I have mostly seen
the issue as "User asks for X, how do we implement X?", instead of "User
asks for X, would user be better off with Y?".

> And duplicate entries in the result set get ugly. Say a user sorts
> on a field containing 10,000 tokens. Now one doc is repeated
> 10,000 times in the result set. How many docs are set for
> numFound? Faceting? Grouping?

I don't see the difference between 2 and 10,000 tokens for this, but I
concede that there is no clear answer and that choosing by setup would
require the user to have a fairly deep understanding.

I accept that there is no clear need for the functionality at this point
in time and defer hacking on it.

Thank you for your input,
Toke Eskildsen



Re: Sorting on mutivalued fields still impossible?

2012-09-06 Thread Erick Erickson
And you've illustrated my viewpoint, I think, by saying
"two obvious choices".

I may prefer the first, and you may prefer the second. Neither is
necessarily more correct IMO, it depends on the problem
space. Choosing either one will be unpopular with anyone
who likes the other

And I suspect that 99 times out of 100, someone wanting to sort on
fields with multiple tokens hasn't thought the problem through
carefully. So I favor forcing the person with the use-case where this
is actually _desired_ behavior to work to implement rather than
have to deal with surprising orderings.

And duplicate entries in the result set get ugly. Say a user sorts
on a field containing 10,000 tokens. Now one doc is repeated
10,000 times in the result set. How many docs are set for
numFound? Faceting? Grouping?

I think your first option is at least easy to explain, but I don't see
it as compelling enough to put the work into it, although I confess
I don't know the guts of how much work it would take to find the
first (and last, don't forget specifying desc) token for each doc

Anyway, that's my story and I'm sticking to it <G>...

Best
Erick

On Wed, Sep 5, 2012 at 12:54 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
> On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
>> Imagine you have two entries, aardvark and emu in your
>> multiValued field. How should that document sort relative to
>> another doc with camel and zebra? Any heuristic
>> you apply will be wrong for someone else
>
> I see two obvious choices here:
>
> 1) Sort by the value that is ordered first by the comparator function.
> Doc1: aardvark, (emu)
> Doc2: camel, (zebra)
> This is what Uwe wants to do and it is normally done by preprocessing
> and collapsing to a single value.
> It could be implemented with an ordered multi-valued field cache by
> comparing on the first (or last, in the case of reverse sort) entry for
> each matching document.
>
> 2) Make duplicate entries in the result set, one for each value.
> Doc1: aardvark, (emu)
> Doc2: camel, (zebra)
> Doc1: (aardvark), emu
> Doc2: (camel), zebra
> I have a hard time coming up with a real world use case for this.
> It could be implemented by using a multi-valued field cache as above and
> putting the same document ID into the sliding window sorter once for
> each field value.
>
> Collapsing this into a single algorithm:
> Step through all IDs. For each ID, give access to the list of field
> values and provide a callback for adding one or more (value, ID)-pairs
> to the sliding window sorter.
>
> Are there some other realistic heuristics that I have missed?



Re: Sorting on mutivalued fields still impossible?

2012-09-05 Thread Toke Eskildsen
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
> Imagine you have two entries, aardvark and emu in your
> multiValued field. How should that document sort relative to
> another doc with camel and zebra? Any heuristic
> you apply will be wrong for someone else

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
This is what Uwe wants to do and it is normally done by preprocessing
and collapsing to a single value.
It could be implemented with an ordered multi-valued field cache by
comparing on the first (or last, in the case of reverse sort) entry for
each matching document.

2) Make duplicate entries in the result set, one for each value.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
Doc1: (aardvark), emu
Doc2: (camel), zebra
I have a hard time coming up with a real world use case for this.
It could be implemented by using a multi-valued field cache as above and
putting the same document ID into the sliding window sorter once for
each field value.

Collapsing this into a single algorithm:
Step through all IDs. For each ID, give access to the list of field
values and provide a callback for adding one or more (value, ID)-pairs
to the sliding window sorter.
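
The collapsed algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not Solr or Lucene code: the names `sort_multivalued`, `first_value` and `every_value` are hypothetical, a plain dict stands in for the multi-valued field cache, and `heapq.nsmallest` stands in for the sliding window sorter.

```python
import heapq


def sort_multivalued(docs, strategy, top_k):
    """Step through all IDs and let `strategy` add (value, ID) pairs.

    docs:     dict mapping document ID -> list of field values
    strategy: callable(doc_id, values, add) that calls add(value, doc_id)
              zero or more times per document
    top_k:    number of entries to keep
    """
    pairs = []

    def add(value, doc_id):
        pairs.append((value, doc_id))

    for doc_id, values in docs.items():
        strategy(doc_id, values, add)

    # Stand-in for the sliding window sorter: keep the top_k smallest pairs.
    return heapq.nsmallest(top_k, pairs)


def first_value(doc_id, values, add):
    """Choice 1: one entry per document, keyed on its first-ordered value."""
    if values:
        add(min(values), doc_id)


def every_value(doc_id, values, add):
    """Choice 2: one entry per field value, so documents can repeat."""
    for value in values:
        add(value, doc_id)


docs = {"Doc1": ["aardvark", "emu"], "Doc2": ["camel", "zebra"]}
by_first_value = sort_multivalued(docs, first_value, top_k=10)
by_every_value = sort_multivalued(docs, every_value, top_k=10)
```

With the two example documents, `by_first_value` holds one entry per document and `by_every_value` holds four entries in the order listed under choice 2. A reverse sort would use `max` in the first strategy and `heapq.nlargest` in place of `nsmallest`.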


Are there some other realistic heuristics that I have missed?



Re: Sorting on mutivalued fields still impossible?

2012-08-31 Thread Erick Erickson
In addition to Jack's comment, what would the correct behavior
be for sorting on a multivalued field? The reason this is disallowed
is because there is no correct behavior in the general case.

Imagine you have two entries, aardvark and emu in your
multiValued field. How should that document sort relative to
another doc with camel and zebra? Any heuristic
you apply will be wrong for someone else

Best
Erick


On Thu, Aug 30, 2012 at 1:57 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:
> Hi,
> just to be sure.
>
> There is still no way to sort by multivalued fields?
> ...sort=max(datefield) desc
>
> There is no smarter option than creating additional single-valued fields
> just for sorting?
> e.g. datefield_max and datefield_min
>
> Uwe


Re: Sorting on mutivalued fields still impossible?

2012-08-30 Thread Jack Krupansky
You can also use a Field Mutating Update Processor to do a smart copy of 
a multi-valued field to a sortable single-valued field.


See:
http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors

Such as using the maximum value via MaxFieldValueUpdateProcessorFactory.

See:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html
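
As a rough illustration of what such a "smart copy" amounts to, here is a minimal Python sketch of the same idea done as client-side preprocessing before a document is sent to Solr. It is not the update processor itself; the function name `add_sort_fields` and the dict-based document are assumptions, and only the field names `datefield`, `datefield_max` and `datefield_min` are taken from the question.

```python
from datetime import date


def add_sort_fields(doc, source="datefield"):
    """Return a copy of doc with single-valued <source>_min/_max fields.

    The multi-valued source field is left untouched; documents without
    any value in it get no sort fields at all.
    """
    values = doc.get(source) or []
    enriched = dict(doc)
    if values:
        enriched[f"{source}_min"] = min(values)
        enriched[f"{source}_max"] = max(values)
    return enriched


# Hypothetical document with a multi-valued date field.
doc = {
    "id": "1",
    "datefield": [date(2011, 5, 1), date(2012, 8, 30), date(2010, 1, 15)],
}
enriched = add_sort_fields(doc)
```

A query could then sort on `datefield_max desc` to let the newest date win, assuming the two extra fields are declared as single-valued in the schema.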

Which value of a multi-valued field do you wish to sort by?

-- Jack Krupansky

-----Original Message-----
From: Uwe Reh
Sent: Thursday, August 30, 2012 1:57 AM
To: solr-user@lucene.apache.org
Subject: Sorting on mutivalued fields still impossible?

Hi,
just to be sure.

There is still no way to sort by multivalued fields?
...sort=max(datefield) desc

There is no smarter option than creating additional single-valued
fields just for sorting?
e.g. datefield_max and datefield_min

Uwe