Just a concern where things could act a little funky: today, for example, if I set strength=primary, it folds Test and test to the same unique term, but under this scheme you would have <bytes>Test and <bytes>test as two terms.
This could be undesirable in the typical case where you just want case-insensitive facets, and we don't provide any way to preprocess the text to avoid it. A lot of this is because factory-based analysis chains have no way to specify the AttributeFactory; I guess if we really wanted to fix this properly, we would need to pass the AttributeFactory into TokenizerFactory's create() method. For now, from Solr, it would be a little hacky: someone is going to have to fold the case client-side (or similar) if they don't want these problems.

On Tue, Sep 11, 2012 at 10:43 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Claudio Ranieri and I briefly discussed collator based sorting for
> facets in the thread "Problem with accented words sorting" on the
> solr-user mailing list. Here's the idea:
>
> Solr faceting supports sorting by either count or index order. Claudio
> and I both need the order to be collator-based. My understanding of the
> issue is that it is not currently possible.
>
> Collator-based document sorting in Solr uses CollationKeys as field
> values. This does not work with faceting on fields with multiple values
> as there is no mapping from the key to the human readable value.
>
> ICU sort keys are always null (00) terminated and when two keys are
> compared, the comparison stops as soon as null is reached(?)
> http://userguide.icu-project.org/collation/architecture
>
> If we concatenate the keys with the original values:
> <key><00><original value><offset of original value>
> we get an entity where the ordering is still correct upon comparison and
> where the original value can be extracted by using the offset from the
> last int (or maybe short, to spare 2 bytes) in the BytesRef.
>
> If the idea is sound, I'll open a JIRA issue. Unfortunately I do not
> have time right now for hacking on it.
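
To make the quoted layout and the case concern concrete, here is a rough sketch. It is not Solr or Lucene code. The class and method names (CompositeSortKey, compose, extractOriginal) are made up for illustration, the offset is assumed to be a 4-byte big-endian int, the original value is assumed to be UTF-8, and the JDK's java.text.Collator stands in for ICU (JDK keys are not ICU sort keys and may contain zero bytes, so this shows only the layout and the extraction, not the ordering claim).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CompositeSortKey {

    /**
     * Builds <key><00><original value><offset of original value>.
     * Assumption: sortKey is passed without its own trailing 00.
     */
    public static byte[] compose(byte[] sortKey, String original) {
        byte[] value = original.getBytes(StandardCharsets.UTF_8);
        int offset = sortKey.length + 1; // index of the first byte of the value
        ByteBuffer buf = ByteBuffer.allocate(offset + value.length + Integer.BYTES);
        buf.put(sortKey).put((byte) 0).put(value).putInt(offset);
        return buf.array();
    }

    /** Recovers the original value using the offset stored in the last int. */
    public static String extractOriginal(byte[] term) {
        int offsetPos = term.length - Integer.BYTES;
        int offset = ByteBuffer.wrap(term, offsetPos, Integer.BYTES).getInt();
        return new String(term, offset, offsetPos - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // JDK collator used as a stand-in for an ICU collator (assumption).
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);

        byte[] upper = compose(collator.getCollationKey("Test").toByteArray(), "Test");
        byte[] lower = compose(collator.getCollationKey("test").toByteArray(), "test");

        // Whether the collator alone treats the two strings as equal.
        System.out.println("collator equal: " + (collator.compare("Test", "test") == 0));
        // Whether the composite terms are byte-identical.
        System.out.println("terms equal:    " + Arrays.equals(upper, lower));
        System.out.println("extracted:      " + extractOriginal(upper) + ", " + extractOriginal(lower));
    }
}
```

Because the original value is part of the term, two values that the collator folds together still produce different bytes, so they would show up as separate facet terms unless the case is folded before the term is built.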
-- 
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org