[
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man reopened SOLR-5354:
----------------------------
Looking further at this as part of SOLR-5463 I realized that this change
introduces a new bug in some edge cases of sorting on functions. The easiest
way to reproduce this is to add a catchall dynamic field to the example
schema.xml...
{{<dynamicField name="*" type="ignored" multiValued="true" />}}
..and then attempt a distributed sort on a function...
http://localhost:8983/solr/select?q=*:*&sort=sum%28popularity,price%29+desc&shards=localhost:8983/solr
(But the problem can be reproduced in more subtle ways -- notably any prefix
based dynamicField that is also a prefix of a function -- for example {{"s*"}}
or {{"currency*"}} as dynamicFields)
The problem comes about because of a mistaken impression I had when I made this
comment above...
{quote}
bq. I think solr should fix its own apis here? It could add FieldType[] to
SortSpec or something like that.
I'm not sure why that would help? We can already ask each SortField for its
getField() and then look that up in the Schema.
{quote}
My mistake was in thinking that SortField.getField() would always be null
unless the Sort was on a "real" field -- but SortFields built around
functions have a getField method that returns a string representation of the
function. (This behavior is currently required by the distributed sorting
code when serializing/deserializing the list of sort values from each shard, in
order to know which values belong to which SortField.)
In the past, using SortField.getField() to ask IndexSchema for a FieldType was
a rare occurrence driven by the runtime type of the value found -- the FieldType
found was never used unless the runtime sort value was a String, so it was
never a problem if a dynamicField pattern matched a sort function string, since
sort functions never returned Strings. But now with this new code, where we use
the FieldType's marshalling methods for any valid field, it can cause problems.
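To make the collision concrete, here is a minimal standalone sketch of how a prefix or catchall dynamicField pattern can match the string form of a sort function. It does not use Solr's IndexSchema: the class {{DynamicFieldMatchDemo}}, the {{matches}} helper, and the example string {{sum(popularity,price)}} are assumptions for illustration only, and the exact string getField() returns for a function may differ.

```java
import java.util.Arrays;
import java.util.List;

public class DynamicFieldMatchDemo {
    /**
     * Simplified stand-in for dynamicField glob matching (an assumption,
     * not Solr's code): only a single leading or trailing '*' is handled.
     */
    static boolean matches(String pattern, String name) {
        if (pattern.equals("*")) {
            return true; // catchall matches everything
        }
        if (pattern.endsWith("*")) {
            // prefix pattern such as "s*" or "currency*"
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        if (pattern.startsWith("*")) {
            // suffix pattern such as "*_s"
            return name.endsWith(pattern.substring(1));
        }
        return pattern.equals(name);
    }

    public static void main(String[] args) {
        // Illustrative string form of a function sort (assumed, see lead-in).
        String sortFieldName = "sum(popularity,price)";
        List<String> patterns = Arrays.asList("*", "s*", "currency*", "*_s");
        for (String p : patterns) {
            System.out.println(p + " matches " + sortFieldName + ": "
                    + matches(p, sortFieldName));
        }
    }
}
```

Under these assumptions the catchall and {{"s*"}} patterns both match the function string, so a schema lookup keyed on that string would hand back a FieldType that has nothing to do with the function.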
I think rmuir's suggestion is spot on: When parsing the SortSpec we need to
keep track of the FieldType at a minimum -- but it's just as easy to keep track
of the SchemaField itself.
I've got a patch where I tweaked sarowe's new test to demonstrate the bug and
then fixed it by having SortSpec keep track of a List<SchemaField> that
corresponds one-to-one with the SortFields. It's a bit hairy, but it works.
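The general shape of that idea can be sketched as follows. This is not the patch itself: {{SortField}}, {{SchemaField}} and {{SortSpec}} here are simplified hypothetical stand-ins for the real Lucene/Solr classes, and the {{marshal}} helper and its "raw:"/type-name prefixes are invented purely to show the decision being made from the parallel list rather than from a schema lookup on getField().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortSpecSketch {
    /** Hypothetical stand-in for Lucene's SortField. */
    static final class SortField {
        final String field;    // real field name OR a function's string form
        final boolean reverse;
        SortField(String field, boolean reverse) {
            this.field = field;
            this.reverse = reverse;
        }
    }

    /** Hypothetical stand-in for Solr's SchemaField. */
    static final class SchemaField {
        final String name;
        final String typeName;
        SchemaField(String name, String typeName) {
            this.name = name;
            this.typeName = typeName;
        }
    }

    /** Keeps schemaFields parallel to sortFields; null means "not a real field". */
    static final class SortSpec {
        final List<SortField> sortFields;
        final List<SchemaField> schemaFields;
        SortSpec(List<SortField> sortFields, List<SchemaField> schemaFields) {
            if (sortFields.size() != schemaFields.size()) {
                throw new IllegalArgumentException("lists must be the same length");
            }
            this.sortFields = Collections.unmodifiableList(new ArrayList<>(sortFields));
            this.schemaFields = Collections.unmodifiableList(new ArrayList<>(schemaFields));
        }
    }

    /** Uses the type recorded at parse time instead of looking up getField() in the schema. */
    static String marshal(SortSpec spec, int i, Object value) {
        SchemaField sf = spec.schemaFields.get(i);
        return (sf == null ? "raw" : sf.typeName) + ":" + value;
    }

    public static void main(String[] args) {
        SortSpec spec = new SortSpec(
                Arrays.asList(new SortField("sum(popularity,price)", true),
                              new SortField("price", false)),
                Arrays.asList(null, new SchemaField("price", "float")));
        System.out.println(marshal(spec, 0, 3.5));
        System.out.println(marshal(spec, 1, 9.99));
    }
}
```

Because the function sort carries a null entry recorded at parse time, no dynamicField pattern can accidentally claim it later.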
> Distributed sort is broken with CUSTOM FieldType
> ------------------------------------------------
>
> Key: SOLR-5354
> URL: https://issues.apache.org/jira/browse/SOLR-5354
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 4.4, 4.5, 5.0
> Reporter: Jessica Cheng
> Assignee: Steve Rowe
> Labels: custom, query, sort
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch,
> SOLR-5354.patch
>
>
> We added a custom field type to allow an indexed binary field type that
> supports search (exact match), prefix search, and sort as unsigned bytes
> lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator
> accomplishes what we want, and even though the name of the comparator
> mentions UTF8, it doesn't actually assume so and just does byte-level
> operation, so it's good. However, when we do this across different nodes, we
> run into an issue where in QueryComponent.doFieldSortValues:
> // Must do the same conversion when sorting by a
> // String field in Lucene, which returns the terms
> // data as BytesRef:
> if (val instanceof BytesRef) {
> UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
> field.setStringValue(spare.toString());
> val = ft.toObject(field);
> }
> UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
> UTF8. I did a hack where I specified our own field comparator to be
> ByteBuffer based to get around that instanceof check, but then the field
> value gets transformed into BYTEARR in JavaBinCodec, and when it's
> unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a
> ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator,
> which decides to give me comparatorNatural in the else of the TODO for
> CUSTOM, which barfs because byte[] are not Comparable...
> From Chris Hostetter:
> I'm not very familiar with the distributed sorting code, but based on your
> comments, and a quick skim of the functions you pointed to, it definitely
> seems like there are two problems here for people trying to implement
> custom sorting in custom FieldTypes...
> 1) QueryComponent.doFieldSortValues - this definitely seems like it should
> be based on the FieldType, not an "instanceof BytesRef" check (oddly: the
> comment even suggests that it should be using the FieldType's
> indexedToReadable() method -- but it doesn't do that). If it did, then
> this part of the logic should work for you as long as your custom
> FieldType implemented indexedToReadable in a sane way.
> 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
> needs to be filled. I'm guessing the sanest thing to do in the CUSTOM case
> would be to ask the FieldComparatorSource (which should be coming from the
> SortField that the custom FieldType produced) to create a FieldComparator
> (via newComparator - the numHits & sortPos could be anything) and then
> wrap that up in a Comparator facade that delegates to
> FieldComparator.compareValues
> That way a custom FieldType could be in complete control of the sort
> comparisons (even when merging ids).
> ...But as I said: I may be missing something, I'm not super familiar with
> that code. Please try it out and let us know if that works -- either way
> please open a Jira pointing out the problems trying to implement
> distributed sorting in a custom FieldType.
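For reference, the ordering the original report needs when merging shard results (unsigned lexicographic comparison over byte[], which is not Comparable) can be sketched as a standalone Comparator. This is only an illustration of the comparison itself, not of the Solr merge code: the class {{UnsignedBytesOrder}} and its constant are hypothetical names and are not part of Solr or Lucene.

```java
import java.util.Comparator;

public class UnsignedBytesOrder {
    /**
     * Compares byte arrays lexicographically, treating each byte as
     * unsigned (0..255). A shorter array that is a prefix of a longer
     * one sorts first.
     */
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    public static void main(String[] args) {
        byte[] low = {0x01};
        byte[] high = {(byte) 0x80}; // negative as a signed byte, 128 unsigned
        System.out.println(UNSIGNED_LEXICOGRAPHIC.compare(low, high) < 0);
    }
}
```

A comparator of this kind is what a facade around a custom FieldType's own comparison would ultimately have to delegate to, instead of falling back to natural ordering.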