[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

ASF subversion and git services (JIRA) Fri, 29 Nov 2013 06:38:27 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835397#comment-13835397
 ]


ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546571 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546571 ]

SOLR-5354: don't try to write docvalues with 3.x codec in these tests

> Distributed sort is broken with CUSTOM FieldType
> ------------------------------------------------
>
>                 Key: SOLR-5354
>                 URL: https://issues.apache.org/jira/browse/SOLR-5354
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.4, 4.5, 5.0
>            Reporter: Jessica Cheng
>            Assignee: Steve Rowe
>              Labels: custom, query, sort
>             Fix For: 5.0, 4.7
>
>         Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
> SOLR-5354.patch
>
>
> We added a custom field type to allow an indexed binary field type that 
> supports search (exact match), prefix search, and sort as unsigned bytes 
> lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator 
> accomplishes what we want, and even though the name of the comparator 
> mentions UTF8, it doesn't actually assume so and just does byte-level 
> operation, so it's good. However, when we do this across different nodes, we 
> run into an issue where in QueryComponent.doFieldSortValues:
>           // Must do the same conversion when sorting by a
>           // String field in Lucene, which returns the terms
>           // data as BytesRef:
>           if (val instanceof BytesRef) {
>             UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
>             field.setStringValue(spare.toString());
>             val = ft.toObject(field);
>           }
> UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually 
> UTF8. I did a hack where I specified our own field comparator to be 
> ByteBuffer based to get around that instanceof check, but then the field 
> value gets transformed into BYTEARR in JavaBinCodec, and when it's 
> unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a 
> ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, 
> which decides to give me comparatorNatural in the else of the TODO for 
> CUSTOM, which barfs because byte[] are not Comparable...
> From Chris Hostetter:
> I'm not very familiar with the distributed sorting code, but based on your
> comments, and a quick skim of the functions you pointed to, it definitely
> seems like there are two problems here for people trying to implement
> custom sorting in custom FieldTypes...
> 1) QueryComponent.doFieldSortValues - this definitely seems like it should
> be based on the FieldType, not an "instanceof BytesRef" check (oddly: the
> comment even suggests that it should be using the FieldType's
> indexedToReadable() method -- but it doesn't do that).  If it did, then
> this part of the logic should work for you as long as your custom
> FieldType implemented indexedToReadable in a sane way.
> 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
> needs to be filled.  I'm guessing the sanest thing to do in the CUSTOM case
> would be to ask the FieldComparatorSource (which should be coming from the
> SortField that the custom FieldType produced) to create a FieldComparator
> (via newComparator - the numHits & sortPos could be anything) and then
> wrap that up in a Comparator facade that delegates to
> FieldComparator.compareValues
> That way a custom FieldType could be in complete control of the sort
> comparisons (even when merging ids).
> ...But as i said: i may be missing something, i'm not super familiar with
> that code.  Please try it out and let us know if that works -- either way
> please open a Jira pointing out the problems trying to implement
> distributed sorting in a custom FieldType.
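
The two problems described above can be illustrated with a small stand-alone
sketch. It uses only the JDK and does not call any Solr or Lucene API: the
JDK's UTF-8 decoder stands in for UnicodeUtil.UTF8toUTF16 (whose handling of
invalid input may differ), and the names ByteSortSketch, UNSIGNED_BYTES,
facade, and survivesUtf8RoundTrip are hypothetical. It is not the committed
fix, only a picture of why arbitrary bytes do not survive a UTF-8 conversion
and how a Comparator facade can delegate to a type-specific comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ByteSortSketch {
    // Hypothetical stand-in for the custom field type's ordering:
    // unsigned, byte-wise lexicographic comparison.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // Equal up to the shorter length: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    };

    // Facade for a merge step that only sees Object sort values:
    // it delegates to the typed comparator instead of assuming Comparable.
    static Comparator<Object> facade(Comparator<byte[]> delegate) {
        return (x, y) -> delegate.compare((byte[]) x, (byte[]) y);
    }

    // Problem 1: decoding arbitrary bytes as UTF-8 and re-encoding them
    // does not, in general, give back the original bytes.
    static boolean survivesUtf8RoundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] notUtf8 = {(byte) 0xFF, 0x01};
        System.out.println(survivesUtf8RoundTrip(notUtf8)); // prints "false"

        // Problem 2: byte[] is not Comparable, so natural ordering cannot
        // be used; the facade supplies the ordering explicitly.
        List<Object> shardValues = new ArrayList<>();
        shardValues.add(new byte[] {(byte) 0x80});
        shardValues.add(new byte[] {0x7F, 0x00});
        shardValues.add(new byte[] {0x7F});
        shardValues.sort(facade(UNSIGNED_BYTES));
        for (Object v : shardValues) {
            System.out.println(Arrays.toString((byte[]) v));
        }
    }
}
```

In Solr itself the delegate would presumably be obtained from the SortField's
FieldComparatorSource, as suggested above, rather than hard-coded like this.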



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
