[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-5354:
-----------------------------

    Attachment: SOLR-5354.patch

Thanks Robert and Hoss for the reviews.

New patch incorporating Hoss's suggestions, details below:

bq. call me paranoid, but i really dislike distrib tests that only use the 
query() method to ensure that the distrib response is the same as the control 
response – could we please add some assertions that use queryServer() to prove 
the docs are coming back in the right order in the distrib test?

As Hoss suggested on #solr IRC, I changed 
{{BasicDistributedSearchTestCase#query()}} to return the {{QueryResponse}} 
(instead of void), so that {{queryServer()}} doesn't have to be called 
separately from the random server vs. control check that {{query()}} performs.

I then added a new method to {{SolrTestCaseJ4}}: 
{{assertFieldValues(doclist,fieldName,values)}}, and now check that {{id}} 
field values are in the expected order using this new method.

bq. the test should really sanity check that multi-level sorts (eg: "payload 
asc, id desc") are working properly

Added.

bq. we should be really clear & careful in the javadocs for 
FieldType.marshalSortValue and FieldType.unmarshalSortValue – in your patch 
they refer to "a value of this FieldType" but that's not actually what they 
operate on. They operate on the values used by the FieldComparator returned by 
the SortField for this FieldType (ie: SortableDoubleField's toObject returns a 
Double, but the marshal method operates on ByteRef)

Reworded.

bq. I'm confused why we still need comparatorNatural() and it's use for 
REWRITEABLE. Why not actually rewrite() the SortField using the local 
IndexSearcher and then wrap the rewritten SortField's FieldComparator using 
comparatorFieldComparator() just like any other SortField? Since we're only 
ever going to compare the raw values on the coordinator it shouldn't matter if 
we rewrite in terms of the local IndexSearcher - it's the best we can do, and 
that seems safer then assuming REWRITABLE == function and trusting 
comparatorNatural. (ie: consider someone who writes a custom FieldType that 
uses REWRITABLE)

Fixed - {{comparatorNatural()}} is gone.

bq. don't the marshal methods in StrField, TextField, and CollationField need 
null checks (for the possibilities of docs w/o a value in the sort field?)

Yes, they do, {{ICUCollatonField}} too - added.

bq. do we even have any existing tests of distributed sorting on strings & 
numerics using sortMisstingLast / sortMissingFirst to be sure we don't break 
that?

No, I couldn't find any, so I added tests for {{sortMissingFirst}} and 
{{sortMissingLast}} on both {{SortableIntField}} and {{StringField}} on the 
existing {{TestDistributedSearch}} class.  I'll add testing for a Trie field 
too before I commit, not in the patch yet.


> Distributed sort is broken with CUSTOM FieldType
> ------------------------------------------------
>
>                 Key: SOLR-5354
>                 URL: https://issues.apache.org/jira/browse/SOLR-5354
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.4, 4.5, 5.0
>            Reporter: Jessica Cheng
>            Assignee: Steve Rowe
>              Labels: custom, query, sort
>         Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch
>
>
> We added a custom field type to allow an indexed binary field type that 
> supports search (exact match), prefix search, and sort as unsigned bytes 
> lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator 
> accomplishes what we want, and even though the name of the comparator 
> mentions UTF8, it doesn't actually assume so and just does byte-level 
> operation, so it's good. However, when we do this across different nodes, we 
> run into an issue where in QueryComponent.doFieldSortValues:
>           // Must do the same conversion when sorting by a
>           // String field in Lucene, which returns the terms
>           // data as BytesRef:
>           if (val instanceof BytesRef) {
>             UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
>             field.setStringValue(spare.toString());
>             val = ft.toObject(field);
>           }
> UnicodeUtil.UTF8toUTF16 is called on our byte array,which isn't actually 
> UTF8. I did a hack where I specified our own field comparator to be 
> ByteBuffer based to get around that instanceof check, but then the field 
> value gets transformed into BYTEARR in JavaBinCodec, and when it's 
> unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a 
> ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, 
> which decides to give me comparatorNatural in the else of the TODO for 
> CUSTOM, which barfs because byte[] are not Comparable...
> From Chris Hostetter:
> I'm not very familiar with the distributed sorting code, but based on your
> comments, and a quick skim of the functions you pointed to, it definitely
> seems like there are two problems here for people trying to implement
> custom sorting in custom FieldTypes...
> 1) QueryComponent.doFieldSortValues - this definitely seems like it should
> be based on the FieldType, not an "instanceof BytesRef" check (oddly: the
> comment event suggestsion that it should be using the FieldType's
> indexedToReadable() method -- but it doesn't do that.  If it did, then
> this part of hte logic should work for you as long as your custom
> FieldType implemented indexedToReadable in a sane way.
> 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
> needs filled.  I'm guessing the sanest thing to do in the CUSTOM case
> would be to ask the FieldComparatorSource (which should be coming from the
> SortField that the custom FieldType produced) to create a FieldComparator
> (via newComparator - the numHits & sortPos could be anything) and then
> wrap that up in a Comparator facade that delegates to
> FieldComparator.compareValues
> That way a custom FieldType could be in complete control of the sort
> comparisons (even when merging ids).
> ...But as i said: i may be missing something, i'm not super familia with
> that code.  Please try it out and let us know if thta works -- either way
> please open a Jira pointing out the problems trying to implement
> distributed sorting in a custom FieldType.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to