[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872841#comment-13872841 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1558618 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558618 ]

LUCENE-5399, SOLR-5354: fix distributed grouping to marshal/unmarshal sort values properly

Distributed sort is broken with CUSTOM FieldType

Key: SOLR-5354
URL: https://issues.apache.org/jira/browse/SOLR-5354
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 4.6, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
Labels: custom, query, sort
Fix For: 5.0, 4.7
Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, SOLR-5354__fix_function_edge_case.patch

We added a custom field type to allow an indexed binary field type that supports search (exact match), prefix search, and sort as unsigned bytes lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want, and even though the name of the comparator mentions UTF8, it doesn't actually assume so and just does byte-level operation, so it's good.

However, when we do this across different nodes, we run into an issue where in QueryComponent.doFieldSortValues:

    // Must do the same conversion when sorting by a
    // String field in Lucene, which returns the terms
    // data as BytesRef:
    if (val instanceof BytesRef) {
      UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
      field.setStringValue(spare.toString());
      val = ft.toObject(field);
    }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF8. I did a hack where I specified our own field comparator to be ByteBuffer based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[].

Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] are not Comparable...

From Chris Hostetter:

I'm not very familiar with the distributed sorting code, but based on your comments, and a quick skim of the functions you pointed to, it definitely seems like there are two problems here for people trying to implement custom sorting in custom FieldTypes...

1) QueryComponent.doFieldSortValues - this definitely seems like it should be based on the FieldType, not an instanceof BytesRef check (oddly: the comment even suggests that it should be using the FieldType's indexedToReadable() method -- but it doesn't do that). If it did, then this part of the logic should work for you as long as your custom FieldType implemented indexedToReadable in a sane way.

2) QueryComponent.mergeIds - that TODO definitely looks like a gap that needs to be filled. I'm guessing the sanest thing to do in the CUSTOM case would be to ask the FieldComparatorSource (which should be coming from the SortField that the custom FieldType produced) to create a FieldComparator (via newComparator - the numHits and sortPos could be anything) and then wrap that up in a Comparator facade that delegates to FieldComparator.compareValues. That way a custom FieldType could be in complete control of the sort comparisons (even when merging ids).

...But as i said: i may be missing something, i'm not super familiar with that code. Please try it out and let us know if that works -- either way please open a Jira pointing out the problems trying to implement distributed sorting in a custom FieldType.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
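Point 2 of Chris Hostetter's reply suggests wrapping a FieldComparator in a Comparator facade that delegates to compareValues. The following is only a minimal sketch of that shape, not the code from any of the attached patches. It is kept free of Lucene dependencies so it can run on its own: ValueComparator is a hypothetical stand-in that mirrors just the compareValues method of Lucene's FieldComparator, and UNSIGNED_BYTES, facade and sortedSample are illustrative names that do not exist in Lucene or Solr.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortValueFacade {

    /** Hypothetical stand-in for the compareValues method of Lucene's FieldComparator. */
    interface ValueComparator<T> {
        int compareValues(T first, T second);
    }

    /** Unsigned, lexicographic byte[] comparison: the ordering the custom field type wants. */
    static final ValueComparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7F instead of before 0x00.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal, so the shorter array sorts first.
        return a.length - b.length;
    };

    /** The facade: a plain java.util.Comparator that delegates every call to compareValues. */
    static <T> Comparator<T> facade(ValueComparator<T> delegate) {
        return delegate::compareValues;
    }

    /** Sorts a few raw sort values the way a merge step could, using only the facade. */
    static byte[][] sortedSample() {
        byte[][] values = {
            {(byte) 0x80}, {0x01, 0x02}, {0x01}, {(byte) 0xFF, 0x00}
        };
        Arrays.sort(values, facade(UNSIGNED_BYTES));
        return values;
    }

    public static void main(String[] args) {
        for (byte[] value : sortedSample()) {
            System.out.println(Arrays.toString(value));
        }
    }
}
```

Because the merge step only ever sees the facade, the field type that supplies the delegate stays in control of the ordering, and byte[] never has to be Comparable.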
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837786#comment-13837786 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1547430 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1547430 ]

SOLR-5354: applying hoss's patch to fix function edge case in distributed sort
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837888#comment-13837888 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1547473 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1547473 ]

SOLR-5354: applying hoss's patch to fix function edge case in distributed sort (merged trunk r1547430)
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835397#comment-13835397 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546571 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546571 ]

SOLR-5354: don't try to write docvalues with 3.x codec in these tests
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835434#comment-13835434 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546589 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1546589 ]

SOLR-5354: fix attribution
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835435#comment-13835435 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546591 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546591 ]

SOLR-5354: fix attribution (merged trunk r1546589)
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835077#comment-13835077 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546457 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1546457 ]

SOLR-5354: Distributed sort is broken with CUSTOM FieldType
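Hoss Man's review in this thread refers to FieldType.marshalSortValue and FieldType.unmarshalSortValue and asks whether the marshal methods need null checks for documents without a value in the sort field. As a rough illustration of that round trip only, and not the committed implementation, the sketch below encodes a raw byte[] sort value as Base64 text for the wire and decodes it on the coordinator. The class and its static marshal and unmarshal helpers are hypothetical, and the choice of Base64 is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.Base64;

final class SortValueMarshalling {

    /** Shard side: turn the comparator's raw value into something safe to serialize. */
    static Object marshal(byte[] sortValue) {
        if (sortValue == null) {
            return null; // the document has no value in the sort field
        }
        return Base64.getEncoder().encodeToString(sortValue);
    }

    /** Coordinator side: restore the raw value so the field type's own comparator can use it. */
    static byte[] unmarshal(Object wireValue) {
        if (wireValue == null) {
            return null; // keep "missing" as missing rather than failing
        }
        return Base64.getDecoder().decode((String) wireValue);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, (byte) 0xFF, (byte) 0x80};
        byte[] restored = unmarshal(marshal(original));
        System.out.println(Arrays.equals(original, restored));
    }
}
```

The point of the pairing is that the value compared on the coordinator is the same raw value the shard's comparator produced, whatever representation it takes in between.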
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835081#comment-13835081 ]

ASF subversion and git services commented on SOLR-5354:
-------------------------------------------------------

Commit 1546461 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546461 ]

SOLR-5354: Distributed sort is broken with CUSTOM FieldType (merged trunk r1546457)
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833312#comment-13833312 ]

Hoss Man commented on SOLR-5354:
--------------------------------

Steve: looks great for the most part. a few comments / questions...

* call me paranoid, but i really dislike distrib tests that *only* use the query() method to ensure that the distrib response is the same as the control response -- could we please add some assertions that use queryServer() to prove the docs are coming back in the right order in the distrib test?
* the test should really sanity check that multi-level sorts (eg: payload asc, id desc) are working properly
* we should be really clear and careful in the javadocs for FieldType.marshalSortValue and FieldType.unmarshalSortValue -- in your patch they refer to "a value of this FieldType" but that's not actually what they operate on. They operate on the values used by the FieldComparator returned by the SortField for this FieldType (ie: SortableDoubleField's toObject returns a Double, but the marshal method operates on BytesRef)
* I'm confused why we still need comparatorNatural() and its use for REWRITEABLE. Why not actually rewrite() the SortField using the local IndexSearcher and then wrap the rewritten SortField's FieldComparator using comparatorFieldComparator() just like any other SortField? Since we're only ever going to compare the raw values on the coordinator it shouldn't matter if we rewrite in terms of the local IndexSearcher - it's the best we can do, and that seems safer than assuming REWRITEABLE == function and trusting comparatorNatural. (ie: consider someone who writes a custom FieldType that uses REWRITEABLE)
* don't the marshal methods in StrField, TextField, and CollationField need null checks (for the possibility of docs w/o a value in the sort field?)
* do we even have any existing tests of distributed sorting on strings and numerics using sortMissingLast / sortMissingFirst to be sure we don't break that?

Distributed sort is broken with CUSTOM FieldType

Key: SOLR-5354
URL: https://issues.apache.org/jira/browse/SOLR-5354
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
Labels: custom, query, sort
Attachments: SOLR-5354.patch, SOLR-5354.patch

We added a custom field type to allow an indexed binary field type that supports search (exact match), prefix search, and sort as unsigned bytes lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want, and even though the name of the comparator mentions UTF8, it doesn't actually assume so and just does byte-level operation, so it's good.

However, when we do this across different nodes, we run into an issue where in QueryComponent.doFieldSortValues:

    // Must do the same conversion when sorting by a
    // String field in Lucene, which returns the terms
    // data as BytesRef:
    if (val instanceof BytesRef) {
      UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
      field.setStringValue(spare.toString());
      val = ft.toObject(field);
    }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF8. I did a hack where I specified our own field comparator to be ByteBuffer based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[].

Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] are not Comparable...
From Chris Hostetter: I'm not very familiar with the distributed sorting code, but based on your comments, and a quick skim of the functions you pointed to, it definitely seems like there are two problems here for people trying to implement custom sorting in custom FieldTypes... 1) QueryComponent.doFieldSortValues - this definitely seems like it should be based on the FieldType, not an instanceof BytesRef check (oddly: the comment event suggestsion that it should be using the FieldType's indexedToReadable() method -- but it doesn't do that. If it did, then this part of hte logic should work for you as long as your custom FieldType implemented indexedToReadable in a sane way. 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that needs filled. I'm guessing the sanest thing to do in the CUSTOM case would be to ask the FieldComparatorSource (which should be coming from the SortField that the custom FieldType produced) to
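Two of the review points above (multi-level sorts and documents with no value in the sort field) can be illustrated in isolation. This is a rough sketch under stated assumptions, not the patch's code: `Doc`, `marshal`, `unmarshal` and `sortedIds` are hypothetical names, Base64 is just one possible wire form, and "missing values sort last" is an arbitrary choice made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the FieldType API from the patch.
class MultiLevelSortSketch {

    static final class Doc {
        final int id;
        final byte[] payload; // null models a document with no value in the sort field
        Doc(int id, byte[] payload) { this.id = id; this.payload = payload; }
    }

    // Null-safe marshalling: a missing sort value stays null on the wire.
    static String marshal(byte[] value) {
        return value == null ? null : Base64.getEncoder().encodeToString(value);
    }

    static byte[] unmarshal(String wire) {
        return wire == null ? null : Base64.getDecoder().decode(wire);
    }

    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;
    static final Comparator<byte[]> PAYLOAD_MISSING_LAST = Comparator.nullsLast(UNSIGNED);

    // "payload asc, id desc", with documents lacking a payload sorted last.
    static final Comparator<Doc> PAYLOAD_ASC_ID_DESC =
        Comparator.comparing((Doc d) -> d.payload, PAYLOAD_MISSING_LAST)
                  .thenComparing(Comparator.comparingInt((Doc d) -> d.id).reversed());

    // Returns the ids in sorted order, the kind of thing an ordering assertion would check.
    static List<Integer> sortedIds(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(PAYLOAD_ASC_ID_DESC);
        List<Integer> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

An explicit check on the returned id order is stricter than comparing a distributed response against a control response, since it fails even when both responses are wrong in the same way.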
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832268#comment-13832268 ]

Robert Muir commented on SOLR-5354:

This looks great Steve, thanks.
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830727#comment-13830727 ]

Robert Muir commented on SOLR-5354:

{quote}
I'm not sure why that would help? We can already ask each SortField for its getField() and then look that up in the Schema. The crux of the problem really seems to be: naive assumptions in the distributed sorting code about how to safely send sort values over the wire; and what comparator to use when sorting those values.
{quote}

My point is that the serialization/deserialization doesn't really belong in the lucene comparator API, that's all. I agree with your proposed solution...
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830238#comment-13830238 ]

Hoss Man commented on SOLR-5354:

I've been looking into Solr's distributed sorting code more and more as part of my investigation into SOLR-5463 and i spoke briefly with sarowe offline about the overlap.

I think the problem with CUSTOM distributed sorting is really just a subset of the larger weirdness with the assumptions Solr makes in general about how it can do distributed sorting and how it can de/serialize the sort values when merging the results from the multiple shards.

I think my earlier suggestion (in the email that jessica quoted in the issue summary) about using methods on the FieldType (like indexedToReadable and toObject) to ensure we safely de/serialize the sort values is still the right way to go -- we have to ensure that no matter what strange, arbitrary objects are used by a FieldComparator, we can safely serialize them. But i'm no longer convinced re-using those existing methods makes sense -- because the sort values used by a FieldType's FieldComparator may not map directly to the end user representation of the value (ie: TrieDateField sorts as long, but toObject returns Date; String fields sort on BytesRefs; custom classes sort on who-knows-what, etc...)

I think the best solution would be something like:

* move the toExternal/toInternal concept in the existing patch out of FieldComparatorSource and into Solr's FieldType as methods clearly meant to be very specific to sorting (ie: marshalSortValue and unmarshalSortValue)
* change the fsv=true logic on shards to use marshalSortValue for any SortField that is on a field (if it's score or a function it will be a simple numeric and already safe to serialize over the wire)
* change the mergeIds logic on the coordinator node to explicitly use unmarshalSortValue and then use the _actual_ FieldComparator associated with each SortField instead of the hokey assumptions currently being made in ShardFieldSortedHitQueue.getCachedComparator about using things like comparatorNatural

Other misc comments...

bq. If the deserialization method depends on FieldType, the node responsible for the merge must also have the schema loaded, which might not be the case in SolrCloud.

That's already a requirement in SolrCloud - the coordinator node merging results and writing them back to the client already has to have the same schema. (If it didn't, a custom FieldType with a custom FieldComparator could never work, because there would be no way at all to know what order things should go in)

bq. I think solr should fix its own apis here? It could add FieldType[] to SortSpec or something like that.

I'm not sure why that would help? We can already ask each SortField for its getField() and then look that up in the Schema. The crux of the problem really seems to be: naive assumptions in the distributed sorting code about how to safely send sort values over the wire; and what comparator to use when sorting those values.
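The three-step proposal in this comment (marshal on the shards, unmarshal on the coordinator, merge with the field's real comparator) can be sketched as a small k-way merge. This is only an illustration under assumptions: the class and method names are hypothetical, Base64 strings stand in for whatever marshalSortValue would produce, and `Arrays.compareUnsigned` stands in for the FieldComparator of the custom field. It is not the code in ShardFieldSortedHitQueue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Solr's merge implementation.
class ShardMergeSketch {

    // Stand-in for the comparator the custom FieldType would supply.
    static final Comparator<byte[]> FIELD_ORDER = Arrays::compareUnsigned;

    // Shard side: turn already-sorted sort values into a wire-safe form.
    static List<String> marshalShard(List<byte[]> sortedValues) {
        List<String> wire = new ArrayList<>();
        for (byte[] v : sortedValues) {
            wire.add(Base64.getEncoder().encodeToString(v));
        }
        return wire;
    }

    // Coordinator side: unmarshal every shard response, then merge the top `rows`
    // values using the same comparator the shards sorted with.
    static List<byte[]> mergeShards(List<List<String>> shardResponses, int rows) {
        List<List<byte[]>> decoded = new ArrayList<>();
        for (List<String> response : shardResponses) {
            List<byte[]> values = new ArrayList<>();
            for (String s : response) {
                values.add(Base64.getDecoder().decode(s));
            }
            decoded.add(values);
        }
        // Each queue entry is {shardIndex, positionWithinShard}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) ->
            FIELD_ORDER.compare(decoded.get(a[0]).get(a[1]), decoded.get(b[0]).get(b[1])));
        for (int i = 0; i < decoded.size(); i++) {
            if (!decoded.get(i).isEmpty()) {
                queue.add(new int[] {i, 0});
            }
        }
        List<byte[]> merged = new ArrayList<>();
        while (!queue.isEmpty() && merged.size() < rows) {
            int[] top = queue.poll();
            List<byte[]> shard = decoded.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

The point of the design is that the coordinator never guesses at an ordering: it compares the unmarshalled values with the same comparator the shards used, so it needs the same schema, as the comment notes.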
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811422#comment-13811422 ]

Steve Rowe commented on SOLR-5354:

Thanks for the review Robert.

bq. Can we please not have the Object/Object stuff in FieldComparatorSource? This is wrong: FieldComparator already has a generic type so I don't understand the need to discard type safety.

I'm not sure what you have in mind - do you think FieldComparatorSource should be generified? In this case I think each extending class will need to provide an implementation for these methods, since there isn't a sensible way to provide a default implementation of conversion to/from the generic type.

bq. The unicode conversion for String/String_VAL is incorrect and should not exist: despite the name, these types can be any bytes

This is the status quo right now - the patch just keeps that in place. But I agree.

I think the issue is non-binary (XML) serialization, for which UTF-8 is safe, but arbitrary binary is not. Serializing all STRING/STRING_VAL as Base64 seems wasteful in the general case.

Relatedly, it looks like there's an orphaned {{SortField.Type.BYTES}} (orphaned in that it's not handled in lots of places) - I guess this should go away?

{quote}
As a concrete example the CollationField and ICUCollationField sort with String/String_VAL comparators but contain non-unicode bytes. These currently do not work distributed today either (which I would love to see fixed on this issue).
{quote}

I'm working on a distributed version of the Solr (icu) collation tests. Once I get that failing, I'll be able to test potential solutions.
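The trade-off mentioned in this comment (text serialization is safe for UTF-8 but not for arbitrary bytes, while Base64 is safe but larger) can be checked with the JDK alone. The sketch below is an assumption-laden illustration: it uses `new String(bytes, UTF_8)` rather than Lucene's UnicodeUtil.UTF8toUTF16, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; uses JDK string decoding, not Lucene's UnicodeUtil.
class WireEncodingSketch {

    // Treats the bytes as UTF-8 text and converts back. Malformed sequences are
    // replaced during decoding, so arbitrary binary does not survive the trip.
    static byte[] viaUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Base64 round trip: safe for any bytes, at the cost of roughly 4/3 the size.
    static byte[] viaBase64(byte[] raw) {
        String wire = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(wire);
    }
}
```

Valid UTF-8 such as the bytes of "abc" survives both paths, which is why the problem only shows up for field types like the collation fields or the reporter's binary field.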
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811432#comment-13811432 ]

Robert Muir commented on SOLR-5354:

{quote}
This is the status quo right now - the patch just keeps that in place.
{quote}

No it's not: it's a bug in solr. This patch moves that bug into Lucene. Lucene's APIs here work correctly on any bytes today.
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811433#comment-13811433 ]

Robert Muir commented on SOLR-5354:

{quote}
I think the issue is non-binary (XML) serialization, for which UTF-8 is safe, but arbitrary binary is not. Serializing all STRING/STRING_VAL as Base64 seems wasteful in the general case.
{quote}

This is all solr stuff. I don't think it makes sense to move that logic into lucene, let the user deal with this. They might not be using XML at all: maybe thrift or avro or something else.

Why not just add serialize/deserialize methods to solr's FieldType.java? It seems like the obvious place.
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811450#comment-13811450 ]

Jessica Cheng commented on SOLR-5354:

{quote}
Why not just add serialize/deserialize methods to solr's FieldType.java? It seems like the obvious place.
{quote}

When SortFields are deserialized on the receiving end, it's no longer clear which FieldType the field came from. If the deserialization method depends on FieldType, the node responsible for the merge must also have the schema loaded, which might not be the case in SolrCloud. Maybe solr needs its own SortField too then?
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811461#comment-13811461 ]

Robert Muir commented on SOLR-5354:
---

{quote}
When SortFields are deserialized on the receiving end, it's no longer clear which FieldType the field came from. If the deserialization method depends on FieldType, the node responsible for the merge must also have the schema loaded, which might not be the case in SolrCloud.
{quote}

Then where is it getting a comparator from? I don't understand how changing a Lucene API solves this problem.

Distributed sort is broken with CUSTOM FieldType

Key: SOLR-5354
URL: https://issues.apache.org/jira/browse/SOLR-5354
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
Labels: custom, query, sort
Attachments: SOLR-5354.patch

We added a custom field type for an indexed binary field that supports exact-match search, prefix search, and sorting by unsigned-byte lexicographical comparison. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want: even though the comparator's name mentions UTF8, it doesn't actually assume UTF-8 and just compares bytes, so it's good.

However, when we do this across different nodes, we run into an issue in QueryComponent.doFieldSortValues:

    // Must do the same conversion when sorting by a
    // String field in Lucene, which returns the terms
    // data as BytesRef:
    if (val instanceof BytesRef) {
      UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
      field.setStringValue(spare.toString());
      val = ft.toObject(field);
    }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF-8.

I did a hack where I specified our own field comparator to be ByteBuffer-based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] is not Comparable...

From Chris Hostetter:

I'm not very familiar with the distributed sorting code, but based on your comments, and a quick skim of the functions you pointed to, it definitely seems like there are two problems here for people trying to implement custom sorting in custom FieldTypes...

1) QueryComponent.doFieldSortValues - this definitely seems like it should be based on the FieldType, not an instanceof BytesRef check. (Oddly, the comment even suggests that it should be using the FieldType's indexedToReadable() method, but it doesn't do that.) If it did, then this part of the logic should work for you as long as your custom FieldType implemented indexedToReadable in a sane way.

2) QueryComponent.mergeIds - that TODO definitely looks like a gap that needs to be filled. I'm guessing the sanest thing to do in the CUSTOM case would be to ask the FieldComparatorSource (which should be coming from the SortField that the custom FieldType produced) to create a FieldComparator (via newComparator; the numHits and sortPos arguments could be anything) and then wrap that up in a Comparator facade that delegates to FieldComparator.compareValues. That way a custom FieldType could be in complete control of the sort comparisons (even when merging ids).

...But as I said: I may be missing something, I'm not super familiar with that code. Please try it out and let us know if that works -- either way, please open a Jira pointing out the problems trying to implement distributed sorting in a custom FieldType.
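To make the merge-side failure in the description concrete: byte[] has no natural ordering in Java, so a "natural" comparator cannot sort unmarshalled byte[] sort values. The following is a minimal sketch, using only the JDK, of the kind of comparator a merging node would need for this particular field type. The class name UnsignedBytesComparator is hypothetical and is not part of Solr or Lucene; it is not the FieldComparator facade suggested above, only an illustration of the ordering that such a facade would have to reproduce.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not a Solr/Lucene class: orders byte arrays
// lexicographically, treating each byte as unsigned (0..255).
final class UnsignedBytesComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7F.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Pretend these are sort values collected from several shards.
        byte[][] shardValues = { {(byte) 0x80}, {0x7F, 0x01}, {0x7F}, {} };
        Arrays.sort(shardValues, new UnsignedBytesComparator());
        for (byte[] value : shardValues) {
            System.out.println(Arrays.toString(value));
        }
    }
}
```

A signed comparison would place 0x80 (which is -128 as a Java byte) before 0x7F, which is why the mask matters for matching the index-side byte ordering.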
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811463#comment-13811463 ]

Robert Muir commented on SOLR-5354:
---

{quote}
Maybe solr needs its own SortField too then?
{quote}

OK, I see it. I think Solr should fix its own APIs here. It could add FieldType[] to SortSpec or something like that.
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808147#comment-13808147 ]

Robert Muir commented on SOLR-5354:
---

I think there are a couple of issues to address first, at least in the Lucene part.

Can we please not have the Object/Object stuff in FieldComparatorSource? This is wrong: FieldComparator already has a generic type, so I don't understand the need to discard type safety.

The unicode conversion for String/String_VAL is incorrect and should not exist: despite the name, these types can be any bytes.
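The point that these values "can be any bytes" can be illustrated with a small JDK-only sketch. It decodes arbitrary bytes as UTF-8 and encodes them again, and shows that the round trip does not return the original bytes. Note the assumption: the JDK's String decoder stands in here for Lucene's UnicodeUtil.UTF8toUTF16, whose handling of invalid input differs in detail; the class name Utf8RoundTrip is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that decoding non-UTF-8 bytes as UTF-8
// and re-encoding them does not preserve the original bytes.
final class Utf8RoundTrip {
    static byte[] roundTrip(byte[] raw) {
        // The JDK decoder replaces malformed sequences with U+FFFD.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] binary = { (byte) 0xFF, 0x00, (byte) 0x80 };

        System.out.println("valid UTF-8 survives: "
                + Arrays.equals(text, roundTrip(text)));
        System.out.println("arbitrary bytes survive: "
                + Arrays.equals(binary, roundTrip(binary)));
    }
}
```

Once the original bytes are lost on the shard side, no comparator on the merging node can recover the intended order, which is why the conversion has to be skipped for such fields rather than patched up later.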
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808160#comment-13808160 ]

Robert Muir commented on SOLR-5354:
---

As a concrete example, CollationField and ICUCollationField sort with String/String_VAL comparators but contain non-Unicode bytes. These do not work distributed today either (which I would love to see fixed on this issue).
[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796289#comment-13796289 ]

Jessica Cheng commented on SOLR-5354:
-

I think calling indexedToReadable() in QueryComponent.doFieldSortValues won't actually help here, because that code is trying to get the actual Object for the field, not the readable String representation. If we do go with sending sort_values as readable strings, then on the other side, when QueryComponent.mergeIds is called, it will need to translate all the readable strings back to the actual Objects, and I'm not sure there's an easy way to do that.

The safest thing / least change is probably to check the sort field type instead of using instanceof in QueryComponent.doFieldSortValues.

Distributed sort is broken with CUSTOM FieldType

Key: SOLR-5354
URL: https://issues.apache.org/jira/browse/SOLR-5354
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Labels: custom, query, sort
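As a rough illustration of "check the sort field type instead of using instanceof", the sketch below dispatches on a field type name to decide how a sort value is prepared for the wire on the shard and restored on the merging node. Everything here is an assumption made for illustration: SortValueCodec, IdentityCodec, BinaryCodec, SortValueCodecs, the "customBinary" type name, and the Base64 wire format are all hypothetical and do not describe the attached patch or any Solr API.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field-type hook pair; not a Solr interface.
interface SortValueCodec {
    Object marshal(Object shardValue);   // shard side, before serialization
    Object unmarshal(Object wireValue);  // merge side, after deserialization
}

// Values that already serialize safely are passed through unchanged.
final class IdentityCodec implements SortValueCodec {
    @Override public Object marshal(Object shardValue) { return shardValue; }
    @Override public Object unmarshal(Object wireValue) { return wireValue; }
}

// Raw bytes are sent as Base64 text so no UTF-8 decoding is ever applied.
final class BinaryCodec implements SortValueCodec {
    @Override public Object marshal(Object shardValue) {
        return Base64.getEncoder().encodeToString((byte[]) shardValue);
    }
    @Override public Object unmarshal(Object wireValue) {
        return Base64.getDecoder().decode((String) wireValue);
    }
}

final class SortValueCodecs {
    // Keyed by field type name (assumed names), not by the value's Java class.
    private static final Map<String, SortValueCodec> BY_FIELD_TYPE = new HashMap<>();
    static {
        BY_FIELD_TYPE.put("string", new IdentityCodec());
        BY_FIELD_TYPE.put("customBinary", new BinaryCodec());
    }

    static SortValueCodec forFieldType(String fieldTypeName) {
        SortValueCodec codec = BY_FIELD_TYPE.get(fieldTypeName);
        if (codec == null) {
            throw new IllegalArgumentException("unknown field type: " + fieldTypeName);
        }
        return codec;
    }

    public static void main(String[] args) {
        byte[] original = { (byte) 0xFF, 0x00, (byte) 0x80 };
        SortValueCodec codec = forFieldType("customBinary");
        Object onTheWire = codec.marshal(original);
        byte[] restored = (byte[]) codec.unmarshal(onTheWire);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

This only works if the merging node can tell which field type each sort value belongs to, which is the concern raised elsewhere in this thread about carrying field type information alongside the sort specification.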