[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872841#comment-13872841
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1558618 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558618 ]

LUCENE-5399, SOLR-5354: fix distributed grouping to marshal/unmarshal sort 
values properly

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 4.6, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch, SOLR-5354__fix_function_edge_case.patch


 We added a custom field type to allow an indexed binary field type that
 supports search (exact match), prefix search, and sort as unsigned bytes
 lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator
 accomplishes what we want, and even though the name of the comparator
 mentions UTF8, it doesn't actually assume so and just does byte-level
 operation, so it's good. However, when we do this across different nodes, we
 run into an issue where in QueryComponent.doFieldSortValues:
   // Must do the same conversion when sorting by a
   // String field in Lucene, which returns the terms
   // data as BytesRef:
   if (val instanceof BytesRef) {
     UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
     field.setStringValue(spare.toString());
     val = ft.toObject(field);
   }
 UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
 UTF8. I did a hack where I specified our own field comparator to be
 ByteBuffer based to get around that instanceof check, but then the field
 value gets transformed into BYTEARR in JavaBinCodec, and when it's
 unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a
 ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator,
 which decides to give me comparatorNatural in the else of the TODO for
 CUSTOM, which barfs because byte[] are not Comparable...
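 To illustrate why pushing arbitrary binary sort values through a UTF-8
 decode is lossy, here is a minimal sketch. It uses the JDK's own String
 decoding as a stand-in, not Lucene's UnicodeUtil.UTF8toUTF16 (whose handling
 of malformed input may differ), and the class and method names are
 hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Solr or Lucene.
final class LossyUtf8RoundTrip {

    /** Decodes the bytes as UTF-8 and re-encodes them, as a String-based transport would. */
    static byte[] roundTrip(byte[] raw) {
        // The JDK replaces malformed sequences with U+FFFD, so information is lost here.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = {0x61, 0x62};                       // valid UTF-8 ("ab")
        byte[] binary = {(byte) 0xFF, 0x00, (byte) 0xC0};  // not valid UTF-8

        System.out.println(Arrays.equals(ascii, roundTrip(ascii)));   // true
        System.out.println(Arrays.equals(binary, roundTrip(binary))); // false
    }
}
```

 Valid UTF-8 survives the round trip, while the binary value comes back as
 different bytes, so any sort order derived from it is no longer the order of
 the original values.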
 From Chris Hostetter:
 I'm not very familiar with the distributed sorting code, but based on your
 comments, and a quick skim of the functions you pointed to, it definitely
 seems like there are two problems here for people trying to implement
 custom sorting in custom FieldTypes...
 1) QueryComponent.doFieldSortValues - this definitely seems like it should
 be based on the FieldType, not an instanceof BytesRef check (oddly: the
 comment even suggests that it should be using the FieldType's
 indexedToReadable() method -- but it doesn't do that).  If it did, then
 this part of the logic should work for you as long as your custom
 FieldType implemented indexedToReadable in a sane way.
 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
 needs filled.  I'm guessing the sanest thing to do in the CUSTOM case
 would be to ask the FieldComparatorSource (which should be coming from the
 SortField that the custom FieldType produced) to create a FieldComparator
 (via newComparator - the numHits & sortPos could be anything) and then
 wrap that up in a Comparator facade that delegates to
 FieldComparator.compareValues
 That way a custom FieldType could be in complete control of the sort
 comparisons (even when merging ids).
 ...But as i said: i may be missing something, i'm not super familiar with
 that code.  Please try it out and let us know if that works -- either way
 please open a Jira pointing out the problems trying to implement
 distributed sorting in a custom FieldType.
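
 The "Comparator facade" idea in point 2 can be sketched roughly as follows.
 This is a self-contained illustration, not the Solr patch: ValueComparator
 is a hypothetical stand-in for the compareValues method of a Lucene
 FieldComparator, and the unsigned byte comparison is written by hand to
 mimic the ordering the reporter wants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
final class ShardSortValueMerge {

    /** Stand-in for the single method the merge step needs from a field comparator. */
    interface ValueComparator<T> {
        int compareValues(T first, T second);
    }

    /** Wraps a value comparator in a java.util.Comparator, reversed for descending sorts. */
    static <T> Comparator<T> facade(ValueComparator<T> delegate, boolean reverse) {
        Comparator<T> c = delegate::compareValues;
        return reverse ? c.reversed() : c;
    }

    /** Unsigned lexicographic comparison of two byte arrays; a shorter prefix sorts first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static final ValueComparator<byte[]> UNSIGNED_BYTES = ShardSortValueMerge::compareUnsigned;

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sort values as they might arrive from several shards, already unmarshalled to byte[].
        List<byte[]> values = new ArrayList<>(Arrays.asList(
                new byte[] {(byte) 0x80}, new byte[] {0x01}, new byte[] {0x01, 0x00}));
        values.sort(facade(UNSIGNED_BYTES, false));
        StringBuilder out = new StringBuilder();
        for (byte[] v : values) {
            out.append(hex(v)).append(' ');
        }
        System.out.println(out.toString().trim()); // 01 0100 80
    }
}
```

 Because the coordinator only ever calls the facade, the byte[] values never
 need to be Comparable themselves; the ordering lives entirely in the
 comparator supplied for the field.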



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-12-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837786#comment-13837786
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1547430 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1547430 ]

SOLR-5354: applying hoss's patch to fix function edge case in distributed sort

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch, SOLR-5354__fix_function_edge_case.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-12-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837888#comment-13837888
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1547473 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1547473 ]

SOLR-5354: applying hoss's patch to fix function edge case in distributed sort 
(merged trunk r1547430)

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch, SOLR-5354__fix_function_edge_case.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835397#comment-13835397
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1546571 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546571 ]

SOLR-5354: don't try to write docvalues with 3.x codec in these tests

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835434#comment-13835434
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1546589 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1546589 ]

SOLR-5354: fix attribution

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835435#comment-13835435
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1546591 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546591 ]

SOLR-5354: fix attribution (merged trunk r1546589)

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835077#comment-13835077
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1546457 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1546457 ]

SOLR-5354: Distributed sort is broken with CUSTOM FieldType

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835081#comment-13835081
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1546461 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546461 ]

SOLR-5354: Distributed sort is broken with CUSTOM FieldType (merged trunk 
r1546457)

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Fix For: 5.0, 4.7

 Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
 SOLR-5354.patch


 We added a custom field type to allow an indexed binary field type that 
 supports search (exact match), prefix search, and sort as unsigned bytes 
 lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator 
 accomplishes what we want, and even though the name of the comparator 
 mentions UTF8, it doesn't actually assume so and just does byte-level 
 operation, so it's good. However, when we do this across different nodes, we 
 run into an issue where in QueryComponent.doFieldSortValues:
   // Must do the same conversion when sorting by a
   // String field in Lucene, which returns the terms
   // data as BytesRef:
   if (val instanceof BytesRef) {
 UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
 field.setStringValue(spare.toString());
 val = ft.toObject(field);
   }
 UnicodeUtil.UTF8toUTF16 is called on our byte array,which isn't actually 
 UTF8. I did a hack where I specified our own field comparator to be 
 ByteBuffer based to get around that instanceof check, but then the field 
 value gets transformed into BYTEARR in JavaBinCodec, and when it's 
 unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a 
 ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, 
 which decides to give me comparatorNatural in the else of the TODO for 
 CUSTOM, which barfs because byte[] are not Comparable...
 From Chris Hostetter:
 I'm not very familiar with the distributed sorting code, but based on your
 comments, and a quick skim of the functions you pointed to, it definitely
 seems like there are two problems here for people trying to implement
 custom sorting in custom FieldTypes...
 1) QueryComponent.doFieldSortValues - this definitely seems like it should
 be based on the FieldType, not an instanceof BytesRef check (oddly: the
 comment event suggestsion that it should be using the FieldType's
 indexedToReadable() method -- but it doesn't do that.  If it did, then
 this part of hte logic should work for you as long as your custom
 FieldType implemented indexedToReadable in a sane way.
 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
 needs filled.  I'm guessing the sanest thing to do in the CUSTOM case
 would be to ask the FieldComparatorSource (which should be coming from the
 SortField that the custom FieldType produced) to create a FieldComparator
 (via newComparator - the numHits  sortPos could be anything) and then
 wrap that up in a Comparator facade that delegates to
 FieldComparator.compareValues
 That way a custom FieldType could be in complete control of the sort
 comparisons (even when merging ids).
 ...But as i said: i may be missing something, i'm not super familia with
 that code.  Please try it out and let us know if thta works -- either way
 please open a Jira pointing out the problems trying to implement
 distributed sorting in a custom FieldType.
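
The facade suggested in point 2 can be sketched in plain Java. This is only an illustration under stated assumptions: `ValueComparator` is a hypothetical stand-in for the `compareValues` contract of a FieldComparator (it is not the Lucene class), and the unsigned byte comparison is just one example of a custom order a FieldType might want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the compareValues contract of a FieldComparator. */
interface ValueComparator<T> {
  int compareValues(T first, T second);
}

/** Illustrative custom order: unsigned, byte-wise lexicographic comparison. */
final class UnsignedBytesComparator implements ValueComparator<byte[]> {
  @Override
  public int compareValues(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF); // mask so bytes compare as 0..255
      if (diff != 0) return diff;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
}

final class ComparatorFacade {
  /** Wraps a typed value comparator so merge code can compare untyped sort values. */
  @SuppressWarnings("unchecked")
  static <T> Comparator<Object> wrap(ValueComparator<T> delegate) {
    return (x, y) -> delegate.compareValues((T) x, (T) y);
  }

  public static void main(String[] args) {
    List<Object> sortValues = new ArrayList<>();
    sortValues.add(new byte[] {(byte) 0x80});
    sortValues.add(new byte[] {0x7F, 0x01});
    sortValues.add(new byte[] {0x7F});
    // byte[] is not Comparable, so natural ordering cannot be used here;
    // the facade supplies the field type's own ordering instead.
    sortValues.sort(wrap(new UnsignedBytesComparator()));
    for (Object v : sortValues) System.out.println(Arrays.toString((byte[]) v));
  }
}
```

The point of the design is that the merge code never needs to know the value type: it only sees a `Comparator<Object>`, while the ordering itself stays under the control of whoever supplied the delegate.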



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833312#comment-13833312
 ] 

Hoss Man commented on SOLR-5354:


Steve: looks great for the most part.

a few comments / questions...

* call me paranoid, but i really dislike distrib tests that *only* use the 
query() method to ensure that the distrib response is the same as the control 
response -- could we please add some assertions that use queryServer() to prove 
the docs are coming back in the right order in the distrib test?
* the test should really sanity check that multi-level sorts (eg: payload asc, 
id desc) are working properly
* we should be really clear & careful in the javadocs for 
FieldType.marshalSortValue and FieldType.unmarshalSortValue -- in your patch 
they refer to "a value of this FieldType" but that's not actually what they 
operate on.  They operate on the values used by the FieldComparator returned by 
the SortField for this FieldType (ie: SortableDoubleField's toObject returns a 
Double, but the marshal method operates on BytesRef)
* I'm confused why we still need comparatorNatural() and its use for 
REWRITEABLE.  Why not actually rewrite() the SortField using the local 
IndexSearcher and then wrap the rewritten SortField's FieldComparator using 
comparatorFieldComparator() just like any other SortField? Since we're only 
ever going to compare the raw values on the coordinator it shouldn't matter if 
we rewrite in terms of the local IndexSearcher - it's the best we can do, and 
that seems safer than assuming REWRITEABLE == function and trusting 
comparatorNatural.  (ie: consider someone who writes a custom FieldType that 
uses REWRITEABLE)
* don't the marshal methods in StrField, TextField, and CollationField need 
null checks (for the possibility of docs w/o a value in the sort field)?
* do we even have any existing tests of distributed sorting on strings & 
numerics using sortMissingLast / sortMissingFirst to be sure we don't break 
that? 
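
The ordering that such assertions would pin down (a multi-level sort like "payload asc, id desc", with documents missing a payload sorted last) can be sketched without any Solr test harness. Everything here is hypothetical: the `Hit` class, its fields, and the choice of `String` payloads are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical merged hit: a nullable payload sort value plus a document id. */
final class Hit {
  final String payload; // null when the document has no value in the sort field
  final Integer id;

  Hit(String payload, Integer id) {
    this.payload = payload;
    this.id = id;
  }
}

final class MultiLevelSortCheck {
  /** "payload asc, id desc", with missing payloads sorted last. */
  static final Comparator<Hit> PAYLOAD_ASC_ID_DESC =
      Comparator.comparing(
              (Hit h) -> h.payload, Comparator.nullsLast(Comparator.<String>naturalOrder()))
          .thenComparing((Hit h) -> h.id, Comparator.<Integer>reverseOrder());

  /** Returns the ids in the order the two-level sort produces. */
  static List<Integer> sortedIds(List<Hit> hits) {
    List<Hit> copy = new ArrayList<>(hits);
    copy.sort(PAYLOAD_ASC_ID_DESC);
    List<Integer> ids = new ArrayList<>();
    for (Hit h : copy) ids.add(h.id);
    return ids;
  }

  public static void main(String[] args) {
    List<Hit> hits = new ArrayList<>();
    hits.add(new Hit("b", 1));
    hits.add(new Hit(null, 2));
    hits.add(new Hit("a", 3));
    hits.add(new Hit("a", 4));
    System.out.println(sortedIds(hits)); // prints [4, 3, 1, 2]
  }
}
```

An explicit expected id order like this is what distinguishes an order assertion from a check that only compares the distributed response against the control response.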

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch, SOLR-5354.patch



[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832268#comment-13832268
 ] 

Robert Muir commented on SOLR-5354:
---

This looks great Steve, thanks.

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch, SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830727#comment-13830727
 ] 

Robert Muir commented on SOLR-5354:
---

{quote}
I'm not sure why that would help? We can already ask each SortField for its 
getField() and then look that up in the Schema. The crux of the problem really 
seems to be: naive assumptions in the distributed sorting code about how to 
safely send sort values over the wire; and what comparator to use when sorting 
those values.
{quote}

My point is that the serialization/deserialization doesn't really belong in the 
lucene comparator API, that's all. I agree with your proposed solution...

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830238#comment-13830238
 ] 

Hoss Man commented on SOLR-5354:


I've been looking into Solr's distributed sorting code more and more as part of 
my investigation into SOLR-5463 and i spoke briefly with sarowe offline about 
the overlap.

I think the problems with CUSTOM distributed sorting are really just a subset of 
the larger weirdness with the assumptions Solr makes in general about how it 
can do distributed sorting and how it can de/serialize the sort values when 
merging the results from the multiple shards.

I think my earlier suggestion (in the email that jessica quoted in the issue 
summary) about using methods on the FieldType (like indexedToReadable and 
toObject) to ensure we safely de/serialize the sort values is still the right 
way to go -- we have to ensure that no matter what strange, arbitrary 
objects are used by a FieldComparator, we can safely serialize them.   But i'm 
no longer convinced re-using those existing methods makes sense -- because the 
sort values used by a FieldType's FieldComparator may not map directly to the 
end user representation of the value (ie: TrieDateField sorts as long, but 
toObject returns Date; String fields sort on BytesRefs; Custom classes sort 
on who-knows-what, etc...)

I think the best solution would be something like:


* move the toExternal/toInternal concept in the existing patch out of 
FieldComparatorSource and into Solr's FieldType as methods clearly meant to be 
very specific to sorting (ie: marshalSortValue and unmarshalSortValue)
* change the fsv=true logic on shards to use marshalSortValue for any SortField 
that is on a field (if it's score or a function it will be a simple numeric and 
already safe to serialize over the wire)
* change the mergeIds logic on the coordinator node to explicitly use 
unmarshalSortValue and then use the _actual_ FieldComparator associated with 
each SortField instead of the hokey assumptions currently being made in 
ShardFieldSortedHitQueue.getCachedComparator about using things like 
comparatorNatural
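
The three steps above can be sketched end to end in plain Java. This is a rough illustration, not Solr code: the `SortValueCodec` interface is hypothetical, the method names `marshalSortValue` / `unmarshalSortValue` follow the proposal but their signatures are assumed, and Base64 is just one possible wire-safe encoding for a binary field.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-field-type hooks; names follow the proposal, signatures are assumed. */
interface SortValueCodec<T> {
  Object marshalSortValue(T value);    // shard side: sort value -> wire-safe object
  T unmarshalSortValue(Object wire);   // coordinator side: wire object -> sort value
  Comparator<T> sortComparator();      // the order the field type actually sorts by
}

/** Illustrative codec for a binary field that sorts as unsigned bytes. */
final class BinarySortValueCodec implements SortValueCodec<byte[]> {
  @Override
  public Object marshalSortValue(byte[] value) {
    return value == null ? null : Base64.getEncoder().encodeToString(value);
  }

  @Override
  public byte[] unmarshalSortValue(Object wire) {
    return wire == null ? null : Base64.getDecoder().decode((String) wire);
  }

  @Override
  public Comparator<byte[]> sortComparator() {
    Comparator<byte[]> unsigned = BinarySortValueCodec::compareUnsigned;
    return Comparator.nullsLast(unsigned); // documents without a value sort last
  }

  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) return diff;
    }
    return a.length - b.length;
  }
}

final class CoordinatorMerge {
  /** Unmarshals every shard's wire values, then orders them with the field type's own comparator. */
  static <T> List<T> merge(SortValueCodec<T> codec, List<List<Object>> shardWireValues) {
    List<T> merged = new ArrayList<>();
    for (List<Object> shard : shardWireValues) {
      for (Object wire : shard) merged.add(codec.unmarshalSortValue(wire));
    }
    merged.sort(codec.sortComparator());
    return merged;
  }
}
```

Keeping the marshalling and the comparator behind the same per-field-type object means the coordinator makes no assumption about what the sort values are, which is the gap the current natural-ordering fallback leaves open.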



Other misc comments...


bq. If the deserialization method depends on FieldType, the node responsible 
for the merge must also have the schema loaded, which might not be the case in 
SolrCloud.

That's already a requirement in SolrCloud - the coordinator node merging results 
and writing them back to the client already has to have the same schema.  (If 
it didn't a custom FieldType with a custom FieldComparator could never work, 
because there would be no way at all to know what order things should go in)

bq. I think solr should fix its own apis here? It could add FieldType[] to 
SortSpec or something like that.

I'm not sure why that would help?  We can already ask each SortField for its 
getField() and then look that up in the Schema.  The crux of the problem really 
seems to be: naive assumptions in the distributed sorting code about how to 
safely send sort values over the wire; and what comparator to use when sorting 
those values.


 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch



[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811422#comment-13811422
 ] 

Steve Rowe commented on SOLR-5354:
--

Thanks for the review Robert.

bq. Can we please not have the Object/Object stuff in FieldComparatorSource? 
This is wrong: FieldComparator already has a generic type so I don't understand 
the need to discard type safety.

I'm not sure what you have in mind - do you think FieldComparatorSource should 
be generified? In this case I think each extending class will need to provide 
an implementation for these methods, since there isn't a sensible way to 
provide a default implementation of conversion to/from the generic type.

bq. The unicode conversion for String/String_VAL is incorrect and should not 
exist: despite the name, these types can be any bytes

This is the status quo right now - the patch just keeps that in place.  But I 
agree.  I think the issue is non-binary (XML) serialization, for which UTF-8 is 
safe, but arbitrary binary is not.  Serializing all STRING/STRING_VAL as Base64 
seems wasteful in the general case.
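
The trade-off can be seen in a small plain-Java sketch. It assumes Java's own UTF-8 decoder as a stand-in for a text-only wire format (it is not the conversion Solr performs), and the three-byte sample key is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class WireSafety {
  /** Treats the bytes as UTF-8 text and converts back, as a text-only wire format would. */
  static byte[] viaUtf8String(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
  }

  /** Encodes the bytes as Base64 text and decodes them again. */
  static byte[] viaBase64(byte[] raw) {
    String wire = Base64.getEncoder().encodeToString(raw);
    return Base64.getDecoder().decode(wire);
  }

  public static void main(String[] args) {
    byte[] sortKey = {(byte) 0xFF, 0x01, (byte) 0x80}; // hypothetical key, not valid UTF-8
    System.out.println(Arrays.equals(sortKey, viaUtf8String(sortKey))); // prints false
    System.out.println(Arrays.equals(sortKey, viaBase64(sortKey)));     // prints true
  }
}
```

Base64 survives the round trip because it only emits ASCII, at the cost of four output characters for every three input bytes, which is the overhead being weighed for ordinary string fields.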

Relatedly, looks like there's an orphaned {{SortField.Type.BYTES}} (orphaned in 
that it's not handled in lots of places) - I guess this should go away?

{quote}
As a concrete example the CollationField and ICUCollationField sort with 
String/String_VAL comparators but contain non-unicode bytes.

These currently do not work distributed today either (which I would love to see 
fixed on this issue).
{quote}

I'm working on a distributed version of the Solr (icu) collation tests.  Once I 
get that failing, I'll be able to test potential solutions.

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch



[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811432#comment-13811432
 ] 

Robert Muir commented on SOLR-5354:
---

{quote}
This is the status quo right now - the patch just keeps that in place. 
{quote}

No it's not: it's a bug in solr. This patch moves that bug into Lucene. 

Lucene's APIs here work correctly on any bytes today.

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811433#comment-13811433
 ] 

Robert Muir commented on SOLR-5354:
---

{quote}
I think the issue is non-binary (XML) serialization, for which UTF-8 is safe, 
but arbitrary binary is not. Serializing all STRING/STRING_VAL as Base64 seems 
wasteful in the general case.
{quote}

This is all solr stuff. I don't think it makes sense to move that logic into 
lucene, let the user deal with this. They might not be using XML at all: maybe 
thrift or avro or something else.

Why not just add serialize/deserialize methods to solr's FieldType.java? It 
seems like the obvious place. 

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch





[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811450#comment-13811450
 ] 

Jessica Cheng commented on SOLR-5354:
-

{quote}
Why not just add serialize/deserialize methods to solr's FieldType.java? It 
seems like the obvious place.
{quote}

When SortFields are deserialized on the receiving end, it's no longer clear 
which FieldType the field came from. If the deserialization method depends on 
FieldType, the node responsible for the merge must also have the schema loaded, 
which might not be the case in SolrCloud. Maybe solr needs its own SortField 
too then?

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
Assignee: Steve Rowe
  Labels: custom, query, sort
 Attachments: SOLR-5354.patch


 We added a custom field type to allow an indexed binary field type that 
 supports search (exact match), prefix search, and sorting by unsigned-byte 
 lexicographical comparison. For sorting, BytesRef's UTF8SortedAsUnicodeComparator 
 accomplishes what we want: even though the name of the comparator 
 mentions UTF8, it doesn't actually assume UTF-8 and just does a byte-level 
 comparison, so it's good. However, when we do this across different nodes, we 
 run into an issue in QueryComponent.doFieldSortValues:
   // Must do the same conversion when sorting by a
   // String field in Lucene, which returns the terms
   // data as BytesRef:
   if (val instanceof BytesRef) {
     UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
     field.setStringValue(spare.toString());
     val = ft.toObject(field);
   }
 UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually 
 UTF8. I did a hack where I specified our own field comparator to be 
 ByteBuffer based to get around that instanceof check, but then the field 
 value gets transformed into BYTEARR in JavaBinCodec, and when it's 
 unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a 
 ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, 
 which decides to give me comparatorNatural in the else of the TODO for 
 CUSTOM, which barfs because byte[] is not Comparable...
 From Chris Hostetter:
 I'm not very familiar with the distributed sorting code, but based on your
 comments, and a quick skim of the functions you pointed to, it definitely
 seems like there are two problems here for people trying to implement
 custom sorting in custom FieldTypes...
 1) QueryComponent.doFieldSortValues - this definitely seems like it should
 be based on the FieldType, not an instanceof BytesRef check (oddly: the
 comment even suggests that it should be using the FieldType's
 indexedToReadable() method -- but it doesn't do that).  If it did, then
 this part of the logic should work for you as long as your custom
 FieldType implemented indexedToReadable in a sane way.
 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
 needs filled.  I'm guessing the sanest thing to do in the CUSTOM case
 would be to ask the FieldComparatorSource (which should be coming from the
 SortField that the custom FieldType produced) to create a FieldComparator
 (via newComparator - the numHits & sortPos could be anything) and then
 wrap that up in a Comparator facade that delegates to
 FieldComparator.compareValues.
 That way a custom FieldType could be in complete control of the sort
 comparisons (even when merging ids).
 ...But as I said: I may be missing something, I'm not super familiar with
 that code.  Please try it out and let us know if that works -- either way
 please open a Jira pointing out the problems trying to implement
 distributed sorting in a custom FieldType.
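
The merge-side idea in item 2 can be sketched roughly as follows. This is a self-contained illustration, not Solr or Lucene code: ValueComparator is a hypothetical stand-in for the value-comparison part of a FieldComparator (its compareValues method), and UnsignedBytesComparator and MergeComparatorSketch are assumed names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the value-comparison half of a Lucene FieldComparator.
interface ValueComparator<T> {
  int compareValues(T first, T second);
}

// Assumed custom comparator: unsigned, byte-by-byte lexicographic order.
final class UnsignedBytesComparator implements ValueComparator<byte[]> {
  @Override
  public int compareValues(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF); // mask so bytes compare as unsigned
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
}

final class MergeComparatorSketch {
  // Facade: exposes a ValueComparator as a java.util.Comparator over raw sort
  // values, so the merge step never relies on the values being Comparable.
  static <T> Comparator<Object> facade(ValueComparator<T> delegate, Class<T> type) {
    return (x, y) -> delegate.compareValues(type.cast(x), type.cast(y));
  }

  // Returns a sorted copy of the sort values collected from the shards.
  static List<Object> merge(List<Object> shardValues) {
    List<Object> merged = new ArrayList<>(shardValues);
    merged.sort(facade(new UnsignedBytesComparator(), byte[].class));
    return merged;
  }

  public static void main(String[] args) {
    List<Object> values = Arrays.asList(
        new byte[] {(byte) 0xFF}, new byte[] {0x01, 0x02}, new byte[] {0x01});
    for (Object v : merge(values)) {
      System.out.println(Arrays.toString((byte[]) v));
    }
  }
}
```

The point of the facade is that byte[] values, which are not Comparable, can still be ordered during the merge because the comparison logic comes from the field type's own comparator rather than from natural ordering.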






[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811461#comment-13811461
 ] 

Robert Muir commented on SOLR-5354:
---

{quote}
When SortFields are deserialized on the receiving end, it's no longer clear 
which FieldType the field came from. If the deserialization method depends on 
FieldType, the node responsible for the merge must also have the schema loaded, 
which might not be the case in SolrCloud.
{quote}

Then where is it getting a comparator from? I don't understand how changing a 
Lucene API solves this problem.







[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811463#comment-13811463
 ] 

Robert Muir commented on SOLR-5354:
---

{quote}
Maybe solr needs its own SortField too then?
{quote}

OK, I see it. I think Solr should fix its own APIs here. It could add 
FieldType[] to SortSpec or something like that.
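
One way to picture that suggestion: carry one per-field hook alongside the sort fields so the merging node can reverse whatever encoding the shard applied. The sketch below is an assumption-laden illustration, not Solr's API: SortValueCodec, TextCodec, BinaryCodec and SortSpecSketch are hypothetical names, and Base64 is just one possible wire-safe encoding for arbitrary bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical per-field-type hook; the names are illustrative only.
interface SortValueCodec {
  Object marshal(byte[] indexedValue); // shard side: make the value wire-safe
  byte[] unmarshal(Object wireValue);  // merge side: restore the original bytes
}

// For genuinely textual fields the bytes are valid UTF-8, so a String carries them faithfully.
final class TextCodec implements SortValueCodec {
  @Override
  public Object marshal(byte[] v) {
    return new String(v, StandardCharsets.UTF_8);
  }

  @Override
  public byte[] unmarshal(Object w) {
    return ((String) w).getBytes(StandardCharsets.UTF_8);
  }
}

// For arbitrary binary sort keys, Base64 keeps every byte intact.
final class BinaryCodec implements SortValueCodec {
  @Override
  public Object marshal(byte[] v) {
    return Base64.getEncoder().encodeToString(v);
  }

  @Override
  public byte[] unmarshal(Object w) {
    return Base64.getDecoder().decode((String) w);
  }
}

// Stand-in for "FieldType[] in SortSpec": one codec per sort field, in sort order.
final class SortSpecSketch {
  private final SortValueCodec[] codecs;

  SortSpecSketch(SortValueCodec... codecs) {
    this.codecs = codecs;
  }

  Object[] marshalRow(byte[][] row) {
    Object[] wire = new Object[row.length];
    for (int i = 0; i < row.length; i++) {
      wire[i] = codecs[i].marshal(row[i]);
    }
    return wire;
  }

  byte[][] unmarshalRow(Object[] wire) {
    byte[][] row = new byte[wire.length][];
    for (int i = 0; i < wire.length; i++) {
      row[i] = codecs[i].unmarshal(wire[i]);
    }
    return row;
  }

  public static void main(String[] args) {
    SortSpecSketch spec = new SortSpecSketch(new TextCodec(), new BinaryCodec());
    byte[][] row = {
        "abc".getBytes(StandardCharsets.UTF_8),
        new byte[] {(byte) 0xFF, 0x00, (byte) 0x80}
    };
    byte[][] back = spec.unmarshalRow(spec.marshalRow(row));
    System.out.println(Arrays.equals(row[1], back[1]));
  }
}
```

Note that this only works if the merging node knows which codec belongs to which sort field, which is exactly the schema-availability concern raised in the quoted comment.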







[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-10-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808147#comment-13808147
 ] 

Robert Muir commented on SOLR-5354:
---

I think there are a couple of issues to address first, at least in the Lucene part.

Can we please not have the Object/Object stuff in FieldComparatorSource? This 
is wrong: FieldComparator already has a generic type, so I don't understand the 
need to discard type safety.

The Unicode conversion for String/String_VAL is incorrect and should not exist: 
despite the name, these types can hold arbitrary bytes.
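
A small illustration of why a text conversion is unsafe for such values. This sketch uses the JDK's own UTF-8 decoder as an analogy, not Lucene's UnicodeUtil (whose behaviour on invalid input may differ); the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTripSketch {
  // Decodes bytes as UTF-8 and re-encodes them, as a String-based transport would.
  static byte[] roundTrip(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
  }

  // True only if the bytes come back unchanged after the text round trip.
  static boolean survives(byte[] raw) {
    return Arrays.equals(raw, roundTrip(raw));
  }

  public static void main(String[] args) {
    byte[] text = "solr".getBytes(StandardCharsets.UTF_8);
    byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00}; // 0xFF and 0xFE never occur in valid UTF-8
    System.out.println("text survives:   " + survives(text));
    System.out.println("binary survives: " + survives(binary));
  }
}
```

With the JDK decoder, malformed input is replaced by U+FFFD, so the binary key comes back as different bytes and would sort differently on the merging node.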







[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-10-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808160#comment-13808160
 ] 

Robert Muir commented on SOLR-5354:
---

As a concrete example, CollationField and ICUCollationField sort with 
String/String_VAL comparators but contain non-Unicode bytes.

These do not work in distributed mode today either (which I would love to see 
fixed on this issue).







[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2013-10-15 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796289#comment-13796289
 ] 

Jessica Cheng commented on SOLR-5354:
-

I think calling indexedToReadable() in QueryComponent.doFieldSortValues won't 
actually help here, because that code is trying to get the actual Object for 
the field, not the readable String representation. If we do go with sending 
sort_values as readable strings, then on the other side, when 
QueryComponent.mergeIds is called, it will need to translate all the readable 
strings back into the actual Objects, and I'm not sure there's an easy way to 
do that.

The safest, smallest change is probably to check the sort field type instead 
of using instanceof in QueryComponent.doFieldSortValues.
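
A rough sketch of what "check the sort field type instead of using instanceof" could look like. Everything here is hypothetical: SortKind only mirrors the idea of a declared sort type, and toWireValue is an invented helper, not the actual doFieldSortValues code.

```java
import java.nio.charset.StandardCharsets;

final class SortValueConversionSketch {
  // Hypothetical mirror of the declared sort types relevant to this issue.
  enum SortKind { STRING, CUSTOM }

  // Decides how to convert a shard-side sort value from the declared sort
  // kind, not from the runtime class of the value.
  static Object toWireValue(SortKind kind, Object val) {
    if (kind == SortKind.STRING && val instanceof byte[]) {
      // Only declared string sorts are decoded as text.
      return new String((byte[]) val, StandardCharsets.UTF_8);
    }
    // CUSTOM: leave the value untouched for the field type's own handling.
    return val;
  }

  public static void main(String[] args) {
    byte[] key = {(byte) 0xFF, 0x01};
    System.out.println(toWireValue(SortKind.CUSTOM, key) == key);
    System.out.println(toWireValue(SortKind.STRING, "abc".getBytes(StandardCharsets.UTF_8)));
  }
}
```

This only covers the shard-side conversion; the value still has to survive transport and be comparable on the merging node.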

 Distributed sort is broken with CUSTOM FieldType
 

 Key: SOLR-5354
 URL: https://issues.apache.org/jira/browse/SOLR-5354
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4, 4.5, 5.0
Reporter: Jessica Cheng
  Labels: custom, query, sort



