nickva commented on PR #4810:
URL: https://github.com/apache/couchdb/pull/4810#issuecomment-1790141900

   > Sure but what should be that value specifically? My impression is that 
0xFFFF was created for that exact purpose because it was missing before. It 
would be good to know how others have implemented this feature in lack of this 
symbol.
   
   Looking over at the PR itself, one issue that jumps out is in the line 
`<<Arg/binary, 16#10FFFF>>`. We don't actually build a binary with 3 extra 
bytes at the end: 10, FF and FF; instead we're appending an integer with the 
size of 8 bits which gets maxed out at FF.
   
   ```
   io:format("~w~n", [<<"A",16#10FFFF>>]).
   <<65,255>>
   ```
   
   At some point the ICU library decided to treat that differently but they 
could have as well have thrown an error too.
   
   But then, if even if we did `16#10FFFF:24` it would still hit the same 
special case of the API implementation. What we're passing in is a utf8 encoded 
binary but then append a raw unicode code point to it.
   
   So then the fix might be to actually encode the `U+10FFF`:
   
   ```
   io:format("~w~n", [<<16#10FFFF/utf8>>]).
   <<244,143,191,191>>
   ```
   
   Even better, since unicode standard seems to define a max sortable code 
point `U+FFFF`, we should use that instead of `U+10FFFF`.  So we could define a 
max as `unicode:characters_to_binary([16#FFFF])` or the shorter version of 
`<<16#FFFF/utf8>>`.
   
   ```
   io:format("~w~n", [<<16#FFFF/utf8>>]).
   <<239,191,191>>
   ```
   
   Give `<<Arg/binary, 16#FFFF/utf8>>` a try, that should work on all the 
OS-es. It probably calls for extra unit tests too. 
   
   One interesting case to think about is that with `/utf8` binary type we'll 
crash if the string is not a valid utf8 binary (say if somehow jiffy allowed 
through a surrogate pair prefix or we then building /utf8 binary should blow 
up).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to