nickva commented on PR #4810: URL: https://github.com/apache/couchdb/pull/4810#issuecomment-1790141900
> Sure but what should be that value specifically? My impression is that 0xFFFF was created for that exact purpose because it was missing before. It would be good to know how others have implemented this feature in lack of this symbol. Looking over at the PR itself, one issue that jumps out is in the line `<<Arg/binary, 16#10FFFF>>`. We don't actually build a binary with 3 extra bytes at the end: 10, FF and FF; instead we're appending an integer with the size of 8 bits which gets maxed out at FF. ``` io:format("~w~n", [<<"A",16#10FFFF>>]). <<65,255>> ``` At some point the ICU library decided to treat that differently but they could have as well have thrown an error too. But then, if even if we did `16#10FFFF:24` it would still hit the same special case of the API implementation. What we're passing in is a utf8 encoded binary but then append a raw unicode code point to it. So then the fix might be to actually encode the `U+10FFF`: ``` io:format("~w~n", [<<16#10FFFF/utf8>>]). <<244,143,191,191>> ``` Even better, since unicode standard seems to define a max sortable code point `U+FFFF`, we should use that instead of `U+10FFFF`. So we could define a max as `unicode:characters_to_binary([16#FFFF])` or the shorter version of `<<16#FFFF/utf8>>`. ``` io:format("~w~n", [<<16#FFFF/utf8>>]). <<239,191,191>> ``` Give `<<Arg/binary, 16#FFFF/utf8>>` a try, that should work on all the OS-es. It probably calls for extra unit tests too. One interesting case to think about is that with `/utf8` binary type we'll crash if the string is not a valid utf8 binary (say if somehow jiffy allowed through a surrogate pair prefix or we then building /utf8 binary should blow up). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@couchdb.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org