On Fri, Oct 21, 2016 at 12:51 PM, Lars via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
> Indeed this is a serious problem these days, unicode.. which is almost a
> virus.
> In GoLang they use something called "Runes" to try and solve the problem.

I had to look up what "runes" mean in Go. I found:

---
"Code point" is a bit of a mouthful, so Go introduces a shorter term
for the concept: rune. The term appears in the libraries and source
code, and means exactly the same as "code point", with one
interesting addition. The Go language defines the word rune as an
alias for the type int32, so programs can be clear when an integer
value represents a code point.
---

So it is a new name for CodePoint. Great.
It does not sound very useful to me. I hope they don't do something as
stupid as Python 3 does, converting all string data internally to
UTF-32.

> Off topic but I wonder if Lazarus/fpc uses something anything
> similar to golang's rune's approach or looked into it.

Yes, but we call it "CodePoint" like the rest of the world does.
CodePoints are the easy part of Unicode, regardless of encoding!
Look at the examples here:
  http://wiki.freepascal.org/UTF8_strings_and_characters
They can handle pretty much any use case dealing with CodePoints. It is
not difficult. It is easy.

Your worries about the complexity of Unicode are valid, but the reason is
the combining of CodePoints into user-perceived characters. The rules are
complex, there is normalization and its associated problems, etc.
No, neither FPC nor Lazarus has library code to deal with that yet.
The goal is to have an enumerator for user-perceived characters, just
like the LazUnicode unit has for encoding-agnostic CodePoints.

Juha
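
To make the distinction between code points and user-perceived characters concrete, here is a minimal sketch in Go, chosen only because the quoted text is about Go's rune. It uses nothing beyond the standard library. The helper name codePoints is hypothetical, picked for illustration. The sketch shows that splitting a string into code points is simple, while one user-perceived character ("é") can be one code point or two depending on whether a combining mark is used.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints is a hypothetical helper for this sketch: it decodes a
// UTF-8 string into its code points (runes in Go's terminology).
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	// rune is an alias for int32, so this assignment needs no conversion.
	var asInt32 int32 = rune('e')
	fmt.Println("rune as int32:", asInt32)

	precomposed := "\u00e9" // "é" as a single code point (2 bytes in UTF-8)
	combined := "e\u0301"   // "e" followed by a combining acute accent (3 bytes)

	for _, s := range []string{precomposed, combined} {
		fmt.Printf("%d bytes, %d code points:", len(s), utf8.RuneCountInString(s))
		for _, r := range codePoints(s) {
			fmt.Printf(" %U", r)
		}
		fmt.Println()
	}

	// Both strings render as the same user-perceived character, but
	// without normalization they do not compare equal.
	fmt.Println("equal without normalization:", precomposed == combined)
}
```

Counting or enumerating code points is the easy part shown here. Treating the two-code-point form as one character, or as equal to the precomposed form, needs grapheme segmentation and normalization, which this sketch deliberately does not attempt.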