Hi, I am working on a tokenizer based on Unicode text segmentation (UAX 29 <https://unicode.org/reports/tr29/#Word_Boundaries>). I am wondering whether there would be interest in adding range tables for the word break categories <https://unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakProperty.txt> to the x/text or unicode packages. It appears they could be code-gen’d alongside the rest of the range tables.
Pardon if this is already being done and I have missed it. I see some mention <https://github.com/golang/text/search?q=ALetter&unscoped_q=ALetter> of those categories (e.g. ALetter) in other places. My code is here <https://github.com/clipperhouse/uax29>. Thanks.