[issue30717] Add unicode grapheme cluster break algorithm

2021-06-29 Thread Jakub Wilk
Change by Jakub Wilk: nosy: +jwilk

2020-01-07 Thread Manish
Manish added the comment:

> Does `unicode-segmentation` support all platforms that CPython supports?

It's no-std, so it supports everything the base Rust compiler supports (which is basically everything LLVM supports). And yeah, if there's something that doesn't match with the support

2020-01-06 Thread Paul Ganssle
Paul Ganssle added the comment:

> Oh, also, if y'all are fine with binding to Rust (through a C ABI) I'd love to help y'all use unicode-segmentation, which is much less work than pulling in ICU. Otherwise if y'all have implementation questions I can answer them. This spec is kinda

2020-01-06 Thread Manish
Manish added the comment:

> one never needs to look at more than two adjacent code points to tell whether or not a grapheme break will occur between them, so this ought to be pretty efficient.

That note is outdated (and has been outdated since Unicode 9). The regional indicator rules
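
To illustrate why two adjacent code points are not always enough, here is a minimal sketch of the regional indicator pairing rule as I understand it from UAX #29 (rules GB12 and GB13): whether a break falls between two regional indicators depends on how many regional indicators precede them. The function names `is_regional_indicator` and `break_between_ri` are hypothetical and not part of any patch or library discussed in this thread.

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def break_between_ri(text: str, i: int) -> bool:
    """Return True if a cluster break falls just before text[i].

    Assumes text[i - 1] and text[i] are both regional indicators.
    Hypothetical helper covering only the regional indicator rule.
    """
    # Count the regional indicators immediately preceding position i.
    count = 0
    j = i - 1
    while j >= 0 and is_regional_indicator(text[j]):
        count += 1
        j -= 1
    # An odd count means text[i - 1] opens a pair, so text[i] completes it
    # and no break is allowed; an even count means the previous pair is done.
    return count % 2 == 0


# Two flags: regional indicators F, R followed by D, E.
flags = "\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"
print([break_between_ri(flags, i) for i in range(1, 4)])
# [False, True, False]
```

The same pair of adjacent code points (R followed by D) breaks here, but would not break if one more regional indicator came before it, which is why the lookback is unbounded.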

2020-01-06 Thread Steven D'Aprano
Steven D'Aprano added the comment:

> I think it would be a mistake to make the stdlib use this for most notions of what a "character" is, as I said this notion is also inaccurate. Having an iterator library somewhere that you can use and compose is great, changing the internal

2020-01-05 Thread Manish
Manish added the comment: Oh, also, if y'all are fine with binding to Rust (through a C ABI) I'd love to help y'all use unicode-segmentation, which is much less work than pulling in ICU. Otherwise if y'all have implementation questions I can answer them. This spec is kinda tricky to

2020-01-05 Thread Manish
Manish added the comment: Hi, Unicodey person here, I'm involved in Unicode itself and also maintain an implementation of this particular spec[1]. So, firstly,

> "a⃑".center(width=5, fillchar=".")

If you're trying to do terminal width stuff, extended grapheme clusters *will not* solve
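
For context, the quoted example can be reproduced with a short sketch. It assumes the combining mark in the quoted string is U+20D1 (COMBINING RIGHT HARPOON ABOVE), and it passes the arguments positionally because `str.center` does not accept keyword arguments. It only shows that `str.center` counts code points; as the comment above says, counting grapheme clusters instead would still not give terminal width.

```python
import unicodedata

# 'a' followed by a combining mark (assumed here to be U+20D1).
s = "a\u20d1"

# str.center pads by code point count, not by what is displayed.
centered = s.center(5, ".")

print(len(s))                        # 2 code points, displayed as one character
print(unicodedata.combining(s[1]))   # non-zero for a combining mark
print(len(centered))                 # 5 code points in total
print(centered.count("."))           # only 3 fill characters were added
```

The padded string has five code points but typically renders four columns wide, which is the behaviour the issue was originally opened about.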

2019-02-19 Thread Bert JW Regeer
Change by Bert JW Regeer: nosy: +Bert JW Regeer

2019-02-19 Thread Jens Troeger
Change by Jens Troeger: nosy: +_savage

2018-09-09 Thread Matej Cepl
Change by Matej Cepl: nosy: +mcepl

2018-08-23 Thread Paul Ganssle
Change by Paul Ganssle: nosy: +p-ganssle

2018-08-19 Thread Xiang Zhang
Change by Xiang Zhang: nosy: +xiang.zhang

2018-08-18 Thread Bian Jiaping
Change by Bian Jiaping: nosy: +bianjp

2018-02-12 Thread INADA Naoki
INADA Naoki added the comment: We missed the 3.7 train. I'm sorry I couldn't review it, but there were many shiny features I wanted in 3.7 and I had no time to review them all. In particular, I needed to understand TR29, which was hard work for me. I think publishing this (and any

2017-08-07 Thread Guillaume Sanchez
Guillaume Sanchez added the comment:

> I don't think unicodedata is the right place

I do agree with that. A new module sounds good; would it be a problem if that module contained very few functions at first?

> Can we mark this as having a Provisional API to give us time to decide on the

2017-08-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The well-known library for Unicode support in C++ and Java is ICU (International Components for Unicode). There is a Python wrapper [1]. This is a large, complex library that covers many aspects of Unicode support. Its interface looks more Javaic than

2017-08-04 Thread Stefan Behnel
Stefan Behnel added the comment: Wouldn't this be a typical case where we'd expect a module to evolve and gain usage on PyPI first, before adding it to the stdlib? Searching for "grapheme" in PyPI gives some results for me. Even if they do not cover what this ticket asks for, they might give

2017-08-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 03.08.2017 15:05, Guillaume Sanchez wrote:
> Guillaume Sanchez added the comment:
>
> I have a few criticisms of that proto-PEP
> http://mail.python.org/pipermail/python-dev/2001-July/015938.html
>
> In particular, the fact that all

2017-08-03 Thread Steven D'Aprano
Steven D'Aprano added the comment:

On Thu, Aug 03, 2017 at 11:21:38AM +, Serhiy Storchaka wrote:
> Should iterators provide just substrings or their positions?

[...]

I think we're breaking new ground here and I'm not sure what the right API should be. Should we follow Perl 6?

2017-08-03 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: I have a few criticisms of that proto-PEP http://mail.python.org/pipermail/python-dev/2001-July/015938.html In particular, the fact that all those functions return an index prevents any state keeping. That's a problem because:

> next_(u, index)

2017-08-03 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Thanks for your consideration. I'm currently fixing what's been asked in the reviews.

> But it would be useful to provide also word and sentence iterators.

I'll gladly do that as well!

> I think emitting a pair (pos, substring) would be more useful.

That
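
As a rough illustration of the "(pos, substring)" shape suggested in the quoted review, here is a minimal sketch of an iterator that yields such pairs. It is not the patch under review and it does not implement UAX #29: as a simplifying assumption, it only attaches code points with a non-zero canonical combining class to the preceding code point. The name `iter_clusters` is hypothetical.

```python
import unicodedata
from typing import Iterator, Tuple


def iter_clusters(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, substring) pairs for a simplified clustering.

    Simplification (not UAX #29): a new cluster starts at every code point
    whose canonical combining class is zero.
    """
    start = 0
    for i in range(1, len(text)):
        if unicodedata.combining(text[i]) == 0:
            # text[i] starts a new cluster, so emit the one that just ended.
            yield start, text[start:i]
            start = i
    if text:
        # Emit the final cluster.
        yield start, text[start:]


print(list(iter_clusters("e\u0301a")))
# [(0, 'é'), (2, 'a')]
```

Yielding the position together with the substring lets a caller slice or index the original string without recomputing offsets, while the generator keeps whatever state the segmentation needs between steps.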

2017-08-03 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Issue18406 is closed as a duplicate of this issue. There are useful links in issue18406. In particular, see the proto-PEP for a Unicode Indexing Helper Module: http://mail.python.org/pipermail/python-dev/2001-July/015938.html I agreed that providing grapheme

2017-08-03 Thread Serhiy Storchaka
Changes by Serhiy Storchaka:
components: +Unicode
nosy: +benjamin.peterson, ezio.melotti, lemburg, loewis
stage: needs patch -> patch review
title: str.center() is not unicode aware -> Add unicode grapheme cluster break algorithm