TL;DR
We are replacing Gecko’s segmenter code with the ICU4X [*1] segmenter,
which is compatible with UAX#14 [*2] and UAX#29 [*3].

Gecko's line and word segmenter was designed before 2000 and is some
of the oldest code in Gecko. After it was written, the Unicode
Consortium published "UAX#14 - Unicode Line Breaking Algorithm" and
"UAX#29 - Unicode Text Segmentation", standards that define
segmentation rules covering many languages. Unfortunately, Gecko’s
segmentation isn’t compatible with these standards. Other browser
engines (WebKit and Blink) use ICU4C, whose segmenter rules are
compatible with them, so this is a web compatibility issue.

Amazon, Google and Mozilla are now working on ICU4X, a set of Rust
crates for I18N. Specifically, Ting-Yu Lin and I are working on a new
segmenter crate in ICU4X. We have decided to use ICU4X for the new
segmenter implementation in Gecko, which makes this the first
integration of the ICU4X project into Gecko.

Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1719535

Specification: https://www.unicode.org/reports/tr14/ and
https://www.unicode.org/reports/tr29/

Standards Body: The Unicode Consortium

Platform coverage: All

Preference: intl.icu4x.segmenter.enabled

DevTools bug: N/A

Other Browsers: shipped

web-platform-tests:
https://wpt.fyi/results/css/css-text/line-breaking,
https://wpt.fyi/results/css/css-text/i18n

-- Makoto Kato / :m_kato

*1 https://github.com/unicode-org/icu4x/
*2 https://www.unicode.org/reports/tr14/
*3 https://www.unicode.org/reports/tr29/
