Huang Jing <[email protected]> writes:

> When writing Org documents that contain CJ (Chinese and Japanese)
> text, inline markup such as |*bold*|, |/italic/| etc often fails to be
> recognized correctly, since CJ writing conventions do not divide words
> using spaces, while Org's parser relies on whitespace or punctuation
> to detect markup boundaries.
>
> 测试*文本* -> test *text* (Chinese)
> テスト*テキススト* -> test *text* (Japanese)
>
> One workaround is using zero-width spaces, though this can be
> problematic for TeX export backend since the CJK supporting
> macro-packages (xeCJK, luatex-ja) might not be expecting this.

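
To make the boundary problem concrete, here is a minimal Python sketch. It assumes a heavily simplified version of the rule Org uses (the real one is configured through `org-emphasis-regexp-components` and differs in detail). The character sets, the `BOLD` pattern and the `find_bold` helper are illustrative names, not Org code. Zero-width space is listed explicitly in both sets because Python's `\s` does not match U+200B.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the escape character recommended by the manual

# Assumption: bold markup must be preceded by start of line or one of a few
# "pre" characters, and followed by end of line or one of a few "post"
# characters. This only approximates Org's actual rule.
PRE = r"(?:^|(?<=[ \t({\u200b]))"
POST = r"(?=$|[ \t.,:!?;)}\u200b])"
BOLD = re.compile(PRE + r"\*(\S(?:.*?\S)?)\*" + POST)


def find_bold(text):
    """Return the contents of every *bold* span recognized in TEXT."""
    return BOLD.findall(text)


# No boundary characters around the markers: nothing is recognized.
print(find_bold("测试*文本*"))                            # []
# Zero-width spaces act as boundaries: the span is recognized.
print(find_bold("测试" + ZWSP + "*文本*" + ZWSP + "测试"))  # ['文本']
# Space-separated text works without any escape.
print(find_bold("test *text* here"))                      # ['text']
```

Note that in the second case the zero-width spaces are still part of the surrounding text after parsing, which is exactly what causes trouble later at export time.
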
Thanks for reporting here! We have discussed this issue several times,
but without native speakers participating in the discussion.

There are two aspects of the problem with markup in CJK languages:

1. From Org's perspective, 测试*文本* looks like an attempt to use
   intra-word markup, something that was not considered a common use
   case when Org markup was designed. As a result, our official
   recommendation to use zero-width space
   (https://orgmode.org/manual/Escape-Character.html) becomes rather
   awkward.

2. Even if zero-width space is used, it is not treated specially
   during export. Org mode leaves the zero-width spaces surrounding the
   markup intact, which is problematic for LaTeX export, where
   zero-width spaces are rendered as full-width spaces.

We have previously discussed both aspects (multiple times).

For the export, I once suggested a patch that would remove *all*
zero-width spaces during export:
https://list.orgmode.org/orgmode/87v8rkav2x.fsf@localhost/
However, Max pointed out that zero-width spaces might actually be added
intentionally, for the purpose of creating a non-breakable line.

Another suggestion is removing zero-width spaces only around emphasis
markup
https://list.orgmode.org/[email protected]/
(then, we would also need to adjust ox-org so that it does not lose the
markup in such a scenario). But then the question of intentionally used
zero-width spaces that happen to be around the markup remains.

We could also do something just for LaTeX export. But then what about
the other export backends?

Huang Jing, what are the conventions for zero-width spaces in Chinese
HTML pages, OpenOffice documents, and plain text?

For the Org markup itself, we have considered alternative escape
mechanisms. For example, I once suggested \--{} (a new entity) as a
replacement for zero-width spaces.
https://list.orgmode.org/orgmode/87mtct9y1f.fsf@localhost/
It still feels awkward though.

Another idea is mimicking markdown with its doubled emphasis markers:
text**bold**text.

Or we can make use of the discussed future inline special block markup:
text@*{bold}text or text@bold{bold}text.
https://list.orgmode.org/orgmode/875xwqj4tl.fsf@localhost/

The question is whether any of the above ideas will be practical for
people who will need to use them often.

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
