Huang Jing <[email protected]> writes:

> When writing Org documents that contain CJ (Chinese and Japanese)
> text, inline markup such as |*bold*|, |/italic/| etc often fails to be
> recognized correctly, since CJ writing conventions do not divide words
> using spaces, while Org's parser relies on whitespace or punctuation
> to detect markup boundaries.
>
>   测试*文本*           -> test *text* (Chinese)
>   テスト*テキスト*     -> test *text* (Japanese)
>
> One workaround is using zero-width spaces, though this can be
> problematic for TeX export backend since the CJK supporting macro-
> packages (xeCJK, luatex-ja) might not be expecting this.

Thanks for reporting here!
We have discussed this issue several times, but without native
speakers participating in the discussion.

There are two aspects of the problem with markup in CJK languages:

1. From Org's perspective, 测试*文本* looks like an attempt to use
   intraword markup, something that was not considered a common use
   case when Org markup was designed. As a result, our official
   recommendation to use a zero-width space
   (https://orgmode.org/manual/Escape-Character.html) becomes rather awkward.

2. Even if a zero-width space is used, it is not treated specially
   during export: Org mode leaves the zero-width spaces surrounding the
   markup intact, which is problematic for LaTeX export, where
   zero-width spaces are rendered as full-width spaces.
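
To illustrate the first point, here is a rough Python sketch of the
boundary rule. The character sets and the regexp are my simplified
approximation for illustration, not the regexp Org actually uses, and
`find_bold` is a hypothetical helper name. Note that the zero-width
space is listed explicitly as an allowed boundary character.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Characters assumed to be allowed right before / right after an
# emphasis marker. Simplified approximation, not Org's real rule.
PRE_CHARS = " \t-({'\"" + ZWSP
POST_CHARS = " \t-.,;:!?')}[\"" + ZWSP

BOLD = re.compile(
    r"(?:^|(?<=[" + re.escape(PRE_CHARS) + r"]))"  # start of text or a PRE char
    r"\*(\S(?:.*?\S)?)\*"                          # *contents*, no blank edges
    r"(?=$|[" + re.escape(POST_CHARS) + r"])"      # end of text or a POST char
)


def find_bold(text):
    """Return the contents of every *bold* span recognized in TEXT."""
    return [m.group(1) for m in BOLD.finditer(text)]


print(find_bold("测试*文本*"))                # marker glued to a CJK character
print(find_bold("测试 *文本* 测试"))          # ordinary spaces around the markup
print(find_bold("测试" + ZWSP + "*文本*" + ZWSP))  # zero-width spaces instead
```

Under these assumptions, the first call finds nothing, because the
opening marker directly follows a CJK character, while the other two
find the bold span.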

We have previously discussed both aspects (multiple times).

For the export, I once suggested a patch that would remove *all* zero-width
spaces during export:
https://list.orgmode.org/orgmode/87v8rkav2x.fsf@localhost/
However, Max pointed out that zero-width spaces might actually be added
intentionally for the purpose of creating a non-breakable line.

Another suggestion is to remove zero-width spaces only around emphasis markup:
https://list.orgmode.org/[email protected]/
(then, we would also need to adjust ox-org to not lose the markup in
such a scenario). But then the question about intentionally used zero-width
spaces (that happen to be around the markup) remains.
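
The difference between the two export-side ideas can be sketched in
Python as follows. This is only an illustration working on raw text;
a real export filter would work on the parsed document, and both
function names and the marker set are assumptions of mine.

```python
import re

ZWSP = "\u200b"      # U+200B ZERO WIDTH SPACE
MARKERS = "*/_=~+"   # assumed set of emphasis markers

# A ZWSP directly before an opening marker, or directly after a
# closing marker. Anything else is left alone.
_AROUND_MARKUP = re.compile(
    ZWSP + r"(?=[" + re.escape(MARKERS) + r"]\S)"
    r"|(?<=\S[" + re.escape(MARKERS) + r"])" + ZWSP
)


def strip_all_zwsp(text):
    """First idea: drop every zero-width space."""
    return text.replace(ZWSP, "")


def strip_zwsp_around_markup(text):
    """Second idea: drop only zero-width spaces adjacent to markup."""
    return _AROUND_MARKUP.sub("", text)


sample = "测试" + ZWSP + "*文本*" + ZWSP + "测试; keep" + ZWSP + "this"
print(repr(strip_all_zwsp(sample)))
print(repr(strip_zwsp_around_markup(sample)))
```

The second function keeps the zero-width space in "keep this", but it
still cannot tell an escape character apart from a zero-width space
that the author intentionally put next to the markup.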

And we can also do something just for LaTeX export. But then what about
other export backends? Huang Jing, what are the conventions about
zero-width spaces in Chinese HTML pages, OpenOffice documents, and in
plain text?

For the Org markup itself, we have considered alternative escape
mechanisms. For example, I once suggested \--{} (new entity) to serve as
a replacement for zero-width spaces.
https://list.orgmode.org/orgmode/87mtct9y1f.fsf@localhost/
It still feels awkward though.

Another idea is to mimic Markdown with its doubled emphasis markers:
text**bold**text.

Or we can make use of the discussed future inline special block markup:
text@*{bold}text or text@bold{bold}text.
https://list.orgmode.org/orgmode/875xwqj4tl.fsf@localhost/

The question is whether any of the above ideas will be practical for
people who will need to use them often.

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
