On 2017-01-29, Georg Baum wrote:
> Guenter Milde wrote:

>> ... There may be issues with documents containing literal Unicode
>> dashes: these may now have different line breaks.


> If this is an issue then it was already an issue when format 481 was 
> introduced (because we changed -- to U+2013 and --- to U+2014 in the 
> conversion).

Unfortunately, yes. It turns out the correct change would have been
to change -- to U+2013,U+200B and --- to U+2014,U+200B
(or to SpecialChars).

Why?

* --- is converted to EM DASH by TeX via font ligature.

* TeX inserts a line breaking opportunity (LBO) after each HYPHEN (-)
  and suppresses hyphenation in the preceding word, also when
  three HYPHENs become an EM DASH by ligature.

* TeX does *not* insert an LBO after a literal EM DASH or \textemdash macro,
  while the Unicode standard recommends LBOs before and after EM DASH.

  Exception: XeTeX treats literal dashes and the \textemdash macro like the
  ligatures: no hyphenation in the word before, LBO after.
  This behaviour can be switched off by the preamble command


* Unicode and LyX/LaTeX recommend/support the Zero Width Space
  U+200B (ZWSP) as representation of a line break opportunity that does not
  insert a hyphen (in contrast to the SOFT HYPHEN).

As a result, the TeX input string "---" is functionally equivalent to
EM DASH + ZWSP (except for missing hyphenation of the preceding word
with "---").
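
To make the equivalence concrete, here is a minimal sketch in Python of the
conversion I would now consider correct. It is an illustration only, not the
actual lyx2lyx code; the function name and constants are made up for this
mail, and it ignores everything a real converter must skip (ERT, math,
verbatim-like layouts).

```python
# Hypothetical helper, not part of lyx2lyx.
EN_DASH = "\u2013"
EM_DASH = "\u2014"
ZWSP = "\u200b"  # ZERO WIDTH SPACE: break opportunity without a hyphen


def ligature_dashes_to_unicode(text: str) -> str:
    """Replace the TeX ligatures "--" and "---" by a Unicode dash + ZWSP."""
    # Handle the longer ligature first, otherwise "---" would be read
    # as "--" followed by a single hyphen.
    text = text.replace("---", EM_DASH + ZWSP)
    text = text.replace("--", EN_DASH + ZWSP)
    return text
```

Single hyphens are left alone, so "a-b" passes through unchanged while
"a---b" becomes a, EM DASH, ZWSP, b.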

For more "research", background and links, see
http://www.lyx.org/trac/ticket/10543


> If native support for both versions of dashes (-- vs. \textendash in LaTeX)  
> is really needed then this should be done via a special char inset.

The SpecialChar inset(s) could either be 

a) insets for "ligature dashes" -- and ---

   +1 full backwards compatibility with fileformat <481

b) inset for zero width space ZWSP

   +2 versatile: currently there is no LyX support for a line breaking
      opportunity that does not insert a hyphen (except for a literal ZWSP
      that is invisible in the GUI)

   +1 can be implemented as space inset instance

   +0 backwards compatible with fileformat <481 
      when replacing --- by EM DASH + SpecialChar(ZWSP)
      (Except for not suppressing hyphenation in the preceding word, but
       it may be argued that this is actually an improvement)


>> It is not about "horrible look", but about WYSIWYM:

>> Treat hyphens similar to other TeX ligatures (e.g. << and >>) and special
>> characters:

>> * show on screen what you will get in the output

>> * escape in LaTeX export.

>> If the LyX GUI shows "get the LyX version with `lyx --version`", I don't
>> want a surprise `lyx –version` in the output.

> This is the most important point IMO (and the very reason to introduce 
> format 481 which fixed https://www.lyx.org/trac/ticket/3647).

Yes.
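
The "escape in LaTeX export" point could be sketched as follows (Python,
illustration only). I assume here that an empty group between adjacent
hyphens is used to block the ligature; this is a common TeX idiom, but the
function name is hypothetical and I do not claim this is exactly what the LyX
exporter writes.

```python
import re


def escape_hyphen_runs(text: str) -> str:
    """Put an empty group after every hyphen that is followed by a hyphen."""
    # The lookahead keeps the second hyphen available for the next match,
    # so "---" turns into "-{}-{}-".
    return re.sub(r"-(?=-)", "-{}", text)
```

With this, `lyx --version` is exported as `lyx -{}-version` and TeX no longer
forms an en dash from the two hyphens.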

> I have no strong opinion about the UI for entering en-dashes and em-dashes. 
> If there are better methods than the current one then the UI should change.

I favour an LFUN for input. This can be bound to the - key by default.
It could be extended to allow an optional argument "allowbreak" that would
add (and care for) ZWSP after the DASHES.
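
A rough model of what I have in mind, in Python rather than C++ and on a
plain string instead of a LyX paragraph. All names are hypothetical; "care
for" is interpreted here as deleting the dash and its ZWSP as one unit, which
is only one possible reading.

```python
DASHES = {"en": "\u2013", "em": "\u2014"}
ZWSP = "\u200b"


def insert_dash(text: str, pos: int, kind: str = "em",
                allowbreak: bool = False) -> str:
    """Insert a dash at pos; with allowbreak, append a ZWSP after it."""
    dash = DASHES[kind]
    if allowbreak:
        dash += ZWSP
    return text[:pos] + dash + text[pos:]


def delete_backward(text: str, pos: int) -> str:
    """Delete the character before pos; a dash + ZWSP pair goes as a unit."""
    if pos >= 2 and text[pos - 1] == ZWSP and text[pos - 2] in DASHES.values():
        return text[:pos - 2] + text[pos:]
    return text[:max(pos - 1, 0)] + text[pos:]
```

The point of the second function is that the user never ends up with a stray
invisible ZWSP after removing the dash.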


Günter

