[XeTeX] (not) understanding XeTeXinterchartoks

2015-05-08 Thread David Carlisle
The following plain xetex document loops forever on \show\tmpb.
The \show commands don't cause the looping: if they are replaced by
\def\zzzb{}, xetex just hangs in a tight loop.

The fact that it loops isn't necessarily a bug;
 \def\zzz{\zzz}\zzz
does the same, but are there any words that could be added to
the manual so that I could have predicted this?

I'm not sure why 255 is being triggered at all, as
the X is being inserted into the middle of an existing hlist.

The manual says 255 represents

 a boundary between a `run' of characters and something else

So I guess I am asking what 'run' means in this context:-)



David



\XeTeXinterchartokenstate = 1

\newXeTeXintercharclass \Xclass
\XeTeXcharclass `\X \Xclass

\XeTeXinterchartoks 0 \Xclass = {\zza}
\XeTeXinterchartoks 255 \Xclass = {\zzb}


\def\zza{\futurelet\tmpa\zzza}
\def\zzza{\show\tmpa}

\def\zzb{\futurelet\tmpb\zzzb}
\def\zzzb{\show\tmpb}



xxxXxxx


\bye


--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] (not) understanding XeTeXinterchartoks

2015-05-08 Thread Adam Twardoch (List)
In modern text processing (Unicode+OpenType), a text run is a series of 
characters with the same formatting (font, size, color etc.), directionality 
(ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).

A. 







Re: [XeTeX] (not) understanding XeTeXinterchartoks

2015-05-08 Thread David Carlisle
On 8 May 2015 at 13:44, Jonathan Kew jfkth...@gmail.com wrote:


 If what you want to do here doesn't involve inserting text,

yes, well what I wanted to do here was understand \XeTeXcharclass, so it doesn't
necessarily involve inserting text:-)

\XeTeXcharclass is a slightly strange beast as it is documented as working
at the level of hlist character nodes, but it inserts tokens, and is
specified in terms of character tokens. So I was just trying to understand
the processing model, while avoiding reading the code.

Hence the \futurelet, which was giving me a hint as to where in the
pipeline I was, until I looped.

 it appears that xetex sets the boundary state at the beginning of an 
 interchartoks-inserted token list;

That is of course the key here; the rest of the behaviour is
understandable given that, so thanks!

I was partly led down this path by a comment on an old example of
Taco's providing a similar feature in luatex

http://tex.stackexchange.com/a/21691/1090

where the final comment is

@Taco: unfortunately, your code is not equivalent to what XeTeX does,
since your code inserts the tokens between each character-token in the
input stream, while XeTeX inserts them between character nodes being
added to the current list. ...


so I was led to think about exactly when xetex adds these tokens.

David




Re: [XeTeX] (not) understanding XeTeXinterchartoks

2015-05-08 Thread Jonathan Kew

On 8/5/15 12:54, David Carlisle wrote:

On 8 May 2015 at 11:45, Adam Twardoch (List) list.a...@twardoch.com wrote:

In modern text processing (Unicode+OpenType), a text run is a series of 
characters with the same formatting (font, size, color etc.), directionality 
(ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).



er, yes, quite:-)

so my question is why did this X trigger the 255 boundary for end of
text run processing?



Well... it's a long time since I touched any of this, but let's see if 
we can figure it out. I don't suppose it's clearly spelled out anywhere 
just what such a run is, for interchartoks purposes.


From looking at the code, it appears that xetex sets the boundary 
state at the beginning of an interchartoks-inserted token list; and if 
-- as in your example -- that token list doesn't contain any characters 
that cause the current state to change, then the following character 
will be treated as adjacent to that boundary. Which is why it then sees 
the 255 \Xclass on (re-)encountering the X after processing the 0 
\Xclass insertion.
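
Read against the original example, that description gives roughly the
following sequence. This is only a sketch of the explanation above, written
as comments on the original document; it is not taken from the XeTeX manual
or source, and the step numbering is added here purely for illustration.
The document still loops when run.

  \XeTeXinterchartokenstate = 1
  \newXeTeXintercharclass \Xclass
  \XeTeXcharclass `\X \Xclass

  \XeTeXinterchartoks 0 \Xclass = {\zza}   % between an x (class 0) and X
  \XeTeXinterchartoks 255 \Xclass = {\zzb} % between a boundary and X

  \def\zza{\futurelet\tmpa\zzza}
  \def\zzza{\show\tmpa}
  \def\zzb{\futurelet\tmpb\zzzb}
  \def\zzzb{\show\tmpb}

  % 1. "xxx" is read; the current class is 0.
  % 2. X is reached: the 0 -> \Xclass list {\zza} is inserted ahead of X,
  %    and the state is set to "boundary" (255) at the start of that list.
  % 3. \zza only peeks at X and shows it; it adds no character, so the
  %    state is still "boundary" when X is met again.
  % 4. X is now seen as following a boundary: the 255 -> \Xclass list
  %    {\zzb} is inserted ahead of X, and the state is "boundary" again.
  % 5. \zzb adds no character either, so step 4 repeats without end.
  xxxXxxx

  \bye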


So the lesson seems to be that if you're going to provide interchartoks 
for a boundary-to-something transition, whatever you insert had better 
cause a change in the current class -- i.e. insert an actual character 
of some kind -- otherwise you're headed for a loop. If what you want to 
do here doesn't involve inserting text, then you probably want to 
locally disable interchartoks processing. E.g. if you modify your 
example to say something like


  \def\zza{\begingroup\XeTeXinterchartokenstate=0 \futurelet\tmpa\zzza}
  \def\zzza#1{#1\show\tmpa\endgroup}

then you'll get \zza executed once, showing \tmpa as expected, but your 
\zzb never gets hit.
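
Put together with the rest of the original example, the modified document
would look something like the sketch below. Only the two redefinitions come
from this message; everything else is copied from the original post, and the
comments are added for illustration. Class 255 is the boundary class in the
XeTeX discussed in this thread; later releases appear to use 4095 for the
boundary instead, so the number may need adjusting there.

  \XeTeXinterchartokenstate = 1
  \newXeTeXintercharclass \Xclass
  \XeTeXcharclass `\X \Xclass

  \XeTeXinterchartoks 0 \Xclass = {\zza}
  \XeTeXinterchartoks 255 \Xclass = {\zzb}

  % Switch interchartoks processing off inside a group, then look ahead.
  \def\zza{\begingroup\XeTeXinterchartokenstate=0 \futurelet\tmpa\zzza}
  % #1 grabs the X itself: typeset it while processing is off, show what
  % \futurelet saw, then close the group to restore the previous state.
  \def\zzza#1{#1\show\tmpa\endgroup}

  \def\zzb{\futurelet\tmpb\zzzb}
  \def\zzzb{\show\tmpb}

  xxxXxxx

  \bye

Because \XeTeXinterchartokenstate is an ordinary integer parameter, the
assignment inside \begingroup ... \endgroup is local, and the \endgroup in
\zzza switches processing back on for the rest of the line.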


JK





Re: [XeTeX] (not) understanding XeTeXinterchartoks

2015-05-08 Thread David Carlisle
On 8 May 2015 at 11:45, Adam Twardoch (List) list.a...@twardoch.com wrote:
 In modern text processing (Unicode+OpenType), a text run is a series of 
 characters with the same formatting (font, size, color etc.), directionality 
 (ltr, rtl) and script (writing system such as Latin, Greek, Arabic or 
 Gujarati).


er, yes, quite:-)

so my question is why did this X trigger the 255 boundary for end of
text run processing?

 xxxXxxx

David

