[XeTeX] (not) understanding XeTeXinterchartoks
The following plain xetex document loops forever on \show\tmpb. The \show commands don't cause the looping: if they are replaced by \def\zzzb{}, xetex just hangs in a tight loop.

The fact that it loops isn't necessarily a bug (\def\zzz{\zzz}\zzz does the same), but are there any words that could be added to the manual so that I could have predicted this?

I'm not sure why 255 is being triggered at all, as the X is being inserted into the middle of an existing hlist. The manual says 255 represents a boundary between a `run' of characters and something else. So I guess I am asking what `run' means in this context :-)

David

\XeTeXinterchartokenstate = 1
\newXeTeXintercharclass \Xclass
\XeTeXcharclass `\X \Xclass
\XeTeXinterchartoks 0 \Xclass = {\zza}
\XeTeXinterchartoks 255 \Xclass = {\zzb}

\def\zza{\futurelet\tmpa\zzza}
\def\zzza{\show\tmpa}
\def\zzb{\futurelet\tmpb\zzzb}
\def\zzzb{\show\tmpb}

xxxXxxx

\bye

-- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] (not) understanding XeTeXinterchartoks
In modern text processing (Unicode+OpenType), a text run is a series of characters with the same formatting (font, size, color etc.), directionality (ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).

A.
Re: [XeTeX] (not) understanding XeTeXinterchartoks
On 8 May 2015 at 13:44, Jonathan Kew jfkth...@gmail.com wrote:

> If what you want to do here doesn't involve inserting text,

Yes, well, what I wanted to do here was understand \XeTeXcharclass, so it doesn't necessarily involve inserting text :-)

\XeTeXcharclass is a slightly strange beast, as it is documented as working at the level of hlist character nodes, but it inserts tokens and is specified in terms of character tokens. So I was just trying to understand the processing model while avoiding reading the code. Hence the \futurelet, which was giving me a hint as to where in the pipeline I was, until I looped.

> it appears that xetex sets the boundary state at the beginning of an interchartoks-inserted token list;

That is of course the key here; the rest of the behaviour is understandable given that, so thanks!

I was partly led down this path by a comment on an old example of Taco's providing a similar feature in luatex, http://tex.stackexchange.com/a/21691/1090, where the final comment is

> @Taco: unfortunately, your code is not equivalent to what XeTeX does, since your code inserts the tokens between each character-token in the input stream, while XeTeX inserts them between character nodes being added to the current list.

... so I was led to thinking about when exactly xetex adds these tokens.

David
Re: [XeTeX] (not) understanding XeTeXinterchartoks
On 8/5/15 12:54, David Carlisle wrote:

> On 8 May 2015 at 11:45, Adam Twardoch (List) list.a...@twardoch.com wrote:
>
>> In modern text processing (Unicode+OpenType), a text run is a series of characters with the same formatting (font, size, color etc.), directionality (ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).
>
> er, yes, quite:-) so my question is why did this X trigger the 255 boundary for end of text run processing.

Well... it's a long time since I touched any of this, but let's see if we can figure it out.

I don't suppose it's clearly spelled out anywhere just what such a run is, for interchartoks purposes. From looking at the code, it appears that xetex sets the boundary state at the beginning of an interchartoks-inserted token list; and if -- as in your example -- that token list doesn't contain any characters that cause the current state to change, then the following character will be treated as adjacent to that boundary. Which is why it then sees the 255 \Xclass on (re-)encountering the X after processing the 0 \Xclass insertion.

So the lesson seems to be that if you're going to provide interchartoks for a boundary-to-something transition, whatever you insert had better cause a change in the current class -- i.e. insert an actual character of some kind -- otherwise you're headed for a loop.

If what you want to do here doesn't involve inserting text, then you probably want to locally disable interchartoks processing. E.g. if you modify your example to say something like

\def\zza{\begingroup\XeTeXinterchartokenstate=0 \futurelet\tmpa\zzza}
\def\zzza#1{#1\show\tmpa\endgroup}

then you'll get \zza executed once, showing \tmpa as expected, but your \zzb never gets hit.

JK
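For anyone wanting to try this, the two redefinitions suggested above can be dropped into the original test file to give a complete plain XeTeX document. This is only a sketch assembled from the code already posted in the thread; nothing in it is new apart from the comments. It assumes a XeTeX of roughly the same vintage as the thread, where 255 is the boundary class. In newer releases the boundary class is, as far as I know, 4095, in which case the `255 \Xclass` line would need adjusting to exercise the same transition.

```tex
% Plain XeTeX test file; run with: xetex test.tex
% (the file name is arbitrary)
\XeTeXinterchartokenstate = 1           % enable interchartoks processing
\newXeTeXintercharclass \Xclass         % allocate a new character class
\XeTeXcharclass `\X \Xclass             % put the letter X in that class

% Tokens inserted at a class-0 -> \Xclass transition
\XeTeXinterchartoks 0 \Xclass = {\zza}
% Tokens inserted at a boundary -> \Xclass transition
% (255 is the boundary class assumed here; see the note above)
\XeTeXinterchartoks 255 \Xclass = {\zzb}

% Suggested fix: switch interchartoks off inside a group, look ahead
% with \futurelet, then let \zzza typeset the character itself (#1)
% before reporting and closing the group.
\def\zza{\begingroup\XeTeXinterchartokenstate=0 \futurelet\tmpa\zzza}
\def\zzza#1{#1\show\tmpa\endgroup}

% Unchanged from the original example
\def\zzb{\futurelet\tmpb\zzzb}
\def\zzzb{\show\tmpb}

xxxXxxx

\bye
```

The point of the group is that the X is added to the list while \XeTeXinterchartokenstate is 0, so it is not examined for a class transition a second time; \endgroup then restores the previous setting for the rest of the paragraph.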
Re: [XeTeX] (not) understanding XeTeXinterchartoks
On 8 May 2015 at 11:45, Adam Twardoch (List) list.a...@twardoch.com wrote:

> In modern text processing (Unicode+OpenType), a text run is a series of characters with the same formatting (font, size, color etc.), directionality (ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).

er, yes, quite :-) so my question is: why did this X trigger the 255 boundary for end-of-text-run processing?

xxxXxxx

David