[ To [email protected] CC [email protected] CC [email protected] Please keep tex-hyphen copied in your replies.]
Hi,

I'd like to invite you to check out <URL:https://github.com/sh2d/padrinoma>. The repository contains a package that provides support for pattern-driven node list manipulations in LuaTeX.

There are a handful of typographic features missing in TeX that involve certain kinds of glyph replacement and that can, in principle, be implemented at the node list level in LuaTeX. Examples are non-standard hyphenation, smart ligature building, or long/round s handling in black-letter fonts.

There are two main questions that need to be answered when doing low-level operations like this on node lists:

a) Where to apply manipulations?
b) What manipulations to apply?

The purpose of the padrinoma package is to give a higher-level answer to the first question. Given suitable patterns (that look like ordinary Liang patterns), the package can scan a node list, apply the patterns to the words found, and return data structures that contain the results of the pattern matching.

Some basic, illustrative examples of how that can be used can be found in directory examples/pdnm/.

* Document hyph-mark-color.tex colours all letter pairs surrounding valid hyphenation points in a word (without falling back to LuaTeX's lang.hyphenate function).

* Document hyph-mark-explicit.tex is similar, but inserts a certain character at each valid hyphenation position instead of colouring.

* Document german-nstd-hyph.tex is a bit more ambitious and shows an attempt to bring non-standard ck hyphenation to German users.

Example patterns are provided.

Quick start:

To compile the sample documents, the following files

  lua-classes/cls_pdnm*.lua
  lua-modules/pdnm*.lua

have to be placed in a local TEXMF tree. In general, mktexlsr needs to be run afterwards. Then move to the example documents and compile them using LuaLaTeX.

There's not much user-level documentation, currently. That's because the package doesn't contain user-level code. :-) The API of Lua modules and classes is documented in LuaDoc format.
Documentation of examples, on the other hand, is terse or non-existent. If you want to play with the example documents, look for a line

> nlm.register_manipulation('hyph-la.pat.txt', 'pdnm_hyph-mark-explicit')

or similar. The first argument to the function call is the name of an ordinary pattern file (plain UTF-8 text) and the second argument is the name of a Lua module implementing a particular kind of node list manipulation.

Directory lua-classes/ contains the basic data structures needed for the pattern matching. Pure TeX hackers need not care about the code there. Directory lua-modules/ contains modules that apply the data structures from the former directory to LuaTeX's node lists.

What works:

The example documents, see above.

What's missing:

Much! I'm announcing the package here because it is at a point where input and code contributions from people better versed in TeX and LaTeX internals than I am are desired.

Format:

* A user-level TeX and LaTeX interface is missing.

* What should a low-level LuaTeX interface look like? If the approach shown here proves useful, I think non-standard hyphenation and other applications should get first-class support by formats, similar to regular hyphenation.

* What about Babel/Polyglossia/hyph-utf8 integration?

Plain TeX:

* Currently, only a very basic notion of a "word" is implemented. Basically, a word is a series of glyph, discretionary or user-defined whatsit nodes. Plain TeX has a much more sophisticated notion of a "word subject to hyphenation".
  => see file lua-modules/pdnm_nl_iterate_words.lua

* The language of a "word" is currently completely ignored.

* "Words" are currently not checked against the list of hyphenation exceptions.

* There's no way to locally switch off a particular manipulation for a single word or phrase. This is needed for words where certain glyph replacements might not be desirable, e.g., names, which should stay easy to recognise.

* And much more ...
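For readers who haven't worked with Liang patterns before, here is a rough sketch of the matching step that answers question a) above: given a word and a set of patterns, find the positions where a manipulation may apply. This is a standalone Python illustration and not the package's Lua code. The function names (parse_pattern, match_word, show_breaks) and the small pattern list are made up for this sketch, and it ignores things like minimum left/right fragment lengths and hyphenation exceptions.

```python
# Toy pattern set chosen for this sketch only; real pattern files are
# much larger. Digits are weights, "." would mark a word boundary.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern like 'hy3ph' into its letters ('hyph') and a list
    of weights, one per gap before/between/after the letters."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    weights = [0] * (len(letters) + 1)
    pos = 0  # number of letters seen so far == index of the current gap
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


# Map pattern letters -> weights for quick lookup.
PATTERNS = dict(parse_pattern(p) for p in TOY_PATTERNS)


def match_word(word, patterns):
    """Return the indices j such that a break is allowed between
    word[j-1] and word[j] (odd maximum weight in that gap)."""
    padded = "." + word.lower() + "."
    levels = [0] * (len(padded) + 1)  # levels[k]: gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = patterns.get(padded[start:end])
            if weights:
                # Keep the maximum weight seen for every gap.
                for i, w in enumerate(weights):
                    levels[start + i] = max(levels[start + i], w)
    # padded[j + 1] is word[j], so levels[j + 1] is the gap before word[j].
    return [j for j in range(1, len(word)) if levels[j + 1] % 2 == 1]


def show_breaks(word, breaks, mark="-"):
    """Insert mark at the given break positions (for display only)."""
    pieces, prev = [], 0
    for b in breaks:
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return mark.join(pieces)


if __name__ == "__main__":
    found = match_word("hyphenation", PATTERNS)
    print(show_breaks("hyphenation", found))  # hy-phen-ation
```

The list of positions returned by match_word is the kind of result a manipulation module would then act on, e.g. by colouring the neighbouring letters or inserting a marker character, as the example documents do at the node list level.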
In the long run, I hope that LuaTeX gets built-in means to apply more than one type of pattern, along with custom manipulations, to a node list, so that this package becomes superfluous. (Well, if this approach works out, Hans and Taco might well become reluctant to add that functionality to the core. :-)

Until then, please share your ideas and -- more importantly -- your coding skills!

Happy TeXing!
Stephan Hennig
