Trying to get some sort of 'implicit' bidi with luatex, I thought
of using an (external) OCP filter, as it is much easier than messing with
luatex nodes, and it respects TeX grouping.
The attached example loads the do_bidi.py script as an \externalocp. It
works fine with plain luatex, but not with 'texexec'; using 'context' I get:
MtxRun | fatal error, message: luatex: execution interrupted
Any ideas? Is there a Lua equivalent to \externalocp?
Regards,
Khaled
--
Khaled Hosny
Arabic localizer and member of Arabeyes.org team
% engine=luatex
\externalocp\OCPbidi=do_bidi.py {}
\ocplist\bidi=
\addbeforeocplist 100 \OCPbidi
\nullocplist
\pushocplist\bidi
\pardir TRT
\textdir TRT
Hello world. % this should appear LTR despite \textdir
\bye
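For reference, the filter side of such a setup can be sketched in a few lines. This is only a sketch, written for Python 3. It assumes that the engine pipes the text to the script's standard input and reads the filtered text back from standard output, as Omega-style external OCPs do, and that both ends agree on the encoding. The names strong_direction and tag_ltr_runs, and the placeholder values of OPEN_L and CLOSE_L, are made up for illustration and are not taken from do_bidi.py.

```python
import sys
from unicodedata import bidirectional

# Placeholder markers (hypothetical values, not the ones do_bidi.py uses).
OPEN_L = '[L]'
CLOSE_L = '[/L]'


def strong_direction(ch):
    """Return 'L' or 'R' for strong characters, None for everything else."""
    cls = bidirectional(ch)
    if cls == 'L':
        return 'L'
    if cls in ('R', 'AL'):
        return 'R'
    return None


def tag_ltr_runs(text):
    """Wrap each left-to-right run in OPEN_L ... CLOSE_L.

    A run starts at a strong-L character and extends to the last strong-L
    character reached before any strong-R character, so neutrals between
    two L characters stay inside the run.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        if strong_direction(text[i]) != 'L':
            out.append(text[i])
            i += 1
            continue
        end = i
        j = i + 1
        while j < n and strong_direction(text[j]) != 'R':
            if strong_direction(text[j]) == 'L':
                end = j
            j += 1
        out.append(OPEN_L + text[i:end + 1] + CLOSE_L)
        i = end + 1
    return ''.join(out)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Assumption: text arrives on stdin and the result is expected on stdout.
    stdout.write(tag_ltr_runs(stdin.read()))


if __name__ == '__main__':
    main()
```

Applied to the line from the example above, tag_ltr_runs('Hello world.') wraps "Hello world" in the two placeholders and leaves the final period outside the run.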
#!/usr/bin/env python
# subflip - Prepare subtitle files for non-bidi-aware players.
# Copyright (C) 2006 Noam Raphael < spam.noam at gmail.com >
# Copyright (C) 2008 Khaled Hosny <[EMAIL PROTECTED]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
"""
This module defines the function tag_text, which tags a text for
directionality, by using the relevant part of the Unicode BIDI algorithm
(see http://unicode.org/reports/tr9/).

tag_text gets an iterator over characters and returns an iterator over
characters and the constants OPEN_L, CLOSE_L, OPEN_R and CLOSE_R. These
constants are inserted between the given characters, to give the
directionality structure of the text.

This module currently doesn't implement the algorithm for Arabic characters,
since those require some special treatment.
"""
__all__ = ['tag_text', 'OPEN_L', 'CLOSE_L', 'OPEN_R', 'CLOSE_R']
from collections import deque
from unicodedata import bidirectional
def dir(c):
    types = {
        'ON' : 'N', 'B' : 'N',
        'BN' : 'N', 'S' : 'N',
        'WS' : 'N', 'R' : 'R',
        'NSM' : 'R', 'AL' : 'R',
        'L' : 'L', 'EN' : 'EN',
        'AN' : 'EN','ET' : 'T',
        'ES' : 'S', 'CS' : 'S',
    }
    d = bidirectional(c)
    return types[d]
class StateMachine(object):
    '''
    This class is the base class for the state machine classes defined here.
    The state machine implemented has a state, which is a string. In each
    step, the machine gets two arguments, and according to these and the
    state, it produces some output and changes the state.

    In this implementation, the input is taken from one iterable, and the
    output is given using the iterable protocol. When output is requested,
    by a call to the next() method, the given iterable's next() method is
    called to produce the two input arguments (arg1, arg2). Then, the method
    named "%s_%s" % (current_state, arg1) is called with arg2 as its argument.
    The method is expected to return an iterable over output elements, which
    will be given upon subsequent calls to the instance's next() method.
    '''
    def __init__(self, iterable):
        if not hasattr(self, 'state'):
            raise NotImplementedError(
                'You must implement an __init__ method that will initialize '
                'self.state.')
        self._input_source = iter(iterable)
        self._current_output_generator = iter([])
        self._finishing = False

    def next(self):
        while True:
            try:
                # The builtin next() works on Python 2.6+ and Python 3.
                return next(self._current_output_generator)
            except StopIteration:
                if self._finishing:
                    raise
            # Make another call to the appropriate method.
            try:
                letter, input = next(self._input_source)
            except StopIteration:
                # Input exhausted: let the current state flush its output.
                self._finishing = True
                self._current_output_generator = \
                    getattr(self, '%s_END' % self.state)()
                if self._current_output_generator is None:
                    self._current_output_generator = iter([])
                continue
            self._current_output_generator = \
                getattr(self, '%s_%s' % (self.state, letter))(input)
            if self._current_output_generator is None:
                self._current_output_generator = iter([])

    # Python 3 looks up __next__ instead of next.
    __next__ = next

    def __iter__(self):
        return self

# For us, there are six types of chars:
#
# L - strong left-to-right
# R - strong right-to-left
# N - neu