Re: [patch] reLyX and \(...\)*

Angus Leeming Mon, 10 Feb 2003 07:30:15 -0800

Jean-Marc Lasgouttes wrote:

>>>>>> "Angus" == Angus Leeming <[EMAIL PROTECTED]> writes:
> 
> Angus> This patch enables reLyX to handle \(...\)* correctly and not
> Angus> generate a pile of crap. I attach a test case so the
> Angus> inquisitive can try out reLyX both with and without the
> patch.
> 
> Can you explain a bit how this macro is used?


Not convincingly. I have been looking hard at this code over
the weekend in an effort to understand it. I'll get back to you when
I do. Meanwhile, throw the patch away.

Here, however, is my current state of knowledge (just for 
your delectation ;-)

Basically, the regex $macro is used by two subroutines, 
'lookAheadToken' and 'eatMultiToken'. Both are used by the main TeX 
parser, TeX.pm, and by Amir's LyX-centric code.

For example, 'eat' is the main subroutine to read in a file and split 
it into tokens. Our own use of the TeX parser is so simple it may be 
written as
        sub process {
                my $txt = shift;
                my $eaten;
                while (defined ($eaten = $txt->eat)) {
                        ;
                }
        }
The real process is more complex than this but we don't pass the 
other, optional args, so it collapses to this. As you see, 'eat' is 
fundamental. I am still trying to decipher it, but 'lookAheadToken' 
and 'eatMultiToken' are both central to its operation.

'lookAheadToken' is only a few lines long, so I hoped to be able to 
understand it:

# return next token without eating it. Return '' if end of paragraph
  sub lookAheadToken {          # If arg2, will eat one token - WHY!? -Ak
    my $txt = shift;
    # Call paragraph with no argument to say we're "just looking"
    my $in = $txt->paragraph;
    return '' unless $in;       # To be able to match without warnings
    my $comment = undef;
    if ($$in =~ 
        /^(?:\s*)(?:$Text::TeX::commentpattern)?($Text::TeX::tokenpattern)/o) {
      if (defined $2) {return $1} #if 1 usualtokenclass char, return it ($1==$2)
      elsif (defined $3) {return "\\$3"} # Multiletter (\[a-zA-Z]+)
      elsif (defined $1) {return $1} # \" or notusualtokenclass
    }
    return '';
  }

The guts of the routine lie in that regex, which I'm still trying 
to decipher. In turn, $Text::TeX::tokenpattern is a string made up of 
other strings.

$macro = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)'; # Has one level of grouping
$active = "$macro|\\\$\\\$|\\^\\^.|$notusualtokenclass"; # 1 level of grouping
$tokenpattern = "($usualtokenclass)|$active"; # Two levels of grouping
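
To make the first of those patterns concrete, here is a minimal Perl
sketch that applies $macro, copied verbatim from the line above, to a
few sample inputs. The helper name match_macro and the sample strings
are illustrative only and are not part of reLyX; the extra outer
parentheses are added just to capture the whole match.

```perl
use strict;
use warnings;

# Copied verbatim from the definition quoted above.
my $macro = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)';

# Hypothetical helper: match $macro at the start of $in.
# Returns [whole match, macro name or undef], or nothing on failure.
sub match_macro {
    my ($in) = @_;
    return unless $in =~ /^($macro)/;
    return [$1, $2];    # $2 is $macro's own (single) capture group
}

# Sample inputs are made up for illustration.
for my $in ('\section*{Intro}', '\\\\*[2cm]', '\)* rest', '\alpha  x') {
    my $m = match_macro($in);
    printf "%-18s whole=<%s> name=<%s>\n",
        $in, $m->[0], defined $m->[1] ? $m->[1] : 'undef';
}
```

With this pattern a trailing * is absorbed both after an alphabetic
name (the name comes back as "section*") and after a single
non-letter, so \\* and \)* each match as one unit with no name
captured. Whitespace after an alphabetic name is swallowed as well.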

So, once I understand these 3 regexes and the one in lookAheadToken, 
I'll understand that subroutine.
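
As a rough illustration of how the capture groups line up, the sketch
below rebuilds $tokenpattern from the three definitions and classifies
a token the way lookAheadToken's if/elsif chain does. It rests on
assumptions: the two character classes are simplified stand-ins (the
real ones live in TeX.pm and are not quoted in this message), the
comment pattern is left out, and the name classify is hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: simplified stand-ins for the real character classes.
my $notusualtokenclass = '[\\\\${}^_~&@%]';
my $usualtokenclass    = '[^\\\\${}^_~&@%]';

# The three definitions as quoted in the message.
my $macro  = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)';
my $active = "$macro|\\\$\\\$|\\^\\^.|$notusualtokenclass";
my $tokenpattern = "($usualtokenclass)|$active";

# Hypothetical helper mirroring lookAheadToken's branches.
# The outer parentheses make the whole token $1, so the
# usual-class character is $2 and the macro name is $3.
sub classify {
    my ($in) = @_;
    return ('end', '') unless $in =~ /^\s*($tokenpattern)/;
    return ('usual char', $1)      if defined $2;
    return ('named macro', "\\$3") if defined $3;
    return ('other', $1);          # e.g. \\ or a not-usual character
}

for my $in ('abc', '  \section*{x}', '\\\\*[2cm]', '$$x$$') {
    my ($kind, $tok) = classify($in);
    print "$in => $kind: $tok\n";
}
```

Under these assumptions the "two levels of grouping" become three
once lookAheadToken wraps the pattern in its own parentheses, which
is why that routine tests $2 and $3 but returns $1.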

> Is there a list of macros somewhere that accept an optional *?

See lib/reLyX/syntax.default. Macros are defined explicitly as
\section
\section*

However, this is used after the initial parsing of the TeX file using
'lookAheadToken' and 'eatMultiToken'. At least that's my understanding 
to date. So, $macro is absolutely crucial to the success of the whole
scheme. If it fails, nothing will work.

> This should probably be
> defined in the syntax tables. What happens actually is that * is not
> part of the macro, but is a first optional argument to the given
> macro.
> 
> Do you handle for example starred version of \\? And also optional
> argument to \\ (like \\*[2cm])?

Don't know yet.

> I am not saying that your patch is wrong, I just want to understand
> what is going on... It seems to me that your code implies that only
> macros with alphabetic names can have optional *. This is completely
> false, actually. I think the handling of * should be done at the
> same place as the handling of [], if this is possible.

I will not rewrite the TeX parser. I propose a patch that fixes a
bug. As I see it, the use of the regex in question is so fundamental 
to the operation of the TeX parser that the change will either work or 
it won't. There will be no 'half-way house'. 

However, I do understand your concerns (indeed, I share them), and
will attempt to understand the code properly.

regards,
Angus

ps. perl is foul and a parser built out of regexes is fouler.
