Jean-Marc Lasgouttes wrote:
>>>>>> "Angus" == Angus Leeming <[EMAIL PROTECTED]> writes:
>
> Angus> This patch enables reLyX to handle \(...\)* correctly and not
> Angus> generate a pile of crap. I attach a test case so the
> Angus> inquisitive can try out reLyX both with and without the
> patch.
>
> Can you explain a bit how this macro is used?

Not convincingly. I have been looking hard at this code over the weekend
in an effort to understand it. I'll get back to you when I do. Meanwhile,
throw the patch away.

Here, however, is my current state of knowledge (just for your
delectation ;-)

Basically, the regex $macro is used by two subroutines, 'lookAheadToken'
and 'eatMultiToken'. Both subroutines are used by the main TeX parser,
TeX.pm, and by Amir's LyX-centric code. For example, 'eat' is the main
subroutine to read in a file and split it into tokens. Our own use of the
TeX parser is so simple it may be written as

    sub process {
        my $txt = shift;
        my $eaten;
        while (defined ($eaten = $txt->eat)) {
            ;
        }
    }

The real process is more complex than this, but we don't pass the other,
optional args, so it collapses to this.

As you see, 'eat' is fundamental. I am still trying to decipher it, but
'lookAheadToken' and 'eatMultiToken' are both central to its operation.

'lookAheadToken' is only a few lines long, so I hoped to be able to
understand it:

    # return next token without eating it. Return '' if end of paragraph
    sub lookAheadToken {
        # If arg2, will eat one token - WHY!? -Ak
        my $txt = shift;
        # Call paragraph with no argument to say we're "just looking"
        my $in = $txt->paragraph;
        return '' unless $in; # To be able to match without warnings
        my $comment = undef;
        if ($$in =~ /^(?:\s*)(?:$Text::TeX::commentpattern)?($Text::TeX::tokenpattern)/o) {
            if (defined $2) {return $1} #if 1 usualtokenclass char, return it ($1==$2)
            elsif (defined $3) {return "\\$3"} # Multiletter (\[a-zA-Z]+)
            elsif (defined $1) {return $1} # \" or notusualtokenclass
        }
        return '';
    }

The guts of the routine lie in that regex, which I'm still trying to
decipher. In turn, $Text::TeX::tokenpattern is a string made up of other
strings.

    $macro = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)'; # Has one level of grouping
    $active = "$macro|\\\$\\\$|\\^\\^.|$notusualtokenclass"; # 1 level of grouping
    $tokenpattern = "($usualtokenclass)|$active"; # Two levels of grouping

So, when I understand these 3 regexes and the one in lookAheadToken, I'll
understand that subroutine.

> Is there a list of macros somewhere that accept an optional *?

See lib/reLyX/syntax.default. Macros are defined explicitly as

    \section
    \section*

However, this is used after the initial parsing of the TeX file using
'lookAheadToken' and 'eatMultiToken'. At least that's my understanding to
date. So, $macro is absolutely crucial to the success of the whole scheme.
If it fails, nothing will work.

> macros somewhere that accept an optional *? This should probably be
> defined in the syntax tables. What happens actually is that * is not
> part of the macro, but is a first optional argument to the given
> macro.
>
> Do you handle for example starred version of \\? And also optional
> argument to \\ (like \\*[2cm])?

Don't know yet.

> I am not saying that your patch is wrong, I just want to understand
> what is going on... It seems to me that your code implies that only
> macros with alphabetic names can have optional *. This is completely
> false, actually. I think the handling of * should be done at the
> same place as the handling of [], if this is possible.

I will not rewrite the TeX parser. I propose a patch that fixes a bug. As
I see it, the use of the regex in question is so fundamental to the
operation of the TeX parser that the change will either work or it won't.
There will be no 'half-way house'.

However, I do understand your concerns (indeed, I share them), and will
attempt to understand the code properly.

regards,
Angus

ps. perl is foul and a parser built out of regexes is fouler.
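
For anyone who wants to poke at $macro in isolation, here is a minimal sketch that runs the string exactly as quoted above against a few inputs. It is not how reLyX calls it: the helper name match_macro and the sample inputs are made up for illustration, and the match is anchored at the start of the string purely for convenience. Wrapping $macro in an extra pair of parentheses shifts the numbering, so its own group 1 shows up as $2 here.

```perl
use strict;
use warnings;

# $macro copied verbatim from the message above.
my $macro = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)';

# Hypothetical helper (not part of reLyX or Text::TeX): try $macro at the
# start of $input. Returns [whole match, $macro's own group 1] on success,
# or nothing if there is no match.
sub match_macro {
    my ($input) = @_;
    return unless $input =~ /^($macro)/;
    return [ $1, $2 ];    # $2 is undef for non-alphabetic macros
}

# Illustrative inputs only; single quotes keep the backslashes literal.
for my $input ('\section*{Intro}', '\\\\*[2cm]', '\)* rest', '\alpha  x') {
    my $m = match_macro($input);
    if ($m) {
        printf "%-18s whole=<%s> name=<%s>\n",
            $input, $m->[0], defined $m->[1] ? $m->[1] : 'undef';
    }
    else {
        printf "%-18s no match\n", $input;
    }
}
```

Run as written, the whole match should come out as \section*, \\*, \)* and \alpha (with its two trailing spaces), and the name group is set only for the alphabetic cases (section* and alpha). So this version of the string does accept a star after a non-alphabetic macro such as \\ or \), but whatever follows the star, [2cm] for instance, is left for the rest of the parser.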