Hi, I hope it's OK to resend an email that was overlooked previously...
My problem is separating text from commands in LaTeX. I'm doing pretty
well recognizing LaTeX commands, but now I'm at the stage where I want
to capture the "text". I'm having trouble defining "everything else".
Basically, I currently define a LaTeX document as a sequence of
commands (as I define them), possibly separated by whitespace (WS), and
everything that's not a command is "text". The problem I keep running
into is that when I define "text" generously, it starts grabbing tokens
that belong to commands. Any help would be greatly appreciated!
Thanks in advance,
Pavel
I'm including what I have so far, and the document I'm hoping to parse.
grammar PGTeX;
doc : (command WS?)+ EOF;
command : escWord cWord+ ( sWord+ cWord*)?;
sWord : '[' word ']';
cWord : '{' word '}';
escWord : '\\' word;
word : WORD;
WORD: ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'*')+;
WS : ( ' ' | '\t'| '\r' | '\n' )+;
COMMENT
: '%' (~('\n'|'\r'))* {$channel = HIDDEN;};
And here's the document:
\documentclass{book}%
\usepackage{amsfonts}
\usepackage{amsmath}%
\newtheorem{summary}[theorem]{Summary}
\begin{document}
\chapter*{Intro}
Book starts here $x^{2}+y^{2}=1$. Here's an intersting faction:
\begin{equation}
\int_{0}^{1}\sin xdx=4
\end{equation}
\end{document}
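
To make the goal concrete, here is a rough sketch of the split I'm after, written in Python with a regular expression rather than ANTLR. It is only an illustration under my own assumptions, not a proposed grammar: the names TOKEN and tokenize are made up for this example, a "command" is taken to be a backslash, letters, an optional '*', and any {..} or [..] arguments that follow immediately, and "text" is any run of characters containing neither a backslash nor '%'. The point is the ordering: comments and commands are tried first, and "text" only gets what is left.

```python
import re

# Hypothetical token pattern; the alternatives are tried in this order,
# so "text" can never swallow the start of a command or a comment.
TOKEN = re.compile(r"""
    (?P<comment>%[^\r\n]*)                                   # '%' up to end of line
  | (?P<command>\\[A-Za-z]+\*?(?:\{[^{}]*\}|\[[^\[\]]*\])*)  # \name with {..} / [..] args
  | (?P<text>[^\\%]+)                                        # everything else
""", re.VERBOSE)


def tokenize(source):
    """Return (kind, value) pairs, dropping comments and whitespace-only text."""
    tokens = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        value = match.group()
        if kind == "comment":
            continue
        if kind == "text":
            value = value.strip()
            if not value:
                continue
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    sample = "\\chapter*{Intro}\nBook starts here $x^{2}+y^{2}=1$.\n\\end{document}\n"
    for kind, value in tokenize(source=sample):
        print(kind, repr(value))
```

This deliberately ignores nested braces, and math such as \int_{0}^{1} comes out as a command with no arguments followed by text, so it shows the kind of output I want rather than a complete solution.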