Hi, I hope it's OK to resend an email that was overlooked previously...

My problem is separating text from commands in LaTeX. I'm doing pretty 
well recognizing LaTeX commands, but now I'm at the stage where I want 
to capture the "text". I'm having trouble defining "everything else".

Basically, I currently define LaTeX as

commands (as I define them), possibly separated by WS, and everything 
that's not a command is "text". I keep running into a problem that when 
I define "text" generously, it starts grabbing tokens that belong to 
commands. Any help would be greatly appreciated!

Thanks in advance,

Pavel

  I'm including what I have so far, and the document I'm hoping to parse.

grammar PGTeX;

doc : (command WS?)+ EOF;

command : escWord  cWord+ ( sWord+ cWord*)?;

sWord    : '[' word ']';
cWord    : '{' word '}';
escWord : '\\' word;

word : WORD;

WORD:    ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'\*')+;

WS  :   ( ' ' | '\t'| '\r' | '\n' )+;

COMMENT
     :    '%' (~('\n'|'\r'))*  {$channel = HIDDEN;};


And here's the document:

\documentclass{book}%
\usepackage{amsfonts}
\usepackage{amsmath}%
\newtheorem{summary}[theorem]{Summary}
\begin{document}


\chapter*{Intro}

Book starts here $x^{2}+y^{2}=1$. Here's an intersting faction:
\begin{equation}
\int_{0}^{1}\sin xdx=4
\end{equation}

\end{document}





List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.

Reply via email to