On Wed, Jun 20, 2012 at 1:26 AM, Gregory Woodhouse <[email protected]> wrote:
> I want to write a rule that will recognize strings in a language (MUMPS) that
> doubles double quotes as a means of escaping them. For example "The double
> quote symbol is \"." would be "The double quote symbol is ""." and "\"" would
> be """". That seems simple enough, except that I need to write a regular
> expression that matches any printing character (including #\space and #\tab)
> except, of course, #\". There is the complement operator, but that gives me
> any character but #\", not quite what I want. With a set difference, I
> suppose I could do something like
>
> DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE
>
> but again, I'm not quite sure how to express this in the lexer.

Perhaps we can use the character-set complement operator, char-complement,
in a union with the doubled quote. Let's see...
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket
(require parser-tools/lex)
(define my-lexer
  (lexer [(concatenation
           "\""
           (repetition 0 +inf.0 (union (char-complement #\")
                                       "\"\""))
           "\"")
          lexeme]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Would this work? Here's how it behaves on a few examples. Note that in the
last one the lone, undoubled quote ends the string, so only "hello " is
matched:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (my-lexer (open-input-string "\"hello world\""))
"\"hello world\""
> (my-lexer (open-input-string "\"hello \"\"world\""))
"\"hello \"\"world\""
> (my-lexer (open-input-string "\"hello \"world\""))
"\"hello \""
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
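
One caveat: char-complement accepts every character other than #\", so
newlines and control characters are allowed inside the string too. If you
really want "printing - DQUOTE", one possible sketch is to spell the set out
with char-range. I am assuming here that "printing" means printable ASCII
plus tab; the names mumps-string-char and strict-lexer are made up for this
example, so adjust them and the ranges to whatever MUMPS actually permits.

```racket
#lang racket
(require parser-tools/lex)

;; Assumption: "printing" = printable ASCII (space through ~) plus tab.
;; The double quote (code 34) sits between #\! (33) and #\# (35), so two
;; ranges that skip it give the set difference "printing - DQUOTE".
(define-lex-abbrev mumps-string-char
  (union #\tab
         (char-range #\space #\!)
         (char-range #\# #\~)))

;; Same shape as my-lexer, with the restricted character set in place
;; of (char-complement #\").
(define strict-lexer
  (lexer
   [(concatenation
     "\""
     (repetition 0 +inf.0 (union mumps-string-char "\"\""))
     "\"")
    lexeme]))
```

With this version a string containing a raw newline should no longer be
accepted as a single token, while the doubled-quote examples above should
lex the same way as before.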
____________________
Racket Users list:
http://lists.racket-lang.org/users