I'm implementing a tokenizer as a custom input port using
`make-input-port`. My thinking is that `peek-char-or-special` and
`read-char-or-special` will be the primary interface to the tokenizer; port
locations will also be used. In this case, the input port should emit
characters as well as special values. It's not just bytes/characters.
Here's a simplified form of what I'm trying to accomplish. Imagine a
language consisting of sequences (possibly empty) of ASCII letters
(lowercase and uppercase). If you see any letter other than capital X, just
emit it. (By "emit", I mean "return it as the value of
`peek-char-or-special` and `read-char-or-special`".) If you see a capital X
and it's the last character of the input, emit #\X. If you see a capital X
followed by any character, emit that character as a symbol. That's the
special value. Thus:
a b c ==> a b c
X ==> X
a X b ==> a 'b
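To pin down the intended mapping independently of ports, here is a minimal sketch of it as a plain function over strings. The name `tokenize` is hypothetical and is used only for illustration; it is a reference for what the port should eventually emit, not the port itself.

```racket
#lang racket/base

;; tokenize : string -> (listof (or/c char? symbol?))
;; Hypothetical reference function: the list of values the custom
;; port is meant to emit for the given input.
(define (tokenize str)
  (let loop ([cs (string->list str)])
    (cond
      [(null? cs) '()]
      ;; X followed by another character: emit that character as a symbol.
      [(and (char=? (car cs) #\X) (pair? (cdr cs)))
       (cons (string->symbol (string (cadr cs)))
             (loop (cddr cs)))]
      ;; Anything else, including a trailing X: emit the character itself.
      [else (cons (car cs) (loop (cdr cs)))])))
```

For instance, `(tokenize "aXb")` yields a list holding the character `#\a` followed by the symbol `b`.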
I realize that this example could easily be done with regular expressions
or just straightforward processing of byte strings using `port->bytes`. But
I'd like to attack this problem using custom input ports. It feels like the
right thing to do. With a custom input port, I can even do validation by
logging errors. For example, if I'm given a byte that represents a
non-ASCII letter, I can log an error and advance the port by one byte and
try again.
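The validation idea can be sketched on its own, before any custom port is involved. The helper names `ascii-letter?` and `next-letter-byte` are hypothetical; the sketch assumes that "advance and try again" means consuming the offending byte and reading the next one.

```racket
#lang racket/base

;; ascii-letter? : byte -> boolean
;; True for A-Z (65-90) and a-z (97-122).
(define (ascii-letter? b)
  (or (<= 65 b 90) (<= 97 b 122)))

;; next-letter-byte : input-port -> (or/c byte? eof-object?)
;; Hypothetical helper: returns the next byte that is an ASCII letter,
;; logging and skipping every byte that is not.
(define (next-letter-byte in)
  (define b (read-byte in))
  (cond
    [(eof-object? b) eof]
    [(ascii-letter? b) b]
    [else
     (log-error "skipping byte ~a: not an ASCII letter" b)
     (next-letter-byte in)]))
```

A tokenizing port could call such a helper wherever it would otherwise call `read-byte` on the underlying port.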
I find the documentation for `make-input-port` rather heavy going. There
are some examples there, which are a good start, but I'm still a bit lost.
In the discussion of the peek and read procedures that are supplied as
arguments to `make-input-port` (see `peek!` and `read!` below), I don't
understand the byte strings that are being passed. It seems that these
procedures are always given a mutable byte string, and the examples in the
docs suggest that the byte string could/should indeed be modified. But in
the input I have in mind, where peek and read might emit specials, it's
unclear to me what I should stuff into the byte string. For example, if I'm
peeking `Xm` (capital X and lowercase m), that should eventually get turned
into `'m` (symbol whose name is "m"), so in my thinking, I'm looking at two
bytes, not 1. But it seems that the peek and read procedures are always (?)
given a byte string of length 1.
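One possible reading of the `make-input-port` documentation is that the byte string is only a destination buffer. When a procedure returns a count, it has stored that many bytes at the start of the buffer; when it returns a procedure (a special), the buffer is left untouched. Under that reading, how many bytes of the underlying port a token consumes is the custom port's own business. The sketch below is an assumption-laden illustration, not a definitive implementation: `next-token`, `make-x-port`, and `port->tokens` are hypothetical names, tokens are buffered in a plain list so that `skip` counts positions in the output stream (one per byte or special), and no attention is paid to thread safety or efficiency.

```racket
#lang racket/base

;; Hypothetical helper: pull one token from the underlying port.
;; Returns a byte, a symbol (the special), or eof.
(define (next-token in)
  (define b (read-byte in))
  (cond
    [(eof-object? b) eof]
    ;; 88 = X; an X followed by another byte becomes a symbol.
    [(and (= b 88) (not (eof-object? (peek-byte in))))
     (string->symbol (string (integer->char (read-byte in))))]
    [else b]))

(define (make-x-port in)
  (define pending '()) ; tokens taken from `in` but not yet read

  ;; Try to make `pending` hold at least n tokens.
  (define (fill! n)
    (let loop ()
      (when (< (length pending) n)
        (define t (next-token in))
        (unless (eof-object? t)
          (set! pending (append pending (list t)))
          (loop)))))

  ;; Turn a token into a result for read!/peek!.
  (define (deliver tok bstr)
    (cond
      [(eof-object? tok) eof]
      [(symbol? tok) (lambda args tok)]    ; special: bstr untouched
      [else (bytes-set! bstr 0 tok) 1]))   ; one byte stored in bstr

  (define (peek! bstr skip progress-evt)
    (fill! (add1 skip))
    (deliver (if (> (length pending) skip)
                 (list-ref pending skip)
                 eof)
             bstr))

  (define (read! bstr)
    (fill! 1)
    (cond
      [(null? pending) eof]
      [else
       (define tok (car pending))
       (set! pending (cdr pending))
       (deliver tok bstr)]))

  (make-input-port 'xs read! peek! (lambda () (close-input-port in))))

;; Hypothetical helper: collect every value the port emits.
(define (port->tokens p)
  (let loop ([acc '()])
    (define t (read-char-or-special p))
    (if (eof-object? t)
        (reverse acc)
        (loop (cons t acc)))))
```

With this sketch, `(port->tokens (make-x-port (open-input-string "Xaw")))` is intended to produce the symbol `a` followed by the character `#\w`.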
Anyway, this is perhaps all a long way of saying that I'm rather lost with
my custom input port approach to the issue. Any advice would be
appreciated. Maybe custom input ports are not the way to go about what I'm
doing, but I'm not ready to abandon them just yet. Below you can read (ha!)
the current status of where I am with this project.
Jesse
#lang racket/base

(require racket/match
         racket/format
         racket/port)

; Input strings are intended to be sequences of ASCII letters,
; uppercase and lowercase. Capital X followed by another letter
; should get turned into a symbol whose name is the one-character
; string consisting of that letter.
(define (make-cool-port in)
  (define (peek! bstr skip event)
    (sleep 1)
    (define bs (peek-bytes 2 skip in))
    (cond [(eof-object? bs)
           eof]
          [else
           (match (bytes->list bs)
             [(list 88 a) ; 88 = X
              (lambda args 2)]
             [_ 1])]))
  (define (read! bstr)
    (define peeked (peek! bstr 0 #f))
    (cond [(eof-object? peeked)
           eof]
          [(procedure? peeked)
           (define bs (bytes->list (read-bytes (peeked) in)))
           (define a (cadr bs))
           (lambda args (string->symbol (~a (integer->char a))))]
          [else
           (read-byte in)
           peeked]))
  (make-input-port
   'xs
   read!
   peek!
   (lambda () (close-input-port in))))

(define (read-it-all)
  (define t (peek-char-or-special))
  (log-error "peeked ~a" t)
  (unless (eof-object? t)
    (read-char-or-special)
    (read-it-all)))

(module+ main
  (call-with-input-string
   "Xaw"
   (lambda (in)
     (define p (make-cool-port in))
     (parameterize ([current-input-port p])
       (read-it-all)))))