[racket-users] custom input ports and specials

2021-04-12 Thread je...@lisp.sh
I'm implementing a tokenizer as a custom input port using 
`make-input-port`. My thinking is that `peek-char-or-special` and 
`read-char-or-special` will be the primary interface to the tokenizer; port 
locations will also be used. In this case, the input port should emit 
characters as well as special values. It's not just bytes/characters.

Here's a simplified form of what I'm trying to accomplish. Imagine a 
language consisting of sequences (possibly empty) of ASCII letters 
(lowercase and uppercase). If you see any letter other than capital X, just 
emit it. (By "emit", I mean "the value that should be returned by 
`peek-char-or-special` and `read-char-or-special`".) If you see a capital X 
and it's the last character of the input, emit #\X. If you see a capital X 
followed by any character, emit that character as a symbol. That's the 
special value. Thus:

  a b c ==> a b c

  X ==> X

  a X b ==> a 'b

I realize that this example could easily be done with regular expressions 
or just straightforward processing of byte strings using `port->bytes`. But 
I'd like to attack this problem using custom input ports. It feels like the 
right thing to do. With a custom input port, I can even do validation by 
logging errors. For example, if I'm given a byte that represents a 
non-ASCII letter, I can log an error and advance the port by one byte and 
try again.

I find the documentation for `make-input-port` rather heavy going. There 
are some examples there, which are a good start, but I'm still a bit lost. 
In the discussion of the peek and read procedures that are supplied as 
arguments to `make-input-port` (see `peek!` and `read!` below), I don't 
understand the byte strings that are being passed. It seems that these 
procedures are always given a mutable byte string, and the examples in the 
docs suggest that the byte string could/should indeed be modified. But in 
the input I have in mind, where peek and read might emit specials, it's 
unclear to me what I should stuff into the byte string. For example, if I'm 
peeking `Xm` (capital X and lowercase m), that should eventually get turned 
into `'m` (symbol whose name is "m"), so in my thinking, I'm looking at two 
bytes, not 1. But it seems that the peek and read procedures are always (?) 
given a byte string of length 1.
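
For what it's worth, here is a minimal sketch of one reading of the
`make-input-port` contract as it applies to this question: for an ordinary
character, the procedure writes the byte into the supplied byte string and
returns the count of bytes written; for a special, it leaves the byte string
alone and returns a procedure that produces the value. The names `tokenize`
and `make-token-port` are hypothetical, and tokenizing the whole input
eagerly is an assumption made only to keep the example short (a real
tokenizer would presumably work lazily against the underlying port). The
buffer may be longer than one byte; the sketch only ever fills the first
position, which is allowed as long as the returned count says so.

```racket
#lang racket/base

;; Hypothetical helper: eagerly turn a byte string into a vector of tokens.
;; Capital X followed by any byte becomes a symbol; every other byte
;; (including a trailing X) stays a plain byte.
(define (tokenize bs)
  (let loop ([i 0] [acc '()])
    (cond
      [(>= i (bytes-length bs)) (list->vector (reverse acc))]
      [(and (= (bytes-ref bs i) 88)                 ; 88 = #\X
            (< (add1 i) (bytes-length bs)))
       (define name (string (integer->char (bytes-ref bs (add1 i)))))
       (loop (+ i 2) (cons (string->symbol name) acc))]
      [else (loop (add1 i) (cons (bytes-ref bs i) acc))])))

;; Hypothetical constructor: a port over the token vector.
(define (make-token-port bs)
  (define tokens (tokenize bs))
  (define pos 0) ; index of the next token to read

  ;; Shared by read and peek: deliver the token at index i.
  (define (deliver! bstr i)
    (cond
      [(>= i (vector-length tokens)) eof]
      [else
       (define t (vector-ref tokens i))
       (cond
         ;; Special: bstr is left untouched; the procedure yields the value.
         [(symbol? t) (lambda args t)]
         ;; Ordinary byte: store it in bstr and report one byte written.
         [else (bytes-set! bstr 0 t) 1])]))

  (make-input-port
   'tokens
   ;; read: deliver the current token and advance past it
   (lambda (bstr)
     (define r (deliver! bstr pos))
     (unless (eof-object? r) (set! pos (add1 pos)))
     r)
   ;; peek: `skip` counts bytes or specials already in this port's stream
   (lambda (bstr skip progress-evt)
     (deliver! bstr (+ pos skip)))
   void))
```

In this sketch a special occupies one position in the port's stream even
though it came from two input bytes, so `skip` can be applied directly to the
token index. The sketch is not thread-safe (`pos` is updated without a lock).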

Anyway, this is perhaps all a long way of saying that I'm rather lost with 
my custom input port approach to the issue. Any advice would be 
appreciated. Maybe custom input ports are not the way to go about what I'm 
doing, but I'm not ready to abandon them just yet. Below you can read (ha!) 
the current status of where I am with this project.

Jesse


#lang racket/base

(require racket/match
         racket/format
         racket/port)

; Input strings are intended to be sequences of ASCII letters,
; uppercase and lowercase. Capital X followed by another letter
; should get turned into a symbol whose name is the one-character
; string consisting of that letter.
(define (make-cool-port in)
  (define (peek! bstr skip event)
    (sleep 1)
    (define bs (peek-bytes 2 skip in))
    (cond [(eof-object? bs)
           eof]
          [else
           (match (bytes->list bs)
             [(list 88 a) ; 88 = X
              (lambda args 2)]
             [_ 1])]))
  (define (read! bstr)
    (define peeked (peek! bstr 0 #f))
    (cond [(eof-object? peeked)
           eof]
          [(procedure? peeked)
           (define bs (bytes->list (read-bytes (peeked) in)))
           (define a (cadr bs))
           (lambda args (string->symbol (~a (integer->char a))))]
          [else
           (read-byte in)
           peeked]))
  (make-input-port
   'xs
   read!
   peek!
   (lambda () (close-input-port in))))

(define (read-it-all)
  (define t (peek-char-or-special))
  (log-error "peeked ~a" t)
  (unless (eof-object? t)
    (read-char-or-special)
    (read-it-all)))

(module+ main
  (call-with-input-string
   "Xaw"
   (lambda (in)
     (define p (make-cool-port in))
     (parameterize ([current-input-port p])
       (read-it-all)))))


-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/a26f3849-5bec-4d41-bfb9-0f1127e8748cn%40googlegroups.com.


[racket-users] custom input ports

2016-03-14 Thread Jon Zeppieri
Almost every time I've created a custom input port, I've run up against the
rule: "The read-in procedure must not block indefinitely." And each time,
I've either:

- ignored it (with the suspicion that I've just written code that can block
the whole process, though I've never actually verified this);
- written a state machine that buffers data from the underlying port until
it has enough to work with; or
- created a new thread that reads from the underlying port and pipes data
back.

(By the way, `filter-read-input-port` doesn't help here, since the filter
is called in the context of the above-mentioned `read-in` procedure, so,
even though the docs don't mention it, presumably it shouldn't do blocking
I/O either.)

The last of these is the least painful (if you count the psychological pain
of option 1), but it raises another problem. If the data turns out to be
malformed, I want to raise an exception. (Or, at least, I think I do.) But
I want to raise it in the reader's thread, and I don't think there's a way
to do this, short of an explicit pre-arrangement.
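
One possible shape for such a pre-arrangement, sketched under assumptions: the
worker thread records any exception in a shared variable before closing its
end of the pipe, and the custom port's read procedure re-raises it when it
reaches end-of-file, so the exception surfaces in whichever thread is
reading. The names `make-piped-port` and `transform` are hypothetical. The
peek argument is `#f` so that peeking is derived from reads, and the read
procedure never blocks: when the pipe is empty it returns an event rather
than waiting.

```racket
#lang racket/base

;; Hypothetical helper: `transform` is a procedure of two arguments
;; (an input port and an output port) that runs in a worker thread,
;; reading from `in` and writing processed bytes to a pipe.
(define (make-piped-port in transform)
  (define-values (pipe-in pipe-out) (make-pipe))
  (define failure #f) ; set by the worker if `transform` raises

  (thread
   (lambda ()
     ;; Record the failure first, then close the pipe, so a reader that
     ;; sees eof is guaranteed to see the recorded failure as well.
     (with-handlers ([(lambda (e) #t) (lambda (e) (set! failure e))])
       (transform in pipe-out))
     (close-output-port pipe-out)))

  (make-input-port
   'piped
   (lambda (bstr)
     ;; Non-blocking read from the pipe.
     (define n (read-bytes-avail!* bstr pipe-in))
     (cond
       ;; End of data: re-raise the worker's exception in the reader's thread.
       [(eof-object? n) (if failure (raise failure) eof)]
       ;; Nothing available yet: return an event; a result of 0 means "retry".
       [(eqv? n 0) (wrap-evt pipe-in (lambda (_) 0))]
       [else n]))
   #f ; peeks are implemented in terms of reads
   (lambda () (close-input-port pipe-in))))
```

Because the exception is only raised at end-of-file, any bytes the worker
produced before failing are still delivered first; whether that is desirable
depends on the application.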

Has anyone else dealt with this problem (and come up with a nice solution)?
I haven't yet tried using generators. They might make it a bit easier to do
a variation on option 2, at the cost of a bunch of stack-copying.

- Jon
