You can make a port count positions in characters instead of bytes by
calling `port-count-lines!` on it before any input is read from it.
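
As a rough illustration, here is a minimal sketch using only `racket/base`.
The helper name `final-position` is made up for this example; it reads a
string port to the end and reports the port's next position, with and
without `port-count-lines!`. It does not use the lexer, so it only shows
the port-level behaviour the lexer's srclocs are presumably built on.

```racket
#lang racket/base

;; Hypothetical helper (not part of any library): read every character
;; of STR from a fresh string port and return the port's next position.
;; When COUNT-CHARS? is true, character counting is enabled first.
(define (final-position str count-chars?)
  (define in (open-input-string str))
  (when count-chars?
    ;; Must be called before anything is read from the port.
    (port-count-lines! in))
  (let loop ()
    (unless (eof-object? (read-char in))
      (loop)))
  ;; port-next-location returns line, column, and position;
  ;; positions are 1-based.
  (define-values (line col pos) (port-next-location in))
  pos)

(final-position "λx" #f) ; counts bytes: λ is 2 bytes in UTF-8, so 4
(final-position "λx" #t) ; counts characters, so 3
```

For the lexer case, the same call would go on the port returned by
`open-input-string` before it is passed to the lexer.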

At Wed, 13 Jan 2016 12:23:24 -0800 (PST), Ben Draut wrote:
> I've been tinkering with a lexer/parser for Lambda calculus expressions. I'm
> trying to add a feature to highlight unexpected characters, rather than making
> the user count columns. For example, the lexer will choke on this string,
> because it doesn't have a match for % anywhere: (λx.x %)
> 
> I'd like to get output like this:
> 
> <Some kind of explanation message>:
> 
> (λx.x %)
>       ^
> 
> I've hit a problem, though: the srclocs returned from the lexer don't
> correspond exactly to string indices when Unicode characters (such as λ) are
> included in the expression.
> 
> Take this program for example:
> 
> #lang racket
> (require parser-tools/lex) ; provides lexer-src-pos
>
> (with-handlers ([exn:fail:read? (λ (e) (printf "ERROR: ~a~n" e))])
>   ((lexer-src-pos [(eof) 'EOF]) (open-input-string "λ")))
> 
> This outputs:
> 
> ERROR: #(struct:exn:fail:read lexer: No match found in input starting with: λ 
> #<continuation-mark-set> (#(struct:srcloc #f #f #f 1 2)))
> 
> You can see in the srcloc that the position is 1 (as expected), but the span
> is 2.
> 
> So let's imagine I have a lexer that understands λ, but nothing else. When I
> try to lex the expression "λx", the lexer chokes on the x, and the srcloc
> information has position 3 and span 1. However, (string-length "λx") evaluates
> to 2.
> 
> I'm struggling to figure out how to solve the problem. I'm guessing that the λ
> is two bytes, and the lexer counts 1 position per byte. So what's the 'right'
> way to handle this? I could just sub1 from a position for each λ that appeared
> in the expression beforehand, but that feels sick and wrong.
> 
> Is there a way to tell the lexer to count positions on a 
> character-by-character basis, rather than bytes? (Or perhaps I'm 
> misunderstanding how that works too.)
> 
> Any pointers would be appreciated. Thanks!
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
