In my ongoing quest to write a PDF parser in Perl6, I have run into some Rakudo/PGE/Parrot questions. None of them is urgent, and some may concern features that are not implemented yet...

1) byte orientation

PDF's syntax is inherently 8-bit: a superset of ASCII. Some sections may be interpreted in a multi-byte encoding or may even be raw binary, but a low-level parser can safely treat the whole file as an array of bytes.

How do I make a grammar work on bytes instead of chars? Is that a property of the $.target string?
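For comparison, here is a minimal sketch of byte-level matching, written in Python rather than Perl 6. It makes no claim about how PGE handles this. Both the pattern and the subject are `bytes`, so every "character" is one octet and match offsets are byte offsets into the file, which is what a PDF lexer needs. The helper name `find_objects` and the sample data are illustrative only.

```python
import re

# Pattern and subject are both bytes: matching is octet-by-octet and
# m.start() is a byte offset, not a character offset.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")

def find_objects(data: bytes):
    """Return (object number, generation, byte offset) for each 'N G obj' header."""
    return [(int(m.group(1)), int(m.group(2)), m.start())
            for m in OBJ_HEADER.finditer(data)]

# The second line holds high bytes that are not valid UTF-8; byte matching
# does not care, and the offset of "1 0 obj" counts them as one byte each.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(find_objects(sample))  # [(1, 0, 15)]
```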

2) file as lazy string

PDF files are designed largely for random access, but individual segments have arbitrary lengths. Rather than slurping in the whole file or guessing at segment lengths, I'd like to emulate a string with a wrapper around a seekable file, and then apply my grammar to that fake string. I think I can accomplish this by subclassing PGE::Match and overriding new(), text() and item() appropriately. text() would seek to the appropriate locations in the file and buffer it a chunk at a time; from there, I could substr out the desired passages.

Does anyone know of implementation details that would make this lazy-string approach work, or keep it from working? Has anyone tried this?

It seems like the runtime/parrot/library/Stream classes parallel what I want to accomplish.
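To make the buffering idea concrete, here is a minimal sketch in Python (not Perl 6, and not a PGE::Match subclass). It shows a read-only, byte-indexed "string" backed by a seekable file. The class name `LazyFileString`, its `substr` method and the `chunk_size` parameter are hypothetical. A real file opened with `open(path, "rb")` would work the same way as the in-memory `io.BytesIO` used here.

```python
import io

class LazyFileString:
    """Hypothetical read-only, byte-indexed view of a seekable file.

    Fixed-size chunks are read on first use and cached, so only the
    regions a parser actually touches are ever loaded.
    """

    def __init__(self, fh, chunk_size=4096):
        self.fh = fh
        self.chunk_size = chunk_size
        self.cache = {}                    # chunk index -> bytes
        fh.seek(0, io.SEEK_END)            # find the total length once
        self.length = fh.tell()

    def __len__(self):
        return self.length

    def _chunk(self, index):
        # Seek and read a chunk only the first time it is requested.
        if index not in self.cache:
            self.fh.seek(index * self.chunk_size)
            self.cache[index] = self.fh.read(self.chunk_size)
        return self.cache[index]

    def substr(self, start, length):
        """Return up to `length` bytes from byte offset `start`, clipped at EOF."""
        end = min(start + length, self.length)
        if start >= end:
            return b""
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        offset = start - first * self.chunk_size
        return data[offset:offset + (end - start)]

# Usage: tiny chunks so that some reads span a chunk boundary.
raw = b"%PDF-1.4\n1 0 obj\nendobj\n"
s = LazyFileString(io.BytesIO(raw), chunk_size=8)
print(s.substr(1, 7))   # b'PDF-1.4'  (inside chunk 0)
print(s.substr(5, 4))   # b'1.4\n'    (spans chunks 0 and 1)
```

A grammar engine would also need to know when it is about to run past the buffered region. That is exactly the part that depends on how PGE accesses its target, which the sketch does not address.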

3) gzip

Has anyone worked on a zlib interface?
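The main reason a PDF parser needs zlib is that stream data is commonly compressed with the FlateDecode filter, which uses the zlib/deflate format. As a rough illustration, here is what such an interface has to do, using Python's standard `zlib` module rather than anything in Parrot. The sample stream is compressed in the snippet itself as a stand-in for real PDF stream bytes. The function names are hypothetical.

```python
import zlib

# Stand-in for the raw bytes between "stream" and "endstream" of a
# FlateDecode-filtered PDF stream; here it is simply compressed for the demo.
content = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
raw_stream = zlib.compress(content)

def flate_decode(data: bytes) -> bytes:
    """Inflate a complete zlib-wrapped deflate stream in one call."""
    return zlib.decompress(data)

def flate_decode_chunks(chunks):
    """Inflate incrementally, so a large stream never has to be fully in memory."""
    d = zlib.decompressobj()
    for chunk in chunks:
        yield d.decompress(chunk)
    yield d.flush()

print(flate_decode(raw_stream) == content)  # True
```

The incremental form fits naturally with reading the file a chunk at a time.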


Thanks,
Chris
