bytes, lazy strings and zlib

Chris Dolan Fri, 21 Nov 2008 19:25:17 -0800

In my ongoing quest to create a PDF parser in Perl6, I have some Rakudo/PGE/parrot questions. These are low-urgency and some of these may not be implemented yet...


1) byte orientation

PDF's syntax is inherently an 8-bit ASCII superset. Some subsections may be interpreted as some multi-byte encoding or even binary, but low-level parsers can safely work solely in the string-as-byte-array domain.

How do I make a grammar work on bytes instead of chars? Is that a property of the $.target string?
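
To illustrate what byte-level matching means here, a minimal sketch in Python (used purely for illustration; this is not Rakudo or PGE code, and the sample data and the name OBJ_HEADER are made up for the example). A bytes pattern matches an object header, and the binary payload passes through with no decoding step:

```python
import re

# Bytes pattern (hypothetical name): matches "<num> <gen> obj" byte by byte.
OBJ_HEADER = re.compile(rb"(\d+)[ \t\r\n]+(\d+)[ \t\r\n]+obj")

# Made-up sample: ASCII syntax wrapped around a 4-byte binary payload.
data = b"12 0 obj\n<< /Length 4 >>\nstream\n\xff\xfe\x00\x80\nendstream\nendobj\n"

m = OBJ_HEADER.match(data)
obj_num, gen_num = int(m.group(1)), int(m.group(2))

# Slice the payload by byte offset; bytes above 0x7F are left untouched.
stream_start = data.index(b"stream\n") + len(b"stream\n")
payload = data[stream_start:stream_start + 4]
```

The point of the sketch is only that offsets and lengths are counted in bytes throughout, which is the behaviour the grammar would need.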


2) file as lazy string

PDF files are largely random access, but individual segments have arbitrary lengths. Rather than slurping in the whole file or guessing at segment lengths, I'd like to emulate a string via a wrapper around a seekable file, and then apply my grammar to that fake string. I think I can accomplish this by subclassing PGE::Match and overriding new(), text() and item() appropriately. text() would seek to the appropriate locations in the file and buffer a chunk at a time. From there, I could substr the desired passages.

Does anyone know any implementation details that would make this lazy-string approach work or not work? Has someone tried this?

It seems like the runtime/parrot/library/Stream classes parallel what I want to accomplish.
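
The lazy-string idea can be sketched independently of PGE. The following is a rough Python illustration, not Parrot code: the class name LazyFileString, its substr() method and the chunk_size parameter are all hypothetical, and it says nothing about how PGE::Match would actually need to be subclassed. It reads fixed-size chunks on demand and serves substrings from the cached chunks:

```python
import io


class LazyFileString:
    """Read-only, string-like view over a seekable binary file (hypothetical)."""

    def __init__(self, fh, chunk_size=4096):
        self.fh = fh
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk index -> bytes, filled on demand
        fh.seek(0, io.SEEK_END)
        self.length = fh.tell()

    def __len__(self):
        return self.length

    def _chunk(self, index):
        # Seek and read only the first time a chunk is requested.
        if index not in self.chunks:
            self.fh.seek(index * self.chunk_size)
            self.chunks[index] = self.fh.read(self.chunk_size)
        return self.chunks[index]

    def substr(self, offset, length):
        # Clamp to the end of the file, as a substr on a real string would.
        end = min(offset + length, self.length)
        if offset < 0 or offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + (end - offset)]


# Usage: an in-memory file stands in for a PDF on disk.
fh = io.BytesIO(b"%PDF-1.4\n" + b"x" * 10000 + b"\n%%EOF\n")
s = LazyFileString(fh, chunk_size=16)
header = s.substr(0, 8)
trailer = s.substr(len(s) - 6, 6)
```

Reading the header and the trailer touches only the first and last chunks; the middle of the file is never read. The sketch caches chunks without ever evicting them, so a real implementation would want a bound on the cache.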


3) gzip

Has anyone worked on a zlib interface?


Thanks,
Chris
