Hi
Re our conversation on parsing content streams, I'm beginning to think
that doing much more than building a list might be nuking a fly.
According to the PDF reference:
"PDF has no concept of an operand stack as PostScript has. In PDF, all
of the operands needed by an operator must immediately precede that
operator. Operators do not return results, and operands cannot be left
over when an operator finishes execution."
I'd realised before that this was how it usually worked out in practice,
but I hadn't realised it was a hard rule; I'd always assumed there was a
stack.
That means that a tree representation of a content stream would be very
boring and simple - just:
                     [root]
                        |
    -----------------------------------------
    |             |             |           |
[operator]    [operator]    [operator]    [...]
    |             |             |
[operand]     [operand]     [operand]
... which might be just as well represented by something like a
list/vector of pairs, where each pair describes an operator and an array
of zero or more PdfVariant arguments.
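
To make that concrete, here is a minimal sketch of the flat representation.
The names Operand, Instruction, InstructionList and exampleStream are made
up for illustration and are not PoDoFo API; Operand is just the raw token
text standing in for a PdfVariant.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PdfVariant: the raw text of one operand.
using Operand = std::string;

// One content stream instruction: the operator keyword plus its operands.
using Instruction = std::pair<std::string, std::vector<Operand>>;

// A whole content stream is then just a flat list of instructions.
using InstructionList = std::vector<Instruction>;

// Hand-built list for the stream "q 1 0 0 1 72 720 cm /F1 12 Tf Q".
InstructionList exampleStream() {
    return {
        {"q",  {}},                                  // no operands
        {"cm", {"1", "0", "0", "1", "72", "720"}},   // six operands
        {"Tf", {"/F1", "12"}},                       // font name and size
        {"Q",  {}},
    };
}
```

Since no operator leaves anything behind for the next one, nothing in the
list ever needs to refer to anything else in it, which is why the tree adds
so little.
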
Stream operations would seem to be equally simple - accumulate variants
until you hit a keyword, then return the keyword and an array of
arguments to it.
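
As a rough sketch of that loop (not the reader mentioned below, and not
PoDoFo API): the names isOperand, readInstruction, readAll and
readAllFromString are hypothetical, operands are kept as raw token text
instead of PdfVariant, and tokens are assumed to be separated by whitespace.
That assumption means strings containing spaces, arrays, dictionaries,
comments and inline images are not handled.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PdfVariant: the raw text of one operand.
using Operand = std::string;
using Instruction = std::pair<std::string, std::vector<Operand>>;

// Assumption: a token that looks like a number, a /Name, or one of
// true/false/null is an operand; anything else is an operator keyword.
bool isOperand(const std::string& tok) {
    if (tok == "true" || tok == "false" || tok == "null") return true;
    const unsigned char c = static_cast<unsigned char>(tok[0]);
    return std::isdigit(c) || c == '+' || c == '-' || c == '.' || c == '/';
}

// Read one instruction: accumulate operands until a keyword turns up.
// Returns false at end of input. Trailing operands with no operator are
// silently dropped here; a real reader should report them as an error.
bool readInstruction(std::istream& in, Instruction& out) {
    out.first.clear();
    out.second.clear();
    std::string tok;
    while (in >> tok) {          // tok is never empty on success
        if (isOperand(tok)) {
            out.second.push_back(tok);
        } else {
            out.first = tok;     // the keyword ends the instruction
            return true;
        }
    }
    return false;
}

// Accumulate every instruction in the stream into one list.
std::vector<Instruction> readAll(std::istream& in) {
    std::vector<Instruction> all;
    Instruction ins;
    while (readInstruction(in, ins)) all.push_back(ins);
    return all;
}

// Convenience wrapper for content already held in a string.
std::vector<Instruction> readAllFromString(const std::string& text) {
    std::istringstream in(text);
    return readAll(in);
}
```

For example, readAllFromString("q 1 0 0 1 72 720 cm /F1 12 Tf Q") yields
four instructions: q with no operands, cm with six, Tf with two, and Q with
none.
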
I've put together a quick reader based on Dom's code that can be used to
read a content stream one operator at a time, returning a pair containing
the string representation of the operator and a vector of PdfVariant
operands. There's also a simple function to accumulate the lot of them
if you want to read a whole stream at once.
It's probably not the fastest way to do things, but it provides
something to play with.
--
Craig Ringer