Sargrad, Dave wrote:
> "It would be absolutely wonderful if as part of your work you ended up writing 
> even a rudimentary content stream parser that was self contained enough to be 
> included in PoDoFo."
> 
> Great. I would love to contribute a content stream parser. I don't quite know 
> what this means yet, but perhaps we can talk about the proper API (from your 
> perspective). 
> 
> With your mentoring I may be able to contribute a component to the podofo 
> project that you find useful.
> 
>  "Looking at the attached PDF, I think it's safe to say you can handle a very 
> restricted subset of PDF and still be OK. I begin to see why you're doing it 
> the way you are. A content stream parser for that shouldn't be too hard to 
> write at all by the looks."
> 
> This was my impression/hope as well. I want to start simple, and yet put 
> myself on a road to increasingly understand/use pdf.
> 
> Now that you've seen the PDF files that I'm currently interested in and 
> understand that I'm willing to put in the effort to "do this right", and to 
> contribute something back to the community, please help me to understand the 
> appropriate initial characteristics (API) of the "content stream parser".

Honestly, in this case the best thing you can probably do is read parts 
of the PDF Reference. In my distinctly non-expert view I'd recommend:

   Overview - section 2.1 and the overview intro

   Skim reading sections 3.1-3.4, 3.6 & 3.8. Looking at some sample PDFs
   might be helpful here. PoDoFoBrowser or podofouncompress might be
   handy.

   I'd unsurprisingly recommend reading section 3.7 "Content Streams"
   in detail.

To my mind, a basic content stream parser should be able to read a 
content stream (just a byte sequence as far as anything else is 
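concerned) and as a first stage produce a stream of tokens. That's 
mostly trivial since content stream tokens are generally separated by 
whitespace, though delimiters such as ( ) [ ] < > and / also end a 
token, and string literals may contain spaces. The tokens could 
then be processed to identify operators, convert int/float tokens to 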
real numeric values, etc, giving you a stream of content stream elements 
that code could actually do something useful with.
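
As a rough illustration of those two stages, here is a minimal C++ sketch. 
It is not PoDoFo code: ContentElement, Tokenize and Classify are names made 
up for this example. It assumes the restricted case where every token is 
separated by whitespace, so it does not handle string literals, arrays, 
dictionaries, comments or inline images.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical element type for this sketch; not part of PoDoFo's API.
struct ContentElement {
    enum class Kind { Integer, Real, Name, Operator } kind;
    std::string text;    // raw token text as it appeared in the stream
    double value = 0.0;  // numeric value, only meaningful for Integer/Real
};

// Stage 1: split the byte sequence on the PDF whitespace characters.
// Assumption: tokens are always whitespace-separated (no strings, arrays, etc).
std::vector<std::string> Tokenize(const std::string& stream) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : stream) {
        const bool ws = c == ' ' || c == '\t' || c == '\r' ||
                        c == '\n' || c == '\f' || c == '\0';
        if (!ws) {
            current.push_back(c);
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// True if the token looks like a PDF number: optional sign, digits, and at
// most one decimal point. Sets isReal when a decimal point is present.
bool IsNumber(const std::string& t, bool& isReal) {
    isReal = false;
    if (t.empty()) return false;
    std::size_t i = (t[0] == '+' || t[0] == '-') ? 1 : 0;
    bool sawDigit = false;
    for (; i < t.size(); ++i) {
        if (std::isdigit(static_cast<unsigned char>(t[i]))) {
            sawDigit = true;
        } else if (t[i] == '.' && !isReal) {
            isReal = true;
        } else {
            return false;
        }
    }
    return sawDigit;
}

// Stage 2: turn raw tokens into typed elements. Anything that is neither a
// number nor a /Name is treated as an operator in this simplified sketch.
std::vector<ContentElement> Classify(const std::vector<std::string>& tokens) {
    std::vector<ContentElement> elements;
    for (const std::string& t : tokens) {
        ContentElement e;
        e.text = t;
        bool isReal = false;
        if (IsNumber(t, isReal)) {
            e.kind = isReal ? ContentElement::Kind::Real
                            : ContentElement::Kind::Integer;
            e.value = std::strtod(t.c_str(), nullptr);
        } else if (t[0] == '/') {
            e.kind = ContentElement::Kind::Name;
        } else {
            e.kind = ContentElement::Kind::Operator;
        }
        elements.push_back(e);
    }
    return elements;
}
```

Calling Classify(Tokenize("1 0 0 1 72.5 -720 cm /F1 12 Tf")) would give six 
numbers followed by the cm operator, then the name /F1, the number 12 and 
the Tf operator. Code consuming the elements can push operands onto a stack 
until it reaches an operator.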

From there... I'm not sure what the best way is. Getting that far 
should be pretty trivial though. I'm itching to have a go at it myself 
now that I've actually looked at it, but it's now 4am and sleep is no 
longer optional.

--
Craig Ringer
