Hi there,

Resuming work on PDFBOX-1000, I came across the question of how to maintain
some state within the base components PDFLexer and SimpleParser (which is yet
to come).

E.g. in order to differentiate a number from an indirect object, I potentially
have to read three tokens, {num} {gen} obj, to check whether {num} is a
standalone number or the start of an indirect object. If I've read too many
tokens and the number was in fact a standalone object, there are two ways to
recover:

a) depend on file position e.g. filePointer and seek
b) maintain some internal state
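To make option b) concrete, here is a minimal sketch. It assumes tokens are plain strings, and the class names LookaheadTokens and NumberOrObject are hypothetical, not existing PDFBox API. Tokens that were read ahead but turned out not to belong to an object header are pushed back into an internal buffer instead of seeking the file backwards:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch of option b): read-ahead tokens are kept in an internal
// pushback buffer instead of rewinding the file with seek().
final class LookaheadTokens {
    private final Iterator<String> source;                      // underlying token stream (assumed)
    private final Deque<String> pushback = new ArrayDeque<>();  // tokens read ahead but not consumed

    LookaheadTokens(Iterator<String> source) {
        this.source = source;
    }

    /** Returns the next token, preferring pushed-back tokens; null at end of input. */
    String next() {
        if (!pushback.isEmpty()) {
            return pushback.pop();
        }
        return source.hasNext() ? source.next() : null;
    }

    /** Returns a token to the stream so that the next call to next() yields it again. */
    void unread(String token) {
        if (token != null) {
            pushback.push(token);
        }
    }

    /** Discards read-ahead state, e.g. before parsing restarts at another position. */
    void reset() {
        pushback.clear();
    }
}

final class NumberOrObject {
    /** Reads "{num} {gen} obj" as an object header; otherwise just the number. */
    static String parse(LookaheadTokens tokens) {
        String num = tokens.next();
        String gen = tokens.next();
        String keyword = tokens.next();
        if (isInteger(gen) && "obj".equals(keyword)) {
            return "object " + num + " " + gen;
        }
        // Read too far: push back in reverse order so the tokens come out in original order.
        tokens.unread(keyword);
        tokens.unread(gen);
        return "number " + num;
    }

    private static boolean isInteger(String s) {
        return s != null && s.matches("\\d+");
    }
}
```

Parsing "5 /Name 7" returns "number 5" and leaves "/Name" and "7" to be returned by the following next() calls. The price is exactly the one mentioned above: once parsing restarts elsewhere, reset() has to be called so stale pushed-back tokens are not replayed.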

I currently tend towards b), as this would remove the dependency on
filePointer() and seek() or similar methods. But that means that if parsing
has to start from a new point within the file, object etc., there needs to be
some reset() call to reset the state. Also, the caller, e.g. ConformingParser,
has to make sure that there is some way to reposition the cursor. On the other
hand, not being dependent on a specific position would enable PDFLexer and
SimpleParser to be extended to work on byte[] and similar.
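A rough sketch of what position independence could look like, assuming a lexer that only pulls bytes sequentially from an InputStream. SequentialLexer, fromBytes() and reset(InputStream) are hypothetical names, and only PDF white-space is treated as a token separator (delimiters such as '/' or '[' are ignored for brevity). Because it never calls filePointer() or seek(), the same code runs over a file stream or a byte[], and repositioning is left to the caller, which hands in an already positioned source:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: a lexer that depends only on sequential reads,
// so any InputStream (file-backed, byte[]-backed, ...) can feed it.
final class SequentialLexer {
    private InputStream in;

    SequentialLexer(InputStream in) {
        this.in = in;
    }

    /** Convenience factory for in-memory data such as a byte[]. */
    static SequentialLexer fromBytes(byte[] data) {
        return new SequentialLexer(new ByteArrayInputStream(data));
    }

    /**
     * Restarts lexing on a source the caller (e.g. ConformingParser) has
     * already positioned; any internal lookahead state would be cleared here too.
     */
    void reset(InputStream positionedSource) {
        this.in = positionedSource;
    }

    /** Returns the next white-space-delimited token, or null at end of input. */
    String nextToken() {
        try {
            int b = in.read();
            while (b != -1 && isPdfWhitespace(b)) {
                b = in.read();
            }
            if (b == -1) {
                return null;
            }
            StringBuilder token = new StringBuilder();
            while (b != -1 && !isPdfWhitespace(b)) {
                token.append((char) b);
                b = in.read();
            }
            return token.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // PDF white-space bytes: NUL, HT, LF, FF, CR and SP.
    private static boolean isPdfWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }
}
```

The design choice here is that the lexer owns no notion of "where" it is; whoever needs random access opens a stream at the desired offset and passes it to reset().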

WDYT?

Kind regards

Maruan Sahyoun
