On Tue, May 26, 2009 at 00:53:46 +0200, [email protected] wrote:
...
> We need to be able to determine the boundaries of a token that has
> been read, for error reporting. We cannot rely on the stm used by the
> token reader to determine the beginning position of a read token,
> since it is skipping white characters.

This behaviour could be changed by
- adding a flag that causes token_read to return whitespace as a token; or,
- adding a function/flag to advance to the beginning of the next token

> We would need to expand the pdf_token_read to communicate both the
> beginning position and the end position in the stm of the last read
> token. It could be done using two extra parameters:
>
>   pdf_status_t pdf_token_read (pdf_token_reader_t reader,
>                                pdf_u32_t flags,
>                                pdf_size_t *beginning_pos,
>                                pdf_size_t *end_pos,
>                                pdf_token_t *token);
>
> If NULLs are passed then the parameters are not filled.
>
> An alternative would be to expand the pdf_token_t TAD to include such
> information, but I think it would not be quite appropriate, since it
> is not part of the semantics of the token.

True, I'd rather not include it in the token structure.

> Would this modification be ok with you?

I'm not sure about the API. If the extra parameters will only be used
in the case of an error, maybe a new function could be added to access
the positions of the last token (to keep pdf_token_read simple); or
the stream methods could be called directly if the caller could
manually skip whitespace.

Also, what would beginning_pos and end_pos mean exactly? Are they based
on the byte positions of the underlying stream before filtering, or on
the number of bytes actually seen by the tokeniser (after filtering)?
The physical stream position (e.g. as reported by ftell) might not be
useful; for example, if a decompression filter operates on blocks of
data, it could emit many tokens without advancing.

-- 
Michael
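
For illustration, the "separate accessor" idea could look roughly like
the sketch below. Everything in it is hypothetical: toy_reader,
toy_reader_init, toy_read_token and toy_last_token_pos are made-up
names over an in-memory buffer, not the library's types or API. It
also assumes positions are counted in bytes as seen by the tokeniser
(after filtering), which is only one possible answer to the question
above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy reader over an in-memory buffer (not the real API). */
typedef struct {
    const char *buf;    /* bytes as seen by the tokeniser */
    size_t len;         /* number of bytes in buf */
    size_t pos;         /* current read offset */
    size_t last_begin;  /* offset of the first byte of the last token */
    size_t last_end;    /* offset one past the last byte of the last token */
} toy_reader;

void toy_reader_init(toy_reader *r, const char *buf, size_t len)
{
    r->buf = buf;
    r->len = len;
    r->pos = 0;
    r->last_begin = 0;
    r->last_end = 0;
}

/* Read the next whitespace-delimited token into out, truncating it to
   fit out_size. Returns 1 if a token was read, 0 at end of input.
   The read call itself keeps a simple signature; the token boundaries
   are only remembered inside the reader. */
int toy_read_token(toy_reader *r, char *out, size_t out_size)
{
    size_t n;

    /* Skip leading whitespace, so pos before the call is NOT the
       beginning of the token. */
    while (r->pos < r->len && isspace((unsigned char) r->buf[r->pos]))
        r->pos++;
    if (r->pos >= r->len)
        return 0;

    r->last_begin = r->pos;
    while (r->pos < r->len && !isspace((unsigned char) r->buf[r->pos]))
        r->pos++;
    r->last_end = r->pos;

    if (out_size > 0) {
        n = r->last_end - r->last_begin;
        if (n >= out_size)
            n = out_size - 1;
        memcpy(out, r->buf + r->last_begin, n);
        out[n] = '\0';
    }
    return 1;
}

/* Accessor for the boundaries of the last token read. As in the
   proposal quoted above, NULL arguments are simply not filled. */
void toy_last_token_pos(const toy_reader *r, size_t *begin, size_t *end)
{
    if (begin != NULL)
        *begin = r->last_begin;
    if (end != NULL)
        *end = r->last_end;
}
```

A caller would only invoke toy_last_token_pos on the error path, so
the common case pays nothing beyond two assignments per token.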
