[pdf-devel] Re: Initial API for the tokeniser module

jemarch Mon, 18 May 2009 06:46:08 -0700

   > Heh, yes :)
   > 
   > I wanted to say that since there are several ways to get the printed
   > representation of a token, we should provide fine-grained control to
   > the client at this level. Upper layers of the library will take care
   > of, for example, PDF x.y compatibility.


   So the token writer should output a normal string when no special flags
   are given to pdf_token_write, and let an upper layer give a flag when it
   wants a hex string?

Yes, that is the idea.
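
To make the default-versus-flag behaviour concrete, here is a minimal
sketch in C. It assumes a hypothetical flag EX_TOKEN_HEX_STRINGS and a
hypothetical helper ex_write_string; neither is part of the library,
and the real pdf_token_write signature may differ. With no flags the
string is written in literal form, and with the flag set it is written
as a hex string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag for illustration only; not the library's API. */
enum
{
  EX_TOKEN_HEX_STRINGS = 1 << 0   /* write strings as <68656C6C6F> */
};

/* Write LEN bytes of DATA into OUT (capacity CAP) as a PDF string.
   Default: literal form "(...)".  With EX_TOKEN_HEX_STRINGS: "<...>".
   Returns the number of characters written (excluding the NUL), or
   -1 if OUT may be too small.  This sketch only escapes parentheses
   and backslashes in the literal form.  */
static int
ex_write_string (const unsigned char *data, size_t len, unsigned flags,
                 char *out, size_t cap)
{
  size_t n = 0;
  size_t i;

  assert (data != NULL && out != NULL);

  /* Worst case for both forms: two output characters per byte, plus
     the two delimiters and the terminating NUL.  */
  if (cap < 2 * len + 3)
    return -1;

  if (flags & EX_TOKEN_HEX_STRINGS)
    {
      out[n++] = '<';
      for (i = 0; i < len; i++)
        n += (size_t) sprintf (out + n, "%02X", data[i]);
      out[n++] = '>';
    }
  else
    {
      out[n++] = '(';
      for (i = 0; i < len; i++)
        {
          /* Escape the three characters that are special here.  */
          if (memchr ("()\\", data[i], 3) != NULL)
            out[n++] = '\\';
          out[n++] = (char) data[i];
        }
      out[n++] = ')';
    }

  out[n] = '\0';
  return (int) n;
}
```

An upper layer that wants hex output would pass EX_TOKEN_HEX_STRINGS;
every other caller passes 0 and gets the literal form.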

   > The same applies to the interpretation of the textual forms for
   > tokens.

   Are you referring just to '#' escapes here?  I think that's the only
   case where a token can be interpreted in two different ways.

It is currently the only case, but that may change in future versions
of the PDF specification. So I would go for a general solution here
using writing flags, even if we currently implement support for only one.

   >    For now there's no implementation that can return EAGAIN since there's
   >    no way to set a non-blocking mode.  At some point a function to do that
   >    should be added.
   > 
   > Maybe it would be good to write a little read-only HTTP filesystem
   > and incorporate it into the library.  That would allow us to test
   > the non-blocking capabilities of pdf_fsys and also the
   > read-in-advance functions.  It would also work as an example of a
   > non-disk-based filesystem for people interested in writing their
   > own fsys implementations.

   That's a good idea.  Whoever implements this should look into using a
   library like libcurl to handle the protocol (and possibly give us other
   protocols for free).

Yes, that library seems to implement quite a lot of protocols. For
now I will create a task for writing the HTTP filesystem.

   >    Also, I had two flags for the token reader:
   >      _RET_COMMENTS  (return comments as tokens)
   >      _PDF11         (don't treat '#' as an escape character in names)
   >    I didn't have a public function to set them, but one would need to be
   >    added.  How about
   >      void pdf_token_reader_set_flags(int flags),
   >    where 'flags' is a bitmask?  We may also need a _PDF11 flag for the
   >    writer.
   > 
   > I would use _SHARP_ESCAPE instead of _PDF11. Upper layers will take
   > care of PDF x.y portability.

   I think something like NO_SHARP_ESCAPE/NO_NAME_ESCAPE would be better
   (i.e. the default would be to interpret the escapes, so no flags would
   be needed in the common case).

Yes, much better.

   Also, the flags should probably be specified for token_read rather
   than a separate reader_set_flags call, for consistency with
   token_write.

Ok.
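
As a rough illustration of the agreed direction (escapes interpreted
by default, a bitmask flag passed at read time to turn that off), here
is a minimal sketch in C. The flag name EX_TOKEN_NO_NAME_ESCAPES and
the helper ex_read_name are hypothetical and only stand in for
whatever the final token_read interface looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag for illustration only; not the library's API. */
enum
{
  EX_TOKEN_NO_NAME_ESCAPES = 1 << 0  /* treat '#' as an ordinary char */
};

/* Return the value of hex digit C, or -1 if C is not a hex digit.  */
static int
ex_hex_value (int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode the body of a name token SRC (without the leading '/') into
   OUT (capacity CAP).  By default "#xx" is decoded to one byte; with
   EX_TOKEN_NO_NAME_ESCAPES the '#' is copied through unchanged.
   Returns the decoded length, or -1 on a malformed escape or if OUT
   is too small.  */
static int
ex_read_name (const char *src, unsigned flags, char *out, size_t cap)
{
  size_t n = 0;
  size_t len;
  size_t i;

  assert (src != NULL && out != NULL);
  len = strlen (src);

  for (i = 0; i < len; i++)
    {
      int c = (unsigned char) src[i];

      if (c == '#' && !(flags & EX_TOKEN_NO_NAME_ESCAPES))
        {
          int hi, lo;

          /* An escape needs two hex digits after the '#'.  */
          if (len - i < 3)
            return -1;
          hi = ex_hex_value ((unsigned char) src[i + 1]);
          lo = ex_hex_value ((unsigned char) src[i + 2]);
          if (hi < 0 || lo < 0)
            return -1;
          c = hi * 16 + lo;
          if (c == 0)           /* "#00" is rejected in this sketch */
            return -1;
          i += 2;
        }

      if (n + 1 >= cap)         /* keep room for the terminating NUL */
        return -1;
      out[n++] = (char) c;
    }

  out[n] = '\0';
  return (int) n;
}
```

In the common case the caller passes 0 and "A#20B" decodes to "A B";
an upper layer handling PDF 1.1 input would pass the flag and get the
five characters back untouched.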

   > Also, I think that it would be good to use the PDF_ prefix for these
   > flags.

   Just "PDF_", or something like "PDF_TOKENISER_"?

I would go for PDF_TOKEN_, but if you feel that it would make the
names too long, PDF_ would also be ok.

-- 
Jose E. Marchesi
[email protected]

GNU Project
http://www.gnu.org
