On 24Aug2018 17:55, Peter Otten <__pete...@web.de> wrote:
Albert-Jan Roskam wrote:
I have Ghostscript files with a table of contents (toc) and I would like
to use this info to generate a human-readable toc. The problem is: I can't
get the (nested) hierarchy right.

import re

toc = """\
[ /PageMode /UseOutlines
  /Page 1
  /View [/XYZ null null 0]
  /DOCVIEW pdfmark
[ /Title (Title page)
  /Page 1
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Document information)
  /Page 2
  /View [/XYZ null null 0]
  /OUT pdfmark
[...]
What is the best approach to do this?

The best approach is probably to use some tool/library that understands
PostScript.

Just on this point: I disagree. IIRC, there's no such thing as '/Title' etc. in PostScript itself; these will all be PostScript functions defined by whatever made the document. So a generic tool won't have any way to extract semantics like titles from a document.

The OP presumably has the specific output of a particular tool, with this nicely structured PostScript, so they will need to write their own special-purpose parser.
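
As a rough illustration of what such a special-purpose parser might look like, here is a minimal sketch. It is not anyone's actual solution, and it rests on assumptions: that every outline entry ends in "/OUT pdfmark" as in the excerpt, that titles contain neither parentheses nor the word "pdfmark", and that nesting is expressed with a "/Count N" key giving the number of immediate children (the usual pdfmark convention, but not visible in the excerpt above). The names parse_outline and SAMPLE, and the sample entries beyond "Title page", are made up for the example.

```python
import re

TITLE = re.compile(r"/Title\s*\((.*?)\)", re.DOTALL)
PAGE = re.compile(r"/Page\s+(\d+)")
COUNT = re.compile(r"/Count\s+(-?\d+)")  # assumption: not shown in the excerpt


def parse_outline(text):
    """Return (depth, title, page) tuples in document order."""
    result = []
    remaining = []  # children still expected at each open nesting level
    # Each entry ends with the word "pdfmark"; only /OUT entries are bookmarks.
    for chunk in re.split(r"\bpdfmark\b", text):
        if not chunk.rstrip().endswith("/OUT"):
            continue
        title = TITLE.search(chunk)
        page = PAGE.search(chunk)
        if title is None or page is None:
            continue
        result.append((len(remaining), title.group(1), int(page.group(1))))
        if remaining:
            remaining[-1] -= 1  # this entry uses up one slot of its parent
        count = COUNT.search(chunk)
        # A negative count means "collapsed" but still gives the child count.
        children = abs(int(count.group(1))) if count else 0
        if children:
            remaining.append(children)
        # Close every level whose children have all been seen.
        while remaining and remaining[-1] == 0:
            remaining.pop()
    return result


# Made-up sample in the same shape as the OP's data, plus a /Count key.
SAMPLE = """\
[ /PageMode /UseOutlines
  /Page 1
  /DOCVIEW pdfmark
[ /Title (Title page)
  /Page 1
  /OUT pdfmark
[ /Title (Chapter 1)
  /Count 2
  /Page 3
  /OUT pdfmark
[ /Title (Section 1.1)
  /Page 3
  /OUT pdfmark
[ /Title (Section 1.2)
  /Page 5
  /OUT pdfmark
[ /Title (Chapter 2)
  /Page 7
  /OUT pdfmark
"""

if __name__ == "__main__":
    for depth, title, page in parse_outline(SAMPLE):
        print("    " * depth + f"{title} ... {page}")
```

The stack of remaining child counts is what recovers the hierarchy: an entry's depth is simply the number of levels still open when it is read. If the OP's files mark nesting some other way, that part would need to change.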

Cheers,
Cameron Simpson <c...@cskk.id.au>
