Maybe it's really a generated content field. You can try converting it to HTML or Word and then filtering the lines styled as Heading 1, or whatever style is used to mark them as section headings.
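A rough sketch of the filtering step, assuming the PDF has already been converted to HTML (for example with Poppler's pdftohtml or an office suite's export) and that the headings came out as <h1> elements. That is an assumption: depending on the converter, headings may instead appear as <p> or <span> elements with a font class, so heading_tags (a hypothetical parameter) would need adapting. It only uses Python's standard html.parser module.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of elements whose tag (e.g. h1) marks a section heading."""

    def __init__(self, heading_tags=("h1",)):
        super().__init__()
        self.heading_tags = set(heading_tags)  # hypothetical: tags treated as headings
        self._depth = 0      # > 0 while we are inside a heading element
        self._buffer = []    # text fragments of the heading being read
        self.headings = []   # collected heading strings, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag in self.heading_tags:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.heading_tags and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # collapse runs of whitespace and newlines left by the converter
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.headings.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_headings(html_text, heading_tags=("h1",)):
    """Return the text of every heading element found in html_text."""
    parser = HeadingCollector(heading_tags)
    parser.feed(html_text)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    sample = "<html><body><h1>Chapter 1</h1><p>text</p><h1>Chapter  2:\n Setup</h1></body></html>"
    print(extract_headings(sample))  # ['Chapter 1', 'Chapter 2: Setup']
```

If your converter marks sections with a different tag (h2, say), pass it in: extract_headings(html, heading_tags=("h2",)).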
ciao!

On Fri, Dec 18, 2009 at 11:56 AM, Erwin Olario <[email protected]> wrote:
> A PDF's index information doesn't seem to be part of its meta-data..
>
> On Fri, Dec 18, 2009 at 11:32 AM, Erwin Olario <[email protected]> wrote:
>>
>> Your google-fu is better than mine. Thanks, will check it out.
>>
>> On Fri, Dec 18, 2009 at 10:31 AM, Anuerin Diaz <[email protected]>
>> wrote:
>>>
>>> have you tried libextractor
>>> [http://www.gnu.org/software/libextractor/]? that was one of the hits
>>> i got from google.
>>>
>>> On Thu, Dec 17, 2009 at 11:51 PM, Erwin Olario <[email protected]> wrote:
>>> > Hi list.
>>> > Are there any tools I can use to extract the index information from PDF
>>> > files?
>>>

-- 
"Programming, an artform that fights back"

Anuerin G. Diaz
Registered Linux User #246176
Friendly Linux Board @ http://mandrivausers.org/index.php
http://ramfree17.net/capsule , when you absolutely have nothing else better to do
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
http://lists.linux.org.ph/mailman/listinfo/plug
Searchable Archives: http://archives.free.net.ph

