If you have a PDF produced in a controlled environment, where all the
strings are in the form '(string text)', it's possible to go to the page,
get the Contents key, decompress the stream and then parse it.
I'll be in London for the next week and I doubt that I will be able to
answer any mail, but when I get back I'll prepare something to get you
started.
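
In the meantime, a minimal sketch of the decompress-and-parse step in plain
Java might look like the following. This is not iText code: the class name
ContentStreamStrings and its methods are made up for illustration. It
assumes the stream uses the Flate filter, that every '(' or ')' inside a
string is escaped with '\', and that the bytes can be read as Latin-1
(close to WinAnsi, but not identical for codes 0x80-0x9F). Getting the raw
bytes of the page's Contents stream out of the document is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not part of iText.
class ContentStreamStrings {

    // Inflate the body of a Flate-compressed stream (zlib format).
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: keep what was decoded so far
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid Flate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Collect every '(...)' literal: find '(' and read until the next
    // unescaped ')'. Assumes parentheses inside strings are always escaped.
    static List<String> extractStrings(byte[] content) {
        List<String> strings = new ArrayList<>();
        int i = 0;
        while (i < content.length) {
            if (content[i++] != '(') {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            while (i < content.length && content[i] != ')') {
                int c = content[i++] & 0xff;
                if (c == '\\' && i < content.length) {
                    i = unescape(content, i, sb);
                } else {
                    sb.append((char) c); // byte value read as Latin-1
                }
            }
            i++; // skip the closing ')'
            strings.add(sb.toString());
        }
        return strings;
    }

    // Decode one escape starting right after the backslash;
    // returns the index of the next unread byte.
    private static int unescape(byte[] content, int i, StringBuilder sb) {
        int e = content[i++] & 0xff;
        switch (e) {
            case 'n': sb.append('\n'); break;
            case 'r': sb.append('\r'); break;
            case 't': sb.append('\t'); break;
            case 'b': sb.append('\b'); break;
            case 'f': sb.append('\f'); break;
            case '\r': // backslash + end of line: continuation, emit nothing
                if (i < content.length && content[i] == '\n') i++;
                break;
            case '\n':
                break;
            default:
                if (e >= '0' && e <= '7') { // octal code, one to three digits
                    int v = e - '0';
                    for (int k = 0; k < 2 && i < content.length
                            && content[i] >= '0' && content[i] <= '7'; k++) {
                        v = v * 8 + (content[i++] - '0');
                    }
                    sb.append((char) (v & 0xff));
                } else {
                    sb.append((char) e); // covers \( \) and \\
                }
        }
        return i;
    }
}
```

Calling extractStrings(inflate(rawStreamBytes)) gives the strings in the
order they appear in the stream; joining them and dropping the whitespace
would give the text to checksum. The scan is deliberately naive: it would
also pick up a '(' inside a hex string or inline image data, so it is only
reasonable for PDFs you generated yourself.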

Best Regards,
Paulo Soares

> -----Original Message-----
> From: Matt Benson [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, December 06, 2002 22:54
> To:   Paulo Soares; itext-questions
> Subject:      RE: [iText-questions] PDF metadata
> 
> Paulo, do you have any sample code for this?  I am
> using ps2ascii which comes with gs, but to get
> checksums of existing text I thought I might save time
> if I didn't have to shell out.  But I don't know
> exactly what you're talking about.  What
> classes/methods should I be looking at?
> 
> Thanks,
> Matt
> 
> --- Paulo Soares <[EMAIL PROTECTED]> wrote:
> > > I'm surprised that pstotext fails; iText certainly doesn't
> > > generate anything strange. If the documents were generated by
> > > you with iText, with WinAnsi font encoding, you can read the
> > > document with iText, open the stream and parse the text. To
> > > parse the text, find the first '(' and read until the next
> > > ')', handling the '\' escapes.
> > 
> > Best Regards,
> > Paulo Soares
> > 
> > > -----Original Message-----
> > > From:     Matt Benson [SMTP:[EMAIL PROTECTED]]
> > > Sent:     Tuesday, November 26, 2002 17:52
> > > To:       Paulo Soares; itext-questions
> > > Subject:  RE: [iText-questions] PDF metadata
> > > 
> > > > Thanks for the suggestion, Paulo.  Now, what would you
> > > > use to do the extraction?  I have been playing with
> > > > pstotext which uses gs behind the scenes, but gs is
> > > > choking on the iText-generated PDF.  I have looked at
> > > > jPedal, but not deeply as it seems to lack an
> > > > intuitive enough API that I could get started quickly
> > > > without reading the source code of the examples.
> > > 
> > > -Matt
> > > 
> > > --- Paulo Soares <[EMAIL PROTECTED]> wrote:
> > > > > Taking out the metadata won't help you, as there are no
> > > > > guarantees that the layout engine is the same from version
> > > > > to version: the text may look the same but the internal
> > > > > representation is different. The best way is to do a
> > > > > checksum of the text (words only, skipping the whitespace)
> > > > > and store that information in the PDF metadata as a new key.
> > > > > The already generated PDFs can have the text extracted, the
> > > > > checksum calculated and applied to the same PDF.
> > > > 
> > > > Best Regards,
> > > > Paulo Soares
> > > > 
> > > > > -----Original Message-----
> > > > > From: Matt Benson [SMTP:[EMAIL PROTECTED]]
> > > > > Sent: Tuesday, November 26, 2002 15:40
> > > > > To:   itext-questions
> > > > > Subject:      [iText-questions] PDF metadata
> > > > > 
> > > > > > We are using iText to convert text files to PDF as outlined
> > > > > > in the FAQ.  This works; however I want to take a checksum of
> > > > > > the PDF created and use it in conjunction with some other
> > > > > > information to verify we have not created this file before.
> > > > > > What I am finding, however, is that the metadata of the PDF
> > > > > > always differs between iText versions as well as creation
> > > > > > date/time, so I cannot create the exact same file twice and
> > > > > > thus cannot rely on a checksum.  I could use the checksum
> > > > > > from the input file, except that this is a modification to a
> > > > > > production application and we no longer have the input files
> > > > > > for the existing data.  So to do this I would have to extract
> > > > > > the text to get an approximation of the original file.  If I
> > > > > > did this, the checksum would represent slightly different
> > > > > > things from the old to the new data.  What I am wondering
> > > > > > about is whether these variable pieces of metadata are vital
> > > > > > to the PDF structure, and if not, what would it take to
> > > > > > remove them?  Alternatively, if anyone has a better idea then
> > > > > > those are welcome too.
> > > > > 
> > > > > Thanks,
> > > > > Matt
> > > > > 
> > > > >
> > __________________________________________________
> > > > > Do you Yahoo!?
> > > > > Yahoo! Mail Plus - Powerful. Affordable. Sign
> > up
> > > > now.
> > > > > http://mailplus.yahoo.com
> > > > > 
> > > > > 
> > > > >
> > > >
> > >
> >
> -------------------------------------------------------
> > > > > This SF.net email is sponsored by: Get the new
> > > > Palm Tungsten T 
> > > > > handheld. Power & Color in a compact size! 
> > > > >
> > > >
> > >
> >
> http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0002en
> > > > >
> > _______________________________________________
> > > > > iText-questions mailing list
> > > > > [EMAIL PROTECTED]
> > > > >
> > >
> >
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> > > 
> > > 
> > > __________________________________________________
> > > Do you Yahoo!?
> > > Yahoo! Mail Plus - Powerful. Affordable. Sign up
> > now.
> > > http://mailplus.yahoo.com
> > 
> > 
> >
> -------------------------------------------------------
> > This SF.net email is sponsored by: Get the new Palm
> > Tungsten T 
> > handheld. Power & Color in a compact size! 
> >
> http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0002en
> > _______________________________________________
> > iText-questions mailing list
> > [EMAIL PROTECTED]
> >
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> 
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com


_______________________________________________
iText-questions mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/itext-questions
