Thanks for the suggestion, Paulo.  Now, what would you
use to do the extraction?  I have been playing with
pstotext, which uses gs (Ghostscript) behind the scenes,
but gs is choking on the iText-generated PDF.  I have
looked at jPedal, but not deeply, as its API does not
seem intuitive enough for me to get started quickly
without reading the source code of the examples.
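
For the checksum step itself, here is a minimal sketch of
what Paulo describes (words only, whitespace skipped),
assuming the text has already been extracted into a String
by whatever tool ends up working.  The class and method
names are made up for illustration, SHA-256 is just my
choice of digest, and storing the result under a new
metadata key is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of iText or any extraction tool.
class TextChecksum {

    /** Returns the hex SHA-256 digest of the text with all whitespace removed. */
    public static String checksum(String extractedText) {
        // Remove every run of whitespace so that differences in line
        // breaks or spacing between layout engines do not affect the digest.
        String wordsOnly = extractedText.replaceAll("\\s+", "");
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(wordsOnly.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        String oldLayout = "Hello   world\nfoo";
        String newLayout = "Hello world foo";
        // Same words, different whitespace: the two digests should match.
        System.out.println(checksum(oldLayout).equals(checksum(newLayout)));
    }
}
```

One caveat with dropping whitespace entirely: "ab c" and
"a bc" give the same checksum.  That is probably acceptable
for duplicate detection, but collapsing each run to a single
space would avoid it if the extractor is consistent about
word boundaries.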

-Matt

--- Paulo Soares <[EMAIL PROTECTED]> wrote:
> Taking out the metadata won't help you, as there are no guarantees that
> the layout engine is the same from version to version; the text may look
> the same but the internal representation is different. The best way is
> to compute a checksum of the text (words only, skipping the whitespace)
> and store that information in the PDF metadata as a new key. The
> already generated PDFs can have the text extracted, the checksum
> calculated and applied to the same PDF.
> 
> Best Regards,
> Paulo Soares
> 
> > -----Original Message-----
> > From:       Matt Benson [SMTP:[EMAIL PROTECTED]]
> > Sent:       Tuesday, November 26, 2002 15:40
> > To: itext-questions
> > Subject:    [iText-questions] PDF metadata
> > 
> > We are using iText to convert text files to PDF as outlined in the
> > FAQ.  This works; however, I want to take a checksum of the PDF
> > created and use it in conjunction with some other information to
> > verify we have not created this file before.  What I am finding,
> > however, is that the metadata of the PDF always differs between iText
> > versions, as does the creation date/time, so I cannot create the exact
> > same file twice and thus cannot rely on a checksum.  I could use the
> > checksum from the input file, except that this is a modification to a
> > production application and we no longer have the input files for the
> > existing data.  So to do this I would have to extract the text to get
> > an approximation of the original file.  If I did this, the checksum
> > would represent slightly different things from the old to the new
> > data.  What I am wondering about is whether these variable pieces of
> > metadata are vital to the PDF structure, and if not, what would it
> > take to remove them?  Alternatively, if anyone has a better idea, that
> > is welcome too.
> > 
> > Thanks,
> > Matt


_______________________________________________
iText-questions mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/itext-questions
