Thanks for the suggestion, Paulo. Now, what would you use to do the extraction? I have been playing with pstotext, which uses gs behind the scenes, but gs is choking on the iText-generated PDF. I have looked at jPedal, but not deeply, as its API does not seem intuitive enough for me to get started quickly without reading the source code of the examples.
-Matt

--- Paulo Soares <[EMAIL PROTECTED]> wrote:
> Taking out the metadata won't help you, as there are
> no guarantees that the layout engine is the same from
> version to version; the text may look the same, but
> the internal representation is different. The best
> way is to compute a checksum of the text (words only,
> skipping the whitespace) and store that information in
> the PDF metadata as a new key. The already generated
> PDF can have the text extracted, the checksum
> calculated and applied to the same PDF.
>
> Best Regards,
> Paulo Soares
>
> > -----Original Message-----
> > From: Matt Benson [SMTP:[EMAIL PROTECTED]]
> > Sent: Tuesday, November 26, 2002 15:40
> > To: itext-questions
> > Subject: [iText-questions] PDF metadata
> >
> > We are using iText to convert text files to PDF as
> > outlined in the FAQ. This works; however, I want to
> > take a checksum of the PDF created and use it in
> > conjunction with some other information to verify we
> > have not created this file before. What I am finding,
> > however, is that the metadata of the PDF always
> > differs, both between iText versions and in the
> > creation date/time, so I cannot create the exact same
> > file twice and thus cannot rely on a checksum. I could
> > use the checksum from the input file, except that this
> > is a modification to a production application and we
> > no longer have the input files for the existing data.
> > So to do this I would have to extract the text to get
> > an approximation of the original file. If I did this,
> > the checksum would represent slightly different things
> > for the old and the new data. What I am wondering is
> > whether these variable pieces of metadata are vital to
> > the PDF structure and, if not, what it would take to
> > remove them. Alternatively, if anyone has a better
> > idea, that is welcome too.
> >
> > Thanks,
> > Matt
> >
> > _______________________________________________
> > iText-questions mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/itext-questions
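Paulo's suggestion of checksumming the words only, skipping the whitespace, can be sketched in Java with nothing but the standard library. This is a minimal sketch under a few assumptions: the text has already been extracted from the PDF by some other tool, SHA-256 stands in for whatever checksum you prefer, and the class name `TextChecksum` is hypothetical. Writing the value back into the PDF metadata under a new key is left out, since how that is done depends on the iText version in use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper: checksum of a document's text content that ignores
 * whitespace, so the value does not depend on how a given iText version laid
 * out the page. Text extraction is assumed to happen elsewhere.
 */
public class TextChecksum {

    /** Drops every whitespace character, keeping only the words' characters. */
    public static String normalize(String text) {
        // \s matches spaces, tabs, line breaks, etc.
        return text.replaceAll("\\s+", "");
    }

    /** Returns the hex-encoded SHA-256 digest of the normalized text. */
    public static String checksum(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                // %02x prints a (possibly negative) byte as two unsigned hex digits.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same words, different spacing and line breaks.
        String a = "We are using iText\nto convert text files.";
        String b = "We are   using iText to\tconvert text files.";
        System.out.println(checksum(a).equals(checksum(b))); // prints "true"
    }
}
```

Removing all whitespace, rather than collapsing it to single spaces, means "foo bar" and "foobar" hash the same. That trade-off is deliberate here, because text pulled back out of a PDF does not always reproduce the original spacing between words.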
