Leonard,

How much of the PDF content out there do you reckon is tagged? I haven't seen anything from the IRS come tagged. Does iText support tagging?
Also a snippet from PDFPlanet:

####################################
Adobe's Acrobat 6.0 will add tags to a PDF file, but human intelligence is still required to ensure the tagging process was performed correctly. There is little room for error in document tagging. Even seemingly small errors in document structure can easily render a file completely incomprehensible.
####################################

I don't see many PDF authors tagging their files.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leonard Rosenthol
Sent: Tuesday, February 21, 2006 11:32 AM
To: [EMAIL PROTECTED]; itext-questions@lists.sourceforge.net; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [PDFBox-user] Re: [iText-questions] Good reading/resarch on PDF text extraction

At 10:36 AM 2/21/2006, Richard Braman wrote:
>As more and more content gets "pushed" into PDF it looses its
>meaning to anyone else other than a human reader or a printer.

ONLY IF the document content is untagged. Tagged PDF (part of the PDF spec since 1.5) provides for the inclusion of semantic information about the content IN ADDITION to its visible attributes. Then, extraction of that content with all the necessary usability becomes trivial.

Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:[EMAIL PROTECTED]>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc.
215-938-7080 (voice) 215-938-0880 (fax)

-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files for problems? Stop! Download the new AJAX search engine that makes searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
PDFBox-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/pdfbox-user
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions