Hi, it turns out that iText is very easy to work with, and I was able to modify the examples to dump text labelled with X and Y positions. I can then sort this in bash and arrange it as needed for many document layouts.
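A minimal sketch of that sorting step, assuming each dumped line has the form "page x y text". The function name sort_by_position and the sample lines are made up for illustration; they are not part of iText or of the actual dump.

```bash
#!/usr/bin/env bash
# Sort "page x y text" lines numerically by page, then x, then y.
# Each key is bounded (e.g. -k1,1) so it covers exactly one field
# rather than running on to the end of the line.
sort_by_position() {
    sort -k1,1n -k2,2n -k3,3n
}

# Hypothetical sample input in the assumed "page x y text" form.
printf '%s\n' \
    '4 4200 16400 second line' \
    '3 52762 62156 under' \
    '4 4200 13500 first line' \
    '3 48169 45058 The conversion' |
    sort_by_position
```

Bounding each key is a design choice: an unbounded "-k 1" key extends to the end of the line, which happens to work with numeric comparison but makes the intent less obvious.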
For example, with a simple script I could get output like this, giving text ordered by page, x, and y:

    jmake -libs ../../lib -libs ../../blib -cp classes -cp ../../src/core -run mike 541.pdf | sort -g -k 1 -k 2 -k 3 | more

    3 48169 45058 The conversion of a part-
    3 48539 34859 If a partnership is termi-
    3 52762 62156 under
    3 54464 97792 Page 3
    3 55043 62156 Basis
    4 4200 13500 Page 4 of 15 of Publication 541
    4 4200 16400 The type and rule above prints on all proofs including departmenta l reproduction proofs. MUST be removed before printing.
    4 4200 25550 business purpose for adopting a tax year for the
    4 4200 26500 partnership that differs from its required tax

But often documents do not have a pure multi-column layout; instead they have notes and tables interspersed, marginal notes, or multi-column lists. Is there any information in the PDF that tells me how this stuff is supposed to be organized, so that I can extract the INFORMATION, or is this just a bunch of hopelessly jumbled text that can only be read by a human, not a computer?

Thanks.

Mike Marchywka

_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
