Hi,
It turns out that iText is very easy to work with, and I was
able to modify the examples to dump text labelled with X and Y
positions. I can then sort this in bash and arrange it as needed
for many document layouts.

For example, with a simple script I could get output like this,
giving text ordered by page, x, and y:

jmake -libs ../../lib -libs ../../blib -cp classes -cp ../../src/core -run mike \
541.pdf | sort -g  -k 1 -k 2 -k 3  | more
 
3 48169 45058  The conversion of a part-
3 48539 34859  If a partnership is termi-
3 52762 62156  under
3 54464 97792  Page 3
3 55043 62156  Basis
4 4200 13500  Page 4 of 15 of Publication 541
4 4200 16400  The type and rule above prints on all proofs including departmental reproduction proofs. MUST be removed before printing.
4 4200 25550  business purpose for adopting a tax year for the
4 4200 26500  partnership that differs from its required tax
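
The same post-processing can be sketched outside of bash. This is only a
minimal illustration, not part of the iText examples: it assumes each dump
line has the form "page x y text" as above, that y increases down the page
(as the page 4 lines suggest), and that fragments whose x lies within a
made-up tolerance of a column's leftmost x belong to that column. The names
parse_line, group_into_columns and x_tolerance are hypothetical.

```python
from collections import defaultdict


def parse_line(line):
    """Split 'page x y text' into (page, x, y, text); return None if malformed."""
    parts = line.split(None, 3)
    if len(parts) < 4:
        return None
    try:
        return int(parts[0]), float(parts[1]), float(parts[2]), parts[3].rstrip()
    except ValueError:
        # e.g. the tail of a wrapped line that does not start with numbers
        return None


def group_into_columns(lines, x_tolerance=500.0):
    """Return {page: [column, ...]}, each column a list of text fragments.

    Columns are ordered left to right; fragments inside a column are
    ordered by y. x_tolerance is an assumed value in the dump's own units.
    """
    by_page = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            page, x, y, text = parsed
            by_page[page].append((x, y, text))

    result = {}
    for page in sorted(by_page):
        columns = []  # each entry: [leftmost x of the column, [(y, text), ...]]
        for x, y, text in sorted(by_page[page]):
            if columns and x - columns[-1][0] <= x_tolerance:
                columns[-1][1].append((y, text))
            else:
                columns.append([x, [(y, text)]])
        result[page] = [
            [text for _, text in sorted(frags, key=lambda f: f[0])]
            for _, frags in columns
        ]
    return result
```

Feeding it the lines of the dump (for example from sys.stdin) gives one list
of columns per page. Marginal notes and tables would still need rules of
their own, since position alone does not say what role a fragment plays.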


But documents often do not have a pure multi-column layout;
instead they have notes and tables interspersed, marginal notes,
or multi-column lists. Is there any information in the
PDF that tells me how this stuff is supposed to be organized
to extract the INFORMATION, or is this just a bunch of hopelessly jumbled
text that can only be read by a human, not a computer?

Thanks.




Mike Marchywka
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
