> The problem I am having is that some of them have multiple columns and
> multiple word boxes. Does the xpdf patch extract different columns and
> wordboxes?

It tells you where each word is. Columns you have to do for yourself.

Bill

> > In UpLib, I use xpdf-3.02pl2 with a patch which gives me position and
> > font information for each word. You can get the xpdf sources from
> > http://www.foolabs.com/xpdf/, and the patch file is at
> > http://uplib.parc.com/misc/xpdf-3.02-PATCH. To extract the byte
> > positions, use pdftotext with the "-wordboxes" switch, and see the
> > pdftotext man page for more info. This is run automatically in UpLib
> > before the indexing is done.
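
For what it's worth, here is a rough sketch of one way to do the columns yourself once you have a box for each word. It does not parse the "-wordboxes" output; it assumes you have already turned that into (text, x0, y0, x1, y1) tuples with y growing down the page. The tuple layout, the function names split_columns and column_text, and the thresholds are all made up for illustration and are not part of xpdf or UpLib.

```python
from typing import List, Tuple

# Hypothetical word record: (text, x0, y0, x1, y1), y increasing downward.
Word = Tuple[str, float, float, float, float]


def split_columns(words: List[Word], min_gap: float = 20.0) -> List[List[Word]]:
    """Group words into columns by looking for wide horizontal gaps (gutters)."""
    columns: List[List[Word]] = []
    right_edge = None  # rightmost x1 seen so far in the current column
    for w in sorted(words, key=lambda w: w[1]):  # sweep left to right by x0
        if right_edge is None or w[1] - right_edge > min_gap:
            columns.append([])  # gap is wide enough: start a new column
            right_edge = w[3]
        else:
            right_edge = max(right_edge, w[3])
        columns[-1].append(w)
    return columns


def column_text(column: List[Word], line_tol: float = 3.0) -> str:
    """Rebuild reading order in one column: top to bottom, then left to right."""
    lines: List[List[Word]] = []
    for w in sorted(column, key=lambda w: (w[2], w[1])):
        # Same line if the top edge is close to the top edge of the line's first word.
        if lines and abs(w[2] - lines[-1][0][2]) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return "\n".join(
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1])) for line in lines
    )


if __name__ == "__main__":
    page = [
        ("Hello", 10, 10, 40, 20), ("world", 45, 10, 80, 20),
        ("Left", 10, 30, 35, 40),
        ("Right", 200, 10, 240, 20), ("side", 245, 10, 270, 20),
        ("more", 200, 30, 230, 40),
    ]
    for col in split_columns(page):
        print(column_text(col))
        print("--")
```

A heading or table that spans the full page width will bridge the gutter and merge the columns, so in practice you would probably want to run this per horizontal band of the page and tune min_gap to the documents you have.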