We are using pdftotext to strip text out of PDFs in preparation for search indexing and more. This works well except with our own PDFs (produced in Scribus), which are getting badly broken up; we suspect kerning is the cause. The generated text is fragmented into meaningless chunks. It remains in sequential order and some words come through intact, but in general the output is not usable.
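To help narrow down whether the fragmentation depends on how pdftotext reassembles the page, one option is to run the same file through its default, `-layout` and `-raw` modes and compare the results. The following is only a rough sketch: it assumes a pdftotext build that supports those two options and that accepts `-` as the output file to write to stdout, and the helper names `pdftotext_command` and `extract_text` are made up for illustration.

```python
import subprocess
import sys


def pdftotext_command(pdf_path, mode=None):
    """Build a pdftotext command line that writes the text to stdout.

    mode is None (default reading order), "layout" or "raw".
    Assumption: the installed pdftotext supports -layout and -raw.
    """
    cmd = ["pdftotext"]
    if mode in ("layout", "raw"):
        cmd.append("-" + mode)
    elif mode is not None:
        raise ValueError("unknown mode: %r" % (mode,))
    # "-" as the output file asks pdftotext to print to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, mode=None):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, mode),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print a whitespace-separated token count per mode; a much higher
    # count in one mode suggests words are being split into fragments.
    for mode in (None, "layout", "raw"):
        text = extract_text(sys.argv[1], mode)
        print(mode or "default", len(text.split()), "tokens")
```

If the token counts differ sharply between modes, that would point at the text reassembly step rather than at the fonts themselves.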
We are using (the great) Bitstream Vera, which looks so good both on screen and in print; however, we get the same effect when we convert our text to Arial.

1. Has anybody experienced this? Is this a pdftotext thing?
2. Are there alternative PDF-to-text parsers that anyone would recommend?

Lucien
Oxford Information Labs
