PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
> Sorry about the lack of understanding, but I expected something
> more like: [(some text)ddd]. Have I got my assumptions totally wrong?

You are assuming way too much. Sometimes, often in fact, you will see that, but that is just good luck. You need to take each byte value, look up the current font, and map the byte through that font's encoding. You also have to deal with strings that are split into pieces and with text that is drawn out of reading order.

You will need to be very familiar with the chapter on text and fonts in the PDF Reference in particular, but you will also need to have read everything before it in detail. Believe me when I say that extracting text from a PDF is one of the more difficult problems, and it may represent many months of work even once you have a full grasp of the problem. Bear in mind that you cannot build a perfect solution, partly because the encodings are not always present, and partly because deducing reading order is guesswork. Hence, unless you really have to do it yourself, or really want to solve the issues (it's an interesting challenge), I would recommend looking into ...

> I mean I do not know what all this text line should look like.
> I believe this should read "It's the 21st century", although
> again, I am not 100% sure.

> May I add it is my first attempt to decode PDF stuff,
> and I do feel like I am stepping into something new, to put it mildly...

Not a good choice for a first attempt, I suspect...

Aandi
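To make the "take each byte, look up the font, apply the encoding" step a little more concrete, here is a minimal Python sketch. It assumes a simple (one byte per glyph) font whose encoding is given as a Differences array, and it only handles the operand of a single TJ operator. The glyph-name table is a tiny hypothetical stand-in for the full Adobe Glyph List, the 200-unit word-gap threshold is a guess, and all function names are made up for illustration. The sample bytes were constructed by hand to show why (some text) is often not readable on its own; they are not taken from the original poster's file. ToUnicode CMaps, multi-byte CID fonts, other text operators and reading order are all ignored.

```python
import re

# Numbers inside a TJ array, e.g. 15, -300, .5 (assumed simple syntax)
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")
ESCAPES = {"n": b"\n", "r": b"\r", "t": b"\t", "b": b"\b", "f": b"\f",
           "(": b"(", ")": b")", "\\": b"\\"}

# Hypothetical, heavily truncated glyph-name -> Unicode table
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
GLYPH_TO_UNICODE = {"space": " ", "quoteright": "\u2019",
                    "quotesingle": "'", "period": ".", "comma": ","}
GLYPH_TO_UNICODE.update({n: str(d) for d, n in enumerate(DIGIT_NAMES)})


def _read_literal(src, i):
    """Read a (...) string starting just after '('; return (bytes, next index)."""
    out, depth = bytearray(), 1
    while True:
        ch = src[i]
        if ch == "\\":
            nxt = src[i + 1]
            if nxt in "01234567":  # octal escape: up to three digits
                j = i + 1
                while j < i + 4 and src[j] in "01234567":
                    j += 1
                out.append(int(src[i + 1:j], 8) & 0xFF)
                i = j
                continue
            # Simplified: unknown escapes keep the character itself
            out += ESCAPES.get(nxt, nxt.encode("latin-1"))
            i += 2
            continue
        if ch == "(":
            depth += 1  # balanced nested parentheses are allowed
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out += ch.encode("latin-1")
        i += 1


def parse_tj_array(src):
    """Split a TJ operand into byte strings and numeric adjustments."""
    items, i, end = [], src.index("[") + 1, src.rindex("]")
    while i < end:
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            data, i = _read_literal(src, i + 1)
            items.append(data)
        elif ch == "<":  # hex string; an odd final digit is padded with 0
            close = src.index(">", i)
            digits = "".join(src[i + 1:close].split())
            items.append(bytes.fromhex(digits + "0" * (len(digits) % 2)))
            i = close + 1
        else:
            m = NUMBER.match(src, i)
            if m is None:
                raise ValueError("unexpected character %r at %d" % (ch, i))
            items.append(float(m.group()))
            i = m.end()
    return items


def build_encoding(base, differences):
    """Apply a Differences array ([code name name ... code name ...]) to a base map."""
    enc, code = dict(base), 0
    for entry in differences:
        if isinstance(entry, int):
            code = entry
        else:
            enc[code] = entry
            code += 1
    return enc


def glyph_to_unicode(name):
    if len(name) == 1 and name.isascii() and name.isalpha():
        return name
    return GLYPH_TO_UNICODE.get(name, "\ufffd")


def tj_to_text(items, encoding, space_threshold=200.0):
    """Map bytes to glyph names to Unicode; guess word gaps from adjustments."""
    parts = []
    for item in items:
        if isinstance(item, bytes):
            for code in item:  # simple font: one byte is one glyph
                name = encoding.get(code)
                parts.append(glyph_to_unicode(name) if name else "\ufffd")
        elif -item >= space_threshold and parts and not parts[-1].isspace():
            # Negative adjustments move the next glyph right; a large one
            # probably stands for a word space (heuristic only).
            parts.append(" ")
    return "".join(parts)


# Hand-made sample: the raw bytes mean nothing without the font's encoding
TJ = r"[(\001\002\003\004\005\002\006\007\005) 15 (\010\011\004\002) -300 (\012\007\013\002\014\015\016)]"
DIFFS = [1, "I", "t", "quoteright", "s", "space", "h", "e", "two", "one",
         "c", "n", "u", "r", "y"]

if __name__ == "__main__":
    print(tj_to_text(parse_tj_array(TJ), build_encoding({}, DIFFS)))
```

With this particular hand-made encoding the sample decodes to "It’s the 21st century". Note that the small positive adjustment (15) is treated as kerning and the large negative one (-300) as a word gap; that split is exactly the kind of guesswork described above, and a different font size or layout could defeat it.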
