The text extracted with
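PDDocument doc = PDDocument.load(new URL(
    "http://people.ischool.berkeley.edu/~hearst/irbook/print/chap10.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
Writer out = new OutputStreamWriter(System.out);
stripper.writeText(doc, out);
out.flush(); // OutputStreamWriter buffers its output, so flush before inspecting the console
doc.close();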
looks like this:
¡ ¢¤£¦¥¨§ª© ®©°¯±¢²§ª³ ´¶µ¸·¹¢º© » ¥¼µ½§ff·fi¥ffifl¼´²Â
"!$#&%ª')(+* ,-%ª.ff/0%ff132"%ff45.ff6
,-.7'84:97!;.7'< "!>=?.ª!>'fi*�[email protected]®*
ACM Press
New York
Addison-Wesley
D)EGFIH J>KMLON8P$QRH ESPUTffVffWYXZE>TR[\PUQ]L_^`E>ababE>cedgfUahX;ijija
...
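When checking many PDFs it can help to flag output like the sample above automatically. What follows is only a rough heuristic sketch, not part of PDFBox. The class name `GarbleCheck`, the method `wordRatio` and the "word-like token" rule are all hypothetical choices. The method measures what fraction of whitespace-separated tokens look like ordinary words, so readable lines such as "ACM Press" score high and lines made mostly of symbol soup score near zero.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of PDFBox: rough heuristic for spotting garbled extraction output. */
public class GarbleCheck {
    // Assumed rule for a "word-like" token: two or more letters (any script), optional
    // hyphenated parts (e.g. "Addison-Wesley"), and at most one trailing punctuation mark.
    private static final Pattern WORD = Pattern.compile("\\p{L}{2,}(?:-\\p{L}+)*[.,;:!?]?");

    /** Fraction of whitespace-separated tokens that look like ordinary words (0.0 for blank input). */
    public static double wordRatio(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] tokens = trimmed.split("\\s+");
        int words = 0;
        for (String token : tokens) {
            if (WORD.matcher(token).matches()) {
                words++;
            }
        }
        return (double) words / tokens.length;
    }

    public static void main(String[] args) {
        // A readable line and a symbol-soup line in the style of the sample output
        System.out.println(wordRatio("ACM Press New York Addison-Wesley"));
        System.out.println(wordRatio("\"!$#&%')(+* ,-%.ff/0%ff132\"%ff45.ff6"));
    }
}
```

Running `main` prints `1.0` for the readable line and `0.0` for the garbled one. Where to put a threshold between the two is a judgement call that would need tuning against real documents.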
best regards
reinhard