Re: search through postscript documents?
[EMAIL PROTECTED] wrote:
> I have tried versions of Ghostscript on Slackware and on Knoppix, a Debian derivative. I have downloaded and installed Ghostscript 8.50. I have installed the latest pstotext. Nothing works.

Try pdftotext from the xpdf package.

-- 
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488

His mother should have thrown him away and kept the stork.
  -- Mae West

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
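A minimal sketch of Matej's suggestion, which skips the pdf2ps step and reads the PDF directly. It assumes xpdf's pdftotext is on the PATH; passing "-" as the output file sends the text to standard output, and `-enc UTF-8` selects the output encoding. The `search_text` helper is a hypothetical grep-like search added for illustration, not part of xpdf.

```python
import re
import subprocess
from typing import List, Tuple


def pdf_to_text(path: str) -> str:
    """Run pdftotext on a PDF; the output file "-" writes the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        check=True,              # raise CalledProcessError if pdftotext fails
        stdout=subprocess.PIPE,
    )
    # Replace any bytes that are not valid UTF-8 instead of failing.
    return result.stdout.decode("utf-8", errors="replace")


def search_text(text: str, pattern: str, ignore_case: bool = True) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (line number, line) pairs matching the regex."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]
```

Used as `search_text(pdf_to_text("customer.pdf"), "some expression")`, this behaves roughly like `pdftotext customer.pdf - | grep -n -i 'some expression'`. Whether it copes with a particular InDesign file depends entirely on what pdftotext can extract from it.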
Re: search through postscript documents?
Antonio Rodriguez wrote:
> On Thu, Mar 03, 2005 at 12:15:40PM +0100, Joerg Reckers wrote:
>> Is there a way (a program) to search for expressions in a PostScript document? And to copy and paste words out of a ghostview program as text? I am using KGhostView now and I miss these features, so I will ask on this list. :-)
>> thanks, joerg
>
> Package: pstotext
> Priority: optional
> Section: text
> Installed-Size: 110
> Maintainer: J.H.M. Dassen (Ray) [EMAIL PROTECTED]
> Architecture: i386
> Version: 1.9-1
> Depends: gs | gs-aladdin (>= 3.51), libc6 (>= 2.3.2.ds1-4)
> Filename: pool/main/p/pstotext/pstotext_1.9-1_i386.deb
> Size: 32294
> MD5sum: a159e4b756759beeae003700d31487d1
> Description: Extract text from PostScript and PDF files
>  pstotext extracts text (in the ISO 8859-1 character set) from a
>  PostScript or PDF (Portable Document Format) file. Thus, pstotext is
>  similar to the ps2ascii program that comes with ghostscript. The
>  output of pstotext is however better than that of ps2ascii, because
>  pstotext deals better with punctuation and ligatures.

I have a PDF file produced by a recent version of InDesign CS. The utility pdf2ps will produce a PostScript file that is readable using gv. However, from that point on everything fails. There seems to be no way to convert the file to ASCII except by cutting pages and pasting them into, e.g., gvim. I have tried creating a subset of the pages and then converting that. What I get is just the EOP characters.

This is the second such file I have had trouble with. It may have something to do with PDF 1.5. In any case, this is a customer's file and I can't very well ask him to resave his PDF to an earlier version.

I have tried versions of Ghostscript on Slackware and on Knoppix, a Debian derivative. I have downloaded and installed Ghostscript 8.50. I have installed the latest pstotext. Nothing works.

John Culleton
search through postscript documents?
Is there a way (a program) to search for expressions in a PostScript document? And to copy and paste words out of a ghostview program as text? I am using KGhostView now and I miss these features, so I will ask on this list. :-)

thanks,
joerg
Re: search through postscript documents?
On Thu, 3 Mar 2005, Joerg Reckers wrote:
> Is there a way (a program) to search for expressions in a PostScript document? And to copy and paste words out of a ghostview program as text?

AFAIK, PostScript doesn't preserve word, sentence, etc. boundaries, so there's no way to reliably know what a word inside a PS file is. One of the goals of PDF is to remedy exactly this problem.

*t

-- 
---
Tomas Pospisek
http://sourcepole.com - Linux Open Source Solutions
---
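To illustrate Tomas's point: a PostScript program paints text with operators such as `show`, each drawing a string at the current point, so one word may arrive in several pieces and the space between two words may be nothing more than a jump in the x coordinate. An extractor therefore has to guess where words begin and end. The sketch below is a hypothetical gap-based heuristic, not the algorithm pstotext or ps2ascii actually uses; the `Fragment` record and the 1.5-point default threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical model of one string painted by a PostScript show operation."""
    x: float      # start of the string on the baseline, in points
    y: float      # baseline position, in points (larger y is higher on the page)
    text: str
    width: float  # advance width of the painted string, in points


def join_fragments(fragments: List[Fragment], gap_threshold: float = 1.5) -> List[str]:
    """Guess one text line per baseline; a gap wider than gap_threshold starts a new word."""
    lines: List[str] = []
    # Top of the page first, then left to right along each baseline.
    # Baselines are compared exactly here; a real tool would allow some tolerance.
    ordered = sorted(fragments, key=lambda f: (-f.y, f.x))
    current_y: Optional[float] = None
    current = ""
    previous_end = 0.0
    for frag in ordered:
        if frag.y != current_y:
            # New baseline: finish the previous line, if any.
            if current_y is not None:
                lines.append(current)
            current_y, current = frag.y, frag.text
        else:
            gap = frag.x - previous_end
            # Insert a space only when the horizontal gap looks like a word break.
            current += (" " if gap > gap_threshold else "") + frag.text
        previous_end = frag.x + frag.width
    if current_y is not None:
        lines.append(current)
    return lines
```

With fragments "Hel", "lo" (touching) and "world" (4 points further right) on one baseline, this yields "Hello world"; raise the threshold to 5 points and the same input becomes "Helloworld". That sensitivity is exactly why word boundaries recovered from PostScript are only ever a guess.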
Re: search through postscript documents?
On Thu, Mar 03, 2005 at 12:15:40PM +0100, Joerg Reckers wrote:
> Is there a way (a program) to search for expressions in a PostScript document? And to copy and paste words out of a ghostview program as text? I am using KGhostView now and I miss these features, so I will ask on this list. :-)
> thanks, joerg

Package: pstotext
Priority: optional
Section: text
Installed-Size: 110
Maintainer: J.H.M. Dassen (Ray) [EMAIL PROTECTED]
Architecture: i386
Version: 1.9-1
Depends: gs | gs-aladdin (>= 3.51), libc6 (>= 2.3.2.ds1-4)
Filename: pool/main/p/pstotext/pstotext_1.9-1_i386.deb
Size: 32294
MD5sum: a159e4b756759beeae003700d31487d1
Description: Extract text from PostScript and PDF files
 pstotext extracts text (in the ISO 8859-1 character set) from a
 PostScript or PDF (Portable Document Format) file. Thus, pstotext is
 similar to the ps2ascii program that comes with ghostscript. The
 output of pstotext is however better than that of ps2ascii, because
 pstotext deals better with punctuation and ligatures.

So you can pipe the output into a shell script that uses sed or gawk to process the text.
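Antonio's pipeline idea, sketched in Python instead of sed/gawk. `ps_to_text` assumes pstotext writes the extracted text to standard output when given just a file name, and decodes it as ISO 8859-1, the character set named in the package description. `normalize` and `contains_phrase` are hypothetical helpers showing the kind of clean-up a sed or gawk script might do, so that a phrase broken across lines can still be found.

```python
import re
import subprocess


def ps_to_text(path: str) -> str:
    """Run pstotext on a PostScript (or PDF) file and capture its stdout."""
    result = subprocess.run(["pstotext", path], check=True, stdout=subprocess.PIPE)
    # pstotext emits ISO 8859-1 text according to its package description.
    return result.stdout.decode("latin-1")


def normalize(text: str) -> str:
    """Hypothetical clean-up: rejoin words hyphenated at line ends, collapse whitespace."""
    # "jum-\nps" becomes "jumps" (a sed-style substitution across the newline).
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Any run of spaces, tabs or newlines becomes a single space.
    return re.sub(r"\s+", " ", text).strip()


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive phrase search that ignores line breaks inside the phrase."""
    return normalize(phrase).lower() in normalize(text).lower()
```

For example, `contains_phrase(ps_to_text("paper.ps"), "jumps over the")` would find the phrase even if the extracted text split it as "jum-" / "ps over" / "the". Rejoining at every line-end hyphen also merges genuine compounds such as a "well-known" that happens to break at a line end, so a real script would need a more careful rule.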