The benefit of using pdftohtml and reading the HTML document with lynx is that all links (internal and external) are usable, including links to the document pages (outline). One problem is that pdftohtml does not recognise the reading-order tags in PDF (e.g. text in two or more columns).
Klaus

On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
Klaus-Peter Wegge wrote in <[email protected]\
c.uni-paderborn.de>:
|On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
|> [email protected] wrote in <[email protected]\
|> t>:
|>| I use:
|>|
|>| pdftotext -layout %s - | utf8trans UTFtoASCII
|>
|> Yes, Mr. Bell, luckily it has that -layout argument.
|> Whereas mupdf (now) comes with mutool, which can "convert", that
|> pdftotext from poppler with its -layout is the only PDF (and thus
|> PS) converter i know that does an acceptable job.
...
|Hi,
|I'm reading most document formats with the help of lynx and various
|format2html tools like pdftohtml

I possibly should have pointed out that i had direct conversation
with Mr. Bell on another ML in the past, and to me he is known as
someone who hits the mark. (He reported bugs of the software
i maintain. Thanks again, Mr. Bell.)

|Here is a short version of my viewer script:
|
|---
|#!/bin/sh
|
|dir=/tmp/$USER/viewer.$$
|mkdir -p $dir
|doc=$1
|
|case $doc in
| *.pdf) file=`basename "$doc" .pdf`;
|   echo Portable Document Format: $doc;
|   html="$file"_ind.html;
|   pdftohtml -nodrm -hidden -enc Latin1 "$doc" "$dir"\/"$file";
|   show="$dir/$html";;
| *.vcl) file=`basename "$doc" .vcl`;
|   echo Calendar: $doc;
|   html="$file.txt";
|   vcal "$doc" >"$dir/$html";
|   show="$dir/$html";;
| *) echo error;
|   exit 0;;
|esac
|
|lynx -nolist file://localhost"$show"
|rm -rf "$dir"
|
|---
|I have removed the cases for other formats.
|Remark: the pdftohtml options seem to be different on various Linuxes.
|
|Klaus

lynx is a "swiss army knife" program, just like the shell.
I personally prefer plain text, if i can. Luckily i have all my
senses in acceptable shape, that is to say, and do not need
a Braille reader, nor do i know how bad that ends up when
converting PS or PDF to plain text or HTML. All i can imagine is
that converting via ghostscript's text output cannot be it.

--steffen
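
For readers who want to adapt the viewer script quoted above, the following is a minimal sketch of the same extension-based dispatch, not the author's actual script. The helper names html_name_for and view_doc are hypothetical, the output naming (`<name>_ind.html` for PDF, `<name>.txt` for calendars) is copied from the quoted script, and the use of `mktemp -d` for the scratch directory is an assumption (it is widely available but not required by POSIX). As the original remark says, the pdftohtml options may differ between systems.

```shell
#!/bin/sh
# Sketch only: html_name_for and view_doc are hypothetical helper names.

# Print the name of the file the viewer should open for a document,
# following the naming used in the quoted script. Fails for other formats.
html_name_for() {
    case "$1" in
        *.pdf) printf '%s_ind.html\n' "$(basename "$1" .pdf)" ;;
        *.vcl) printf '%s.txt\n' "$(basename "$1" .vcl)" ;;
        *)     return 1 ;;
    esac
}

# Convert the document into a scratch directory and open it in lynx.
view_doc() {
    doc=$1
    html=$(html_name_for "$doc") || {
        echo "unsupported format: $doc" >&2
        return 1
    }
    # Assumption: mktemp -d exists on this system.
    dir=$(mktemp -d) || return 1
    case "$doc" in
        *.pdf) pdftohtml -nodrm -hidden -enc Latin1 "$doc" \
                   "$dir/$(basename "$doc" .pdf)" ;;
        *.vcl) vcal "$doc" >"$dir/$html" ;;
    esac
    lynx -nolist "file://localhost$dir/$html"
    rm -rf "$dir"
}

# Run only when a document is given, so the helpers can also be sourced.
if [ "$#" -gt 0 ]; then
    view_doc "$1"
fi
```

Saved under a name such as viewer.sh (hypothetical), it would be called as `sh viewer.sh manual.pdf`. Keeping the name mapping in its own function makes it possible to check the dispatch without having pdftohtml, vcal or lynx installed.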
_______________________________________________
Lynx-dev mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lynx-dev
