Re: [Lynx-dev] Displaying a pdf live on the Fly?
2019/06/03 22:58 ... Tim Chase:
> The quality of the output depends largely on how the PDF was created,
> so I have some mostly-pure-text PDFs where it works great; and I have
> some PDFs that are full of graphics and poorly laid-out that are next
> to useless when piped through pdftotext. YMMV.

And I have encountered PDFs that were really only collections of photographed pages.

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev
Re: [Lynx-dev] Displaying a pdf live on the Fly?
The benefit of using pdftohtml and reading the HTML document with lynx is that all links (internal and external) are usable, including the links to the document pages (the outline).
One problem is that pdftohtml does not recognise the reading-order tags in a PDF (e.g. for text in two or more columns).

Klaus

On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
> lynx is a "swiss army knife" program, just like the shell.
> I personally prefer plain text, if i can.
> Luckily i have all my senses in acceptable shape, that is to say, and
> do not need a Braille reader, nor do i know how bad that ends up when
> converting PS or PDF to plain text or HTML. All i can imagine is that
> converting via ghostscript's text output cannot be it.
>
> --steffen
Re: [Lynx-dev] Displaying a pdf live on the Fly?
Klaus-Peter Wegge wrote:
|On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
|> russellb...@gmail.com wrote in <201906040449.x544niog005...@randytool.net>:
|>| I use:
|>|
|>|     pdftotext -layout %s - | utf8trans UTFtoASCII
|>
|> Yes, Mr. Bell, luckily it has that -layout argument.
|> Whereas mupdf (now) comes with mutool, which can "convert", that
|> pdftotext from poppler with its -layout is the only PDF (and thus
|> PS) converter i know who does an acceptable job.
 ...
|Hi,
|I'm reading most document formats with the help of lynx and various
|format2html tools like pdftohtml

I possibly should have pointed out that i had direct conversation
with Mr. Bell on another ML in the past, and to me he is known as
someone who hits the mark. (He reported bugs of the software i
maintain. Thanks again, Mr. Bell.)

|Here is a short version of my viewer script:
|
|---
|#!/bin/sh
|# Convert a document to something lynx can render, then view it.
|
|dir=/tmp/$USER/viewer.$$
|mkdir -p "$dir"
|doc=$1
|
|case $doc in
|*.pdf) file=`basename "$doc" .pdf`
|       echo "Portable Document Format: $doc"
|       html="$file"_ind.html    # index page written by pdftohtml
|       pdftohtml -nodrm -hidden -enc Latin1 "$doc" "$dir/$file"
|       show="$dir/$html";;      # page to open in lynx
|*.vcl) file=`basename "$doc" .vcl`
|       echo "Calendar: $doc"
|       html="$file.txt"
|       vcal "$doc" >"$dir/$html"
|       show="$dir/$html";;
|*)     echo error >&2
|       rm -rf "$dir"
|       exit 1;;
|esac
|
|lynx -nolist "file://localhost$show"
|rm -rf "$dir"
|---
|I have removed the cases for other formats.
|Remark: the pdftohtml options seem to differ between Linux distributions.
|
|Klaus

lynx is a "swiss army knife" program, just like the shell.
I personally prefer plain text, if i can.
Luckily i have all my senses in acceptable shape, that is to say, and
do not need a Braille reader, nor do i know how bad that ends up when
converting PS or PDF to plain text or HTML. All i can imagine is that
converting via ghostscript's text output cannot be it.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: [Lynx-dev] Displaying a pdf live on the Fly?
Well, Steffen and all, thanks much for a spirited discussion. I will try mupdf, but a majority of the time with pdftotext I am not needing the -layout option as much as I did years ago. And yes, as an example, my HOA sends an invoice as a pdf, but what I wanted to do more is read an article on a news-related web site, where the only option may be a pdf. It seems like a waste of time to download or save a pdf and then hope to convert it. Sometimes all you see is a letter l.
Thanks again
Chime
Re: [Lynx-dev] Displaying a pdf live on the Fly?
Tim Chase dixit:

>If you have the "pdftotext" utility (part of my "poppler-utils"

pdftotext isn’t bad, but I almost always have X11+uxterm around
lynx anyway (for proper Unicode support), so I get along well with:

$ fgrep mupdf /etc/lynx.cfg
DOWNLOADER:View in mupdf:mupdf '%s':FALSE:XWINDOWS

bye,
//mirabilos
-- 
15:39⎜«mika:#grml» mira|AO: "that would not have happened with XFree86®" - muhaha
15:48⎜ so why do the xorg guys actually break everything? :)
15:49⎜ thkoehler: because as kids they were never allowed to knock over the tower they had built themselves?
        -- ~/.Xmodmap wonders…
Re: [Lynx-dev] Displaying a pdf live on the Fly?
Hi,
I'm reading most document formats with the help of lynx and various
format2html tools like pdftohtml.
Here is a short version of my viewer script:

---
#!/bin/sh
# Convert a document to something lynx can render, then view it.

dir=/tmp/$USER/viewer.$$
mkdir -p "$dir"
doc=$1

case $doc in
*.pdf) file=`basename "$doc" .pdf`
       echo "Portable Document Format: $doc"
       html="$file"_ind.html    # index page written by pdftohtml
       pdftohtml -nodrm -hidden -enc Latin1 "$doc" "$dir/$file"
       show="$dir/$html";;      # page to open in lynx
*.vcl) file=`basename "$doc" .vcl`
       echo "Calendar: $doc"
       html="$file.txt"
       vcal "$doc" >"$dir/$html"
       show="$dir/$html";;
*)     echo error >&2
       rm -rf "$dir"
       exit 1;;
esac

lynx -nolist "file://localhost$show"
rm -rf "$dir"
---
I have removed the cases for other formats.
Remark: the pdftohtml options seem to differ between Linux distributions.

Klaus

On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
> russellb...@gmail.com wrote in <201906040449.x544niog005...@randytool.net>:
> | I use:
> |
> |     pdftotext -layout %s - | utf8trans UTFtoASCII
>
> Yes, Mr. Bell, luckily it has that -layout argument.
> Whereas mupdf (now) comes with mutool, which can "convert", that
> pdftotext from poppler with its -layout is the only PDF (and thus
> PS) converter i know who does an acceptable job.
>
> --steffen
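The script above names its scratch directory after $USER and the process ID. As a variation, and only as a sketch rather than Klaus's method, mktemp can choose an unpredictable name, and a trap can remove the directory even if the viewer is interrupted. The name with_scratch_dir is hypothetical, and mktemp -d is widely available but not required by POSIX.

```shell
#!/bin/sh
# Sketch: run a command with a private scratch directory that is
# removed afterwards. with_scratch_dir is a hypothetical helper name.

with_scratch_dir() {
    # A subshell keeps the traps and the dir variable local to this call.
    (
        dir=$(mktemp -d "${TMPDIR:-/tmp}/viewer.XXXXXX") || exit 1
        # Clean up when the subshell exits, including after a signal.
        trap 'rm -rf "$dir"' EXIT
        trap 'exit 1' HUP INT TERM
        # The scratch directory is passed as the last argument.
        "$@" "$dir"
        exit $?
    )
}
```

A conversion function taking the document and the directory could then be called as: with_scratch_dir convert_and_view "$doc"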
Re: [Lynx-dev] Displaying a pdf live on the Fly?
russellb...@gmail.com wrote in <201906040449.x544niog005...@randytool.net>:
| I use:
|
|     pdftotext -layout %s - | utf8trans UTFtoASCII

Yes, Mr. Bell, luckily it has that -layout argument.
Whereas mupdf (now) comes with mutool, which can "convert", that
pdftotext from poppler with its -layout is the only PDF (and thus
PS) converter i know who does an acceptable job.

--steffen
Re: [Lynx-dev] Displaying a pdf live on the Fly?
Tim Chase wrote in <20190603215813.5034d...@bigbox.christie.dr>:
|If you have the "pdftotext" utility (part of my "poppler-utils"
|package here on Debian), you might be able to either use it in your
|mailcap
|
|    pdftotext "%s" - | less
|
|or create a shell-script:
|
|    #!/bin/sh
|    pdftotext "$1" - | less
|
|and then spawn that shell-script in your mailcap file:
|
|    application/pdf; my_pdf_to_text.sh "%s"

And do not forget the -layout argument.

--steffen
Re: [Lynx-dev] Displaying a pdf live on the Fly?
On 04/06/2019 12:01, Mouse wrote:
> ...nor, apparently, that not everyone has Word.

Reading with Open Office (and without Microsoft fonts) is the common reason why the layout ends up broken.
Re: [Lynx-dev] Displaying a pdf live on the Fly?
On 04/06/2019 11:57, Mouse wrote:
> I don't recall enough details to know whether FlateDecode's
> compression algorithm is close enough to any of the general-purpose
> compression

FlateDecode uses the core algorithm from gzip (and also PNG), namely deflate, but the stream won't have the gzip metadata. DCTDecode uses the JPEG (discrete cosine transform) algorithm; again, the metadata will be outside the compressed stream. There is also one for Group 4 Fax (CCITTFaxDecode), which is what should be, but often isn't, used for bi-level scans of documents, and older PDFs have LZWDecode, which is the algorithm used by compress.
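The filter names above usually appear as plain text in the stream dictionaries, so a rough tally is often possible without a PDF parser. The following is a sketch under assumptions: it needs a grep with the -a and -o extensions (GNU and BSD grep have them), it will miss filters declared inside compressed object streams, and pdf_filters is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count occurrences of common stream filter names in a PDF.
# pdf_filters is a hypothetical helper; -a and -o are grep extensions.

pdf_filters() {
    # -a treats the binary file as text; -o prints each match on its own line.
    LC_ALL=C grep -a -o -E '/(FlateDecode|DCTDecode|LZWDecode|CCITTFaxDecode)' "$1" |
        sort | uniq -c
}
```

Calling pdf_filters on a file prints one line per filter name found, each preceded by its count.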
Re: [Lynx-dev] Displaying a pdf live on the Fly?
>> Are people using pdf emails?

> Businesses will often send an email with [...] a PDF attachment.

> Less sophisticated businesses will often do this with Microsoft Word,
> not realising that they cannot predict the layout.

...nor, apparently, that not everyone has Word.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: [Lynx-dev] Displaying a pdf live on the Fly?
>>> Well, lynx said it may be a binary, see it anyway? It was a mess.

>> Yes. Most PDFs in my experience have most of their data compressed,
>> so they are "binary junk" when looked at with tools that don't
>> understand PDF structure and the compression method(s) in question.

> zless may be a better alternative since it does compressed data.

Not of much use here. PDFs are not simply text files which have had a general-purpose compression tool applied to them; they have internal structure, and _some_ of the content gets compressed.

One PDF I have, for example, begins

%PDF-1.6
%âãÏÓ
5191 0 obj
<>stream

after which the "binary junk" begins. A few KB later (3647 bytes, I expect), I see

endstream
endobj
5192 0 obj
<>stream

and it's back to binary compressed data. Other PDFs have more plaintext before the compressed data begins; another one I checked has some sixty or seventy lines of plain text before going into compressed data.

I don't recall enough details to know whether FlateDecode's compression algorithm is close enough to any of the general-purpose compression tools like gzip or compress to be of use, but even if it is, you would at a minimum have to pick apart the PDF structure enough to extract the compressed portion. And, of course, FlateDecode is not the only compression algorithm PDFs can use.

For full details, of course, read the PDF spec.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
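To look at the layout described above in a file of your own, the following sketch prints the header and counts the endstream keywords. It reads only the bytes that happen to be plain text and neither parses nor decompresses anything. pdf_peek is a hypothetical name, and grep -a and -o are extensions found in GNU and BSD grep.

```shell
#!/bin/sh
# Sketch: show the PDF header and count stream objects by keyword.
# pdf_peek is a hypothetical helper name.

pdf_peek() {
    # The first 8 bytes of a PDF are normally a header such as %PDF-1.6.
    header=$(dd if="$1" bs=8 count=1 2>/dev/null)
    case $header in
    %PDF-*) printf 'header: %s\n' "$header" ;;
    *)      printf 'not a PDF header\n'; return 1 ;;
    esac
    # Each stream object is closed by the keyword endstream.
    count=$(LC_ALL=C grep -a -o 'endstream' "$1" | wc -l | tr -d ' ')
    printf 'streams: %s\n' "$count"
}
```

The count can be off if the bytes of a compressed stream happen to contain the keyword, so treat it as an estimate.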
Re: [Lynx-dev] Displaying a pdf live on the Fly?
On 04/06/2019 11:38, Mike Marchywka wrote:
> Are people using pdf emails?

Businesses will often send an email with a formal business letter, with all the proper letterheads, as a PDF attachment. Invoices are often done this way.

Less sophisticated businesses will often do this with Microsoft Word, not realising that they cannot predict the layout.
Re: [Lynx-dev] Displaying a pdf live on the Fly?
On Mon, Jun 03, 2019 at 07:44:33PM -0700, Chime Hart wrote:
> Hi All: I am realizing it would be easier to have lynx display pdfs basicly
> like any other texts. Otherwise I must also run a pdf converter on the file
> or e-mail to RoboBraille. Anyway we tried modifying my dot mailcap file,
> like this
> application/pdf; less "%s"

Are people using pdf emails? I was curious about latex email because the source code can be made more human readable than html. I was using lynx to try to convert the html layout info into logical latex-like syntax for testing some stuff. I don't really like non-text emails, but for things that absolutely need to have some structure, logical or layout, latex-like source would probably be a better way to go than anything compiled into a binary display format. With logical structure the viewer can expand or hide various blocks as needed. Right now I am just noting tags I found useful for later parsing into logical structure. So I added things to the formatted output,

./lynx -cfg=./lynx.cfg -mjm=2 -dump -force-html ifn_clips.txt | grep "" | more
unknown option name EXTERNAL in ./lynx.cfg
\br{}EC4Y 0AN, United Kingdom - Company No. 09901510 [Exact
\p{This message contains My NCBI what's new results from the National
(\url{http://www.ncbi.nlm.nih.gov/}NCBI) at the U.S. National Library of
Medicine (\url{http://www.nlm.nih.gov/}NLM).
\br{}Do not reply directly to this message.
\p{Sender's message: Search: interferon
\br{}Search: interferon
\br{}
\br{}\url{http://www.ncbi.nlm.nih.gov/myncbi/searches/1340461/1jGkfURVu
\br{}
\br{}\url{http://www.ncbi.nlm.nih.gov/myncbi/searches/1340461/}Edit

to aid with things like this, which is solely for input into a viewer I was trying to make,

cat mail_clip_file.txt | gawk -f pubmed.awk | more
\citation{ 2. World J Biol Psychiatry. 2019 May 13:1-22. doi: 10.1080/15622975.2019.1618494. [Epub ahead of print]}
\title{[16]Cytokine-mediated cellular immune activation in electroconvulsive therapy: A CSF study in patients with treatment-resistant depression.}
\authors{ [17]Mindt S^1, [18]Neumaier M^1, [19]Hoyer C^2, [20]Sartorius A^3, [21]Kranaster L^3.
Author information:
1. a Institute for Clinical Chemistry, University Medical Centre Mannheim, Faculty of Medicine Mannheim, University of Heidelberg , Mannheim , Germany.
2. b Department of Neurology , University Medical Centre Mannheim , Mannheim , Germany.
3. c Department of Psychiatry and Psychotherapy , Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University , Mannheim , Germany.}
\abstract{OBJECTIVES: Evidence points towar

It is interesting though that while many test emails can be under 1k, the headers can be 5x more than that with all the relays and spam stuff lol.

> Well, lynx said it may be a binary, see it anyway? It was a mess. So can
> some1 please inform an easy way of doing this, or would I need an external?
> Thanks so much in advance
> Chime

--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchy...@hotmail.com
404-788-1216
ORCID: -0001-9237-455X
Re: [Lynx-dev] Displaying a pdf live on the Fly?
zless may be a better alternative since it does compressed data. The strings utility is in Debian's binutils package, which build-essential pulls in.

On Mon, 3 Jun 2019, Mouse wrote:

> Date: Mon, 3 Jun 2019 23:21:56
> From: Mouse
> To: lynx-dev@nongnu.org
> Subject: Re: [Lynx-dev] Displaying a pdf live on the Fly?
>
> >> Anyway we tried modifying my dot mailcap file, like this
> >> application/pdf; less "%s"
> >> Well, lynx said it may be a binary, see it anyway? It was a mess.
>
> Yes. Most PDFs in my experience have most of their data compressed, so
> they are "binary junk" when looked at with tools that don't understand
> PDF structure and the compression method(s) in question.
>
> [...]
Re: [Lynx-dev] Displaying a pdf live on the Fly?
>> Hi All: I am realizing it would be easier to have lynx display pdfs
>> basicly like any other texts.

This would be difficult; a few PDFs aren't text at all, and many more aren't just text. How much you lose by keeping just the text can be anything on the spectrum from nothing to everything.

>> Anyway we tried modifying my dot mailcap file, like this
>> application/pdf; less "%s"
>> Well, lynx said it may be a binary, see it anyway? It was a mess.

Yes. Most PDFs in my experience have most of their data compressed, so they are "binary junk" when looked at with tools that don't understand PDF structure and the compression method(s) in question.

>> So can [someone] please inform an easy way of doing this, or would I
>> need an external?

In full generality, there is no easy way. You will need _something_ that understands the structure of PDFs. Even for just a "most cases" converter, you probably will need something that knows enough about PDF structure to decompress compressed content.

There is a package, xpdf, which I picked up a decade ago from ftp.foolabs.com (I don't know whether it's available anywhere these days; I can make what I have available if it would help); it includes a PDF-to-text converter which works well enough to be useful in some cases for me. There may well be something better knocking around by now; this is just the one I happen to know of.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: [Lynx-dev] Displaying a pdf live on the Fly?
If you have the "pdftotext" utility (part of my "poppler-utils" package here on Debian), you might be able to either use it in your mailcap

    pdftotext "%s" - | less

or create a shell-script:

    #!/bin/sh
    pdftotext "$1" - | less

and then spawn that shell-script in your mailcap file:

    application/pdf; my_pdf_to_text.sh "%s"

The quality of the output depends largely on how the PDF was created, so I have some mostly-pure-text PDFs where it works great; and I have some PDFs that are full of graphics and poorly laid-out that are next to useless when piped through pdftotext. YMMV.

-tim

On 2019-06-03 19:44, Chime Hart wrote:
> Hi All: I am realizing it would be easier to have lynx display pdfs
> basicly like any other texts. Otherwise I must also run a pdf
> converter on the file or e-mail to RoboBraille. Anyway we tried
> modifying my dot mailcap file, like this
> application/pdf; less "%s"
> Well, lynx said it may be a binary, see it anyway? It was a mess.
> So can some1 please inform an easy way of doing this, or would I
> need an external? Thanks so much in advance
> Chime
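The my_pdf_to_text.sh idea above can be taken one small step further. What follows is a sketch, not Tim's script: it adds the -layout option suggested elsewhere in this thread and prints a note when pdftotext yields no visible text, which is what tends to happen with scanned pages. The function name pdf_view is hypothetical, and falling back to less when PAGER is unset is an assumption.

```shell
#!/bin/sh
# Sketch of a mailcap helper built around pdftotext (poppler-utils).
# pdf_view is a hypothetical name; the pager defaults to less.

pdf_view() {
    # Convert to text; "-" tells pdftotext to write to standard output.
    text=$(pdftotext -layout "$1" -) || {
        echo "pdftotext failed on $1" >&2
        return 1
    }
    # Scanned or image-only PDFs tend to yield nothing but whitespace.
    if [ -z "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
        echo "No text found in $1; it may consist of page images." >&2
        return 2
    fi
    printf '%s\n' "$text" | ${PAGER:-less}
}
```

Saved as my_pdf_to_text.sh with a final line of pdf_view "$1", it can be used from the same mailcap entry as above.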
Re: [Lynx-dev] Displaying a pdf live on the Fly?
The strings utility maybe can help clean up some of that mess. Strings needs the pdf file and then maybe less can take it. How this would be done inside .mailcap or even if it would work I don't yet know. If it's possible to put an alias in a .mailcap file and in that alias always send everything to strings before less gets it that might work.

On Mon, 3 Jun 2019, Chime Hart wrote:

> Date: Mon, 3 Jun 2019 22:44:33
> From: Chime Hart
> To: Discussion of Lynx Issues
> Subject: [Lynx-dev] Displaying a pdf live on the Fly?
>
> Hi All: I am realizing it would be easier to have lynx display pdfs basicly
> like any other texts. Otherwise I must also run a pdf converter on the file or
> e-mail to RoboBraille. Anyway we tried modifying my dot mailcap file, like
> this
> application/pdf; less "%s"
> Well, lynx said it may be a binary, see it anyway? It was a mess. So can some1
> please inform an easy way of doing this, or would I need an external? Thanks
> so much in advance
> Chime