Thanks for kindly writing to me, Professor Moore. I apologize for two things here: 1) I think I misspelled mebibyte (2^20 bytes). 2) It appears that I sent one e-mail letter twice. Sorry for those two mistakes. Anyhow, thanks again, Professor Moore, for kindly taking the time to write to me.

Pat

--------------------------------------------------
From: "Ross Moore" <ross.mo...@mq.edu.au>
Sent: Friday, June 24, 2011 7:22 PM
To: "Pat Somerville" <l_pa...@hotmail.com>
Subject: Re: [l2h] Poor resolution for my .eps file contents in a .html file produced by LaTeX2HTML and viewed in a Web browser such as Konqueror

Hi Pat,

On 25/06/2011, at 7:06 AM, Pat Somerville wrote:

Getting good results from scanning old books is quite an art form.
You definitely need to experiment quite a bit with settings on the scanner,
and also with image manipulation software afterwards.

I always advise doing this with just a few pages first, before embarking
on a program to scan hundreds of pages. Otherwise, you'll end up
having to repeat a lot of the physical work of placing and scanning
many of the pages that did not give you the quality result that you
desire.

I scanned a copy of one of my own drawings and caption on paper in both the Tagged Image File Format (.tif), in an attached figure called TestFig.tif, and the Portable Document Format (.pdf), in an attached figure called TestFig.pdf.

Yes, these look really good, due to the scanning resolution.

Note how much smaller TestFig.pdf (45 kB) is than TestFig.tif (2.7 MB).



Using the Gnu Not Unix (GNU) Image Manipulation Program (GIMP) 2.6.11 I made Encapsulated PostScript (.eps) files of those figures, which are respectively the attached files TestFigTif.eps and TestFigPDF.eps; in that process I probably cropped the original figures.

TestFigTif.eps worked fine, giving a good size reduction to 336 kB.

But TestFigPDF.eps has expanded to 2.6 MB and displays at poor quality.
If this really did come from  TestFig.pdf  then the compression used
in that file certainly has not been preserved by the conversion to .eps .
What displays seems to be a black&white bitmap version of the image,
losing all gray-scale information. But it is the use of gray-scales that
makes images seem clearer and cleaner (better quality!).
The effect is also known as "anti-aliasing".
The 2.6 MB .eps file must surely contain the grayscale information, presumably
as well as the lo-res (Preview) bitmap version, but it is the lo-res that
seems to be the image that my viewer shows. Not sure why --- with more
than one version of the image within the file, it could be that the viewer
just chose to show the lo-res preview. A printer should use the hi-res
description ...


Then in the attached LaTeX file Throwaway11.tex I included those two .eps figures using the LaTeX epsfig software package. I executed a latex2html command on Throwaway11.tex to produce Throwaway11.html and 11 other files which are all attached to this e-mail letter. On viewing Throwaway11.html in a Konqueror Web browser I found the qualities of the two figures TestFigPDF.eps and TestFigTif.eps quite acceptable!

... as must have happened here.



Then I noticed that some lines or curves of letters in some mathematics type in the book I was copying were probably thinner than the lines or curves of the drawing on paper to which I referred in the previous paragraph here. On a "surface level" that could point to problems or complications associated with the original source on paper rather than to LaTeX2HTML or the GIMP, although to my human eye the type on the original book page was of excellent visual quality. But I found ways to adjust my Epson Stylus CX3810's scanning program to deal with the thin-lined type on the page of the book in question. In the Epson scanner program's "Professional Mode": 1) For the "Auto Exposure Type" I probably switched from "Photo" to "Document." 2) For the "Image Type" I switched from "Black and White" to "8-bit Grayscale." I still kept the resolution setting at 600 dots per inch (dpi). Differences between the various choices of "Image Type" could even be seen in a "Preview" or brief scan of the book page. As a result the full scan of the book page took considerably longer than previously, though the total scan time was still within a few minutes. I guess that the increased scan time might be a clue to why I had success with the scanner program setting "8-bit Grayscale"

Yes.  8-bit implies 2^8 = 256 different possible shades of gray,
ranging from white to black.
When rasterising, you definitely want to capture this amount of information,
otherwise you get "blocky" images.

Later processing can cut down this information to something that still looks
good on-screen. But you must start with a lot more.
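
As a rough illustration of the point above (a sketch only, not what the scanner or LaTeX2HTML does internally), the following Python fragment averages blocks of a high-resolution black/white bitmap down to 8-bit gray values. The function names and the tiny test bitmap are made up for the example. A one-pixel-wide line survives as light gray when grayscale is kept, but vanishes if the result is forced back to pure black and white.

```python
GRAY_LEVELS = 2 ** 8  # 8 bits per pixel gives 256 shades


def downsample(bitmap, factor):
    """Average each factor x factor block of a 0/1 bitmap into one gray value.

    Input: 0 = white paper, 1 = black ink.
    Output: 8-bit convention, 0 = black, 255 = white.
    """
    rows = len(bitmap) // factor
    cols = len(bitmap[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            ink = sum(
                bitmap[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            )
            coverage = ink / (factor * factor)  # fraction of the block that is ink
            row.append(round((GRAY_LEVELS - 1) * (1 - coverage)))
        out.append(row)
    return out


def threshold(gray, cutoff=128):
    """Force a grayscale image back to pure black (0) or white (255)."""
    return [[0 if value < cutoff else 255 for value in row] for row in gray]


# Hypothetical test image: a one-pixel-wide vertical line on a 4 x 8 grid.
thin_line = [[1 if col == 2 else 0 for col in range(8)] for _ in range(4)]

gray = downsample(thin_line, 4)   # the line becomes a light gray pixel
bw = threshold(gray)              # the line disappears entirely

print(gray)
print(bw)
```

With grayscale kept, the thin line still leaves a trace in the reduced image; with a black/white cutoff it is lost, which is the kind of damage thin-lined type suffers.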

in the case of a thin-lined document! I saved the scanned file as a .pdf file with a size of 1.1 MiB (megibyte=2**20 bytes). As a result of the "8-bit Grayscale" setting I could see a blue or grey color near the book-binding side of the page in the scanned image file of that book page. In importing the .pdf file in the GIMP it might have been important to set the resolution to 600 pixels/inch instead of some possibly lower default setting. In the GIMP I could crop away that unwanted grey or blue color by clicking on the GIMP's select tool and then clicking on the image and dragging the touch-pad pointer while holding down the left touch-pad button to enclose the interesting portion of the image of the book page in a dashed rectangle; then I could select "Image" and then "Crop to Selection" to accomplish the cropping. Via "Image" and "Rescale Image" or something similar I could resize the image to the roughly 6.00-inch width I desired while keeping the aspect ratio or proportions of the figure unchanged. A surprise for me was that the .eps file converted from this .pdf file using the GIMP had a large size of 33.2 MiB. In the GIMP I had to do something unusual, namely as I was directed by the GIMP to export the file, which I found I could do by clicking on an "Export" button before finally saving the .eps file. In my .tex file I changed the name of the .eps file I wanted to include to match the new .eps file name I made using the GIMP.
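
For a sense of scale, the pixel count implied by a 6.00-inch width at 600 pixels/inch can be worked out as below. This is only a back-of-envelope sketch: the 8.0-inch height is a hypothetical value chosen for illustration (the actual cropped height is not stated above), and the byte count assumes one byte per pixel with no compression. How the GIMP actually encodes the raster inside the .eps file is not known from this message, so the figure is not meant to reproduce the 33.2 MiB exactly, only to show that tens of MiB are unsurprising when raster data is stored with little or no compression.

```python
def raster_pixels(width_in, height_in, ppi):
    """Pixel dimensions of an image of the given physical size and resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def raw_size_bytes(width_in, height_in, ppi, bytes_per_pixel=1):
    """Uncompressed raster size; 1 byte per pixel corresponds to 8-bit grayscale."""
    width_px, height_px = raster_pixels(width_in, height_in, ppi)
    return width_px * height_px * bytes_per_pixel


WIDTH_IN = 6.0    # from the message above
PPI = 600         # from the message above
HEIGHT_IN = 8.0   # hypothetical, for illustration only

pixels = raster_pixels(WIDTH_IN, HEIGHT_IN, PPI)
size = raw_size_bytes(WIDTH_IN, HEIGHT_IN, PPI)

print(pixels)
print(f"{size / 2**20:.1f} MiB uncompressed at 8 bits per pixel")
```

Because the size grows with the square of the resolution, halving the pixels/inch would cut the raw data to a quarter.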


Then after executing latex .... and latex2html.... commands on that .tex file I was grateful to find that this time the figure from the new .eps file had very good visual quality, with minor graininess visible especially within the letter "H." The size of the folder produced by LaTeX2HTML was roughly equal to the 1.1-MiB size of the .pdf file with which I started. So apparently about 97 percent of the contents of the 33.2-MiB .eps file were discarded in LaTeX2HTML's process of generating the folder containing the output .html file.
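
The size comparison in the paragraph above can be checked with a few lines of Python. The two sizes are the ones quoted in the message; the function name is made up for the example. Note that this compares file sizes only: the output images are compressed, so a smaller size does not by itself say how much visual information was lost.

```python
def discarded_percent(input_mib, output_mib):
    """Percentage by which the output is smaller than the input."""
    return 100 * (1 - output_mib / input_mib)


eps_size_mib = 33.2     # size of the .eps file reported above
output_size_mib = 1.1   # approximate size of the LaTeX2HTML output folder

print(f"{discarded_percent(eps_size_mib, output_size_mib):.1f}% smaller")
# prints "96.7% smaller"
```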

Sure.
The resulting files img1.png  and  img2.png have been down-sampled back
to simple black&white images rather than using grayscales.
I'm not sure why this has happened. There should be options that you
can give to LaTeX2HTML to affect the image processing, and get
much better final graphics.

Isn't there a switch  -antialias   that you can use?
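
A minimal sketch of how such a run might be scripted follows. The -antialias switch is the one asked about above; -antialias_text is an additional switch that, as far as I recall, the LaTeX2HTML manual lists for images of typeset text, but treat it as an assumption and check the documentation for the installed version. The helper function is hypothetical. The snippet only builds and prints the command; the actual call is left commented out.

```python
import shlex


def latex2html_command(tex_file, antialias=True, antialias_text=True):
    """Build a latex2html command line with optional anti-aliasing switches."""
    cmd = ["latex2html"]
    if antialias:
        cmd.append("-antialias")        # anti-alias images of figures
    if antialias_text:
        cmd.append("-antialias_text")   # assumption: anti-alias images of typeset text
    cmd.append(tex_file)
    return cmd


cmd = latex2html_command("Throwaway11.tex")
print(shlex.join(cmd))

# To actually run it (requires latex2html to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Comparing the generated img1.png and img2.png with and without the switches should show whether grayscale is being preserved.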



I am grateful to now have a way to produce fairly good-quality figures in a .html file using LaTeX2HTML from .pdf scans of printed paper converted to .eps files using the GIMP. But as the results in the first paragraph have shown, the "Black and White"

Definitely do *not* use this setting.

and probably "Photo" settings of my scanner program were apparently adequate for lines and curves that were not very thin in the original source.

Certainly try with "Photo", to see the differences.
But "Grayscale" better tells the scanner what kind of material
is being scanned.
It may have been black&white once, but ink spreads into the paper,
and the pages fade with time --- not to mention yellowing.
So it really is appropriate to use Grayscale.


So in retrospect thank you, Professor Moore, for your suggestion that I provide you with one or more of my example files. That suggestion and the circumstance of a copyright were early steps which ultimately and gratefully led me to a solution to the problem of how to generate fairly good-quality figures in a .html file produced by LaTeX2HTML.

Pat

I'm glad you are happy with what you have achieved.

Preserving the content of old books --- particularly those with
mathematical content that is still valid and relevant --- is a very worthwhile undertaking.


All the best,

Ross

------------------------------------------------------------------------
Ross Moore                                       ross.mo...@mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------




_______________________________________________
latex2html mailing list
latex2html@tug.org
http://tug.org/mailman/listinfo/latex2html
