Thanks for kindly writing to me, Professor Moore. I apologize for two
reasons here: 1) I think I misspelled mebibyte (2^20 bytes). 2) It appears
that I sent one e-mail letter twice. Sorry, I made those two mistakes.
Anyhow, again thanks, Professor Moore, for kindly taking the time to write
to me.
Pat
--------------------------------------------------
From: "Ross Moore" <ross.mo...@mq.edu.au>
Sent: Friday, June 24, 2011 7:22 PM
To: "Pat Somerville" <l_pa...@hotmail.com>
Subject: Re: [l2h] Poor resolution for my .eps file contents in a .html file
produced by LaTeX2HTML and viewed in a Web browser such as Konqueror
Hi Pat,
On 25/06/2011, at 7:06 AM, Pat Somerville wrote:
Getting good results from scanning old books is quite an art form.
You definitely need to experiment quite a bit with settings on the
scanner, and also with image manipulation software afterwards.
I always advise doing this with just a few pages first, before embarking
on a program to scan hundreds of pages. Otherwise, you'll end up
having to repeat a lot of the physical work of placing and scanning
many of the pages that did not give you the quality result that you
desire.
I scanned a copy of one of my own drawings and caption on paper in both
the Tagged Image File Format (.tif), in an attached figure called
TestFig.tif, and the Portable Document Format (.pdf), in an attached
figure called TestFig.pdf.
Yes, these look really good, due to the scanning resolution.
Note how much smaller TestFig.pdf (45 kB) is than TestFig.tif (2.7 MB).
Using the Gnu Not Unix (GNU) Image Manipulation Program (GIMP) 2.6.11 I
made Encapsulated PostScript (.eps) files of those figures, which are
respectively the attached files TestFigTif.eps and TestFigPDF.eps; in
that process I probably cropped the original figures.
TestFigTif.eps worked fine, giving a good size reduction to 336 kB.
But TestFigPDF.eps has expanded to 2.6 MB and displays at poor quality.
If this really did come from TestFig.pdf then the compression used
in that file certainly has not been preserved by the conversion to .eps.
What displays seems to be a black&white bitmap version of the image,
losing all gray-scale information. But it is the use of gray-scales that
makes images seem clearer and cleaner (better quality!).
The effect is also known as "anti-aliasing".
The 2.6 MB .eps file must surely contain the grayscale info, presumably
as well as the lo-res (Preview) bitmap version, but it is the lo-res that
seems to be the image that my viewer shows. Not sure why --- with more
than one version of the image within the file, it could be that the viewer
just chose to show the lo-res preview. A printer should use the hi-res
description ...
Then in the attached LaTeX file Throwaway11.tex I included those two .eps
figures using the LaTeX epsfig software package. I executed a latex2html
command on Throwaway11.tex to produce Throwaway11.html and 11 other files
which are all attached to this e-mail letter. On viewing
Throwaway11.html in a Konqueror Web browser I found the qualities of the
two figures TestFigPDF.eps and TestFigTif.eps quite acceptable!
... as must have happened here.
Then I noticed that some lines or curves of letters in some mathematics
type in the book I was copying were probably thinner than the lines or
curves of the drawing on paper to which I referred in the previous
paragraph here. On a "surface level" that could point to problems or
complications associated with the original source on paper rather than to
LaTeX2HTML or the GIMP, although to my human eye the type on the original
book page was of excellent visual quality. But I found ways to adjust my
Epson Stylus CX3810's scanning program to deal with the thin-lined type
on the page of the book in question: In the Epson scanner program's
"Professional Mode" 1) for the "Auto Exposure Type" I probably switched
from "Photo" to "Document." 2) For the "Image Type" I switched from
"Black and White" to "8-bit Grayscale." I still kept the resolution
setting at 600 dots per inch (dpi). Differences with various choices of
"Image Type" could even be seen in a "Preview" or brief scan of the book
page. As a result the full scan of the book page took considerably
longer than previously, but with a total scan time of within a few
minutes. I guess that increased scan time might be a clue to why I
gratefully had success with the scanner program setting "8-bit Grayscale"
Yes. 8-bit implies 2^8 = 256 different possible shades of gray,
ranging from white to black.
When rasterising, you definitely want to capture this amount of
information; otherwise you get "blocky" images.
Later processing can cut down this information to something that still
looks good on-screen. But you must start with a lot more.
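
To illustrate why it helps to start with grayscale, here is a minimal
Python sketch. It is not from the original exchange: the 4x4 sample patch,
the stroke value of 100, and the function names are all illustrative
assumptions. It halves the resolution of a patch containing a faint
one-pixel-wide stroke, first keeping gray levels and then forcing the
result to black and white.

```python
# Gray levels are 8-bit values: 0 = black ink, 255 = white paper.
LEVELS = 2 ** 8  # 256 possible shades in an 8-bit grayscale scan


def downsample_average(img, factor):
    """Shrink img by averaging each factor x factor block of pixels."""
    out = []
    for r in range(0, len(img), factor):
        row = []
        for c in range(0, len(img[0]), factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def threshold(img, cutoff=128):
    """Reduce img to pure black (0) and white (255)."""
    return [[0 if v < cutoff else 255 for v in row] for row in img]


# Hypothetical patch: a faint, one-pixel-wide vertical stroke (value 100).
stroke = [[255, 100, 255, 255] for _ in range(4)]

gray = downsample_average(stroke, 2)
bilevel = threshold(gray)

print(gray)     # [[177, 255], [177, 255]]  stroke survives as a gray shade
print(bilevel)  # [[255, 255], [255, 255]]  stroke is lost entirely
```

In this toy case the averaged version keeps a trace of the thin stroke,
while the black-and-white version drops it, which is the kind of loss
thin-lined type can suffer.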
in the case of a thin-lined document! I saved the scanned file as a .pdf
file with a size of 1.1 MiB (megibyte=2**20 bytes). As a result of the
"8-bit Grayscale" setting I could see a blue or grey color near the
book-binding side of the page in the scanned image file of that book
page. In importing the .pdf file in the GIMP it might have been important
to set the resolution to 600 pixels/inch instead of some possibly lower
default setting. In the GIMP I could crop away that unwanted grey or
blue color by clicking on the GIMP's select tool and then clicking on the
image and dragging the touch-pad pointer while holding down the left
touch-pad button to enclose the interesting portion of the image of the
book page in a dashed rectangle; then I could select "Image" and then
"Crop to Selection" to accomplish the cropping. Via "Image" and "Rescale
Image" or something similar I could resize the image to the roughly
6.00-inch width I desired while keeping the aspect ratio or proportions
of the figure unchanged. A surprise for me was that the .eps file
converted from this .pdf file using the GIMP had a large size of 33.2
MiB. In the GIMP I had to do something unusual: as directed by the GIMP,
I had to export the file, which I could do by clicking on an "Export"
button before finally saving the .eps file. In my .tex file
I changed the name of the .eps file I wanted to include to match the new
.eps file name I made using the GIMP.
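
The resizing described above comes down to simple arithmetic, sketched
here in Python. The cropped size of 3000 x 4200 pixels is a hypothetical
example (the actual dimensions are not given in this thread), and the
function names are illustrative. The sketch assumes the image stays at
600 pixels/inch and that pixels are stored uncompressed at one byte each;
whether that accounts for the 33.2-MiB .eps file is not established here.

```python
def scaled_size(width_px, height_px, target_width_in, ppi):
    """Return (width, height) in pixels for a target width in inches,
    keeping the aspect ratio unchanged."""
    new_width = round(target_width_in * ppi)
    new_height = round(height_px * new_width / width_px)
    return new_width, new_height


def raw_gray_mib(width_px, height_px):
    """Size in MiB of uncompressed 8-bit grayscale data (1 byte/pixel)."""
    return width_px * height_px / 2 ** 20


# Hypothetical cropped selection, scaled to a 6.00-inch width at 600 ppi.
size = scaled_size(3000, 4200, 6.00, 600)
print(size)  # (3600, 5040)
print(round(raw_gray_mib(*size), 1), "MiB of raw 8-bit pixel data")
```

Uncompressed raster data grows with the square of the resolution, which
is one plausible reason an .eps export can be far larger than a
compressed .pdf scan of the same page.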
Then after executing latex .... and latex2html.... commands on that .tex
file I gratefully found that this time the new .eps file had very good
visual quality with minor graininess visible especially within the letter
"H." The size of the folder produced by LaTeX2HTML was roughly equal to
the 1.1-MiB size of the .pdf file with which I started. So apparently
about 97 percent or more of the contents of the 33.2-MiB .eps file were
discarded in LaTeX2HTML's process of generating the folder containing the
output .html file.
Sure.
The resulting files img1.png and img2.png have been down-sampled back
to a simple black&white image rather than using grayscales.
I'm not sure why this has happened. There should be options that you
can give to LaTeX2HTML to affect the image processing, and get
much better final graphics.
Isn't there a switch -antialias that you can use?
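
For reference, LaTeX2HTML does document an -antialias switch (for
generated figure images) and an -antialias_text switch (for images of
text and mathematics). A minimal Python sketch of assembling such a
command follows; the helper name is hypothetical, and the switches should
be checked against the documentation of the installed version before
relying on them.

```python
import shlex


def latex2html_command(tex_file, antialias=True, antialias_text=True):
    """Build a latex2html argument list with anti-aliasing switches."""
    cmd = ["latex2html"]
    if antialias:
        cmd.append("-antialias")       # anti-alias figure images
    if antialias_text:
        cmd.append("-antialias_text")  # anti-alias text and math images
    cmd.append(tex_file)
    return cmd


cmd = latex2html_command("Throwaway11.tex")
print(shlex.join(cmd))
# latex2html -antialias -antialias_text Throwaway11.tex

# To actually run it (requires latex2html to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```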
I am grateful to now have a way to produce fairly good-quality figures in
a .html file using LaTeX2HTML from .pdf scans of printed paper converted
to .eps files using the GIMP. But as the results in the first paragraph
have shown, the "Black and White"
Definitely do *not* use this setting.
and probably "Photo" settings of my scanner program were apparently
adequate for lines and curves that were not very thin in the original
source.
Certainly try with "Photo", to see the differences.
But "Grayscale" better tells the scanner what kind of material
is being scanned.
It may have been black&white once, but ink spreads into the paper,
and the pages fade with time --- not to mention yellowing.
So it really is appropriate to use Grayscale.
So in retrospect thank you, Professor Moore, for your suggestion that I
provide you with one or more of my example files. That suggestion and
the circumstance of a copyright were early steps which ultimately and
gratefully led me to a solution to the problem of how to generate fairly
good-quality figures in a .html file produced by LaTeX2HTML.
Pat
I'm glad you are happy with what you have achieved.
Preserving the content of old books --- particularly those with
mathematical content that is still valid and relevant ---
is a very worthwhile undertaking.
All the best,
Ross
------------------------------------------------------------------------
Ross Moore ross.mo...@mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
_______________________________________________
latex2html mailing list
latex2html@tug.org
http://tug.org/mailman/listinfo/latex2html