Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-26 Thread Ken Sharp

At 18:05 26/09/2017 +0200, Knut Petersen wrote:

>> Just to be absolutely certain; the lack of PDFDontUseFontObjectNum is no
>> longer a showstopper for you?
>
> We do not need PDFDontUseFontObjectNum any longer. Its removal is not a
> showstopper.
>
> Knut

Thanks for the confirmation!

We've run across a rather serious regression, so the release is going to be
at least another week, possibly more. Sorry, folks.

Ken



___
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-26 Thread Knut Petersen

On 26.09.2017 at 14:13, Ken Sharp wrote:

> At 11:24 25/09/2017 +0200, Knut Petersen wrote:
>
>> Thanks to the ghostscript community for your great tool and your patience!
>
> Just to be absolutely certain; the lack of PDFDontUseFontObjectNum is no longer
> a showstopper for you?

We do not need PDFDontUseFontObjectNum any longer. Its removal is not a
showstopper.

Knut



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-26 Thread Ken Sharp

At 11:24 25/09/2017 +0200, Knut Petersen wrote:


> Thanks to the ghostscript community for your great tool and your patience!

Just to be absolutely certain; the lack of PDFDontUseFontObjectNum is no
longer a showstopper for you?

We're planning to do a second release candidate 'real soon now' and if I
need to reinstate the functionality I need to do it before we build that RC.

As long as you are all happy, then I won't reinstate it. If there's decent
reason to think you still need it, please let me know.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-25 Thread Knut Petersen

On 25.09.2017 at 10:41, David Kastrup wrote:

>> The good news: The winning solution (96MB)
>>
>>   * is possible without PDFDontUseFontObjectNum
>>   * is possible with HEAD of git master of ghostscript.
>
> What's the tree size for "make doc"?  How does it deal with older
> versions of Ghostscript?
>
> At any rate: is there any reason for us to actually use glyphshow at all
> considering its apparent drawbacks?

Some tests show that a complete switch to "show" would increase file sizes for
the normal user who compiles a score and does not process the PDF with other tools.

I'll answer with more details soon with some of the cc: dropped.

Thanks to the ghostscript community for your great tool and your patience!

Knut



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-25 Thread David Kastrup
Knut Petersen  writes:

>>> If other people agree that my patch solves the problem of exorbitant
>>> file sizes, we might live well without Ghostscript's
>>> PDFDontUseFontObjectNum.
>>
>> I'm not really clear on how this works, I'm curious as to where the
>> final embedded fonts are inserted. However, if it does work it would
>> be a good thing from my point of view. 
>
> It's amazing to see how supposedly minor changes affect PDF file sizes:
> The total size of the PDFs of our documentation (11108 pages in 47
> PDFs) varies between 96MB and 306MB depending on some minor changes in
> our code and the Ghostscript version used.

That sounds worse than it is once you realize that those 47 PDFs are in
something like 10 different languages, so people are going to use only a
subset of those anyway.

> The good news: The winning solution (96MB)
>
>  * is possible without PDFDontUseFontObjectNum
>  * is possible with HEAD of git master of ghostscript.

What's the tree size for "make doc"?  How does it deal with older
versions of Ghostscript?

At any rate: is there any reason for us to actually use glyphshow at all
considering its apparent drawbacks?  Or is it intended more for things
that should be treated like glyphs but are produced on-the-fly?  Like
beams and stems and such: we do those using graphical commands right now
but that does not allow for things like hinting.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-25 Thread Knut Petersen



>> If other people agree that my patch solves the problem of exorbitant file
>> sizes, we might live well without Ghostscript's PDFDontUseFontObjectNum.
>
> I'm not really clear on how this works, I'm curious as to where the final
> embedded fonts are inserted. However, if it does work it would be a good
> thing from my point of view.

It's amazing to see how supposedly minor changes affect PDF file sizes: The total
size of the PDFs of our documentation (11108 pages in 47 PDFs) varies between
96MB and 306MB depending on some minor changes in our code and the
Ghostscript version used.

The good news: The winning solution (96MB)

 * is possible without PDFDontUseFontObjectNum
 * is possible with HEAD of git master of ghostscript.
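
To put the quoted totals in perspective, here is a small worked calculation. It uses only the figures from the message (11108 pages, 47 PDFs, 96MB and 306MB); the conversion 1 MB = 1024 KB and the function name are assumptions made for illustration.

```python
# Figures quoted in the message above.
PAGES = 11108
BEST_MB = 96
WORST_MB = 306


def per_page_kb(total_mb: float, pages: int) -> float:
    """Average size per page in KB, assuming 1 MB = 1024 KB."""
    return total_mb * 1024 / pages


# How much larger the worst case is than the best case.
ratio = WORST_MB / BEST_MB

print(f"worst/best ratio: {ratio:.2f}")
print(f"best:  {per_page_kb(BEST_MB, PAGES):.1f} KB per page")
print(f"worst: {per_page_kb(WORST_MB, PAGES):.1f} KB per page")
```

In other words, the spread between the two build configurations is a bit more than a factor of three.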

Knut



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-25 Thread Ken Sharp

At 19:12 23/09/2017 +0200, Knut Petersen wrote:

> For a recent Ghostscript without PDFDontUseFontObjectNum, only the
> combination of LilyPond's --bigpdfs with -dgs-never-embed-fonts gives the
> desired result. The need for the extractpdfmark utility has not
> changed: use it if you link to destinations in other PDF files.
> Unfortunately, our LilyPond build system failed to use the --bigpdfs option
> together with -dgs-never-embed-fonts.
>
> If other people agree that my patch solves the problem of
> exorbitant file sizes, we might live well without Ghostscript's
> PDFDontUseFontObjectNum.


I'm not really clear on how this works, I'm curious as to where the final 
embedded fonts are inserted. However, if it does work it would be a good 
thing from my point of view.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-23 Thread Knut Petersen

Hi everybody!

The removal of the PDFDontUseFontObjectNum option in Ghostscript has caused a lot
of headaches over the last few days.

Since early 2015, LilyPond has provided a way to persuade Ghostscript to remove
duplicated fonts from PDFs.
Later, the extractpdfmark program was introduced to allow links to other PDFs to
survive this process.
The -dgs-never-embed-fonts option was also introduced to force Ghostscript never to
embed fonts.
We also have a -dgs-load-fonts option.

All those options influence important details of our postscript code generation.

For Ghostscript versions 9.16 and older, --bigpdfs can be used with or without
-dgs-never-embed-fonts.

For Ghostscript versions from 9.17 until the removal of Ghostscript's
PDFDontUseFontObjectNum, --bigpdfs can be used successfully with or without
-dgs-never-embed-fonts as long as PDFDontUseFontObjectNum is given to
Ghostscript.

For a recent Ghostscript without PDFDontUseFontObjectNum, only the combination
of LilyPond's --bigpdfs with -dgs-never-embed-fonts gives the desired result.
The need for the extractpdfmark utility has not changed: use it if you link
to destinations in other PDF files. Unfortunately, our LilyPond build system
failed to use the --bigpdfs option together with -dgs-never-embed-fonts.
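
The three cases above can be written down as a small decision function. This is only a sketch of the rules as stated in this message, not code from LilyPond or Ghostscript; the function and parameter names are hypothetical, and it returns `None` for the one combination the message does not cover (9.17 or later, option still present but not passed).

```python
from typing import Optional, Tuple


def bigpdfs_gives_desired_result(
    gs_version: Tuple[int, int],
    option_exists: bool,      # does this Ghostscript still have PDFDontUseFontObjectNum?
    option_given: bool,       # is PDFDontUseFontObjectNum passed to Ghostscript?
    never_embed_fonts: bool,  # is -dgs-never-embed-fonts used with --bigpdfs?
) -> Optional[bool]:
    """Encode the three cases described above; None means 'not stated'."""
    if gs_version <= (9, 16):
        # 9.16 and older: works with or without -dgs-never-embed-fonts.
        return True
    if option_exists:
        # 9.17 until the removal: works as long as the option is given.
        return True if option_given else None
    # Recent Ghostscript without the option: only with -dgs-never-embed-fonts.
    return never_embed_fonts


print(bigpdfs_gives_desired_result((9, 16), True, False, False))
print(bigpdfs_gives_desired_result((9, 21), True, True, False))
print(bigpdfs_gives_desired_result((9, 22), False, False, True))
print(bigpdfs_gives_desired_result((9, 22), False, False, False))
```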


If other people agree that my patch
solves the problem of exorbitant file sizes, we might live well without Ghostscript's
PDFDontUseFontObjectNum.

Knut


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-23 Thread Masamichi Hosoda
> Note that I didn't attempt this on 9.21, just the 9.22 release
> candidate. While the fonts are properly dropped from the individual
> PDF files, none of the text was visible in the final PDF file, when
> rebuilding with Emmentaler supplied as an external font in fontmap.GS.
> 
> It's interesting that the OTF worked for you because I would expect it
> to; OTF fonts with CFF outlines are handled simply by extracting the
> CFF, which is usable directly in PostScript. TrueType fonts are not
> directly supported by PostScript and that's where the heuristics come
> back again, we have to guess at some aspects of the encodings.

In my experiment, I used gs-9.22rc1 but I didn't use fontmap.GS.

Instead, I created a PostScript file
that contains only the font resource, in the same format as in the EPSs.
That is, for the Emmentaler-20 font,
I created a file `fonts/Emmentaler-20.font.ps` with the following contents.

```
%%BeginFont: Emmentaler-20
%%BeginResource: font Emmentaler-20
%!PS-Adobe-3.0 Resource-FontSet
%%DocumentNeededResources: ProcSet (FontSetInit)
%%Title: (FontSet/Emmentaler-20)
%%Version: 0
%%EndComments
%%IncludeResource: ProcSet (FontSetInit)
%%BeginResource: FontSet (Emmentaler-20)
/FontSetInit /ProcSet findresource begin
%%BeginData: 67355 Binary Bytes
[...snip...]
%%EndData
%%EndResource
%%EndResource
%%EndFont
```

Then, I invoked Ghostscript with the following command.

$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
 -sOutputFile=include-bigpdfs-noembed-gse.pdf \
 -c ".setpdfwrite" \
 -f fonts/*.font.ps include-bigpdfs-noembed.pdf

`fonts/Emmentaler-20.font.ps` and other font files
`fonts/TeXGyreSchola-Regular.font.ps` etc. are in my sample tarball.
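
For scripting this step, the same command line can be assembled as an argument list. This is a minimal sketch that only rebuilds the invocation shown above; the helper name `reembed_command` is hypothetical, and nothing here runs Ghostscript.

```python
from typing import List, Sequence


def reembed_command(
    font_files: Sequence[str], input_pdf: str, output_pdf: str
) -> List[str]:
    """Build the gs argument list from the command quoted above."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_pdf}",
        "-c", ".setpdfwrite",
        # The font resource files come first, then the PDF that lacks the fonts.
        "-f", *font_files, input_pdf,
    ]


cmd = reembed_command(
    ["fonts/Emmentaler-20.font.ps", "fonts/TeXGyreSchola-Regular.font.ps"],
    "include-bigpdfs-noembed.pdf",
    "include-bigpdfs-noembed-gse.pdf",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a system where a suitable `gs` is installed; a sorted `glob.glob("fonts/*.font.ps")` would stand in for the shell wildcard.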

With this method, I think the encoding for TrueType fonts
gets confused and broken, as you mentioned.



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-23 Thread Ken Sharp

At 02:35 23/09/2017 +0900, Masamichi Hosoda wrote:



>> Converting to TeX format would probably work, but apparently there
>> were problems with that.
>>
>> Is there some other approach available ?
>
> There is a method of using font non-embedded PDF.
> In my experiment, it seems to work fine except TrueType fonts.


That was what I had tried. IMO it's the fact that it's an OpenType font, i.e. a
TrueType font with CFF outlines, which is the problem rather than it being
TrueType.


However, I haven't attempted to diagnose why that didn't work either.

Note that I didn't attempt this on 9.21, just the 9.22 release candidate.
While the fonts are properly dropped from the individual PDF files, none of
the text was visible in the final PDF file, when rebuilding with Emmentaler
supplied as an external font in fontmap.GS.


It's interesting that the OTF worked for you because I would expect it to;
OTF fonts with CFF outlines are handled simply by extracting the CFF, which
is usable directly in PostScript. TrueType fonts are not directly
supported by PostScript and that's where the heuristics come back again, we
have to guess at some aspects of the encodings.




Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread Masamichi Hosoda
>>If there is a full font embedded (non-subset) PDF,
>>does the bigpdfs trick work effectively?
> 
> It's still, in my opinion, a risky thing to do. However, if the font
> were fully embedded, you wouldn't need to use Ghostscript and the
> PDFDontUseFontObjectNum bug approach (which is the risky part).
> 
> Because the fonts would be genuinely identical, MuPDF would be able to
> spot the font streams at least as being the same and would be able to
> reliably remove the duplicates.
> 
> The Font and FontDescriptor dictionaries might not be possible to
> remove, so the effect wouldn't be quite as good as the current
> approach, but these dictionaries run to a few tens or at most a few
> hundred bytes. The FontFile streams are where most of the space is
> going, and those would be possible to remove, if they were truly
> duplicates.

I understand that the bigpdfs trick should be fixed
to use MuPDF instead of the PDFDontUseFontObjectNum bug.

>>Or, even so, should we take other methods (e.g. using non-embedded
>>PDFs)?
> 
> I'm not sure what other methods there would be. Using EPS inclusions
> would have the same effect as PDF. Rendering to bitmaps would be (I'd
> guess) as large as the PDF files, and would suffer from
> non-scalability.
> 
> Converting to TeX format would probably work, but apparently there
> were problems with that.
> 
> Is there some other approach available ?

There is a method that uses PDFs without embedded fonts.
In my experiment, it seems to work fine except for TrueType fonts.

It uses something like
`-c ".setpdfwrite << /NeverEmbed [ /Emmentaler-20 /TeXGyreSchola-Regular ] >> setdistillerparams"`
instead of
`-sOutputFile=filename.pdf`.
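
When the list of fonts varies, the PostScript fragment passed to `-c` can be generated rather than typed by hand. The following is a sketch only: the helper name `never_embed_ps` is hypothetical, and it simply reproduces the string shown above for a given list of font names.

```python
from typing import Sequence


def never_embed_ps(font_names: Sequence[str]) -> str:
    """Return the PostScript fragment that lists fonts under /NeverEmbed."""
    # Each font name becomes a PostScript name literal, e.g. /Emmentaler-20.
    names = " ".join(f"/{name}" for name in font_names)
    return f".setpdfwrite << /NeverEmbed [ {names} ] >> setdistillerparams"


fragment = never_embed_ps(["Emmentaler-20", "TeXGyreSchola-Regular"])
print(fragment)
```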

I've made sample files `20170922_lilypond_eps_pdf_examples.tar.xz`.
https://drive.google.com/file/d/0ByGBX3PDrqjsSFhVdXJfbjFjRlk/view?usp=sharing

In the tarball:

`fonts?-bigpdfs.eps`: EPS that is generated by LilyPond.
`fonts?-bigpdfs-noembed.pdf`: PDF without embedded fonts
  that is generated by Ghostscript
  with the `/NeverEmbed` parameter as above.
`include-bigpdfs-noembed.pdf`: PDF output by TeX that lacks the LilyPond fonts.
`include-bigpdfs-noembed-gse.pdf`: Final PDF with the fonts embedded
   by Ghostscript.

In `include-bigpdfs-noembed-gse.pdf`,
OTF seems to be fine. However, some TTF glyphs are broken.



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread Ken Sharp

At 10:12 22/09/2017 +0200, David Kastrup wrote:



> Well, if we could delay the embedding, I'd not be particularly sad:
> "make doc" currently(?) eats up more than 3Gb which is sort of
> ridiculous.  The intermediate PDFs for lilypond-book are arranged in
> some "database" and not really externalized, so if they don't work on
> their own, this isn't a showstopper.

I guessed this was the case, and that's what I was aiming for; sadly it
doesn't work.




> At any rate, any such strategy could not be implemented and tested in
> short time, so if in the mean time the font merging expedient would stay
> available for some time, it would make things a lot smoother for us.

Clearly it'll have to, whatever happens next. There simply isn't time to
implement and test anything else. And I'd be very nervous about putting any
changes into the release without a decent interval for bugs to emerge.




> We are not really striving for "optimum" rather than "better than awful"
> regarding the resulting file sizes.  This seems like being close enough.

Again I guessed this was probably the case, but it's good to hear it for
sure. As I said in reply to suzuki toshiya, if the font isn't being fully
embedded I'd be inclined to regard that as a bug (slightly diffident as I
don't even know *why* it's not being fully embedded yet, there might be a
good reason).


If the font(s) were fully embedded, then mutool could remove the duplicates
from the final file. Caveat: the intermediate PDF files will be bigger,
possibly a *lot* bigger. Currently the font streams are running at ~9KB,
the full font is ~65KB uncompressed, let's say 30 KB compressed (CFF fonts
don't compress well). So you're looking at each file growing by about 40KB.
If the 3xfont embedding I see in 9.22 is real, then that becomes 100KB,
maybe more if it's one font per character.


I'm sorry to keep repeating but I do need to take a good chunk of time to
look into this, which I don't have right now.


I need to see why the font isn't being fully embedded, and why 9.22 is
apparently embedding multiple fonts when I'd only expect one. The text and
font code is particularly difficult to debug and amend, so this is probably
several weeks' work.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread Ken Sharp

At 10:23 22/09/2017 +0200, David Kastrup wrote:


> If it's a conceivable part of a good longterm strategy: I think our
> fonts are generated via Fontforge starting with a METAFONT (or
> METAPOST?) font description, so it's conceivable that if other font
> formats would generally be better supported by toolchains in general
> that we could possibly tell Fontforge to generate other formats.
>
> I have very little clue of what is involved here, and circumnavigating a
> possibly temporary Ghostscript bug alone would likely not be enough of
> an incentive for investing that work.  It would really depend on the
> quality of general support (mostly in the Free Software world) that we
> could count on whether format/technology changes make sense here.


I think the most important thing is for me to get a grip on what's actually 
going on. I believe we intend to improve support for OTF fonts as 
substitutes for missing Fonts, but that's a long term project, so it only 
gets worked on when there are free cycles (and not by me any more) so 
eventually that problem may go away.


But there's no reason to wait for that, it was only a possible way to solve 
a problem. Looking at the real problems, potentially bugs, that I see is 
probably more important and more productive.




Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread David Kastrup
Ken Sharp  writes:

> At 00:41 22/09/2017 +0200, David Kastrup wrote:
>
>
>> > Or, even so, should we take other methods (e.g. using non-embedded PDFs)?
>>
>>If we figure out a working alternative, we should take it.  The current
>>set of Ghostscript bugs in 9.22 is still a bit in flux, so it's not
>>clear yet which alternative actually could work.
>>
>>Is that a reasonable summary of the current state, Ken?
>
> I'd say so, yes. I can't think of a reasonable alternative right at
> the moment, which will yield the same or at least similar output file
> size.

[...]

> I've no idea why they aren't fully embedded but I'd have to guess it's
> because they are CFF outlines, we don't see a lot of those. So it
> smells like a bug. I will look at it, as soon as I get some time, but
> it's not likely to be a change we'll put into 9.22 given the state of
> the release cycle. In fact, realistically, it's unlikely I'll even get
> the time to look at it before the release is complete.

If it's a conceivable part of a good longterm strategy: I think our
fonts are generated via Fontforge starting with a METAFONT (or
METAPOST?) font description, so it's conceivable that if other font
formats would generally be better supported by toolchains in general
that we could possibly tell Fontforge to generate other formats.

I have very little clue of what is involved here, and circumnavigating a
possibly temporary Ghostscript bug alone would likely not be enough of
an incentive for investing that work.  It would really depend on the
quality of general support (mostly in the Free Software world) that we
could count on whether format/technology changes make sense here.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread David Kastrup
Ken Sharp  writes:

> At 07:01 22/09/2017 +0900, Masamichi Hosoda wrote:
>
>>If there is a full font embedded (non-subset) PDF,
>>does the bigpdfs trick work effectively?
>
> It's still, in my opinion, a risky thing to do. However, if the font
> were fully embedded, you wouldn't need to use Ghostscript and the
> PDFDontUseFontObjectNum bug approach (which is the risky part).

Well, if we could delay the embedding, I'd not be particularly sad:
"make doc" currently(?) eats up more than 3Gb which is sort of
ridiculous.  The intermediate PDFs for lilypond-book are arranged in
some "database" and not really externalized, so if they don't work on
their own, this isn't a showstopper.  Generating the images is usually
done by a limited number of LilyPond jobs (depending on the number of
processors available), each of them converting hundreds of input files
into corresponding PDF files.  It would be conceivable to at least keep
some sort of font identifier consistent in a single job.  Embedding a
font 10 times (for 10 graphics-generating jobs) seems at least better
than embedding it hundreds of times.

At any rate, any such strategy could not be implemented and tested in
short time, so if in the mean time the font merging expedient would stay
available for some time, it would make things a lot smoother for us.

> The Font and FontDescriptor dictionaries might not be possible to
> remove, so the effect wouldn't be quite as good as the current
> approach, but these dictionaries run to a few tens or at most a few
> hundred bytes. The FontFile streams are where most of the space is
> going, and those would be possible to remove, if they were truly
> duplicates.

We are not really striving for "optimum" rather than "better than awful"
regarding the resulting file sizes.  This seems like being close enough.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread Ken Sharp

At 00:41 22/09/2017 +0200, David Kastrup wrote:



>> Or, even so, should we take other methods (e.g. using non-embedded PDFs)?
>
> If we figure out a working alternative, we should take it.  The current
> set of Ghostscript bugs in 9.22 is still a bit in flux, so it's not
> clear yet which alternative actually could work.
>
> Is that a reasonable summary of the current state, Ken?


I'd say so, yes. I can't think of a reasonable alternative right at the 
moment, which will yield the same or at least similar output file size. 
Especially given the time scale of our ongoing release. The only other 
approach I could think of didn't work. If someone has other ideas I'll be 
happy to try them out or at least think them over.


As I said in my reply (sorry I saw Masamichi's mail first and replied to it 
first), *if* the fonts were fully embedded which, from a first glance they 
should be, then you wouldn't need this trickery. You could just use MuPDF 
to remove the duplicated FontFile objects, because they'd really be identical.


I've no idea why they aren't fully embedded but I'd have to guess it's
because they are CFF outlines, we don't see a lot of those. So it smells
like a bug. I will look at it, as soon as I get some time, but it's not
likely to be a change we'll put into 9.22 given the state of the release
cycle. In fact, realistically, it's unlikely I'll even get the time to look
at it before the release is complete.


So this is something that probably needs to be looked at after the release, 
preferably at leisure. Time pressure sort of makes this a lot worse.


More worrying is the fact that when I run the EPS files here with the 
current release candidate, I don't get one copy of Emmentaler-20 in the 
output PDF files, I get three. For me that didn't make any difference in 
the final output file, but it is a concern because I don't know why that 
would have changed.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-22 Thread Ken Sharp

At 07:01 22/09/2017 +0900, Masamichi Hosoda wrote:


> If there is a full font embedded (non-subset) PDF,
> does the bigpdfs trick work effectively?

It's still, in my opinion, a risky thing to do. However, if the font were
fully embedded, you wouldn't need to use Ghostscript and the
PDFDontUseFontObjectNum bug approach (which is the risky part).


Because the fonts would be genuinely identical, MuPDF would be able to spot 
the font streams at least as being the same and would be able to reliably 
remove the duplicates.


The Font and FontDescriptor dictionaries might not be possible to remove, 
so the effect wouldn't be quite as good as the current approach, but these 
dictionaries run to a few tens or at most a few hundred bytes. The FontFile 
streams are where most of the space is going, and those would be possible 
to remove, if they were truly duplicates.
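
The idea that byte-identical font streams can be collapsed reliably, while differing subsets cannot, can be shown with a toy model. This is not how MuPDF is implemented; it is only a sketch that treats each FontFile stream as a byte string keyed by a made-up object number and merges streams whose SHA-256 digests match.

```python
import hashlib
from typing import Dict, Tuple


def dedupe_streams(
    streams: Dict[int, bytes]
) -> Tuple[Dict[int, bytes], Dict[int, int]]:
    """Keep one copy of each distinct stream; map every object to its keeper."""
    first_seen: Dict[str, int] = {}  # digest -> object number that is kept
    kept: Dict[int, bytes] = {}
    remap: Dict[int, int] = {}
    for number in sorted(streams):
        digest = hashlib.sha256(streams[number]).hexdigest()
        if digest in first_seen:
            # Identical content: point this object at the copy already kept.
            remap[number] = first_seen[digest]
        else:
            first_seen[digest] = number
            kept[number] = streams[number]
            remap[number] = number
    return kept, remap


# Fully embedded font: the same bytes appear in every included file.
full_kept, full_remap = dedupe_streams({1: b"FULL", 2: b"FULL", 3: b"FULL"})
# Subsets: each file carries different bytes, so nothing can be merged.
sub_kept, sub_remap = dedupe_streams({1: b"SUB-A", 2: b"SUB-B", 3: b"SUB-C"})

print(len(full_kept), full_remap)
print(len(sub_kept), sub_remap)
```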




> Or, even so, should we take other methods (e.g. using non-embedded PDFs)?


I'm not sure what other methods there would be. Using EPS inclusions would 
have the same effect as PDF. Rendering to bitmaps would be (I'd guess) as 
large as the PDF files, and would suffer from non-scalability.


Converting to TeX format would probably work, but apparently there were 
problems with that.


Is there some other approach available ?


Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread David Kastrup
Masamichi Hosoda  writes:

>>>We use the following command to convert from EPS to PDF.
>>>
>>>$ gs -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH
>>>-r1200 -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None
>>>-sOutputFile=filename.pdf -c.setpdfwrite -ffilename.eps
>>>
>>>We believed that Ghostscript generates full font embedded (non-subset)
>>>PDF
>>>when `-dSubsetFonts=false` is specified.
>> 
>> It should. In this case, I believe it does not. No clue why at the
>> moment though.
>
> Thank you for your answer.
>
> If there is a full font embedded (non-subset) PDF,
> does the bigpdfs trick work effectively?

If I understood Ken correctly: not in the current implementation with
the 9.22rc1 but he's sympathetic to putting the option that became
necessary in 9.21 in order to merge fonts with potential different
object numbers back to work in 9.22, with the understanding that it may
stop existing at an unspecified time later, and that its use might well
buy us other bugs (that have been reported by other users of
Ghostscript).

> Or, even so, should we take other methods (e.g. using non-embedded PDFs)?

If we figure out a working alternative, we should take it.  The current
set of Ghostscript bugs in 9.22 is still a bit in flux, so it's not
clear yet which alternative actually could work.

Is that a reasonable summary of the current state, Ken?

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Masamichi Hosoda
>>We use the following command to convert from EPS to PDF.
>>
>>$ gs -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH
>>-r1200 -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None
>>-sOutputFile=filename.pdf -c.setpdfwrite -ffilename.eps
>>
>>We believed that Ghostscript generates full font embedded (non-subset)
>>PDF
>>when `-dSubsetFonts=false` is specified.
> 
> It should. In this case, I believe it does not. No clue why at the
> moment though.

Thank you for your answer.

If there is a full font embedded (non-subset) PDF,
does the bigpdfs trick work effectively?
Or, even so, should we take other methods (e.g. using non-embedded PDFs)?



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Ken Sharp

At 21:43 21/09/2017 +0900, Masamichi Hosoda wrote:



> We use the following command to convert from EPS to PDF.
>
> $ gs -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200
> -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None
> -sOutputFile=filename.pdf -c.setpdfwrite -ffilename.eps
>
> We believed that Ghostscript generates full font embedded (non-subset) PDF
> when `-dSubsetFonts=false` is specified.


It should. In this case, I believe it does not. No clue why at the moment 
though.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Ken Sharp

At 14:43 21/09/2017 +0200, Knut Petersen wrote:

> The fonts in the pdfs are identical fonts constructed by ghostscript on
> the fly, I think it was Ken Sharp who explained to me some years ago that
> the term "subset" is wrong ;-)


Well, sort of, they aren't identical though, they are all different, but 
yes constructed fonts. But if you set SubsetFonts=false, then I'd expect 
the full font to be embedded, regardless of which glyphs are used.


That's not quite the same as constructing a fully populated new font, but 
it may well explain why SubsetFonts=false isn't having the result I'd expect.


Except.

I thought that when you set bigpdfs, you used 'show' instead of
'glyphshow', and that takes you down a totally different code path, where
Ghostscript/pdfwrite *doesn't* construct a font. It only does that if the
PostScript uses glyphshow, because there is no glyphshow in PDF. Instead it
just uses the font it has. If you don't subset the font then it also
doesn't re-encode it (which is important for your workflow).


So, if you aren't using glyphshow, then the logic is different, and the
fonts really are subsets. Except that they shouldn't be, because
-dSubsetFonts=false says to embed the entire font.


I haven't had the time to check what's actually going on yet, I've had to 
go back to working. I'm reasonably certain you aren't using glyphshow, 
because if you were pdfwrite would create fonts with different encodings, 
and this hack wouldn't work, you'd get the wrong output. So, in this case, 
it is correct to call the fonts subsets. The problem is, they shouldn't be 
subset.



> One emmentaler font + three encodings + one character (scaled to
> invisibility) of each encoding used prior to anything else in the ps
> leads ghostscript to produce three different subsets ;-)) of the
> emmentaler font in every pdf. But the set of 3 "subsets" is identical in
> any pdf that is produced this way, and so gs is (was) able to remove the
> duplicates. That's the --bigpdf trick.


That's not what I see, nor what I would expect. Unless you are using 
glyphshow, but if you were doing that then I believe the encodings would 
differ significantly and you would get collisions in the encodings, which 
would mean the bigpdf trick would produce garbled output.


The PDF files you supplied each contain 1xEmmentaler-20 font, and each one
has a FontFile (the actual data) of a different size. So the fonts in each
case are, actually, different. Again I haven't checked (and it's probably
not worth it) but the subsets certainly don't contain the full set of
glyphs and probably only contain the glyph descriptions of the glyphs that
were used.


I don't disagree with the expectation, but what you expect isn't what's in 
the files.


That doesn't prevent the trick you are using from working, because all the 
fonts have the same name, so if you don't consider the filenames and font 
object numbers, then Ghostscript (falsely) considers them to be the same 
font. Provided the Encoding is the same (or at least compatible, and 
pdfwrite checks that) for each of the fonts, they can safely be treated as 
the same font.


We only gather the glyph descriptions as they are used because, in
PostScript, it's possible to incrementally download a font, so the glyph
description might not be available until it's used. So we can happily copy
the used glyphs from instance 'A' of the font and instance 'B' of the font
(at this point we think they are the same font, possibly with some glyphs
added since we last looked at it), and combine them into one final
destination font.


Now as long as there are no character encodings in the 2 fonts which have 
different glyphs at the same character code, everything is fine. The 
problem arises if you have two fonts with the same name, but *different* 
glyphs at the same code point. Because we think they are the same font, 
when we see the second use of the code point, we *don't* copy the glyph. We 
see that we already have a glyph at that location, and it must be the same 
one, because this is the same font, right ? So we use the existing one.
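
As an illustration only, the "copy a glyph on first use" behaviour described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Ghostscript's actual code: the function name `merge_by_name` is hypothetical, each font instance is reduced to a mapping from character code to glyph name, and the glyph names other than noteheads.s2 are placeholders.

```python
def merge_by_name(instances):
    """Merge font instances that share a name, keeping the first glyph seen per code."""
    merged = {}       # character code -> glyph name kept in the combined font
    collisions = []   # (code, glyph kept, glyph ignored)
    for encoding in instances:
        for code, glyph in encoding.items():
            if code not in merged:
                # First use of this code: copy the glyph into the combined font.
                merged[code] = glyph
            elif merged[code] != glyph:
                # Code already present with a different glyph: the new glyph is
                # never copied, so text using it would render with the old one.
                collisions.append((code, merged[code], glyph))
    return merged, collisions


# Two instances with compatible encodings merge cleanly.
a = {0x01: "noteheads.s0", 0x6F: "noteheads.s2"}
b = {0x6F: "noteheads.s2", 0x70: "rests.0"}
merged, collisions = merge_by_name([a, b])

# A third instance that reuses code 0x6F for a different glyph collides.
c = {0x6F: "clefs.G"}
_, bad = merge_by_name([a, b, c])
```

In this model the first two instances combine into one font with three glyphs, while the third instance triggers exactly the silent wrong-glyph case the paragraph above warns about.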


You get away with this because, in your workflow, there are no collisions 
in encoding with the various fonts. If you were using glyphshow I'm fairly 
certain this would not be the case.



However, what if you used the same font in TeX ? I don't necessarily mean 
the Emmentaler font, I note that there's a font called something like 
TeXGyreSchola-Regular in the Lilypond files too, and that will be getting 
the same treatment as Emmentaler-20. If someone used that font in TeX 
itself, then potentially there's a problem. You could end up with the 
encodings colliding and get the wrong glyph when the PDF file is rendered.


Obviously I'm not sure this is a valid concern, I presume for your special 
case of creating documentation it isn't, but in the general case I would 
think it would be.



I agree that mutool clean can be a good starting point. If I read 

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Knut Petersen


Well, it occurs to me that the *real* problem here is that the fonts in the individual PDF files are subsets. If they were not, then I believe you could safely and easily use MuPDF (specifically mutool clean) to remove the duplicate fonts. Or at least, the duplicate FontFile streams, I'm not 
certain if the Font and FontDescriptor objects would be possible to remove as well. But that would certainly cover a good portion of the file size, the fonts are running at about 9Kb each, while the Font and FontDescriptor objects are a few tens of bytes.


The fonts in the pdfs are identical fonts constructed by ghostscript on the fly, I think 
it was Ken Sharp who explained to me some years ago that the term "subset" is 
wrong ;-)

One emmentaler font + three encodings + one character (scaled to invisibility) of each encoding used prior to anything else in the ps leads ghostscript to produce three different subsets ;-)) of the emmentaler font in every pdf. But the set of 3 "subsets" is identical in any pdf that is produced 
this way, and so gs is (was) able to remove the duplicates. That's the --bigpdf trick.


I agree that mutool clean can be a good starting point. If I read the documentation 
correctly, it does "clean" (remove) unused objects, but it is unable to subset 
fonts if not all glyphs of the fonts are used?

So the question then becomes 'why are the fonts subset ?' That's a really good question, and the answer is that I don't know. It's possible that there is a genuine pdfwrite bug here. The piece of information I'm missing is the step used to create the PDF files from the EPS files, I don't know how 
you are doing that.


lilypond spawns ghostscript. If our --bigpdf option is used the command is e.g.:

    gs -q -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200 
-dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None 
-sOutputFile=testa.pdf -c.setpdfwrite -ftesta.eps



My attempts to replicate the individual PDF files have been entirely 
unsuccessful, I get files with three copies of the Emmentaler font embedded 
instead of 1, and none of the three fonts match the ones in the PDF files Knut 
supplied.


I used tag ghostscript-9.21 from the git repository.


Hmm, actually, going back to the 9.21 release does produce at least similar 
behaviour, whereas the 9.22 release does not. In 9.22 I get three fonts output 
instead of 1. I've no idea why currently, and right at the moment I don't have 
time to look.

I'll try and remember to look at it when I am not drowning under support, but 
it looks like there have been changes in this area unrelated to the 
PDFDontUseFontObjectNum bug, and that in itself may mean that your process doesn't 
work any more, or works less well.


Thanks for your patience!

Knut



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Masamichi Hosoda
> So the question then becomes 'why are the fonts subset ?' That's a
> really good question, and the answer is that I don't know. It's
> possible that there is a genuine pdfwrite bug here. The piece of
> information I'm missing is the step used to create the PDF files from
> the EPS files, I don't know how you are doing that.

We use the following command to convert from EPS to PDF.

$ gs -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200 
-dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None 
-sOutputFile=filename.pdf -c.setpdfwrite -ffilename.eps

We believed that Ghostscript generates full font embedded (non-subset) PDF
when `-dSubsetFonts=false` is specified.



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-21 Thread Ken Sharp

At 18:50 20/09/2017 +0200, David Kastrup wrote:


Did you get to see the PostScript files before conversion with pstopdf?
Would being able to generate those differently make a difference?


I'm pretty sure Knut sent me everything, really everything. Not that I can 
use it all, but it's nice to have the complete set just in case.


The problem (for my idea) is not the generation of the individual 
PostScript files, or the individual PDF files. However, there is some more 
information on the process at the end of this mail which is (slightly) 
illuminating, feel free to skip ahead past this explanation.


-
What I was hoping to do (and this works for my test cases with simpler 
fonts) was create the PDF files from the PostScript with only font 
references, no font data embedded. Then create the final PDF, still with no 
font data. Finally run that back through Ghostscript with the font 
available to it. Then the individual uses of the font would pick up the one 
and only font available, referenced from Ghostscript, and embed it.


That would (and does for my tests) create a final PDF file with only one 
instance of the font.


The problem is that supporting non-PostScript fonts from disk as 
replacements for PostScript fonts is tricky, it involves a certain amount 
of guesswork to fill in missing information. Our support for TrueType fonts 
isn't bad, but for OTF fonts (those with CFF outlines) it isn't as good. Also, the 
nature of the font makes the guesswork rather more difficult, since it is 
mostly a 'symbolic' font.


So basically that won't work, at least as things stand now.
-



Those 125GB files, I wager, are for one-time printing or further
compression, not for public download from a website.  So the comparison
is not entirely fair.


Well that one's anomalous, certainly, but we do have people passing around 
multi-gigabyte files for download. Also, the last game I picked up was 
20GB, and that was a download only.


But, not important as I think I said.



Now, during the investigation of the files Knut sent me I did notice a few 
things.


From what I understand of the process, the intention is that the entire 
font is downloaded with each of the individual EPS files, and then the PDF 
file which is created should contain the entire font (I'm fairly sure 
someone said this). Then the individual PDF files are merged together in 
TeX, presumably along with some other text, producing a PDF file where 
there are multiple, identical, full copies of the font. You then take 
advantage of the Ghostscript bug to treat all the copies of the font as 
being the same.


I'm sorry to disappoint you, but that's not what is happening.

If the process were happening as described, then I believe mutool would be 
quite able to detect the duplicate font streams in the final PDF file and 
remove them. The reason that doesn't work is because the fonts embedded in 
the individual PDF files are not complete, they are subsets. Worse still, 
they don't have subset prefixes on the font name, so it's not even clear 
they are subsets.
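
To make the point concrete: a de-duplication pass that only looks for byte-identical streams can be sketched roughly as below. This is a minimal illustration and an assumption about the general approach, not a description of how mutool is implemented; `dedupe_streams` and the object numbers other than 17 are hypothetical.

```python
import hashlib


def dedupe_streams(streams):
    """Map each object number to the first object holding byte-identical data."""
    first_seen = {}   # digest -> object number of the first copy
    remap = {}        # object number -> object number to reference instead
    for obj_num, data in streams.items():
        digest = hashlib.sha256(data).hexdigest()
        # setdefault returns the earlier object number when the digest is known.
        remap[obj_num] = first_seen.setdefault(digest, obj_num)
    return remap


full_font = b"full font data"
streams = {
    17: full_font,      # complete font
    42: full_font,      # identical copy: can be folded into object 17
    63: b"subset A",    # differing subsets: nothing to fold
    88: b"subset B",
}
remap = dedupe_streams(streams)
```

Identical full copies collapse onto one object, but subsets with different contents each hash differently and survive, which is the situation in the files I was sent.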


For example, Knut sent me a bunch of EPS files and the PDF files created 
from them, called testa-1.eps to teste-1.eps. Looking at the EPS files I see:


%%IncludeResource: ProcSet (FontSetInit)
%%BeginResource: FontSet (Emmentaler-20)
/FontSetInit /ProcSet findresource begin
%%BeginData: 64933 Binary Bytes

The following binary looks the same to me, I haven't bothered to check 
precisely. All the EPS files appear to contain the same data. So I'll 
assume that's a complete copy of the font. Note the size, just short of 65Kb.


But, looking at the PDF files, I see quite different results.

Testa-1.pdf:

9 0 obj
<<
  /BaseFont /Emmentaler-20
  /FontDescriptor 10 0 R
  /Type /Font
  /FirstChar 7
  /LastChar 176
  /Widths [ 641 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
490 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 424 ]
  /Encoding 18 0 R
  /Subtype /Type1
>>
endobj

10 0 obj
<<
  /Type /FontDescriptor
  /FontName /Emmentaler-20
  /FontBBox [ 0 -635 645 1196 ]
  /Flags 4
  /Ascent 1196
  /CapHeight 1196
  /Descent -635
  /ItalicAngle 0
  /StemV 96
  /MissingWidth 500
  /FontFile3 17 0 R
>>
endobj

17 0 obj
<<
  /Length 9653
  /Subtype /Type1C
>>
stream


Testb-1.pdf

10 0 obj
<<
  /BaseFont /Emmentaler-20
  /FontDescriptor 11 0 R
  /Type /Font
  /FirstChar 7
  /LastChar 176
  /Widths [ 641 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 344 0 0 0 

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread David Kastrup
Ken Sharp  writes:

> At 14:57 20/09/2017 +0200, Knut Petersen wrote:
>
>>I sent a collection of files to Ken.
>
> Well, my idea doesn't work with your font, because (I think) it's an
> OTF font. I had hoped it would be possible to create the PDF files
> with *no* fonts embedded at all, then have Ghostscript embed them just
> the once when emitting the final file. This works for my simple tests,
> but not for your files/fonts.

Did you get to see the PostScript files before conversion with pstopdf?
Would being able to generate those differently make a difference?

> When we have customers wanting to send us 125GB files I have to say
> that a concern over file sizes in the few megabytes seems a bit picky.

Those 125GB files, I wager, are for one-time printing or further
compression, not for public download from a website.  So the comparison
is not entirely fair.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread Ken Sharp

At 14:57 20/09/2017 +0200, Knut Petersen wrote:


I sent a collection of files to Ken.


Well, my idea doesn't work with your font, because (I think) it's an OTF 
font. I had hoped it would be possible to create the PDF files with *no* 
fonts embedded at all, then have Ghostscript embed them just the once when 
emitting the final file. This works for my simple tests, but not for your 
files/fonts.


I've basically run out of time to look any further. In my opinion you would 
be better to embed subset fonts in all the PDF files and live with the size 
of that, or create larger figures so that you have fewer of them and 
therefore fewer fonts embedded.


When we have customers wanting to send us 125GB files I have to say that a 
concern over file sizes in the few megabytes seems a bit picky.


However, that's clearly not going to sway anyone, so I'll have to give up. 
Obviously you can do what you will, however I will warn you, one last time, 
that what you are doing is taking advantage of a bug.


There are two consequences to this; firstly that the actual bug which the 
current Ghostscript code is designed to fix may one day affect you as well. 
Secondly, if PDFDontUseFontObjectNum ever stops working because we have 
altered the code to fix some other problem, I (and probably any successor) 
won't feel obligated to repair it, because it's a bug, not a feature.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread David Kastrup
Knut Petersen  writes:

> Am 20.09.2017 um 09:50 schrieb Ken Sharp:
>> If someone can create a couple of PostScript files, ideally genuine
>> examples of the files you would use for your manual, created as you
>> would create them for the manual (ie with the bigpdf switch) then I
>> can experiment a bit. I don't need any PDF files, just the
>> PostScript you would send to Ghostscript to create a PDF file.
>>
>
> I sent a collection of files to Ken.

Great, thanks!  Did you include the intermediate PostScript written by
LilyPond (I think either lilypond or lilypond-book has an option for
leaving intermediate files around)?

I suspect that those may be of high interest since that is where what
LilyPond does is reflected directly.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread Knut Petersen

Am 20.09.2017 um 09:50 schrieb Ken Sharp:
If someone can create a couple of PostScript files, ideally genuine examples of the files you would use for your manual, created as you would create them for the manual (ie with the bigpdf switch) then I can experiment a bit. I don't need any PDF files, just the PostScript you would send to 
Ghostscript to create a PDF file.




I sent a collection of files to Ken.

Knut




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread Werner LEMBERG

> I'm surprised you find it necessary to have different fonts for
> different point sizes though.

For full scores, this is *extremely* important.  Today, music
publishers no longer typeset pocket scores (approx. in A5 format)
separately in most cases, mainly to save money.  Instead, they simply
scale down the A4 (or larger) full scores, which makes the result
almost illegible due to extremely thin lines.


Werner



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-20 Thread Ken Sharp

At 22:30 19/09/2017 +0200, Knut Petersen wrote:


What happens if you include several  "final" pdfs in a *TeX document?

If you include several pdfs generated as described above in a 
*TeX-generated pdf, all fonts from the lilypond pdfs are included. 
Probably all are different. If you  feed the  *TeX-pdf to ghostscript, 
ghostscript sees different fonts (although all are constructed from 
emmentaler glyphs). ghostscript never was able to merge those fonts, and 
it probably never will be able.


Quite so. The use of glyphshow means that a font must be created. Each such 
font is effectively unique in its layout. It is technically possible to 
reconstruct a small number of fonts which include all the potential glyphs, 
and remap all the character codes in all the PDF files so that they use 
these fonts. In practice this is unfeasible.




We include thousands of lilypond-pdfs in a TeX document


OK so Lilypond produces its PostScript in an entirely different (but better 
from a PostScript programmer's point of view) way. This explains why I 
wasn't able to understand David's point, my prior exposure to Lilypond 
output didn't look like that.



I do have a notion of a way to solve this, using Ghostscript but without 
exploiting the old bug, but before I discuss it I'd like to test it 
further, with real Lilypond data. Just in case it turns out not to work 
with your files.


If someone can create a couple of PostScript files, ideally genuine 
examples of the files you would use for your manual, created as you would 
create them for the manual (ie with the bigpdf switch) then I can 
experiment a bit. I don't need any PDF files, just the PostScript you would 
send to Ghostscript to create a PDF file.


Oh, and I'd need whatever fonts you are using as well. I'm surprised you 
find it necessary to have different fonts for different point sizes though.




Ken





Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Karlin High
On 9/19/2017 11:25 AM, Ken Sharp wrote:
> not disabling subsetting when creating the PDF files would make some 
> savings, probably quite significant, but I can't tell without seeing 
> some examples of the PDF files in question.

If it helps, here are some example PDF files. I believe this discussion 
began from them.

Gotlandstoner, a collection of about 500 traditional fiddle tunes from 
Gotland, Sweden.
https://github.com/erikronstrom/gotlandstoner/releases/download/v1.0/214-416.pdf (9.49 MB)
https://github.com/erikronstrom/gotlandstoner/releases/download/v1.0/417-727.pdf (13.01 MB)

If the files from those links have sizes smaller than they are noted 
here, that means they have been reproduced with a more-optimized process 
since this discussion started.

For more info about the files, a link to their source code, and a 
description of their production process:
https://lists.gnu.org/archive/html/lilypond-user/2017-08/msg00399.html

For a report on effects of de-duplicating the fonts:
https://lists.gnu.org/archive/html/lilypond-user/2017-09/msg00034.html
"I get the following results with ghostscript 9.06:
bok2: Original size 9.996.691 bytes, optimized size 2.043.380 bytes
bok3: Original size 13.706.324 bytes, optimized size 2.447.232 bytes"
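
For scale, the reductions implied by those byte counts can be worked out with plain arithmetic. The helper name `reduction` is only for illustration; the numbers are the ones quoted in the report above.

```python
def reduction(original, optimized):
    """Fraction of the original size removed by optimization."""
    return 1 - optimized / original


# Byte counts as reported for ghostscript 9.06.
bok2 = reduction(9_996_691, 2_043_380)
bok3 = reduction(13_706_324, 2_447_232)

print("bok2: %.1f%% smaller, bok3: %.1f%% smaller" % (bok2 * 100, bok3 * 100))
```

Both files lose roughly four fifths of their size once the duplicate fonts are merged.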
--
Karlin High
Missouri, USA


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Knut Petersen

Hi everybody!


So you aren't gaining any benefit from exploiting the Ghostscript bug
with the Lilypond output.

But we are.  I hope that Masamichi-san (or Kurt?) can provide the
details here in order to give you a better picture.

Some technical details about our usage of the bug/feature we discuss.

*Our fonts*
The font family with the symbols used for notation is called emmentaler, it is 
provided in several design sizes.
To keep this short: Below I use 20 as the design size of the font and will not 
mention other sizes as they are handled identically.
Emmentaler fonts are otf fonts built from our own sources.
Every emmentaler font contains more than 2*2^8 but less than 3*2^8 symbols.
There is also an  emmentaler-brace font, it is handled identically. To keep 
this short I will not mention emmentaler-brace again.

*Lilypond is used for single documents that are intended to be the final 
document*

If lilypond is used to write a document that is not intended to be included in other documents, it generates a postscript file and feeds that to ghostscript. Ghostscript does a good job: it constructs a font from the glyphs of our emmentaler font that contains only the glyphs actually used in the 
document, includes it in the generated pdf, and the result is a pdf with as little font data as possible.


If lilypond is used to write a slightly different document, chances are very 
high that the font ghostscript constructs from the glyphs of our emmentaler 
font is different because different symbols are included.

*What happens if you include several  "final" pdfs in a *TeX document?*

If you include several pdfs generated as described above in a *TeX-generated pdf, all fonts from the lilypond pdfs are included. Probably all are different. If you  feed the  *TeX-pdf to ghostscript, ghostscript sees different fonts (although all are constructed from emmentaler glyphs). ghostscript 
never was able to merge those fonts, and it probably never will be able.


*We include thousands of lilypond-pdfs in a TeX document*

Now we plan to write a big TeX document with thousands of different pdfs. All are based 
on different lilypond source files, "lilypond --bigpdf"  is used. Lilypond 
generates postscript files and calls ghostscript, but the postscript files generated by 
lilypond are different:

 * Now every postscript file contains three encodings (an encoding cannot 
contain more than 256 glyphs) with glyphs from the emmentaler font ( 
LilyNoteHeadEncoding, LilyScriptEncoding and LilyOtherEncoding).  Every glyph 
of the emmentaler font is included in exactly one of the encodings.
 * For every glyph we define a command (e.g. "/noteheads.s2 {<6f> show} def").
 * We include font directories, e.g.:

   FontDirectory /Emmentaler-20 known {
  /Emmentaler-20 findfont dup length dict copy begin
  /Encoding LilyNoteHeadEncoding def
  /Emmentaler-20-N currentdict definefont pop end
  /Emmentaler-20 findfont dup length dict copy begin
  /Encoding LilyScriptEncoding def
  /Emmentaler-20-S currentdict definefont pop end
  /Emmentaler-20 findfont dup length dict copy begin
  /Encoding LilyOtherEncoding def
  /Emmentaler-20-O currentdict definefont pop end
   } if

 *     We also define

   /magfontemmentaler-20mXVo-N { /Emmentaler-20-N 7.0292 output-scale div 
selectfont } bind def
   /magfontemmentaler-20mXVo-S { /Emmentaler-20-S 7.0292 output-scale div 
selectfont } bind def
   /magfontemmentaler-20mXVo-O { /Emmentaler-20-O 7.0292 output-scale div 
selectfont } bind def
   /helpEmmentaler-20 {
  gsave
  1 setgray
  /Emmentaler-20-N 0.001 selectfont 0 0 moveto <01> show
  /Emmentaler-20-S 0.001 selectfont 0 0 moveto <01> show
  /Emmentaler-20-O 0.001 selectfont 0 0 moveto <01> show
  grestore
   } def

 * Prior to any other output we execute helpEmmentaler-20 defined above.
 * To print a glyph we use constructs like the one below (here we do not use 
glyphshow but show ...)

   67.7411 -17.6139 moveto magfontemmentaler-20mXVo-N noteheads.s2

 * -dSubsetFonts=false is used.
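
The reason three encodings are needed can be sketched as follows. This is only an illustration: the real assignment of glyphs to LilyNoteHeadEncoding, LilyScriptEncoding and LilyOtherEncoding is not shown here, so splitting into consecutive chunks is an assumption, and `split_into_encodings`, `show_command` and the placeholder glyph names are hypothetical.

```python
def split_into_encodings(glyph_names, size=256):
    """Split a glyph list into consecutive encodings of at most `size` glyphs."""
    return [glyph_names[i:i + size] for i in range(0, len(glyph_names), size)]


def show_command(glyph, code):
    """Build a PostScript definition in the style of the example above."""
    return "/%s {<%02x> show} def" % (glyph, code)


# Any glyph count between 2*2^8 and 3*2^8 needs exactly three encodings.
glyphs = ["glyph%d" % n for n in range(600)]   # placeholder names
encodings = split_into_encodings(glyphs)

# Every glyph lands in exactly one encoding, addressed by a one-byte code.
example = show_command("noteheads.s2", 0x6F)
```

With each glyph in exactly one encoding, a single-byte show string is enough to select it, and no two documents disagree about which glyph a given code means.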

This way ghostscript is persuaded to include three emmentaler fonts in the 
generated pdf, one for every encoding. And there is no subsetting. That means 
the pdfs are really big. But all the pdfs generated this way include three 
identical  fonts constructed by ghostscript from emmentaler glyphs.

Now the *TeX document is compiled to pdf. Thousands of big lilypond pdfs with 
all their fonts are included.

Finally the pdf generated by *TeX is fed to ghostscript. An old ghostscript or 
a ghostscript with the PDFDontUseFontObjectNum option enabled will output a pdf 
with all superfluous copies of the fonts removed.

Knut


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Werner LEMBERG

>>There are already libraries that can read PDFs into a data structure
>>and then write a new PDF [...]
>
> Indeed, and if I was going to do this I would use MuPDF.  [...]

Thanks for your suggestions.  However, these are long-time solutions,
which need capable persons for an implementation (and I fear we
currently don't have such guys).  On the other hand, the gs way
actually works for us *right now*.


Werner



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 17:35 19/09/2017 +0200, David Kastrup wrote:


TeX is designed for the problem of creating documents and all current
TeX engines offer ways of including externally created inclusions in a
graphic format.  And Ghostscript, far from being a general purpose
program, is designed for executing PostScript code and producing
printable renditions, even if its PDF writer has been created quite
later than its original PostScript interpreting core.


Yes, but as you pointed out previously, PostScript *is* a programming 
language, and while it's primarily aimed at producing printed output, you 
don't have to use it that way. It is an interpreter for a general-purpose 
language.


You can even write typesetting packages in it, I've seen them.

Which makes it pretty general to my mind. And indeed, if you want to, you 
could write the tool I described in PostScript. I think it would be an 
exercise for a masochist, and I certainly don't plan to, but it's possible.




> Assuming that you are using TeX throughout for your documentation,
> then it seems to me that you should be creating your final document by
> appending the various TeX documents together and then producing a
> final PDF, instead of appending multiple PDF files.

This is a misconception of our document creation process.


Well, that's hardly surprising, since I have no experience of it. :-)



  There is only
a single TeX document (actually a Texinfo document), but interspersed
with the main text it includes example output from several thousands of
individual LilyPond runs.  LilyPond's current native output format for
this purpose is PostScript which is converted to PDF using pdftops
(namely, Ghostscript).  Those PDF files are inserted into the final PDF
file while it is being generated by a TeX engine from the Texinfo input.


But the Lilypond output can't be where the multiple fonts are coming from 
(unless things have changed) because Lilypond doesn't have this problem. It 
has other problems but not that one. Perhaps I'm still misunderstanding, I 
thought the problem was PDF documents produced from TeX.


So you could use EPS instead, or just stick with PDF.



> Presumably you want to show some parts of Lilypond as well,

Not "as well": this is actually the principal problem.  All the rest is
a single document, consequently not having a lot of font overhead.


OK well that wasn't previously clear, I had assumed the problem was TeX PDF 
files, not Lilypond ones. I also was under the impression there were many 
TeX PDF files being assembled, not a single file.




I am not into the details here (Masamichi-san?), but this font merging
of the included files is _exactly_ responsible for the reported space
savings.


Last time I looked at Lilypond output it was because it uses glyphshow 
throughout, which means that Ghostscript synthesises fonts for it. I 
haven't seen a case where it uses normal fonts. Which is why I thought it 
couldn't be the problem.


I've never seen Lilypond code that didn't use glyphshow.



It's assuming a different problem than the one we are dealing with.  So
obviously my attempts at explanation have been assuming too much prior
knowledge, putting us on different pages and talking about different
problems.  I apologize for wasting your time in that manner: we may well
disagree about how to best solve LilyPond's problems, but we should be
actually talking about the same problems for this to mean anything.


Yeah I'm afraid we've been talking at cross-purposes, mostly because what 
little I do know about Lilypond has been limited to the files I've been 
presented with in bug reports. Clearly they don't show the full range of 
possibilities.


So it seems like you don't have much choice, though not disabling 
subsetting when creating the PDF files would make some savings, probably 
quite significant, but I can't tell without seeing some examples of the PDF 
files in question.


While I have agreed to talk about this with the other developers, I still 
think you are potentially getting into a future situation where you will 
end up with incorrect final output by assuming that all fonts with the same 
name are the same font. Worse still, you won't find out until after you've 
produced the document and someone finds the error.


My solution for that would be to find or create a tool to remove the 
duplicates, and I really don't think you want to be doing that with 
Ghostscript.


It's not a terribly complex task but it would be time consuming to write it, 
and require a decent working knowledge of the basics of PDF. It might make 
a little Google summer of code project if you support that.


As I said in response to William Bader, while it would be possible in your 
case to do a simplified tool, I think a more general font de-duplication 
tool would be better, as it would guard against future changes in your 
workflow and would be something of general use.



Ken



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
Ken Sharp  writes:

> At 15:44 19/09/2017 +0200, David Kastrup wrote:
>
>
>>Are there any example documents with thousands of pages and ten
>>thousands of PDF inclusions one could look at?
>
> I would suggest that the fact you want to 'include' tens of thousands
> of PDF files to be the problem, really.

I prefer the term "challenge" myself since there is nothing inherently
problematic apart from the scale of the document.

> I appreciate you are trying to deal with an existing problem, but
> using Ghostscript to do something it wasn't intended for isn't really
> the best idea for solving the problem.
>
> As I've said elsewhere there is a genuine bug which can be exposed
> doing what you want with Ghostscript and it would not surprise me if
> in the long run it causes you another problem.

Neither would it surprise me.  But as I said, we are actually navigating
a compromise between various solutions and tools and the various
unexpected problems they cause.  If you can present a path that will not
cause any problem at all while still producing good documents of the
required size and type, nobody will be happier than myself.

But that does not appear like a feasible option within reach any time
soon, so the possibility of a future problem does not keep me from
trying to deal with a current problem.

> It would be possible to write a tool which could reliably detect
> identical fonts in a PDF file, remove the duplicates and alter the
> references so that the PDF continued to work. In all honesty, if the
> problem is as important as you say, this is probably a better
> solution. A tailored program, specifically designed to solve a
> specific problem is much more likely to work reliably than trying to
> use a general purpose program, designed for a different problem.

TeX is designed for the problem of creating documents and all current
TeX engines offer ways of including externally created inclusions in a
graphic format.  And Ghostscript, far from being a general purpose
program, is designed for executing PostScript code and producing
printable renditions, even if its PDF writer has been created quite
later than its original PostScript interpreting core.

So we are not really using anything at cross-purposes just because we
are employing it at large scale.  To make this a bit less theoretical,
the various versions of the Notation Reference can be found at
.  For getting
an impression of the content, you may look at the "split HTML" version,
and the PDF is there as well.

> This is extracted from an email I decided earlier not to send:
> -

[...]

> Assuming that you are using TeX throughout for your documentation,
> then it seems to me that you should be creating your final document by
> appending the various TeX documents together and then producing a
> final PDF, instead of appending multiple PDF files.

This is a misconception of our document creation process.  There is only
a single TeX document (actually a Texinfo document), but interspersed
with the main text it includes example output from several thousands of
individual LilyPond runs.  LilyPond's current native output format for
this purpose is PostScript, which is converted to PDF using ps2pdf
(namely, Ghostscript).  Those PDF files are inserted into the final PDF
file while it is being generated by a TeX engine from the Texinfo input.

Producing at first a DVI file and turning that into a single PostScript
file then converted into PDF rather than using a PDF-producing (and
including) TeX engine sounds like a workable idea until you realize that
the DVI/PostScript path is badly equipped for working with Unicode-range
fonts: PostScript is only part of "legacy" TeX workflows centered around
8-bit encodings.

> Presumably you want to show some parts of Lilypond as well,

Not "as well": this is actually the principal problem.  All the rest is
a single document, consequently not having a lot of font overhead.

> so I would create EPS figures for those. It will of course increase
> the number of font inclusions again, but in the case of Lilypond I
> don't think that you can be merging the fonts anyway, because Lilypond
> always uses glyphshow, and pdfwrite will create a uniquely named font
> for each usage.

I am not into the details here (Masamichi-san?), but this font merging
of the included files is _exactly_ responsible for the reported space
savings.

> So you aren't gaining any benefit from exploiting the Ghostscript bug
> with the Lilypond output.

But we are.  I hope that Masamichi-san (or Kurt?) can provide the
details here in order to give you a better picture.

> So by maintaining the text and layout in TeX, inserting EPS figures as
> required, and only producing PDF as the last step in the process you
> would create a file which (as I understand it) would only contain a
> single instance of 

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 15:11 19/09/2017 +, William Bader wrote:

>> It would be possible to write a tool which could reliably detect
>> identical fonts in a PDF file,
>
> There are already libraries that can read PDFs into a data structure and
> then write a new PDF, for example, pdfsizeopt in Python, poppler
> https://poppler.freedesktop.org/ and PoDoFo
> http://podofo.sourceforge.net/about.html in C++, pdfclown
> https://sourceforge.net/projects/clown/ in .NET, PDFBox
> http://pdfbox.apache.org/ in Java, iText https://itextpdf.com/ in Java
> and C#, pdfsam http://www.pdfsam.org/ in Java. Maybe one of them would be
> suitable as a starting point for writing a font merging tool.


Indeed, and if I was going to do this I would use MuPDF. Note that it will 
likely be a slow job to run. You can't do the job until you have all the 
PDF files collected into one; then you need to check each instance of each 
font to see if it's the same as any other font, and remove the other font, 
updating the relevant Resources dictionaries. Fortunately you don't need to 
alter any of the content streams. Finally you'd need to rewrite the PDF 
file with a modified xref and the relevant font streams removed.
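
The steps described above can be sketched in Python on a toy in-memory
model. This is only an illustration under stated assumptions: the
dictionaries `font_streams` and `resources` and the function
`dedupe_fonts` are hypothetical stand-ins, not the API of MuPDF or any
other PDF library, and byte-for-byte hashing stands in for whatever
comparison a real tool would need (font program, descriptor, encoding).

```python
import hashlib


def dedupe_fonts(font_streams, resources):
    """Toy model of font de-duplication.

    font_streams: {object_number: bytes}  (hypothetical: raw font data)
    resources:    {page: {resource_name: object_number}}
    Returns the surviving font streams and the rewritten resources.
    """
    canonical = {}  # content digest -> first object number with that content
    remap = {}      # every object number -> the object number to keep
    for obj_num in sorted(font_streams):
        digest = hashlib.sha256(font_streams[obj_num]).hexdigest()
        remap[obj_num] = canonical.setdefault(digest, obj_num)

    # Drop every stream that was remapped onto an earlier identical one.
    kept = {n: data for n, data in font_streams.items() if remap[n] == n}

    # Only the Resources entries change; content streams keep using the
    # same resource names, so they need no edits.
    new_resources = {
        page: {name: remap[num] for name, num in fonts.items()}
        for page, fonts in resources.items()
    }
    return kept, new_resources


fonts = {10: b"music font program", 27: b"music font program", 31: b"text font"}
pages = {1: {"F1": 10}, 2: {"F1": 27, "F2": 31}}
kept, new_pages = dedupe_fonts(fonts, pages)
```

Comparing every font by content, rather than assuming that the second
and later copies are duplicates, is the exhaustive variant; it does not
depend on the workflow guaranteeing identical fonts.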


Of course, because you have a fixed workflow you *could* simply look for 
the second and following instances of any font rather than checking them 
all exhaustively, but I think it would be better to do the job right. 
Firstly you'd be protected against any further changes in your workflow, 
and secondly you would have a genuinely useful tool in its own right.


Ghostscript is entirely the wrong tool for that job. It's possible, but I 
wouldn't want to write the PostScript program for it.



Ken


___
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread William Bader
>It would be possible to write a tool which could reliably detect identical 
>fonts in a PDF file,


There are already libraries that can read PDFs into a data structure and then 
write a new PDF, for example, pdfsizeopt in Python, poppler 
https://poppler.freedesktop.org/ and PoDoFo 
http://podofo.sourceforge.net/about.html in C++, pdfclown 
https://sourceforge.net/projects/clown/ in .NET, PDFBox 
http://pdfbox.apache.org/ in Java, iText https://itextpdf.com/ in Java and C#, 
pdfsam http://www.pdfsam.org/ in Java. Maybe one of them would be suitable as a 
starting point for writing a font merging tool.











Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 16:29 19/09/2017 +0200, Werner LEMBERG wrote:



> This is next to impossible.  lilypond has knowledge for good music
> typography, while TeX has knowledge for good text typography.  I read
> your suggestion that lilypond should do everything, i.e., both text
> and music layout, but this won't happen, for obvious reasons.


Umm no, I was suggesting (since as I understand it, this is for production 
of manuals) that the bulk of the manual be in TeX, and the bits where you 
need to show Lilypond should be EPS. However, it's possible I'm (again) 
misunderstanding the purpose here. Jet lag is causing serious confusion 
today :-(




> In other words, we have to co-operate with TeX somehow – output from
> lilypond must be included into TeX documents.


Right, EPS or even PDF since Lilypond can't suffer from the font problem, 
at least not if it uses Ghostscript to produce its PDF.




>> Your problem with multiple fonts pretty much exhibits that; once
>> you've got the PDF file, a layout engine can't tell that all the
>> fonts are the same.  Ghostscript can't either, which is why it now
>> doesn't strip the duplicates out.
>
> But our pipeline *guarantees* that the fonts are identical (basically
> by disabling font subsetting)!


Well, that's a pretty unique workflow. Of course, if you didn't disable 
font subsetting the included multiple fonts would be much smaller anyway.




> This is something we can completely
> control, and for such a use-case we would like to have forthcoming
> versions of ghostscript be able to do what it did previously.


See previous replies on this subject.



ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Werner LEMBERG

> In general, however, my experience of working with large documents
> is that the content should be maintained in the layout application
> native format until the last moment.  Broadly speaking this is
> similar to keeping bitmap data in something like TIFF and only
> converting to JPEG at the last moment, and for similar reasons.

This is next to impossible.  lilypond has knowledge for good music
typography, while TeX has knowledge for good text typography.  I read
your suggestion that lilypond should do everything, i.e., both text
and music layout, but this won't happen, for obvious reasons.

In other words, we have to co-operate with TeX somehow – output from
lilypond must be included into TeX documents.

> Your problem with multiple fonts pretty much exhibits that; once
> you've got the PDF file, a layout engine can't tell that all the
> fonts are the same.  Ghostscript can't either, which is why it now
> doesn't strip the duplicates out.

But our pipeline *guarantees* that the fonts are identical (basically
by disabling font subsetting)!  This is something we can completely
control, and for such a use-case we would like to have forthcoming
versions of ghostscript be able to do what it did previously.

> Assuming that you are using TeX throughout for your documentation,
> then it seems to me that you should be creating your final document
> by appending the various TeX documents together and then producing a
> final PDF, instead of appending multiple PDF files.

As mentioned above, lilypond doesn't produce TeX output (it did so
many, many years ago, but this was abandoned for various reasons).  It
natively produces EPS, which gets converted to PDF using gs.  However,
the modern TeX flavours we need produce PDF only, and can only include
PDF files easily.  Their font handling capabilities, though, are not
sophisticated enough to produce small output files; hence the second
pass with Ghostscript.


Werner


Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 15:44 19/09/2017 +0200, David Kastrup wrote:



> Are there any example documents with thousands of pages and tens of
> thousands of PDF inclusions one could look at?


I would suggest that the fact you want to 'include' tens of thousands of 
PDF files is the problem, really.


I appreciate you are trying to deal with an existing problem, but using 
Ghostscript to do something it wasn't intended for isn't really the best 
idea for solving the problem.


As I've said elsewhere there is a genuine bug which can be exposed doing 
what you want with Ghostscript and it would not surprise me if in the long 
run it causes you another problem.


It would be possible to write a tool which could reliably detect identical 
fonts in a PDF file, remove the duplicates and alter the references so that 
the PDF continued to work. In all honesty, if the problem is as important 
as you say, this is probably a better solution. A tailored program, 
specifically designed to solve a specific problem is much more likely to 
work reliably than trying to use a general purpose program, designed for a 
different problem.


That said, it would be quite a big job, and I'm not actually offering to 
take it on.


My suggestion, which may not be feasible, is to keep everything in an 
editable format until the last second.


This is extracted from an email I decided earlier not to send:
-

While I can tell you a lot about PostScript and PDF I can't help you at all 
with TeX. In general, however, my experience of working with large 
documents is that the content should be maintained in the layout 
application native format until the last moment. Broadly speaking this is 
similar to keeping bitmap data in something like TIFF and only converting 
to JPEG at the last moment, and for similar reasons.


When you create a PDF you are discarding all the 'metadata' that describes 
the layout to the typesetting or layout application. It's all but impossible 
to recover that information once it's been lost.


Your problem with multiple fonts pretty much exhibits that; once you've got 
the PDF file, a layout engine can't tell that all the fonts are the same. 
Ghostscript can't either, which is why it now doesn't strip the duplicates 
out. While I appreciate this is a problem for your particular use case, it 
is actually a considerable improvement for users in general.


Assuming that you are using TeX throughout for your documentation, then it 
seems to me that you should be creating your final document by appending 
the various TeX documents together and then producing a final PDF, instead 
of appending multiple PDF files.


Presumably you want to show some parts of Lilypond as well, so I would 
create EPS figures for those. It will of course increase the number of font 
inclusions again, but in the case of Lilypond I don't think that you can be 
merging the fonts anyway, because Lilypond always uses glyphshow, and 
pdfwrite will create a uniquely named font for each usage. So you aren't 
gaining any benefit from exploiting the Ghostscript bug with the Lilypond 
output.


So by maintaining the text and layout in TeX, inserting EPS figures as 
required, and only producing PDF as the last step in the process you would 
create a file which (as I understand it) would only contain a single 
instance of each font.


In short I'm not really suggesting that you change anything except your 
working practices, and maintain your files as TeX files rather than as PDF. 
Because I don't have any knowledge of your workflow (or TeX) I cannot say 
if this is reasonable, it may well not be.



Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
William Bader  writes:

>>Then maybe you should complain to the software producing that content.
>
> That is one place where TeX shows its age, and switching to a newer
> system like SILE might produce better output
> https://github.com/simoncozens/sile
> https://www.youtube.com/watch?v=5BIP_N9qQm4

Are there any example documents with thousands of pages and tens of
thousands of PDF inclusions one could look at?

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
Ken Sharp  writes:

> At 13:42 19/09/2017 +0200, David Kastrup wrote:
>
>>So the mechanisms mostly out of our own control are Ghostscript in its
>>ps2pdf facility, various TeX engines when including lots of
>>ps2pdf-generated PDF files into a main document.
>
> To me this is where the problem lies, PDF is good as a terminal
> document format, and that was its original aim. It's not good as an
> intermediate format, or for inclusion in more complex documents.
>
> I feel the correct answer to this is not to use PDF as an intermediate
> format, it seems to me you should stick with a typesetting format
> because that allows you to determine that fonts which are named the
> same, are in fact the same, and you don't need to include them
> multiple times. In fact for a layout format, you wouldn't normally
> include the actual fonts at all, of course.

PDF is at least a format.

PostScript is not even a format but a programming language, and one
using global resources to boot.  That makes inclusion tricky.  EPS is a
black-boxed actual inclusion format but with rather little dependable
information.  Now in our case, the PostScript is generated entirely
under our control, but the inclusion into other documents (and
possibly distillation into PDF) isn't, so we are dealing with
inclusion mechanisms catering to more general use cases.

As a programmer, I am sympathetic to your feelings about the correct
answer for intermediate formats, but "in the real world," "this ship has
sailed."

PDF is extensively used as a format for graphics inclusion.  When you
submit papers with figures to basically any journal, they want the
figures to be submitted as PDF.

So I don't see that Ghostscript can reasonably avoid dealing with the
fallout of PDF being used as intermediate format basically everywhere,
whether or not you consider it suitable for that.

>>For this use case, we want a process that avoids excessive font
>>duplication.  The process so far involved an additional Ghostscript
>>run removing most of the duplicates from the TeX-generated PDF
>>(someone please correct me if I got this wrong).
>
> This only works because all the PDF files you are using (so far) embed
> the whole font, don't use subsets, and use the same Encoding (or use
> different names so that they are clearly different fonts). Were you to
> start using PDF files (from whatever source) where that is not the
> case, and I quoted OpenOffice as an example, then you might run into
> the problem with the bug you are exploiting.

For the technical details of either PostScript or PDF, I am not
competent (I've done some basic graphics stuff in either).  I was only
trying to paint the big picture we are more or less dealing with because
Masamichi-san was not really able to communicate it well.

But he definitely still is the go-to guy for discussing the detailed
problems occurring with each approach and basically did much of the
testing required to figure out the individual shortcomings of each in
practice.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 13:42 19/09/2017 +0200, David Kastrup wrote:


> So the mechanisms mostly out of our own control are Ghostscript in its
> ps2pdf facility, various TeX engines when including lots of
> ps2pdf-generated PDF files into a main document.


To me this is where the problem lies, PDF is good as a terminal document 
format, and that was its original aim. It's not good as an intermediate 
format, or for inclusion in more complex documents.


I feel the correct answer to this is not to use PDF as an intermediate 
format, it seems to me you should stick with a typesetting format because 
that allows you to determine that fonts which are named the same, are in 
fact the same, and you don't need to include them multiple times. In fact 
for a layout format, you wouldn't normally include the actual fonts at all, 
of course.




> For this use case, we
> want a process that avoids excessive font duplication.  The process so
> far involved an additional Ghostscript run removing most of the
> duplicates from the TeX-generated PDF (someone please correct me if I
> got this wrong).


This only works because all the PDF files you are using (so far) embed the 
whole font, don't use subsets, and use the same Encoding (or use different 
names so that they are clearly different fonts). Were you to start using 
PDF files (from whatever source) where that is not the case, and I quoted 
OpenOffice as an example, then you might run into the problem with the bug 
you are exploiting.


By not using the PDF object number as a unique identifier, Ghostscript only 
uses the font name. If you get two different fonts (subset or otherwise) 
Ghostscript will assume they are the same font. If they are differently 
encoded (say that 'A' is encoded at position 0x42 in the first font, but 
0x42 in the second font has a 'B') then Ghostscript can't tell and will 
simply drop the second font.


The result of this is that you will get the wrong text in the output PDF 
file. Again, this isn't a theoretical problem, we have had numerous bug 
reports on this count which we have done our best to work around. In the 
end there was no alternative but to use the object number as the unique 
identifier (NB we actually use the object number and the filename, in case 
we get two files with the same font using the same object number).
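
The failure mode described above can be reproduced with a small Python
model. It is a sketch under assumptions, not Ghostscript's actual
implementation: the font records, `merge_fonts` and `render` are
hypothetical names, and a font is reduced to a name plus a
code-to-character table.

```python
def merge_fonts(fonts, key):
    """Keep the first font seen for each key; later ones are dropped."""
    kept = {}
    for font in fonts:
        kept.setdefault(key(font), font)
    return kept


def render(codes, font, kept, key):
    """Decode character codes through whichever font survived the merge."""
    encoding = kept[key(font)]["encoding"]
    return "".join(encoding[code] for code in codes)


# Two different fonts that share a name and even an object number,
# but encode different characters at position 0x42.
first = {"file": "a.pdf", "obj": 7, "name": "Times", "encoding": {0x42: "A"}}
second = {"file": "b.pdf", "obj": 7, "name": "Times", "encoding": {0x42: "B"}}


def by_name(font):
    return font["name"]


def by_file_and_object(font):
    return (font["file"], font["obj"])


kept_by_name = merge_fonts([first, second], by_name)
kept_by_id = merge_fonts([first, second], by_file_and_object)

# Text set in the second font, decoded after each kind of merge.
wrong = render([0x42], second, kept_by_name, by_name)
right = render([0x42], second, kept_by_id, by_file_and_object)
```

Keying on the name alone silently substitutes the first font, so the
text changes from "B" to "A"; keying on file name plus object number
keeps both fonts, at the cost of no longer merging fonts that really are
identical.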


The only way you find out this has happened is when you carefully read the 
text, of course.




> We don't really have a way to forego Texinfo for our printed manuals.
> Given the comparative importance of TeX for document preparation,
> however, I think it would be good to figure out how to keep at least one
> viable way open of making this work and figure out a migration path of
> the involved tools to how you would optimally want to have things
> working.
>
> I don't think that TeX can (or should) preserve object ids when
> including external PDF files, so figuring out some other reasonably
> robust identity associated with fonts would seem important.


Well I know nothing about TeX. It seems to me however, that it *must* 
preserve the object IDs in some sense, because otherwise you wouldn't be 
ending up with multiple copies of fonts. If it didn't preserve the object 
numbers, then it would assume that the first 'Times' is the same as the 
second 'Times' and would collapse them into a single reference. Exactly as 
you are using Ghostscript for at present.


If your PDF files contain ToUnicode CMaps then it's possible to identify 
properly which glyph is actually intended by each character code in each 
font. Doing that would allow you to optimise the use of fonts, because you 
could alter the character coding of each usage so that it was consistent 
across the documents and only required a single instance of the font in 
question.
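
A rough Python sketch of that re-encoding idea follows. It assumes each
font usage comes with a ToUnicode-style table mapping character codes to
Unicode strings, and that one Unicode value corresponds to exactly one
glyph, which need not hold for real fonts. `unify_encodings` and the
sample tables are hypothetical, and the 256-code limit of simple fonts
is ignored.

```python
def unify_encodings(usages):
    """Re-encode several usages of one font onto a single shared encoding.

    usages: list of (to_unicode, codes) pairs, where to_unicode maps a
            character code to the Unicode string it stands for.
    Returns the shared code -> Unicode table and the re-encoded code lists.
    """
    shared = {}   # Unicode string -> newly assigned character code
    recoded = []
    for to_unicode, codes in usages:
        new_codes = []
        for code in codes:
            text = to_unicode[code]
            # Reuse the code already assigned to this character, or
            # assign the next free one.
            new_codes.append(shared.setdefault(text, len(shared)))
        recoded.append(new_codes)
    table = {code: text for text, code in shared.items()}
    return table, recoded


# The same code 0x42 means "A" in one document and "B" in the other.
doc1 = ({0x42: "A", 0x43: "C"}, [0x42, 0x43])
doc2 = ({0x42: "B", 0x50: "A"}, [0x50, 0x42])
table, recoded = unify_encodings([doc1, doc2])
decoded = ["".join(table[c] for c in codes) for codes in recoded]
```

After re-encoding, both documents refer to one consistent table, so a
single font instance could serve both without changing the text.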


I'd have to experiment to find out, but it would not surprise me to 
discover that when you include a PDF file in TeX what it actually does is 
convert it into an EPS or PostScript program and then concatenate all the 
documents together.


That would mean TeX could use PDF files as a kind of 'black box', and would 
mean that the fonts would be included multiple times, just as you say is 
happening.




>> PDF was never intended as a means of transferring, or 'containerising'
>> content, it's not trivial (or even possible in general) to extract
>> content from, or simplify, PDF files.
>
> And yet I seem to remember Adobe has a specification for how to write
> PDF intended for embedding, haven't they?


Err, no, I don't think so. You can embed files untouched (including PDF 
files) inside a PDF, just as other file types. But that's not really what I 
meant when I said 'containerising'.


You can also have PDF Collections (I can't recall if that's the correct 
name) but again that isn't what I meant when I talk about transferring 
content, because you aren't transferring the content, you are including the 
whole thing, not just its content.


I was thinking more like writing a .docx file as an RTF or a spreadsheet as 
a comma 

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
Ken Sharp  writes:

> At 11:33 19/09/2017 +0200, David Kastrup wrote:
>
>>The question is what the complaint should be, namely what LilyPond does
>>wrong.  Producing large comprehensive manuals using TeX including lots
>>of example images generated using the same fonts?
>
> Ah, you need to be careful talking about 'images' because in PDF (and
> PostScript) images are bitmaps. I don't think that's what you mean
> here.
>
>
>>To me that sounds like a stock typesetting task with mainstream tools
>>that Ghostscript should be suitable for.
>
> Whoa, Ghostscript is a PostScript interpreter, it isn't intended as a
> typesetting application. All it's really intended for is producing
> rendered bitmaps from PostScript.
>
> It does have the pdfwrite and ps2write devices, but the intention
> there is to produce output which is visually the same as the
> input. There's all kinds of stuff you can put in an input PDF file
> which we don't preserve into the output with these devices.

Ok, maybe this wasn't made explicit enough: Ghostscript here is involved
very much in our document generation.

LilyPond creates PostScript output files using vector music fonts (in
the vast majority of cases, including the manuals, mainly using its own
font sets coming with it but also some free text fonts).  Those are
converted into PDF files using Ghostscript (I think via ps2pdf).  Tens
of thousands of those PDF files are usually included in one Texinfo
document generated with a TeX variant (we started out with PDFTeX, this
was switched to XeTeX and I am not sure whether we are now using LuaTeX:
the changes were mostly driven by getting the best PDF output for the
manuals).

With regard to adapting to changes, the initial PostScript generation is
under our own control.  ps2pdf is of course Ghostscript.  Masamichi
Hosoda has contributed a number of patches to Texinfo in our course of
changing TeX engines, and texinfo.tex is included in LilyPond's
distribution: so basically we can do changes in the area of Texinfo
source code pretty fast, once the problem is understood.  Of course, it
would be desirable if the same document workflow worked with LaTeX
rather than Texinfo, but we have a much less direct influence on LaTeX.
And the TeX binaries are slow-changing and separately distributed, so we
cannot really prescribe a whole lot there.

So the mechanisms mostly out of our own control are Ghostscript in its
ps2pdf facility, various TeX engines when including lots of
ps2pdf-generated PDF files into a main document.  For this use case, we
want a process that avoids excessive font duplication.  The process so
far involved an additional Ghostscript run removing most of the
duplicates from the TeX-generated PDF (someone please correct me if I
got this wrong).

So Ghostscript is involved in multiple steps here, and at least one
Unicode-capable TeX variant in another.  This is a combination that is
certainly relevant for more document production in the context of Free
Software than just for LilyPond.

So I don't think that LilyPond is alone in the particular requirements
served by the now removed feature or misfeature: documents including
external files will at least in the context of Free Software rely on
similar tools and mechanisms as LilyPond does.

>>But obviously you think there must be something wrong with the way we
>>are generating and including a large amount of images into one
>>document.
>>
>>Would you be willing to help us figure out a different way in which we
>>could make this work?
>
> Certainly! I did spend some time on a bug thread trying to help with
> moving away from using glyphshow. I'm happy to spend what little time
> I have explaining stuff that I know about.
>
> However, while that covers PDF and PostScript, it doesn't cover TeX.

We don't really have a way to forego Texinfo for our printed manuals.
Given the comparative importance of TeX for document preparation,
however, I think it would be good to figure out how to keep at least one
viable way open of making this work and figure out a migration path of
the involved tools to how you would optimally want to have things
working.

I don't think that TeX can (or should) preserve object ids when
including external PDF files, so figuring out some other reasonably
robust identity associated with fonts would seem important.

> As to there being an easy solution, I doubt there is one. Other than
> using a tool better suited to the task of producing documentation. But
> that would almost certainly mean moving away from open source toolsets
> which I imagine wouldn't be acceptable.

That would include moving away from Ghostscript, wouldn't it?  But then
we would not be having that discussion in the first place.

> Possibly not producing multiple intermediate files or not producing
> them as PDF would be an answer. From my uninformed outsider's
> perspective it sounds like you are making trouble for yourselves in
> this fashion, but that's probably a 

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 11:33 19/09/2017 +0200, David Kastrup wrote:


> The question is what the complaint should be, namely what LilyPond does
> wrong.  Producing large comprehensive manuals using TeX including lots
> of example images generated using the same fonts?


Ah, you need to be careful talking about 'images' because in PDF (and 
PostScript) images are bitmaps. I don't think that's what you mean here.




> To me that sounds like a stock typesetting task with mainstream tools
> that Ghostscript should be suitable for.


Whoa, Ghostscript is a PostScript interpreter, it isn't intended as a 
typesetting application. All it's really intended for is producing rendered 
bitmaps from PostScript.


It does have the pdfwrite and ps2write devices, but the intention there is 
to produce output which is visually the same as the input. There's all 
kinds of stuff you can put in an input PDF file which we don't preserve 
into the output with these devices.


Essentially its an implementation of Acrobat Distiller, with the advantage 
that it can handle a wider range of input.


And I note that the output indeed does contain the examples, in an entirely 
comprehensible way. It's simply that the output is large, and we've never 
claimed that the output won't be at least as large as the input. In fact I 
spend quite a lot of time disabusing people of that notion on Stack Overflow.




> But obviously you think there must be something wrong with the way we
> are generating and including a large amount of images into one document.
>
> Would you be willing to help us figure out a different way in which we
> could make this work?


Certainly! I did spend some time on a bug thread trying to help with moving 
away from using glyphshow. I'm happy to spend what little time I have 
explaining stuff that I know about.


However, while that covers PDF and PostScript, it doesn't cover TeX.



> In particular Masamichi Hosoda has invested
> months of work chasing various Ghostscript versions and their
> idiosyncrasies and figuring out the best-suited TeX engines to be using
> for that task, so if there is an easy solution he and others have
> overlooked, it certainly would help having someone on board who has a
> clue about where Ghostscript is and should be heading.


Most of the idiosyncrasies would, I imagine, be bugs. I'd love to say we 
could control that, but it wouldn't be true.


As to there being an easy solution, I doubt there is one. Other than using 
a tool better suited to the task of producing documentation. But that would 
almost certainly mean moving away from open source toolsets which I imagine 
wouldn't be acceptable.


Possibly not producing multiple intermediate files or not producing them as 
PDF would be an answer. From my uninformed outsider's perspective it sounds 
like you are making trouble for yourselves in this fashion, but that's 
probably a mis-interpretation.


PDF was never intended as a means of transferring, or 'containerising' 
content; it's not trivial (or even possible in general) to extract content 
from, or simplify, PDF files.




> I don't see where explaining the use case for which the availability of
> the option makes much more of a difference than what you thought it
> would does amount to "berating" you.


I read the second email which simply said 'if you do this then you get a 
large file' and then listed a bunch of URLs as 'you've got to, because 
look'. I certainly didn't (and still don't) feel it explained much of the 
'use case' other than 'if you do this then it's a problem'. To which my 
answer is still 'then don't do that'.


I also felt, to be honest, that the follow up was unnecessary and didn't 
add anything. Hence 'berating', possibly a poor choice of word.


Knut's email was, to my mind, much more explanatory.

I really think it's time to draw this to a conclusion, as the discussion 
isn't really going anywhere. I have repeatedly said I'll discuss it 
internally. I will also say that I'm now more inclined to restore the 
behavior, though with some big warnings in the documentation.


Still, it's nice to see some activity on gs-devel :-)


Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 20:38 18/09/2017 +0200, David Kastrup wrote:


>> And if you have multiple subsets, badly named (eg OpenOffice output)
>> then you get a final PDF file where some of the text is missing or
>> garbled.
>
> So?  Nobody forces anybody to use that option.


The point is rather that if you don't use the object numbers, then you get 
incorrect output.




>> And your point is what ?
>
> That we are talking about functionality that is considered useful?


Which is indeed useful information. But that wasn't what was provided, just 
a bunch of links. Knut's later email presented that point of view much 
better (from my point of view).




> I think "slightly smaller" was something like a factor of 10.  We are
> talking about files including literally thousands if not tens of thousands
> of graphics (manuals close to a thousand pages with lots of graphic
> output included).


Which just takes me right back to 'don't create your PDF files like that then'.

If a file of that complexity suffers so much from a superfluity of fonts, 
then you should really do something about not including the entire font 
multiple times.



The 'feature' that everyone is referring to here is not, and never was, a 
feature. It was an unintended consequence of a limitation in the way that 
Ghostscript's PDF interpreter works. In short, you were taking advantage of 
a bug.


That limitation was actually potentially quite severe and could cause 
incorrect output, and if that did occur there was no way to tell without 
closely proof-reading the output.
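
To illustrate the failure mode described above, here is a toy model in 
Python. It is a sketch under stated assumptions, not Ghostscript's 
implementation: the font records, the name Example-Regular, the object 
numbers 12 and 57 and both helper functions are invented for the example. 
It shows two differently subsetted fonts that share a name; merging them by 
name alone keeps only the first subset, while keeping them apart by object 
number preserves both.

```python
# Toy model only: this is NOT Ghostscript's code. The font records, names
# and object numbers are invented for illustration.

def merge_fonts(fonts, key):
    """Keep the first font seen for each key; later fonts with the same key are dropped."""
    merged = {}
    for font in fonts:
        merged.setdefault(key(font), font)
    return merged


def missing_glyphs(text, font):
    """Return the characters of text that the given font has no glyph for."""
    return sorted(set(text) - font["glyphs"])


# Two different subsets of the same typeface that carry the same name.
fonts = [
    {"obj": 12, "name": "Example-Regular", "glyphs": set("abc")},
    {"obj": 57, "name": "Example-Regular", "glyphs": set("xyz")},
]

by_name = merge_fonts(fonts, key=lambda f: f["name"])  # identity = font name
by_obj = merge_fonts(fonts, key=lambda f: f["obj"])    # identity = object number

# Text that was set with the second subset needs the glyphs x, y and z.
print(missing_glyphs("xyz", by_name["Example-Regular"]))  # glyphs lost by name-based merging
print(missing_glyphs("xyz", by_obj[57]))                  # nothing lost when keyed by object
```

In this model the name-based merge silently drops the second subset, which 
corresponds to the missing or garbled text mentioned earlier in the thread. 
When every embedded copy is the complete font, the two keys give the same 
visible result, which is why the old behaviour appeared to work for full 
font sets.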


Given the on-going complaints about that bug we did eventually come up with 
a solution. We were worried that the solution would prove worse than the 
original disease, so we preserved a means to restore the original 
behaviour. Over time it became apparent that the solution is, in fact, robust.


Since we now have an apparently good solution to the problem, and since 
preserving the means to restore the old behaviour does add (in however 
small a way) to the complexity of the interpreter, I removed that option.


I've covered this in more detail in my reply to Knut. I also said that I 
welcomed the other replies on the subject. It *is* useful to know that the 
old (flawed) behaviour is important to you all.


Stating this clearly, rather than simply providing a bunch of URLs to pages 
which document how to *use* the flawed implementation, is useful information.


As I have also said a number of times, the old behaviour has a flaw, and 
it's entirely possible that you will trip over it. In that case there is 
no further help we can offer: the solution to that problem is to use the 
current behaviour, which you don't like.


I do feel that creating the pages in this fashion is less than ideal and 
you would probably be well advised to seek a different way of working.




As I have said (repeatedly now) I will discuss this with the other 
developers, and the input (especially from Knut's mail) will be taken into 
account.




Ken Sharp




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
Ken Sharp  writes:

> At 20:38 18/09/2017 +0200, David Kastrup wrote:
>
>
>>I think "slightly smaller" was something like a factor of 10.  We are
>>talking about files including literally thousands if not tens of thousands
>>of graphics (manuals close to a thousand pages with lots of graphic
>>output included).
>
> Then maybe you should complain to the software producing that content.

The software producing that content is LilyPond itself (in the case of
the manuals we are talking about, via a tool LilyPond-Book), so there is
no point in complaining since it is under our control.

The question is what the complaint should be, namely what LilyPond does
wrong.  Producing large comprehensive manuals using TeX including lots
of example images generated using the same fonts?

To me that sounds like a stock typesetting task with mainstream tools
that Ghostscript should be suitable for.

But obviously you think there must be something wrong with the way we
are generating and including a large amount of images into one document.

Would you be willing to help us figure out a different way in which we
could make this work?  In particular Masamichi Hosoda has invested
months of work chasing various Ghostscript versions and their
idiosyncrasies and figuring out the best-suited TeX engines to be using
for that task, so if there is an easy solution he and others have
overlooked, it certainly would help having someone on board who has a
clue about where Ghostscript is and should be heading.

> I already said I would discuss this further; berating me will not
> induce me to make changes.

I don't see where explaining the use case for which the availability of
the option makes much more of a difference than what you thought it
would does amount to "berating" you.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread Ken Sharp

At 20:38 18/09/2017 +0200, David Kastrup wrote:



> I think "slightly smaller" was something like a factor of 10.  We are
> talking about files including literally thousands if not tens of thousands
> of graphics (manuals close to a thousand pages with lots of graphic
> output included).


Then maybe you should complain to the software producing that content.

I already said I would discuss this further; berating me will not induce me 
to make changes.




Ken




Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-19 Thread David Kastrup
per...@pluto.rain.com (Perry Hutchison) writes:

> Masamichi Hosoda  wrote:
>
>> >>It seems that `-dPDFDontUseFontObjectNum` option does not work.
> ...
>> There is a tool for using this method of removing duplicate fonts.
>> https://www.ctan.org/pkg/extractpdfmark
>> https://packages.debian.org/stretch/extractpdfmark
>> http://packages.ubuntu.com/zesty/extractpdfmark
>
> As I see it, the availability of a separate tool to do the same thing
> is a reason to _not_ provide a duplicate capability in Ghostscript.
> Those who want that processing (despite the risks that Ken mentioned)
> can use extractpdfmark.

extractpdfmark requires that capability in Ghostscript for doing its
work.  It is not a "separate tool" as such.

-- 
David Kastrup



Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

2017-09-18 Thread David Kastrup
Ken Sharp  writes:

> At 00:31 19/09/2017 +0900, Masamichi Hosoda wrote:
>
>>When you create a PDF document using something like a TeX system
>>you may include many small PDF files in the main PDF file.
>>It is common for each of the small PDF files to use the same fonts.
>>
>>If the small PDF files contain embedded full font sets,
>>the TeX system includes all of them in the main PDF.
>>The main PDF contains duplicates of the same full sets of fonts.
>>Therefore, `PDFDontUseFontObjectNum` can remove the duplicates.
>>This may considerably reduce the main PDF-file's size.
>
> And if you have multiple subsets, badly named (eg OpenOffice output)
> then you get a final PDF file where some of the text is missing or
> garbled.

So?  Nobody forces anybody to use that option.

>>LilyPond has option `--bigpdfs` for unifying duplicate fonts in this
>>method.
>
> And your point is what ?

That we are talking about functionality that is considered useful?

> That's not what the pdfwrite device is intended for, and we don't
> claim you can use it to do that.
>
> As I said, if you think it's that useful, then you can add the switch
> back in. In fact, provided you don't change SubsetFonts, the resulting
> file may well be smaller anyway, since the pdfwrite device will only
> embed that portion of each font (which you say is a complete
> duplicate) so the resulting two fonts will be smaller than the
> original two fonts.
>
> Risking incorrect output for the minimal benefit of a slightly smaller
> file seems unwise to me.

I think "slightly smaller" was something like a factor of 10.  We are
talking about files including literally thousands if not tens of thousands
of graphics (manuals close to a thousand pages with lots of graphic
output included).
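
As a rough back-of-envelope check on how a factor of that order can arise,
the following Python sketch compares embedding a full font in every
included snippet against embedding it once. All figures (5000 snippets, a
90 kB font, 10 kB of content per snippet) are hypothetical and chosen only
for illustration; they are not measurements of the LilyPond manuals.

```python
# Hypothetical figures only; none of these numbers come from real files.

def total_size(n_snippets, font_bytes, content_bytes, shared_font):
    """Estimate document size when the font is embedded once or once per snippet."""
    font_copies = 1 if shared_font else n_snippets
    return font_copies * font_bytes + n_snippets * content_bytes


N_SNIPPETS = 5000        # assumed number of included graphics
FONT_BYTES = 90_000      # assumed size of one fully embedded font
CONTENT_BYTES = 10_000   # assumed size of each snippet without its font

duplicated = total_size(N_SNIPPETS, FONT_BYTES, CONTENT_BYTES, shared_font=False)
shared = total_size(N_SNIPPETS, FONT_BYTES, CONTENT_BYTES, shared_font=True)
ratio = duplicated / shared

print(f"duplicated fonts: {duplicated / 1e6:.1f} MB")
print(f"shared font:      {shared / 1e6:.1f} MB")
print(f"ratio:            {ratio:.2f}")
```

With these assumed sizes the duplicated fonts dominate the file, so the
ratio comes out just under 10; different font and snippet sizes would shift
it in either direction.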

-- 
David Kastrup
