Re: guile-2.0 and debian

2016-11-19 Thread Thomas Morley
Hi,

For now I have looked at only a single problem, the metadata (which is
giving me a royal headache).

2016-11-19 16:05 GMT+01:00 Antonio Ospite :

> The following change fixes the issue at hand:
>
> -
> diff --git a/scm/framework-ps.scm b/scm/framework-ps.scm
> index a404119..b2b6802 100644
> --- a/scm/framework-ps.scm
> +++ b/scm/framework-ps.scm
> @@ -28,6 +28,9 @@
>   (scm clip-region)
>   (lily))
>
> +(if (guile-v2)
> +  (use-modules(rnrs bytevectors)))
> +
>  (define format ergonomic-simple-format)
>
>  (define framework-ps-module (current-module))
> @@ -518,15 +521,22 @@
>(define (metadata-encode val)
>  ;; First, call ly:encode-string-for-pdf to encode the string (Latin1 or
>  ;; utf-16be), then escape all parentheses and backslashes
> -;; FIXME guile-2.0: use (string->utf16 str 'big) instead
> +;; With guile-2.0: use (string->utf16 str 'big) instead
> +(if (guile-v2)
> +  (ps-quote (utf16->string (string->utf16 val 'big)))

Well, the line above does not make much sense, because val is
effectively returned unchanged before ps-quote does its work. See:
(string=? (utf16->string (string->utf16 "ちりぬるを)" 'big)) "ちりぬるを)")
--> #t

> +  (ps-quote (ly:encode-string-for-pdf val
>
> -(ps-quote (ly:encode-string-for-pdf val)))
>(define (metadata-lookup-output overridevar fallbackvar field)
>  (let* ((overrideval (ly:modules-lookup (list header) overridevar))
> (fallbackval (ly:modules-lookup (list header) fallbackvar))
> (val (if overrideval overrideval fallbackval)))
>(if val
> -  (format port "/~a (~a)\n" field (metadata-encode (markup->string 
> val (list header)))
> +(begin
> +  (format port "/~a (" field)
> +  (set-port-encoding! port "UTF-16")
> +  (format port "~a" (metadata-encode (markup->string val (list 
> header
> +  (set-port-encoding! port "ISO-8859-1")
> +  (format port ")\n")
>
>(if (module? header)
>(begin
> -
>
> This is rather ugly, but encoding only the actual _value_ of the field
> in UTF-16 allows to have exactly the same output as with guile-1.8.

Well, this example gives a gs-error again:

\header { title = "ちりぬるを)" } \markup \null

> The issue is about a file (the postscript file) with a mixed encoding
> (Latin1 and UTF-16) while the file port only has one encoding.
>
> AFAICS in guile-2.0 the difference between characters and bytes is taken
> very seriously.
>
> I'll try to set %default-port-conversion-strategy to 'error and see if
> some other issue shows up.  Where is the earliest point I can set that?

I tried different settings at different places to no avail. So
currently I don't know.


Cheers,
  Harm

___
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel


Re: Add using Extract PDFmark for document building (issue 314130043 by truer...@gmail.com)

2016-11-19 Thread trueroad

Thank you for your advice.



https://codereview.appspot.com/314130043/diff/20001/configure.ac
File configure.ac (right):

https://codereview.appspot.com/314130043/diff/20001/configure.ac#newcode305
configure.ac:305: [" documentation and the final PDF files.)"])
On 2016/11/19 14:06:07, pkx166h wrote:
> All I would say is that "... and the final PDF files." is redundant *if*
> this change ONLY affects the PDF files and the rest of the
> 'documentation files' are unaffected then simply say:
>
> "...significantly reduce the disk space needed for the PDF files when
> building the documentation".

It also affects the size of intermediate PDF files, not only the final
PDF files.

> Also, would it be better to state 'Ghostscript' rather than 'gs' in the
> message?

Done.

https://codereview.appspot.com/314130043/



Re: guile-2.0 and debian

2016-11-19 Thread Antonio Ospite
Hi,

just to confirm my understanding about some previous items.

On Thu, 17 Nov 2016 13:05:47 +0100
Antonio Ospite  wrote:

[...]
> I compared the postscript output of the same lilypond git revision built
> with both guile-1.8 (in a debian stable container) and guile-2.0, and by
> looking closely at this issue the conversion from postscript to pdf
> fails for two possible reasons:
> 
>   1. The numbers are represented with the decimal separator from the
>  locale instead of a period;
>

If I got it right this happens because the custom formatting functions
defined in lilypond (lily/general-scheme.cc) use the "%f" format
specifier, but in a guile-2.0 scenario lilypond calls
  (setlocale LC_ALL "")
which brings in LC_NUMERIC from the environment which affects the "%f"
output.

So making lilypond more locale-independent as David suggested in
another message, and removing the setlocale call, could fix this too.

For the time being I am overriding LC_NUMERIC in a few places:
https://ao2.it/tmp/lilypond-guile2/patches_2016-11-19/0007-Print-floating-point-variables-using-a-period-as-the.patch
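
As an illustration of the locale-independent direction mentioned above,
here is a minimal sketch in plain C++.  It is not the patch linked
above and not LilyPond code; the helper name format_double_c_locale is
hypothetical.  It assumes that output should always use a period as the
decimal separator, whatever LC_NUMERIC says.

```cpp
#include <cassert>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of LilyPond): format a double with a
// fixed number of decimals, always using '.' as the decimal separator.
std::string format_double_c_locale(double value, int decimals = 6)
{
  assert(decimals >= 0);
  std::ostringstream out;
  // The classic ("C") locale is imbued explicitly, so a locale taken
  // from the environment does not change the decimal separator here.
  out.imbue(std::locale::classic());
  out << std::fixed << std::setprecision(decimals) << value;
  return out.str();
}
```

With this sketch, format_double_c_locale(1.5) yields "1.500000", which
is the same text that "%f" produces in the "C" locale.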

>   2. The embedded fonts are encoded in UTF-8, while it looks like that
>  when using guile-1.8 the postscript file is encoded in latin1:
> 

Again, here, the UTF-8 encoding for the file port seems to be brought
in from the locale, maybe via make-tmpfile called in
scm/framework-ps.scm.

For now I am hardcoding the Latin1 encoding for the postscript output:
https://ao2.it/tmp/lilypond-guile2/patches_2016-11-19/0008-Fix-the-encoding-of-the-postscript-output.patch

But could dropping the (setlocale LC_ALL "") help here too?
Or would it be better to use a binary port for output as well, to be
safe?

Ciao ciao,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?



Re: guile-2.0 and debian

2016-11-19 Thread David Kastrup
Antonio Ospite  writes:

> AFAICS in guile-2.0 the difference between characters and bytes is
> taken very seriously.

The problem is that lily/parser.yy and particularly lily/lexer.ll
implement robust and fast recognition and interpretation of UTF-8.  It
transparently maps them to C++ strings encoded in UTF-8.

Guile-2.0 has _no_ UTF-8 encoded strings.  Its strings are _either_
encoded in Latin-1 or in UCS-32.  Its string _ports_ are exclusively
encoded in UTF-8 and that also includes any file offsets in the string
ports.  As a result, its string port offsets are _useless_ for indexing
into strings.
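
To make the offset mismatch concrete, the following sketch (plain C++,
no Guile involved; the function name is hypothetical) converts a byte
offset in UTF-8 text into a character index.  It assumes the offset
falls on a character boundary.  The two numbers agree only for pure
ASCII text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert a byte offset into UTF-8 text to a character index by
// counting the bytes before the offset that start a character
// (every byte that is not a 10xxxxxx continuation byte).
std::size_t byte_offset_to_char_index(const std::string &utf8,
                                      std::size_t byte_offset)
{
  assert(byte_offset <= utf8.size());
  std::size_t index = 0;
  for (std::size_t i = 0; i < byte_offset; ++i)
    if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
      ++index;
  return index;
}
```

For the UTF-8 bytes of "büüh", the final "h" sits at byte offset 5 but
at character index 3.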

If you want to get a UTF-8 string into Guile, it will get decoded into
UCS-32 only to be reencoded into UTF-8 when moved through a string port
(like when using the Scheme reader on it) and have each character be
redecoded into UCS-32 that will get reencoded into UTF-8 when getting it
back into C++.

Guile-2.0 cannot work efficiently with string ports internally since it
constantly needs to recode stuff.  Its UTF-8 encoding/decoding (unlike
that of Emacs) cannot represent anything not in proper UTF-8: it either
produces stuff that does not encode into the original, or errors out
without remedy and useful offsets.  As a consequence, pinpointing the
problem into the original string or byte sequence is unreliable.

The UTF-8 libraries Guile employs are not internal to Guile (though
partly distributed as part of Guile rather than an external dependency).
Very little active work on them has been done in recent years.

The Guile developers will be in total denial that anything is amiss with
the current situation and that there is anything wrong with the
inability of Guile to read and write UTF-8 strings without involving a
non-information preserving conversion to UCS-32 or Latin-1 and back and
having its string ports work in an encoding that its strings cannot
represent.

LilyPond uses Guile as a very tightly integrated extension language so
it constantly passes strings into Guile and back and reads from string
ports.

Actual byte streams seem like they could help keeping some of this
insanity in check, in particular if you can let the Scheme reader treat
them as if they were in UTF-8.

Now in Guile-1.8, we did a lot of the UTF-8 work seamlessly and
manually.  There are a few rough corners with that in the context of
Scheme identifiers and strings.

Doing stuff "the Guile way" instead will be good for a lot of headaches
since Guile's representations are not even compatible within Guile
itself and since any attempt of getting strings into and out of Guile
requires a conversion since Guile's internal encodings are not exposed
to its API.

-- 
David Kastrup



Re: guile-2.0 and debian

2016-11-19 Thread Thomas Morley
2016-11-19 16:05 GMT+01:00 Antonio Ospite :

> But I don't really know what I am talking about here...

Speaking of which, the whole encoding issue is not exactly my preferred topic ...

> I still need to clean things up, and ask for advice for better fixes,
> but I wanted to report something in case you had some time in the
> week-end.

Surprisingly I have to work this weekend, I'll see what I can do, though.

> Thanks,
>Antonio

We have to say a big, big thank you for all your work!


Cheers,
  Harm



Re: guile-2.0 and debian

2016-11-19 Thread Antonio Ospite
On Thu, 17 Nov 2016 22:24:27 +0100
David Kastrup  wrote:

> Antonio Ospite  writes:
> 
> > -
> > (process:18706): Pango-WARNING **: Invalid UTF-8 string passed to 
> > pango_layout_set_text()
> > -
> >
> > and in the final files only a part of the "büüh" string was rendered,
> > however the "ü" was rendered correctly.
> >
> > So I added a printout to see what was going on:
> >
> > -
> > diff --git a/lily/lily-guile.cc b/lily/lily-guile.cc
> > index 2c519ec..9c0c10c 100644
> > --- a/lily/lily-guile.cc
> > +++ b/lily/lily-guile.cc
> > @@ -132,6 +132,7 @@ ly_scm2string (SCM str)
> >result.resize (len);
> > >scm_to_locale_stringbuf (str, &result.at (0), len);
> >  }
> > +  fprintf(stderr, "%s: len: %d result: '%s'\n", __func__, len, 
> > result.c_str());
> >return result;
> >  }
> > -
> >
> > with guile-1.8:
> > -
> > ly_scm2string: len: 6 result: 'büüh'
> > -
> >
> > with guile-2.0:
> > -
> > ly_scm2string: len: 4 result: 'bü�'
> >
> > (process:18706): Pango-WARNING **: Invalid UTF-8 string passed to 
> > pango_layout_set_text()
> > -
> >
> > In ly_scm2string() I see that scm_c_string_length() is used, by looking
> > at the documentation
> > (https://www.gnu.org/software/guile/manual/html_node/String-Selection.html#String-Selection)
> > I read:
> >
> > Return the number of characters in string.
> >
> > So 4 characters looks correct to me, even if they take 6 bytes.
> >
> > IMHO it can be safer not to mix scm_c_string_length() and
> > scm_to_locale_stringbuf().
> 
> I've just done a git grep of ly_scm2string and even if you fix that bug,
> most uses of it should _not_ use the current locale.  So obviously
> ly_scm2string needs to get split into several different functions.  The
> current locale should only be used for writing to the _console_.
> Possibly also for writing to the log file.  For everything else,
> LilyPond is likely utf-8 (or Latin-1 for efficiency reasons when
> LilyPond _knows_ that only the common ASCII subset of utf-8 and Latin-1
> is being used).
> 

That makes sense of course; having the current locale affecting how the
input and output files are treated seemed a little weird to me.

I don't think I can put time into it, tho.

However, if you confirm that my patch above is valid (even if not
complete) I'll start submitting that, which already improves the
situation with guile-2.0.

A self-contained patch is here:
https://ao2.it/tmp/lilypond-guile2/patches_2016-11-19/0010-Fix-converting-SCM-strings-with-wide-characters-to-s.patch
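
The length mismatch behind the 'bü�' output can be reproduced without
Guile.  The sketch below is only an illustration under that assumption:
it sizes a buffer by the number of characters and then keeps that many
bytes of the UTF-8 text.  Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 byte string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t utf8_code_points(const std::string &bytes)
{
  std::size_t count = 0;
  for (unsigned char c : bytes)
    if ((c & 0xC0) != 0x80)
      ++count;
  return count;
}

// Mimic the faulty pattern: the result is sized by character count,
// so only that many bytes of the encoded text survive.
std::string copy_with_char_count(const std::string &bytes)
{
  std::string result(bytes);
  result.resize(utf8_code_points(bytes)); // too short for non-ASCII text
  return result;
}
```

For "büüh" (6 bytes, 4 characters) the copy keeps "b", one complete "ü"
and the first byte of the second "ü", which is invalid UTF-8 and matches
the Pango warning quoted above.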

Thanks,
   Antonio




Re: guile-2.0 and debian

2016-11-19 Thread Antonio Ospite
On Sat, 19 Nov 2016 16:05:22 +0100
Antonio Ospite  wrote:

[...]
>   - and than when scm_read() (lily/parse-scm.cc in
  ^
This is a "then".

> internal_ly_parse_scm()) is called to parse the embedded scm, and
> the latter is interpreted as Latin1 too.
>   




Re: guile-2.0 and debian

2016-11-19 Thread Antonio Ospite
On Fri, 18 Nov 2016 00:24:19 +0100
Thomas Morley  wrote:

[...]
> Hi Antonio,
>

Hi,

> as said, no time to dive in deeper, though here some observations (my
> test-file attached.)
> 
> - toplevel-markups with special characters working
> - ly-identifier with special characters working
> - context-names with special characters and assigned Lyrics working
> 
> - scheme/guile-identifier with special characters _not_ working

AFAIU this is what happens with guile-2.0:

  - lilypond tries to open the input file as a bytevector input port
(see lily/source-file.cc in Source_file::init_port ()), the encoding
is set to Latin1 which should mean a "pure" binary port:
https://www.gnu.org/software/guile/docs/master/guile.html/Encoding.html

  - and than when scm_read() (lily/parse-scm.cc in
internal_ly_parse_scm()) is called to parse the embedded scm, and
the latter is interpreted as Latin1 too.
  
  - while the identifiers outside the scm code are recognized as UTF-8.
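
What reading UTF-8 bytes through a Latin1 port does to an identifier
can be sketched as follows.  This is plain C++ rather than Guile, and
latin1_to_utf8 is a hypothetical name: every input byte is treated as
one Latin1 character and then written back out as UTF-8.

```cpp
#include <string>

// Reinterpret each byte as a Latin-1 character (U+0000..U+00FF) and
// re-encode the resulting characters as UTF-8.
std::string latin1_to_utf8(const std::string &bytes)
{
  std::string out;
  for (unsigned char c : bytes)
    {
      if (c < 0x80)
        out.push_back(static_cast<char>(c));
      else
        {
          // Characters U+0080..U+00FF need two bytes in UTF-8.
          out.push_back(static_cast<char>(0xC0 | (c >> 6)));
          out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
  return out;
}
```

Fed the UTF-8 bytes of "bääh", this gives the six characters "bÃ¤Ã¤h".
A symbol read that way differs from the four-character "bääh"
recognized as UTF-8 elsewhere, so a lookup by name would not match.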

To confirm my rationale I tried this weird input file, and it does not
give the error, even though the final output is not quite right (bööh
gets the wrong encoding too):

-
#(define bääh #{ { c1^\markup "bööh" } #})
\new Staff \bääh
-

If lilypond were to set the %default-port-encoding to UTF-8 your example
would work, but I don't know if that would break something else.

Alternatively, a more "confined" change could look like this:

-
diff --git a/lily/parse-scm.cc b/lily/parse-scm.cc
index 576591d..20627ed 100644
--- a/lily/parse-scm.cc
+++ b/lily/parse-scm.cc
@@ -54,7 +54,14 @@ internal_ly_parse_scm (Parse_start *ps)
   if (multiple)
 (void) scm_read_char (port);

+#if GUILEV2
+  SCM current_encoding = scm_port_encoding (port);
+  scm_set_port_encoding_x (port, ly_string2scm("UTF-8"));
   SCM form = scm_read (port);
+  scm_set_port_encoding_x (port, current_encoding);
+#else
+  SCM form = scm_read (port);
+#endif
   SCM to = scm_ftell (port);
-

But, really? :)

> - pdf-meta-data with special characters _not_ working:
> 
> exiftool atest-40.pdf
> ExifTool Version Number : 10.10
> File Name   : atest-40.pdf
> ...skipping...
> Title   : ??b??h
> Creator : LilyPond 2.19.51
> 

Your test about pdf-meta-data is enough to trigger the issue, but at
first it led me to only a partial understanding of the issue, because
those characters are also representable in Latin1. A better example is
to use Japanese text in the metadata.

BTW, the issue is that the values of the pdf metadata fields are
expected to be in UTF-16, but since my last change the port for the
postscript file uses the Latin1 encoding globally, and guile was
substituting the characters it didn't recognize as representable in the
encoding (the first two "??" are the BOM and the other two are the
udieresis).
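
The substitution itself can be sketched like this.  It is an
illustration in plain C++ of what a "substitute" conversion strategy
amounts to, under the assumption that every character outside Latin1 is
replaced by a question mark; it is not Guile's implementation, and the
function name is hypothetical.

```cpp
#include <string>

// Encode code points as Latin-1 bytes; anything above U+00FF cannot
// be represented and is replaced by '?'.
std::string to_latin1_substitute(const std::u32string &text)
{
  std::string out;
  for (char32_t cp : text)
    {
      if (cp <= 0xFF)
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
      else
        out.push_back('?');
    }
  return out;
}
```

A title made of Japanese characters would come out as nothing but
question marks, while "ü" (U+00FC) survives as the single byte 0xFC.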

The following change fixes the issue at hand:

-
diff --git a/scm/framework-ps.scm b/scm/framework-ps.scm
index a404119..b2b6802 100644
--- a/scm/framework-ps.scm
+++ b/scm/framework-ps.scm
@@ -28,6 +28,9 @@
  (scm clip-region)
  (lily))

+(if (guile-v2)
+  (use-modules(rnrs bytevectors)))
+
 (define format ergonomic-simple-format)

 (define framework-ps-module (current-module))
@@ -518,15 +521,22 @@
   (define (metadata-encode val)
 ;; First, call ly:encode-string-for-pdf to encode the string (Latin1 or
 ;; utf-16be), then escape all parentheses and backslashes
-;; FIXME guile-2.0: use (string->utf16 str 'big) instead
+;; With guile-2.0: use (string->utf16 str 'big) instead
+(if (guile-v2)
+  (ps-quote (utf16->string (string->utf16 val 'big)))
+  (ps-quote (ly:encode-string-for-pdf val

-(ps-quote (ly:encode-string-for-pdf val)))
   (define (metadata-lookup-output overridevar fallbackvar field)
 (let* ((overrideval (ly:modules-lookup (list header) overridevar))
(fallbackval (ly:modules-lookup (list header) fallbackvar))
(val (if overrideval overrideval fallbackval)))
   (if val
-  (format port "/~a (~a)\n" field (metadata-encode (markup->string val 
(list header)))
+(begin
+  (format port "/~a (" field)
+  (set-port-encoding! port "UTF-16")
+  (format port "~a" (metadata-encode (markup->string val (list 
header
+  (set-port-encoding! port "ISO-8859-1")
+  (format port ")\n")

   (if (module? header)
   (begin
-

This is rather ugly, but encoding only the actual _value_ of the field
in UTF-16 allows to have exactly the same output as with guile-1.8.

Re: Add using Extract PDFmark for document building (issue 314130043 by truer...@gmail.com)

2016-11-19 Thread pkx166h

Domou Hosoda-Sama.


https://codereview.appspot.com/314130043/diff/20001/configure.ac
File configure.ac (right):

https://codereview.appspot.com/314130043/diff/20001/configure.ac#newcode305
configure.ac:305: [" documentation and the final PDF files.)"])
All I would say is that "... and the final PDF files." is redundant *if*
this change ONLY affects the PDF files and the rest of the
'documentation files' are unaffected then simply say:

"...significantly reduce the disk space needed for the PDF files when
building the documentation".

Also, would it be better to state 'Ghostscript' rather than 'gs' in the
message?

https://codereview.appspot.com/314130043/



Re: Add using Extract PDFmark for document building (issue 314130043 by truer...@gmail.com)

2016-11-19 Thread trueroad

On 2016/11/19 06:11:09, lemzwerg wrote:
> Ah, bad wording, sorry.  I don't care about the actual length of the
> message but the line length in the source code, so please just wrap the
> message to stay within 80 columns or so if possible.
>
> Here's another version of the message which you might consider.
>
>    Optionally using gs >= 9.20 together with extractpdfmark can
>    significantly reduce the disk space required for building the
>    documentation and the final PDF files.
>
> Maybe native English speakers find something even better :-)

Thank you for your suggestion.
I've uploaded Patch Set 2.


https://codereview.appspot.com/314130043/



Re: meta-data with guile2

2016-11-19 Thread Masamichi Hosoda
> Hi,
> 
> this is a single problem from
> http://lists.gnu.org/archive/html/lilypond-devel/2016-11/msg00090.html
> about pdf-meta-data with guile2
> 
> In framework-ps.scm we have `metadata-encode' defined as part of
> `handle-metadata'
> There's the comment:
> ;; First, call ly:encode-string-for-pdf to encode the string (latin1 or
> ;; utf-16be), then escape all parentheses and backslashes
> ;; FIXME guile-2.0: use (string->utf16 str 'big) instead
> 
> `handle-metadata' finally returns
> (ps-quote (ly:encode-string-for-pdf val))
> 
> Why did we do so at all?
> In my (ofcourse limited) testings I see no difference doing directly:
> (ps-quote val)
> instead.

Non-Latin-1 strings (such as Japanese strings) in PDF metadata must be
expressed in UTF-16BE encoding.  `ly:encode-string-for-pdf` converts
such strings to UTF-16BE encoded strings.

If you use `(ps-quote val)` directly, non-Latin-1 strings will be broken.
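
The two steps can be sketched in plain C++.  This is not the
implementation of `ly:encode-string-for-pdf` or `ps-quote`; both
function names below are hypothetical, and the sketch assumes the
UTF-16BE form starts with the FE FF byte order mark and that escaping
works on the encoded bytes.

```cpp
#include <cassert>
#include <string>

// Encode code points as UTF-16BE bytes, preceded by the FE FF
// byte order mark.
std::string utf16be_with_bom(const std::u32string &text)
{
  std::string out = "\xFE\xFF";
  auto put16 = [&out](unsigned unit) {
    out.push_back(static_cast<char>((unit >> 8) & 0xFF));
    out.push_back(static_cast<char>(unit & 0xFF));
  };
  for (char32_t cp : text)
    {
      // Only valid scalar values are accepted in this sketch.
      assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
      if (cp < 0x10000)
        put16(cp);
      else
        {
          // Code points above U+FFFF become a surrogate pair.
          unsigned v = cp - 0x10000;
          put16(0xD800 + (v >> 10));
          put16(0xDC00 + (v & 0x3FF));
        }
    }
  return out;
}

// Escape the bytes that are special inside a PostScript literal
// string: parentheses and the backslash.
std::string escape_ps_string(const std::string &bytes)
{
  std::string out;
  for (char c : bytes)
    {
      if (c == '(' || c == ')' || c == '\\')
        out.push_back('\\');
      out.push_back(c);
    }
  return out;
}
```

Note that in UTF-16BE a ")" becomes the two bytes 00 29, so the escape
has to be applied after encoding and byte by byte; otherwise the 0x29
byte would end the PostScript string early.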



Re: meta-data with guile2

2016-11-19 Thread Antonio Ospite
On Sat, 19 Nov 2016 13:09:08 +0100
Thomas Morley  wrote:

> Hi,
> 
> this is a single problem from
> http://lists.gnu.org/archive/html/lilypond-devel/2016-11/msg00090.html
> about pdf-meta-data with guile2
> 
> In framework-ps.scm we have `metadata-encode' defined as part of
> `handle-metadata'
> There's the comment:
> ;; First, call ly:encode-string-for-pdf to encode the string (latin1 or
> ;; utf-16be), then escape all parentheses and backslashes
> ;; FIXME guile-2.0: use (string->utf16 str 'big) instead
> 
> `handle-metadata' finally returns
> (ps-quote (ly:encode-string-for-pdf val))
> 
> Why did we do so at all?

I am not sure whether the PDF spec requires metadata to be encoded in
UTF-16 if wide characters are present.

> In my (ofcourse limited) testings I see no difference doing directly:
> (ps-quote val)
> instead.
>

But that would still not solve the problem of wide characters being
substituted (try with the Japanese text in metadata) when writing to
the postscript file port, which has a latin1 encoding, would it?

BTW, I found a way to make your test file work, I am preparing a
reply to your previous message, just waiting for "make LANGS='' doc" to
complete... which takes a while here.

Thanks,
   Antonio




meta-data with guile2

2016-11-19 Thread Thomas Morley
Hi,

this is a single problem from
http://lists.gnu.org/archive/html/lilypond-devel/2016-11/msg00090.html
about pdf-meta-data with guile2

In framework-ps.scm we have `metadata-encode' defined as part of
`handle-metadata'
There's the comment:
;; First, call ly:encode-string-for-pdf to encode the string (latin1 or
;; utf-16be), then escape all parentheses and backslashes
;; FIXME guile-2.0: use (string->utf16 str 'big) instead

`handle-metadata' finally returns
(ps-quote (ly:encode-string-for-pdf val))

Why did we do so at all?
In my (of course limited) testing I see no difference when directly doing:
(ps-quote val)
instead.

Thanks,
  Harm
