Re: UTF-8 regression in guile 1.9.5

2009-12-06 Thread Linas Vepstas
2009/12/6 Mike Gran :
>
>> > need to call (setlocale LC_ALL "")
>
> But for Guile to store characters as codepoints, declaring a locale
> is pretty much a requirement now.

Would it make sense to add (setlocale LC_ALL "") to some default,
e.g. boot-9.scm  ?

--linas




Re: UTF-8 regression in guile 1.9.5

2009-12-06 Thread Mike Gran


> > Hmm.  The "ã" is a dead giveaway that you are printing a UTF-8 string
> > that is being interpreted as an ISO-8859-1 string.
> >
> > You've already said that you're in a UTF-8 locale.  It could be that you
> > need to call (setlocale LC_ALL "")
> 
> That cured it.
> 
> > as well as having a setlocale call in your program.
> 
> Doesn't seem to be required, after the above.
> 
> Thanks!
> 
> Why this happened is strange; I'm now investigating.  Sorry to
> have bothered you with something that is dohh .. basic.

1.9.x does work fundamentally differently w.r.t. strings.
That is because of how strings are now stored.
In 1.8.x, a character was a byte.  In 1.9.x, a character is a
codepoint.
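A rough plain-C illustration of the difference (utf8_codepoint_count is a hypothetical helper, not Guile's internal string code): the phrase from this thread is 18 bytes of UTF-8 but only 6 codepoints. So a byte-per-character 1.8.x string would hold 18 characters, while a 1.9.x string holds 6.

```c
#include <stddef.h>

/* Count codepoints in a NUL-terminated UTF-8 string by counting every
   byte that is NOT a continuation byte (continuation bytes look like
   10xxxxxx). Assumes valid UTF-8; no validation is performed. */
size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Each hiragana character and the final "。" take three bytes (a 0xE3 lead byte plus two continuation bytes), which is where the 18-versus-6 difference comes from.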

But for Guile to store characters as codepoints, declaring a locale
is pretty much a requirement now.

-Mike




Re: UTF-8 regression in guile 1.9.5

2009-12-06 Thread Linas Vepstas
2009/12/6 Mike Gran :
>> From: Linas Vepstas 
>
>
>> Then, from the guile prompt, I can evaluate the following:
>>
>>    (new-node "てみました。")
>>
>> and get the output "The name is てみました。"
>>
>>
>> However, in guile-1.9.5, the above gives me:
>>
>>    "The name is ã¦ã¿ã¾ããã"
>
> Hmm.  The "ã" is a dead giveaway that you are printing a UTF-8 string
> that is being interpreted as an ISO-8859-1 string.
>
> You've already said that you're in a UTF-8 locale.  It could be that you
> need to call (setlocale LC_ALL "")

That cured it.

> as well as having a setlocale call in your program.

Doesn't seem to be required, after the above.

Thanks!

Why this happened is strange; I'm now investigating.  Sorry to
have bothered you with something that is dohh .. basic.

--linas




Re: UTF-8 regression in guile 1.9.5

2009-12-06 Thread Mike Gran
> From: Linas Vepstas 


> Then, from the guile prompt, I can evaluate the following:
> 
>(new-node "てみました。")
> 
> and get the output "The name is てみました。"
> 
> 
> However, in guile-1.9.5, the above gives me:
> 
>"The name is ã¦ã¿ã¾ããã"

Hmm.  The "ã" is a dead giveaway that you are printing a UTF-8 string
that is being interpreted as an ISO-8859-1 string.
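Why "ã" in particular: every character of "てみました。" is encoded in UTF-8 as three bytes starting with 0xE3, and 0xE3 is "ã" in ISO-8859-1. Most of the remaining bytes fall in 0x80 to 0x9F, which are invisible C1 control codes, so each character shows up as "ã" followed by at most one visible symbol. A minimal plain-C sketch that reproduces the effect by re-encoding each byte as if it were an ISO-8859-1 character (latin1_to_utf8 is a hypothetical helper, not part of Guile):

```c
#include <stddef.h>

/* Treat every byte of `in` as one ISO-8859-1 character and write that
   character to `out` as UTF-8. Feeding UTF-8 text through this reproduces
   the "ã..." mojibake. `out` needs room for 2 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the final NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *) in;
    unsigned char *q = (unsigned char *) out;

    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            *q++ = *p;                                  /* ASCII is unchanged */
        } else {
            /* U+0080..U+00FF become two-byte UTF-8 sequences */
            *q++ = (unsigned char) (0xC0 | (*p >> 6));
            *q++ = (unsigned char) (0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
    return (size_t) (q - (unsigned char *) out);
}
```

For example, "て" (bytes E3 81 A6) comes out as U+00E3 U+0081 U+00A6, that is "ã", an invisible control character, and "¦", which matches the first two visible characters of the garbled output quoted above.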

You've already said that you're in a UTF-8 locale.  It could be that you
need to call (setlocale LC_ALL "") at the Guile prompt before entering
(new-node "てみました。"), as well as having a setlocale call in your program.
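On the C side, "having a setlocale call in your program" usually means calling setlocale(LC_ALL, "") near the top of main, before any locale-dependent conversion such as scm_to_locale_string runs. A minimal sketch, assuming a POSIX system that provides nl_langinfo(CODESET); both helper names are hypothetical and are not Guile functions:

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Report whether the locale currently in effect uses UTF-8 as its codeset. */
int codeset_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && (strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0);
}

/* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG, ...).
   Returns 1 if the result is a UTF-8 locale, 0 if it is some other locale,
   and -1 if the environment names a locale the C library does not know. */
int adopt_environment_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    return codeset_is_utf8();
}
```

Checking the codeset after the call is optional, but it lets a program warn early when, for instance, LANG names a locale that is not installed.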

Thanks,

Mike Gran




UTF-8 regression in guile 1.9.5

2009-12-06 Thread Linas Vepstas
Hi,

I seem to see either a regression in guile-1.9.5 with regard
to UTF-8 strings, or at least some sort of incompatible change.

In guile-1.8.6, I am able to do the following:

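#include <stdio.h>
#include <stdlib.h>
#include <libguile.h>

/* Print the Scheme string, converted to the current locale's encoding. */
SCM new_node (SCM sname)
{
char * cname = scm_to_locale_string(sname);
printf ("The name is %s\n", cname);
free (cname);
return SCM_EOL;
}

/* Hypothetical init hook: registers new_node as the one-argument procedure new-node. */
void init_new_node (void)
{
scm_c_define_gsubr("new-node", 1, 0, 0, new_node);
}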

Then, from the guile prompt, I can evaluate the following:

   (new-node "てみました。")

and get the output "The name is てみました。"


However, in guile-1.9.5, the above gives me:

   "The name is ã¦ã¿ã¾ããã"

Now, it is very possible that I've forgotten to say

  (use-modules some-new-utf8-module)

but I am unclear on what that module is (and why it's not
specified by default).

In both cases, my shell has: LANG=en_US.UTF-8
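Having LANG=en_US.UTF-8 in the shell does not by itself change the locale of a C program: ISO C specifies that a program starts as if setlocale(LC_ALL, "C") had been executed, and the environment's settings are only picked up once the program calls setlocale(LC_ALL, ""). A small sketch that queries the current locale without changing it (still_in_c_locale is a hypothetical helper); whether this fully accounts for the 1.9.5 behavior seen here is an assumption, not something this thread establishes.

```c
#include <locale.h>
#include <string.h>

/* Return 1 while the program is still in the "C" locale it starts in.
   Passing NULL to setlocale only queries the locale; it changes nothing. */
int still_in_c_locale(void)
{
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

Called at the start of main, this reports 1 even when LANG is set, which is why an explicit setlocale(LC_ALL, "") is needed before locale-dependent conversions.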

--linas