Re: [Rd] Making iconv portable?

2014-12-16 Thread Martin Maechler
> Kurt Hornik 
> on Mon, 15 Dec 2014 18:21:19 +0100 writes:

> Spencer Graves writes:
>> Hello, All: What would it take to make “iconv” portable?


>> I ask, because I want to convert accented characters to
>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to
>> “Raul”, and Milan Bouchet-Valat suggested on R-help that
>> I use 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked
>> under Windows but failed on Linux and Mac.  It’s part of
>> the “subNonStandardCharacters” function in the Ecfun
>> package. The development version on R-Forge uses this and
>> returns “Raul” under Windows and NA under Mac OS X (and
>> presumably also Linux).

> Hmm.

> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> [1] "Raul"

> seems to work for me on Linux ...

and me:

> iconv("Martin Mächler, Zürich.  ¡España, Olé!", "", "ASCII//TRANSLIT")
[1] "Martin Maechler, Zuerich.  ?Espana, Ole!"


>> The “iconv” R code merely calls compiled code, which I’ve
>> used very little in 30 years.


>> Thanks, Spencer




>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Making iconv portable?

2014-12-15 Thread Milan Bouchet-Valat
On Monday, 15 December 2014 at 13:49 -0500, Simon Urbanek wrote:
> On Dec 15, 2014, at 1:37 PM, Spencer Graves wrote:
> > 
> > 
> >> On Dec 15, 2014, at 10:13 AM, Simon Urbanek wrote:
> >> 
> >>> 
> >>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik wrote:
> >>> 
> >>>> Spencer Graves writes:
> >>> 
> >>>> Hello, All:
> >>>> What would it take to make “iconv” portable?
> >>> 
> >>> 
> >>>> I ask, because I want to convert accented characters to
> >>>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> >>>> Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
> >>>> "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
> >>>> on Linux and Mac.  It’s part of the “subNonStandardCharacters”
> >>>> function in the Ecfun package. The development version on
> >>>> R-Forge uses this and returns “Raul” under Windows and NA
> >>>> under Mac OS X (and presumably also Linux).
> >>> 
> >>> Hmm.
> >>> 
> >>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> >>> [1] "Raul"
> >>> 
> >>> seems to work for me on Linux ...
> >>> 
> >> 
> >> also on OS X:
> >> 
> >>> iconv("Raúl", "", "ASCII//TRANSLIT")
> >> [1] "Ra'ul"
> > 
> > 
> >   Thanks for the replies.  I should have checked my examples more 
> > carefully.  Consider the following example and a slight modification from 
> > help("iconv"):
> > 
> > 
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > Encoding(x) <- "latin1"
> > > x
> > [1] "Ekstrøm"   "Jöreskog"  "bißchen Zürcher"  
> > > iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
> > [1] "Ekstrom"  "J\"oreskog"  "bisschen Z\"urcher"
> > > 
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > x
> > [1] "Ekstr\xf8m"  "J\xf6reskog"  "bi\xdfchen Z\xfcrcher"
> > > iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
> > [1] NA NA NA 
> > 
> > 
> >   This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
> > and set to “latin1” if it’s “unknown”.
> 
> Well, that depends heavily on your source. In the above it is hand-crafted 
> latin1 so if you don't declare it, the native encoding will be assumed - 
> which can be anything and has nothing to do with your actual input in this 
> particular case where you hand-constructed latin1.
> 
> 
> >  (2) Delete any new \" added by iconv.
> > 
> 
> The whole point of translit is to create combinations of ASCII
> characters that represent the Unicode characters, so " is just one of
> many characters that can be used.
But it's quite unexpected that ö is transliterated to "o and ú to 'u.
Looks like iconv on OS X has a different idea of what ASCII
transliteration means than on Linux and Windows...

Anyway it's easy to remove " and ' if needed.
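A minimal sketch of that clean-up in R, assuming the input is either valid UTF-8 or has a declared encoding. to_ascii() is a hypothetical helper, not part of base R or Ecfun: it transliterates only the non-ASCII characters, one at a time, and drops any ' " ^ ` ~ that iconv attached to them, so apostrophes already in the input (as in "O'Brien") survive a blanket gsub(). What each accented letter becomes still depends on the platform's iconv and, with glibc, on the locale.

```r
## Hypothetical helper (illustration only): transliterate the non-ASCII
## characters of each string and strip accent markers that iconv may add.
to_ascii <- function(x) {
  x <- enc2utf8(as.character(x))        # work in UTF-8 regardless of input marking
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cps <- utf8ToInt(s)                 # Unicode code points; NA if invalid UTF-8
    if (anyNA(cps)) return(NA_character_)
    chars <- vapply(cps, intToUtf8, character(1))
    non_ascii <- cps > 127L
    if (any(non_ascii)) {
      # sub = "?" replaces bytes iconv cannot convert instead of returning NA
      tr <- iconv(chars[non_ascii], "UTF-8", "ASCII//TRANSLIT", sub = "?")
      # remove only the markers added to transliterated characters
      chars[non_ascii] <- gsub("[\"'^`~]", "", tr)
    }
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

to_ascii(c("Ra\u{00fa}l", "O'Brien", NA))
```

Working per character costs speed on long vectors, but it keeps the stripping step from touching punctuation that was already present.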


Regards

> Cheers,
> S
> 
> 
> > 
> >   Thanks again, 
> >   Spencer 
> > 
> >> 
> >> 
> >> 
> >>> -k
> >>> 
> >>> 
> >>>> The “iconv” R code merely calls compiled code, which I’ve used very
> >>>> little in 30 years.
> >>> 
> >>> 
> >>>> Thanks,
> >>>> Spencer
> >>> 
> >>> 
> >>> 


Re: [Rd] Making iconv portable?

2014-12-15 Thread Simon Urbanek
On Dec 15, 2014, at 1:37 PM, Spencer Graves wrote:
> 
> 
>> On Dec 15, 2014, at 10:13 AM, Simon Urbanek wrote:
>> 
>>> 
>>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik wrote:
>>> 
>>>> Spencer Graves writes:
>>> 
>>>> Hello, All:
>>>>   What would it take to make “iconv” portable?
>>> 
>>> 
>>>>   I ask, because I want to convert accented characters to
>>>>   vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>>>   Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
>>>>   "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>>>>   on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>>>>   function in the Ecfun package. The development version on
>>>>   R-Forge uses this and returns “Raul” under Windows and NA
>>>>   under Mac OS X (and presumably also Linux).
>>> 
>>> Hmm.
>>> 
>>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>>> [1] "Raul"
>>> 
>>> seems to work for me on Linux ...
>>> 
>> 
>> also on OS X:
>> 
>>> iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Ra'ul"
> 
> 
> Thanks for the replies.  I should have checked my examples more 
> carefully.  Consider the following example and a slight modification from 
> help("iconv"):
> 
> 
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > Encoding(x) <- "latin1"
> > x
> [1] "Ekstrøm"   "Jöreskog"  "bißchen Zürcher"  
> > iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
> [1] "Ekstrom"  "J\"oreskog"  "bisschen Z\"urcher"
> > 
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > x
> [1] "Ekstr\xf8m"  "J\xf6reskog"  "bi\xdfchen Z\xfcrcher"
> > iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
> [1] NA NA NA 
> 
> 
> This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
> and set to “latin1” if it’s “unknown”.

Well, that depends heavily on your source. In the above it is hand-crafted 
latin1 so if you don't declare it, the native encoding will be assumed - which 
can be anything and has nothing to do with your actual input in this particular 
case where you hand-constructed latin1.
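A minimal sketch of Spencer's step (1), under an explicit assumption: a string whose encoding is "unknown" and which is not valid UTF-8 is guessed to be latin1. mark_latin1_if_unknown() is a hypothetical helper. The guess is only right when the bytes really are latin1 (cp1252 or latin2 input would be mislabelled), so declaring the encoding where the data is read, e.g. file(path, encoding = "latin1") or the fileEncoding argument of read.csv(), is preferable. validUTF8() requires R >= 3.3.0.

```r
## Hypothetical helper (illustration only): declare latin1 for strings of
## unknown encoding that are not valid UTF-8. ASCII and valid UTF-8 are left alone.
mark_latin1_if_unknown <- function(x) {
  suspect <- !is.na(x) & Encoding(x) == "unknown" & !validUTF8(x)
  Encoding(x[suspect]) <- "latin1"   # changes the declared encoding, not the bytes
  x
}

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
x <- mark_latin1_if_unknown(x)
Encoding(x)
iconv(x, "latin1", "ASCII//TRANSLIT")  # the transliteration is still platform-dependent
```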


>  (2) Delete any new \" added by iconv.
> 

The whole point of translit is to create combinations of ASCII characters that 
represent the Unicode characters, so " is just one of many characters that can be 
used.

Cheers,
S


> 
> Thanks again, 
> Spencer 
> 
>> 
>> 
>> 
>>> -k
>>> 
>>> 
>>>> The “iconv” R code merely calls compiled code, which I’ve used very
>>>> little in 30 years.
>>> 
>>> 
>>>>   Thanks,
>>>>   Spencer
>>> 
>>> 
>>> 


Re: [Rd] Making iconv portable?

2014-12-15 Thread Spencer Graves

> On Dec 15, 2014, at 10:13 AM, Simon Urbanek wrote:
> 
>> 
>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik wrote:
>> 
>>> Spencer Graves writes:
>> 
>>> Hello, All:  
>>>   What would it take to make “iconv” portable?  
>> 
>> 
>>>   I ask, because I want to convert accented characters to
>>>   vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>>   Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
>>>   "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>>>   on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>>>   function in the Ecfun package. The development version on
>>>   R-Forge uses this and returns “Raul” under Windows and NA
>>>   under Mac OS X (and presumably also Linux).
>> 
>> Hmm.
>> 
>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Raul"
>> 
>> seems to work for me on Linux ...
>> 
> 
> also on OS X:
> 
>> iconv("Raúl", "", "ASCII//TRANSLIT")
> [1] "Ra'ul"


  Thanks for the replies.  I should have checked my examples more 
carefully.  Consider the following example and a slight modification from 
help("iconv"):


> x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> Encoding(x) <- "latin1"
> x
[1] "Ekstrøm"   "Jöreskog"  "bißchen Zürcher"  
> iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
[1] "Ekstrom"  "J\"oreskog"  "bisschen Z\"urcher"
> 
> x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> x
[1] "Ekstr\xf8m"  "J\xf6reskog"  "bi\xdfchen Z\xfcrcher"
> iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
[1] NA NA NA 


  This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
and set to “latin1” if it’s “unknown”.  (2) Delete any new \" added by iconv.


  Thanks again, 
  Spencer 

> 
> 
> 
>> -k
>> 
>> 
>>>  The “iconv” R code merely calls compiled code, which I’ve used very 
>>> little in 30 years.   
>> 
>> 
>>>   Thanks, 
>>>   Spencer 
>> 
>> 
>> 


Re: [Rd] Making iconv portable?

2014-12-15 Thread Simon Urbanek

> On Dec 15, 2014, at 12:21 PM, Kurt Hornik wrote:
> 
>> Spencer Graves writes:
> 
>> Hello, All:  
>>What would it take to make “iconv” portable?  
> 
> 
>>I ask, because I want to convert accented characters to
>>vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
>>"", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>>on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>>function in the Ecfun package. The development version on
>>R-Forge uses this and returns “Raul” under Windows and NA
>>under Mac OS X (and presumably also Linux).
> 
> Hmm.
> 
> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> [1] "Raul"
> 
> seems to work for me on Linux ...
> 

also on OS X:

> iconv("Raúl", "", "ASCII//TRANSLIT")
[1] "Ra'ul"
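Since Linux, OS X and Windows give different answers for the same call, it may help to report which iconv implementation and locale a session uses when comparing results. A short diagnostic sketch using base R functions (extSoftVersion() needs R >= 3.2.0). For strings known to be UTF-8, passing from = "UTF-8" rather than "" also avoids relying on the native locale's encoding.

```r
## Which iconv and locale is this session using? Both affect ASCII//TRANSLIT.
capabilities("iconv")          # TRUE if R was built with iconv support
extSoftVersion()[["iconv"]]    # name/version of the iconv implementation
l10n_info()[["UTF-8"]]         # is the session locale UTF-8?
Sys.getlocale("LC_CTYPE")      # glibc transliteration can depend on this
iconv("Ra\u{00fa}l", from = "UTF-8", to = "ASCII//TRANSLIT")
```

The iconv entry typically names something like glibc on Linux, GNU libiconv on OS X and win_iconv on Windows; the last line is where the platforms diverge, e.g. "Raul" versus "Ra'ul" as reported in this thread.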



> -k
> 
> 
>>   The “iconv” R code merely calls compiled code, which I’ve used very 
>> little in 30 years.   
> 
> 
>>Thanks, 
>>Spencer 
> 
> 
> 


Re: [Rd] Making iconv portable?

2014-12-15 Thread Kurt Hornik
> Spencer Graves writes:

> Hello, All:  
> What would it take to make “iconv” portable?  


> I ask, because I want to convert accented characters to
> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
> "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
> on Linux and Mac.  It’s part of the “subNonStandardCharacters”
> function in the Ecfun package. The development version on
> R-Forge uses this and returns “Raul” under Windows and NA
> under Mac OS X (and presumably also Linux).

Hmm.

R> iconv("Raúl", "", "ASCII//TRANSLIT")
[1] "Raul"

seems to work for me on Linux ...

-k


> The “iconv” R code merely calls compiled code, which I’ve used very
> little in 30 years.   


> Thanks, 
> Spencer 





[Rd] Making iconv portable?

2014-12-15 Thread Spencer Graves
Hello, All:  


  What would it take to make “iconv” portable?  


  I ask, because I want to convert accented characters to vanilla 
ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and Milan Bouchet-Valat 
suggested on R-help that I use 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked 
under Windows but failed on Linux and Mac.  It’s part of the 
“subNonStandardCharacters” function in the Ecfun package. The development 
version on R-Forge uses this and returns “Raul” under Windows and NA under Mac 
OS X (and presumably also Linux).


 The “iconv” R code merely calls compiled code, which I’ve used very 
little in 30 years.   


  Thanks, 
  Spencer 



> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
> 
> Wonderful.  Thanks very much.  Spencer
> 
> 
> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
