On 15/12/2014 05:33, Spencer Graves wrote:
Hello, All:


          What do people do to strip accents from Latin characters, returning
vanilla ASCII?

I think the devil is in the detail here: what is Latin? Latin-1 has characters for which this is unclear, let alone Latin-2 or Latin-7.

What I would do is

1) convert to UTF-8 with iconv()
2) convert to Unicode code points with utf8ToInt().
3) remap the Unicode characters with an integer lookup table tab[].
4) convert back to UTF-8, then to the desired encoding (or mark as UTF-8 with Encoding()).
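The four steps above might be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the poster's code: the helpers make_tab() and strip_accents() are hypothetical, the table holds only a handful of sample entries, and step 1 uses enc2utf8() in place of iconv() because enc2utf8() honours encodings already declared on the strings.

```r
# Hypothetical helper: integer lookup table indexed by code point.
# Entries default to the identity, so unmapped characters pass through.
make_tab <- function(n = 512L) {
  tab <- seq_len(n)                       # identity: tab[i] == i
  # A few sample remappings (accented letter -> ASCII letter);
  # a usable table would need many more entries.
  from <- utf8ToInt("\u00e0\u00e1\u00e2\u00e8\u00e9\u00ea\u00ec\u00ed\u00f2\u00f3\u00f9\u00fa\u00fc\u00f1\u00e7")
  to   <- utf8ToInt("aaaeeeiioouuunc")
  tab[from] <- to
  tab
}

# Hypothetical helper applying steps 1-4 to each element of x.
strip_accents <- function(x, tab = make_tab()) {
  x <- enc2utf8(x)                        # step 1: make sure x is UTF-8
  out <- vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cp <- utf8ToInt(s)                    # step 2: Unicode code points
    if (anyNA(cp)) return(NA_character_)  # input was not valid UTF-8
    inside <- cp <= length(tab)           # only remap what the table covers
    cp[inside] <- tab[cp[inside]]         # step 3: integer lookup
    intToUtf8(cp)                         # step 4: back to a UTF-8 string
  }, character(1), USE.NAMES = FALSE)
  Encoding(out) <- "UTF-8"                # step 4: mark as UTF-8
  out
}

strip_accents(c("Ra\u00fal", "ni\u00f1o"))
```

Code points beyond the end of the table are left unchanged, which is one way to keep the table small as suggested.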

As I suspect all the characters you do want to convert are in the first few blocks of Unicode, the lookup table can be small, maybe fewer than 512 elements. So, for example, ú is Unicode code point 250 (U+00FA) and the value of tab[250] should be 117 (the code point of u). iconv() with transliteration might give you a good start for preparing that table.
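One way to seed such a table from iconv() transliteration is sketched below. The helper build_tab() is hypothetical, it covers code points up to 383 (the end of the Latin Extended-A block) as an assumption, and it keeps only results that are a single ASCII letter. What iconv() returns for each character depends on the platform's iconv implementation and locale, so the entries may differ between systems and should be reviewed by hand.

```r
# Hypothetical helper: seed an identity lookup table from iconv()'s
# transliteration, keeping only single-letter ASCII results.
build_tab <- function(n = 383L) {
  tab <- seq_len(n)                       # identity: tab[i] == i
  for (i in 128:n) {                      # ASCII (1-127) is left unchanged
    tr <- iconv(intToUtf8(i), from = "UTF-8", to = "ASCII//TRANSLIT")
    # skip failures (NA), "?" placeholders and multi-character results
    if (!is.na(tr) && grepl("^[A-Za-z]$", tr)) tab[i] <- utf8ToInt(tr)
  }
  tab
}

tab <- build_tab()
# u-acute is code point 250 and "u" is 117; tab[250] is 117 only if this
# platform's iconv() transliterates u-acute to a plain "u".
c(utf8ToInt("\u00fa"), utf8ToInt("u"))
```

Multi-character transliterations are dropped here on purpose, because an integer table can only map one code point to one code point.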

(Note that transliteration to two chars is often more acceptable/widely applicable. E.g. å to aa and ß to ss.)
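An integer table cannot express a mapping such as ß to ss. A minimal sketch of a character-to-string map follows; the helper translit_multi() and its four-entry map are hypothetical and would need extending. Each string is split into characters and only the mapped characters are replaced.

```r
# Hypothetical map from single characters to ASCII strings of any length.
translit_map <- c("aa", "Aa", "ss", "u")
names(translit_map) <- c("\u00e5", "\u00c5", "\u00df", "\u00fa")

translit_multi <- function(x, map = translit_map) {
  chars_list <- strsplit(enc2utf8(x), "")   # one character per element
  vapply(chars_list, function(chars) {
    if (anyNA(chars)) return(NA_character_)
    idx <- match(chars, names(map))         # position in map, or NA
    hit <- !is.na(idx)
    chars[hit] <- map[idx[hit]]             # substitute mapped characters
    paste(chars, collapse = "")
  }, character(1))
}

translit_multi(c("Stra\u00dfe", "\u00c5se", "Ra\u00fal"))
```

Splitting per character is slower than a single table lookup, but it is what allows one input character to become two output characters.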


          For example, I want to convert 'Raúl' to "Raul".  Milan (below) suggested
'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed on
Linux and Mac.  It's part of the "subNonStandardCharacters" function in the Ecfun
package.  The development version on R-Forge uses this and returns "Raul" under Windows
and NA under Mac OS X (and something different from "Raul", presumably NA, under Linux).
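Because the result of iconv() with //TRANSLIT varies by platform, one hedged option is to declare the source encoding explicitly and fall back to a chartr() mapping wherever iconv() returns NA. This is only a sketch: the helper to_ascii_safe() and its small fallback table are hypothetical, and it does not diagnose why the conversion fails on a given platform.

```r
# Hypothetical wrapper: try iconv() transliteration, then fall back to an
# explicit one-to-one mapping for elements where iconv() returned NA.
to_ascii_safe <- function(x) {
  x <- enc2utf8(x)                         # declare the input as UTF-8
  out <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  bad <- is.na(out) & !is.na(x)            # iconv() failed for these elements
  # Fallback: a few accented vowels and n-tilde only; extend as needed.
  out[bad] <- chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1",
                     "aeioun", x[bad])
  out
}

to_ascii_safe(c("Ra\u00fal", "Raul"))
```

Results that iconv() returns as non-NA but with "?" or other substitutes are not caught by this check, and characters missing from the fallback table stay non-ASCII.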


          Thanks,
          Spencer


On Nov 30, 2014, at 2:32 AM, Spencer Graves 
<spencer.gra...@structuremonitoring.com> wrote:

Wonderful.  Thanks very much.  Spencer


On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
On Sunday 30 November 2014 at 02:14 -0800, Spencer Graves wrote:
Hello:


        How can one convert Latin characters with accents to the corresponding
characters without?  For example, I want to convert "ú" to "u", similar
to how tolower('U') returns "u".


        This can be done using chartr{base}, e.g., chartr('ú', 'u',
'Raúl') returns "Raul".  However, I wondered if a simpler version of
this is available.
This appears to work:
iconv("ù", "", "ASCII//TRANSLIT")
[1] "u"


Regards

        Thanks,
        Spencer


p.s.   findFn('convert to ascii') found 117 help pages in 70 packages.
A brief review identified two functions to "Convert to ASCII": ASCIIfy {gtools}
and stri_enc_toascii {stringi}.  Neither of these did what I expected.
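According to its documentation, stri_enc_toascii() converts the encoding rather than transliterating, so non-ASCII characters become a substitute character rather than a base letter. stringi also exposes ICU transliteration through stri_trans_general(); the sketch below assumes stringi is installed (it does nothing otherwise) and uses ICU's "Latin-ASCII" transform ID.

```r
# Assumes the stringi package is installed; skipped otherwise.
if (requireNamespace("stringi", quietly = TRUE)) {
  # ICU's "Latin-ASCII" transform removes accents from Latin letters and
  # maps some letters, such as sharp s, to two ASCII characters.
  print(stringi::stri_trans_general(c("Ra\u00fal", "Stra\u00dfe"), "Latin-ASCII"))
}
```

This relies on ICU's own transliteration rules rather than the platform's iconv(), which is one reason it may behave more consistently across operating systems.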

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
