[R] reading and frequency analysis of Spanish text

Michael Friendly Wed, 05 Aug 2009 11:20:58 -0700

For an historical paper I'm working on, I have some Spanish plaintext, presently in the form of a Word .doc file,
http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc

and also some ciphered text from the same original source. The ultimate goal is to use some frequency analysis of letters and word lengths in the plaintext to help decode the ciphered text.
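Once the plaintext is in R as a single character string, the letter and word-length counts could be tabulated roughly as below. This is a minimal sketch, not the author's code: letter_freq() and word_length_freq() are hypothetical names. Treating any Unicode letter (PCRE class \p{L}) as a letter, so that ñ and the accented vowels are counted, is also an assumption.

```r
# Minimal sketch: letter and word-length frequencies for a Spanish text string.
# letter_freq() and word_length_freq() are hypothetical helper names.

letter_freq <- function(txt) {
  # Work in UTF-8 and lower case so "A" and "a" are counted together
  txt <- tolower(enc2utf8(txt))
  chars <- strsplit(txt, "")[[1]]
  # Keep only letters; \p{L} also matches n-tilde and accented vowels
  chars <- chars[grepl("^\\p{L}$", chars, perl = TRUE)]
  tab <- table(chars)
  # Named integer vector, most frequent letter first
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

word_length_freq <- function(txt) {
  # Split on runs of non-letters (spaces, digits, inverted ? and ! marks, etc.)
  words <- strsplit(enc2utf8(txt), "[^\\p{L}]+", perl = TRUE)[[1]]
  words <- words[nzchar(words)]  # drop the empty string a leading separator leaves
  tab <- table(nchar(words))     # nchar() counts characters, not bytes
  setNames(as.integer(tab), names(tab))
}
```

For example, letter_freq("La casa de la sal") ranks a (5) ahead of l (3) and s (2), and word_length_freq("la casa de la sal") reports three 2-letter words, one 3-letter word and one 4-letter word.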

For now, I'm stuck on how to read the Spanish plaintext into R as a text string, given that it is in a Word .doc file using some form of latin1 encoding. From Word, I can Save As... plain text (.txt), but I'm worried about losing character encoding information, and I don't see anything helpful in the list of Other encodings that Word presents.
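If the document is exported from Word as plain text, R can read it as a single string provided the encoding is declared explicitly rather than guessed. A minimal sketch, assuming the export was saved as ISO-8859-1 (latin1); read_latin1_text() is a hypothetical name. readLines() returns the bytes unchanged and iconv() converts them from latin1 to UTF-8, so ñ and accented vowels survive regardless of the session's locale. If the export were saved as UTF-8 instead, the from argument would change accordingly.

```r
# Minimal sketch: read a latin1 (ISO-8859-1) plain-text file into one UTF-8 string.
# read_latin1_text() is a hypothetical helper; the file is assumed to be latin1.
read_latin1_text <- function(path) {
  # readLines() returns each line's bytes without re-encoding them
  lines <- readLines(path, warn = FALSE)
  # Convert every line from latin1 to UTF-8 (iconv() marks the result as UTF-8)
  lines <- iconv(lines, from = "latin1", to = "UTF-8")
  # Rejoin the lines into a single text string
  paste(lines, collapse = "\n")
}
```

Usage would be something like langren.txt <- read_latin1_text("Verdadera-spanish-stripped.txt"), where the .txt file name is hypothetical.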

A naive attempt to read the .doc file directly gives:

> langren.sp.file <- "http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc";
>
> langren.txt <- scan(langren.sp.file, encoding="latin1")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got 'ÐÏà¡±á'
>
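Two things go wrong in that call. scan() defaults to what = double(), so it tries to parse numbers; and a Word .doc file is not plain text but a binary OLE2 compound document. Its first eight bytes are D0 CF 11 E0 A1 B1 1A E1, which, read as latin1 with the two control bytes left out, display as the 'ÐÏà¡±á' in the error message. Even scan(..., what = "character") would therefore not yield clean text. A small check for that signature before reading a file as text might look like this (a sketch; is_ole2_file() is a hypothetical name):

```r
# Sketch: test whether a file starts with the OLE2 compound-document signature
# used by Word .doc files. is_ole2_file() is a hypothetical helper name.
is_ole2_file <- function(path) {
  ole2_sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  con <- file(path, open = "rb")  # binary mode: bytes are read without decoding
  on.exit(close(con))
  first <- readBin(con, what = "raw", n = 8L)
  identical(first, ole2_sig)      # FALSE for shorter files or other content
}
```

On a downloaded copy of the file (download.file(..., mode = "wb") keeps the bytes intact on Windows) this should return TRUE, which means the text has to come out through Word's own export or a dedicated .doc converter rather than through scan().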

Can someone help?

--

Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
