Re: [R] [Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error
On Fri, May 04, 2018 at 10:58:26PM +, Ista Zahn wrote:
> On Fri, May 4, 2018 at 4:47 PM, Scott Kostyshak <skostys...@ufl.edu> wrote:
> > I have very little knowledge about file encodings and would like to
> > learn more.
> >
> > I've read the following pages to learn more:
> >
> > http://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html
> > https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv
> > https://developer.r-project.org/Encodings_and_R.html
> >
> > The last one, in particular, has been very helpful. I would be
> > interested in any further references that you suggest.
> >
> > I attach a file that reproduces the issue I would like to learn more
> > about. I do not know if the file encoding will be correctly preserved
> > through email, so I also provide the file (temporarily) on Dropbox here:
> >
> > https://www.dropbox.com/s/3lbgebk7b5uaia7/encoding_export_issue.R?dl=0
> >
> > The file gives an error when using "source()" with the
> > argument echo = TRUE:
> >
> > > source("encoding_export_issue.R", echo = TRUE)
> > Error in nchar(dep, "c") : invalid multibyte string, element 1
> > In addition: Warning message:
> > In grepl("^[[:blank:]]*$", dep[1L]) :
> >   input string 1 is invalid in this locale
> >
> > The problem comes from the "á" character in the .R file. The file
> > appears to be encoded as "iso-8859-1":
> >
> > $ file --mime-encoding encoding_export_issue.R
> > encoding_export_issue.R: iso-8859-1
> >
> > Note that for me:
> >
> > > getOption("encoding")
> > [1] "native.enc"
> >
> > so "native.enc" is used for the "encoding" argument of source().
> >
> > The following two calls succeed:
> >
> > > source("encoding_export_issue.R", echo = TRUE, encoding = "unknown")
> > > source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1")
> >
> > Is this file a valid "iso-8859-1" encoded file?
>
> The one you attached is not. The one linked to in Dropbox is.
>
> > Why does source() fail
> > in the case of encoding set to "native.enc"? Is it because of the
> > settings to UTF-8 in my locale (see info on my system at the bottom of
> > this email)?
>
> Yes.
>
> > I'm guessing it would be a bad idea to put
> >
> > options(encoding = "unknown")
> >
> > in my .Rprofile, because it is difficult to always correctly guess the
> > encoding of files?
>
> My guess is that the issue is less about the difficulty of guessing
> the encoding, and more about the time it takes to do so. That's not
> particularly relevant for the "source" function, but the encoding
> option is used by many of the file IO functions in R and so has
> implications well beyond the behavior of "source".

Ah, I did not think about this possibility. Makes sense.

> > Is there a reason why setting it to "unknown" would
> > lead to more problems than leaving it set to "native.enc"?
>
> It depends on what you are actually doing. If you are on a UTF-8
> locale and working exclusively with UTF-8 files, setting
> options(encoding = "unknown") will just slow down your file IO by
> checking for the encoding every time.

Good to know. Thank you for your response, Ista.

Scott

--
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/

> > I've reproduced the
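A minimal sketch of what Ista describes, assuming a UTF-8 locale as in the thread. It writes a made-up two-line file containing the single ISO-8859-1 byte 0xE1 ("á"), checks the lines with base R's validUTF8(), and then sources the file while declaring its real encoding. The temporary file and its contents are illustrative only; R accepts "latin1" as a name for ISO-8859-1.

```r
# Assumes a UTF-8 locale (e.g. en_US.UTF-8), as in the original report.
path <- tempfile(fileext = ".R")

# "\xe1" is the single ISO-8859-1 byte for "á". In UTF-8, "á" takes two
# bytes (0xC3 0xA1), so a lone 0xE1 is an invalid multibyte sequence.
code <- "# Ch\xe1vez\nquantile_type <- 4\n"
writeBin(charToRaw(code), path)

# readLines() with the default encoding = "unknown" returns the raw bytes.
txt <- readLines(path, warn = FALSE)
validUTF8(txt)   # the comment line is not valid UTF-8

# Re-encoding from latin1 yields valid UTF-8 text.
fixed <- iconv(txt, from = "latin1", to = "UTF-8")
validUTF8(fixed)

# Declaring the file's real encoding lets source(echo = TRUE) work.
env <- new.env()
source(path, echo = TRUE, encoding = "latin1", local = env)
env$quantile_type
```

Sourcing into a fresh environment (local = env) only keeps the example from touching the global workspace; it is not needed for the encoding fix itself.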
[R] [Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error
I have very little knowledge about file encodings and would like to learn more. I've read the following pages to learn more:

http://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html
https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv
https://developer.r-project.org/Encodings_and_R.html

The last one, in particular, has been very helpful. I would be interested in any further references that you suggest.

I attach a file that reproduces the issue I would like to learn more about. I do not know if the file encoding will be correctly preserved through email, so I also provide the file (temporarily) on Dropbox here:

https://www.dropbox.com/s/3lbgebk7b5uaia7/encoding_export_issue.R?dl=0

The file gives an error when using "source()" with the argument echo = TRUE:

> source("encoding_export_issue.R", echo = TRUE)
Error in nchar(dep, "c") : invalid multibyte string, element 1
In addition: Warning message:
In grepl("^[[:blank:]]*$", dep[1L]) :
  input string 1 is invalid in this locale

The problem comes from the "á" character in the .R file. The file appears to be encoded as "iso-8859-1":

$ file --mime-encoding encoding_export_issue.R
encoding_export_issue.R: iso-8859-1

Note that for me:

> getOption("encoding")
[1] "native.enc"

so "native.enc" is used for the "encoding" argument of source().

The following two calls succeed:

> source("encoding_export_issue.R", echo = TRUE, encoding = "unknown")
> source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1")

Is this file a valid "iso-8859-1" encoded file? Why does source() fail in the case of encoding set to "native.enc"? Is it because of the settings to UTF-8 in my locale (see info on my system at the bottom of this email)?

I'm guessing it would be a bad idea to put

options(encoding = "unknown")

in my .Rprofile, because it is difficult to always correctly guess the encoding of files? Is there a reason why setting it to "unknown" would lead to more problems than leaving it set to "native.enc"?

I've reproduced the above behavior on R-devel (r74677) and 3.4.3. Below is my session info and locale info for my system with the 3.4.3 version:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3

> Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"

Thanks for your time,

Scott

P.S. Note that I had posted this question to r-devel, which was the incorrect choice. For archival purposes, I reference the thread here:
https://www.mail-archive.com/search?l=mid=20180501185750.445oub53vcdnyyyx%40steph

--
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/

# Ch?vez
quantile_type <- 4

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
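Ista's point is that options(encoding = ...) is global and is read by many file IO functions, not just source(). A narrower alternative, sketched here under our own assumptions, is to decide per file: use UTF-8 when every line is valid UTF-8, and otherwise fall back to latin1. source_guess() is a hypothetical helper, not part of base R, and the fallback is only a heuristic: any byte sequence decodes as latin1, so it never errors but can silently mis-decode files in other single-byte encodings such as Windows-1252 or ISO-8859-2.

```r
# Hypothetical helper: pick an encoding per file instead of changing
# options(encoding = ...) globally. Assumes a UTF-8 locale.
source_guess <- function(path, ...) {
  lines <- readLines(path, warn = FALSE)   # raw bytes, no re-encoding
  enc <- if (all(validUTF8(lines))) "UTF-8" else "latin1"
  source(path, encoding = enc, ...)
  invisible(enc)                           # report which encoding was used
}

# Two made-up files with the same content: one UTF-8, one ISO-8859-1.
utf8_file   <- tempfile(fileext = ".R")
latin1_file <- tempfile(fileext = ".R")
writeBin(charToRaw("# Ch\xc3\xa1vez\nquantile_type <- 4\n"), utf8_file)
writeBin(charToRaw("# Ch\xe1vez\nquantile_type <- 4\n"), latin1_file)

env <- new.env()
source_guess(utf8_file, echo = TRUE, local = env)    # detected as UTF-8
source_guess(latin1_file, echo = TRUE, local = env)  # falls back to latin1
```

The cost Ista mentions still applies, but only to the files passed to the helper rather than to every connection R opens.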
Re: [R] Gender balance in R
On Mon, Nov 24, 2014 at 12:34 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> I took a look at apparent gender among list participants a few years ago:
> https://stat.ethz.ch/pipermail/r-help/2011-June/280272.html
>
> Same general thing: very few regular participants on the list were
> women. I don't see any sign that that has changed in the last three
> years. The bar to participation in the R-help list is much, much lower
> than that to become a developer.

I plotted the gender of posters on r-help over time. The plot is here:
https://twitter.com/scottkosty/status/449933971644633088

The code to reproduce that plot is here:
https://github.com/scottkosty/genderAnalysis

The R file there will call devtools::install_github to install a package from GitHub used for guessing the gender based on the first name (https://github.com/scottkosty/gender).

Note also on that tweet that Gabriela de Queiroz, the founder of R-Ladies, posted it, and that David Smith showed interest in discussing the topic. So there is definitely demand for some data analysis and discussion on the topic. It would be interesting to look at the stats for CRAN packages as well.

> The very low percentage of regular female participants is one of the
> things that keeps me active on this list: to demonstrate that it's not
> only men who use R and participate in the community.

Thank you for that!

Scott

--
Scott Kostyshak
Economics PhD Candidate
Princeton University

> (If you decide to do the stats for 2014, be aware that I've been out
> on medical leave for the past two months, so the numbers are even
> lower than usual.)
>
> Sarah
>
> On Mon, Nov 24, 2014 at 10:10 AM, Maarten Blaauw <maarten.bla...@qub.ac.uk> wrote:
> > Hi there,
> >
> > I can't help but notice that the gender balance among R developers and
> > ordinary members is extremely skewed (as it is with open source
> > software in general).
> >
> > Have a look at http://www.r-project.org/foundation/memberlist.html -
> > at most a handful of women are listed among the 'supporting members',
> > and none at all among the 29 'ordinary members'.
> >
> > On the other hand I personally know many happy R users of both genders.
> >
> > My questions are thus: Should R developers (and users) be worried that
> > the 'other half' is excluded? If so, how could female R users/developers
> > be persuaded to become more visible (e.g. added as supporting or
> > ordinary members)?
> >
> > Thanks,
> > Maarten
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Re: [R] Gender balance in R
On Tue, Nov 25, 2014 at 8:24 AM, Maarten Blaauw <maarten.bla...@qub.ac.uk> wrote:
> Nice graph, Scott, thanks! Based on your code I plotted not the absolute
> numbers but the ratios, which show slowly increasing relative
> participation of female R-helpers over time (red = women, blue = men,
> black = unknown). After a c. 5% female contribution in 1998, this has
> grown to about 15% now. At this rate we'll reach parity around AD 2080.

Interesting forecasts, Maarten! Let's hope for a trend break to make them wrong.

Scott

--
Scott Kostyshak
Economics PhD Candidate
Princeton University

> My code:
>
> if (!require(gender)) {
>   library(devtools)
>   install_github("scottkosty/gender")
>   library(gender)
> }
>
> rHelp <- rHelpNames
> rHelp[is.na(rHelp$gender), "gender"] <- "unknown"
> yr <- unique(rHelp$year)
>
> # Per-year counts of posters by guessed gender
> helpers <- list(year = yr, M = rep(0, length(yr)), F = rep(0, length(yr)),
>                 unkn = rep(0, length(yr)))
> for (i in 1:nrow(rHelp)) {
>   j <- which(yr == rHelp$year[i])
>   gender <- rHelp$gender[i]
>   if (gender == "M") helpers$M[[j]] <- helpers$M[[j]] + 1
>   else if (gender == "F") helpers$F[[j]] <- helpers$F[[j]] + 1
>   else if (gender == "unknown") helpers$unkn[[j]] <- helpers$unkn[[j]] + 1
> }
>
> # Proportions per year: blue = men, red = women, black = unknown
> total <- helpers$M + helpers$F + helpers$unkn
> plot(yr, helpers$M / total, type = "l", col = 4, ylim = c(0, 1),
>      ylab = "proportions", yaxs = "i")
> lines(yr, helpers$F / total, col = 2)
> lines(yr, helpers$unkn / total)
>
> Cheers,
> Maarten
>
> --
> | Dr. Maarten Blaauw
> | Lecturer in Chronology
> | School of Geography, Archaeology & Palaeoecology
> | Queen's University Belfast, UK
> | www http://www.chrono.qub.ac.uk/blaauw
> | tel +44 (0)28 9097 3895
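Maarten's per-year loop can also be written with base R's table() and prop.table(). This is a minimal sketch, not his code: the small posts data frame is made-up stand-in data (the real rHelpNames data comes from the gender package on GitHub), and the column names year and gender are assumed from his script.

```r
# Made-up stand-in for the rHelpNames data: one row per poster and year.
posts <- data.frame(
  year   = c(1998, 1998, 1998, 2014, 2014, 2014, 2014),
  gender = c("M", "M", NA, "F", "M", "M", NA),
  stringsAsFactors = FALSE
)
posts$gender[is.na(posts$gender)] <- "unknown"

# Counts by year (rows) and gender (columns), then row proportions.
counts <- table(posts$year, posts$gender)
props  <- prop.table(counts, margin = 1)   # each row sums to 1

yr <- as.numeric(rownames(props))
# Same colours as the loop version: blue = men, red = women, black = unknown.
matplot(yr, props[, c("M", "F", "unknown")], type = "l", lty = 1,
        col = c(4, 2, 1), ylim = c(0, 1),
        xlab = "year", ylab = "proportions")
```

Because table() counts every year and gender at once, the explicit counters and the which() lookup are no longer needed; genders absent in a given year simply get a proportion of 0.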
Re: [R] Gender balance in R
On Tue, Nov 25, 2014 at 1:15 PM, Martin Morgan <mtmor...@fredhutch.org> wrote:
> On 11/25/2014 04:11 AM, Scott Kostyshak wrote:
> > On Mon, Nov 24, 2014 at 12:34 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> > > I took a look at apparent gender among list participants a few years ago:
> > > https://stat.ethz.ch/pipermail/r-help/2011-June/280272.html
> > >
> > > Same general thing: very few regular participants on the list were
> > > women. I don't see any sign that that has changed in the last three
> > > years. The bar to participation in the R-help list is much, much lower
> > > than that to become a developer.
> >
> > I plotted the gender of posters on r-help over time. The plot is here:
> > https://twitter.com/scottkosty/status/449933971644633088
> >
> > The code to reproduce that plot is here:
> > https://github.com/scottkosty/genderAnalysis
> >
> > The R file there will call devtools::install_github to install a
> > package from GitHub used for guessing the gender based on the first
> > name (https://github.com/scottkosty/gender).
>
> It would be great to include in your package the script that scraped
> author names from R-help archives (I guess that's what you did?).
> Presumably it easily applies to other mailing lists hosted at the same
> location (R-devel, further along the ladder from user to developer, and
> Bioconductor / Bioc-devel, in a different domain and perhaps confounded
> with a different 'feel' to the list). Also the R community is definitely
> international, so finding more versatile gender-assignment approaches
> seems important.

I just put the script up on https://github.com/scottkosty/genderAnalysis

I don't have much time at the moment to generalize it, but a pull request is always welcome. Alternatively, anyone is welcome (at least as far as I'm concerned) to take the script and modify it for any purpose.

> It might be interesting to ask about participation in mailing list
> forums versus other, and in particular the recent Bioconductor
> transition from mailing list to 'StackOverflow' style support forum
> (https://support.bioconductor.org) -- on the one hand the
> 'gamification' elements might seem to only entrench male participation,
> while on the other we have already seen increased (quantifiable) and
> broader (subjective) participation from the Bioconductor community.
> I'd be happy to make support site usage data available, and am
> interested in collaborating in an academically well-founded analysis of
> this data; any interested parties please feel free to contact me
> off-list.

I would be interested in collaborating on such a project in the future also.

Scott

--
Scott Kostyshak
Economics PhD Candidate
Princeton University

> Martin Morgan
> Bioconductor
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone
Re: [R] Testing general hypotheses on regression coefficients
Hi Chris,

On Fri, Sep 5, 2014 at 7:17 PM, Chris <bonsxa...@yahoo.com> wrote:
> Hi. Say I have a model like
>
> y = a + B1*x1 + B2*x2 + B3*x3 + B4*x4 + e
>
> and I want to test
>
> H0: B2/B1 = 0

As noted by Bert, think about this.

> or
>
> H0: B2/B1 = B4/B3
>
> (whatever H1). How can I proceed? I know about car::linearHypothesis,
> but I can't figure out a way to do the tests above. Any hint?

Take a look at car::deltaMethod.

I suggest you study the theory of the delta method. If you happen to have taken a graduate statistics/econometrics class it should not be difficult, and it can provide some insights. If not, at least consider that in finite samples the delta method can, in many cases, lead to misleading estimates (biased standard errors). You might want to run some simulations to get a feel for it.

Best,
Scott

--
Scott Kostyshak
Economics PhD Candidate
Princeton University
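Note first that, when B1 is nonzero, H0: B2/B1 = 0 holds exactly when B2 = 0, so the usual t-test on B2 already covers it. For the ratio hypothesis, below is a minimal base-R sketch of the delta-method Wald test that car::deltaMethod automates, written out by hand so the gradient is visible: with g(B) = B2/B1 - B4/B3, the variance of g evaluated at the estimates is approximated by grad' V grad, where V is the coefficient covariance matrix. The simulated data, the coefficient values, and the helper names delta_ratio_test(), simulate_fit(), and sim_size() are our own; the small Monte Carlo at the end follows Scott's suggestion to check finite-sample behaviour.

```r
# Delta-method Wald test of H0: B2/B1 = B4/B3, written as g(B) = 0 with
# g(B) = B2/B1 - B4/B3. Var(g_hat) is approximated by grad' V grad.
delta_ratio_test <- function(fit) {
  b <- unname(coef(fit))   # order: (Intercept), x1, x2, x3, x4
  V <- vcov(fit)
  g <- b[3] / b[2] - b[5] / b[4]
  # Partial derivatives of g with respect to each coefficient.
  grad <- c(0, -b[3] / b[2]^2, 1 / b[2], b[5] / b[4]^2, -1 / b[4])
  se <- sqrt(as.numeric(t(grad) %*% V %*% grad))
  z  <- g / se
  c(estimate = g, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Simulated data in which H0 is true: B2/B1 = 1/2 = B4/B3.
simulate_fit <- function(n_obs) {
  x1 <- rnorm(n_obs); x2 <- rnorm(n_obs); x3 <- rnorm(n_obs); x4 <- rnorm(n_obs)
  y  <- 1 + 2 * x1 + 1 * x2 + 4 * x3 + 2 * x4 + rnorm(n_obs)
  lm(y ~ x1 + x2 + x3 + x4)
}

set.seed(1)
fit <- simulate_fit(200)
delta_ratio_test(fit)

# Empirical rejection rate at the 5% level under H0. Comparing it with 0.05
# for small samples shows how well the normal approximation holds up.
sim_size <- function(n_obs, reps = 500) {
  mean(replicate(reps, delta_ratio_test(simulate_fit(n_obs))["p.value"] < 0.05))
}
sim_size(30)
```

If car is installed, car::deltaMethod(fit, "x2/x1 - x4/x3") should give the same estimate and standard error for this model; the hand-written version just makes the approximation explicit.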
Re: [R-es] NA is not recognized as NA
On 13 August 2014, 17:06, neo <ericconchamu...@gmail.com> wrote:
> Dear all, what is the difference, for R, between:
> NA NA NA NA

To find out, use class, e.g.

> class("NA")
[1] "character"
> class(NA)
[1] "logical"
> class(NA_integer_)
[1] "integer"
> class(NA)
[1] "character"

As the second and third examples show, there are several types of NA.

Regards,
Scott

--
Scott Kostyshak
Economics PhD Candidate
Princeton University

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es
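A minimal sketch of the distinction Scott points to, plus the usual fix when a literal "NA" string is not recognized as missing. The vector x and the inline CSV text are hypothetical examples; which(), is.na(), and the na.strings argument of read.csv() are base R.

```r
# Each atomic type has its own missing value; is.na() detects all of them.
na_values <- list(NA, NA_integer_, NA_real_, NA_character_)
sapply(na_values, class)   # "logical" "integer" "numeric" "character"
sapply(na_values, is.na)   # TRUE TRUE TRUE TRUE

# The string "NA" is not missing: it is an ordinary two-character string.
x <- c("a", "NA", NA)
is.na(x)                   # FALSE FALSE TRUE

# Turn literal "NA" strings into real missing values.
x[which(x == "NA")] <- NA
is.na(x)                   # FALSE TRUE TRUE

# When reading data, declare which strings mean "missing" via na.strings.
df <- read.csv(text = "v\n1\nNA\n.\n", na.strings = c("NA", "."))
is.na(df$v)                # FALSE TRUE TRUE
```

Using which() drops the NA produced by comparing the already-missing element, so only the positions that really hold the string "NA" are overwritten.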
Re: [R-es] Downloading a list of zipped packages
Hi,

If the end goal is to reproduce something in the future with exactly the same package versions, packrat might help: http://rstudio.github.io/packrat/

Regards,
Scott

2014-07-22 14:03 GMT+10:00 Julio Alejandro Di Rienzo <dirienzo.ju...@gmail.com>:
> Hi,
>
> Does anyone know how to download a list of R libraries in zipped
> format? For example, I want to download the libraries (lme4,
> latticeExtras, Biobase, ..., etc., etc.) in zipped format. I know I can
> do it one by one from CRAN, but I would like a procedure to do it
> automatically.
>
> Prof. Julio Di Rienzo
> Estadística y Biometría
> FCA- U.N. Córdoba
> http://sites.google.com/site/juliodirienzo
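For the literal question (downloading several packages as .zip files in one step), base R's utils::download.packages() accepts a vector of package names. A minimal sketch, assuming the Windows binary builds are what is wanted (type = "win.binary" selects the .zip files) and using the cloud.r-project.org mirror as an example repository; download_zips() is a hypothetical wrapper. Note that the CRAN package is spelled latticeExtra, and Biobase is distributed by Bioconductor rather than CRAN, so it would need a Bioconductor repository in repos.

```r
# Hypothetical wrapper: download Windows binary (.zip) builds of several
# packages into one directory. Needs network access when it is called.
download_zips <- function(pkgs, destdir = tempdir(),
                          repos = "https://cloud.r-project.org") {
  dir.create(destdir, showWarnings = FALSE, recursive = TRUE)
  # Returns a two-column matrix: package name and downloaded file path.
  utils::download.packages(pkgs, destdir = destdir, repos = repos,
                           type = "win.binary")
}

# The repository directory the .zip files are taken from, for this R version:
contrib.url("https://cloud.r-project.org", type = "win.binary")
```

Usage would then be a single call such as download_zips(c("lme4", "latticeExtra"), destdir = "zips"). Unlike packrat, this only fetches the current versions and does not record them for later reproduction.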
Re: [R-es] Bootstrap
2014-06-11 7:57 GMT-04:00 rubenfcasal <rubenfca...@gmail.com>:
> Hi Celia,
>
> I normally use the boot package (I think it is better than the
> bootstrap package), but in any case, if you need additional information
> on the topic, the reference that is usually recommended is:
>
> Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their
> Application. Cambridge University Press.
>
> Regards,
> Rubén Fernández Casal
>
> On 11/06/2014 9:04, Celia Rubio Linares wrote:
> > Hi!
> > I have to do a project on R's bootstrap package. Could someone give me
> > complete information (in Spanish if possible) about this package?
> > Many thanks in advance, and regards.

Hi Celia,

I am the maintainer of the bootstrap package. I also recommend the boot package (a separate package) for serious work. In addition, the boot package has an option to run in parallel. We have not yet done the same in the bootstrap package, which is mainly meant for learning. For that purpose, I think it is best to start with the book that inspired the package: An Introduction to the Bootstrap by Bradley Efron and Robert Tibshirani. This book is wonderful. You do not need much mathematical background to read it. It explains very well the intuition for why the bootstrap works (and why it does not in the cases where it fails). I searched a bit, but unfortunately I do not think the book has been translated into Spanish.

As you can see, everyone here is recommending books instead of explaining the bootstrap package. I think that is because, as for how to use the package, if you understand the bootstrap, not much explanation is needed. Look at the examples. If you have any doubts, ask us a specific question about what you want to do and what you tried.

Regards,
Scott
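Since Scott notes that the package is easy to use once the bootstrap itself is understood, here is a minimal sketch of the nonparametric bootstrap written by hand in base R, followed by the same computation with the boot package that Rubén recommends. The exponential sample, the choice of the median as the statistic, and B = 2000 resamples are illustrative assumptions.

```r
set.seed(42)
x <- rexp(50)   # illustrative sample data
B <- 2000       # number of bootstrap resamples

# By hand: resample the data with replacement and recompute the statistic.
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))
se_hat  <- sd(boot_medians)                          # bootstrap standard error
ci_perc <- quantile(boot_medians, c(0.025, 0.975))   # percentile interval

# The same with the boot package (a recommended package shipped with R).
# boot() calls statistic(data, indices) once per resample.
if (requireNamespace("boot", quietly = TRUE)) {
  med_stat <- function(data, idx) median(data[idx])
  b <- boot::boot(x, statistic = med_stat, R = B)
  # boot() also accepts parallel = "multicore" or "snow" with ncpus = ...
  print(boot::boot.ci(b, type = "perc"))
}
```

The hand-written loop is essentially what the teaching-oriented bootstrap package wraps; boot adds the resampling bookkeeping, several interval types, and the parallel option mentioned above.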