Re: [Bioc-devel] ShortRead: optional custom labeling of samples in QA report
Hi,

Since the attached file didn't make it all the way through to the mailing list, you can find it at http://www.ebi.ac.uk/~jgehring/share/shortRead-pkg/0001-Example-patch-for-naming-samples-in-BAMQA.patch.

Best wishes
Julian

On 02/12/2013 03:23 PM, Julian Gehring wrote:

Hi,

In the QA report of the 'ShortRead' package, a short sequential integer labeling for referencing the samples/files throughout the report is created by default. Would it be reasonable/possible to allow for other optional names to label the samples to make the results of the report easier to understand? In general, I have three ideas what would be handy to have:

1. Derive a label from the file names. This is probably hard to generalize and implement in a way that it actually helps.

2. In case the 'dirPath' argument in the 'qa' function call is a named vector, such as

  qa(dirPath=c(p1="bam_file1.bam", p2="bam_file2.bam"))

use the names [p1, p2] for the labeling later on. This would require storing the names in the object returned by 'qa', but should not be too hard to implement.

3. Optionally, pass a named vector to the 'report' method, matching file names to sample labels. In case the file names do not match or 'samples' is missing, default to the sequential labeling.

For option 3, I have created a simple example patch to illustrate how this could be implemented (see attached). So, later this may look like this:

  library(ShortRead)
  files = c(p1="bam_file1.bam", p2="bam_file2.bam")
  qa = qa(files, type="BAM")
  ## default sequential labeling ##
  ShortRead:::.report_html_BAMQA(qa, dest="report_normal")
  ## samples named according to names(files) ##
  ShortRead:::.report_html_BAMQA(qa, dest="report_named", samples=files)

I'm happy about any inputs or thoughts regarding this.

Best wishes
Julian

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
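The fallback behaviour described in option 3 (use the names of a named vector when the file names match, otherwise keep the sequential labels) could be sketched roughly as follows. This is only an illustration and not the code from the patch: the helper name `resolve_labels` and its arguments are hypothetical and not part of ShortRead, and matching files on their base name is an assumption.

```r
# Hypothetical helper, not part of ShortRead: choose report labels for files.
resolve_labels <- function(file_names, samples = NULL) {
  # Default: short sequential integer labels, as the report uses today.
  default <- as.character(seq_along(file_names))
  if (is.null(samples) || is.null(names(samples))) {
    return(default)
  }
  # Assumption: files are matched on their base name, ignoring directories.
  idx <- match(basename(file_names), basename(unname(samples)))
  if (any(is.na(idx))) {
    return(default)  # at least one file has no label: fall back entirely
  }
  labels <- names(samples)[idx]
  if (any(is.na(labels)) || any(!nzchar(labels))) {
    return(default)  # partially named vector: fall back as well
  }
  labels
}

files <- c(p1 = "bam_file1.bam", p2 = "bam_file2.bam")
# All files match, so the names of 'files' are used.
resolve_labels(c("/data/bam_file1.bam", "/data/bam_file2.bam"), samples = files)
# One file does not match, so the sequential labels are kept.
resolve_labels(c("/data/other.bam", "/data/bam_file2.bam"), samples = files)
```

Falling back for the whole set rather than per file keeps the labels within one report uniform; a per-file fallback would be an equally plausible design.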
Re: [Bioc-devel] ShortRead: optional custom labeling of samples in QA report
On 02/12/2013 06:29 AM, Julian Gehring wrote:

Hi,

Since the attached file didn't make it all the way through to the mailing list, you can find it at http://www.ebi.ac.uk/~jgehring/share/shortRead-pkg/0001-Example-patch-for-naming-samples-in-BAMQA.patch.

Thanks Julian, the request seems reasonable and I'll try to get to this in the next week.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
Re: [Bioc-devel] R CMD build not copying PDF vignettes to inst\doc
Hi Dan,

I actually just installed the latest version of R-Devel (r61902) and used biocLite("plgem") to download and install the latest version of my package from the server. Although there are no errors or warnings on the Bioc build/check report, my package still lacks the PDF version of the vignette. I checked the source tarball in http://www.bioconductor.org/packages/2.12/bioc/src/contrib/plgem_1.31.1.tar.gz and in fact cannot see any PDFs in inst/doc. You can also notice the vignette is not listed anymore in http://www.bioconductor.org/packages/2.12/bioc/html/plgem.html

I then rebuilt the package from source myself from a freshly checked-out version from the Bioc-devel repository (plgem version 1.31.1) using R-Devel r61902. I get no errors, no warnings and most importantly the PDF is being built and included in the tarball correctly.

So it appears that R-Devel r61868 (the version currently on the build machine) is still not copying the vignette PDF into the package. Could you please try to update R-Devel to r61902 and see if it solves the problem?

Thanks!
Norman

P.S.: For full disclosure, I should probably mention that I recently moved the .Rnw file from inst/doc to /vignettes following the latest R recommendations, but I am unsure if this has anything to do with the problem, as the package builds just fine on my machine using the latest version of R-Devel.

On Wed, Feb 13, 2013 at 12:11 PM, Norman Pavelka normanpave...@gmail.com wrote:

Hi Dan,

I can see the issue is resolved now! I will update my version of R-devel, too.

Thanks,
Norman

On Fri, Feb 8, 2013 at 1:19 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:

On Thu, Feb 7, 2013 at 8:48 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:

Hi Norman,

On Thu, Feb 7, 2013 at 6:59 PM, Norman Pavelka normanpave...@gmail.com wrote:

Hi,

I am sure many of you may have noticed already, but basically every package in Bioc-devel that has a vignette (i.e. almost every package) is currently issuing warnings in R CMD check: http://www.bioconductor.org/checkResults/2.12/bioc-LATEST/

I ran some tests myself and it appears that in the latest version of R-devel some changes have been introduced in R CMD build that cause it not to copy the compiled PDF vignettes to inst\doc. R CMD build returns only a silent warning such as:

  * creating vignettes ... OK
  Warning in file.copy(c(vigns$docs, outfiles), doc_dir) :
    problem copying E:\biocbld\bbs-2.12-bioc\tmpdir\Rtmpq4jjoR\Rbuild93c202b5ca6\plgem\vignettes\plgem.pdf to inst\doc\plgem.pdf: No such file or directory

R CMD check then issues the following user-visible warning:

  * checking package vignettes in 'inst/doc' ... WARNING
  Package vignette without corresponding PDF:
    'plgem.Rnw'

Compiling my package from the same source but using the previous version of R CMD build does not cause any problems, i.e. the vignette PDF is correctly copied to inst/doc and R CMD check does not issue any warning.

Should we bring this up to R-Devel mailing list?

I'm not sure (checking right now) but I think this was fixed in r61843. The build machines are running r61836. The nightly build is underway but I will update R-devel tomorrow if doing so indeed fixes the problem.

I can confirm that pdfs are properly copied into source tarballs with R-devel r61868. I will update to the latest R-devel tomorrow.

Dan

Thanks!
Dan

Cheers,
Norman
Re: [Rd] stringsAsFactors
FWIW my view is that for data cleaning and organizing factors just get in the way. For modeling I like them because they make it easier to understand what is happening. For example I can look at the levels() to see what the reference group will be. With characters one has to know a) that levels are created in alphabetical order and b) the alphabetical order of the unique values in the character vector. Ugh. So my habit is to turn off stringsAsFactors, then explicitly convert to factors before modeling (I also use factors to change the order in which things are displayed in tables and graphs, another place where converting to factors myself is useful but creating them in alphabetical order by default is not).

All this is to say that I would like options(stringsAsFactors=FALSE) to be the default, but I like the warning about converting to factors in modeling functions because it reminds me that I forgot to convert them, which I like to do anyway...

Best,
Ista

On Mon, Feb 11, 2013 at 12:50 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 11/02/2013 12:13 PM, William Dunlap wrote:

Note that changing this does not just mean getting rid of silly warnings. Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.

  d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4, 15:17, 28.1,28.8,30.1))
  fit_ab <- lm(y ~ x + f, data = d, subset = f!="B")
  Warning message:
  In model.matrix.default(mt, mf, contrasts) :
    variable 'f' converted to a factor
  predict(fit_ab, newdata=d)
   1  2  3  4  5  6  7  8  9 10
   1  2  3  4 25 26 27  8  9 10
  Warning messages:
  1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
    variable 'f' converted to a factor
  2: In predict.lm(fit_ab, newdata = d) :
    prediction from a rank-deficient fit may be misleading

fit_ab is not rank-deficient and the predict should report

   1  2  3  4 NA NA NA 28 29 30

In R-devel, the two warnings about factor conversions are no longer given, but the predictions are the same and the warning about rank deficiency still shows up. If f is set to be a factor, an error is generated:

  Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
    factor f has new levels B

I think both the warning and error are somewhat reasonable responses. The fit is rank deficient relative to the model that includes f == "B", because the column of the design matrix corresponding to f level B would be completely zero. In this particular model, we could still do predictions for the other levels, but it also seems reasonable to quit, given that clearly something has gone wrong.

I do think that it's unfortunate that we don't get the same result in both cases, and I'd like to have gotten the predictions you suggested, but I don't think that's going to happen. The reason for the difference is that the subsetting is done before the conversion to a factor, but I think that is unavoidable without really big changes.

Duncan Murdoch

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Terry Therneau
Sent: Monday, February 11, 2013 5:50 AM
To: r-devel@r-project.org; Duncan Murdoch
Subject: Re: [Rd] stringsAsFactors

I think your idea to remove the warnings is excellent, and a good compromise. Characters already work fine in modeling functions except for the silly warning.

It is interesting how often the defaults for a program reflect the data sets in use at the time the defaults were chosen. There are some such in my own survival package whose proper value is no longer as obvious as it was when I chose them. Factors are very handy for variables which have only a few levels and will be used in modeling. Every character variable of every dataset in Statistical Models in S, which introduced factors, is of this type so auto-transformation made a lot of sense. The solder data set there is one for which Helmert contrasts are proper so guess what the default contrast option was? (I think there are only a few data sets in the world for which Helmert makes sense, however, and R eventually changed the default.)

For character variables that should not be factors, such as a street address, stringsAsFactors can be a real PITA, and I expect that people's preference for the option depends almost entirely on how often these arise in their own work. As long as there is an option that can be overridden I'm okay. Yes, I'd prefer FALSE as the default, partly because the current value is a tripwire in the hallway that eventually catches every new user.

Terry Therneau

On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:

Both of these were discussed by R Core. I think it's unlikely the default for stringsAsFactors will be
[Rd] Private environments and/or assignInMyNamespace
Dear DevelopeRs, I've been struggling with the new regulations regarding modifications to the search path, regarding my Rcmdr plugin package RcmdrPlugin.DoE. John Fox made Rcmdr comply with the new policy by removing the environment RcmdrEnv from the search path. For the time being, he developed an option that allows users to put the environment from Rcmdr (RcmdrEnv) on the search path, like in earlier versions of Rcmdr (thanks John!), which rescues my package for the immediate future; however, in the long run it would be nice to be able to make it work without that. The reason why I currently need the environment on the search path (may be due to my lack of understanding how tcltk widgets are handled): I have quite elaborate notebook widgets on which users can make many entries. Some entries are only checked after clicking OK, and if an error is found at that point, the user receives a small message window that has to be confirmed and is subsequently returned to the notebook widget in the state it was in when pressing OK. These widgets are currently held in the environment RcmdrEnv; they work when RcmdrEnv is on the search path; however, it is not sufficient to retrieve them with John's function getRcmdr, which works fine for objects other than widgets. Here my question: Would it be an option to place the widgets in a private environment of my plugin package (then I would have to learn how to create one and work with it), or won't they be found that way? Alternatively, I could have unexported objects of all required names in my namespace and modify these via assignInMyNamespace (I don't think that anybody from somewhere else would import that namespace, it's not that kind of package). Would that be a viable alternative, and would the widgets be found that way? Any further ideas? 
Best regards,
Ulrike

--
Ulrike Groemping
BHT Berlin - University of Applied Sciences
+49 (30) 39404863 (Home Office)
+49 (30) 4504 5127 (BHT)
http://prof.beuth-hochschule.de/groemping
groemp...@bht-berlin.de

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
Duncan Murdoch murdoch.duncan at gmail.com writes: [snip] Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate? What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings. Duncan Murdoch [apologies for snipping context: gmane made me do it] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
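For readers following the stringsAsFactors question, the workflow raised in this discussion (keep strings as character, then convert to a factor explicitly before modelling so that the reference level is a deliberate choice) might look like the sketch below. The data and the level order are made up for illustration only.

```r
# Made-up data: keep 'g' as character when the data frame is built.
d <- data.frame(y = c(1, 4, 2, 5, 3, 6),
                g = c("low", "low", "high", "high", "mid", "mid"),
                stringsAsFactors = FALSE)

# A plain factor() call orders levels alphabetically: "high" "low" "mid",
# so "high" would silently become the reference group.
levels(factor(d$g))

# Explicit conversion: choose the level order, and with it the reference group.
d$g <- factor(d$g, levels = c("low", "mid", "high"))

fit <- lm(y ~ g, data = d)
# Coefficient names are now relative to the chosen reference level "low".
names(coef(fit))
```

Calling levels() before fitting is a cheap way to confirm which group the model will treat as the baseline.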
Re: [Rd] Private environments and/or assignInMyNamespace
Here my question: Would it be an option to place the widgets in a private environment of my plugin package (then I would have to learn how to create one and work with it), or won't they be found that way? It sounds like you want to maintain state across function calls within your package, and a private environment is a good solution. See the notes on local() at https://github.com/hadley/devtools/wiki/Environments for a few details. Hadley -- Chief Scientist, RStudio http://had.co.nz/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
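A minimal sketch of the private-environment approach follows, assuming the goal is simply to keep objects alive between function calls without touching the search path. The names `.plugin_env`, `put_state` and `get_state` are hypothetical, and whether Tcl/Tk widget objects behave as needed when stored this way has not been checked here.

```r
# Private environment, created once when the package namespace is loaded.
# It is not attached to the search path; only package code reaches it.
.plugin_env <- new.env(parent = emptyenv())

# Store an object under a name in the private environment.
put_state <- function(name, value) {
  assign(name, value, envir = .plugin_env)
  invisible(value)
}

# Retrieve an object, or return 'default' if nothing is stored under 'name'.
get_state <- function(name, default = NULL) {
  if (exists(name, envir = .plugin_env, inherits = FALSE)) {
    get(name, envir = .plugin_env, inherits = FALSE)
  } else {
    default
  }
}

put_state("last_entry", list(runs = 8, factors = 3))
get_state("last_entry")$runs
get_state("not_stored", default = NA)
```

Because environments have reference semantics, the binding for `.plugin_env` can stay locked in the namespace while its contents remain modifiable, which is what avoids the need for assignInMyNamespace in this sketch.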
Re: [Rd] Regression stars
On 12.02.2013 14:54, Ben Bolker wrote: Duncan Murdoch murdoch.duncan at gmail.com writes: [snip] Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate? Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame(). And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in term of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars. There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values. Best, Uwe What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings. Duncan Murdoch [apologies for snipping context: gmane made me do it] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
Uwe I've been consulting for decades and have never once been asked for such stars. And when a clinical researcher puts a sentence in a study protocol that "P<0.05 will be considered significant" I get them to take it out.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [Rd] Regression stars
On 12/02/2013 9:20 AM, Uwe Ligges wrote: On 12.02.2013 14:54, Ben Bolker wrote: Duncan Murdoch murdoch.duncan at gmail.com writes: [snip] Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate? Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Could you post an example of a non-trivial one? (By trivial, I mean one that says data.frame() converts character vectors to factors. Obviously that would need to change. I mean one that just assumes current behaviour, and would be broken by the change.) Duncan Murdoch Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame(). And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in term of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars. There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values. Best, Uwe What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings. 
Duncan Murdoch [apologies for snipping context: gmane made me do it] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Contribution
Hi,

I am Parthasarathy G, from IIT Madras (India). I am currently in the third year of the undergraduate course. I would like to contribute to the R project. Can anyone guide me regarding this?

Thanking you,
Parthasarathy
Re: [Rd] Regression stars
I think that we should use P < .03 (which approximates the probability of 5 consecutive heads) for assigning significance!

Ravi

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Frank Harrell
Sent: Tuesday, February 12, 2013 9:43 AM
To: r-devel@r-project.org
Subject: Re: [Rd] Regression stars

Uwe I've been consulting for decades and have never once been asked for such stars. And when a clinical researcher puts a sentence in a study protocol that "P<0.05 will be considered significant" I get them to take it out.

Frank
Re: [Rd] Regression stars
On 12.02.2013 15:42, Frank Harrell wrote:

Uwe I've been consulting for decades and have never once been asked for such stars.

Honestly: last time I have been asked last week. And when I answered (in another case few months ago) "OK, I can add you another 5 stars for p values smaller than 0.5" they did not find it too funny.

Best,
Uwe

And when a clinical researcher puts a sentence in a study protocol that "P<0.05 will be considered significant" I get them to take it out.

Frank
Re: [Rd] Regression stars
On 13-02-12 09:20 AM, Uwe Ligges wrote:

On 12.02.2013 14:54, Ben Bolker wrote:

Duncan Murdoch murdoch.duncan at gmail.com writes:

[snip]

Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.

Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?

Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().

And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in term of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars. There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.

Best,
Uwe

Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).

It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)

Ben Bolker

What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.

Duncan Murdoch

[apologies for snipping context: gmane made me do it]
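A wrapper of the kind described above could look roughly like this. It is a sketch modelled on the structure of base R's transform.data.frame(), not the Transform() function mentioned in the message, whose code was not posted; the argument defaults chosen here are assumptions.

```r
# Hypothetical wrapper modelled on base R's transform.data.frame().
# New columns are added through data.frame() with caller-chosen settings
# for stringsAsFactors and check.names instead of the built-in defaults.
Transform <- function(`_data`, ..., stringsAsFactors = FALSE, check.names = FALSE) {
  # Evaluate the column expressions inside the data frame.
  e <- eval(substitute(list(...)), `_data`, parent.frame())
  tags <- names(e)
  inx <- match(tags, names(`_data`))
  matched <- !is.na(inx)
  # Existing columns are replaced in place.
  if (any(matched)) {
    `_data`[inx[matched]] <- e[matched]
  }
  if (all(matched)) {
    return(`_data`)
  }
  # New columns go through data.frame() with the overridden settings.
  do.call("data.frame",
          c(list(`_data`), e[!matched],
            list(stringsAsFactors = stringsAsFactors, check.names = check.names)))
}

d <- data.frame(x = 1:3)
out <- Transform(d, lab = c("a", "b", "c"), y = x * 2)
str(out)                                        # 'lab' stays character
names(Transform(d, `a b` = x))                  # non-syntactic name is kept
```

Passing stringsAsFactors = TRUE restores the factor conversion for new character columns, so the same wrapper covers both preferences.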
Re: [Rd] Regression stars
On 12.02.2013 16:40, Ben Bolker wrote:
> On 13-02-12 09:20 AM, Uwe Ligges wrote:
>> On 12.02.2013 14:54, Ben Bolker wrote:
>>> Duncan Murdoch murdoch.duncan at gmail.com writes:
>>>
>>> [snip]
>>>
>>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>>>
>>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>>
>> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().
>>
>> And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.
>>
>> There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.
>>
>> Best,
>> Uwe
>
> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

No, I cannot,
Uwe

> I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).
>
> It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)
>
> Ben Bolker
>
>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>
>>>> Duncan Murdoch
>>>
>>> [apologies for snipping context: gmane made me do it]

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
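A minimal sketch of the kind of wrapper described in the quoted text, for readers who want something concrete. This is not Ben Bolker's actual Transform(); the argument names `.data`, `stringsAsFactors` and `check.names` and the overall structure are assumptions made for illustration, and every argument passed through `...` is assumed to be named.

```r
# Hypothetical wrapper: behaves like transform(), but the settings that
# data.frame() is called with can be overridden by the caller.
Transform <- function(.data, ..., stringsAsFactors = FALSE, check.names = FALSE) {
  # Evaluate the column expressions with the data frame's columns in scope.
  cols <- eval(substitute(list(...)), .data, parent.frame())
  is_old <- names(cols) %in% names(.data)

  # Columns that already exist are overwritten in place.
  if (any(is_old)) {
    .data[names(cols)[is_old]] <- cols[is_old]
  }
  # New columns are appended through data.frame() with the chosen settings.
  if (any(!is_old)) {
    .data <- do.call(data.frame,
                     c(list(.data), cols[!is_old],
                       list(stringsAsFactors = stringsAsFactors,
                            check.names = check.names)))
  }
  .data
}

# A column name that check.names = TRUE would rewrite.
d <- data.frame(`my col` = 1:2, check.names = FALSE)
out <- Transform(d, label = c("a", "b"), `my col` = `my col` * 10)
str(out)
```

With these defaults the new `label` column stays a character vector and the non-syntactic column name is left alone; passing `stringsAsFactors = TRUE` or `check.names = TRUE` gives the other behaviour.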
Re: [Rd] Regression stars
On 12/02/2013 10:40 AM, Ben Bolker wrote:
> On 13-02-12 09:20 AM, Uwe Ligges wrote:
>> On 12.02.2013 14:54, Ben Bolker wrote:
>>> Duncan Murdoch murdoch.duncan at gmail.com writes:
>>>
>>> [snip]
>>>
>>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>>>
>>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>>
>> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().
>>
>> And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.
>>
>> There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.
>>
>> Best,
>> Uwe
>
> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

I can, under two assumptions:

1. We keep stringsAsFactors=TRUE on read.table().
2. We keep the stringsAsFactors argument in data.frame().

Under those assumptions, it would just be confusing to have opposite defaults.

(Just in case someone hasn't read all of this thread: I'd be happier to have the default be FALSE in both cases, but not until 3.1.x. For 3.0.x I think I'd just change the default value of default.stringsAsFactors() to FALSE, so people could easily get the old behaviour.)

Duncan Murdoch

> I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).
>
> It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)
>
> Ben Bolker
>
>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>
>>>> Duncan Murdoch
>>>
>>> [apologies for snipping context: gmane made me do it]

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
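Whichever default ends up being chosen, code that passes the argument explicitly does not depend on it. A small sketch of that point; the vector `fruit` and the object names are made up for illustration.

```r
fruit <- c("kale", "apple", "kale")

# Stating the argument explicitly makes the result independent of the default.
as_chr <- data.frame(f = fruit, stringsAsFactors = FALSE)
as_fac <- data.frame(f = fruit, stringsAsFactors = TRUE)

class(as_chr$f)   # "character"
class(as_fac$f)   # "factor"

# A character column can still be converted later, where a factor is wanted.
f_later <- factor(as_chr$f)
levels(f_later)
```

Scripts written this way behave the same before and after a change of default, which is the backward-compatibility concern raised earlier in the thread.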
Re: [Rd] Regression stars
I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.

    > fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
    > n <- 100
    > a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
    > a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
    > fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
    > system.time(z <- sapply(1:100, fn, a1))
       user  system elapsed
     19.614   4.037  24.649
    > system.time(z <- sapply(1:100, fn, a2))
       user  system elapsed
     19.726   7.715  36.761

On Feb 12, 2013, at 10:40 AM, Ben Bolker bbol...@gmail.com wrote:
> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?
>
> I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).
>
> It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)
>
> Ben Bolker
>
>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>
>>>> Duncan Murdoch
>>>
>>> [apologies for snipping context: gmane made me do it]

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.

I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] get and exists are not vectorized
Here is the current behavior (in 2.15.2 and 3.0.0):

    > exists(c('notLikely', 'exists'))
    [1] FALSE
    > exists(c('exists', 'notLikely'))
    [1] TRUE
    > get(c('notLikely', 'exists'))
    Error in get(c("notLikely", "exists")) : object 'notLikely' not found
    > get(c('exists', 'notLikely'))
    function (x, where = -1, envir = if (missing(frame)) as.environment(where) else sys.frame(frame),
        frame, mode = "any", inherits = TRUE)
    .Internal(exists(x, envir, mode, inherits))
    <bytecode: 0x0f7f8830>
    <environment: namespace:base>

Both 'exists' and 'get' silently ignore all but the first element.

My view is that 'get' should do what it currently does except it should warn about ignoring subsequent elements if there are any.

I don't see a reason why 'exists' shouldn't be vectorized.

Am I missing something?

Pat

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of: 'Impatient R' 'The R Inferno' 'Tao Te Programming')

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
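Until something along these lines exists in base R, an element-wise check is easy to write by hand. The sketch below is only an illustration: `exists_all()` is a hypothetical helper name, not a base R function. `mget()`, by contrast, is a real base function that already accepts several names.

```r
# Hypothetical helper: apply exists() to each name separately.
exists_all <- function(names, envir = parent.frame(), inherits = TRUE) {
  force(envir)  # resolve the caller's environment before entering vapply()
  vapply(names, exists, logical(1), envir = envir, inherits = inherits)
}

# One logical value per name, labelled with the name that was checked.
exists_all(c("notLikely", "exists"))

# mget() looks up several names at once; ifnotfound supplies a placeholder
# for missing objects instead of raising an error.
found <- mget(c("exists", "notLikely"), envir = baseenv(),
              ifnotfound = list(NULL))
names(found)
```

The helper returns a named logical vector, so the result can be used directly for subsetting the vector of names.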
Re: [Rd] Regression stars
They are reaching for the stars. Pardon my jest, but I couldn't resist.

Ravi

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Uwe Ligges
Sent: Tuesday, February 12, 2013 10:01 AM
To: Frank Harrell
Cc: r-devel@r-project.org
Subject: Re: [Rd] Regression stars

On 12.02.2013 15:42, Frank Harrell wrote:
> Uwe I've been consulting for decades and have never once been asked for such stars.

Honestly: last time I have been asked last week. And when I answered (in another case few months ago) "OK, I can add you another 5 stars for p values smaller than 0.5" they did not find it too funny.

Best,
Uwe

> And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant I get them to take it out.
>
> Frank
>
> Uwe Ligges-3 wrote
>> On 12.02.2013 14:54, Ben Bolker wrote:
>>> Duncan Murdoch murdoch.duncan at gmail.com writes:
>>>
>>> [snip]
>>>
>>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>>>
>>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>>
>> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().
>>
>> And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.
>>
>> There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.
>>
>> Best,
>> Uwe
>>
>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>
>>>> Duncan Murdoch
>>>
>>> [apologies for snipping context: gmane made me do it]
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt University
> --
> View this message in context: http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658268.html
> Sent from the R devel mailing list archive at Nabble.com.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
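For readers following the stars side of the thread: the stars can already be switched on or off by each user, either per call or per session. A brief sketch using the `cars` data set that ships with R; the model itself is arbitrary and chosen only for illustration.

```r
fit <- lm(dist ~ speed, data = cars)

# Per call: suppress the stars for this printout only.
print(summary(fit), signif.stars = FALSE)

# Per session: change the default picked up by the print methods.
old <- options(show.signif.stars = FALSE)
print(summary(fit))
options(old)  # restore the previous setting
```

So the disagreement in the thread is about what the default should be, not about whether either group can get the output it prefers.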
Re: [Rd] Regression stars
I think it may have been John D. Cook who first observed that p-values are linearly correlated with the amount of time remaining on a grant. Perhaps a suitable transform would reveal an ordinal relationship with stars.

On Tue, Feb 12, 2013 at 7:03 AM, Ravi Varadhan ravi.varad...@jhu.edu wrote:
> They are reaching for the stars. Pardon my jest, but I couldn't resist.
>
> Ravi
>
> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Uwe Ligges
> Sent: Tuesday, February 12, 2013 10:01 AM
> To: Frank Harrell
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Regression stars
>
> On 12.02.2013 15:42, Frank Harrell wrote:
>> Uwe I've been consulting for decades and have never once been asked for such stars.
>
> Honestly: last time I have been asked last week. And when I answered (in another case few months ago) "OK, I can add you another 5 stars for p values smaller than 0.5" they did not find it too funny.
>
> Best,
> Uwe
>
>> And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant I get them to take it out.
>>
>> Frank
>>
>> Uwe Ligges-3 wrote
>>> On 12.02.2013 14:54, Ben Bolker wrote:
>>>> Duncan Murdoch murdoch.duncan at gmail.com writes:
>>>>
>>>> [snip]
>>>>
>>>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>>>>
>>>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>>>
>>> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs. Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().
>>>
>>> And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.
>>>
>>> There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.
>>>
>>> Best,
>>> Uwe
>>>
>>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>>
>>>>> Duncan Murdoch
>>>>
>>>> [apologies for snipping context: gmane made me do it]
>>
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt University
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658268.html
>> Sent from the R devel mailing list archive at Nabble.com.

--
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
On Feb 12, 2013, at 11:05 AM, Brian Lee Yung Rowe wrote:
> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>
>     > fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
>     > n <- 100
>     > a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
>     > a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
>     > fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
>     > system.time(z <- sapply(1:100, fn, a1))
>        user  system elapsed
>      19.614   4.037  24.649
>     > system.time(z <- sapply(1:100, fn, a2))
>        user  system elapsed
>      19.726   7.715  36.761

Not really:

    > system.time(z <- sapply(1:100, fn, a1))
       user  system elapsed
     13.780   0.444  14.229
    > rm(z)
    > gc()
              used (Mb) gc trigger   (Mb)  max used   (Mb)
    Ncells  182113  9.8     407500   21.8    337655   18.1
    Vcells 5789638 44.2  133982285 1022.3 163019778 1243.8
    > system.time(z <- sapply(1:100, fn, a2))
       user  system elapsed
     13.201   0.668  13.873

But your test is bogus, because %in% uses match() which converts factors to character vectors anyway, so in your case you're just measuring noise in your system; character vectors are always faster in your example.

The reason is that in R strings are hashed, so character vectors are technically very similar to factors, just with faster access (because they don't need to go through the integer indirection). On 32-bit, strings are in theory always faster than factors; on 64-bit they use double the size, so they may or may not be faster depending on how you hit the cache etc.

Anyway, in modern R versions you're much better off using character vectors than factors for any processing, so stringsAsFactors=FALSE is what I use exclusively.

Cheers,
Simon

> On Feb 12, 2013, at 10:40 AM, Ben Bolker bbol...@gmail.com wrote:
>> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?
>>
>> I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).
>>
>> It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)
>>
>> Ben Bolker
>>
>>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>>
>>>>> Duncan Murdoch
>>>>
>>>> [apologies for snipping context: gmane made me do it]

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
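The point about %in% can be checked directly. A small sketch with made-up values: a factor is stored as integer codes plus labels, but the comparison is done on the labels, so the factor and the plain character vector give the same answer.

```r
f <- factor(c("kale", "apple", "spinach", "kale"))
s <- as.character(f)
wanted <- c("kale", "spinach")

# A factor is stored as integer codes with a levels attribute ...
typeof(f)   # "integer"

# ... but %in% (via match()) compares the labels, so both calls agree.
identical(f %in% wanted, s %in% wanted)   # TRUE
```

This is why the two timings in the benchmark measure essentially the same work: the factor column is turned back into strings before any matching happens.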
Re: [Rd] Contribution
Hi Parthasarathy,

IMHO the easiest way to contribute to R is contributing to an R package. And one way to do that is to apply for a Google Summer of Code project. I guess activities about that will start soon, as the program was just announced, and they will take place at a separate email list: gso...@groups.google.com

So I suggest you sign up for that list, and maybe explain a bit who you are, what experience you have in R programming (or other languages) and what your programming interests are.

Best,
Claudia

> I am Parthasarathy G, from IIT Madras (India). I am currently in the third year of the undergraduate course. I would like to contribute to the R project. Can anyone guide me regarding this?
>
> Thanking you,
> Parthasarathy

--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
On 02/12/2013 08:20 AM, peter dalgaard wrote:
> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>
> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."

<sarcasm>
Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start.
</sarcasm>

No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.

Thanks,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
On 12/02/2013 1:47 PM, Hervé Pagès wrote:
> On 02/12/2013 08:20 AM, peter dalgaard wrote:
>> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>>
>> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."
>
> <sarcasm>
> Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start.
> </sarcasm>

I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.

> No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?
>
> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.

That's a really bad suggestion -- it would break code for people who set stringsAsFactors=FALSE as well as those who rely on the current default behaviour. We certainly won't do that.

Duncan Murdoch

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Regression stars
Hi Duncan,

On 02/12/2013 11:19 AM, Duncan Murdoch wrote:
> On 12/02/2013 1:47 PM, Hervé Pagès wrote:
>> On 02/12/2013 08:20 AM, peter dalgaard wrote:
>>> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>>>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>>>
>>> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."
>>
>> <sarcasm>
>> Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start.
>> </sarcasm>
>
> I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.

I was replying to the thread, not to Peter in particular. Sorry if it sounded otherwise.

>> No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?
>>
>> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.
>
> That's a really bad suggestion -- it would break code for people who set stringsAsFactors=FALSE as well as those who rely on the current default behaviour. We certainly won't do that.

But since there seems to be a discussion about making some changes to the stringsAsFactors feature, I was hoping you would consider that one too. Doing the right thing sometimes requires breaking people's code, sadly!

Cheers,
H.

> Duncan Murdoch

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] stopping finalizers
Is there some way to prevent finalizers running during a section of code?

I have a package that includes R objects linked to database tables. To maintain the call-by-value semantics, tables are copied rather than modified, and the extra tables are removed by finalizers during garbage collection.

However, if the garbage collection occurs in the middle of processing another SQL query (which is relatively likely, since that's where the memory allocations are) there are problems with the database interface.

Since the guarantees for the finalizer are "at most once, not before the object is out of scope", it seems harmless to be able to prevent finalizers from running during a particular code block, but I can't see any way to do it.

Suggestions?

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
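One possible workaround, sketched under assumptions rather than taken from the thread: instead of stopping the finalizers, let each finalizer check a flag and, while a query is in progress, only record the table name so the actual cleanup happens once the query has finished. All names here (`state`, `drop_table()`, `on_finalize()`, `new_handle()`, `with_query()`) are hypothetical, and `drop_table()` merely stands in for the real database call. Only `reg.finalizer()`, `on.exit()` and `force()` are base R.

```r
state <- new.env()
state$in_query <- FALSE          # TRUE while an SQL query is being processed
state$pending  <- character(0)   # tables whose removal has been deferred
state$dropped  <- character(0)   # stand-in for the database side effect

# Stand-in for the real "DROP TABLE" call (assumption for illustration).
drop_table <- function(name) {
  state$dropped <- c(state$dropped, name)
}

# Finalizer: defer the cleanup if a query is currently running.
on_finalize <- function(handle) {
  if (state$in_query) {
    state$pending <- c(state$pending, handle$table)
  } else {
    drop_table(handle$table)
  }
}

# Create an object linked to a table and register its finalizer.
new_handle <- function(table) {
  handle <- new.env()
  handle$table <- table
  reg.finalizer(handle, on_finalize)
  handle
}

# Run a query with cleanup deferred, then flush the queue.
# Nested calls are not handled in this sketch.
with_query <- function(expr) {
  state$in_query <- TRUE
  on.exit({
    state$in_query <- FALSE
    for (name in state$pending) drop_table(name)
    state$pending <- character(0)
  })
  force(expr)   # expr is evaluated lazily, i.e. while in_query is TRUE
}

h <- new_handle("tmp_1")
rm(h)
with_query({ gc(); "query result" })
state$dropped
```

The finalizer still runs whenever the garbage collector decides, so the "at most once" guarantee is untouched; only the database work is postponed to a point where the connection is known to be idle.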