Re: [Bioc-devel] ShortRead: optional custom labeling of samples in QA report

2013-02-12 Thread Julian Gehring

Hi,

Since the attached file didn't make it all the way through to the 
mailing list, you can find it at 
http://www.ebi.ac.uk/~jgehring/share/shortRead-pkg/0001-Example-patch-for-naming-samples-in-BAMQA.patch.



Best wishes
Julian


On 02/12/2013 03:23 PM, Julian Gehring wrote:

Hi,

In the QA report of the 'ShortRead' package, a short sequential integer
labeling for referencing the samples/files throughout the report is
created by default.  Would it be reasonable/possible to allow for other
optional names to label the samples to make the results of the report
easier to understand?

In general, I have three ideas for what would be handy to have:

1. Derive a label from the file names.  This is probably hard to
generalize and implement in a way that it actually helps.

2. In case the 'dirPath' argument in the 'qa' function call is a named
vector, such as

 qa(dirPath=c(p1="bam_file1.bam", p2="bam_file2.bam"))

use the names [p1, p2] for the labeling later on.  This would
require storing the names in the object returned by 'qa', but should not
be too hard to implement.

3. Optionally, pass a named vector to the 'report' method, matching file
names to sample labels.  In case the file names do not match or
'samples' is missing, default to the sequential labeling.
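
A minimal sketch of the fallback rule described in option 3, written in base R only. The helper name `resolve_labels` and its arguments are hypothetical and are not part of ShortRead; the sketch assumes the optional vector holds file names as values and the desired labels as names, and that any mismatch should fall back to the sequential labels.

```r
# Hypothetical helper (not part of ShortRead): choose report labels for a
# set of files. 'samples' is an optional named character vector whose
# values are file names and whose names are the desired labels.
resolve_labels <- function(files, samples = NULL) {
  default <- as.character(seq_along(files))   # sequential labeling
  if (is.null(samples) || is.null(names(samples)))
    return(default)
  idx <- match(basename(files), basename(samples))
  # Fall back to the default unless every file has a matching entry
  if (anyNA(idx))
    return(default)
  labels <- names(samples)[idx]
  # Labels must be non-empty and unique to be usable in a report
  if (any(!nzchar(labels)) || anyDuplicated(labels) > 0)
    return(default)
  labels
}

files <- c(p1 = "bam_file1.bam", p2 = "bam_file2.bam")
resolve_labels(files, samples = files)          # labels taken from names(files)
resolve_labels(files)                           # sequential labels
resolve_labels(files, c(x = "other_file.bam"))  # no match, sequential labels
```

Matching on `basename()` is a design choice of this sketch, so that labels still apply when the same files are given with different directory prefixes.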


For option 3, I have created a simple example patch to illustrate how
this could be implemented (see attached).  So, later this may look like
this:


 library(ShortRead)
 files = c(p1="bam_file1.bam", p2="bam_file2.bam")
 qa = qa(files, type="BAM")

 ## default sequential labeling ##
 ShortRead:::.report_html_BAMQA(qa, dest="report_normal")

 ## samples named according to names(files) ##
 ShortRead:::.report_html_BAMQA(qa, dest="report_named", samples=files)


I would be happy to hear any input or thoughts on this.


Best wishes
Julian


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





Re: [Bioc-devel] ShortRead: optional custom labeling of samples in QA report

2013-02-12 Thread Martin Morgan

On 02/12/2013 06:29 AM, Julian Gehring wrote:

Hi,

Since the attached file didn't make it all the way through to the mailing list,
you can find it at
http://www.ebi.ac.uk/~jgehring/share/shortRead-pkg/0001-Example-patch-for-naming-samples-in-BAMQA.patch.


Thanks, Julian. The request seems reasonable and I'll try to get to this in the 
next week.

Martin









--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [Bioc-devel] R CMD build not copying PDF vignettes to inst\doc

2013-02-12 Thread Norman Pavelka
Hi Dan,

I actually just installed the latest version of R-Devel (r61902) and
used biocLite("plgem") to download and install the latest version of
my package from the server. Although there are no errors or warnings
on the Bioc build/check report, my package still lacks the PDF version
of the vignette. I checked the source tarball in
http://www.bioconductor.org/packages/2.12/bioc/src/contrib/plgem_1.31.1.tar.gz
and in fact cannot see any PDFs in inst/doc. You can also notice the
vignette is not listed anymore in
http://www.bioconductor.org/packages/2.12/bioc/html/plgem.html
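
The manual check described above (looking for PDFs under inst/doc in the source tarball) can be sketched in base R. The function names `vignette_pdfs` and `has_vignette_pdf` are hypothetical helpers introduced for illustration; the sketch assumes the usual source-package layout with a single top-level package directory.

```r
# Return the entries of a source-package listing that are PDFs directly
# under <pkg>/inst/doc/. 'paths' is a character vector of member names.
vignette_pdfs <- function(paths) {
  paths[grepl("^[^/]+/inst/doc/[^/]+\\.pdf$", paths, ignore.case = TRUE)]
}

# List the members of a source tarball without unpacking it and report
# whether any vignette PDF is present.
has_vignette_pdf <- function(tarball) {
  members <- utils::untar(tarball, list = TRUE)
  length(vignette_pdfs(members)) > 0
}

# Example with a hand-written listing; a downloaded copy of
# plgem_1.31.1.tar.gz would be the real input to has_vignette_pdf().
listing <- c("plgem/DESCRIPTION",
             "plgem/inst/doc/plgem.Rnw",
             "plgem/inst/doc/plgem.pdf")
vignette_pdfs(listing)
```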

I then rebuilt the package from source myself from a freshly
checked-out version from the Bioc-devel repository (plgem version
1.31.1) using R-Devel r61902. I get no errors, no warnings and most
importantly the PDF is being built and included in the tarball
correctly.

So it appears that R-Devel r61868 (the version currently on the build
machine) is still not copying the vignette PDF into the package. Could
you please try to update R-Devel to r61902 and see if it solves the
problem?

Thanks!
Norman

P.S.: For full disclosure, I should probably mention that I recently
moved the .Rnw file from inst/doc to /vignettes following the latest R
recommendations, but I am unsure if this has anything to do with the
problem, as the package builds just fine on my machine using the
latest version of R-Devel.

On Wed, Feb 13, 2013 at 12:11 PM, Norman Pavelka
normanpave...@gmail.com wrote:
 Hi Dan,

 I can see the issue is resolved now! I will update my version of R-devel, too.

 Thanks,
 Norman

 On Fri, Feb 8, 2013 at 1:19 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:
 On Thu, Feb 7, 2013 at 8:48 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:
 Hi Norman,


 On Thu, Feb 7, 2013 at 6:59 PM, Norman Pavelka normanpave...@gmail.com 
 wrote:
 Hi,

 I am sure many of you may have noticed already, but basically every
 package in Bioc-devel that has a vignette (i.e. almost every package)
 is currently issuing warnings in R CMD check:
 http://www.bioconductor.org/checkResults/2.12/bioc-LATEST/

 I ran some tests myself and it appears that in the latest version of
 R-devel some changes have been introduced in R CMD build that cause
 it not to copy the compiled PDF vignettes to inst\doc. R CMD build
 returns only a silent warning such as:

 * creating vignettes ... OK
 Warning in file.copy(c(vigns$docs, outfiles), doc_dir) :
   problem copying
 E:\biocbld\bbs-2.12-bioc\tmpdir\Rtmpq4jjoR\Rbuild93c202b5ca6\plgem\vignettes\plgem.pdf
 to inst\doc\plgem.pdf: No such file or directory

 R CMD check then issues the following user-visible warning:

 * checking package vignettes in 'inst/doc' ... WARNING
 Package vignette without corresponding PDF:
'plgem.Rnw'

 Compiling my package from the same source but using the previous
 version of R CMD build does not cause any problems, i.e. the vignette
 PDF is correctly copied to inst/doc and R CMD check does not issue any
 warning.

 Should we bring this up to R-Devel mailing list?


 I'm not sure (checking right now) but I think this was fixed in r61843.
 The build machines are running r61836. The nightly build is underway
 but I will update R-devel tomorrow if doing so indeed fixes the
 problem.


 I can confirm that pdfs are properly copied into source tarballs with
 R-devel r61868.

 I will update to the latest R-devel tomorrow.
 Dan


 Thanks!
 Dan


 Cheers,
 Norman



Re: [Rd] stringsAsFactors

2013-02-12 Thread Ista Zahn
FWIW my view is that for data cleaning and organizing, factors just get
in the way. For modeling I like them because they make it easier to
understand what is happening. For example, I can look at the levels()
to see what the reference group will be. With characters one has to
know a) that levels are created in alphabetical order and b) the
alphabetical order of the unique values in the character vector.
Ugh. So my habit is to turn off stringsAsFactors, then explicitly
convert to factors before modeling. (I also use factors to change the
order in which things are displayed in tables and graphs, another
place where converting to factors myself is useful but creating
them in alphabetical order by default is not.)

All this is to say that I would like options(stringsAsFactors=FALSE) to
be the default, but I like the warning about converting to factors in
modeling functions because it reminds me that I forgot to convert them,
which I like to do anyway...
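
A short illustration of the habit described above, using made-up data: keep strings as characters while cleaning, then convert explicitly so that the level order, and with it the reference group, is chosen rather than inherited from sorting. The data frame and its values are invented for the example.

```r
d <- data.frame(group = c("treated", "control", "treated", "placebo"),
                y = c(2.1, 1.0, 2.4, 1.2),
                stringsAsFactors = FALSE)

# Default conversion: levels are the sorted unique values
levels(factor(d$group))

# Explicit conversion: choose the order, and with it the reference group
d$group <- factor(d$group, levels = c("placebo", "control", "treated"))
levels(d$group)[1]      # reference level under treatment contrasts

# relevel() moves one level to the front and keeps the rest in order
d$group2 <- relevel(d$group, ref = "control")
levels(d$group2)
```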

Best,
Ista

On Mon, Feb 11, 2013 at 12:50 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 11/02/2013 12:13 PM, William Dunlap wrote:

 Note that changing this does not just mean getting rid of silly
 warnings.
 Currently, predict.lm() can give wrong answers when stringsAsFactors is
 FALSE.

 d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4,
 15:17, 28.1,28.8,30.1))
 fit_ab <- lm(y ~ x + f, data = d, subset = f!="B")
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'f' converted to a factor
 predict(fit_ab, newdata=d)
 1 2 3 4 5 6 7 8 9 10
 1  2  3  4 25 26 27  8  9 10
Warning messages:
1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts)
 :
  variable 'f' converted to a factor
2: In predict.lm(fit_ab, newdata = d) :
  prediction from a rank-deficient fit may be misleading

 fit_ab is not rank-deficient and the predict should report
 1 2 3 4 NA NA NA 28 29 30


 In R-devel, the two warnings about factor conversions are no longer given,
 but the predictions are the same and the warning about rank deficiency still
 shows up.  If f is set to be a factor, an error is generated:

 Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
 object$xlevels) :
   factor f has new levels B

 I think both the warning and error are somewhat reasonable responses.  The
 fit is rank deficient relative to the model that includes f == "B",  because
 the column of the design matrix corresponding to f level B would be
 completely zero.  In this particular model, we could still do predictions
 for the other levels, but it also seems reasonable to quit, given that
 clearly something has gone wrong.

 I do think that it's unfortunate that we don't get the same result in both
 cases, and I'd like to have gotten the predictions you suggested, but I
 don't think that's going to happen.  The reason for the difference is that
 the subsetting is done before the conversion to a factor, but I think that
 is unavoidable without really big changes.
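
One way to get the NA pattern suggested earlier in this exchange, without changing predict.lm(), is to predict only for the rows whose level was present in the fit. This is a sketch under stated assumptions: `predict_known` is a hypothetical helper, and `f` is converted to a factor up front so the example does not depend on the stringsAsFactors setting.

```r
d <- data.frame(x = 1:10,
                f = factor(rep(c("A", "B", "C"), c(4, 3, 3))),
                y = c(1:4, 15:17, 28.1, 28.8, 30.1))
fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")

# Predict only for rows whose level of 'var' was seen when fitting;
# all other rows get NA instead of an error or a misleading value.
predict_known <- function(fit, newdata, var) {
  known <- as.character(newdata[[var]]) %in% fit$xlevels[[var]]
  pred <- rep(NA_real_, nrow(newdata))
  pred[known] <- predict(fit, newdata = newdata[known, , drop = FALSE])
  pred
}

predict_known(fit_ab, d, "f")   # NA for the three rows with f == "B"
```

The helper relies on the `xlevels` component that lm() stores for factor predictors, so it handles one categorical variable at a time.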

 Duncan Murdoch




 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

  -Original Message-
  From: r-devel-boun...@r-project.org
  [mailto:r-devel-boun...@r-project.org] On Behalf
  Of Terry Therneau
  Sent: Monday, February 11, 2013 5:50 AM
  To: r-devel@r-project.org; Duncan Murdoch
  Subject: Re: [Rd] stringsAsFactors
 
  I think your idea to remove the warnings is excellent, and a good
  compromise.
  Characters
  already work fine in modeling functions except for the silly warning.
 
  It is interesting how often the defaults for a program reflect the data
  sets in use at the
  time the defaults were chosen.  There are some such in my own survival
  package whose
  proper value is no longer as obvious as it was when I chose them.
  Factors are very
  handy for variables which have only a few levels and will be used in
  modeling.  Every
  character variable of every dataset in Statistical Models in S, which
  introduced
  factors, is of this type so auto-transformation made a lot of sense.
  The solder data
  set there is one for which Helmert contrasts are proper so guess what
  the default
  contrast
  option was?  (I think there are only a few data sets in the world for
  which Helmert makes
  sense, however, and R eventually changed the default.)
 
  For character variables that should not be factors such as a street
  address
  stringsAsFactors can be a real PITA, and I expect that people's
  preference for the option
  depends almost entirely on how often these arise in their own work.  As
  long as there is
  an option that can be overridden I'm okay.  Yes, I'd prefer FALSE as the
  default, partly
  because the current value is a tripwire in the hallway that eventually
  catches every new
  user.
 
  Terry Therneau
 
  On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
   Both of these were discussed by R Core.  I think it's unlikely the
   default for stringsAsFactors will be 

[Rd] Private environments and/or assignInMyNamespace

2013-02-12 Thread Ulrike Grömping

Dear DevelopeRs,

I've been struggling with the new regulations regarding modifications to 
the search path, regarding my Rcmdr plugin package RcmdrPlugin.DoE. John 
Fox made Rcmdr comply with the new policy by removing the environment 
RcmdrEnv from the search path. For the time being, he developed an 
option that allows users to put the environment from Rcmdr (RcmdrEnv) on 
the search path, like in earlier versions of Rcmdr (thanks John!), which 
rescues my package for the immediate future; however, in the long run it 
would be nice to be able to make it work without that.


The reason why I currently need the environment on the search path (may 
be due to my lack of understanding how tcltk widgets are handled): I 
have quite elaborate notebook widgets on which users can make many 
entries. Some entries are only checked after clicking OK, and if an 
error is found at that point, the user receives a small message window 
that has to be confirmed and is subsequently returned to the notebook 
widget in the state it was in when pressing OK. These widgets are 
currently held in the environment RcmdrEnv; they work when RcmdrEnv is 
on the search path; however, it is not sufficient to retrieve them with 
John's function getRcmdr, which works fine for objects other than widgets.


Here is my question: Would it be an option to place the widgets in a 
private environment of my plugin package (then I would have to learn how 
to create one and work with it), or won't they be found that way? 
Alternatively, I could have unexported objects of all required names in 
my namespace and modify these via assignInMyNamespace (I don't think 
that anybody from somewhere else would import that namespace, it's not 
that kind of package). Would that be a viable alternative, and would the 
widgets be found that way? Any further ideas?


Best regards,
Ulrike

--
*
* Ulrike Groemping  *
* BHT Berlin - University of Applied Sciences   *
*
* +49 (30) 39404863 (Home Office)   *
* +49 (30) 4504 5127 (BHT)  *
*
* http://prof.beuth-hochschule.de/groemping *
* groemp...@bht-berlin.de   *

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Ben Bolker
Duncan Murdoch murdoch.duncan at gmail.com writes:

  [snip]
 
 Regarding stringsAsFactors:  I'm not going to defend keeping it as is, 
 I'll let the people who like it defend it.  

  Would someone (anyone) like to come forward and give us a defense
of stringsAsFactors=TRUE -- even someone who doesn't personally like
it but would like to play devil's advocate?

 What I will likely do is 
 make a few changes so that character vectors are automatically changed 
 to factors in modelling functions, so that operating with 
 stringsAsFactors=FALSE doesn't trigger silly warnings.
 
 Duncan Murdoch
 

 [apologies for snipping context: gmane made me do it]



Re: [Rd] Private environments and/or assignInMyNamespace

2013-02-12 Thread Hadley Wickham
 Here my question: Would it be an option to place the widgets in a private
 environment of my plugin package (then I would have to learn how to create
 one and work with it), or won't they be found that way?

It sounds like you want to maintain state across function calls within
your package, and a private environment is a good solution.  See the
notes on local() at
https://github.com/hadley/devtools/wiki/Environments for a few
details.
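
A minimal sketch of a private environment for package-level state. The names `.widgets`, `put_widget`, `get_widget`, `has_widget` and `next_id` are hypothetical; in a package these would be unexported objects defined in an R/ file. Whether Tcl/Tk widgets stored this way are found by the plugin is not settled in this thread, so the sketch only shows the state-keeping mechanism.

```r
# Private environment, created once when the code is loaded.
.widgets <- new.env(parent = emptyenv())

# Accessors that read and write only this environment.
put_widget <- function(name, value) assign(name, value, envir = .widgets)
get_widget <- function(name) get(name, envir = .widgets, inherits = FALSE)
has_widget <- function(name) exists(name, envir = .widgets, inherits = FALSE)

# The same idea with local(): the counter lives in the closure's
# enclosing environment and persists between calls.
next_id <- local({
  n <- 0L
  function() {
    n <<- n + 1L
    n
  }
})

put_widget("notebook", list(title = "Design", page = 2L))
has_widget("notebook")
get_widget("notebook")$page
next_id()
```

Because the environment is modified in place, nothing needs to be reassigned in the namespace, which avoids the assignInMyNamespace route.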

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/



Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 14:54, Ben Bolker wrote:

Duncan Murdoch murdoch.duncan at gmail.com writes:

   [snip]


Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
I'll let the people who like it defend it.


   Would someone (anyone) like to come forward and give us a defense
of stringsAsFactors=TRUE -- even someone who doesn't personally like
it but would like to play devil's advocate?


Sure:
I will have to change all my scripts, my teaching examples, my book, and 
lots of code examples for research and particularly consulting jobs.


Personally, I think having stringsAsFactors=TRUE is not too bad for 
read.table() but less useful for data.frame().
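
For scripts and teaching examples that must not depend on the default, the argument can be passed explicitly to both functions. A small sketch with made-up data:

```r
txt <- "id group
1 a
2 b
3 a"

# Reading data: ask for factors explicitly
from_text <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Building a data frame by hand: keep strings as characters explicitly
built <- data.frame(id = 1:3, group = c("a", "b", "a"),
                    stringsAsFactors = FALSE)

class(from_text$group)   # "factor"
class(built$group)       # "character"
```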


And since you ask for the devil's advocate already, related to the 
subject line: Removing stars is horrible for consulting: With all those 
people from biology, medicine and other fields who even ask us questions 
in terms of significance stars that are obviously very common for them. 
Many of them will certainly ask us for the stars, and ask us to switch 
to another software product once they do not get it from R. They may not 
be interested in being taught about the advantages or disadvantages of 
p-values or stars.


There are different use cases of R, and I want to keep stars for 
consulting tasks where things have to be delivered within minutes. I am 
happy with or without for teaching, where I have the time and can easily 
talk about the sense and nonsense of p-values.



Best,
Uwe


Re: [Rd] Regression stars

2013-02-12 Thread Frank Harrell
Uwe, I've been consulting for decades and have never once been asked for such
stars.  And when a clinical researcher puts a sentence in a study protocol
that P<0.05 will be considered significant, I get them to take it out.
Frank






-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658268.html
Sent from the R devel mailing list archive at Nabble.com.



Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 9:20 AM, Uwe Ligges wrote:


On 12.02.2013 14:54, Ben Bolker wrote:
 Duncan Murdoch murdoch.duncan at gmail.com writes:

[snip]

 Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
 I'll let the people who like it defend it.

Would someone (anyone) like to come forward and give us a defense
 of stringsAsFactors=TRUE -- even someone who doesn't personally like
 it but would like to play devil's advocate?

Sure:
I will have to change all my scripts, my teaching examples, my book, and
lots of code examples for research and particularly consulting jobs.


Could you post an example of a non-trivial one?  (By trivial, I mean one 
that says data.frame() converts character vectors to factors. 
Obviously that would need to change.  I mean one that just assumes 
current behaviour, and would be broken by the change.)


Duncan Murdoch




[Rd] Contribution

2013-02-12 Thread Parthasarathy Gopavarapu
Hi,

I am Parthasarathy G, from IIT Madras (India). I am currently in the third
year of the undergraduate course.

I would like to contribute to the R project. Can anyone guide me regarding
this?

Thanking you,
Parthasarathy



Re: [Rd] Regression stars

2013-02-12 Thread Ravi Varadhan
I think that we should use P < .03 (which approximates the probability of 5 
consecutive heads) for assigning significance!

Ravi

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Frank Harrell
Sent: Tuesday, February 12, 2013 9:43 AM
To: r-devel@r-project.org
Subject: Re: [Rd] Regression stars

Uwe, I've been consulting for decades and have never once been asked for such 
stars.  And when a clinical researcher puts a sentence in a study protocol that 
P<0.05 will be considered significant, I get them to take it out.
Frank



Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 15:42, Frank Harrell wrote:

Uwe I've been consulting for decades and have never once been asked for such
stars.


Honestly: the last time I was asked was last week.

And when I answered (in another case a few months ago) "OK, I can add 
another 5 stars for p-values smaller than 0.5", they did not find it too 
funny.


Best,
Uwe


And when a clinical researcher puts a sentence in a study protocol
that P<0.05 will be considered significant I get them to take it out.

Frank



Re: [Rd] Regression stars

2013-02-12 Thread Ben Bolker
On 13-02-12 09:20 AM, Uwe Ligges wrote:
 
 
 On 12.02.2013 14:54, Ben Bolker wrote:
 Duncan Murdoch murdoch.duncan at gmail.com writes:

[snip]

 Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
 I'll let the people who like it defend it.

Would someone (anyone) like to come forward and give us a defense
 of stringsAsFactors=TRUE -- even someone who doesn't personally like
 it but would like to play devil's advocate?
 
 Sure:
 I will have to change all my scripts, my teaching examples, my book, and
 lots of code examples for research and particularly consulting jobs.
 
 Personally, I think having stringsAsFactors=TRUE is not too bad for
 read.table() but less useful for data.frame().
 
 And since you ask for the devil's advocate already, related to the
 subject line: Removing stars is horrible for consulting: With all those
 people from biology, medicine and other fields who even ask us questions
 in term of significance stars that are obviously very common for them.
 Many of them will certainly ask us for the stars, and ask us to switch
 to another software product once they do not get it from R. They may not
 be interested in being taught about the advantages or disadvantages of
 p-values or stars.
 
 There are different use cases of R, and I want to keep stars for
 consulting tasks where things have to be delivered within minutes. I am
 happy with or without for teaching, where I have the time and can easily
 talk about the sense and nonsense of p-values.
 
 
 Best,
 Uwe

  Thanks, Uwe.
  Now let me go one step farther.

  Can you (or anyone) give a good argument **other than backward
compatibility** for keeping the stringsAsFactors=TRUE argument on
data.frame()?

  I appreciate your distinction between data.frame() and read.table()'s
use of stringsAsFactors, and I can see that there is some point for
quick-and-dirty interactive use in setting all non-numeric variables to
factors (arguing that wanting non-numerics as factors is somewhat more
common than wanting them as strings).

  It might be nice to add an optional stringsAsFactors (and check.names)
argument to transform(): I've had to write my own Transform() function
to allow the defaults to be overridden, since transform() calls
data.frame() with the defaults.  (Setting the stringsAsFactors option
globally would work, although not for check.names.)
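
A minimal sketch of what such a wrapper could look like, not the
Transform() function mentioned above (which was not posted). The name
Transform2 and its signature are assumptions made for illustration. It
evaluates the name = expression pairs against the data frame and then
rebuilds the result with data.frame(), forwarding both arguments. Row
names are not preserved in this sketch.

```r
# Hypothetical helper: Transform2 and its argument names are illustrative only.
Transform2 <- function(.data, ..., stringsAsFactors = FALSE, check.names = FALSE) {
  # Evaluate the name = expression pairs with the columns of .data in scope,
  # falling back to the caller's environment for other variables.
  new_cols <- eval(substitute(list(...)), .data, parent.frame())

  # Replace columns that already exist and append the ones that do not.
  cols <- as.list(.data)
  cols[names(new_cols)] <- new_cols

  # Rebuild the data frame, passing the caller's choices on to data.frame().
  do.call(data.frame,
          c(cols, list(stringsAsFactors = stringsAsFactors,
                       check.names = check.names)))
}

df <- data.frame(id = 1:3, stringsAsFactors = FALSE)
out <- Transform2(df, label = c("a", "b", "c"), `my col` = id * 2)
str(out)
```

Because the formals after `...` can only be matched by their full names, a
call such as Transform2(df, x = 1, stringsAsFactors = TRUE) sets the option
rather than creating a column called stringsAsFactors.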

  Ben Bolker

 

 What I will likely do is
 make a few changes so that character vectors are automatically changed
 to factors in modelling functions, so that operating with
 stringsAsFactors=FALSE doesn't trigger silly warnings.

 Duncan Murdoch


   [apologies for snipping context: gmane made me do it]



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 16:40, Ben Bolker wrote:


   Thanks, Uwe.
   Now let me go one step further.

   Can you (or anyone) give a good argument **other than backward
compatibility** for keeping the stringsAsFactors=TRUE argument on
data.frame()?


No, I cannot,
Uwe









__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 10:40 AM, Ben Bolker wrote:


   Thanks, Uwe.
   Now let me go one step further.

   Can you (or anyone) give a good argument **other than backward
compatibility** for keeping the stringsAsFactors=TRUE argument on
data.frame()?


I can, under two assumptions:

  1.  We keep stringsAsFactors=TRUE on read.table().
  2.  We keep the stringsAsFactors argument in data.frame().

Under those assumptions, it would just be confusing to have opposite 
defaults.  (Just in case someone hasn't read all of this thread: I'd be 
happier to have the default be FALSE in both cases, but not until 
3.1.x.  For 3.0.x I think I'd just change the default value of 
default.stringsAsFactors() to FALSE, so people could easily get the old 
behaviour.)
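
The consistency argument can be illustrated with a small sketch that
passes stringsAsFactors explicitly to both functions, so the outcome does
not depend on whichever default a given R version ships. The inline data
and the helper names col_classes and compare_calls are made up for
illustration.

```r
# Made-up two-row data set, supplied inline so the example is self-contained.
txt <- "name,score\nalice,1.5\nbob,2.5\n"

# Hypothetical helper: first class of every column, as a named character vector.
col_classes <- function(df) vapply(df, function(col) class(col)[1L], character(1))

# Build the same data through read.table() and data.frame() with one setting.
compare_calls <- function(as_factors) {
  from_text <- read.table(text = txt, header = TRUE, sep = ",",
                          stringsAsFactors = as_factors)
  built <- data.frame(name = c("alice", "bob"), score = c(1.5, 2.5),
                      stringsAsFactors = as_factors)
  list(read.table = col_classes(from_text), data.frame = col_classes(built))
}

str(compare_calls(TRUE))
str(compare_calls(FALSE))
```

With the same setting the two routes give the same column classes. If the
defaults differed, the same data would arrive as a factor through one
function and as a character vector through the other.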


Duncan Murdoch





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Brian Lee Yung Rowe

I thought that the default was the way it was for performance reasons. For 
large data.frames or repeated applications, using factors should be faster for 
non-trivial strings.

> fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
> n <- 100

> a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
+                  stringsAsFactors=TRUE)
> a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
+                  stringsAsFactors=FALSE)

> fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
> system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed 
 19.614   4.037  24.649 
> system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed 
 19.726   7.715  36.761 



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread peter dalgaard

On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:

 
 I thought that the default was the way it was for performance reasons. For 
 large data.frames or repeated applications, using factors should be faster 
 for non-trivial strings.

I think not. Historically, it's more like "In statistics we have two kinds of 
variables, numerical and categorical. OK, so we have the occasional truly 
character-type variables like name and address, let's handle those as a special 
case."


-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] get and exists are not vectorized

2013-02-12 Thread Patrick Burns

Here is the current behavior (in 2.15.2 and 3.0.0):

> exists(c('notLikely', 'exists'))
[1] FALSE
> exists(c('exists', 'notLikely'))
[1] TRUE
> get(c('notLikely', 'exists'))
Error in get(c("notLikely", "exists")) : object 'notLikely' not found
> get(c('exists', 'notLikely'))
function (x, where = -1, envir = if (missing(frame)) as.environment(where) else sys.frame(frame),
    frame, mode = "any", inherits = TRUE)
.Internal(exists(x, envir, mode, inherits))
<bytecode: 0x0f7f8830>
<environment: namespace:base>


Both 'exists' and 'get' silently ignore all but the
first element.

My view is that 'get' should do what it currently does
except it should warn about ignoring subsequent elements
if there are any.

I don't see a reason why 'exists' shouldn't be vectorized.
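
Both suggestions can be sketched in user code. This is an illustration
under assumptions, not a proposed patch to base R: the names exists_each
and get_first are hypothetical. Each call hands exists() or get() exactly
one name, so the sketch does not depend on how a particular R version
treats a longer first argument.

```r
# Hypothetical helper: element-wise exists(), returning a named logical vector.
exists_each <- function(x, envir = parent.frame(), inherits = TRUE) {
  stopifnot(is.character(x))
  force(envir)  # resolve the caller's environment before handing it to vapply()
  vapply(x, exists, logical(1), envir = envir, inherits = inherits)
}

# Hypothetical helper: get() on the first name only, warning about the rest.
get_first <- function(x, ...) {
  if (length(x) > 1L) {
    warning("only the first of ", length(x), " names is used")
  }
  get(x[[1L]], ...)
}

e <- new.env()
assign("present", 1, envir = e)
exists_each(c("present", "absent"), envir = e, inherits = FALSE)
```

For fetching several objects at once, base R already provides mget(), which
takes a character vector of names and returns a list of values.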

Am I missing something?

Pat

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Ravi Varadhan
They are reaching for the stars.  Pardon my jest, but I couldn't resist. 

Ravi

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Uwe Ligges
Sent: Tuesday, February 12, 2013 10:01 AM
To: Frank Harrell
Cc: r-devel@r-project.org
Subject: Re: [Rd] Regression stars



On 12.02.2013 15:42, Frank Harrell wrote:
 Uwe, I've been consulting for decades and have never once been asked 
 for such stars.

Honestly: the last time I was asked was last week.

And when I answered (in another case a few months ago) "OK, I can add you another 
5 stars for p values smaller than 0.5", they did not find it too funny.

Best,
Uwe

 And when a clinical researcher puts a sentence in a study protocol 
 that P<0.05 will be considered significant I get them to take it out.

 Frank


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Tim Triche, Jr.
I think it may have been John D. Cook who first observed that p-values are
linearly correlated with the amount of time remaining on a grant.

Perhaps a suitable transform would reveal an ordinal relationship with
stars.







-- 
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Simon Urbanek

On Feb 12, 2013, at 11:05 AM, Brian Lee Yung Rowe wrote:

 
 I thought that the default was the way it was for performance reasons. For 
 large data.frames or repeated applications, using factors should be faster 
 for non-trivial strings.
 
 > fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
 > n <- 100
 
 > a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
 +                  stringsAsFactors=TRUE)
 > a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
 +                  stringsAsFactors=FALSE)
 
 > fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
 > system.time(z <- sapply(1:100, fn, a1))
    user  system elapsed 
  19.614   4.037  24.649 
 > system.time(z <- sapply(1:100, fn, a2))
    user  system elapsed 
  19.726   7.715  36.761 
 

Not really:

> system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed 
 13.780   0.444  14.229 
> rm(z)
> gc()
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  182113  9.8     407500   21.8    337655   18.1
Vcells 5789638 44.2  133982285 1022.3 163019778 1243.8
> system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed 
 13.201   0.668  13.873 


But your test is bogus, because %in% uses match(), which converts factors to 
character vectors anyway. So in your case you're just measuring noise in your 
system; character vectors are always faster in your example.

The reason is that in R strings are hashed so character vectors are technically 
very similar to factors just with faster access (because they don't need to go 
through the integer indirection). On 32-bit strings are in theory always faster 
than factors, on 64-bit they use double the size so they may or may not be 
faster depending on how you hit the cache etc. Anyway, in modern R versions 
you're much better off using character vectors than factors for any processing, 
so stringsAsFactors=FALSE is what I use exclusively.
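
The point about %in% can be checked with a small sketch that isolates the
comparison from the data-frame subsetting. The vector size, the seed, and
the helper name time_filter are arbitrary choices for illustration, and
the timings will differ from machine to machine, so none are shown.

```r
set.seed(1)
fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
n <- 1e5

chr <- sample(fs, n, replace = TRUE)  # character vector
fac <- factor(chr)                    # the same values stored as a factor
wanted <- c('kale', 'spinach')

# match() works on the character representation, so both forms select the
# same elements.
same_result <- identical(fac %in% wanted, chr %in% wanted)

# Hypothetical helper: elapsed seconds for repeated membership tests.
time_filter <- function(x, reps = 20) {
  system.time(for (i in seq_len(reps)) x %in% wanted)[["elapsed"]]
}

timings <- c(factor = time_filter(fac), character = time_filter(chr))
same_result
timings
```

Timing only the membership test keeps sample(), rnorm() and row subsetting
out of the measurement, which makes any difference between the two
representations easier to attribute.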

Cheers,
Simon

 
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Contribution

2013-02-12 Thread Claudia Beleites
Hi Parthasarathy,

IMHO the easiest way to contribute to R is contributing to an R
package. And one way to do that is to apply for a Google Summer of Code
project. I guess activities about that will start soon, as the program
was just announced, and they will take place at a separate email list:

gso...@groups.google.com

So I suggest you sign up for that list, and maybe explain a bit who you
are, what experience you have in R programming (or other languages) and
what your programming interests are. 

Best, Claudia




 I am Parthasarathy G, from IIT Madras (India). I am currently in the
 third year of the undergraduate course.
 
 I would like to contribute to the R project. Can anyone guide me
 regarding this?
 
 Thanking you,
 Parthasarathy
 



-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Hervé Pagès

On 02/12/2013 08:20 AM, peter dalgaard wrote:


On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:



I thought that the default was the way it was for performance reasons. For 
large data.frames or repeated applications, using factors should be faster for 
non-trivial strings.


I think not. Historically, it's more like "In statistics we have two kinds of 
variables, numerical and categorical. OK, so we have the occasional truly character-type 
variables like name and address, let's handle those as a special case."


<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>

No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.

Thanks,
H.






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 1:47 PM, Hervé Pagès wrote:


<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>


I think you are misreading what Peter wrote.  He wasn't defending that 
point of view, he was describing it.


No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.


That's a really bad suggestion -- it would break code for people who set 
stringsAsFactors=FALSE as well as those who rely on the current default 
behaviour.   We certainly won't do that.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Hervé Pagès

Hi Duncan,

On 02/12/2013 11:19 AM, Duncan Murdoch wrote:

On 12/02/2013 1:47 PM, Hervé Pagès wrote:


<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>


I think you are misreading what Peter wrote.  He wasn't defending that
point of view, he was describing it.


I was replying to the thread, not to Peter in particular. Sorry if it
sounded otherwise.



No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.


That's a really bad suggestion -- it would break code for people who set
stringsAsFactors=FALSE as well as those who rely on the current default
behaviour.   We certainly won't do that.


But since there seems to be a discussion about making some changes to
the stringsAsFactors feature, I was hoping you would consider that
one too.  Doing the right thing sometimes requires breaking people's
code, sadly!

Cheers,
H.



Duncan Murdoch



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] stopping finalizers

2013-02-12 Thread Thomas Lumley
Is there some way to prevent finalizers running during a section of code?

I have a package that includes R objects linked to database tables.  To
maintain the call-by-value semantics, tables are copied rather than
modified, and the extra tables are removed by finalizers during garbage
collection.

However, if the garbage collection occurs in the middle of processing
another SQL query (which is relatively likely, since that's where the
memory allocations are) there are problems with the database interface.

Since the guarantees for the finalizer are "at most once, not before the
object is out of scope", it seems harmless to be able to prevent finalizers
from running during a particular code block, but I can't see any way to do
it.
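
One possible workaround, sketched under assumptions: it does not stop the
garbage collector from calling a finalizer, it only lets the finalizer
postpone its database work while a query is in progress. All names here
(cleanup_state, make_table_finalizer, with_deferred_cleanup, and the drop
callback that would issue the actual DROP TABLE) are hypothetical.

```r
# Shared state: a busy flag and the tables whose removal has been postponed.
cleanup_state <- new.env()
cleanup_state$busy <- FALSE
cleanup_state$pending <- character(0)

# Build a finalizer for one table. `drop` is a hypothetical callback that
# removes the table from the database.
make_table_finalizer <- function(table_name, drop) {
  force(table_name)
  force(drop)
  function(obj) {
    if (cleanup_state$busy) {
      # A query is running: remember the table instead of touching the database.
      cleanup_state$pending <- c(cleanup_state$pending, table_name)
    } else {
      drop(table_name)
    }
    invisible(NULL)
  }
}

# Evaluate `expr` with cleanup deferred, then drain the queue on exit.
# This simple version is not meant to be nested.
with_deferred_cleanup <- function(expr, drop) {
  cleanup_state$busy <- TRUE
  on.exit({
    cleanup_state$busy <- FALSE
    pending <- cleanup_state$pending
    cleanup_state$pending <- character(0)
    for (nm in pending) drop(nm)
  })
  expr  # the promise is forced here, while the busy flag is set
}
```

A table handle would then be registered with something like
reg.finalizer(handle_env, make_table_finalizer("tmp_1", drop_fn)), and each
SQL call wrapped in with_deferred_cleanup(). The cost is that cleanup can be
delayed until the current query returns, which stays within the "at most
once" guarantee described above.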

Suggestions?

-thomas


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel