Re: [Rd] Write unix format files on windows and vice versa
On 24-Apr-2012 17:23:00 Andrew Redd wrote:
> I go back and forth between Windows and Linux, and find myself running into
> problems with line endings. Is there a way to control the line-ending
> conversion when writing files, with functions such as write and cat? More
> explicitly, I want to be able to write files with LF line endings rather
> than CRLF line endings on Windows, and CRLF line endings instead of LF on
> Linux; and I want to be able to control when the conversion is made and/or
> choose the line endings that I want. As far as I can tell the conversion is
> not optional and is buried deep in compiled code. Is there a possible
> work-around?
> Thanks, Andrew

Rather than write the Linux version out in Windows (or the other way round in Linux), you might find it more useful to use an external conversion utility. One such is unix2dos/dos2unix (the same program in either case, whose function depends on what name you call it by). This used to be standard on older Linux systems, but may need to be explicitly installed on your system. It seems there may also be a version for Windows -- see:
http://download.cnet.com/Unix2DOS/3000-2381_4-10488164.html

Then, when you create a file in Windows, you could transfer it to Linux and convert it to Unix format, and also keep it as it is for use on Windows. Conversely, when you create a file in Linux, you could convert it to Windows format and transfer it to Windows, and also keep it as it is for use on Linux.

It is also possible to do the CRLF -> LF or LF -> CRLF conversion using 'sed' in Linux. For some further detail see
http://en.wikipedia.org/wiki/Unix2dos

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 24-Apr-2012  Time: 18:56:21
This message was sent by XFMail
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
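For what it's worth, the conversion can also be avoided from within R itself: opening the file connection in binary mode ("wb") bypasses the text-mode newline translation, so the separator string is written byte-for-byte on any platform. A minimal sketch (my own illustration, not code from the thread; the helper name is invented):

```r
# Writing through a binary-mode connection: no OS-level CRLF conversion,
# so 'eol' is emitted verbatim on both Windows and Linux.
write_with_eol <- function(lines, path, eol = "\n") {
  con <- file(path, open = "wb")      # binary mode suppresses translation
  on.exit(close(con))
  writeLines(lines, con, sep = eol)   # use "\r\n" for DOS-style endings
}

write_with_eol(c("alpha", "beta"), "unix.txt", eol = "\n")
write_with_eol(c("alpha", "beta"), "dos.txt",  eol = "\r\n")
```

If memory serves, write.table() also has an 'eol' argument, but on Windows a binary-mode connection may still be needed to keep the C-level text-mode translation from re-converting it.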
Re: [Rd] NA vs. NA
On 05-Apr-2012 11:03:15 Adrian Dusa wrote:
> Dear All, I assume this is an R-devel issue, apologies if I missed
> something obvious. I have a dataframe where the row names are country
> codes, based on ISO 3166, something like this:
>
>      v1 v2
>   UK  1  2
>   NA  2  3
>
> It happens that NA is the country code for Namibia, and that creates
> problems on using this data within a package, due to this:
>   Error in read.table(zfile, header = TRUE, as.is = FALSE) :
>     missing values in 'row.names' are not allowed
> I realise that NA is reserved in R, but I assumed that when quoted it
> would be usable. For the moment I simply changed the country code, but I
> wonder if there's any (other) solution to circumvent this issue.
> Thanks very much in advance, Adrian

Hi Adrian,
The default in read.table() for the na.strings parameter is

  na.strings = "NA"

So, provided you have no NA in the data portion of your file (or e.g. any missing values are simply blank) you could use something like:

  read.table(zfile, header = TRUE, as.is = FALSE, na.strings="OOPS")

which should avoid the problem.
Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 05-Apr-2012  Time: 12:21:57
This message was sent by XFMail
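A self-contained reproduction of the suggestion above (my own sketch; the file contents are invented for illustration, and "OOPS" is just any string that never occurs in the data):

```r
# Build a small file whose first column (the row names) contains "NA"
# as a legitimate value (the ISO 3166 code for Namibia).
tf <- tempfile()
writeLines(c("v1 v2", "UK 1 2", "NA 2 3"), tf)

# With the default na.strings = "NA", the Namibia row name becomes a
# missing value and read.table() stops with
#   "missing values in 'row.names' are not allowed".
# Declaring a marker string that never occurs avoids the coercion:
d <- read.table(tf, header = TRUE, na.strings = "OOPS")
rownames(d)   # "UK" "NA" -- Namibia survives as a row name
```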
Re: [Rd] Julia
http://julialang.org/blog
Then click on "Stanford Talk Video". Then click on "available here".
Ted.

On 01-Mar-2012 Kjetil Halvorsen wrote:
> Can somebody post a link to the video? I can't find it; searching "Julia"
> on the YouTube Stanford channel gives nothing.
> Kjetil
>
> On Thu, Mar 1, 2012 at 11:37 AM, Douglas Bates ba...@stat.wisc.edu wrote:
>> On Thu, Mar 1, 2012 at 11:20 AM, Jeffrey Ryan jeffrey.r...@lemnica.com wrote:
>>> Doug,
>>> Agreed on the interesting point - looks like it has some real promise.
>>> I think the spike in interest could be attributable to Mike Loukides's
>>> tweet on Feb 20 (editor at O'Reilly):
>>> https://twitter.com/#!/mikeloukides/status/171773229407551488
>>> That is exactly the moment I stumbled upon it.
>>
>> I think Jeff Bezanson attributes the interest to a blog posting by Viral
>> Shah, another member of the development team, that hit Reddit. He said
>> that, with Viral now in India, it all happened overnight for those in
>> North America and he awoke the next day to find a firestorm of interest.
>> I ran across Julia in the Release Notes of LLVM and mentioned it to Dirk
>> Eddelbuettel, who posted about it on Google+ in January. (Dirk, being
>> much younger than I, knows about these new-fangled social media things
>> and I don't.)

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-Mar-2012  Time: 20:47:42
This message was sent by XFMail
Re: [Rd] bug in sum() on integer vector
of decimal (or binary) digits. An example of this is R (and most other numerical programs). This may be 32 bits or 64 bits. Any result of a computation which involves more than this number of bits is inevitably an approximation. Provided the user is aware of this, there is no need for your "It should always return the correct value or fail."

It will return the correct value if the integers are not too large; otherwise it will return the best approximation that it can cope with in the fixed finite storage space for which it has been programmed. There is an implicit element of the arbitrary in this. You can install 32-bit R on a 64-bit-capable machine, or a 64-bit version. You could re-program R so that it can work to, say, 128 bits or 256 bits even on a 32-bit machine (using techniques like those that underlie 'bc'), but that would be an arbitrary choice.

However, the essential point is that some choice is unavoidable, since if you push it too far the Universe will run out of particles -- and the computer industry will run out of transistors long before you hit the Universe limit! So you just have to accept the limits. Provided you are aware of the approximations which may set in at some point, you can cope with the consequences, so long as you take account of some concept of adequacy in the inevitable approximations. Simply to fail is far too unsophisticated a result!

Hoping this is useful,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 14-Dec-11  Time: 00:52:49
-- XFMail --
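A concrete illustration of the behaviour under discussion (my own example, not from the thread): R's integer type is 32-bit, so sum() over integers can overflow, in which case it returns NA with a warning rather than a silently wrong value; coercing to double first gives the (possibly approximate) result.

```r
# The largest representable 32-bit integer plus one overflows:
big <- c(.Machine$integer.max, 1L)
sum(big)              # NA, with an "integer overflow" warning

# Doing the arithmetic in double precision instead:
sum(as.numeric(big))  # 2147483648
```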
Re: [Rd] array extraction
Somewhat out of my depth here (I have only 2 arms, but am swimming in waters which require 3): My interpretation is that a and M are basically vectors, with dimension attributes, accessed down the columns.

The array 'a' consists of 30 elements 1:30 in that order, accessed by each of 3 rows for each of 5 columns in each of 2 layers, in that order of precedence. The matrix M consists of 15 elements, accessed by each of 3 rows for each of 5 columns. Thus a[M] treats M as a selection vector, ignoring dimension, and reads along a and at the same time along M, selecting according to TRUE or FALSE. Then, when it gets to the end of the first layer in 'a', it re-cycles M.

When I tried a[M,] I got

  a[M,]
  # Error in a[M, ] : incorrect number of dimensions

I infer that it is treating M as a vector, so there are only 2 dimensions in a[M,] instead of 3. So try:

  a[M,,]
  # Error: (subscript) logical subscript too long

which is not surprising, since one is applying a 15-long selector to the first dimension of 'a', which has only length 3, just as in

  a[rep(TRUE,15),,]
  # Error: (subscript) logical subscript too long

which, interestingly, differs from

  a[rep(1,15),,]
  # (Output, which is what you'd expect, omitted)

(Hmm, out of my depth again ...). Well, maybe two arms is not enough when you need three to swim in these waters; but it seems that one long swishy tail will do nicely. That being said, I still find the water quite murky!
Ted.

On 27-Sep-11 22:12:26, robin hankin wrote:
> thank you Simon.
> I find a[M] working to be unexpected, but consistent with (a close
> reading of) Extract.Rd
> Can we reproduce a[,M]? [I would expect this to extract a[,j,k] where
> M[j,k] is TRUE]
> try this:
>
>   a <- array(1:30,c(3,5,2))
>   M <- matrix(1:10,5,2) %% 3 == 1
>   a[M]
>   # [1]  1  4  7 10 11 14 17 20 21 24 27 30
>
> This is not doing what I would want a[,M] to do.
> I'll checkout afill() right now
> best wishes
> Robin
>
> On Wed, Sep 28, 2011 at 10:39 AM, Simon Knapp sleepingw...@gmail.com wrote:
>> a[M] gives the same as your `cobbled together' code.
>>
>> On Wed, Sep 28, 2011 at 6:35 AM, robin hankin hankin.ro...@gmail.com wrote:
>>> hello everyone.
>>> Look at the following R idiom:
>>>
>>>   a <- array(1:30,c(3,5,2))
>>>   M <- (matrix(1:15,c(3,5)) %% 4) > 2
>>>   a[M,] <- 0
>>>
>>> Now, I think that a[M,] has an unambiguous meaning (to a human).
>>> However, the last line doesn't work as desired, but I expected it
>>> to ... and it recently took me an indecent amount of time to debug an
>>> analogous case. Just to be explicit, I would expect a[M,] to extract
>>> a[i,j,] where M[i,j] is TRUE. (Extract.Rd is perfectly clear here,
>>> and R is behaving as documented.)
>>>
>>> The best I could cobble together was the following:
>>>
>>>   ind <- which(M, arr.ind=TRUE)
>>>   n <- 3
>>>   ind <- cbind(kronecker(ind, rep(1, dim(a)[n])),
>>>                rep(seq_len(dim(a)[n]), nrow(ind)))
>>>   a[ind] <- 0
>>>
>>> but the intent is hardly clear, certainly compared to a[M,].
>>>
>>> I've been pondering how to implement such indexing, and its
>>> generalization. Suppose 'a' is a seven-dimensional array, and M1 a
>>> matrix and M2 a three-dimensional array (both Boolean). Then
>>> a[,M1,,M2] is a natural generalization of the above. I would want
>>> a[,M1,,M2] to extract a[i1,i2,i3,i4,i5,i6,i7] where M1[i2,i3] and
>>> M2[i5,i6,i7] are TRUE. One would need all(dim(a)[2:3] == dim(M1))
>>> and all(dim(a)[5:7] == dim(M2)) for consistency.
>>>
>>> Can any R-devel subscribers advise?
>>> -- Robin Hankin, Uncertainty Analyst, hankin.ro...@gmail.com

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-11  Time: 00:28:37
-- XFMail --
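A sketch of a helper giving the a[M,] semantics asked for above (my own construction, not code from the thread; the function name is invented). It builds the full numeric index matrix by crossing each TRUE cell of M with every combination of the trailing dimensions:

```r
# For a logical matrix (or array) M matching the leading dimensions of a,
# return an index matrix selecting a[i,j,...,] for every TRUE M[i,j].
index_by_matrix <- function(a, M) {
  stopifnot(all(dim(a)[seq_along(dim(M))] == dim(M)))
  ij   <- which(M, arr.ind = TRUE)               # TRUE cells of M
  rest <- dim(a)[-seq_len(ncol(ij))]             # remaining dims of a
  grid <- as.matrix(expand.grid(lapply(rest, seq_len)))
  # each TRUE cell crossed with every index combination of the trailing dims
  cbind(ij[rep(seq_len(nrow(ij)), each = nrow(grid)), , drop = FALSE],
        grid[rep(seq_len(nrow(grid)), times = nrow(ij)), , drop = FALSE])
}

a <- array(1:30, c(3, 5, 2))
M <- matrix(c(TRUE, rep(FALSE, 13), TRUE), 3, 5)  # TRUE at [1,1] and [3,5]
a[index_by_matrix(a, M)] <- 0                     # zeroes a[1,1,] and a[3,5,]
```

The intent is (I hope) clearer than the kronecker() version, at the cost of building the index grid explicitly.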
Re: [Rd] Wish there were a strict mode for R interpreter. What
On 09-Apr-11 20:37:28, Duncan Murdoch wrote:
> On 11-04-09 3:51 PM, Paul Johnson wrote:
>> Years ago, I did lots of Perl programming. Perl will let you be lazy
>> and write functions that refer to undefined variables (like R does),
>> but there is also a strict mode so the interpreter will block anything
>> when a variable is mentioned that has not been defined. I wish there
>> were a strict mode for checking R functions. Here's why. We have a lot
>> of students writing R functions around here, and they run into trouble
>> because they use the same name for things inside and outside of
>> functions. When they call functions that have mistaken or undefined
>> references to names that they use elsewhere, then variables that are
>> in the environment are accidentally used. Know what I mean?
>>
>>   dat <- whatever
>>   someNewFunction <- function(z, w){
>>     # do something with z and w and create a new dat
>>     # but forget to name it dat
>>     lm(y ~ x, data=dat)  # lm just used wrong data
>>   }
>>
>> I wish R had a strict mode to return an error in that case. Users
>> don't realize they are getting nonsense because R finds things to fill
>> in for their mistakes. Is this possible? Does anybody agree it would
>> be good?
>
> It would be really bad, unless done carefully. In your function the free
> (undefined) variables are dat and lm. You want to be warned about dat,
> but you don't want to be warned about lm. What rule should R use to
> determine that? (One possible rule would work in a package with a
> namespace. In that case, all variables must be found in declared
> dependencies; the search could stop before it got to globalenv(). But it
> seems unlikely that your students are writing packages with namespaces.)
>
> Duncan Murdoch

I'm with Duncan on this one! On the other hand, I can understand the issues that Paul's students might encounter. I think the right thing to do is to introduce the students to the basics of scoping, early in the process of learning R.
Thus, when there is a variable (such as 'lm' in the example) which you *expect* to already be out there (since 'lm' is in 'stats' which is pre-loaded by default), then you can go ahead and use it. But when your function uses a variable (e.g. 'dat') which just *happened* to be out there when you first wrote the function, then when you re-use the same function definition in a different context things are likely to go wrong. So teach them that variables which occur in functions, which might have any meaning in whatever the context of use may be, should either be named arguments in the argument list, or should be specifically defined within the function, and not assumed to already exist unless that is already guaranteed in every context in which the function would be used. This is basic good practice which, once routinely adopted, should ensure that the right thing is done every time! Ted. E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 09-Apr-11 Time: 22:08:10 -- XFMail -- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
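One practical approximation to a "strict mode" that already ships with R (my suggestion, not raised in the thread): the codetools package can report the free variables a function relies on, which catches exactly the accidental-'dat' mistake above.

```r
library(codetools)

someNewFunction <- function(z, w) {
  lm(y ~ x, data = dat)   # 'dat' is not defined inside the function
}

# findGlobals() lists every name the function must resolve outside
# itself -- including 'dat', alongside legitimate globals like 'lm':
findGlobals(someNewFunction)
```

Students (or teachers) can then scan the output for anything that is not a known function; checkUsage() in the same package offers related diagnostics.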
Re: [Rd] Surprising behavior of letters[c(NA, NA)]
On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
> Consider this:
>
>   letters[c(2, 3)]
>   [1] "b" "c"
>   letters[c(2, NA)]
>   [1] "b" NA
>   letters[c(NA, 3)]
>   [1] NA  "c"
>   letters[c(NA, NA)]
>    [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>   [26] NA
>
> The result is a 2-vector in each case until we get to c(NA, NA), and
> then it unexpectedly changes from returning a 2-vector to returning a
> 26-vector. I think most people would have expected that the answer would
> be c(NA, NA).

I'm not sure that it is surprising! Consider letters[NA], which returns exactly the same result. Then consider that 'letters' is simply a 26-element character vector c("a",...). Now consider

  x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
  x[NA]
  # [1] NA NA NA NA NA NA NA NA NA NA NA NA NA

In other words, x[NA] for any vector x will test each index 1:length(x) against NA, and will find that it's NA, since it doesn't know whether the index matches or not. Therefore it returns NA for that index, and will do the same for every index. So it's telling you: "For each of my elements a,b,c,d,e,f,... I have to tell you that I don't know whether you want it or not." You also get similar behaviour for x==NA.

If anything might be surprising (though that also admits a logical explanation), it is the result

  letters[c(2, NA)]
  # [1] "b" NA

since the result being asked for by the first element of c(2,NA) is definite -- so far so good -- but then you would expect it to have the same problem with what is being asked for by NA. This time, it seems that because the 2-element vector c(2,NA) is being submitted, its length over-rides the length of the response that would be given for x[NA]: "You asked for a 2-element extraction from letters; I can see what you want for the first, but not for the second." However, that logic does not work for letters[c(NA,NA)], which still returns the 26-element result!
After all that, I'm inclined to the view that letters[NA] should return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA), etc.; and that the same should apply to all vectors accessed by "[". The above behaviour seems to contradict [what I can understand from] what is said in ?"[":

  NAs in indexing:
  When extracting, a numerical, logical or character 'NA' index picks an
  unknown element and so returns 'NA' in the corresponding element of a
  logical, integer, numeric, complex or character result, and 'NULL' for
  a list. (It returns '00' for a raw result.)

since that seems to imply that x[c(NA,NA)] should return c(NA,NA) and not rep(NA,length(x))!
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 17-Dec-10  Time: 15:40:03
-- XFMail --
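One way to see what is going on (my note, not part of the original exchange): a bare NA is *logical*, so c(NA, NA) is a logical subscript, and logical subscripts are recycled to the full length of the vector, whereas c(2, NA) is numeric and numeric subscripts are not recycled.

```r
length(letters[c(NA, NA)])                    # 26: logical index, recycled
length(letters[c(NA_integer_, NA_integer_)])  # 2: integer index, not recycled
length(letters[c(2, NA)])                     # 2: numeric index
```

So typing the NA explicitly (NA_integer_ or NA_real_) gives exactly the 2-element result that Gabor expected.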
Re: [Rd] [R] Logical vectors
On 04-Nov-10 08:56:42, Gerrit Eichner wrote:
> On Thu, 4 Nov 2010, Stephen Liu wrote:
>> [snip]
>> In "2.4 Logical vectors"
>> http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics
>> It states:
>>   "The logical operators are <, <=, >, >=, == for exact equality and
>>   != for inequality."
>> [snip]
>
> Hello, Stephen, in my understanding of the sentence
>   "The logical operators are <, <=, >, >=, == for exact equality and
>   != for inequality"
> the phrase "exact equality" refers to the operator ==, i.e. to the last
> element "==" in the enumeration (<, <=, >, >=, ==), and not to its first.
> Regards -- Gerrit

This indicates that the sentence can be mis-read. It should be cured by a small change in punctuation (hence I copy to R-devel):

  "The logical operators are <, <=, >, >=; == for exact equality; and
  != for inequality."

Hoping this helps!
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-10  Time: 09:08:37
-- XFMail --
Re: [Rd] No RTFM?
On 22-Aug-10 18:31:46, Paul Johnson wrote:
> Hey Ben: One of my colleagues bought your book and was reading it during
> a faculty meeting last Tuesday. Everybody kept asking "what's that?" If
> you know how to put it in the wiki, would you please do it and let us
> know where it is. I was involved with an R wiki about 5 or 6 years ago,
> but completely lost track of it. I'm going to keep pushing for sharp,
> clear guidelines. If I could get R-help set up as a template for
> messages like bugzilla, I think it would be awesome!
> To disagree with Ted: people won't mind giving us what we need, as long
> as we tell them exactly what to do. We save ourselves the circle of
> "give us sessionInfo()" and "give us your LANG" and "what version is
> that" and so forth.

People won't mind? If R-help ends up telling me exactly what to do, I shall leave the list. I mean it. For good.
Ted.
[the rest snipped]

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10  Time: 19:49:41
-- XFMail --
Re: [Rd] No RTFM?
> ... is not part of R proper and you should address questions about
> Bioconductor to their support framework.

Presumably you mean the various special lists for Bioconductor? In such a case, that is fair enough, since anyone using Bioconductor is likely to be subscribed to such lists anyway. (But they wouldn't need to be told about this in the general Posting Guide.) [***] There are many packages, however, such as combinat, boot, cluster, etc., which are not part of R-base (though many of them are in "recommended") which are in frequent use, and questions about them to R-help are perfectly in order (and the individual author of such a package does not want to be the sole recipient of frequent questions about it -- the R-help community can carry the load better with its distributed model).

> C. If you are writing code for R itself, or if you are developing a
> package, send your question to r-devel, rather than r-help.

It depends on the question!

> D. For operating-system or R interface questions, there are dedicated
> lists. See R-sig-Mac, R-sig-Debian, R-sig-Fedora, etc.

Fair enough (in most cases).

> ==
> It will be necessary to add, toward the end, the part about "be polite"
> when posting.

And, since a reply is also a posting, "be polite" when responding!

> And along the lines of the "No RTFM" policy, I think we should say "All
> RTFM answers should include an exact document and section number. It is
> certainly insulting to answer a question about plot with the one line
> '?plot', but it is not insulting to say 'In ?plot, check the Details
> section and run the example code.'"

Fair enough -- and indeed this is an example of a helpful and sufficient (for most people) response, and therefore is not in the "RTFM" category.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10  Time: 03:13:22
-- XFMail --
[Rd] suggestion for ?factor
Hi Folks. May I suggest a small addition to ?factor -- this is not explicit about how to force the ordering of levels in an ordered factor. One can guess it, but I think it would be a good idea to spell it out. For example, one can do:

  X <- c("Alf","Bert","Chris","Dave")
  F <- factor(X, levels=c("Dave","Alf","Chris","Bert"), ordered=TRUE)
  F
  # [1] Alf   Bert  Chris Dave
  # Levels: Dave < Alf < Chris < Bert

So, where ?factor says:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by 'as.character(x)',
  sorted into increasing order of 'x'. Note that this set can be smaller
  than 'sort(unique(x))'.

it could include a bit extra so as to read:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by 'as.character(x)',
  sorted into increasing order of 'x'. Note that this set can be smaller
  than 'sort(unique(x))'. If 'levels=...' is present and 'ordered=TRUE',
  then the levels are ordered according to the order in which they are
  listed in 'levels'.

With thanks,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-10  Time: 16:37:50
-- XFMail --
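A small illustration of why the declared ordering matters (my own addendum): comparison operators and min/max on an ordered factor respect the level order given in 'levels', not alphabetical order.

```r
F <- factor(c("Alf", "Bert", "Chris", "Dave"),
            levels = c("Dave", "Alf", "Chris", "Bert"), ordered = TRUE)

F < "Chris"   # TRUE for "Alf" and "Dave", which sit below "Chris"
min(F)        # "Dave", the lowest declared level (not "Alf")
```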
Re: [Rd] results of pnorm as either NaN or Inf
On 13-May-10 20:04:50, efree...@berkeley.edu wrote:
> I stumbled across this and I am wondering if this is unexpected behavior
> or if I am missing something.
>
>   pnorm(-1.0e+307, log.p=TRUE)
>   [1] -Inf
>   pnorm(-1.0e+308, log.p=TRUE)
>   [1] NaN
>   Warning message:
>   In pnorm(q, mean, sd, lower.tail, log.p) : NaNs produced
>   pnorm(-1.0e+309, log.p=TRUE)
>   [1] -Inf
>
> I don't know C and am not that skilled with R, so it would be hard for
> me to look into the code for pnorm. If I'm not just missing something,
> I thought it may be of interest. Details: I am using Mac OS X 10.5.8.
> I installed a precompiled binary version. Here is the output from
> sessionInfo(), requested in the posting guide:
>   R version 2.11.0 (2010-04-22), i386-apple-darwin9.8.0
>   locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>   attached base packages: [1] stats graphics grDevices utils datasets
>   methods base
>   loaded via a namespace (and not attached): [1] tools_2.11.0
> Thank you very much, Eric Freeman, UC Berkeley

This is probably platform-independent. I get the same results with R on Linux. More to the point: you are clearly pushing the envelope here. First, have a look at what R makes of your inputs to pnorm():

  -1.0e+307   # [1] -1e+307
  -1.0e+308   # [1] -1e+308
  -1.0e+309   # [1] -Inf

So, somewhere between -1e+308 and -1e+309, the envelope burst! Given -1.0e+309, R returns -Inf (i.e. R can no longer represent this internally as a finite number). Now look at

  pnorm(-Inf, log.p=TRUE)   # [1] -Inf

So, R knows how to give the correct answer (an exact 0, or -Inf on the log scale) if you feed pnorm() with -Inf. So you're OK with -1e+N where N >= 309. For smaller powers, e.g. -1e+(200:306), the inputs are still comfortably representable, and presumably R's algorithm (which I haven't studied either ...) returns an exact 0 for pnorm(), as it should to the available internal accuracy. So, up to pnorm(-1.0e+307, log.p=TRUE) = -Inf, all is as it should be.
However, at -1e+308, the envelope is about to burst, and something may occur within the algorithm which results in a NaN. So there is nothing anomalous about your results except at -1e+308, which is where R is at a critical point. That's how I see it, anyway!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-10  Time: 00:01:27
-- XFMail --
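A quick check of where the representable range actually ends (my own addendum): the largest finite double is about 1.8e308, so the literal -1e309 overflows to -Inf at parse time, before pnorm() ever sees it.

```r
.Machine$double.xmax       # approximately 1.797693e+308
is.infinite(-1e309)        # TRUE: the literal itself is already -Inf
pnorm(-Inf, log.p = TRUE)  # -Inf, i.e. log(0), the exact answer
```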
Re: [Rd] y ~ X -1 , X a matrix
On 17-Mar-10 23:32:41, Ross Boylan wrote:
> While browsing some code I discovered a call to lm that used a formula
> y ~ X - 1, where X was a matrix. Looking through the documentation of
> formula, lm, model.matrix and maybe some others I couldn't find this
> usage (R 2.10.1). Is it anything I can count on in future versions? Is
> there documentation I've overlooked? For the curious: model.frame on the
> above equation returns a data.frame with 2 columns. The second column is
> the whole X matrix. model.matrix on that object returns the expected
> matrix, with the transition from the odd model.frame to the regular
> matrix happening in an .Internal call.
> Thanks. Ross
> P.S. I would appreciate cc's, since mail problems are preventing me from
> seeing list mail.

Hmmm ... I'm not sure what is the problem with what you describe. Code:

  set.seed(54321)
  X <- matrix(rnorm(50), ncol=2)
  Y <- 1*X[,1] + 2*X[,2] + 0.25*rnorm(25)
  LM <- lm(Y ~ X-1)
  summary(LM)
  # Call:
  # lm(formula = Y ~ X - 1)
  # Residuals:
  #      Min       1Q   Median       3Q      Max
  # -0.39942 -0.13143 -0.02249  0.11662  0.61661
  # Coefficients:
  #    Estimate Std. Error t value Pr(>|t|)
  # X1  0.97707    0.04159   23.49   <2e-16 ***
  # X2  2.09152    0.06714   31.15   <2e-16 ***
  # ---
  # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  # Residual standard error: 0.2658 on 23 degrees of freedom
  # Multiple R-squared: 0.9863, Adjusted R-squared: 0.9851
  # F-statistic: 826.6 on 2 and 23 DF, p-value: < 2.2e-16

  model.frame(LM)
  #              Y          X.1          X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358

  str(model.frame(LM))
  # $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  # $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]

  model.frame(Y ~ X-1)
  #              Y          X.1          X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358
  ## (Identical to above)

  str(model.frame(Y ~ X-1))
  # $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  # $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]
  ## (Identical to above)

Maybe the clue (admittedly somewhat obtuse) can be found in ?lm:

  lm(formula, data, subset, weights, na.action, method = "qr",
     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
     contrasts = NULL, offset, ...)
  [...]
  data: an optional data frame, list or environment (or object coercible
  by 'as.data.frame' to a data frame) containing the variables in the
  model. If not found in 'data', the variables are taken from
  'environment(formula)', typically the environment from which 'lm' is
  called.

So, in the example the variables are taken from X, "coercible by 'as.data.frame' ... taken from 'environment(formula)'". Hence (I guess) X is found in the environment and is coerced into a dataframe with 2 columns, and X.1, X.2 are taken from there. R gurus: please comment! (I'm only guessing by plausibility.)
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-Mar-10  Time: 00:57:20
-- XFMail --
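As a cross-check of the matrix-on-the-right-hand-side usage (my own, not from the thread): lm(Y ~ X - 1) with a matrix X fits the same least-squares problem as handing X directly to lm.fit(), so the coefficients agree.

```r
set.seed(54321)
X <- matrix(rnorm(50), ncol = 2)
Y <- 1 * X[, 1] + 2 * X[, 2] + 0.25 * rnorm(25)

b1 <- unname(coef(lm(Y ~ X - 1)))        # matrix used inside a formula
b2 <- unname(lm.fit(X, Y)$coefficients)  # matrix used as a design matrix
all.equal(b1, b2)                        # TRUE
```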
Re: [Rd] Possible bug in fisher.test() (PR#14196)
On 27-Jan-10 17:30:10, nhor...@smith.edu wrote:
> # is there a bug in the calculation of the odds ratio in fisher.test?
> # Nicholas Horton, nhor...@smith.edu  Fri Jan 22 08:29:07 EST 2010
>
>   x1 = c(rep(0, 244), rep(1, 209))
>   x2 = c(rep(0, 177), rep(1, 67), rep(0, 169), rep(1, 40))
>   or1 = sum(x1==1 & x2==1)*sum(x1==0 & x2==0) /
>         (sum(x1==1 & x2==0)*sum(x1==0 & x2==1))
>   library(epitools)
>   or2 = oddsratio.wald(x1, x2)$measure[2,1]
>   or3 = fisher.test(x1, x2)$estimate
>   # or1 = or2 = 0.625276, but or3 = 0.6259267!
>
> I'm running R 2.10.1 under Mac OS X 10.6.2. Nick

Not so. Look closely at ?fisher.test:

  Value: [...]
  estimate: an estimate of the odds ratio. Note that the _conditional_
  Maximum Likelihood Estimate (MLE) rather than the unconditional MLE
  (the sample odds ratio) is used. Only present in the 2 by 2 case.

Your or1 (and presumably the epitools value also) is the sample OR. The conditional MLE is the value of rho (the OR) that maximises the probability of the table *conditional* on the margins. In this case it differs slightly from the sample OR (by 0.1%). For smaller tables it will tend to differ even more, e.g.

  M1 <- matrix(c(4,7,17,18), nrow=2)
  M1
  #      [,1] [,2]
  # [1,]    4   17
  # [2,]    7   18
  (4*18)/(17*7)
  # [1] 0.605042
  fisher.test(M1)$estimate
  # odds ratio
  #  0.6116235
  ## (1.1% larger than sample OR)

  M2 <- matrix(c(1,2,4,5), nrow=2)
  M2
  #      [,1] [,2]
  # [1,]    1    4
  # [2,]    2    5
  (1*5)/(4*2)
  # [1] 0.625
  fisher.test(M2)$estimate
  # odds ratio
  #  0.649423
  ## (3.9% larger than sample OR)

The probability of a table matrix(c(a,b,c,d),nrow=2), given the marginals (a+b), (a+c), (b+d) and hence also (c+d), is a function of the odds ratio only. Again see ?fisher.test: "given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935)". The value of the odds ratio which maximises this (for the given observed 'a') is not the sample OR.
Hoping this helps,
Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 27-Jan-10  Time: 18:14:57
-- XFMail --
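The conditional MLE described above can be computed directly, which makes the distinction concrete. The following is my own sketch (not code from the thread, and not how fisher.test is actually implemented internally): it maximises the noncentral hypergeometric likelihood of the observed cell 'a' over the odds ratio rho, with the margins held fixed.

```r
# Conditional MLE of the odds ratio for the 2x2 table
#   a b
#   c d
# P(A = a | margins) is proportional to choose(m1, a) * choose(m2, k-a) * rho^a,
# where m1 = a+b, m2 = c+d are the row totals and k = a+c the column total.
cond_or_mle <- function(a, b, c, d) {
  m1 <- a + b; m2 <- c + d; k <- a + c
  support <- max(0, k - m2):min(k, m1)     # possible values of cell 'a'
  loglik <- function(log.rho) {
    # log-probability of the observed 'a', normalised over the support
    w <- lchoose(m1, support) + lchoose(m2, k - support) + support * log.rho
    lchoose(m1, a) + lchoose(m2, k - a) + a * log.rho -
      (max(w) + log(sum(exp(w - max(w)))))
  }
  exp(optimize(loglik, c(-10, 10), maximum = TRUE)$maximum)
}

cond_or_mle(4, 17, 7, 18)   # close to fisher.test(M1)$estimate, ~0.6116
```

The sample OR (4*18)/(17*7) = 0.605 and this conditional MLE differ for exactly the reason Ted gives.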
Re: [Rd] Benefit of treating NA and NaN differently for numerics
On 31-Dec-09 20:43:43, Saptarshi Guha wrote:
> Hello, I notice in main/arithmetic.c that NA and NaN are encoded
> differently (since every numeric NA comes from R_NaReal, which is
> defined via ValueOfNA). What is the benefit of treating these two
> differently? Why can't NA be a synonym for NaN?
> Thank you, Saptarshi (R-2.9)

Because they are used to represent different things. Others will be able to give you a much more comprehensive account than I can of their uses in R, but essentially:

NaN represents a result which is not valid (i.e. "Not a Number") in the domain of quantities being evaluated. For example, R does its arithmetic by default in the domain of "double", i.e. the machine representation of real numbers. In this domain, sqrt(-1) does not exist -- it is not a number in the domain of real numbers. Hence:

  sqrt(-1)
  # [1] NaN
  # Warning message:
  # In sqrt(-1) : NaNs produced

In order to obtain a result which does exist, you need to switch domain to complex numbers:

  sqrt(as.complex(-1))
  # [1] 0+1i

NA, on the other hand, represents a value (in whatever domain: double, logical, character, ...) which is not known, which is why it is typically used to represent missing data. It would be a valid entity in the current domain if its value were known, but the value is not known. Hence the result of any expression involving NA quantities is NA, since the value of the expression would depend on the unknown elements, and hence the value of the expression is unknown.

This distinction is important and useful, so it should not be done away with by merging NaN and NA!
Best wishes,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Dec-09  Time: 21:05:06
-- XFMail --
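A few probes that make the distinction visible at the prompt (my own addendum): the two values stay distinguishable via is.nan(), even though is.na() treats a NaN as missing too.

```r
is.nan(NaN)        # TRUE:  an invalid result
is.nan(NA_real_)   # FALSE: merely an unknown value
is.na(NaN)         # TRUE:  is.na() regards NaN as missing as well

# "Unknown" propagates only when the unknown value could matter:
NA & FALSE         # FALSE (the result is FALSE whatever NA hides)
NA & TRUE          # NA    (here it depends on the hidden value)
```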
Re: [Rd] a little bug for the function 'sprintf' (PR#14161)
On 21-Dec-09 17:36:12, Peter Dalgaard wrote:
> baoli...@gmail.com wrote:
>> Dear R-ers, I am a graduate student from South China University of
>> Technology. I found the function 'sprintf' in R 2.10.1 has a little
>> bug(?): When you type in the example code:
>>   sprintf("%s is %f feet tall\n", "Sven", 7.1)
>> R returns:
>>   [1] "Sven is 7.100000 feet tall\n"
>> This is very different from the 'sprintf' function in C/C++, for in
>> C/C++ the format string "\n" usually represents a new line, but here,
>> just the plain text "\n"!
>
> No, this is exactly the same as in C/C++. If you compare the result of
> sprintf to "Sven is 7.100000 feet tall\n" with strcmp() in C, they will
> compare equal.
>
>   s <- sprintf("%s is %f feet tall\n", "Sven", 7.1)
>   s
>   [1] "Sven is 7.100000 feet tall\n"
>   nchar(s)
>   [1] 27
>   substr(s,27,27)
>   [1] "\n"
>
> The thing that is confusing you is that strings are DISPLAYED using the
> same escape-character mechanisms as used for input. Compare
>
>   cat(s)
>   Sven is 7.100000 feet tall
>
>> Is it a bug, or a deliberate design?
>
> Design, not bug (and please don't file as bug when you are in doubt.)

And another confusion is that the C/C++ function sprintf() indeed creates a string AND ASSIGNS IT TO A NAMED VARIABLE, according to the syntax

  int sprintf(char *str, const char *format, ...);

as in

  char X[64];
  sprintf(X, "%s is %f feet tall\n", "Sven", 7.1);

as a result of which the string X will have the value "Sven is 7.100000 feet tall\n". R's sprintf does not provide for the parameter char *str (here X), and so RETURNS the string as the value of the function.
This is NOT TO BE CONFUSED with the behaviour of the C/C++ functions printf() and fprintf(), both of which create the string and then send it to either stdout or to a file:

  int printf(const char *format, ...);
  int fprintf(FILE *stream, const char *format, ...);

Therefore, if you programmed

  printf("%s is %f feet tall\n", "Sven", 7.1);

you would see on-screen (stdout) the string "Sven is 7.100000 feet tall" (followed by a line-break due to the \n), while

  mystream = fopen("myoutput.txt", "a");
  fprintf(mystream, "%s is %f feet tall\n", "Sven", 7.1);

would append "Sven is 7.100000 feet tall" (followed by a line-break) to myoutput.txt. Hoping this helps! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-Dec-09 Time: 18:14:57 -- XFMail --
Re: [Rd] Binning of integers with hist() function odd results (PR#14047)
On 06-Nov-09 23:30:12, g...@fnal.gov wrote: Full_Name: Gerald Guglielmo Version: 2.8.1 (2008-12-22) OS: OSX Leopard Submission from: (NULL) (131.225.103.35)

When I attempt to use the hist() function to bin integers the behavior seems very odd, as the bin boundary seems inconsistent across the various bins. For some bins the upper boundary includes the next integer value, while in others it does not. If I add 0.1 to every value, then the hist() binning behavior is what I would normally expect.

  h1 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
  h1$mids
  # [1] 1.5 2.5 3.5 4.5
  h1$counts
  # [1] 3 3 4 5
  h2 <- hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
  h2$mids
  # [1] 1.5 2.5 3.5 4.5 5.5
  h2$counts
  # [1] 1 2 3 4 5

Naively I would have expected the same distribution of counts in the two cases, but clearly that is not happening. This is a simple example to illustrate the behavior; originally I noticed this while binning a large data sample where I had set the breaks=c(0,24,1).

This is the correct intended behaviour. By default, values which are exactly on the boundary between two bins are counted in the bin which is just below the boundary value -- except that the bottom-most break will count values on it into the bin just above it. Hence 1,2,2 all go into the [1,2] bin; 3,3,3 into (2,3]; 4,4,4,4 into (3,4]; and 5,5,5,5,5 into (4,5]. Hence the counts 3,3,4,5. Since you did not set breaks in h1 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)), they were set using the default method, and you can see what they are with

  h1$breaks
  # [1] 1 2 3 4 5

When you add 0.1 to each value, you push the values on the boundaries up into the next bin. Now each value is inside its bin, and not on any boundary. Hence 1.1 is in (1,2]; 2.1,2.1 in (2,3]; 3.1,3.1,3.1 in (3,4]; 4.1,4.1,4.1,4.1 in (4,5]; and 5.1,5.1,5.1,5.1,5.1 in (5,6], giving counts 1,2,3,4,5 as you observe.
The default behaviour described above is defined by the default options include.lowest = TRUE, right = TRUE, where:

  include.lowest: logical; if 'TRUE', an 'x[i]' equal to the 'breaks'
    value will be included in the first (or last, for 'right = FALSE')
    bar. This will be ignored (with a warning) unless 'breaks' is a vector.

  right: logical; if 'TRUE', the histogram cells are right-closed
    (left open) intervals.

See '?hist'. You can change this behaviour by changing the options. Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Nov-09 Time: 13:57:07 -- XFMail --
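The options Ted quotes can be exercised directly on the reporter's data. A sketch (the default counts are those given in the report; the right = FALSE counts follow from the left-closed [a,b) convention and are worth verifying in your own session):

```r
x <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)
h_right <- hist(x, plot = FALSE)                # default: right-closed (a,b] bins
h_right$counts                                  # 3 3 4 5, as in the report
h_left <- hist(x, right = FALSE, plot = FALSE)  # left-closed [a,b) bins instead
h_left$counts                                   # boundary values now move up a bin
```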
Re: [Rd] basename returns . not in filename (PR#13958)
On 18-Sep-09 19:08:15, Jens Oehlschlägel wrote: Mmh, Point is, I gather, that trailing slashes are removed, e.g.,

  viggo:~/ basename foo/
  foo

So, not a bug. This unfortunately means that we cannot distinguish between 1) a path with a filename, and 2) a path without a filename. For example, in the next version of the ff-package we allow a user to specify a 'pattern' for all files of a ff dataframe, which is a path together with a file prefix; the above means we cannot specify an empty prefix for the current working directory, because

  dirname("./.")
  # [1] "."
  basename("./.")
  # [1] "."
  dirname("./")
  # [1] "."
  basename("./")
  # [1] "."

Jens Oehlschlägel

I am getting confused by this discussion. At least on Unixoid systems, and I believe it holds for Windows systems too, "." stands for the current directory (working directory). Moreover, "./" means exactly the same as ".": if you list the files in "./" you will get exactly the same as if you list the files in ".". Further, for any directory dir, "dir/." means the same as "dir/" and the same as "dir", so (on the same basis) "./." also means exactly the same as ".". Therefore the second "." in "./." is not a filename. What the above examples of dirname and basename usage are returning is simply a specific representation of the current working directory. Forgive me if I have not seen the point, but what I think I have seen boils down to the interpretation I have given above. Best wishes, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 18-Sep-09 Time: 22:35:34 -- XFMail --
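The behaviour under discussion fits in one screenful. A sketch, where ends_in_sep() is a hypothetical helper (not part of base R) for callers who, like the ff package, need to know whether the user wrote a trailing separator:

```r
basename("foo/")   # "foo" -- trailing slash removed, as in the shell utility
dirname("./.")     # "."
basename("./.")    # "."   -- the second "." is not treated as a filename
## hypothetical helper: inspect the original string, since basename()
## normalises this information away
ends_in_sep <- function(p) grepl("[/\\\\]\\.?$", p)
ends_in_sep("./")          # TRUE  -- a directory was named
ends_in_sep("./prefix")    # FALSE -- a file prefix was named
```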
Re: [Rd] Printing the null hypothesis
On 16-Aug-09 10:38:40, Liviu Andronic wrote: Dear R developers, Currently many (all?) test functions in R describe the alternative hypothesis, but not the null hypothesis being tested. For example, cor.test:

  require(boot)
  data(mtcars)
  with(mtcars, cor.test(mpg, wt, met="kendall"))
  # Kendall's rank correlation tau
  # data:  mpg and wt
  # z = -5.7981, p-value = 6.706e-09
  # alternative hypothesis: true tau is not equal to 0
  # sample estimates:
  #      tau
  # -0.72783
  # Warning message:
  # In cor.test.default(mpg, wt, met = "kendall") :
  #   Cannot compute exact p-value with ties

In this example, H0: (not printed); Ha: true tau is not equal to 0. This should be fine for advanced users and expert statisticians, but not for beginners. The help page will also often not explicitly state the null hypothesis. Personally, I often find myself in front of an htest object guessing what the null should have reasonably sounded like. Are there compelling reasons for not printing out the null being tested, along with the rest of the results? Thank you Liviu

I don't know about *compelling* reasons! But (as a general rule) if the Alternative Hypothesis is stated, then the Null Hypothesis is simply its negation. So, in your example, you can infer H0: true tau equals 0; Ha: true tau is not equal to 0. I don't think one needs to be an advanced user or expert statistician to see this -- it is part of the basic understanding of hypothesis testing. Some people might regard the H0 statement as simply redundant! The Ha statement is, however, essential, since different alternatives may be adopted depending on the application, such as Ha: true tau is greater than 0 (implicit H0: true tau <= 0) or Ha: true tau is less than 0 (implicit H0: true tau >= 0). Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 16-Aug-09 Time: 11:55:15 -- XFMail --
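Ted's point that the printed Ha line identifies which of the possible tests was actually run can be seen by varying alternative=. A sketch on the same mtcars data:

```r
data(mtcars)
two <- with(mtcars, cor.test(mpg, wt, method = "kendall"))
one <- with(mtcars, cor.test(mpg, wt, method = "kendall",
                             alternative = "less"))
two$alternative  # "two.sided" -- the implied H0 is tau = 0
one$alternative  # "less"      -- the implied H0 is tau >= 0
```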
Re: [Rd] Printing the null hypothesis
On 16-Aug-09 14:06:18, Liviu Andronic wrote: Hello, On 8/16/09, Ted Harding ted.hard...@manchester.ac.uk wrote: I don't know about *compelling* reasons! But (as a general rule) if the Alternative Hypothesis is stated, then the Null Hypothesis is simply its negation. So, in your example, you can infer H0: true tau equals 0; Ha: true tau is not equal to 0.

Oh, I had a slightly different H0 in mind. In the given example, cor.test(..., met="kendall") would test H0: x and y are independent, but cor.test(..., met="pearson") would test H0: x and y are not correlated (or 'are linearly independent').

Ah, now you are playing with fire! What the Pearson, Kendall and Spearman coefficients in cor.test measure is *association*. OK, if the results clearly indicate association, then the variables are not independent. But it is possible to have two variables x, y which are definitely not independent (indeed one is a function of the other) which yield zero association by any of these measures. Example:

  x <- (-10:10); y <- x^2 - mean(x^2)
  cor.test(x, y, method="pearson")
  # Pearson's product-moment correlation
  # t = 0, df = 19, p-value = 1
  # alternative hypothesis: true correlation is not equal to 0
  # sample estimates:
  # cor
  #   0
  cor.test(x, y, method="kendall")
  # Kendall's rank correlation tau
  # z = 0, p-value = 1
  # alternative hypothesis: true tau is not equal to 0
  # sample estimates:
  # tau
  #   0
  cor.test(x, y, method="spearman")
  # Spearman's rank correlation rho
  # S = 1540, p-value = 1
  # alternative hypothesis: true rho is not equal to 0
  # sample estimates:
  # rho
  #   0

If you wanted, for instance, that method="kendall" should announce that it is testing H0: x and y are independent, then it would seriously mislead the reader! To take a different example, a test of normality:

  shapiro.test(mtcars$wt)
  # Shapiro-Wilk normality test
  # data:  mtcars$wt
  # W = 0.9433, p-value = 0.09265

Here both H0: x is normal and Ha: x is not normal are missing.
At least to beginners, these things are not always perfectly clear (even after reading the documentation), and when interpreting the results it can prove useful to have on-screen information about the null. Thank you for answering Liviu

This is possibly a more discussable point, in that even if you know what the Shapiro-Wilk statistic is, it is not obvious what it is sensitive to, and hence what it might be testing for. But I doubt that someone would be led to try the Shapiro-Wilk test in the first place unless they were aware that it was a test for normality, and indeed this is announced in the first line of the response. The alternative, therefore, is non-normality. As to the contrast between the absence of an Ha statement for Shapiro-Wilk and its presence in cor.test(), this comes back to the point I made earlier: cor.test() offers you three alternatives to choose from: "two.sided" (the default), "greater", "less". This distinction can be important, and when cor.test() reports Ha it tells you which one was used. On the other hand, as far as Shapiro-Wilk is concerned there is no choice of alternatives (nor of anything else except the data x). So there is nothing to tell you! And, further, departure from normality has so many dimensions that alternatives like "two.sided", "greater" or "less" would make no sense. One can think of tests targeted at specific kinds of alternative, such as "distribution is excessively skew", "distribution has excessive kurtosis", "distribution is bimodal" or "distribution is multimodal", and so on. But any of these can be detected by Shapiro-Wilk, so it is not targeted at any specific alternative. Best wishes, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 16-Aug-09 Time: 16:26:48 -- XFMail --
Re: [Rd] identical(0, -0)
On 07-Aug-09 11:07:08, Duncan Murdoch wrote: Martin Maechler wrote: William Dunlap wdun...@tibco.com on Thu, 6 Aug 2009 15:06:08 -0700 writes: -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Giovanni Petris Sent: Thursday, August 06, 2009 3:00 PM To: milton.ru...@gmail.com Cc: r-h...@r-project.org; daniel.gerl...@geodecapital.com Subject: Re: [R] Why is 0 not an integer? I ran an instant experiment...

  typeof(0)
  # [1] "double"
  typeof(-0)
  # [1] "double"
  identical(0, -0)
  # [1] TRUE

Best, Giovanni

But 0.0 and -0.0 have different reciprocals:

  1.0/0.0
  # [1] Inf
  1.0/-0.0
  # [1] -Inf

Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com

yes. {finally something interesting in this boring thread!} --- diverting to R-devel. In April I had a private e-mail communication with John Chambers [father of S, notably S4, which also brought identical()] and Bill about the topic, where I had started suggesting that R should be changed such that identical(-0., +0.) would return FALSE. Bill did mention that it does so for (newish versions of) S+ and that he'd prefer that, too, and John said "I agree on having a preference for a bitwise comparison for identical() -- that's what the name means, after all. But since someone implemented the numerical case as the C ==, it's probably going to be more hassle than it's worth to change it. But we should make the implementation clear in the documentation." So in principle we all agreed that R's identical() should be changed here, namely by using something like memcmp() instead of simple '==', however we haven't bothered to actually *implement* this change. I am currently testing a patch which would lead to identical(0, -0) returning FALSE.

I don't think that would be a good idea. Other expressions besides -0 calculate the zero with the negative sign bit, e.g.
the following sequence:

  pos <- 1
  neg <- -1
  zero <- 0
  y <- zero*pos
  z <- zero*neg
  identical(y, z)

I think most R users would expect the last expression there to be TRUE based on the previous two lines, given that pos and neg both have finite values. In a simple case like this y == z would be a better test to use, but if those were components of a larger structure, identical() is all we've got, and people would waste a lot of time tracking down why structures differing only in the sign of zero were not identical, even though every element tested equal. Duncan Murdoch

Martin Maechler, ETH Zurich

My own view of this is that there may in certain circumstances be an interest in distinguishing between 0 and (-0), yet normally most users will simply want to compare the numerical values. Therefore I am in favour of revising identical() so that it can so distinguish; but also of taking the opportunity to give it a parameter, say identical(x, y, sign.bit=FALSE), so that the default behaviour would be to see 0 and (-0) as identical, but with sign.bit=TRUE it would see the difference. However, I put this forward in ignorance of a) any difficulties that this may present in re-coding identical(); b) any complications that may arise when applying this new form to complex objects. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Aug-09 Time: 14:49:51 -- XFMail --
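Duncan's example, with the sign bit made visible through reciprocals. same_signed_zero() is a hypothetical illustration of the kind of distinction being discussed, not an existing function:

```r
pos <- 1; neg <- -1; zero <- 0
y <- zero * pos          # +0
z <- zero * neg          # -0
identical(y, z)          # TRUE under the current ==-based comparison
c(1/y, 1/z)              # Inf -Inf: the sign bit is still there
## hypothetical sign-bit-aware comparison for zeros:
same_signed_zero <- function(a, b) identical(1/a, 1/b)
same_signed_zero(y, z)   # FALSE: +0 and -0 differ bitwise
```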
Re: [Rd] read.csv
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics:

  > read.csv(stdin())
  0: "A","B"
  1: "1","1"
  2: "2","2"
  3:
    A B
  1 1 1
  2 2 2

However, if colClasses is used then it seems that it does not:

  > read.csv(stdin(), colClasses = "numeric")
  0: "A","B"
  1: "1","1"
  2: "2","2"
  3:
  Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
    scan() expected 'a real', got '"1"'

Is this really intended? I would have expected that a csv file in which each field is surrounded with double quotes is acceptable in both cases. This may be documented as is, yet seems undesirable from both a consistency viewpoint and the viewpoint that it should be possible to double quote fields in a csv file.

Well, the default for colClasses is NA, for which ?read.csv says: [...] Possible values are 'NA' (when 'type.convert' is used), [...] and then ?type.convert says: "This is principally a helper function for 'read.table'. Given a character vector, it attempts to convert it to logical, integer, numeric or complex, and failing that converts it to factor unless 'as.is = TRUE'. The first type that can accept all the non-missing values is chosen." It would seem that type 'logical' won't accept integer (naively one might expect 1 -> TRUE, but see the experiment below), so the first acceptable type for "1" is integer, and that is what happens. So it is indeed documented (in the R[ecursive] sense of "documented" :))

However, presumably when colClasses is used then type.convert() is not called, in which case R sees itself being asked to assign a character entity to a destination which it has been told shall be numeric. And, since the default for as.is is as.is = !stringsAsFactors, but ?read.csv says that stringsAsFactors "is overridden bu [sic] 'as.is' and 'colClasses', both of which allow finer control", that wouldn't come to the rescue either.
Experiment:

  X <- logical(10)
  class(X)
  # [1] "logical"
  X[1] <- 1
  X
  # [1] 1 0 0 0 0 0 0 0 0 0
  class(X)
  # [1] "numeric"

so R has converted X from class 'logical' to class 'numeric' on being asked to assign a number to a logical; but in this case its hands were not tied by colClasses. Or am I missing something?!! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 14-Jun-09 Time: 21:21:22 -- XFMail --
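Given Ted's analysis, one practical workaround is to let the fields come in as character (which the quote handling accepts) and coerce afterwards. A sketch, with the data inlined via textConnection() for reproducibility:

```r
txt <- '"A","B"\n"1","1"\n"2","2"\n'
d <- read.csv(textConnection(txt), colClasses = "character")
d[] <- lapply(d, as.numeric)  # explicit coercion replaces scan()'s type check
str(d)                        # a data frame of two numeric columns
```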
Re: [Rd] Closed-source non-free ParallelR ?
and a lot of these things are covered in the GPL FAQs, including the reporting of violations. The GPL FAQs are the FSF's interpretation. The R Foundation is not obliged to have the same interpretation, and of course the FSF cannot enforce licenses given by the R Foundation. Underlying all of your comments seems to be a presumption that the R Foundation can disentangle themselves from the FSF vis-a-vis the GPL. Keep in mind that it is the FSF that is the copyright holder of the GPL. The R Foundation may be the copyright holder to R, but they are distributing it under a license which they did not write. Thus, I cannot envision any reasonable circumstances under which the R Foundation would place themselves in a position of legal risk in deviating from the interpretations of the GPL by the FSF. It would be insane legally to do so. The key issue is the lack of case law relative to the GPL and that leaves room for interpretation. One MUST therefore give significant weight to the interpretations of the FSF as it will likely be the FSF that will be involved in any legal disputes over the GPL and its application. You would want them on your side, not be fighting them. A parallel here is why most large U.S. public corporations legally incorporate in the state of Delaware, even though they may not have any material physical presence in that state. It is because the overwhelming majority of corporate case law in the U.S. has been decided under the laws of Delaware and the interpretations of said laws. If I were to start a company (which I have done in the past) and feared that I should find myself facing litigation at some future date, I would want that huge database of case law behind me. A small company (such as I had) may be less concerned about this and be comfortable with the laws of their own state, which I was. But if I were to be looking to build a big company with investors, etc. 
and perhaps look to go public at a future date, you bet I would look to incorporate in Delaware. It would be the right fiduciary decision to make in the interest of all parties. Unfortunately, we have no such archive of case law yet for the GPL. Thus, at least from a legally enforceable perspective, all is grey, and the FSF has to be the presumptive leader here. HTH, Marc Schwartz

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-Apr-09 Time: 01:54:11 -- XFMail --
Re: [Rd] Gamma funtion(s) bug
On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote: With R 2.8.1 on ubuntu I get:

  gamma(-1)
  # [1] NaN
  # Warning message:
  # In gamma(-1) : NaNs produced
  lgamma(-1)
  # [1] Inf
  # Warning message:
  # value out of range in 'lgamma'

Isn't the first one right, and shouldn't the second one (lgamma) also be NaN? Kjetil

That is surely correct! Since lim[x -> (-1)-] gamma(x) = +Inf, while lim[x -> (-1)+] gamma(x) = -Inf, at gamma(-1) one cannot choose between +Inf and -Inf, so surely it is NaN. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 19:55:33 -- XFMail --
Re: [Rd] Gamma funtion(s) bug
On 30-Mar-09 20:37:51, Duncan Murdoch wrote: On 3/30/2009 2:55 PM, (Ted Harding) wrote: On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote: With R 2.8.1 on ubuntu I get: gamma(-1) gives NaN (with a warning "NaNs produced"), while lgamma(-1) gives Inf (with a warning "value out of range in 'lgamma'"). Isn't the first one right, and shouldn't the second one (lgamma) also be NaN? Kjetil

That is surely correct! Since the one-sided limits of gamma(x) at -1 are +Inf and -Inf, at gamma(-1) one cannot choose between them, so surely it is NaN.

But lgamma(x) is log(abs(gamma(x))), so it looks okay to me. Duncan Murdoch

Oops, yes! That's what comes of talking off the top of my head (I don't think I've ever had occasion to evaluate lgamma(x) for negative x, so never consciously checked in ?lgamma). Thanks, Duncan! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 22:28:52 -- XFMail --
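Duncan's resolution is easy to check at a non-integer negative argument, where gamma() is finite and negative, so the abs() in log(abs(gamma(x))) actually matters:

```r
gamma(-0.5)                                      # about -3.5449, negative
lgamma(-0.5)                                     # finite, not NaN
all.equal(lgamma(-0.5), log(abs(gamma(-0.5))))   # TRUE: lgamma is log|Gamma|
gamma(-1)                                        # NaN at the pole (with a warning)
lgamma(-1)                                       # Inf: |Gamma| grows without bound on both sides
```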
Re: [Rd] Error in help file for quantile()
On 29-Mar-09 20:43:33, Rob Hyndman wrote: For some reason, the help file on quantile() says "Missing values are ignored" in the description of the x argument. Yet this is only true if na.rm=TRUE. I suggest the help file is amended to remove the words "Missing values are ignored". Rob

True enough -- in that if, as in the default, na.rm == FALSE, then applying quantile() to a vector with NAs yields the error message:

  quantile(X1)
  # Error in quantile.default(X1) :
  #   missing values and NaN's not allowed if 'na.rm' is FALSE

So either you have na.rm==TRUE, in which case it doesn't need saying that "Missing values are ignored" (unless you really want to spell it out in the form "Missing values are ignored if called with na.rm=TRUE; otherwise an error message is produced"), or you have na.rm==FALSE, in which case you get the error message and know where you stand. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 23:08:50 -- XFMail --
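The two behaviours side by side; a sketch (try() keeps the error from stopping a script):

```r
x <- c(1, 2, NA, 4)
try(quantile(x))           # error: missing values not allowed if 'na.rm' is FALSE
quantile(x, na.rm = TRUE)  # quantiles of the three observed values
```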
Re: [Rd] [R] Semantics of sequences in R
On 24-Feb-09 13:14:36, Berwin A Turlach wrote: G'day Dimitris, On Tue, 24 Feb 2009 11:19:15 +0100 Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote: in my opinion the point of the whole discussion could be summarized by the question, what is a design flaw? This is totally subjective, and it happens almost everywhere in life. [...] Beautifully summarised, and I completely agree. Not surprisingly, others don't. [...] To close, I'd like to share with you a Greek saying (maybe also a saying in other parts of the world) that goes, "for every rule there is an exception". [...] As far as I know, the same saying exists in English. It definitely exists in German. Actually, in German it is "every rule has its exception, including this rule". Or, as my mother used to say, "Moderation in all things!" To which, as I grew up, I adjoined "... including moderation." Ted. In German there is one grammar rule that does not have an exception. At least there used to be one; I am not really sure whether that rule survived the recent reform of the German grammar rules. Cheers, Berwin

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-Feb-09 Time: 14:44:21 -- XFMail --
[Rd] open-ended plot limits?
Hi Folks, Maybe I've missed it already being available somehow, but if the following isn't available I'd like to suggest it. If you're happy to let plot() choose its own limits, then of course plot(x,y) will do it. If you know what limits you want, then plot(x, y, xlim=c(x0,x1), ylim=c(y0,y1)) will do it. But sometimes one would like to a) make sure that (e.g.) the y-axis has a lower limit of (say) 0, and b) let plot() choose the upper limit. In that case, something like plot(x, y, ylim=c(0,NA)) would be a natural way of specifying it. But of course that does not work. I would like to suggest that this possibility should be available. What do people think? Best wishes, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 05-Feb-09 Time: 20:48:30 -- XFMail --
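Until something like ylim=c(0, NA) exists, the usual workaround is to compute the data-driven limit yourself. A sketch:

```r
x <- 1:20
y <- rnorm(20, mean = 5)
## pin the lower limit at 0, let the data set the upper one:
plot(x, y, ylim = c(0, max(y, na.rm = TRUE)))
## or keep a little headroom above the largest point:
plot(x, y, ylim = c(0, extendrange(y)[2]))
```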
Re: [Rd] open-ended plot limits?
Thanks, everyone, for all the responses! Ted.

On 05-Feb-09 20:48:33, Ted Harding wrote: Hi Folks, Maybe I've missed it already being available somehow, but if the following isn't available I'd like to suggest it. If you're happy to let plot() choose its own limits, then of course plot(x,y) will do it. If you know what limits you want, then plot(x, y, xlim=c(x0,x1), ylim=c(y0,y1)) will do it. But sometimes one would like to a) make sure that (e.g.) the y-axis has a lower limit of (say) 0, and b) let plot() choose the upper limit. In that case, something like plot(x, y, ylim=c(0,NA)) would be a natural way of specifying it. But of course that does not work. I would like to suggest that this possibility should be available. What do people think? Best wishes, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Feb-09 Time: 00:21:44 -- XFMail --
Re: [Rd] cat: ./R/Copy: No such file or directory
Just guessing here, but it looks as though an attempt has been made to execute the following command:

  cat ./R/Copy of create.fourier.basis.R

This may have arisen because there is indeed a file named "Copy of create.fourier.basis.R" but the command was issued to the OS without quotes, so the spaces split the name into three separate arguments (exactly the three "No such file or directory" complaints quoted below); or (perhaps less likely, because of the presence of the "./R/") the line "Copy of create.fourier.basis.R" was intended as a comment in the file, but somehow got read as a substantive part of the code. Good luck! Ted.

On 01-Nov-08 15:40:33, Spencer Graves wrote: Hello: What do you recommend I do to get past cryptic error messages from R CMD check, complaining "No such file or directory"? The package is under SVN control, and I reverted to a version that passed R CMD check, without eliminating the error. The 00install.out file is short and bitter:

  cat: ./R/Copy: No such file or directory
  cat: of: No such file or directory
  cat: create.fourier.basis.R: No such file or directory
  make: *** [Rcode0] Error 1
  -- Making package fda
  adding build stamp to DESCRIPTION
  installing NAMESPACE file and metadata
  make[2]: *** No rule to make target `R/Copy', needed by `D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda/R/fda'. Stop.
  make[1]: *** [all] Error 2
  make: *** [pkg-fda] Error 2
  *** Installation of fda failed ***
  Removing 'D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda'

Thanks for any suggestions. I have a *.tar.gz file that I hope may not have this problem, but I'm not sure. Thanks, Spencer

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Nov-08 Time: 16:19:56 -- XFMail --
Re: [Rd] non user-friendly error for chol2inv functions
On 29-Aug-08 13:00:01, Martin Maechler wrote: cd == christophe dutang [EMAIL PROTECTED] on Fri, 29 Aug 2008 14:28:42 +0200 writes:

cd Yes, I do not cast the first argument as a matrix with the as.matrix function.
cd Maybe we could detail the error message if the first argument is a numeric?
cd   error(_("'a' is a numeric and must be coerced to a numeric matrix"));

Merci, Christophe. Yes, we *could* do that. Alternatively, I think I will just make it work in that case, since I see that qr(), chol(), svd(), solve() all treat a numeric (of length 1) as a 1 x 1 matrix automatically.

I was about to draw attention to this inconsistency! While one is about it, might it not be useful also to do the converse: treat a 1x1 matrix as a scalar in appropriate contexts? E.g.

  a <- 4
  A <- matrix(4,1,1)
  B <- matrix(c(1,2,3,4),2,2)
  a*B
  #      [,1] [,2]
  # [1,]    4   12
  # [2,]    8   16
  a+B
  #      [,1] [,2]
  # [1,]    5    7
  # [2,]    6    8
  A*B
  # Error in A * B : non-conformable arrays
  A+B
  # Error in A + B : non-conformable arrays

Ted.

cd Thanks for your answer

De rien! Martin

cd 2008/8/29 Martin Maechler [EMAIL PROTECTED]: cd == christophe dutang [EMAIL PROTECTED] on Fri, 29 Aug 2008 12:44:18 +0200 writes:

cd Hi, In the function chol2inv with the option LINPACK set to false (the default), it raises an error when the matrix is a 1x1 matrix (i.e. just a real), saying "'a' must be a numeric matrix".

It is very helpful, but you have to read and understand it. I'm pretty sure you did not provide a 1 x 1 matrix. Here's an example showing how things work:

  m <- matrix(4,1,1)
  cm <- chol(m)
  cm
  #      [,1]
  # [1,]    2
  chol2inv(cm)
  #      [,1]
  # [1,] 0.25

Martin Maechler, ETH Zurich

cd This error is raised by the underlying C function (modLa_chol2inv in Lapack.c). Everything is normal, but I wonder if we could have another behavior when we pass a 1x1 matrix. I spent time this morning finding where the error was, and it was this problem.
cd: Thanks in advance, Christophe

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 29-Aug-08 Time: 14:16:03 -- XFMail --
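Pending any change in R itself, the converse problem Ted raises can be worked around today by dropping the dimensions of the 1x1 matrix before mixed arithmetic; a small sketch:

```r
A <- matrix(4, 1, 1)
B <- matrix(c(1, 2, 3, 4), 2, 2)

## A * B and A + B fail as non-conformable, but drop() reduces the
## 1x1 matrix to a plain numeric, which then recycles as a scalar:
drop(A) * B        # same result as 4 * B
as.vector(A) + B   # equivalent workaround
```

drop() is the general tool here: it removes all extents of length one, so it also turns a 1 x n or n x 1 matrix into a vector.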
Re: [Rd] Suggestion: add a warning in the help-file of unique()
On 17-Apr-08 10:44:32, Matthieu Stigler wrote: Hello. I'm sorry if this suggestion/correction was already made, but after a search in the devel list I did not find any mention of it. I would just suggest adding a warning or an example to the help file of the function unique(), like: "Note that unique() compares only identical values. Values which print equally but are in fact not identical will be treated as different."

  a <- c(0.2, 0.3, 0.2, 0.4 - 0.1)
  a
  # [1] 0.2 0.3 0.2 0.3
  unique(a)
  # [1] 0.2 0.3 0.3

Well, this is just the idea and the sentence could be made better (my poor English...). Maybe a reference to R FAQ 7.31 could be made. Maybe this behaviour is clear and logical for experienced users, but I don't think it is for beginners. I personally spent two hours to see that the problem in my code came from this.

The above is potentially a useful suggestion, and I would be inclined to support it. However, for your other suggestion: "I was thinking about modifying the function unique() to introduce a tol argument which allows comparison with a tolerance level (with default value zero to keep unique consistent), like all.equal(), but it seemed too complicated with my little understanding. Best regards and many thanks for what you do for R! Matthieu Stigler"

What is really complicated about it is that the results may depend on the order of elements. When unique() eliminates only values which are strictly identical to values which have been scanned earlier, there is no problem. But suppose you set tol=0.11 in

  unique(c(20.0, 30.0, 30.1, 30.2, 40.0), tol = 0.11)
  # 20.0, 30.0, 40.0
  # [30.1 rejected because within 0.11 of previous 30.0;
  #  30.2 rejected because within 0.11 of previous 30.1]

and compare with

  unique(c(20.0, 30.0, 30.2, 30.1, 40.0), tol = 0.11)
  # 20.0, 30.0, 30.2, 40.0
  # [30.2 accepted because not within 0.11 of any previous;
  #  30.1 rejected because within 0.11 of previous 30.2 or 30.0]

This kind of problem is always present in situations where there are potential chained tolerances.
You cannot see the difference between the position of the hour-hand of a clock now and one minute later. But you may not chain this logic; for, if you could:

  If A is indistinguishable from B, and B is indistinguishable from C,
  then A is indistinguishable from C.
  10:00 is indistinguishable from 10:01 (on the hour-hand);
  10:[n] is indistinguishable from 10:[n+1];
  hence, by induction, 10:00 is indistinguishable from 11:00.

Which you do not want! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Apr-08 Time: 14:54:19 -- XFMail --
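Ted's order-dependence point is easy to demonstrate with a naive tolerance-based unique. This is an illustrative sketch only (the function name and the rule -- reject any element within tol of *any* earlier scanned element, kept or not -- are my reading of the bracketed explanations above, not a proposal for base R):

```r
## Naive "unique with tolerance": element i is dropped if it lies within
## tol of ANY earlier element in scan order (whether that element was
## itself kept or dropped). Deliberately order-dependent.
uniqueTol <- function(x, tol = 0) {
  keep <- rep(TRUE, length(x))
  for (i in seq_along(x)[-1]) {
    if (any(abs(x[i] - x[seq_len(i - 1)]) <= tol)) keep[i] <- FALSE
  }
  x[keep]
}

uniqueTol(c(20.0, 30.0, 30.1, 30.2, 40.0), tol = 0.11)  # 20 30 40
uniqueTol(c(20.0, 30.0, 30.2, 30.1, 40.0), tol = 0.11)  # 20 30 30.2 40
```

The two calls differ only in the order of 30.1 and 30.2, yet return different sets -- exactly the chained-tolerance trap described above.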
Re: [Rd] Warnings generated by log2()/log10() are really large/t
On 27-Feb-08 13:39:47, Gabor Grothendieck wrote: On Wed, Feb 27, 2008 at 5:50 AM, Henrik Bengtsson [EMAIL PROTECTED] wrote: On Wed, Feb 27, 2008 at 12:56 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Wed, 27 Feb 2008, Martin Maechler wrote: Thank you Henrik, HenrikB == Henrik Bengtsson [EMAIL PROTECTED] on Tue, 26 Feb 2008 22:03:24 -0800 writes: {with many superfluous empty statements (i.e., trailing ";"): Indeed! I like to add a personal touch to the code I'm writing ;) Seriously, I added them here as a bait in order to get a chance to say that I finally found a good reason for adding the semicolons. If you cut'n'paste code from certain web pages it may happen that newlines/carriage returns are not transferred and all code is pasted into the same line at the R prompt. With semicolons you still get valid syntax. I cannot remember under what conditions this happened - I have seen that too, and many others have as well, since in some forums (not related to R) it's common to indent all source lines by two spaces. Any line appearing without indentation must have been wrapped. A not-so-subtle solution to this (subtle or not) problem: NEVER paste from a browser (or a Word doc, or anything similar) into the R command interface. Paste only from pure plain text. Therefore, if you must paste, then paste first into a window where a pure-plain-text editor is running. Then you can see what you're getting, and can clean it up. After that, you can paste from this directly into R, or can save the file and source() it. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 27-Feb-08 Time: 14:36:02 -- XFMail --
Re: [Rd] 0.45<0.45 = TRUE (PR#10744)
On 13-Feb-08 12:40:48, Barry Rowlingson wrote: hadley wickham wrote: "It's more than that though, as floating point addition is no longer guaranteed to be commutative or associative, and multiplication does not distribute over addition. Many concepts that are clear cut in pure math become fuzzy in floating point math - equality, singularity of matrices etc etc." I've just noticed that R doesn't calculate e^pi - pi as equal to 20:

  exp(pi) - pi == 20
  # [1] FALSE

See: http://www.xkcd.com/217/ Barry

Barry, these things fluctuate. Once upon a time (sometime in 1915 will do) you could get $[US]4.81 for £1.00 sterling. One of the rare brief periods when the folks on opposite sides of the Atlantic saw i^i (to within .Machine$double.eps, which at the time was about 0.001, if you were lucky and didn't make a slip of the pen). R still gets it approximately right:

  1/(1i^1i)
  # [1] 4.810477+0i

$i^i = £1. Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Feb-08 Time: 15:57:02 -- XFMail --
Re: [Rd] 0.45<0.45 = TRUE (PR#10744)
On 12-Feb-08 14:53:19, Gavin Simpson wrote: On Tue, 2008-02-12 at 15:35 +0100, [EMAIL PROTECTED] wrote: "Dear developer, in my version of R (2.4.0) as well as in a more recent version (2.6.0) on different computers, we found this problem:" No problem in R. This is the FAQ of all FAQs (Type III SS is probably up there as well). I'm thinking (by now quite strongly) that there is a place in Introduction to R (and maybe other basic documentation) for an account of arithmetic precision in R (and in digital computation generally). A section "Arithmetic Precision in R" near the beginning would alert people to this issue (there is nothing about it in Introduction to R, R Language Definition, or R Internals). Once upon a time, people who did arithmetic knew about this from hands-on experience (just when do you break out of the loop when you are dividing 1 by 3 on a sheet of paper?) -- but now people press buttons on black boxes, and when they find that 1/3 calculated in two mathematically equivalent ways comes out with two different values, they believe that there is a bug in the software. It would not occur to them, spontaneously, that the computer is doing the right thing and that they should look in a FAQ for an explanation of how they do not understand! I would be willing to contribute to such an explanation; and probably many others would too. But I feel it should be coordinated by people who are experts in the internals of how R handles such things. Best wishes to all, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Feb-08 Time: 15:31:26 -- XFMail --
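For reference, the canonical illustration such a section would open with (this is R FAQ 7.31 territory); a minimal sketch:

```r
## The classic floating-point surprise: neither side is exactly 3/10
## in binary, and the rounding errors differ.
0.1 + 0.2 == 0.3                  # FALSE
print(0.1 + 0.2, digits = 17)     # shows the tiny excess

## The recommended comparison uses a tolerance:
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE
.Machine$double.eps               # machine precision, about 2.2e-16
```

The point of the proposed documentation section would be exactly this: the computer is doing the right thing, and == is the wrong tool for comparing computed doubles.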
Re: [Rd] When 1+2 != 3 (PR#9895)
On 03-Sep-07 15:12:06, Henrik Bengtsson wrote: On 9/2/07, [EMAIL PROTECTED] wrote: [...] "If it may be useful, I have written two small functions (Unique and isEqual) which can deal with this problem of the double numbers." Quiz: What about utility functions equalsE() and equalsPi()? ...together with examples illustrating when they return TRUE and when they return FALSE. Cheers /Henrik

Well, if you guys want a Quiz: ... My favourite example of something which will probably never work on R (or any machine which implements fixed-length binary real arithmetic). An iterated function scheme on [0,1] is defined by:

  if 0 <= x <= 0.5 then next x = 2*x
  if 0.5 < x <= 1 then next x = 2*(1 - x)

in R:

  nextX <- function(x){ ifelse(x <= 0.5, 2*x, 2*(1-x)) }

and try, e.g.,

  x <- 3/7; for(i in 1:60){ x <- nextX(x); print(c(i, x)) }

x = 0 is an absorbing state. x = 1 -> x = 0. x = 1/2 -> 1 -> 0 ... (these work in R). If K is an odd integer, and 0 < r < K, then x = r/K -> ... leads into a periodic set. E.g. (see above) 3/7 -> 6/7 -> 2/7 -> 4/7 -> 6/7 -> ... All other numbers x outside these sets generate non-periodic sequences. Apart from the case where initial x = 1/2^k, none of the above is true in R (e.g. the example above). So can you devise an isEqual function which will make this work? It's only Monday .. plenty of time! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Sep-07 Time: 18:50:23 -- XFMail --
Re: [Rd] When 1+2 != 3 (PR#9895)
On 03-Sep-07 19:25:58, Gabor Grothendieck wrote: "Not sure if this counts but using the Ryacas package ..." Gabor, I'm afraid it doesn't count! (Though I didn't exclude it explicitly). I'm not interested in the behaviour of the sequence with denominator = 7 particularly. The system is in fact an example of simulating chaotic systems on a computer. For instance, one of the classic illustrations is next x = 2*x*(1-x) for any real x. The question is, how does a finite-length binary representation behave? Petr Savicky [privately] sent me a similar example: Starting with r/K:

  nextr <- function(r){ ifelse(r <= K/2, 2*r, 2*(K-r)) }

For K = 7 and r = 3, this yields r = 3, 6, 2, 4, 6, ... Dividing this by K = 7, one gets the correct period with approximately correct numbers. Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Sep-07 Time: 21:02:27 -- XFMail --
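Petr's integer formulation can be run exactly in R, since the iterates stay small integers; a sketch assuming K = 7 and r = 3 as in the example:

```r
## Exact integer version of the tent map restricted to multiples of 1/K:
## r/K -> 2r/K if r <= K/2, else 2(K - r)/K. Integer arithmetic is exact,
## so the true period appears -- unlike the floating-point iteration,
## which drifts off the periodic orbit.
K <- 7
nextr <- function(r) ifelse(r <= K/2, 2 * r, 2 * (K - r))
r <- 3
for (i in 1:9) { r <- nextr(r); cat(r, "") }  # 6 2 4 6 2 4 6 2 4
```

Dividing the iterates by K recovers the exact orbit 3/7 -> 6/7 -> 2/7 -> 4/7 -> 6/7 -> ..., which is precisely what the double-precision version fails to reproduce.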
Re: [Rd] SWF animation method
On 08-Aug-07 13:52:59, Mike Lawrence wrote: Hi all, Just thought I'd share something I discovered last night. I was interested in creating animations consisting of a series of plots and after finding very little in the usual sources regarding animation in R directly, and disliking the imagemagick method described here (http://tolstoy.newcastle.edu.au/R/help/05/10/13297.html), I discovered that if one exports the plots to a multipage pdf, it is relatively trivial to then use the pdf2swf command in SWFTools (http://www.swftools.org/download.html; mac install instructions here: http://9mmedia.com/blog/?p=7). Thanks so much for sharing your discovery, Mike! Out of the blue! (Unexpected bonus for being on the R list). Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Aug-07 Time: 15:25:44 -- XFMail -- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] formula(CO2)
On 16-Jul-07 13:28:50, Gabor Grothendieck wrote: "The formula attribute of the builtin CO2 dataset seems a bit strange:

  formula(CO2)
  # Plant ~ Type + Treatment + conc + uptake

What is one supposed to do with that? Certainly it's not suitable for input to lm, and none of the examples in ?CO2 use the above."

I think one is supposed to ignore it! (Or maybe be inspired to write a mail to the list ...). I couldn't find anything that looked like the above formula from str(CO2). But I did spot that the order of terms in the formula -- Plant, Type, Treatment, conc, uptake -- is the same as the order of the columns in the dataframe. So I tried:

  D <- data.frame(x = 1:10, y = 1:10)
  formula(D)
  # x ~ y

So, lo and behold, D has a formula! Or does it? Maybe if you give formula() a dataframe, it simply constructs one from the columns. Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 16-Jul-07 Time: 14:57:28 -- XFMail --
Re: [Rd] formula(CO2)
On 16-Jul-07 13:57:56, Ted Harding wrote: [...] "Maybe if you give formula() a dataframe, it simply constructs one from the columns."

Now that I think about it, I can see a use for this phenomenon:

  formula(CO2)
  # Plant ~ Type + Treatment + conc + uptake
  formula(CO2[, 2:5])
  # Type ~ Treatment + conc + uptake
  formula(CO2[, 3:5])
  # Treatment ~ conc + uptake
  formula(CO2[, 4:5])
  # conc ~ uptake
  formula(CO2[, c(5, 1, 2, 3, 4)])
  # uptake ~ Plant + Type + Treatment + conc

Could save a lot of typing! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 16-Jul-07 Time: 15:14:38 -- XFMail --
Re: [Rd] formula(CO2)
On 16-Jul-07 14:16:10, Gabor Grothendieck wrote: "Following up on your comments, it seems formula.data.frame just creates a formula whose lhs is the first column name and whose rhs is made up of the remaining column names. It ignores the formula attribute. In fact, CO2 does have a formula attribute, but it's not extracted by formula.data.frame:

  attr(CO2, "formula")
  # uptake ~ conc | Plant
  formula(CO2)
  # Plant ~ Type + Treatment + conc + uptake
"

Indeed! And, following up yet again on my own follow-up comment:

  library(combinat)
  for(j in 1:4){
    for(i in combn(1:4, j, simplify = FALSE)){
      print(formula(CO2[, c(5, i)]))
    }
  }
  # uptake ~ Plant
  # uptake ~ Type
  # uptake ~ Treatment
  # uptake ~ conc
  # uptake ~ Plant + Type
  # uptake ~ Plant + Treatment
  # uptake ~ Plant + conc
  # uptake ~ Type + Treatment
  # uptake ~ Type + conc
  # uptake ~ Treatment + conc
  # uptake ~ Plant + Type + Treatment
  # uptake ~ Plant + Type + conc
  # uptake ~ Plant + Treatment + conc
  # uptake ~ Type + Treatment + conc
  # uptake ~ Plant + Type + Treatment + conc

opening the door to automated fitting of all possible models (without interactions)! Now if only I could find out how to do the interactions as well, I would never need to think again! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 16-Jul-07 Time: 15:40:36 -- XFMail --
Re: [Rd] formula(CO2)
On 16-Jul-07 14:42:19, Gabor Grothendieck wrote: "Note that the formula uptake ~ . will do the same thing, so it's not clear how useful this facility really is." Hmmm... Do you mean something like

  lm(uptake ~ . , data = CO2[, c(5, i)])

where i is a subset of (1:4) as in my code below? In which case I agree! Ted.

On 7/16/07, Ted Harding [EMAIL PROTECTED] wrote: [... earlier message, with the combn() loop over all column subsets, quoted in full ...]

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 16-Jul-07 Time: 16:13:15 -- XFMail --
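A more direct way to build such formulas programmatically, without leaning on formula.data.frame's column-order behaviour, is reformulate() from the stats package; a sketch:

```r
## Build "uptake ~ ..." formulas explicitly from character vectors of
## column names, instead of reordering data-frame columns:
vars <- c("Plant", "Type", "Treatment", "conc")

reformulate(vars, response = "uptake")
# uptake ~ Plant + Type + Treatment + conc

reformulate(vars[1:2], response = "uptake")
# uptake ~ Plant + Type
```

Combined with combn() over vars, this yields the same all-subsets enumeration as the loop above, but with the data frame left untouched.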
Re: [Rd] bounding box in PostScript
On 17-Apr-06 Prof Brian Ripley wrote: On Mon, 17 Apr 2006, David Allen wrote: When a graph is saved as PostScript, the bounding box is often too big. A consequence is that when the graph is included in a LaTeX document, the spacing does not look good. Is this a recognized problem? Is someone working on it? Could I help? It's not really true. The bounding box is of the figure region, and it is a design feature. If it is the figure region that is too big for your purposes, you need to adjust the margins. Alternatively, there are lots of ways to set a tight bounding box: I tend to use the bbox device in gs. But that often ends up with figures which do not align, depending on whether text has descenders. Ghostscript's BBox resources tend to suffer from this problem, since for some reason, as Brian says, it may not properly allow for the true sizes of characters. I once tweaked ps2eps to correct (more or less) for this. But nowadays, to get it exactly right, I use gv (ghostview front end) to open the EPS file, and set watch mode: gv -watch filename.eps. If it doesn't automatically give the BoundingBox view, you can always select this from the top menu. This ensures that the frame of the gv window is the same as the BoundingBox. Then open the EPS file for editing, and look for the "%%BoundingBox: llx lly urx ury" line (it may be near the beginning, or at the end -- which will be flagged by a line "%%BoundingBox: (atend)" near the beginning). Here, llx, lly, urx, ury are integers giving the coordinates (in points, relative to the page origin) of the lower-left and upper-right corners of the bounding box. Now you can change the values of these. Each time you change, save the edited file. Because gv is in watch mode, it will re-draw the page whenever the file changes. Thus you can adjust the bounding box until it is just as you want it. Best wishes, Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Apr-06 Time: 23:05:29 -- XFMail -- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
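From the R side, one can also avoid the oversized box in the first place, along the lines Brian suggests, by opening the postscript device in EPS mode with an explicit figure size and trimmed margins; a sketch (the 4 x 3 inch size is arbitrary):

```r
## EPS output whose bounding box matches the requested 4in x 3in figure
## region; paper = "special" with onefile = FALSE produces EPSF.
postscript("fig.eps", onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 4, height = 3)
par(mar = c(4, 4, 1, 1))   # trim the default margins (bottom, left, top, right)
plot(1:10, 1:10)
dev.off()
```

With the margins trimmed this way there is usually little left for gs's bbox device or hand-editing of %%BoundingBox to remove.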
Re: [Rd] pbinom with size argument 0 (PR#8560)
On 03-Feb-06 [EMAIL PROTECTED] wrote: Full_Name: Uffe Høgsbro Thygesen Version: 2.2.0 OS: linux Submission from: (NULL) (130.226.135.250) Hello all. pbinom(q=0, size=0, prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.

Well, NaN can make sense, since q=0 refers to a single sampled value, and there is no value which you can sample from size=0; i.e. sampling from size=0 is a non-event. I think the probability of a non-event should be NaN, not 1! (But maybe others might argue that if you try to sample from an empty urn you necessarily get zero successes, so p should be 1; but I would counter that you also necessarily get zero failures, so q should be 1. I suppose it may be a matter of whether you regard the r of the binomial distribution as referring to the identities of the outcomes rather than to how many you get of a particular type. Hmmm.)

Note that dbinom(x=0, size=0, prob=0.5) returns the value 1.

That is probably because the .Internal code for pbinom may do a preliminary test for x >= size. This also makes sense for the cumulative distribution of any distribution with a finite range, since the answer must then be 1 and a lot of computation would be saved (likewise returning 0 when x < 0). However, it would make even more sense to have a preceding test for size=0 and return NaN in that case since, for the same reasons as above, the result is the probability of a non-event. (But it depends on your point of view, as above ... However, surely the two should be consistent with each other.) Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Feb-06 Time: 14:34:28 -- XFMail --
Re: [Rd] pbinom with size argument 0 (PR#8560)
On 03-Feb-06 Peter Dalgaard wrote: (Ted Harding) [EMAIL PROTECTED] writes: [... the message above, arguing that NaN could make sense for pbinom(q=0, size=0, prob=0.5) because sampling from size=0 is a non-event, and noting that dbinom(x=0, size=0, prob=0.5) returns 1 ...]

Once you get your coffee, you'll likely realize that you got your p's and d's mixed up... You're right about the mix-up! (I must mend the pipeline.) I think Uffe is perfectly right: The result of zero experiments will be zero successes (and zero failures) with probability 1, so the cumulative distribution function is a step function with one step at zero ( == as.numeric(x >= 0) ).
I'm perfectly happy with this argument so long as it leads to dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1 (which seems to be what you are arguing too). And I think there are no traps if p=0 or p=1. (But it depends on your point of view, as above ... However, surely the two should be consistent with each other.) Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Feb-06 Time: 15:07:49 -- XFMail -- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
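For the record, the consistency the thread asks for can be checked directly. In R versions after this report (the bug was fixed following PR#8560) both functions return 1 at size = 0; in the 2.2.0 discussed above, pbinom returned NaN:

```r
## The degenerate binomial with zero trials puts all its mass at 0:
dbinom(x = 0, size = 0, prob = 0.5)   # 1: zero trials give zero successes

## In current R this agrees with the density; in R 2.2.0 it was NaN,
## which is the inconsistency reported here:
pbinom(q = 0, size = 0, prob = 0.5)
```

So the resolution adopted matches Peter's view: the size = 0 distribution is a point mass at zero, and its CDF is the step function as.numeric(x >= 0).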
Re: [Rd] Typo [Was: Rd and guillemots]
On 17-Sep-05 Prof Brian Ripley wrote: On Fri, 16 Sep 2005 [EMAIL PROTECTED] wrote: On 16-Sep-05 Duncan Murdoch wrote: [...] This seems to happen in Rdconv.pm, around here:

  ## avoid conversion to guillemots
  $c =~ s//\{\}/;
  $c =~ s//\{\}/;

The name of the continental quotation mark « is guillemet. The R Development Core Team must have had some bird on the brain at the time ...

"I don't think any authority agrees with Ted here. There are two characters, left and right."

Agreed, I only gave one instance. Either « or » is a guillemet. As to "any authority", it depends what you mean by authority.

1. Take any good French dictionary (e.g. Collins Robert). Look up [Fr] guillemet: -- [En] quotation mark, inverted comma. Look up [En] quotation mark: -- [Fr] guillemet. There is a phrase entre guillemets: -- in quotation marks or in quotes, and vice versa. Look up [Fr] guillemot: -- [En] guillemot, and vice versa.

2. Take a good book on printing/typographical matters, e.g. The Chicago Manual of Style, which is very comprehensive. Index: guillemets [the entry is in the plural]: 9.22-26 "Small angle marks called guillemets («») are used for quotation marks ..." Index: guillemot: -- nothing found.

It's not as straightforward as that, however! In French, guillemet is in fact used generically for quotation mark and, typographically, includes not only the marks « and » we are talking about, but also the marks used for similar purposes in non-continental typography. So the opening double quote `` (e.g. in Times Roman) and closing '' (sorry, can't make these marks in email) are also guillemets. Indeed we have [note the singular] guillemet anglais ouvrant (``), guillemet anglais fermant (''), as well as guillemet français ouvrant («), guillemet français fermant (»); not to mention the fact that a guillemet français e.g.
« consists of two chevrons, and one can also have a chevron ouvrant consisting of just one of these (can't do this either), which is also called a guillemet français simple ouvrant (in PostScript, guilsinglleft), etc. And there is (as in Courier font) the guillemet dactylographique = typewriter quotation mark. And lots of other variants. Rather than sink in the morass of French-speaking usage, we might be better off referring to an authority closer to the sort of usage that concerns us. So I've had a look at the Unicode Standard, specifically http://www.unicode.org/Public/UNIDATA/NamesList.txt where one can find:

  00AB  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
        = LEFT POINTING GUILLEMET
        = chevrons (in typography)
        * usually opening, sometimes closing
  00BB  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
        = RIGHT POINTING GUILLEMET
        * usually closing, sometimes opening
  2039  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
        = LEFT POINTING SINGLE GUILLEMET
        * usually opening, sometimes closing
  203A  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
        = RIGHT POINTING SINGLE GUILLEMET
        * usually closing, sometimes opening

but no guillemots! Collectively it seems agreed they are called guillemets, but the issue is over the names of the single characters, and the character shown is the left guillem[eo]t. See above ... "Adobe says these are left and right guillemot." It seems that the majority opinion does not agree, but there is a substantial usage following Adobe. That is certainly a matter of fact! And it is certainly thus in Adobe's PostScript Language Reference Manual (see e.g. "Standard Roman Character Set" in Appendix E, "Standard Character Sets and Encoding Vectors"). So that is what must be used when invoking them in PostScript. However, I am firmly of the view that Adobe made an error when they gave these things the names guillemotleft and guillemotright.
I had already changed the R source code, so please Ted and others follow the advice in the posting guide and *** check the current sources before posting alleged bugs *** Easier said than done ... However, I apologise! Nevertheless, apart from the issue of a possible R bug, I think it is worth putting the record straight on the general issue of nomenclature. Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Sep-05 Time: 10:51:17 -- XFMail -- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel