[Rd] bug in strsplit?
src/main/character.c:435-438 (do_strsplit) contains the following code:

    for (i = 0; i < tlen; i++)
        if (getCharCE(STRING_ELT(tok, 0)) == CE_UTF8) use_UTF8 = TRUE;
    for (i = 0; i < len; i++)
        if (getCharCE(STRING_ELT(x, 0)) == CE_UTF8) use_UTF8 = TRUE;

since both loops iterate over loop-invariant expressions and statements, either the loops are redundant, or the fixed index '0' was meant to actually be the variable i. i guess it's the latter, hence 'bug?' in the subject. it also appears that if *any* element of tok (or x) positively passes the test, use_UTF8 is set to TRUE; in such a case, further checks make no sense. the following rewrite cuts the inessential computation:

    for (i = 0; i < tlen; i++)
        if (getCharCE(STRING_ELT(tok, i)) == CE_UTF8) { use_UTF8 = TRUE; break; }
    for (i = 0; i < len; i++)
        if (getCharCE(STRING_ELT(x, i)) == CE_UTF8) { use_UTF8 = TRUE; break; }

since the pattern is repetitive, the following generic approach would help (and the macro could possibly be reused in other places):

    #define CHECK_CE(CHARACTER, LENGTH, USEUTF8) \
        for (i = 0; i < (LENGTH); i++) \
            if (getCharCE(STRING_ELT((CHARACTER), i)) == CE_UTF8) { \
                (USEUTF8) = TRUE; \
                break; }

    CHECK_CE(tok, tlen, use_UTF8)
    CHECK_CE(x, len, use_UTF8)

if you like it, i can provide a patch. vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] edge case concerning NA in dim() (PR#13729)
Full_Name: Allan Stokes Version: 28.1 OS: XP Submission from: (NULL) (24.108.0.245)

I'm trying to use package HDF5 and have discovered some round-trip errors: save, load, save is not idempotent. I started digging into the type system to figure out what type graffiti is fouling this up. Soon I discovered that comparisons with NULL produce zero-length vectors, which I hadn't known was possible, and I started to wonder about the properties of zero-length objects.

    L0 <- logical(0)
    dim(L0) <- c(0)       # OK
    dim(L0) <- c(1)       # error
    dim(L0) <- c(0,1)     # OK
    dim(L0) <- c(0,-1)    # OK
    dim(L0) <- c(0,3.14)  # OK, c(0,3) results
    dim(L0) <- c(0,FALSE) # OK, c(0,0) results
    dim(L0) <- c(0,NA)    # OK
    dim(L0) <- c(1,NA)    # error
    dim(L0) <- c(1,NA,NA) # OK, SURPRISE!!

NA*NA is normally NA, but in the test for dim() assignment, it appears that NA*NA == 0, which is then allowed. If the list contains more than one NA element, the product seems to evaluate to zero. I can see making a case for 0*NA == 0 in this context, but not for NA*NA == 0. As an aside, I'm not sure why 0*NA does not equal 0 in general evaluation, unless NA is considered to possibly represent +/-Inf.
[Rd] Bug in base function sample ( ) (PR#13727)
Full_Name: Michael Chajewski Version: 2.9.0 OS: Windows XP Submission from: (NULL) (150.108.71.185)

I was programming a routine which kept reducing the array from which a random sample was taken, resulting in a single number. I discovered that when R attempts to sample from an object with only one number it does not reproduce/report the number but instead chooses a random number between 1 and that number.

Example 1:

    # I am assigning a single number
    gg <- 7
    # Creating an array to store sampled values
    ggtrack <- 0
    # I am sampling 10,000 observations from my single-value
    # object and storing them
    for (i in 1:10000) {
        g0 <- sample(gg, (i/i))
        ggtrack <- c(ggtrack, g0)
    }
    # Deleting the initial value in the array
    ggtrack <- ggtrack[-1]
    # The array ought to be 10,000 samples long (and it is)
    length(ggtrack)
    # The array should contain 10,000 7's, but it does not
    # See the histogram of sampled values
    hist(ggtrack)

Example 2:

    # Here is the same example, but now with
    # two numbers. Note that now the function performs
    # as expected and only samples between the two.
    gg <- c(7,2)
    ggtrack <- 0
    for (i in 1:10000) {
        g0 <- sample(gg, (i/i))
        ggtrack <- c(ggtrack, g0)
    }
    ggtrack <- ggtrack[-1]
    length(ggtrack)
    hist(ggtrack)

Highest Regards, Michael Chajewski
Re: [Rd] Fwd: [R] size of point symbols
Dear Prof. Ripley and all, Thank you very much for the pointers and the always insightful comments. I'd like to add a few further comments below for the sake of discussion. On 26 May 2009, at 08:35, Prof Brian Ripley wrote: I don't know where you get your claims from. R graphics is handled internally in inches, with a device-specific mapping to pixels/points etc (which is documented for each device on its help page). This has to be done carefully, as pixels may not be square. I saw hints of this use of inches in the code, but I started off with the wrong assumption that symbols would be in mm (partly because ggplot2 suggested it would be so, partly because it's the natural unit I was taught to use throughout French technical education). What the meaning of pch=1:23 is in terms of coordinates is not documented except via the sources. I own Paul Murrell's R Graphics book but I don't think the precise description of the symbols' size is presented in there. Perhaps a useful addition for the next edition? The source is function GESymbol in file src/main/engine.c, so for example pch = 2 is Thank you, I failed to pinpoint this.

    case 2: /* S triangle - point up */
        xc = RADIUS * GSTR_0;
        r = toDeviceHeight(TRC0 * xc, GE_INCHES, dd);
        yc = toDeviceHeight(TRC2 * xc, GE_INCHES, dd);
        xc = toDeviceWidth(TRC1 * xc, GE_INCHES, dd);
        xx[0] = x;    yy[0] = y+r;
        xx[1] = x+xc; yy[1] = y-yc;
        xx[2] = x-xc; yy[2] = y-yc;
        gc->fill = R_TRANWHITE;
        GEPolygon(3, xx, yy, gc, dd);
        break;

which as you see is in inches, not mm as you asserted. The first line sets xc to 0.375 inches for cex=1, for example. You need to take the stroke width (as set by lty) into account when assessing the visual size of symbols. Altering the implementation is definitely way out of my league, but I'm glad I learned where to find this piece of information should the need arise in the future.
On Mon, 25 May 2009, baptiste auguie wrote: Dear all, Having received no answer in r-help I'm trying r-devel (hoping this is not a stupid question). I don't understand the rationale behind the absolute sizes of the point symbols, and I couldn't find it documented (I got lost in the C code graphics.c and gave up). You are expected to study the sources for yourself. That's part of the price of R. There is a manual, 'R Internals', that would have explained to you that graphics.c is part of base graphics and hence not of grid graphics. R is a big project, and these implementation details can be hard to track down for non-programmers of my sort. That's why I was hoping for some hints on r-help first. In particular, it's not clear to me whether base graphics and grid graphics share this sort of primitive code. I'll have to read R Internals. As a last note, I'd like to share this idea I've contemplated recently (currently implementing it at the R level for ggplot2): the points() symbols (well, rather the par() function, presumably) could gain an attribute 'type', say, with a few options: - 'old' for backward compatibility; this choice would set the symbols in use to the current values, in the same way that palette() provides a default set of colours. - 'polygons', which could provide the user with a set of regular polygons ordered by the number of vertices (3 to 6 and circle, for instance) with a consistent set of attributes (all having col and fill parameters). These could be complemented by starred versions of the polygons to make for a larger set of shapes. 
Such a design could provide several benefits over the current situation: 1) the possible mapping between symbols and data could be more straightforward (in the spirit of the ggplot2 package), 2) the symbol size could be made consistent either with a constant area or a constant circumscribing circle, thereby conforming with the idea that information should minimise visual artefacts in displaying the data (I'm not saying this is the case currently, but I feel it may not be optimum.) - perhaps something else --- TeachingDemos has some interesting examples in the my.symbols help page. Thanks again, baptiste The example below uses Grid to check the size of the symbols against a square of 10mm x 10mm.

    checkOneSymbol <- function(pch=0){
      gTree(children=gList(
        rectGrob(0.5, 0.5, width=unit(10, "mm"), height=unit(10, "mm"),
                 gp=gpar(lty=2, fill=NA, col=alpha("black", 0.5))),
        pointsGrob(0.5, 0.5, size=unit(10, "mm"), pch=pch,
                   gp=gpar(col=alpha("red", 0.5)))
      ))
    }
    all.symbols <- lapply(0:23, checkOneSymbol)
    pdf("symbols.pdf", height=1.2/2.54, width=24.2/2.54)
    vp <- viewport(width=0.5, height=0.5, name="main")
    pushViewport(vp)
    pushViewport(viewport(layout=grid.layout(1, 24, widths=unit(10, "mm"), heights=unit(10, "mm"),
Re: [Rd] Bug in base function sample ( ) (PR#13727)
On Thu, 2009-05-28 at 09:30 +0200, chajew...@fordham.edu wrote: Full_Name: Michael Chajewski Version: 2.9.0 OS: Windows XP Submission from: (NULL) (150.108.71.185) I was programming a routine which kept reducing the array from which a random sample was taken, resulting in a single number. I discovered that when R attempts to sample from an object with only one number it does not reproduce/report the number but instead chooses a random number between 1 and that number.

This is working as documented/intended in ?sample. 'x' is of length 1, so it is interpreted as 1:x (if x >= 1), resulting in the behaviour you have encountered. That help page even goes so far as to warn you that this convenience feature may lead to undesired behaviour... and gives an example function (in Examples) that handles the sort of use case you have. See the Examples section and the resample() function created there. HTH G

Example 1:

    # I am assigning a single number
    gg <- 7
    # Creating an array to store sampled values
    ggtrack <- 0
    # I am sampling 10,000 observations from my single-value
    # object and storing them
    for (i in 1:10000) {
        g0 <- sample(gg, (i/i))
        ggtrack <- c(ggtrack, g0)
    }
    # Deleting the initial value in the array
    ggtrack <- ggtrack[-1]
    # The array ought to be 10,000 samples long (and it is)
    length(ggtrack)
    # The array should contain 10,000 7's, but it does not
    # See the histogram of sampled values
    hist(ggtrack)

Example 2:

    # Here is the same example, but now with
    # two numbers. Note that now the function performs
    # as expected and only samples between the two.
    gg <- c(7,2)
    ggtrack <- 0
    for (i in 1:10000) {
        g0 <- sample(gg, (i/i))
        ggtrack <- c(ggtrack, g0)
    }
    ggtrack <- ggtrack[-1]
    length(ggtrack)
    hist(ggtrack)

Highest Regards, Michael Chajewski -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. 
Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
PS == Petr Savicky savi...@cs.cas.cz on Thu, 28 May 2009 09:36:48 +0200 writes:

PS On Wed, May 27, 2009 at 10:51:38PM +0200, Martin Maechler wrote: I have very slightly modified the changes (to get rid of -Wall warnings) and also exported the function as Rf_dropTrailing0(), and tested the result with 'make check-all'.

PS Thank you very much for considering the patch. -Wall indeed requires adding parentheses:
PS warning: suggest parentheses around comparison in operand of
PS warning: suggest parentheses around assignment used as truth value
PS If there are also other changes, i would like to ask you to make your modification
PS available, mainly due to a possible further discussion.
PS Let me also suggest a modification of my original proposal. It contains a cycle

PS     while (*(replace++) = *(p++)) {
PS         ;
PS     }

PS If the number has no trailing zeros, but contains an exponent, this cycle
PS shifts the exponent by 0 positions, which means that it copies each of its
PS characters to itself. This may be eliminated as follows

PS     if (replace != p) {
PS         while (*(replace++) = *(p++)) {
PS             ;
PS         }
PS     }

yes, that's a simple improvement, thank you. Martin
Re: [Rd] edge case concerning NA in dim() (PR#13729)
On Fri, 29 May 2009, asto...@esica.com wrote: Full_Name: Allan Stokes Version: 28.1 OS: XP Submission from: (NULL) (24.108.0.245) I'm trying to use package HDF5 and have discovered some round-trip errors: save, load, save is not idempotent. I started digging into the type system to figure out what type graffiti is fouling this up. Soon I discovered that comparisons with NULL produce zero-length vectors, which I hadn't known was possible, and I started to wonder about the properties of zero-length objects.

    L0 <- logical(0)
    dim(L0) <- c(0)       # OK
    dim(L0) <- c(1)       # error
    dim(L0) <- c(0,1)     # OK
    dim(L0) <- c(0,-1)    # OK
    dim(L0) <- c(0,3.14)  # OK, c(0,3) results
    dim(L0) <- c(0,FALSE) # OK, c(0,0) results
    dim(L0) <- c(0,NA)    # OK
    dim(L0) <- c(1,NA)    # error
    dim(L0) <- c(1,NA,NA) # OK, SURPRISE!!

NA*NA is normally NA, but in the test for dim() assignment, it appears that NA*NA == 0, which is then allowed. If the list contains more than one NA element, the product seems to evaluate to zero.

The calculation was done in C and failed to take NAs (and indeed negative values) into account. So

    L <- logical(1)
    dim(L) <- c(1, -1, -1)

succeeded. Thank you for the report, changed in R 2.9.0 patched. (Since the representation of an integer NA is negative, a test for positivity would have caught this.)

I can see making a case for 0*NA == 0 in this context, but not for NA*NA == 0. As an aside, I'm not sure why 0*NA does not equal 0 in general evaluation, unless NA is considered to possibly represent +/-Inf.

In fact NA as used here is logical but is coerced to a numeric NA, and a 'missing' numeric could take any possible value including Inf, -Inf and NaN.

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [Rd] [R] custom sort?
I've moved this to R-devel... On 5/28/2009 8:17 PM, Stavros Macrakis wrote: I couldn't get your suggested method to work:

    `==.foo` <- function(a,b) unclass(a) == unclass(b)
    `<.foo`  <- function(a,b) unclass(a) > unclass(b) # invert comparison
    is.na.foo <- function(a) is.na(unclass(a))
    sort(structure(sample(5), class="foo")) # -> 1:5 -- not reversed

What am I missing? There are two problems. First, I didn't mention that you need a method for indexing as well. The code needs to evaluate things like x[i] < x[j], and by default x[i] will not be of class foo, so the custom comparison methods won't be called. Second, I think there's a bug in the internal code, specifically in do_rank or orderVector1 in sort.c: orderVector1 ignores the class of x. do_rank pays attention when breaking ties, so I think this is an oversight. So I'd say two things should be done: 1. the bug should be fixed. Even if this isn't the most obvious approach, it should work. 2. we should look for ways to make all of this simpler, e.g. allowing a comparison function to be used. I'll take on 1, but not 2. It's hard to work out the right place for the comparison function to appear, and it would require a lot of work to implement, because all of this stuff (sort, rank, order, xtfrm, sort.int, etc.) is closely interrelated, some but not all of the functions are S3 generics, some implemented internally, etc. In the end, I'd guess the results won't be very satisfactory from a performance point of view: all those calls out to R to do the comparisons are going to be really slow. I think your advice to use order() with multiple keys is likely to be much faster in most instances. It's just a better approach in R. Duncan Murdoch -s On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 28/05/2009 5:34 PM, Steve Jaffe wrote: Sounds simple but haven't been able to find it in docs: is it possible to sort a vector using a user-defined comparison function? 
Seems it must be, but sort doesn't seem to provide that option, nor does order sfaics. You put a class on the vector (e.g. using class(x) <- "myvector"), then define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison methods (you'll need `==.myvector`, `<.myvector`, and is.na.myvector). Duncan Murdoch __ r-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
vQ == Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no on Thu, 28 May 2009 00:36:07 +0200 writes:

vQ Martin Maechler wrote: I have very slightly modified the changes (to get rid of -Wall warnings) and also exported the function as Rf_dropTrailing0(), and tested the result with 'make check-all'. As the change seems reasonable and consequent, and as it seems not to produce any problems in our tests, I'm hereby proposing to commit it (my version of it), [to R-devel only] within a few days, unless someone speaks up.

vQ i may be misunderstanding the code, but:

Martin Maechler, ETH Zurich

PS --- R-devel/src/main/coerce.c 2009-04-17 17:53:35.0 +0200
PS +++ R-devel-elim-trailing/src/main/coerce.c 2009-05-23 08:39:03.914774176 +0200
PS @@ -294,12 +294,33 @@
PS      else return mkChar(EncodeInteger(x, w));
PS  }
PS +const char *elim_trailing(const char *s, char cdec)

vQ the first argument is const char*, which usually means a contract promising not to change the content of the pointed-to object

PS +{
PS +    const char *p;
PS +    char *replace;
PS +    for (p = s; *p; p++) {
PS +        if (*p == cdec) {
PS +            replace = (char *) p++;

vQ const char* p is cast to non-const char* replace

PS +            while ('0' <= *p && *p <= '9') {
PS +                if (*(p++) != '0') {
PS +                    replace = (char *) p;

vQ likewise

PS +                }
PS +            }
PS +            while (*(replace++) = *(p++)) {

vQ the char* replace is assigned to -- effectively, the content of the promised-to-be-constant string s is modified, and the modification may involve any character in the string. (it's a no-compile-error contract violation; not an uncommon pattern, but not good practice either.)

PS +                ;
PS +            }
PS +            break;
PS +        }
PS +    }
PS +    return s;

vQ you return s, which should be the same pointer value (given the actual code that does not modify the local variable s) with the same pointed-to string value (given the signature of the function).

vQ was perhaps

vQ     char *elim_trailing(char* const s, char cdec)

vQ intended? 
yes that would seem slightly more logical to my eyes, and in principle I also agree with the other remarks you make above, ...

vQ anyway, having the pointer s itself declared as const does make sense, as the code seems to assume that exactly the input pointer value should be returned. or maybe the argument to elim_trailing should not be declared as const, since elim_trailing violates the declaration. one way out is to drop the violated const in both the actual argument and in elim_trailing, which would then be simplified by removing all const qualifiers and (char*) casts.

I've tried that, but ``it does not work'' later: {after having renamed 'elim_trailing' to 'dropTrailing0'} my version of *using* the function was

    1 SEXP attribute_hidden StringFromReal(double x, int *warn)
    2 {
    3     int w, d, e;
    4     formatReal(x, 1, &w, &d, &e, 0);
    5     if (ISNA(x)) return NA_STRING;
    6     else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
    7 }

where you need to consider that mkChar() expects a 'const char*' and EncodeReal(.) returns one, and I am pretty sure this was the main reason why Petr had used the two 'const char*' in (the now-named) dropTrailing0() definition. If I use your proposed signature

    char *dropTrailing0(char *s, char cdec);

line 6 above gives warnings in all of several incantations I've tried, including this one:

    else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, d, e, OutDec), OutDec));

which (the warnings) leave me somewhat clue-less or rather unmotivated to dig further, though I must say that I'm not the expert on the subject char* / const char* ..

vQ another way out is to make elim_trailing actually allocate and return a new string, keeping the input truly constant, at a performance cost. yet another way is to ignore the issue, of course.

vQ the original (martin/petr) version may quietly pass -Wall, but the compiler would complain (rightfully) with -Wcast-qual. 
hmm, yes, but actually I haven't found a solution along your proposition that even passes -pedantic -Wall -Wcast-align (the combination I've personally been using for a long time). Maybe we can try to solve this more esthetically in private e-mail exchange? Regards, Martin
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
On Fri, May 29, 2009 at 03:53:02PM +0200, Martin Maechler wrote: my version of *using* the function was

    1 SEXP attribute_hidden StringFromReal(double x, int *warn)
    2 {
    3     int w, d, e;
    4     formatReal(x, 1, &w, &d, &e, 0);
    5     if (ISNA(x)) return NA_STRING;
    6     else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
    7 }

where you need to consider that mkChar() expects a 'const char*' and EncodeReal(.) returns one, and I am pretty sure this was the main reason why Petr had used the two 'const char*' in (the now-named) dropTrailing0() definition.

Yes, the goal was to accept the output of EncodeReal() with exactly the same type which EncodeReal() produces. A question is whether the output type of EncodeReal() could be changed to (char *). Then, changing the output string could be done without casting const to non-const. This solution may be in conflict with the structure of the rest of the R code, so i cannot evaluate whether this is possible. Petr.
[Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)
Full_Name: Version: 2.9.0 OS: windows, linux Submission from: (NULL) (128.231.21.125) In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather than mean(), although it was still documented to use mean(). This caused problems for POSIXt objects, for which mean() but not sum() makes sense, so the change has been reverted. But it's not reverted yet.
Re: [Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)
zheng...@mail.nih.gov wrote: Full_Name: Version: 2.9.0 OS: windows, linux Submission from: (NULL) (128.231.21.125) In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather than mean(), although it was still documented to use mean(). This caused problems for POSIXt objects, for which mean() but not sum() makes sense, so the change has been reverted. But it's not reverted yet. That text is not in the NEWS file for 2.9.0. And the NEWS file that it is in is not for 2.9.0, and does not list that change under CHANGES IN R VERSION 2.9.0. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
Re: [Rd] [R] custom sort?
On 5/29/2009 9:28 AM, Duncan Murdoch wrote: I've moved this to R-devel... On 5/28/2009 8:17 PM, Stavros Macrakis wrote: I couldn't get your suggested method to work:

    `==.foo` <- function(a,b) unclass(a) == unclass(b)
    `<.foo`  <- function(a,b) unclass(a) > unclass(b) # invert comparison
    is.na.foo <- function(a) is.na(unclass(a))
    sort(structure(sample(5), class="foo")) # -> 1:5 -- not reversed

What am I missing? There are two problems. First, I didn't mention that you need a method for indexing as well. The code needs to evaluate things like x[i] < x[j], and by default x[i] will not be of class foo, so the custom comparison methods won't be called. Second, I think there's a bug in the internal code, specifically in do_rank or orderVector1 in sort.c: orderVector1 ignores the class of x. do_rank pays attention when breaking ties, so I think this is an oversight. So I'd say two things should be done: 1. the bug should be fixed. Even if this isn't the most obvious approach, it should work. I've now fixed the bug, and clarified the documentation to say: The default method will make use of `==` and `<` methods for the class of x[i] (for integers i), and the is.na method for the class of x, but might be rather slow when doing so. You don't actually need a custom indexing method, you just need to be aware that it's the class of x[i] that is important for comparisons. This will make it into R-patched and R-devel. Duncan Murdoch 2. we should look for ways to make all of this simpler, e.g. allowing a comparison function to be used. I'll take on 1, but not 2. It's hard to work out the right place for the comparison function to appear, and it would require a lot of work to implement, because all of this stuff (sort, rank, order, xtfrm, sort.int, etc.) is closely interrelated, some but not all of the functions are S3 generics, some implemented internally, etc. 
In the end, I'd guess the results won't be very satisfactory from a performance point of view: all those calls out to R to do the comparisons are going to be really slow. I think your advice to use order() with multiple keys is likely to be much faster in most instances. It's just a better approach in R. Duncan Murdoch -s On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 28/05/2009 5:34 PM, Steve Jaffe wrote: Sounds simple but haven't been able to find it in docs: is it possible to sort a vector using a user-defined comparison function? Seems it must be, but sort doesn't seem to provide that option, nor does order sfaics. You put a class on the vector (e.g. using class(x) <- "myvector"), then define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison methods (you'll need `==.myvector`, `<.myvector`, and is.na.myvector). Duncan Murdoch
Re: [Rd] Bug in base function sample ( ) (PR#13727)
...I discovered that when R attempts to sample from an object with only one number it does not reproduce/report the number but instead chooses a random number between 1 and that number. This is the documented behavior. In my opinion, it is a design error, but changing it would no doubt break lots of code. As a general rule, the designers of R seem to have preferred convenience to consistency, which often makes things easier or more concise, but sometimes causes unfortunate surprises like this. -s
Re: [Rd] [R] custom sort?
Thanks for the quick fix! -s On Fri, May 29, 2009 at 1:02 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 5/29/2009 9:28 AM, Duncan Murdoch wrote: I've moved this to R-devel... On 5/28/2009 8:17 PM, Stavros Macrakis wrote: I couldn't get your suggested method to work:

    `==.foo` <- function(a,b) unclass(a) == unclass(b)
    `<.foo`  <- function(a,b) unclass(a) > unclass(b) # invert comparison
    is.na.foo <- function(a) is.na(unclass(a))
    sort(structure(sample(5), class="foo")) # -> 1:5 -- not reversed

What am I missing? There are two problems. First, I didn't mention that you need a method for indexing as well. The code needs to evaluate things like x[i] < x[j], and by default x[i] will not be of class foo, so the custom comparison methods won't be called. Second, I think there's a bug in the internal code, specifically in do_rank or orderVector1 in sort.c: orderVector1 ignores the class of x. do_rank pays attention when breaking ties, so I think this is an oversight. So I'd say two things should be done: 1. the bug should be fixed. Even if this isn't the most obvious approach, it should work. I've now fixed the bug, and clarified the documentation to say: The default method will make use of `==` and `<` methods for the class of x[i] (for integers i), and the is.na method for the class of x, but might be rather slow when doing so. You don't actually need a custom indexing method, you just need to be aware that it's the class of x[i] that is important for comparisons. This will make it into R-patched and R-devel. Duncan Murdoch 2. we should look for ways to make all of this simpler, e.g. allowing a comparison function to be used. I'll take on 1, but not 2. It's hard to work out the right place for the comparison function to appear, and it would require a lot of work to implement, because all of this stuff (sort, rank, order, xtfrm, sort.int, etc.) is closely interrelated, some but not all of the functions are S3 generics, some implemented internally, etc. 
In the end, I'd guess the results won't be very satisfactory from a performance point of view: all those calls out to R to do the comparisons are going to be really slow. I think your advice to use order() with multiple keys is likely to be much faster in most instances. It's just a better approach in R. Duncan Murdoch -s On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 28/05/2009 5:34 PM, Steve Jaffe wrote: Sounds simple but haven't been able to find it in docs: is it possible to sort a vector using a user-defined comparison function? Seems it must be, but sort doesn't seem to provide that option, nor does order sfaics. You put a class on the vector (e.g. using class(x) <- "myvector"), then define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison methods (you'll need `==.myvector`, `<.myvector`, and is.na.myvector). Duncan Murdoch
Re: [Rd] install.packages now intentionally references .Rprofile?
I see that related to this thread, 'R CMD INSTALL' (like 'install.packages') also reads the .Rprofile before beginning. This caused package installation headaches for me that developers should be aware of (as it was very difficult to debug). I added a setwd() to my .Rprofile [for example: setwd("/tmp")] to keep .Rhistory files from popping up in directories throughout my computer. This causes package installation to fail completely with an unhelpful error message. For example (any package will do here):

    R CMD INSTALL zoo_1.5-6.tar.gz
    Warning: invalid package 'zoo_1.5-6.tar.gz'
    Error: ERROR: no packages specified

Removing 'setwd(...)' from the .Rprofile restores normal package installation behavior. I'd like to request that either setwd() not break installation, or that the user be able to disable .Rprofile reading on R CMD INSTALL (for instance with an option such as --no-init-file). I'll use Heather's solution below for the short term, but would rather not have to completely turn off my .Rprofile for non-interactive scripts. Thanks, Robert

-Original Message- From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Heather Turner Sent: Friday, May 22, 2009 6:13 AM To: Mark Kimpel Cc: Prof Brian Ripley; r-de...@stat.math.ethz.ch Subject: Re: [Rd] install.packages now intentionally references .Rprofile?

I had a similar problem when moving to R-2.9.0 as my .Rprofile called update.packages(). The solution was to use

    if(interactive()) {
        utils:::update.packages(ask = FALSE)
    }

HTH, Heather

Mark Kimpel wrote: This was my original post, with the code example only slightly modified by Martin for clarity. Prior to R-2.9.0, this repeated downloading did not occur; the code worked as intended. In fact, if memory serves me correctly, it even worked at least during the first 3 months of R-2.9.0 in its development stage, before release as a numbered version. Is there a reason for that? Is there a work-around? 
As I mentioned in my original post, the code is actually wrapped in a function that checks the date and the date of the last update, and proceeds to update packages once per week. It was quite handy when it was working, hence my desire for a fix for my code. Thanks, Mark Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, Mobile VoiceMail (317) 399-1219 Home Skype: mkimpel The real problem is not whether machines think but whether men do. -- B. F. Skinner ** On Thu, May 21, 2009 at 2:17 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Wed, 20 May 2009, Martin Morgan wrote: A post on the Bioconductor mailing list https://stat.ethz.ch/pipermail/bioconductor/2009-May/027700.html suggests that install.packages now references .Rprofile (?), whereas in R-2-8 it did not. Is this intentional? Yes. And in fact it did in earlier versions, to find the default library into which to install. The example is, in .Rprofile

library(utils)
install.packages("Biobase",
    repos = "http://bioconductor.org/packages/2.4/bioc")

then starting R from the command line results in repeated downloads of Biobase:

mtmor...@mm:~/tmp R --quiet
trying URL 'http://bioconductor.org/packages/2.4/bioc/src/contrib/Biobase_2.4.1.tar.gz'
Content type 'application/x-gzip' length 1973533 bytes (1.9 Mb)
opened URL
==
downloaded 1.9 Mb
trying URL 'http://bioconductor.org/packages/2.4/bioc/src/contrib/Biobase_2.4.1.tar.gz'
Content type 'application/x-gzip' length 1973533 bytes (1.9 Mb)
opened URL
==
downloaded 1.9 Mb
^C
Execution halted

sessionInfo() R version 2.9.0 Patched (2009-05-20 r48588) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Martin -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK
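For reference, a guarded .Rprofile along the lines Robert and Heather describe might look like the sketch below (an illustration only, not tested against any particular R version). The interactive() guard keeps side effects such as setwd() and update.packages() out of non-interactive runs like 'R CMD INSTALL':

```r
# ~/.Rprofile -- keep side effects out of non-interactive sessions,
# so that 'R CMD INSTALL' and batch scripts are unaffected.
if (interactive()) {
    setwd("/tmp")                         # confine stray .Rhistory files
    utils:::update.packages(ask = FALSE)  # periodic update, as in this thread
}
```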
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
Petr Savicky wrote: On Fri, May 29, 2009 at 03:53:02PM +0200, Martin Maechler wrote: my version of *using* the function was

1 SEXP attribute_hidden StringFromReal(double x, int *warn)
2 {
3     int w, d, e;
4     formatReal(&x, 1, &w, &d, &e, 0);
5     if (ISNA(x)) return NA_STRING;
6     else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
7 }

where you need to consider that mkChar() expects a 'const char*' and EncodeReal(.) returns one, and I am pretty sure this was the main reason why Petr had used the two 'const char*' in (the now-named) dropTrailing0() definition. Yes, the goal was to accept the output of EncodeReal() with exactly the same type that EncodeReal() produces. A question is whether the output type of EncodeReal() could be changed to (char *). Then, changing the output string could be done without casting const to non-const. exactly. my suggestion was to modify your function so that no "modify a constant string" cheating is done, by either (a) keeping the const but returning a *new* string (hence no const-to-nonconst cast would be needed), or (b) modifying your function to accept a non-const string *and* modifying the code that connects to your function via the input and output strings. note, if a solution in which your function serves as a destructive filter is just fine (martin seems to have accepted it already), then EncodeReal probably can produce just a string, with no const qualifier, and analogously for mkChar. on the other hand, if EncodeReal is purposefully designed to return a const string (i.e., there is an important reason for doing so), and analogously for mkChar, then your function violates the assumptions and can potentially be harmful to the rest of the code. This solution may be in conflict with the structure of the rest of the R code, so i cannot evaluate whether this is possible. well, either the rest of the code does *not* need const, and it can be safely removed, or it *does* rely on const, and your solution violates the expectation. 
vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
Martin Maechler wrote: [...] vQ you return s, which should be the same pointer value (given the actual vQ code that does not modify the local variable s) with the same pointed-to vQ string value (given the signature of the function). vQ was perhaps vQ char *elim_trailing(char* const s, char cdec) vQ intended? yes that would seem slightly more logical to my eyes, and in principle I also agree with the other remarks you make above, what does ' in principle ' mean, as opposed to 'in principle'? (is it emphasis, or sneer quotes?) ... vQ anyway, having the pointer s itself declared as const does vQ make sense, as the code seems to assume that exactly the input pointer vQ value should be returned. or maybe the argument to elim_trailing should vQ not be declared as const, since elim_trailing violates the declaration. vQ one way out is to drop the violated const in both the actual argument vQ and in elim_trailing, which would then be simplified by removing all vQ const qualifiers and (char*) casts. I've tried that, but ``it does not work'' later: {after having renamed 'elim_trailing' to 'dropTrailing0'} my version of *using* the function was

1 SEXP attribute_hidden StringFromReal(double x, int *warn)
2 {
3     int w, d, e;
4     formatReal(&x, 1, &w, &d, &e, 0);
5     if (ISNA(x)) return NA_STRING;
6     else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
7 }

where you need to consider that mkChar() expects a 'const char*' and EncodeReal(.) returns one, and I am pretty sure this was the main reason why Petr had used the two 'const char*' in (the now-named) dropTrailing0() definition. 
If I use your proposed signature char* dropTrailing0(char *s, char cdec); line 6 above gives warnings in all of several incantations I've tried including this one : else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, d, e, OutDec), OutDec)); which (the warnings) leave me somewhat clue-less or rather unmotivated to dig further, though I must say that I'm not the expert on the subject char* / const char* .. of course, if the input *is* const and the output is expected to be const, you should get an error/warning in the first case, and at least a warning in the other (depending on the level of verbosity/pedanticity you choose). but my point was not to light-headedly change the signature/return of elim_trailing and its implementation and use it in the original context; it was to either modify the context as well (if const is inessential), or drop modifying the const string if the const is in fact essential. vQ another way out is to make vQ elim_trailing actually allocate and return a new string, keeping the vQ input truly constant, at a performance cost. yet another way is to vQ ignore the issue, of course. vQ the original (martin/petr) version may quietly pass -Wall, but the vQ compiler would complain (rightfully) with -Wcast-qual. hmm, yes, but actually I haven't found a solution along your proposition that even passes -pedantic -Wall -Wcast-align (the combination I've personally been using for a long time). one way is to return from elim_trailing a new, const copy of the const string. using memcpy should be efficient enough. care should be taken to deallocate s when no longer needed. (my guess is that using the approach suggested here, s can be deallocated as soon as it is copied, which means pretty much that it does not really have to be const.) Maybe we can try to solve this more esthetically in private e-mail exchange? sure, we can discuss aesthetics offline. 
as long as we do not discuss aesthetics (do we?), it seems appropriate to me to keep the discussion online. i will experiment with a patch to solve this issue, and let you know when i have something reasonable. best, vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Why change data type when dropping to one-dimension?
Hello, First, let me say I'm an avid fan of R--it's incredibly powerful and I use it all the time. I appreciate all the hard work that the many developers have put in. My question is: why does the paradigm of changing the type of a 1D return value to an unlisted array exist? This introduces boundary conditions where none need exist, making code harder to write and more confusing. For example, consider:

d = data.frame(a=rnorm(10), b=rnorm(10));
typeof(d);            # OK
typeof(d[,1]);        # Unexpected
typeof(d[,1,drop=F]); # Oh, now I see.

This is indeed documented in the R Language specification, but why is it there in the first place? It doesn't make sense to the average programmer to change the return type based on dimension. Here it is again in 'sapply':

sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    [...snip...]
    if (common.len == 1)
        unlist(answer, recursive = FALSE)
    else if (common.len > 1)
        array(unlist(answer, recursive = FALSE),
              dim = c(common.len, length(X)),
              dimnames = if (!(is.null(n1 <- names(answer[[1]])) &&
                               is.null(n2 <- names(answer))))
                  list(n1, n2))
    [...snip...]
}

So, in 'sapply', if your return value is one-dimensional, be careful, because the return type will not be the same as it would be otherwise. Is this legacy, or a valid, rational design decision which I'm not yet a sophisticated enough R coder to enjoy? Thanks, -- Jason -- Jason Vertrees, PhD Dartmouth College : j...@cs.dartmouth.edu Boston University : jas...@bu.edu PyMOLWiki : http://www.pymolwiki.org/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
Hi Waclav (and other interested parties), I have committed my working version of src/main/coerce.c so you can prepare your patch against that. Thank you in advance! Martin On Fri, May 29, 2009 at 21:54, Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote: Martin Maechler wrote: [...] vQ you return s, which should be the same pointer value (given the actual vQ code that does not modify the local variable s) with the same pointed-to vQ string value (given the signature of the function). vQ was perhaps vQ char *elim_trailing(char* const s, char cdec) vQ intended? yes that would seem slightly more logical to my eyes, and in principle I also agree with the other remarks you make above, what does ' in principle ' mean, as opposed to 'in principle'? (is it emphasis, or sneer quotes?) ... vQ anyway, having the pointer s itself declared as const does vQ make sense, as the code seems to assume that exactly the input pointer vQ value should be returned. or maybe the argument to elim_trailing should vQ not be declared as const, since elim_trailing violates the declaration. vQ one way out is to drop the violated const in both the actual argument vQ and in elim_trailing, which would then be simplified by removing all vQ const qualifiers and (char*) casts. I've tried that, but ``it does not work'' later: {after having renamed 'elim_trailing' to 'dropTrailing0'} my version of *using* the function was

1 SEXP attribute_hidden StringFromReal(double x, int *warn)
2 {
3     int w, d, e;
4     formatReal(&x, 1, &w, &d, &e, 0);
5     if (ISNA(x)) return NA_STRING;
6     else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
7 }

where you need to consider that mkChar() expects a 'const char*' and EncodeReal(.) returns one, and I am pretty sure this was the main reason why Petr had used the two 'const char*' in (the now-named) dropTrailing0() definition. 
If I use your proposed signature char* dropTrailing0(char *s, char cdec); line 6 above gives warnings in all of several incantations I've tried including this one : else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, d, e, OutDec), OutDec)); which (the warnings) leave me somewhat clue-less or rather unmotivated to dig further, though I must say that I'm not the expert on the subject char* / const char* .. of course, if the input *is* const and the output is expected to be const, you should get an error/warning in the first case, and at least a warning in the other (depending on the level of verbosity/pedanticity you choose). but my point was not to light-headedly change the signature/return of elim_trailing and its implementation and use it in the original context; it was to either modify the context as well (if const is inessential), or drop modifying the const string if the const is in fact essential. vQ another way out is to make vQ elim_trailing actually allocate and return a new string, keeping the vQ input truly constant, at a performance cost . yet another way is to vQ ignore the issue, of course. vQ the original (martin/petr) version may quietly pass -Wall, but the vQ compiler would complain (rightfully) with -Wcast-qual. hmm, yes, but actually I haven't found a solution along your proposition that even passes -pedantic -Wall -Wcast-align (the combination I've personally been using for a long time). one way is to return from elim_trailing a new, const copy of the const string. using memcpy should be efficient enough. care should be taken to deallocate s when no longer needed. (my guess is that using the approach suggested here, s can be deallocated as soon as it is copied, which means pretty much that it does not really have to be const.) Maybe we can try to solve this more esthetically in private e-mail exchange? sure, we can discuss aesthetics offline. 
as long as we do not discuss aesthetics (do we?), it seems appropriate to me to keep the discussion online. i will experiment with a patch to solve this issue, and let you know when i have something reasonable. best, vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why change data type when dropping to one-dimension?
On Fri, 29 May 2009, Jason Vertrees wrote: My question is: why does the paradigm of changing the type of a 1D return value to an unlisted array exist? This introduces boundary conditions where none need exist, thus making the coding harder and confusing. For example, consider: d = data.frame(a=rnorm(10), b=rnorm(10)); typeof(d);# OK; typeof(d[,1]);# Unexpected; typeof(d[,1,drop=F]); # Oh, now I see. It does make it harder for programmers, but it makes it easier for non-programmers. In particular, it is convenient to be able to do d[1,1] to extract a number from a matrix, rather than having to explicitly coerce the result to stop it being a matrix. At least the last two times this was discussed, there ended up being a reasonable level of agreement that if someone's life had to be made harder the programmers were better able to cope and that dropping dimensions was preferable. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
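To see concretely why programmers end up writing boundary-condition code, here is a toy model of R's dimension-dropping rule in Python (r_index is a hypothetical helper written for this illustration, not R's actual implementation):

```python
def r_index(mat, rows, cols, drop=True):
    """Mimic R's mat[rows, cols] on a list-of-lists matrix.

    With drop=True (R's default), a 1x1 result collapses to a scalar
    and a single row or column collapses to a flat list; drop=False
    always returns a list of lists, like d[, 1, drop=FALSE] in R.
    """
    sub = [[mat[i][j] for j in cols] for i in rows]
    if drop:
        if len(sub) == 1 and len(sub[0]) == 1:
            return sub[0][0]              # like d[1, 1] -> scalar
        if len(sub) == 1:
            return sub[0]                 # single row -> flat list
        if len(sub[0]) == 1:
            return [r[0] for r in sub]    # single column -> flat list
    return sub


m = [[1, 2], [3, 4]]
print(r_index(m, [0], [0]))              # 1       (type changed!)
print(r_index(m, [0], [0], drop=False))  # [[1]]   (type preserved)
print(r_index(m, [0, 1], [0]))           # [1, 3]
```

The convenience is the first call; the boundary condition is that code written for the general case must remember drop=False (or check the result's shape) whenever a selection might happen to have extent 1.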
Re: [Rd] Why change data type when dropping to one-dimension?
This is another example of the general preference of the designers of R for convenience over consistency. In my opinion, this is a design flaw even for non-programmers, because I find that inconsistencies make the system harder to learn. Yes, the naive user may stumble over the difference between m[[1,1]] and m[1,1] a few times before getting it, but once he or she understands the principle, it is general. -s On Fri, May 29, 2009 at 5:33 PM, Jason Vertrees j...@cs.dartmouth.edu wrote: Hello, First, let me say I'm an avid fan of R--it's incredibly powerful and I use it all the time. I appreciate all the hard work that the many developers have put in. My question is: why does the paradigm of changing the type of a 1D return value to an unlisted array exist? This introduces boundary conditions where none need exist, making code harder to write and more confusing. For example, consider:

d = data.frame(a=rnorm(10), b=rnorm(10));
typeof(d);            # OK
typeof(d[,1]);        # Unexpected
typeof(d[,1,drop=F]); # Oh, now I see.

This is indeed documented in the R Language specification, but why is it there in the first place? It doesn't make sense to the average programmer to change the return type based on dimension. Here it is again in 'sapply':

sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    [...snip...]
    if (common.len == 1)
        unlist(answer, recursive = FALSE)
    else if (common.len > 1)
        array(unlist(answer, recursive = FALSE),
              dim = c(common.len, length(X)),
              dimnames = if (!(is.null(n1 <- names(answer[[1]])) &&
                               is.null(n2 <- names(answer))))
                  list(n1, n2))
    [...snip...]
}

So, in 'sapply', if your return value is one-dimensional, be careful, because the return type will not be the same as it would be otherwise. Is this legacy or a valid, rational design decision which I'm not yet a sophisticated enough R coder to enjoy? 
Thanks, -- Jason -- Jason Vertrees, PhD Dartmouth College : j...@cs.dartmouth.edu Boston University : jas...@bu.edu PyMOLWiki : http://www.pymolwiki.org/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why change data type when dropping to one-dimension?
Thomas Lumley wrote: On Fri, 29 May 2009, Jason Vertrees wrote: My question is: why does the paradigm of changing the type of a 1D return value to an unlisted array exist? This introduces boundary conditions where none need exist, thus making the coding harder and confusing. For example, consider:

d = data.frame(a=rnorm(10), b=rnorm(10));
typeof(d);            # OK
typeof(d[,1]);        # Unexpected
typeof(d[,1,drop=F]); # Oh, now I see.

It does make it harder for programmers, but it makes it easier for non-programmers. In particular, it is convenient to be able to do d[1,1] to extract a number from a matrix, rather than having to explicitly coerce the result to stop it being a matrix. At least the last two times this was discussed, there ended up being a reasonable level of agreement that if someone's life had to be made harder the programmers were better able to cope, and that dropping dimensions was preferable. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle Thomas, Thanks for the quick response. I agree that extracting a number from a matrix/frame should result in a number, not a matrix/frame. But why do that for a 1D array of numbers? In my example, d[,1] is an array, not a single number. How does that help the novice user? I guess I just don't like the idea that the default is to act unexpectedly and that a flag or boundary-conditional code is needed to do the right thing. Regardless, that's how it is, so I just need to learn the pitfalls for where that occurs. Thanks again, -- Jason -- Jason Vertrees, PhD Dartmouth College : j...@cs.dartmouth.edu Boston University : jas...@bu.edu PyMOLWiki : http://www.pymolwiki.org/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
Martin Maechler wrote: Hi Waclav (and other interested parties), I have committed my working version of src/main/coerce.c so you can prepare your patch against that. Hi Martin, One quick reaction (which does not resolve my original complaint): you can have p non-const, and cast s to char* on the first occasion its value is assigned to p, thus being able to copy from p to replace without repetitive casts. make check-ed patch attached. vQ

Index: src/main/coerce.c
===
--- src/main/coerce.c	(revision 48689)
+++ src/main/coerce.c	(working copy)
@@ -297,13 +297,13 @@
 const char* dropTrailing0(const char *s, char cdec)
 {
-    const char *p;
-    for (p = s; *p; p++) {
+    char *p;
+    for (p = (char *)s; *p; p++) {
 	if(*p == cdec) {
-	    char *replace = (char *) p++;
+	    char *replace = p++;
 	    while ('0' <= *p && *p <= '9')
 		if(*(p++) != '0')
-		    replace = (char *) p;
+		    replace = p;
 	    while((*(replace++) = *(p++)))
 		;
 	    break;

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why change data type when dropping to one-dimension?
Stavros Macrakis wrote: This is another example of the general preference of the designers of R for convenience over consistency. In my opinion, this is a design flaw even for non-programmers, because I find that inconsistencies make the system harder to learn. Yes, the naive user may stumble over the difference between m[[1,1]] and m[1,1] a few times before getting it, but once he or she understands the principle, it is general. +1 vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why change data type when dropping to one-dimension?
On Fri, 29 May 2009, Stavros Macrakis wrote: This is another example of the general preference of the designers of R for convenience over consistency. In my opinion, this is a design flaw even for non-programmers, because I find that inconsistencies make the system harder to learn. Yes, the naive user may stumble over the difference between m[[1,1]] and m[1,1] a few times before getting it, but once he or she understands the principle, it is general. I was on your side of this argument the first time it came up, but ended up being convinced the other way. In contrast to sample(n), or the non-standard evaluation of weights= and subset= arguments to modelling functions, or various other conveniences that I think we are stuck with despite them being a bad idea, I think dropping dimensions is useful. -thomas -s On Fri, May 29, 2009 at 5:33 PM, Jason Vertrees j...@cs.dartmouth.edu wrote: Hello, First, let me say I'm an avid fan of R--it's incredibly powerful and I use it all the time. I appreciate all the hard work that the many developers have put in. My question is: why does the paradigm of changing the type of a 1D return value to an unlisted array exist? This introduces boundary conditions where none need exist, making code harder to write and more confusing. For example, consider:

d = data.frame(a=rnorm(10), b=rnorm(10));
typeof(d);            # OK
typeof(d[,1]);        # Unexpected
typeof(d[,1,drop=F]); # Oh, now I see.

This is indeed documented in the R Language specification, but why is it there in the first place? It doesn't make sense to the average programmer to change the return type based on dimension. Here it is again in 'sapply':

sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    [...snip...]
    if (common.len == 1)
        unlist(answer, recursive = FALSE)
    else if (common.len > 1)
        array(unlist(answer, recursive = FALSE),
              dim = c(common.len, length(X)),
              dimnames = if (!(is.null(n1 <- names(answer[[1]])) &&
                               is.null(n2 <- names(answer))))
                  list(n1, n2))
    [...snip...]
}

So, in 'sapply', if your return value is one-dimensional, be careful, because the return type will not be the same as it would be otherwise. Is this legacy or a valid, rational design decision which I'm not yet a sophisticated enough R coder to enjoy? Thanks, -- Jason -- Jason Vertrees, PhD Dartmouth College : j...@cs.dartmouth.edu Boston University : jas...@bu.edu PyMOLWiki : http://www.pymolwiki.org/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] png() error in recent R-devel on Windows
Hi, Tested with the latest r-devel snapshot build for Windows (2009-05-28 r48663):

png("test.png")
Error in png("test.png") : invalid value of 'fillOddEven'

The png() function is defined like this:

png
function (filename = "Rplot%03d.png", width = 480, height = 480,
    units = "px", pointsize = 12, bg = "white", res = NA,
    restoreConsole = TRUE)
{
    if (!checkIntFormat(filename))
        stop("invalid 'filename'")
    filename <- path.expand(filename)
    units <- match.arg(units, c("in", "px", "cm", "mm"))
    if (units != "px" && is.na(res))
        stop("'res' must be specified unless 'units = \"px\"'")
    height <- switch(units, `in` = res, cm = res/2.54, mm = res/25.4,
        px = 1) * height
    width <- switch(units, `in` = res, cm = res/2.54, mm = 1/25.4,
        px = 1) * width
    invisible(.External(Cdevga, paste("png:", filename, sep = ""),
        width, height, pointsize, FALSE, 1L, NA_real_, NA_real_, bg,
        1, as.integer(res), NA_integer_, FALSE, .PSenv, NA,
        restoreConsole, "", FALSE))
}

Note that the call to .External has 19 arguments, the last 2 of them being "" and FALSE, but the devga() function defined in src/library/grDevices/src/init.c expects 1 more argument (19 + the entry point name), with the last 3 of them expected to be string (title), logical (clickToConfirm), and logical (fillOddEven). So it seems like the recently added 'fillOddEven' argument (r48294) is omitted from the .External call, hence the error.

sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-28 r48663)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. 
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fhcrc.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] png() error in recent R-devel on Windows
Thanks, will fix. Duncan Murdoch On 29/05/2009 7:32 PM, Hervé Pagès wrote: Hi, Tested with the latest r-devel snapshot build for Windows (2009-05-28 r48663):

png("test.png")
Error in png("test.png") : invalid value of 'fillOddEven'

The png() function is defined like this:

png
function (filename = "Rplot%03d.png", width = 480, height = 480,
    units = "px", pointsize = 12, bg = "white", res = NA,
    restoreConsole = TRUE)
{
    if (!checkIntFormat(filename))
        stop("invalid 'filename'")
    filename <- path.expand(filename)
    units <- match.arg(units, c("in", "px", "cm", "mm"))
    if (units != "px" && is.na(res))
        stop("'res' must be specified unless 'units = \"px\"'")
    height <- switch(units, `in` = res, cm = res/2.54, mm = res/25.4,
        px = 1) * height
    width <- switch(units, `in` = res, cm = res/2.54, mm = 1/25.4,
        px = 1) * width
    invisible(.External(Cdevga, paste("png:", filename, sep = ""),
        width, height, pointsize, FALSE, 1L, NA_real_, NA_real_, bg,
        1, as.integer(res), NA_integer_, FALSE, .PSenv, NA,
        restoreConsole, "", FALSE))
}

Note that the call to .External has 19 arguments, the last 2 of them being "" and FALSE, but the devga() function defined in src/library/grDevices/src/init.c expects 1 more argument (19 + the entry point name), with the last 3 of them expected to be string (title), logical (clickToConfirm), and logical (fillOddEven). So it seems like the recently added 'fillOddEven' argument (r48294) is omitted from the .External call, hence the error. sessionInfo() R version 2.10.0 Under development (unstable) (2009-05-28 r48663) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base Cheers, H. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] setdiff bizarre (was: odd behavior out of setdiff)
Dear R-devel, Please see the recent thread on R-help, "Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified", posted by Jason Rupert. I gave an answer, then read David Winsemius' answer, and then did some follow-up investigation. I would like to change my answer. My current version of setdiff() is acting in a way that I do not understand, and a way that I suspect has changed. Consider the following, derived from Jason's OP. The base package setdiff(), atomic vectors:

x <- 1:100
y <- c(x, x)
setdiff(x, y)  # integer(0)
setdiff(y, x)  # integer(0)
z <- 1:25
setdiff(x, z)  # 26:100
setdiff(z, x)  # integer(0)

Everything is fine. Now look at base package setdiff(), data frames???

A <- data.frame(x = 1:100)
B <- rbind(A, A)
setdiff(A, B)  # df 1:100?
setdiff(B, A)  # df 1:100?
C <- data.frame(x = 1:25)
setdiff(A, C)  # df 1:100?
setdiff(C, A)  # df 1:25?

I have read ?setdiff 37 times now, and I cannot divine any interpretation that matches the above output. From the source, it appears that match(x, y, 0L) == 0L is evaluating to TRUE, of length equal to the columns of x, and then x[match(x, y, 0L) == 0L] is returning the entire data frame. Compare with the output from package prob, which uses a setdiff that operates row-wise:

library(prob)
A <- data.frame(x = 1:100)
B <- rbind(A, A)
setdiff(A, B)  # integer(0)
setdiff(B, A)  # integer(0)
C <- data.frame(x = 1:25)
setdiff(A, C)  # 26:100
setdiff(C, A)  # integer(0)

IMHO, the entire notion of "set" and "element" is problematic in the df case, so I am not advocating the adoption of the prob:::setdiff approach; rather, setdiff is behaving in a way that I cannot believe with my own eyes, and I would like to alert those who can speak as to why this may be happening. Thanks to Jason for bringing this up, and to David for catching the discrepancy. Session info is below. I use the binaries prepared by the Debian group so I do not have the latest patched-revision-4440986745343b. 
This must have been related to something which has been fixed since April 17, and in that case, please disregard my message. Yours truly, Jay sessionInfo() R version 2.9.0 (2009-04-17) x86_64-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] prob_0.9-1 -- *** G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
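The behaviour Jay observes is consistent with his reading of the source: a data frame is a list of columns, so match() treats each whole column as one set element, and a length-100 column never equals any column of B = rbind(A, A). A small Python sketch of the two granularities (column tuples stand in for data-frame columns; these helper names are invented for this illustration, not the actual base or prob implementations):

```python
def setdiff_columnwise(a, b):
    """What base::setdiff effectively does to a data frame: each whole
    *column* is one set element, so A's length-100 column matches no
    column of B = rbind(A, A), and the entire frame comes back."""
    bcols = list(b.values())
    return {name: col for name, col in a.items() if col not in bcols}

def setdiff_rowwise(a, b):
    """Row-wise set difference, in the spirit of prob:::setdiff."""
    brows = set(zip(*b.values()))
    kept = [row for row in zip(*a.values()) if row not in brows]
    cols = list(zip(*kept)) if kept else [()] * len(a)
    return {name: cols[i] for i, name in enumerate(a)}

A = {"x": tuple(range(1, 101))}
B = {"x": tuple(range(1, 101)) * 2}   # rbind(A, A): the column, twice
C = {"x": tuple(range(1, 26))}

print(len(setdiff_columnwise(A, B)["x"]))  # 100 -- whole frame survives
print(setdiff_rowwise(A, B)["x"])          # ()  -- like integer(0)
print(setdiff_rowwise(A, C)["x"][:3])      # (26, 27, 28)
```

So the surprising "df 1:100?" results are not random: every comparison is happening at column granularity, where no column of one frame ever matches a column of different length in the other.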