[Rd] bug in strsplit?

2009-05-29 Thread Wacek Kusnierczyk
src/main/character.c:435-438 (do_strsplit) contains the following code:

for (i = 0; i < tlen; i++)
    if (getCharCE(STRING_ELT(tok, 0)) == CE_UTF8) use_UTF8 = TRUE;
for (i = 0; i < len; i++)
    if (getCharCE(STRING_ELT(x, 0)) == CE_UTF8) use_UTF8 = TRUE;

since both loops iterate over loop-invariant expressions and statements,
either the loops are redundant, or the fixed index '0' was meant to
actually be the variable i.  i guess it's the latter, hence 'bug?' in
the subject.

it also appears that if *any* element of tok (or x) positively passes
the test, use_UTF8 is set to TRUE;  in such a case, further checks make
no sense.  the following rewrite cuts the inessential computation:

for (i = 0; i < tlen; i++)
    if (getCharCE(STRING_ELT(tok, i)) == CE_UTF8) {
        use_UTF8 = TRUE;
        break; }
for (i = 0; i < len; i++)
    if (getCharCE(STRING_ELT(x, i)) == CE_UTF8) {
        use_UTF8 = TRUE;
        break; }

since the pattern is repetitive, the following generic approach would
help (and the macro could possibly be reused in other places):

#define CHECK_CE(CHARACTER, LENGTH, USEUTF8) \
    for (i = 0; i < (LENGTH); i++) \
        if (getCharCE(STRING_ELT((CHARACTER), i)) == CE_UTF8) { \
            (USEUTF8) = TRUE; \
            break; }
CHECK_CE(tok, tlen, use_UTF8)
CHECK_CE(x, len, use_UTF8)
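for completeness, a minimal R-level sketch one might use to probe the issue
-- the vector below is hypothetical, with only its *second* element marked
as UTF-8, so a check that only ever inspects element 0 would miss it:

x <- c("abc,def", "\u00e9t\u00e9,hiver")
Encoding(x)         # the second element should be reported as "UTF-8"
strsplit(x, ",")    # a split that ought to take that encoding into account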

if you like it, i can provide a patch.

vQ



[Rd] edge case concerning NA in dim() (PR#13729)

2009-05-29 Thread astokes
Full_Name: Allan Stokes
Version: 28.1
OS: XP
Submission from: (NULL) (24.108.0.245)


I'm trying to use package HDF5 and have discovered some round-trip errors: save,
load, save is not idempotent.  I started digging into the type system to figure
out what type graffiti is fouling this up.  

Soon I discovered that comparisons with NULL produce zero length vectors, which
I hadn't known was possible, and I started to wonder about the properties of
zero length objects.  

L0 <- logical(0)
dim(L0) <- c(0)  # OK
dim(L0) <- c(1)  # error
dim(L0) <- c(0,1) # OK
dim(L0) <- c(0,-1) # OK
dim(L0) <- c(0,3.14) # OK, c(0,3) results
dim(L0) <- c(0,FALSE) # OK, c(0,0) results
dim(L0) <- c(0,NA) # OK
dim(L0) <- c(1,NA) # error
dim(L0) <- c(1,NA,NA) # OK, SURPRISE!!

NA*NA is normally NA, but in the test for dim() assignment, it appears that
NA*NA == 0, which is then allowed.  If the list contains more than one NA
element, the product seems to evaluate to zero.

I can see making a case for 0*NA == 0 in this context, but not for NA*NA == 0. 
As an aside, I'm not sure why 0*NA does not equal 0 in general evaluation,
unless NA is considered to possibly represent +/-inf.



[Rd] Bug in base function sample ( ) (PR#13727)

2009-05-29 Thread chajewski
Full_Name: Michael Chajewski
Version: 2.9.0
OS: Windows XP
Submission from: (NULL) (150.108.71.185)


I was programming a routine which kept reducing the array from which a random
sample was taken, resulting in a single number. I discovered that when R
attempts to sample from an object with only one number it does not
reproduce/report the number but instead chooses a random number between 1 and
that number. 

Example 1:

# I am assigning a single number
gg <- 7

# Creating an array to store sampled values
ggtrack <- 0

# I am sampling 10,000 observations from my single value
# object and storing them
for (i in 1:10000) {
g0 <- sample(gg, (i/i))
ggtrack <- c(ggtrack,g0)
}

# Deleting the initial value in the array
ggtrack <- ggtrack[-1]

# The array ought to be 10,000 samples long (and it is)
length(ggtrack)

# The array should contain 10,000 7s, but it does not
# See the histogram of sampled values
hist(ggtrack)

Example 2:

# Here is the same example, but now with
# two numbers. Note that now the function performs
# as expected and only samples between the two.

gg <- c(7,2)
ggtrack <- 0
for (i in 1:10000) {
g0 <- sample(gg, (i/i))
ggtrack <- c(ggtrack,g0)
}

ggtrack <- ggtrack[-1]
length(ggtrack)
hist(ggtrack)


Highest Regards,
Michael Chajewski



Re: [Rd] Fwd: [R] size of point symbols

2009-05-29 Thread baptiste auguie

Dear Prof. Ripley and all,


Thank you very much for the pointers and the always insightful  
comments. I'd like to add a few further comments below for the sake of  
discussion,


On 26 May 2009, at 08:35, Prof Brian Ripley wrote:


I don't know where you get your claims from.  R graphics is handled
internally in inches, with a device-specific mapping to pixels/points
etc (which is documented for each device on its help page).  This has
to be done carefully, as pixels may not be square.


I saw hints of this use of inches in the code but I started off with  
the wrong assumption that symbols would be in mm (partly because  
ggplot2 suggested it would be so, partly because it's the natural unit  
I was taught to use throughout French technical education).




What the meaning of pch=1:23 is in terms of coordinates is not
documented except via the sources.


 I own Paul Murrell's R graphics book but I don't think the precise  
description of the symbols' size is presented in there. Perhaps a  
useful addition for the next edition?



The source is function GESymbol in
file src/main/engine.c, so for example pch = 2 is


Thank you, I failed to pinpoint this.



case 2: /* S triangle - point up */
xc = RADIUS * GSTR_0;
r = toDeviceHeight(TRC0 * xc, GE_INCHES, dd);
yc = toDeviceHeight(TRC2 * xc, GE_INCHES, dd);
xc = toDeviceWidth(TRC1 * xc, GE_INCHES, dd);
xx[0] = x; yy[0] = y+r;
xx[1] = x+xc; yy[1] = y-yc;
xx[2] = x-xc; yy[2] = y-yc;
gc->fill = R_TRANWHITE;
GEPolygon(3, xx, yy, gc, dd);
break;

which as you see is in inches, not mm as you asserted.  The first line
sets xc to 0.375 inches for cex=1, for example.

You need to take the stroke width (as set by lty) into account when
assessing the visual size of symbols



Altering the implementation is definitely way out of my league, but  
I'm glad I learned where to find this piece of information should the  
need come in the future.
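(As a trivial aside, the figure above translated into the millimetre terms I
had been thinking in:

0.375 * 25.4   # 0.375 inches expressed in mm, i.e. 9.525 mm
)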



On Mon, 25 May 2009, baptiste auguie wrote:


Dear all,


Having received no answer in r-help I'm trying r-devel (hoping this  
is not a

stupid question).

I don't understand the rationale behind the absolute sizes of the  
point

symbols, and I couldn't find it documented (I got lost in the C code
graphics.c and gave up).


You are expected to study the sources for yourself.  That's part of
the price of R.

There is a manual, 'R Internals', that would have explained to you
that graphics.c is part of base graphics and hence not of grid
graphics.


R is a big project, and these implementation details can be hard to  
track down for non-programmers of my sort. That's why I was hoping for  
some hints on r-help first. In particular, it's not clear to me  
whether base graphics and grid graphics share this sort of  
primitive pieces of code. I'll have to read R internals.



As a last note, I'd like to share this idea I've contemplated recently  
(currently implementing it at the R level for ggplot2),


The points() symbols (well, rather the par() function, presumably)  
could gain an attribute 'type', say, with a few options:


- 'old' for backward compatibility, this choice would set the symbols  
to use to the current values in the same way that palette() provides a  
default set of colours.


- 'polygons', could provide the user with a set of regular polygons  
ordered by the number of vertices (3 to 6 and circle, for instance)  
with a consistent set of attributes (all having col and fill  
parameters). These could be complemented by starred versions of the  
polygons to make for a larger set of shapes.


Such a design could provide several benefits over the current  
situation, 1) the possible mapping between symbols and data could be  
more straight-forward (in the spirit of the ggplot2 package), 2) the  
symbol size could be made consistent either with a constant area or a  
constant circumscribing circle, thereby conforming with the idea that  
information should minimise visual artefacts in displaying the data  
(I'm not saying this is the case currently, but I feel it may not be  
optimum.).


- perhaps something else --- TeachingDemos has some interesting  
examples in the my.symbols help page.



Thanks again,

baptiste





The example below uses
Grid to check the size of the symbols against a square of 10mm x  
10mm.



checkOneSymbol <- function(pch=0){
 gTree(children=gList(
 rectGrob(0.5, 0.5, width=unit(10, "mm"), height=unit(10, "mm"),
 gp=gpar(lty=2, fill=NA, col=alpha("black", 0.5))),
 pointsGrob(0.5, 0.5, size=unit(10, "mm"), pch=pch,
 gp=gpar(col=alpha("red", 0.5)))
 ))

}
all.symbols <- lapply(0:23, checkOneSymbol)

pdf("symbols.pdf", height=1.2/2.54, width=24.2/2.54)

vp <- viewport(width=0.5, height=0.5, name="main")
pushViewport(vp)

pushViewport(viewport(layout=grid.layout(1, 24,
 widths=unit(10, "mm"),
 heights=unit(10, "mm"),

Re: [Rd] Bug in base function sample ( ) (PR#13727)

2009-05-29 Thread Gavin Simpson
On Thu, 2009-05-28 at 09:30 +0200, chajew...@fordham.edu wrote:
 Full_Name: Michael Chajewski
 Version: 2.9.0
 OS: Windows XP
 Submission from: (NULL) (150.108.71.185)
 
 
 I was programming a routine which kept reducing the array from which a random
 sample was taken, resulting in a single number. I discovered that when R
 attempts to sample from an object with only one number it does not
 reproduce/report the number but instead chooses a random number between 1 and
 that number. 

This is working as documented/intended in ?sample. 'x' is of length 1,
so it is interpreted as 1:x (if x >= 1), resulting in the behaviour you
have encountered.

That help page even goes so far as to warn you that this convenience
feature may lead to undesired behaviour... and gives an example
function (in Examples) that handles the sort of use case you have. See
the Examples section and the resample() function created there.
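For reference, the helper is roughly a one-liner along these lines (a
sketch, not a verbatim copy of the help page):

resample <- function(x, ...) x[sample(length(x), ...)]

resample(7)        ## always 7, even though x has length 1
sample(7)          ## by contrast, a random permutation of 1:7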

HTH

G

 
 Example 1:
 
 # I am assigning a single number
 gg <- 7
 # Creating an array to store sampled values
 ggtrack <- 0
 
 # I am sampling 10,000 observations from my single value
 # object and storing them
 for (i in 1:10000) {
   g0 <- sample(gg, (i/i))
   ggtrack <- c(ggtrack,g0)
 }
 
 # Deleting the initial value in the array
 ggtrack <- ggtrack[-1]
 
 # The array ought to be 10,000 samples long (and it is)
 length(ggtrack)
 
 # The array should contain 10,000 7s, but it does not
 # See the histogram of sampled values
 hist(ggtrack)
 
 Example 2:
 
 # Here is the same example, but now with
 # two numbers. Note that now the function performs
 # as expected and only samples between the two.
 
 gg <- c(7,2)
 ggtrack <- 0
 for (i in 1:10000) {
   g0 <- sample(gg, (i/i))
   ggtrack <- c(ggtrack,g0)
 }
 
 ggtrack <- ggtrack[-1]
 length(ggtrack)
 hist(ggtrack)
 
 
 Highest Regards,
 Michael Chajewski
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%





Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Martin Maechler
 PS == Petr Savicky savi...@cs.cas.cz
 on Thu, 28 May 2009 09:36:48 +0200 writes:

PS On Wed, May 27, 2009 at 10:51:38PM +0200, Martin Maechler wrote:
 I have very slightly  modified the changes (to get rid of -Wall
 warnings) and also exported the function as Rf_dropTrailing0(),
 and tested the result with 'make check-all' .

PS Thank you very much for considering the patch. -Wall indeed requires to 
add 
PS parentheses
PS warning: suggest parentheses around comparison in operand of 
PS warning: suggest parentheses around assignment used as truth value

PS If there are also other changes, i would like to ask you to make your 
modification
PS available, mainly due to a possible further discussion.

PS Let me also suggest a modification of my original proposal. It contains 
a cycle
PS while (*(replace++) = *(p++)) {
PS ;
PS }
PS If the number has no trailing zeros, but contains an exponent, this 
cycle
PS shifts the exponent by 0 positions, which means that it copies each of 
its
PS characters to itself. This may be eliminated as follows
PS if (replace != p) {
PSwhile (*(replace++) = *(p++)) {
PS   ;
PS}
PS }

yes, that's a simple improvement, thank you.
Martin



Re: [Rd] edge case concerning NA in dim() (PR#13729)

2009-05-29 Thread Prof Brian Ripley

On Fri, 29 May 2009, asto...@esica.com wrote:


Full_Name: Allan Stokes
Version: 28.1
OS: XP
Submission from: (NULL) (24.108.0.245)


I'm trying to use package HDF5 and have discovered some round-trip errors: save,
load, save is not idempotent.  I started digging into the type system to figure
out what type graffiti is fouling this up.

Soon I discovered that comparisons with NULL produce zero length vectors, which
I hadn't known was possible, and I started to wonder about the properties of
zero length objects.

L0 <- logical(0)
dim(L0) <- c(0)  # OK
dim(L0) <- c(1)  # error
dim(L0) <- c(0,1) # OK
dim(L0) <- c(0,-1) # OK
dim(L0) <- c(0,3.14) # OK, c(0,3) results
dim(L0) <- c(0,FALSE) # OK, c(0,0) results
dim(L0) <- c(0,NA) # OK
dim(L0) <- c(1,NA) # error
dim(L0) <- c(1,NA,NA) # OK, SURPRISE!!

NA*NA is normally NA, but in the test for dim() assignment, it appears that
NA*NA == 0, which is then allowed.  If the list contains more than one NA
element, the product seems to evaluate to zero.


The calculation was done in C and failed to take NAs (and indeed 
negative values) into account.  So



L <- logical(1)
dim(L) <- c(1, -1, -1)


succeeded.

Thank you for the report, changed in R 2.9.0 patched.  (Since the 
representation of an integer NA is negative, a test for positivity 
would have caught this.)
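(An illustration of why the product looked like zero: R stores integer NA as
INT_MIN, i.e. -2^31, so -- assuming the old check multiplied the extents in
unchecked 32-bit integer arithmetic -- the product involving two NAs wraps
around to 0:

((-2^31) * (-2^31)) %% 2^32        # 0: the wrapped 32-bit product NA*NA
(1 * (-2^31) * (-2^31)) %% 2^32    # 0: so c(1,NA,NA) matched length(L0)
)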



I can see making a case for 0*NA == 0 in this context, but not for NA*NA == 0.
As an aside, I'm not sure why 0*NA does not equal 0 in general evaluation,
unless NA is considered to possibly represent +/-inf.


In fact NA as used here is logical but is coerced to a numeric NA, and 
a 'missing' numeric could take any possible value including Inf, -Inf 
and NaN.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [Rd] [R] custom sort?

2009-05-29 Thread Duncan Murdoch

I've moved this to R-devel...

On 5/28/2009 8:17 PM, Stavros Macrakis wrote:

I couldn't get your suggested method to work:

  `==.foo` <- function(a,b) unclass(a)==unclass(b)
  `>.foo` <- function(a,b) unclass(a) < unclass(b) # invert comparison
  is.na.foo <- function(a) is.na(unclass(a))

  sort(structure(sample(5), class="foo"))  # -> 1:5  -- not reversed

What am I missing?


There are two problems.  First, I didn't mention that you need a method 
for indexing as well.  The code needs to evaluate things like x[i] > 
x[j], and by default x[i] will not be of class foo, so the custom 
comparison methods won't be called.


Second, I think there's a bug in the internal code, specifically in 
do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x. 
 do_rank pays attention when breaking ties, so I think this is an 
oversight.


So I'd say two things should be done:

 1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.


 2.  we should look for ways to make all of this simpler, e.g. allowing 
a comparison function to be used.


I'll take on 1, but not 2.  It's hard to work out the right place for 
the comparison function to appear, and it would require a lot of work to 
implement, because all of this stuff (sort, rank, order, xtfrm, 
sort.int, etc.) is closely interrelated, some but not all of the 
functions are S3 generics, some implemented internally, etc.  In the 
end, I'd guess the results won't be very satisfactory from a performance 
point of view:  all those calls out to R to do the comparisons are going 
to be really slow.


I think your advice to use order() with multiple keys is likely to be 
much faster in most instances.  It's just a better approach in R.
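(For the record, a minimal sketch of the xtfrm route in current R -- the
class name "foo" is just the example class from above, and xtfrm simply
supplies a numeric sort key with the sign flipped:

xtfrm.foo <- function(x) -unclass(x)
x <- structure(sample(5), class = "foo")
sort(x)   # should now come out in decreasing order of the underlying values
)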


Duncan Murdoch



   -s

On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:


On 28/05/2009 5:34 PM, Steve Jaffe wrote:


Sounds simple but haven't been able to find it in docs: is it possible to
sort a vector using a user-defined comparison function? Seems it must be,
but sort doesn't seem to provide that option, nor does order sfaics



You put a class on the vector (e.g. using class(x) <- "myvector"), then
define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
methods (you'll need ==.myvector, >.myvector, and is.na.myvector).

Duncan Murdoch




Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Martin Maechler
 vQ == Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no
 on Thu, 28 May 2009 00:36:07 +0200 writes:

vQ Martin Maechler wrote:
 
 I have very slightly  modified the changes (to get rid of -Wall
 warnings) and also exported the function as Rf_dropTrailing0(),
 and tested the result with 'make check-all' .
 As the change seems reasonable and consequent, and as
 it seems not to produce any problems in our tests, 
 I'm hereby proposing to commit it (my version of it),
 [to R-devel only] within a few days, unless someone speaks up.



vQ i may be misunderstanding the code, but:


 Martin Maechler, ETH Zurich
 
PS --- R-devel/src/main/coerce.c   2009-04-17 17:53:35.0 +0200
PS +++ R-devel-elim-trailing/src/main/coerce.c 2009-05-23 
08:39:03.914774176 +0200
PS @@ -294,12 +294,33 @@
PS else return mkChar(EncodeInteger(x, w));
PS }
 
PS +const char *elim_trailing(const char *s, char cdec)
 

vQ the first argument is const char*, which usually means a contract
vQ promising not to change the content of the pointed-to object

PS +{
PS +const char *p;
PS +char *replace;
PS +for (p = s; *p; p++) {
PS +if (*p == cdec) {
PS +replace = (char *) p++;

  vQ const char* p is cast to non-const char* replace

PS +while ('0' <= *p & *p <= '9') {
PS +if (*(p++) != '0') {
PS +replace = (char *) p;

  vQ likewise

PS +}
PS +}
PS +while (*(replace++) = *(p++)) {
 

vQ the char* replace is assigned to -- effectively, the content of the
vQ promised-to-be-constant string s is modified, and the modification may
vQ involve any character in the string.  (it's a no-compile-error contract
vQ violation;  not an uncommon pattern, but not good practice either.)

PS +;
PS +}
PS +break;
PS +}
PS +}
PS +return s;
 

vQ you return s, which should be the same pointer value (given the actual
vQ code that does not modify the local variable s) with the same pointed-to
vQ string value (given the signature of the function).

vQ was perhaps

vQ char *elim_trailing(char* const s, char cdec)

vQ intended?

yes that would seem slightly more logical to my eyes, 
and in principle I also agree with the other remarks you make above,
...

vQ anyway, having the pointer s itself declared as const does
vQ make sense, as the code seems to assume that exactly the input pointer
vQ value should be returned.  or maybe the argument to elim_trailing should
vQ not be declared as const, since elim_trailing violates the declaration. 

vQ one way out is to drop the violated const in both the actual argument
vQ and in elim_trailing, which would then be simplified by removing all
vQ const qualifiers and (char*) casts.  

I've tried that, but   ``it does not work'' later:
{after having renamed  'elim_trailing'  to  'dropTrailing0' }
my version of *using* the function was

1 SEXP attribute_hidden StringFromReal(double x, int *warn)
2 {
3   int w, d, e;
4   formatReal(x, 1, &w, &d, &e, 0);
5   if (ISNA(x)) return NA_STRING;
6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
7 }

where you need to consider that mkChar() expects a 'const char*' 
and EncodeReal(.) returns one, and I am pretty sure this was the
main reason why Petr had used the two 'const char*' in (the
now-named) dropTrailing0() definition. 
If I use your proposed signature

char* dropTrailing0(char *s, char cdec);

line 6 above gives warnings in all of several incantations I've tried
including this one :

else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, d, 
e, OutDec), OutDec));

which (the warnings) leave me somewhat clue-less or rather
unmotivated to dig further, though I must say that I'm not the
expert on the subject char*  / const char* ..

vQ   another way out is to make
vQ elim_trailing actually allocate and return a new string, keeping the
vQ input truly constant, at a performance cost.  yet another way is to
vQ ignore the issue, of course.

vQ the original (martin/petr) version may quietly pass -Wall, but the
vQ compiler would complain (rightfully) with -Wcast-qual.

hmm, yes, but actually I haven't found a solution along your
proposition that even passes   -pedantic -Wall -Wcast-align
(the combination I've personally been using for a long time).

Maybe we can try to solve this more esthetically
in private e-mail exchange?

Regards,
Martin



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Petr Savicky
On Fri, May 29, 2009 at 03:53:02PM +0200, Martin Maechler wrote:
 my version of *using* the function was
 
 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
 7 }
 
 where you need to consider that mkChar() expects a 'const char*' 
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition. 

Yes, the goal was to accept the output of EncodeReal() with exactly the
same type, which EncodeReal() produces. A question is, whether the
output type of EncodeReal() could be changed to (char *). Then, changing
the output string could be done without casting const to non-const.

This solution may be in conflict with the structure of the rest of R code,
so i cannot evaluate, whether this is possible.

Petr.



[Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)

2009-05-29 Thread zhengxin
Full_Name: 
Version: 2.9.0
OS: windows, linux
Submission from: (NULL) (128.231.21.125)


In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather
than mean(), although it was still documented to use mean().
This caused problems for POSIXt objects, for which mean() but
not sum() makes sense, so the change has been reverted.

But it's not reverted yet.
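(A quick way to check which variant is running: an even-length POSIXt median
forces an average of the two middle values, which works with mean() but
should fail if sum() is used, since sum() is not defined for POSIXt objects:

x <- as.POSIXct("2009-05-29 12:00", tz = "UTC") + c(0, 60, 120, 180)
median(x)   # expected: a date-time half-way between the two middle values
)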



Re: [Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)

2009-05-29 Thread Peter Dalgaard

zheng...@mail.nih.gov wrote:
Full_Name: 
Version: 2.9.0

OS: windows, linux
Submission from: (NULL) (128.231.21.125)


In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather
than mean(), although it was still documented to use mean().
This caused problems for POSIXt objects, for which mean() but
not sum() makes sense, so the change has been reverted.

But it's not reverted yet.


That text is not in the NEWS file for 2.9.0. And the NEWS file that it 
is in is not for 2.9.0, and does not list that change under CHANGES IN 
R VERSION 2.9.0.


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] [R] custom sort?

2009-05-29 Thread Duncan Murdoch

On 5/29/2009 9:28 AM, Duncan Murdoch wrote:

I've moved this to R-devel...

On 5/28/2009 8:17 PM, Stavros Macrakis wrote:

I couldn't get your suggested method to work:

  `==.foo` <- function(a,b) unclass(a)==unclass(b)
  `>.foo` <- function(a,b) unclass(a) < unclass(b) # invert comparison
  is.na.foo <- function(a) is.na(unclass(a))

  sort(structure(sample(5), class="foo"))  # -> 1:5  -- not reversed

What am I missing?


There are two problems.  First, I didn't mention that you need a method 
for indexing as well.  The code needs to evaluate things like x[i] > 
x[j], and by default x[i] will not be of class foo, so the custom 
comparison methods won't be called.


Second, I think there's a bug in the internal code, specifically in 
do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x. 
  do_rank pays attention when breaking ties, so I think this is an 
oversight.


So I'd say two things should be done:

  1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.


I've now fixed the bug, and clarified the documentation to say

  The default method will make use of == and > methods
  for the class of x[i] (for integers i), and the
  is.na method for the class of x, but might be rather
  slow when doing so.

You don't actually need a custom indexing method, you just need to be 
aware that it's the class of x[i] that is important for comparisons.


This will make it into R-patched and R-devel.

Duncan Murdoch



  2.  we should look for ways to make all of this simpler, e.g. allowing 
a comparison function to be used.


I'll take on 1, but not 2.  It's hard to work out the right place for 
the comparison function to appear, and it would require a lot of work to 
implement, because all of this stuff (sort, rank, order, xtfrm, 
sort.int, etc.) is closely interrelated, some but not all of the 
functions are S3 generics, some implemented internally, etc.  In the 
end, I'd guess the results won't be very satisfactory from a performance 
point of view:  all those calls out to R to do the comparisons are going 
to be really slow.


I think your advice to use order() with multiple keys is likely to be 
much faster in most instances.  It's just a better approach in R.


Duncan Murdoch



   -s

On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:


On 28/05/2009 5:34 PM, Steve Jaffe wrote:


Sounds simple but haven't been able to find it in docs: is it possible to
sort a vector using a user-defined comparison function? Seems it must be,
but sort doesn't seem to provide that option, nor does order sfaics



You put a class on the vector (e.g. using class(x) <- "myvector"), then
define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
methods (you'll need ==.myvector, >.myvector, and is.na.myvector).

Duncan Murdoch




Re: [Rd] Bug in base function sample ( ) (PR#13727)

2009-05-29 Thread Stavros Macrakis

 ...I discovered that when R attempts to sample from an object with only one
 number it does not
 reproduce/report the number but instead chooses a random number between 1
 and that number.


This is the documented behavior.

In my opinion, it is a design error, but changing it would no doubt break
lots of code.

As a general rule, the designers of R seem to have preferred convenience to
consistency, which often makes things easier or more concise, but sometimes
causes unfortunate surprises like this.

   -s



Re: [Rd] [R] custom sort?

2009-05-29 Thread Stavros Macrakis
Thanks for the quick fix!

-s

On Fri, May 29, 2009 at 1:02 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 5/29/2009 9:28 AM, Duncan Murdoch wrote:

 I've moved this to R-devel...

 On 5/28/2009 8:17 PM, Stavros Macrakis wrote:

 I couldn't get your suggested method to work:

  `==.foo` <- function(a,b) unclass(a)==unclass(b)
  `>.foo` <- function(a,b) unclass(a) < unclass(b) # invert comparison
  is.na.foo <- function(a) is.na(unclass(a))

  sort(structure(sample(5), class="foo"))  # -> 1:5  -- not reversed

 What am I missing?


 There are two problems.  First, I didn't mention that you need a method
 for indexing as well.  The code needs to evaluate things like x[i] > x[j],
 and by default x[i] will not be of class foo, so the custom comparison
 methods won't be called.

 Second, I think there's a bug in the internal code, specifically in
 do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x.
  do_rank pays attention when breaking ties, so I think this is an oversight.

 So I'd say two things should be done:

  1.  the bug should be fixed.  Even if this isn't the most obvious
 approach, it should work.


 I've now fixed the bug, and clarified the documentation to say

   The default method will make use of == and > methods
  for the class of x[i] (for integers i), and the
  is.na method for the class of x, but might be rather
  slow when doing so.

 You don't actually need a custom indexing method, you just need to be aware
 that it's the class of x[i] that is important for comparisons.

 This will make it into R-patched and R-devel.

 Duncan Murdoch



  2.  we should look for ways to make all of this simpler, e.g. allowing a
 comparison function to be used.

 I'll take on 1, but not 2.  It's hard to work out the right place for the
 comparison function to appear, and it would require a lot of work to
 implement, because all of this stuff (sort, rank, order, xtfrm, sort.int,
 etc.) is closely interrelated, some but not all of the functions are S3
 generics, some implemented internally, etc.  In the end, I'd guess the
 results won't be very satisfactory from a performance point of view:  all
 those calls out to R to do the comparisons are going to be really slow.

 I think your advice to use order() with multiple keys is likely to be much
 faster in most instances.  It's just a better approach in R.

 Duncan Murdoch


   -s

 On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca
 wrote:

  On 28/05/2009 5:34 PM, Steve Jaffe wrote:

  Sounds simple but haven't been able to find it in docs: is it possible
 to
 sort a vector using a user-defined comparison function? Seems it must
 be,
 but sort doesn't seem to provide that option, nor does order sfaics


  You put a class on the vector (e.g. using class(x) <- "myvector"), then
  define a conversion to numeric (e.g. xtfrm.myvector) or actual
  comparison
  methods (you'll need ==.myvector, >.myvector, and is.na.myvector).

 Duncan Murdoch




Re: [Rd] install.packages now intentionally references .Rprofile?

2009-05-29 Thread McGehee, Robert
I see that related to this thread, 'R CMD INSTALL' (like
'install.packages') also reads the .Rprofile before beginning. This
caused package installation headaches for me that developers should be
aware (as it was very difficult to debug).

I added a setwd() to my .Rprofile [for example: setwd("/tmp")] to keep
.Rhistory files from popping up in directories throughout my computer.
This causes package installation to fail completely with an unhelpful
error message. For example (any package will do here):
 R CMD INSTALL zoo_1.5-6.tar.gz
Warning: invalid package 'zoo_1.5-6.tar.gz'
Error: ERROR: no packages specified

Removing 'setwd(...)' from the .Rprofile restores normal package
installation behavior.

I'd like to request that either setwd() not break installation, or the
user can disable .Rprofile reading on R CMD INSTALL (for instance with
an option such as --no-init-file). I'll use Heather's solution below for
the short-term, but would rather not have to completely turn off my
.Rprofile for non-interactive scripts.
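In the meantime the same interactive() guard works for my case as well (a
sketch of the workaround referred to above; the directory is just my example
from earlier):

if (interactive()) {
    setwd("/tmp")
}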

Thanks, Robert


-Original Message-
From: r-devel-boun...@r-project.org
[mailto:r-devel-boun...@r-project.org] On Behalf Of Heather Turner
Sent: Friday, May 22, 2009 6:13 AM
To: Mark Kimpel
Cc: Prof Brian Ripley; r-de...@stat.math.ethz.ch
Subject: Re: [Rd] install.packages now intentionally references
.Rprofile?

I had a similar problem when moving to R-2.9.0 as my .Rprofile called
update.packages(). The solution was to use

if(interactive()) {
utils:::update.packages(ask = FALSE)
}

HTH,

Heather

Mark Kimpel wrote:
 This was my original post, with the code example only slightly
modified by
 Martin for clarity. Prior to R-2.9.0, this repeated downloading did
not
 occur, the code worked as intended. In fact, if memory serves me
correctly,
 it even worked at least during the first 3 months of R-2.0.0 in its
 development stage, before release as a numbered version. Is there a
reason
 for that? Is there a work-around? As I mentioned in my original post,
the
 code is actually wrapped in a function that checks the date and the
date of
 the last update, and proceeds to update package once per week. It was
quite
 handy when it was working, hence my desire for a fix for my code.
 
 Thanks,
 Mark
 
 Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
 Indiana University School of Medicine
 
 15032 Hunter Court, Westfield, IN  46074
 
 (317) 490-5129 Work,  Mobile  VoiceMail
 (317) 399-1219  Home
 Skype:  mkimpel
 
 The real problem is not whether machines think but whether men do.
-- B.
 F. Skinner
 **
 
 
 On Thu, May 21, 2009 at 2:17 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 
 On Wed, 20 May 2009, Martin Morgan wrote:

  A post on the Bioconductor mailing list
  https://stat.ethz.ch/pipermail/bioconductor/2009-May/027700.html

 suggests that install.packages now references .Rprofile (?), whereas
 in R-2-8 it did not. Is this intentional?

 Yes.  And in fact it did in earlier versions, to find the default
library
 into which to install.



 The example is, in .Rprofile

  library(utils)
  install.packages("Biobase",
  repos="http://bioconductor.org/packages/2.4/bioc")

 then starting R from the command line results in repeated downloads
 of Biobase

 mtmor...@mm:~/tmp R --quiet
 trying URL
 '

http://bioconductor.org/packages/2.4/bioc/src/contrib/Biobase_2.4.1.tar.
gz
 '
 Content type 'application/x-gzip' length 1973533 bytes (1.9 Mb)
 opened URL
 ==
 downloaded 1.9 Mb

 trying URL
 '

http://bioconductor.org/packages/2.4/bioc/src/contrib/Biobase_2.4.1.tar.
gz
 '
 Content type 'application/x-gzip' length 1973533 bytes (1.9 Mb)
 opened URL
 ==
 downloaded 1.9 Mb

 ^C
 Execution halted

  sessionInfo()
 R version 2.9.0 Patched (2009-05-20 r48588)
 x86_64-unknown-linux-gnu

 locale:


LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.U
TF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=
C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATI
ON=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 Martin
 --
 Martin Morgan
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793



 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,
http://www.stats.ox.ac.uk/~ripley/http://www.stats.ox.ac.uk/%7Eripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK   

Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Petr Savicky wrote:
 On Fri, May 29, 2009 at 03:53:02PM +0200, Martin Maechler wrote:
   
 my version of *using* the function was

 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), 
 OutDec));
 7 }

 where you need to consider that mkChar() expects a 'const char*' 
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition. 
 

 Yes, the goal was to accept the output of EncodeReal() with exactly the
 same type, which EncodeReal() produces. A question is, whether the
 output type of EncodeReal() could be changed to (char *). Then, changing
 the output string could be done without casting const to non-const.

   
exactly.  my suggestion was to modify your function so that no "modify a
constant string" cheating is done, by either (a) keeping the const but
returning a *new* string (hence no const-to-nonconst cast would be
needed), or (b) modify your function to accept a non-const string *and*
modify the code that connects to your function via the input and output
strings. 

note, if a solution in which your function serves as a destructive
filter is just fine (martin seems to have accepted it already), then
EncodeReal probably can produce just a string, with no const qualifier,
and analogously for mkChar.  on the other hand, if EncodeReal is
purposefully designed to return a const string (i.e., there is an
important reason for doing so), and analogously for mkChar, then your
function violates the assumptions and can potentially be harmful to the
rest of the code.


 This solution may be in conflict with the structure of the rest of R code,
 so i cannot evaluate, whether this is possible.

   

well, either the rest of the code does *not* need const, and it can be
safely removed, or it *does* rely on const, and your solution violates
the expectation.

vQ



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Martin Maechler wrote:

[...]
 vQ you return s, which should be the same pointer value (given the actual
 vQ code that does not modify the local variable s) with the same 
 pointed-to
 vQ string value (given the signature of the function).

 vQ was perhaps

 vQ char *elim_trailing(char* const s, char cdec)

 vQ intended?

 yes that would seem slightly more logical to my eyes, 
 and in principle I also agree with the other remarks you make above,
   

what does ' in principle ' mean, as opposed to 'in principle'?  (is it
emphasis, or sneer quotes?)

 ...

 vQ anyway, having the pointer s itself declared as const does
 vQ make sense, as the code seems to assume that exactly the input pointer
 vQ value should be returned.  or maybe the argument to elim_trailing 
 should
 vQ not be declared as const, since elim_trailing violates the 
 declaration. 

 vQ one way out is to drop the violated const in both the actual argument
 vQ and in elim_trailing, which would then be simplified by removing all
 vQ const qualifiers and (char*) casts.  

 I've tried that, but   ``it does not work'' later:
 {after having renamed  'elim_trailing'  to  'dropTrailing0' }
 my version of *using* the function was

 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), OutDec));
 7 }

 where you need to consider that mkChar() expects a 'const char*' 
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition. 
 If I use your proposed signature

 char* dropTrailing0(char *s, char cdec);

 line 6 above gives warnings in all of several incantations I've tried
 including this one :

 else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, 
 d, e, OutDec), OutDec));

 which (the warnings) leave me somewhat clue-less or rather
 unmotivated to dig further, though I must say that I'm not the
 expert on the subject char*  / const char* ..
   

of course, if the input *is* const and the output is expected to be
const, you should get an error/warning in the first case, and at least a
warning in the other (depending on the level of verbosity/pedanticity
you choose).

but my point was not to light-headedly change the signature/return of
elim_trailing and its implementation and use it in the original
context;  it was to either modify the context as well (if const is
inessential), or drop modifying the const string if the const is in fact
essential.


 vQ   another way out is to make
 vQ elim_trailing actually allocate and return a new string, keeping the
 vQ input truly constant, at a performance cost.  yet another way is 
 to
 vQ ignore the issue, of course.

 vQ the original (martin/petr) version may quietly pass -Wall, but the
 vQ compiler would complain (rightfully) with -Wcast-qual.

 hmm, yes, but actually I haven't found a solution along your
 proposition that even passes   -pedantic -Wall -Wcast-align
 (the combination I've personally been using for a long time).
   

one way is to return from elim_trailing a new, const copy of the const
string.  using memcpy should be efficient enough.  care should be taken
to deallocate s when no longer needed.  (my guess is that using the
approach suggested here, s can be deallocated as soon as it is copied,
which means pretty much that it does not really have to be const.)

 Maybe we can try to solve this more esthetically
 in private e-mail exchange?
   

sure, we can discuss aesthetics offline.  as long as we do not discuss
aesthetics (do we?), it seems appropriate to me to keep the discussion
online.

i will experiment with a patch to solve this issue, and let you know
when i have something reasonable.

best,
vQ



[Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Jason Vertrees
Hello,

First, let me say I'm an avid fan of R--it's incredibly powerful and I
use it all the time.  I appreciate all the hard work that the many
developers have undergone.

My question is: why does the paradigm of changing the type of a 1D
return value to an unlisted array exist?  This introduces boundary
conditions where none need exist, thus making the coding harder and
confusing.

For example, consider:
   d = data.frame(a=rnorm(10), b=rnorm(10));
   typeof(d);  # OK;
   typeof(d[,1]);  # Unexpected;
   typeof(d[,1,drop=F]);   # Oh, now I see.

This is indeed documented in the R Language specification, but why is it
there in the first place?  It doesn't make sense to the average
programmer to change the return type based on dimension.

Here it is again in 'sapply':
   sapply
   function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
   {
   [...snip...]
  if (common.len == 1)
  unlist(answer, recursive = FALSE)
  else if (common.len > 1)
  array(unlist(answer, recursive = FALSE),
   dim = c(common.len,
  length(X)), dimnames = if (!(is.null(n1 <-
   names(answer[[1]])) &
  is.null(n2 <- names(answer
  list(n1, n2))
   [...snip...]
}

So, in 'sapply', if your return value is one-dimensional be careful,
because the return type will not be the same as if it were otherwise.

Is this legacy or a valid, rational design decision which I'm not yet a
sophisticated enough R coder to enjoy?

Thanks,

-- Jason

-- 

Jason Vertrees, PhD

Dartmouth College : j...@cs.dartmouth.edu
Boston University : jas...@bu.edu

PyMOLWiki : http://www.pymolwiki.org/



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Martin Maechler
Hi Waclav (and other interested parties),

I have committed my working version of src/main/coerce.c
so you can prepare your patch against that.

Thank you in advance!
Martin

On Fri, May 29, 2009 at 21:54, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 Martin Maechler wrote:

 [...]
     vQ you return s, which should be the same pointer value (given the 
 actual
     vQ code that does not modify the local variable s) with the same 
 pointed-to
     vQ string value (given the signature of the function).

     vQ was perhaps

     vQ char *elim_trailing(char* const s, char cdec)

     vQ intended?

 yes that would seem slightly more logical to my eyes,
 and in principle I also agree with the other remarks you make above,


 what does ' in principle ' mean, as opposed to 'in principle'?  (is it
 emphasis, or sneer quotes?)

 ...

     vQ anyway, having the pointer s itself declared as const does
     vQ make sense, as the code seems to assume that exactly the input 
 pointer
     vQ value should be returned.  or maybe the argument to elim_trailing 
 should
     vQ not be declared as const, since elim_trailing violates the 
 declaration.

     vQ one way out is to drop the violated const in both the actual argument
     vQ and in elim_trailing, which would then be simplified by removing all
     vQ const qualifiers and (char*) casts.

 I've tried that, but   ``it does not work'' later:
 {after having renamed  'elim_trailing'  to  'dropTrailing0' }
 my version of *using* the function was

 1 SEXP attribute_hidden StringFromReal(double x, int *warn)
 2 {
 3   int w, d, e;
 4   formatReal(x, 1, &w, &d, &e, 0);
 5   if (ISNA(x)) return NA_STRING;
 6   else return mkChar(dropTrailing0(EncodeReal(x, w, d, e, OutDec), 
 OutDec));
 7 }

 where you need to consider that mkChar() expects a 'const char*'
 and EncodeReal(.) returns one, and I am pretty sure this was the
 main reason why Petr had used the two 'const char*' in (the
 now-named) dropTrailing0() definition.
 If I use your proposed signature

 char* dropTrailing0(char *s, char cdec);

 line 6 above gives warnings in all of several incantations I've tried
 including this one :

     else return mkChar((const char *) dropTrailing0((char *)EncodeReal(x, w, 
 d, e, OutDec), OutDec));

 which (the warnings) leave me somewhat clue-less or rather
 unmotivated to dig further, though I must say that I'm not the
 expert on the subject char*  / const char* ..


 of course, if the input *is* const and the output is expected to be
 const, you should get an error/warning in the first case, and at least a
 warning in the other (depending on the level of verbosity/pedanticity
 you choose).

 but my point was not to light-headedly change the signature/return of
 elim_trailing and its implementation and use it in the original
 context;  it was to either modify the context as well (if const is
 inessential), or drop modifying the const string if the const is in fact
 essential.


     vQ   another way out is to make
     vQ elim_trailing actually allocate and return a new string, keeping the
     vQ input truly constant, at a performance cost    .  yet another way is 
 to
     vQ ignore the issue, of course.

     vQ the original (martin/petr) version may quietly pass -Wall, but the
     vQ compiler would complain (rightfully) with -Wcast-qual.

 hmm, yes, but actually I haven't found a solution along your
 proposition that even passes   -pedantic -Wall -Wcast-align
 (the combination I've personally been using for a long time).


 one way is to return from elim_trailing a new, const copy of the const
 string.  using memcpy should be efficient enough.  care should be taken
 to deallocate s when no longer needed.  (my guess is that using the
 approach suggested here, s can be deallocated as soon as it is copied,
 which means pretty much that it does not really have to be const.)

 Maybe we can try to solve this more esthetically
 in private e-mail exchange?


 sure, we can discuss aesthetics offline.  as long as we do not discuss
 aesthetics (do we?), it seems appropriate to me to keep the discussion
 online.

 i will experiment with a patch to solve this issue, and let you know
 when i have something reasonable.

 best,
 vQ





Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Thomas Lumley

On Fri, 29 May 2009, Jason Vertrees wrote:


My question is: why does the paradigm of changing the type of a 1D
return value to an unlisted array exist?  This introduces boundary
conditions where none need exist, thus making the coding harder and
confusing.

For example, consider:
  d = data.frame(a=rnorm(10), b=rnorm(10));
  typeof(d);# OK;
  typeof(d[,1]);# Unexpected;
  typeof(d[,1,drop=F]); # Oh, now I see.


It does make it harder for programmers, but it makes it easier for 
non-programmers.  In particular, it is convenient to be able to do d[1,1] to 
extract a number from a matrix, rather than having to explicitly coerce the 
result to stop it being a matrix.
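(A small illustration of the convenience being weighed here:

m <- matrix(1:4, 2, 2)
m[1, 1]                 # a plain number, which is what casual use expects
m[1, 1, drop = FALSE]   # a 1 x 1 matrix, for code that must keep its shape
)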

At least the last two times this was discussed, there ended up being a 
reasonable level of agreement that if someone's life had to be made harder the 
programmers were better able to cope and that dropping dimensions was 
preferable.

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Stavros Macrakis
This is another example of the general preference of the designers of R for
convenience over consistency.

In my opinion, this is a design flaw even for non-programmers, because I
find that inconsistencies make the system harder to learn.  Yes, the naive
user may stumble over the difference between m[[1,1]] and m[1,1] a few times
before getting it, but once he or she understands the principle, it is
general.

 -s

On Fri, May 29, 2009 at 5:33 PM, Jason Vertrees j...@cs.dartmouth.edu wrote:

 Hello,

 First, let me say I'm an avid fan of R--it's incredibly powerful and I
 use it all the time.  I appreciate all the hard work that the many
 developers have undergone.

 My question is: why does the paradigm of changing the type of a 1D
 return value to an unlisted array exist?  This introduces boundary
 conditions where none need exist, thus making the coding harder and
 confusing.

 For example, consider:
   d = data.frame(a=rnorm(10), b=rnorm(10));
   typeof(d);  # OK;
   typeof(d[,1]);  # Unexpected;
   typeof(d[,1,drop=F]);   # Oh, now I see.

 This is indeed documented in the R Language specification, but why is it
 there in the first place?  It doesn't make sense to the average
 programmer to change the return type based on dimension.

 Here it is again in 'sapply':
   sapply
   function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
   {
   [...snip...]
  if (common.len == 1)
  unlist(answer, recursive = FALSE)
   else if (common.len > 1)
   array(unlist(answer, recursive = FALSE),
    dim = c(common.len,
   length(X)), dimnames = if (!(is.null(n1 <-
    names(answer[[1]])) &
   is.null(n2 <- names(answer
  list(n1, n2))
   [...snip...]
}

 So, in 'sapply', if your return value is one-dimensional be careful,
 because the return type will not be the same as if it were otherwise.

 Is this legacy or a valid, rational design decision which I'm not yet a
 sophisticated enough R coder to enjoy?

 Thanks,

 -- Jason

 --

 Jason Vertrees, PhD

 Dartmouth College : j...@cs.dartmouth.edu
 Boston University : jas...@bu.edu

 PyMOLWiki : http://www.pymolwiki.org/



Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Jason Vertrees
Thomas Lumley wrote:
 On Fri, 29 May 2009, Jason Vertrees wrote:
 
 My question is: why does the paradigm of changing the type of a 1D
 return value to an unlisted array exist?  This introduces boundary
 conditions where none need exist, thus making the coding harder and
 confusing.

 For example, consider:
   d = data.frame(a=rnorm(10), b=rnorm(10));
   typeof(d);# OK;
   typeof(d[,1]);  # Unexpected;
   typeof(d[,1,drop=F]);# Oh, now I see.
 
 It does make it harder for programmers, but it makes it easier for
 non-programmers.  In particular, it is convenient to be able to do
 d[1,1] to extract a number from a matrix, rather than having to
 explicitly coerce the result to stop it being a matrix.
 
 At least the last two times this was discussed, there ended up being a
 reasonable level of agreement that if someone's life had to be made
 harder the programmers were better able to cope and that dropping
 dimensions was preferable.
 
 -thomas
 
 Thomas Lumley    Assoc. Professor, Biostatistics
 tlum...@u.washington.eduUniversity of Washington, Seattle


Thomas,

Thanks for the quick response.  I agree that extracting a number from a
matrix/frame should result in a number not a matrix/frame.  But, why do
that for a 1D array of numbers?  In my example,
   d[,1];
is an array, not a single number.  How does that help the novice user?

I guess I just don't like the idea that the default result is to act
unexpectedly and that a flag or boundary-conditional code is needed to
do the right thing.

Regardless that's how it is, so I just need to learn the pitfalls for
where that occurs.

Thanks again,

-- Jason

-- 

Jason Vertrees, PhD

Dartmouth College : j...@cs.dartmouth.edu
Boston University : jas...@bu.edu

PyMOLWiki : http://www.pymolwiki.org/



Re: [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

2009-05-29 Thread Wacek Kusnierczyk
Martin Maechler wrote:
 Hi Waclav (and other interested parties),

 I have committed my working version of src/main/coerce.c
 so you can prepare your patch against that.
   

Hi Martin,

One quick reaction (which does not resolve my original complaint):  you
can have p non-const, and cast s to char* on the first occasion its
value is assigned to p, thus being able to copy from p to replace
without repetitive casts.  make check-ed patch attached.

vQ
Index: src/main/coerce.c
===
--- src/main/coerce.c	(revision 48689)
+++ src/main/coerce.c	(working copy)
@@ -297,13 +297,13 @@
 
 const char* dropTrailing0(const char *s, char cdec)
 {
-const char *p;
-for (p = s; *p; p++) {
+char *p;
+for (p = (char *)s; *p; p++) {
 	if(*p == cdec) {
-	char *replace = (char *) p++;
+	char *replace = p++;
 	while ('0' <= *p && *p <= '9')
 		if(*(p++) != '0')
-		replace = (char *) p;
+		replace = p;
 	while((*(replace++) = *(p++)))
 		;
 	break;
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 This is another example of the general preference of the designers of R for
 convenience over consistency.

 In my opinion, this is a design flaw even for non-programmers, because I
 find that inconsistencies make the system harder to learn.  Yes, the naive
 user may stumble over the difference between m[[1,1]] and m[1,1] a few times
 before getting it, but once he or she understands the principle, it is
 general.
   

+1

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why change data type when dropping to one-dimension?

2009-05-29 Thread Thomas Lumley

On Fri, 29 May 2009, Stavros Macrakis wrote:


This is another example of the general preference of the designers of R for
convenience over consistency.

In my opinion, this is a design flaw even for non-programmers, because I
find that inconsistencies make the system harder to learn.  Yes, the naive
user may stumble over the difference between m[[1,1]] and m[1,1] a few times
before getting it, but once he or she understands the principle, it is
general.


I was on your side of this argument the first time it came up, but ended up 
being convinced the other way.

In contrast to sample(n), the non-standard evaluation of the weights= and
subset= arguments to modelling functions, and various other conveniences that
I think we are stuck with despite their being a bad idea, I think dropping
dimensions is useful.
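
(A minimal illustration of the sample(n) convenience mentioned above:
when the vector handed to sample() ends up having length one and holds
a positive number n, sampling silently switches to sampling from 1:n.)

   x <- c(2, 5, 9)
   sample(x)        # a permutation of 2, 5, 9
   x <- x[x > 8]    # x is now the single value 9
   sample(x)        # a permutation of 1:9, not just 9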

 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.edu     University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] png() error in recent R-devel on Windows

2009-05-29 Thread Hervé Pagès

Hi,

Tested with the latest r-devel snapshot build for Windows (2009-05-28 r48663):

> png("test.png")
Error in png("test.png") : invalid value of 'fillOddEven'

The png() function is defined like this:

> png
function (filename = "Rplot%03d.png", width = 480, height = 480,
    units = "px", pointsize = 12, bg = "white", res = NA, restoreConsole = TRUE)
{
    if (!checkIntFormat(filename))
        stop("invalid 'filename'")
    filename <- path.expand(filename)
    units <- match.arg(units, c("in", "px", "cm", "mm"))
    if (units != "px" && is.na(res))
        stop("'res' must be specified unless 'units = \"px\"'")
    height <- switch(units, `in` = res, cm = res/2.54, mm = res/25.4,
        px = 1) * height
    width <- switch(units, `in` = res, cm = res/2.54, mm = 1/25.4,
        px = 1) * width
    invisible(.External(Cdevga, paste("png:", filename, sep = ""),
        width, height, pointsize, FALSE, 1L, NA_real_, NA_real_,
        bg, 1, as.integer(res), NA_integer_, FALSE, .PSenv, NA,
        restoreConsole, "", FALSE))
}

Note that the call to .External has 19 arguments, the last 2 of them being
"" and FALSE, but the devga() function defined in
src/library/grDevices/src/init.c
expects 1 more argument (19 + the entry point name), with the last 3
expected to be string (title), logical (clickToConfirm), and
logical (fillOddEven). So it seems the recently added 'fillOddEven'
argument (r48294) is omitted from the .External call, hence the error.
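
Presumably the wrapper just needs to pass a value for the new parameter
as a 20th argument.  A minimal, untested sketch of what the corrected
call might look like (the trailing TRUE is only a placeholder default
for 'fillOddEven'; the real fix may of course differ):

     invisible(.External(Cdevga, paste("png:", filename, sep = ""),
         width, height, pointsize, FALSE, 1L, NA_real_, NA_real_,
         bg, 1, as.integer(res), NA_integer_, FALSE, .PSenv, NA,
         restoreConsole, "", FALSE, TRUE))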

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-28 r48663)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

Cheers,
H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] png() error in recent R-devel on Windows

2009-05-29 Thread Duncan Murdoch

Thanks, will fix.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] setdiff bizarre (was: odd behavior out of setdiff)

2009-05-29 Thread G. Jay Kerns
Dear R-devel,

Please see the recent thread on R-help, "Odd Behavior Out of
setdiff(...) - addition of duplicate entries is not identified", posted
by Jason Rupert.  I gave an answer, then read David Winsemius' answer,
and then did some follow-up investigation.

I would like to change my answer.

My current version of setdiff() is acting in a way that I do not
understand, and in a way that I suspect has changed.  Consider the
following, derived from Jason's OP:

The base package setdiff(), atomic vectors:

x <- 1:100
y <- c(x,x)

setdiff(x, y)  # integer(0)
setdiff(y, x)  # integer(0)

z <- 1:25

setdiff(x,z)   # 26:100
setdiff(z,x)   # integer(0)


Everything is fine.

Now look at base package setdiff(), data frames???


A <- data.frame(x = 1:100)
B <- rbind(A, A)

setdiff(A, B)   # df 1:100?
setdiff(B, A)   # df 1:100?

C <- data.frame(x = 1:25)

setdiff(A, C)   # df 1:100?
setdiff(C, A)   # df 1:25?




I have read ?setdiff 37 times now, and I cannot divine any
interpretation that matches the above output.  From the source, it
appears that

match(x, y, 0L) == 0L

is evaluating to TRUE, with length equal to the number of columns of x, and then

x[match(x, y, 0L) == 0L]

is returning the entire data frame.
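
Roughly what seems to be happening (my own reading, relying on the fact
that match() converts list arguments, and hence data frames, to
character vectors):

   A <- data.frame(x = 1:100)
   B <- rbind(A, A)
   as.character(A)   # "1:100": each column is deparsed to a single string
   as.character(B)   # "c(1L, 2L, ...": a different string
   match(A, B, 0L)   # 0, because the deparsed columns never match
   A[match(A, B, 0L) == 0L]   # so the whole (one-column) data frame comes
                              # back; unique(), which setdiff() applies,
                              # then collapses any duplicate rows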

Compare with the output from package prob, which uses a setdiff that
operates row-wise:


###
library(prob)
A <- data.frame(x = 1:100)
B <- rbind(A, A)

setdiff(A, B)   # integer(0)
setdiff(B, A)   # integer(0)

C <- data.frame(x = 1:25)

setdiff(A, C)   # 26:100
setdiff(C, A)   # integer(0)



IMHO, the entire notion of "set" and "element" is problematic in the
data frame case, so I am not advocating the adoption of the prob:::setdiff
approach;  rather, base setdiff() is behaving in a way that I can hardly
believe with my own eyes, and I would like to alert those who can speak
to why this may be happening.

Thanks to Jason for bringing this up, and to David for catching the discrepancy.

Session info is below.  I use the binaries prepared by the Debian
group, so I do not have the latest patched-revision-4440986745343b.
This may relate to something which has already been fixed since
April 17, and in that case, please disregard my message.

Yours truly,
Jay






> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] prob_0.9-1


-- 

***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel