Re: [Rd] median and data frames

2011-04-27 Thread Paul Johnson
On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
 wrote:
> Here are some data frames:
>
> df3.2 <- data.frame(1:3, 7:9)
> df4.2 <- data.frame(1:4, 7:10)
> df3.3 <- data.frame(1:3, 7:9, 10:12)
> df4.3 <- data.frame(1:4, 7:10, 10:13)
> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>
> Now here are some commands and their answers:

>> median(df4.4)
> [1]  8.5 11.5
>> median(df3.2[c(1,2,3),])
> [1] 2 8
>> median(df3.2[c(1,3,2),])
> [1]  2 NA
> Warning message:
> In mean.default(X[[2L]], ...) :
>  argument is not numeric or logical: returning NA
>
>
>
> The sessionInfo is below, but it looks
> to me like the present behavior started
> in 2.10.0.
>
> Sometimes it gets the right answer.  I'd
> be grateful to hear how it does that -- I
> can't figure it out.
>

Hello, Pat.

Nice poetry there!  I think I have an actual answer, as opposed to the
usual crap I spew.

I would agree if you said median.data.frame ought to be written to
work columnwise, similar to mean.data.frame.

apply and sapply always give the correct answer:

> apply(df3.3, 2, median)
  X1.3   X7.9 X10.12
     2      8     11

> apply(df3.2, 2, median)
X1.3 X7.9
   2    8

> apply(df3.2[c(1,3,2),], 2, median)
X1.3 X7.9
   2    8

mean.data.frame is now implemented as

mean.data.frame <- function(x, ...) sapply(x, mean, ...)

I think we would suggest this for medians:

median.data.frame <- function(x, ...) sapply(x, median, ...)

It works, see:

> median.data.frame(df3.2[c(1,3,2),])
X1.3 X7.9
   2    8
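
A quick check that the "..." pass-through works per column -- my own toy
example, using na.rm:

> dfNA <- data.frame(a = c(1, NA, 3), b = 7:9)
> median.data.frame(dfNA, na.rm = TRUE)
a b 
2 8 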

Would our next step be to enter that somewhere in R bugzilla? (I'm not
joking--I'm that naive).

I think I can explain why the current median works intermittently in
those cases you mention.  Give it a small set of pre-sorted data and all
is well.  median.default uses a sort function, and that sort is confused
when it is given a data.frame object rather than just a vector.
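
Here is a minimal standalone reproduction of that confusion (my own
example, under the 2.13.x behavior discussed here). sort.default() sees
an object, so it does x[order(x, ...)], and on an unnamed data frame
that indexing asks for more columns than the frame has:

> df <- data.frame(c(1, 3, 2), c(7, 9, 8))
> names(df) <- NULL  # median.default() strips names before it sorts
> sort(df)           # corrupt, NULL-padded frame, as in the traces below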


I put a browser() at the top of median.default

> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at #4: if (is.factor(x)) stop("need numeric data")
Browse[2]> n
debug at #4: NULL
Browse[2]> n
debug at #6: if (length(names(x))) names(x) <- NULL
Browse[2]> n
debug at #6: names(x) <- NULL
Browse[2]> n
debug at #8: if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x)))
return(x[FALSE][NA])
Browse[2]> n
debug at #8: if (any(is.na(x))) return(x[FALSE][NA])
Browse[2]> n
debug at #8: NULL
Browse[2]> n
debug at #12: n <- length(x)
Browse[2]> n
debug at #13: if (n == 0L) return(x[FALSE][NA])
Browse[2]> n
debug at #13: NULL
Browse[2]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Note the sort there in step 16. I think that's what is killing us.

If you are lucky and give it a small data frame that is in order, like
df3.2, the sort doesn't produce gibberish. When I get to that point, I
will show you the sort's effect.

First, the case that "works". I moved the browser() down, because I
got tired of looking at the same old not-yet-erroneous output.


> median(df3.2)
Called from: median.default(df3.2)
Browse[1]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively, type

Browse[2]> sort(x, partial = half + 0L:1L)
  NA NA   NA   NA   NA   NA
1  1  7 NULL NULL NULL NULL
2  2  8
3  3  9
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

But it still gives you a "right" answer:

Browse[2]> n
[1] 2 8


But if you give it data out of order, the second column turns to NA,
and that causes doom.


> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively:

Browse[2]> sort(x, partial = half + 0L:1L)
  NA   NA NA   NA   NA   NA
1  1 NULL  7 NULL NULL NULL
3  3   9   
2  2   8   
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Here's a larger test case. Note columns 1 and 3 turn to NULL

> df8.8 <- data.frame(a=2:8, b=1:7)

median(df8.8)


Re: [Rd] How to create vignette.pdf for R-2.13.0?

2011-04-27 Thread cstrato

Dear Uwe,

As I have already mentioned, R CMD check gives the following output:

* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in 'inst/doc' ... WARNING
Package vignette(s) without corresponding PDF:
   APTvsXPS.Rnw
   xps.Rnw
   xpsClasses.Rnw
   xpsPreprocess.Rnw

* checking running R code from vignettes ... OK
* checking re-building of vignettes ... OK
* checking PDF version of manual ... OK

WARNING: There was 1 warning, see
  '/Volumes/CoreData/CRAN/xps.Rcheck/00check.log'
for details


Although the output says "checking PDF version of manual ... OK", I
cannot find any result in "xps.Rcheck".



When I run "R64 CMD build xps" I get:

* checking for file 'xps/DESCRIPTION' ... OK
* preparing 'xps':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to re-build vignettes
* creating vignettes ... OK
* cleaning src
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'xps_1.13.1.tar.gz'

However, the resulting file "xps_1.13.1.tar.gz" also does not contain any
vignettes.


Best regards
Christian


On 4/27/11 10:16 AM, Uwe Ligges wrote:



On 26.04.2011 21:58, cstrato wrote:

Dear Duncan, dear Uwe,

Just now I have re-run everything, and today xps.Rnw can be converted to
a vignette w/o any problems using:
a, buildVignettes("xps", dir="/Volumes/CoreData/CRAN/xps", quiet=F)
b, R CMD Sweave xps.Rnw

In both cases the vignette xps.pdf is created (maybe my Mac did not like
to work during the Easter holidays).

However, one issue remains:
"R64 CMD check xps_1.13.1.tar.gz" no longer creates any pdf files for
the vignettes.



Does it give an error or warning? It should check the code. R CMD build
creates the pdf files.

Uwe Ligges



Best regards
Christian


On 4/25/11 9:31 PM, Duncan Murdoch wrote:

On 25/04/2011 3:16 PM, cstrato wrote:

Thank you.

My problem seems to be that at the moment the problem can be seen only
on my Mac, since e.g. the Bioconductor servers have no problems
creating
the vignettes.


Then you are definitely the one in the best position to diagnose the
problem. Use the usual approach: simplify it by cutting out everything
that looks unrelated. Verify that the problem still exists, then cut
some more. Eventually you'll have isolated the error to a particular
small snippet of code, and then you can add print() statements, or use
trace(), or do whatever is necessary to see what's so special about your
system.

I suspect it will turn out to be an assumption in the code that is not
true on your system.

If the assumption is being made by code you wrote, then fix it. If it's
being made by R, let us know.

Duncan Murdoch



Best regards
Christian

On 4/25/11 8:55 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Duncan,

Thank you for your example, however it is different since it does not
use x and y. What about print(x+y)?


Try it.



Sorry, I do not believe that there is a bug in my code, since:
1, it worked in all versions of R from R-2.6.0 up to R-2.12.2.
2, the identical code works in the examples
3, this code (or a similar code) is the starting code which all users
of xps have to use, and there was never a problem.


This might be a problem in R, or might be a problem in your code. As
far
as I know, it has only shown up in your code, so I'd guess that's
where
the problem is. In any case, you're the one in the best position to
isolate it and debug it.

If it turns out to be a problem in R, put together an example
illustrating the problem that doesn't involve your code, and I'll
take a
look.

Duncan Murdoch



Maybe the reason could be that my code has to import
- the CEL-files from the package dir
- the file SchemeTest3.root from the package dir
??

Best regards
Christian

On 4/25/11 8:00 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Uwe,

Your suggestion to look at the Sweave manual helped me to solve the
problem. It seems that in R-2.13.0 every chunk can use the code
from
the chunk before but not from an earlier chunk.

I'm either misreading what you wrote, or it's wrong. If I have this
in a
Sweave file:

<<>>=
x<- 1
@

<<>>=
y<- 2
@

<<>>=
print(x)
@

I will see the value of x getting printed, even though it came from
two
chunks earlier.

I think Uwe is right: there is some bug in the code you're running.
Sweave isn't the problem.

Duncan Murdoch


Concretely, the following does not work since chunk 5 needs the
code
from chunk 3 and 4:

###
### chunk number 3:
###
#line 126 "xps.Rnw"
celdir<- file.path(.path.package("xps"), "raw")

###
### chunk number 4:
###
#line 132 "xps.Rnw"
scheme.test3<- root.scheme(file.path(.path.package("xps"),
"schemes",
"SchemeTest3.root"))

###
### chunk number 5:
#

Re: [Rd] Thread synchronization [Was: Interrupting C++ code execution]

2011-04-27 Thread Simon Urbanek
Sean,

On Apr 27, 2011, at 3:21 PM, Sean Robert McGuffee wrote:

> Hi Simon,
> That makes a lot of sense to me. I'll start reading about R's event loop 
> signaling. I'm not sure what the best method will be for me to flag the 
> completeness of a threaded process in my case. In the abstract, it seems that I 
> could get R's event loop to look for any type of flag. I think key for me in 
> this case will be identifying whether a particular file has been completely 
> produced or not. In principle I could put that type of info into the file 
> itself, but I think I could also make a temp file somewhere with its full 
> path and flag info about it. Then the event loop could look for a particular 
> pattern of temp file names. On the other hand, if I pass in that info when I 
> start the event loop, that might work too.

Usually, the easiest on unix is to register a file handle as an input handler 
(addInputHandler) - in practice a pipe - one end is owned by the thread and the 
other is owned by R. Then all you need is to write anything on the thread's end 
and it will wake up R's event loop and let you handle the read on that end so 
you can do anything. You could even have multiple threads share this one pipe 
since you could distinguish by payload which thread is calling. One example of 
this is the integrated HTTP server in R - see the Rhttpd sources (it also has a 
variant that works on Windows using synchronization via the OS event loop).


> Regarding the external pointer idea, I was thinking about passing an object 
> to R as a return value after launching the thread, and then I might be able 
> to access a pointer inside that object to reference it from my thread. That 
> could be a binary vector or any type of object if I can figure out how to get 
> to it from my thread. Honestly, I don't know much about dynamic referencing 
> of objects from separate threads, but in principle memory is shared in this 
> case. I'll let you know if I come up with anything generic... Please keep me 
> posted on your  package. Are any versions of it available yet?

Yes, it is not released yet since it's not quite complete, but here we go, at 
your own risk ;):

http://rforge.net/threads

It will work on all platforms, eventually, but currently only unix is 
supported. The idea is sort of taking the multicore paradigm (parallel + 
collect) but using threads (threadEval + yield). The documentation is currently 
non-existent, but I plan to write a vignette for it ... maybe later this week 
...
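
Roughly, the intended usage is along the lines of multicore's parallel +
collect -- a sketch only; threadEval and yield are the real names, but take
the exact signatures with a grain of salt until the vignette exists:

library(threads)                 # from rforge.net/threads, unix only for now
job <- threadEval(Sys.sleep(5))  # spawn the expression on its own thread
# ... R stays responsive in the meantime ...
yield(job)                       # block until the thread's result is ready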

Cheers,
Simon


> It didn't happen to come up on my list of R packages. I haven't necessarily 
> been maintaining an up-to-date version of R though. I don't know if that 
> influences the package list it shows me.
> Sean
> 
> 
> On 4/26/11 8:51 PM, "Simon Urbanek"  wrote:
> 
>> Sean,
>> 
>> On Apr 26, 2011, at 5:06 PM, Sean Robert McGuffee wrote:
>> 
>>> I've been thinking about how to handle c++ threads that were started via 
>>> Rcpp
>>> calls to some of my c++ libraries from R. My main obstacle is trying to make
>>> sure that users don't try to process files that are being generated by a
>>> thread before the thread finishes. One thing I am considering is having my
>>> threaded code return a class to R that contains a pointer that it remembers.
>>> Then maybe I could just change the value at that pointer when my thread
>>> finishes. Does that seem like a reasonable approach? I'm not completely sure
>>> if this is related to your issue or not, but it might be similar enough to 
>>> be
>>> worth asking...
>> 
>> It depends. For a simple flag it's actually much simpler than that - you
>> can create a boolean vector (make sure you preserve it) and just update its
>> value when it's done - you don't even need an external pointer for that (if
>> you're careful).
>> 
>> But the slight problem with that approach is rather that you don't have a way
>> to tell R about the status change, so essentially you can only poll on the R
>> side. A more proper way to deal with this is to use the event loop signaling
>> to signal in R that the flag has changed. I'm working on a "threads" package
>> that should help with that, but it's not complete yet (you can spawn threads
>> from R and you can actually even synchronize them with R [so if the result is
>> all you want it's there], but semaphores are not implemented yet  --- your
>> inquiry should shift it further up on my todo stack ;)).
>> 
>> Cheers,
>> Simon
> 
> 
> 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Thread synchronization [Was: Interrupting C++ code execution]

2011-04-27 Thread Sean Robert McGuffee
Hi Simon,
That makes a lot of sense to me. I'll start reading about R's event loop
signaling. I'm not sure what the best method will be for me to flag the
completeness of a threaded process in my case. In the abstract, it seems that I
could get R's event loop to look for any type of flag. I think key for me in
this case will be identifying whether a particular file has been completely
produced or not. In principle I could put that type of info into the file
itself, but I think I could also make a temp file somewhere with its full
path and flag info about it. Then the event loop could look for a particular
pattern of temp file names. On the other hand, if I pass in that info when I
start the event loop, that might work too. Regarding the external pointer
idea, I was thinking about passing an object to R as a return value after
launching the thread, and then I might be able to access a pointer inside
that object to reference it from my thread. That could be a binary vector or
any type of object if I can figure out how to get to it from my thread.
Honestly, I don't know much about dynamic referencing of objects from
separate threads, but in principle memory is shared in this case. I'll let
you know if I come up with anything generic... Please keep me posted on your
package. Are any versions of it available yet? It didn't happen to come up
on my list of R packages. I haven't necessarily been maintaining an
up-to-date version of R though. I don't know if that influences the package
list it shows me.
Sean


On 4/26/11 8:51 PM, "Simon Urbanek"  wrote:

> Sean,
> 
> On Apr 26, 2011, at 5:06 PM, Sean Robert McGuffee wrote:
> 
>> I've been thinking about how to handle c++ threads that were started via Rcpp
>> calls to some of my c++ libraries from R. My main obstacle is trying to make
>> sure that users don't try to process files that are being generated by a
>> thread before the thread finishes. One thing I am considering is having my
>> threaded code return a class to R that contains a pointer that it remembers.
>> Then maybe I could just change the value at that pointer when my thread
>> finishes. Does that seem like a reasonable approach? I'm not completely sure
>> if this is related to your issue or not, but it might be similar enough to be
>> worth asking...
> 
> It depends. For a simple flag it's actually much simpler than that - you
> can create a boolean vector (make sure you preserve it) and just update its
> value when it's done - you don't even need an external pointer for that (if
> you're careful).
> 
> But the slight problem with that approach is rather that you don't have a way
> to tell R about the status change, so essentially you can only poll on the R
> side. A more proper way to deal with this is to use the event loop signaling
> to signal in R that the flag has changed. I'm working on a "threads" package
> that should help with that, but it's not complete yet (you can spawn threads
> from R and you can actually even synchronize them with R [so if the result is
> all you want it's there], but semaphores are not implemented yet  --- your
> inquiry should shift it further up on my todo stack ;)).
> 
> Cheers,
> Simon

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] median and data frames

2011-04-27 Thread peter dalgaard

On Apr 27, 2011, at 19:44 , Patrick Burns wrote:

> I would think a method in analogy to
> 'mean.data.frame' would be a logical choice.
> But I'm presuming there might be an argument
> against that or 'median.data.frame' would already
> exist.

Only if someone had a better plan. As you are probably well aware, what you are 
currently seeing is a rather exquisite mashup of methods getting applied to 
objects they shouldn't be applied to. Some curious effects are revealed, e.g. 
this little beauty:

> sort(df3.3)
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = 
decreasing)) : 
  undefined columns selected
> names(df3.3)<-NULL
> sort(df3.3)
  NA NA NA   NA   NA   NA   NA   NA   NA
1  1  7 10 NULL NULL NULL NULL NULL NULL
2  2  8 11  
3  3  9 12  
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] median and data frames

2011-04-27 Thread Patrick Burns

Here are some data frames:

df3.2 <- data.frame(1:3, 7:9)
df4.2 <- data.frame(1:4, 7:10)
df3.3 <- data.frame(1:3, 7:9, 10:12)
df4.3 <- data.frame(1:4, 7:10, 10:13)
df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

Now here are some commands and their answers:

> median(df3.2)
[1] 2 8
> median(df4.2)
[1] 2.5 8.5
> median(df3.3)
  NA
1  7
2  8
3  9
> median(df4.3)
  NA
1  7
2  8
3  9
4 10
> median(df3.4)
[1]  8 11
> median(df4.4)
[1]  8.5 11.5
> median(df3.2[c(1,2,3),])
[1] 2 8
> median(df3.2[c(1,3,2),])
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA



The sessionInfo is below, but it looks
to me like the present behavior started
in 2.10.0.

Sometimes it gets the right answer.  I'd
be grateful to hear how it does that -- I
can't figure it out.

Under the current regime we can get numbers
that are correct, partially correct, or sort
of random (given the intention).

I claim that much better behavior would be
to always get exactly one of the following:

* a numeric answer (that is consistently correct)
* an error

I would think a method in analogy to
'mean.data.frame' would be a logical choice.
But I'm presuming there might be an argument
against that or 'median.data.frame' would already
exist.


> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] graphics  grDevices utils datasets  stats methods   base

other attached packages:
[1] xts_0.8-0 zoo_1.6-5

loaded via a namespace (and not attached):
[1] grid_2.13.0 lattice_0.19-23 tools_2.13.0

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-27 Thread Christophe Dutang
Among many solutions, I generally use the following code, which avoids the
"ideal average individual" by taking the mean of the predicted values
across the data instead:

averagingpredict <- function(model, varname, varseq, type, subset = NULL)
{
    if (is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]

    f <- function(x)
    {
        mydata[, varname] <- x
        mean(predict(model, newdata = mydata, type = type), na.rm = TRUE)
    }

    sapply(varseq, f)
}

It is time-consuming, but it deals with non-numeric variables.
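
For example, a hypothetical call for a logistic regression -- glm fits
carry the $data component the function relies on:

fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
averagingpredict(fit, "hp", seq(50, 300, by = 50), type = "response")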


Christophe


2011/4/26 Paul Johnson 

> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?
>
> When using predict, I find it difficult/tedious to create newdata data
> frames when there are many variables. It is necessary to set all
> variables at the mean/mode/median, and then for some variables of
> interest, one has to insert values for which predictions are desired.
> I was at a presentation by Scott Long last week and he was discussing
> the increasing emphasis in Stata on calculations of marginal
> predictions and "Spost" an several other packages, and,
> co-incidentally, I had a student visit who is learning to use R MASS's
> polr (W.Venables and B. Ripley) and we wrestled for quite a while to
> try to make the same calculations that Stata makes automatically.  It
> spits out predicted probabilities each independent variable, keeping
> other variables at a reference level.
>
> I've found R packages that aim to do essentially the same thing.
>
> In Frank Harrell's Design/rms framework, he uses a "data.dist"
> function that generates an object that the user has to put into the R
> options.  I think many users trip over the use of "options" there.  If
> I don't use that for a month or two, I completely forget the fine
> points and have to fight with it.  But it does "work" to give plots
> and predict functions the information they require.
>
> In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
> "setx" does the work of creating "newdata" objects. That appears to be
> about right as a candidate for a generic "newdata" function. Perhaps
> it could directly generalize to all R regression functions, but right
> now it is tailored to the models in Zelig. It has separate methods for
> the different types of models, and that is a bit confusing to me, since
> the "newdata" in one model should be the same as the newdata in
> another, I'm guessing. But his code is all there, I'll keep looking.
>
> In Effects (by John Fox), there are internal functions to create
> newdata and plot the marginal effects. If you load effects and run,
> for example, "effects:::effect.lm" you see Prof Fox has his own way of
> grabbing information from model columns and calculating predictions.
>
> I think it is time the R Core Team would look at this and tell "us" what
> is the right way to do this. I think the interface to setx in Zelig is
> pretty easy to understand, at least for numeric variables.
>
> In R's termplot function, such a thing could be put to use.  As far as
> I can tell now, termplot is doing most of the work of creating a
> newdata object, but not exactly.
>
> It seems like it would be a shame to proliferate more functions that
> do the same job, when it is such a common task.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-27 Thread Paul Johnson
On Tue, Apr 26, 2011 at 7:39 PM, Duncan Murdoch
 wrote:

> If you don't like the way this was done in my three lines above, or by Frank
> Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and
> get it right this time?  It's pretty rude to complain about things that
> others have given you for free, and demand they do it better.
>
> Duncan Murdoch
>

I offer a sincere apology for sounding that way.  I'm not attacking
anybody. I'm just talking, asking whether you agree this should be
standardized.  And you disagree, and I respect that, since you are
actually doing the work.

From a "lowly user's point of view", I wish "you experts" out there
would tell us one way to do this, so we could follow your example.

When there's a regression model fitted with 20 variables in it, and
half of them are numeric, 4 are unordered factors, 3 are ordinal
factors, and whatnot, then this is a hard problem for many of us
ordinary users.  Or it is tedious.  They want to "keep everything
fixed" except one variable that takes on different specified values.
And they want to do that for every variable, one at a time.

Stata has made this easy for many models; R could as well, if we
coalesced on a more-or-less standard way to create newdata objects for
predict.
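
To make that concrete, here is the sort of helper I have in mind --
purely a sketch, with the dubious mean-for-numeric, modal-level-for-factor
policy spelled out; the name newdata_at and everything in it are
hypothetical:

newdata_at <- function(data, varname, values) {
    base <- lapply(data, function(col) {
        if (is.numeric(col)) mean(col, na.rm = TRUE)
        else {
            tab <- table(col)  # modal level for factors/characters
            factor(names(tab)[which.max(tab)], levels = levels(factor(col)))
        }
    })
    out <- as.data.frame(base)[rep(1L, length(values)), , drop = FALSE]
    out[[varname]] <- values   # vary just the one variable of interest
    rownames(out) <- NULL
    out
}

newdata_at(mtcars, "hp", c(100, 200))  # two rows, everything else at its mean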

But, in the end, I agree with your sentiment.  I just have to do this
and show you it is handy.  I think Zelig's setx has it about right;
I'll pursue that strategy.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] on the Distribution task view

2011-04-27 Thread Christophe Dutang
Hi useRs,

As the maintainer of the Distributions task view
(http://cran.r-project.org/web/views/Distributions.html) for more than two
years, I should have run this feedback exercise earlier. But better late
than never!

I am starting this discussion to get your feedback/suggestions on the task
view, as well as to list the new packages that could be added to it.

Kind regards

Christophe

-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] standard format for newdata objects

2011-04-27 Thread Terry Therneau

On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote:
> Er... No, I don't think Paul is being particularly rude here (and he
> has been doing us some substantial favors in the past, notably his
> useful Rtips page). I know the kind of functionality he is looking
> for; e.g., SAS JMP has some rather nice interactive displays of
> regression effects for which you'll need to fill in "something" for
> the other variables. 
> 
> However, that being said, I agree with Duncan that we probably do not
> want to canonicalize any particular method of filling in "average"
> values for data frame variables. Whatever you do will be statistically
> dubious (in particular, using the mode of a factor variable gives me
> the creeps: Do a subgroup analysis and your "average person" switches
> from male to female?), so I think it is one of those cases where it is
> best to provide mechanism, not policy.
> 

  I agree with Peter.  There are two tasks in newdata: deciding what the
default reference levels should be, and building the data frame with
those levels.  It's the first part that is hard. For survival curves
from a Cox model the historical default has been to use the mean of each
covariate, which can be awful (sex coded as 0/1 leads to prediction for
a hermaphrodite?).  Nevertheless, I've not been able to think of a
strategy that would give sensible answers for most of the data I use and
coxph retains the flawed default for lack of a better idea.  When
teaching a class on this, I tell listeners to "bite the bullet" and build
the newdata that makes clinical sense, because package defaults are
always unwise for some of the variables.  How can a package possibly
know that it should use bilirubin=1.0 (upper limit of normal) and AST =
45 when the data set is one of my liver transplant studies?
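   For instance, with a hypothetical fit <- coxph(Surv(time, status) ~
bili + ast, data = liver), one builds it by hand:

newdat <- data.frame(bili = 1.0, ast = 45)  # clinically chosen, not means
plot(survfit(fit, newdata = newdat))
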
   Frank Harrell would argue that his "sometimes misguided" default in
cph is better than the "almost always wrong" one in coxph though, and
there is certainly some strength in that position.

Terry Therneau

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-27 Thread Gabor Grothendieck
On Wed, Apr 27, 2011 at 3:55 AM, peter dalgaard  wrote:
>
> On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:
>
>> On 26/04/2011 11:13 AM, Paul Johnson wrote:
>>> Is anybody working on a way to standardize the creation of "newdata"
>>> objects for predict methods?
>
> [snip]
>
>>> I think it is time the R Core Team would look at this and tell "us" what
>>> is the right way to do this. I think the interface to setx in Zelig is
>>> pretty easy to understand, at least for numeric variables.
>>
>> If you don't like the way this was done in my three lines above, or by Frank 
>> Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
>> get it right this time?  It's pretty rude to complain about things that 
>> others have given you for free, and demand they do it better.
>
> Er... No, I don't think Paul is being particularly rude here (and he has been 
> doing us some substantial favors in the past, notably his useful Rtips page). 
> I know the kind of functionality he is looking for; e.g., SAS JMP has some 
> rather nice interactive displays of regression effects for which you'll need 
> to fill in "something" for the other variables.
>
> However, that being said, I agree with Duncan that we probably do not want to 
> canonicalize any particular method of filling in "average" values for data 
> frame variables. Whatever you do will be statistically dubious (in 
> particular, using the mode of a factor variable gives me the creeps: Do a 
> subgroup analysis and your "average person" switches from male to female?), 
> so I think it is one of those cases where it is best to provide mechanism, 
> not policy.
>

That could be satisfied by defining a generic in the core of R without
any methods.  Then individual packages or analyses could provide those
in the way they see fit.  As long as the packages or analyses are
working with objects of different classes they would not conflict.
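
A minimal sketch of that mechanism -- the generic name "newdata" is
hypothetical; the point is that base R would ship the generic and
nothing else:

## in base R: just the generic -- mechanism, not policy
newdata <- function(object, ...) UseMethod("newdata")

## each package then defines newdata.<itsclass>() however it sees
## fit; methods for different classes cannot conflict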


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to create vignette.pdf for R-2.13.0?

2011-04-27 Thread Uwe Ligges



On 26.04.2011 21:58, cstrato wrote:

Dear Duncan, dear Uwe,

Just now I have re-run everything, and today xps.Rnw can be converted to
a vignette w/o any problems using:
a, buildVignettes("xps", dir="/Volumes/CoreData/CRAN/xps", quiet=F)
b, R CMD Sweave xps.Rnw

In both cases the vignette xps.pdf is created (maybe my Mac did not like
to work during the Easter holidays).

However, one issue remains:
"R64 CMD check xps_1.13.1.tar.gz" no longer creates any pdf files for
the vignettes.



Does it give an error or warning? It should check the code. R CMD build 
creates the pdf files.


Uwe Ligges



Best regards
Christian


On 4/25/11 9:31 PM, Duncan Murdoch wrote:

On 25/04/2011 3:16 PM, cstrato wrote:

Thank you.

My problem seems to be that at the moment the problem can be seen only
on my Mac, since e.g. the Bioconductor servers have no problems creating
the vignettes.


Then you are definitely the one in the best position to diagnose the
problem. Use the usual approach: simplify it by cutting out everything
that looks unrelated. Verify that the problem still exists, then cut
some more. Eventually you'll have isolated the error to a particular
small snippet of code, and then you can add print() statements, or use
trace(), or do whatever is necessary to see what's so special about your
system.

I suspect it will turn out to be an assumption in the code that is not
true on your system.

If the assumption is being made by code you wrote, then fix it. If it's
being made by R, let us know.

Duncan Murdoch



Best regards
Christian

On 4/25/11 8:55 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Duncan,

Thank you for your example, however it is different since it does not
use x and y. What about print(x+y)?


Try it.



Sorry, I do not believe that there is a bug in my code, since:
1, it worked in all versions of R from R-2.6.0 up to R-2.12.2.
2, the identical code works in the examples
3, this code (or a similar code) is the starting code which all users
of xps have to use, and there was never a problem.


This might be a problem in R, or might be a problem in your code. As
far
as I know, it has only shown up in your code, so I'd guess that's where
the problem is. In any case, you're the one in the best position to
isolate it and debug it.

If it turns out to be a problem in R, put together an example
illustrating the problem that doesn't involve your code, and I'll
take a
look.

Duncan Murdoch



Maybe the reason could be that my code has to import
- the CEL-files from the package dir
- the file SchemeTest3.root from the package dir
??

Best regards
Christian

On 4/25/11 8:00 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Uwe,

Your suggestion to look at the Sweave manual helped me to solve the
problem. It seems that in R-2.13.0 every chunk can use the code from
the chunk before but not from an earlier chunk.

I'm either misreading what you wrote, or it's wrong. If I have this
in a
Sweave file:

<<>>=
x<- 1
@

<<>>=
y<- 2
@

<<>>=
print(x)
@

I will see the value of x getting printed, even though it came from
two
chunks earlier.

I think Uwe is right: there is some bug in the code you're running.
Sweave isn't the problem.

Duncan Murdoch


Concretely, the following does not work since chunk 5 needs the code
from chunk 3 and 4:

###
### chunk number 3:
###
#line 126 "xps.Rnw"
celdir<- file.path(.path.package("xps"), "raw")

###
### chunk number 4:
###
#line 132 "xps.Rnw"
scheme.test3<- root.scheme(file.path(.path.package("xps"),
"schemes",
"SchemeTest3.root"))

###
### chunk number 5:
###
#line 137 "xps.Rnw"
celfiles<- c("TestA1.CEL","TestA2.CEL")
data.test3<- import.data(scheme.test3, "tmpdt_DataTest3",
celdir=celdir, celfiles=celfiles, verbose=FALSE)


However, when I add "celdir" to chunk 5, everything works, since
now chunk 5 needs only the code from chunk 4 and not from chunk 3:

###
### chunk number 5:
###
#line 137 "xps.Rnw"
celdir<- file.path(.path.package("xps"), "raw")
celfiles<- c("TestA1.CEL","TestA2.CEL")
data.test3<- import.data(scheme.test3, "tmpdt_DataTest3",
celdir=celdir, celfiles=celfiles, verbose=FALSE)


Now buildVignettes() is able to create the vignettes; however, R CMD
check still does not build the vignettes.


Yes, I get a Warning in both cases:
* checking package vignettes in 'inst/doc' ... WARNING
Package vignettes without corresponding PDF: 

However, with R-2.12.2 the following lines are added:

/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/doc/APTvsXPS.Rnw


/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/doc/xps.Rnw
/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/

Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-27 Thread peter dalgaard

On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:

> On 26/04/2011 11:13 AM, Paul Johnson wrote:
>> Is anybody working on a way to standardize the creation of "newdata"
>> objects for predict methods?

[snip]

>> I think it is time the R Core Team would look at this and tell "us" what
>> is the right way to do this. I think the interface to setx in Zelig is
>> pretty easy to understand, at least for numeric variables.
> 
> If you don't like the way this was done in my three lines above, or by Frank 
> Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
> get it right this time?  It's pretty rude to complain about things that 
> others have given you for free, and demand they do it better.

Er... No, I don't think Paul is being particularly rude here (and he has been 
doing us some substantial favors in the past, notably his useful Rtips page). I 
know the kind of functionality he is looking for; e.g., SAS JMP has some rather 
nice interactive displays of regression effects for which you'll need to fill 
in "something" for the other variables. 

However, that being said, I agree with Duncan that we probably do not want to 
canonicalize any particular method of filling in "average" values for data 
frame variables. Whatever you do will be statistically dubious (in particular, 
using the mode of a factor variable gives me the creeps: Do a subgroup analysis 
and your "average person" switches from male to female?), so I think it is one 
of those cases where it is best to provide mechanism, not policy.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel