Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread peter dalgaard

On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:

 On 26/04/2011 11:13 AM, Paul Johnson wrote:
 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

[snip]

 I think it is time the R Core Team looked at this and told us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.
 
 If you don't like the way this was done in my three lines above, or by Frank 
 Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
 get it right this time?  It's pretty rude to complain about things that 
 others have given you for free, and demand they do it better.

Er... No, I don't think Paul is being particularly rude here (and he has been 
doing us some substantial favors in the past, notably his useful Rtips page). I 
know the kind of functionality he is looking for; e.g., SAS JMP has some rather 
nice interactive displays of regression effects for which you'll need to fill 
in something for the other variables. 

However, that being said, I agree with Duncan that we probably do not want to 
canonicalize any particular method of filling in average values for data 
frame variables. Whatever you do will be statistically dubious (in particular, 
using the mode of a factor variable gives me the creeps: Do a subgroup analysis 
and your average person switches from male to female?), so I think it is one 
of those cases where it is best to provide mechanism, not policy.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to create vignette.pdf for R-2.13.0?

2011-04-27 Thread Uwe Ligges



On 26.04.2011 21:58, cstrato wrote:

Dear Duncan, dear Uwe,

Just now I have re-run everything, and today xps.Rnw can be converted to
a vignette w/o any problems using:
a, buildVignettes("xps", dir="/Volumes/CoreData/CRAN/xps", quiet=FALSE)
b, R CMD Sweave xps.Rnw

In both cases the vignette xps.pdf is created (maybe my Mac did not like
to work during the Easter holidays).

However, one issue remains:
R64 CMD check xps_1.13.1.tar.gz no longer creates any pdf files for
the vignettes.



Does it give an error or warning? It should check the code; R CMD build
creates the pdf files.


Uwe Ligges



Best regards
Christian


On 4/25/11 9:31 PM, Duncan Murdoch wrote:

On 25/04/2011 3:16 PM, cstrato wrote:

Thank you.

My problem seems to be that at the moment the problem can be seen only
on my Mac, since e.g. the Bioconductor servers have no problems creating
the vignettes.


Then you are definitely the one in the best position to diagnose the
problem. Use the usual approach: simplify it by cutting out everything
that looks unrelated. Verify that the problem still exists, then cut
some more. Eventually you'll have isolated the error to a particular
small snippet of code, and then you can add print() statements, or use
trace(), or do whatever is necessary to see what's so special about your
system.

I suspect it will turn out to be an assumption in the code that is not
true on your system.

If the assumption is being made by code you wrote, then fix it. If it's
being made by R, let us know.

Duncan Murdoch



Best regards
Christian

On 4/25/11 8:55 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Duncan,

Thank you for your example; however, it is different since it does not
use x and y. What about print(x+y)?


Try it.



Sorry, I do not believe that there is a bug in my code, since:
1. it worked in all versions of R from R-2.6.0 through R-2.12.2;
2. the identical code works in the examples;
3. this code (or similar code) is the starting code which all users
of xps have to use, and there has never been a problem.


This might be a problem in R, or might be a problem in your code. As
far
as I know, it has only shown up in your code, so I'd guess that's where
the problem is. In any case, you're the one in the best position to
isolate it and debug it.

If it turns out to be a problem in R, put together an example
illustrating the problem that doesn't involve your code, and I'll
take a
look.

Duncan Murdoch



Maybe the reason could be that my code has to import
- the CEL-files from the package dir
- the file SchemeTest3.root from the package dir
??

Best regards
Christian

On 4/25/11 8:00 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Uwe,

Your suggestion to look at the Sweave manual helped me to solve the
problem. It seems that in R-2.13.0 every chunk can use the code from
the chunk before but not from an earlier chunk.

I'm either misreading what you wrote, or it's wrong. If I have this
in a
Sweave file:

<<>>=
x <- 1
@

<<>>=
y <- 2
@

<<>>=
print(x)
@

I will see the value of x getting printed, even though it came from
two
chunks earlier.

I think Uwe is right: there is some bug in the code you're running.
Sweave isn't the problem.

Duncan Murdoch


Concretely, the following does not work since chunk 5 needs the code
from chunk 3 and 4:

###
### chunk number 3:
###
#line 126 "xps.Rnw"
celdir <- file.path(.path.package("xps"), "raw")

###
### chunk number 4:
###
#line 132 "xps.Rnw"
scheme.test3 <- root.scheme(file.path(.path.package("xps"),
"schemes",
"SchemeTest3.root"))

###
### chunk number 5:
###
#line 137 "xps.Rnw"
celfiles <- c("TestA1.CEL", "TestA2.CEL")
data.test3 <- import.data(scheme.test3, "tmpdt_DataTest3",
celdir=celdir, celfiles=celfiles, verbose=FALSE)


However, when I add celdir to chunk 5 then everything works since
now chunk 5 needs only the code from chunk 4 but not from chunk 3:

###
### chunk number 5:
###
#line 137 "xps.Rnw"
celdir <- file.path(.path.package("xps"), "raw")
celfiles <- c("TestA1.CEL", "TestA2.CEL")
data.test3 <- import.data(scheme.test3, "tmpdt_DataTest3",
celdir=celdir, celfiles=celfiles, verbose=FALSE)


Now buildVignettes() is able to create the vignettes, however R CMD
check still does not build the vignettes.


Yes, I get a Warning in both cases:
* checking package vignettes in 'inst/doc' ... WARNING
Package vignettes without corresponding PDF: 

However, with R-2.12.2 the following lines are added:

/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/doc/APTvsXPS.Rnw


/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/doc/xps.Rnw
/Volumes/CoreData/CRAN/xps.Rcheck/00_pkg_src/xps/inst/doc/xpsClasses.Rnw




Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread Gabor Grothendieck
On Wed, Apr 27, 2011 at 3:55 AM, peter dalgaard pda...@gmail.com wrote:

 On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:

 On 26/04/2011 11:13 AM, Paul Johnson wrote:
 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

 [snip]

 I think it is time the R Core Team looked at this and told us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.

 If you don't like the way this was done in my three lines above, or by Frank 
 Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
 get it right this time?  It's pretty rude to complain about things that 
 others have given you for free, and demand they do it better.

 Er... No, I don't think Paul is being particularly rude here (and he has been 
 doing us some substantial favors in the past, notably his useful Rtips page). 
 I know the kind of functionality he is looking for; e.g., SAS JMP has some 
 rather nice interactive displays of regression effects for which you'll need 
 to fill in something for the other variables.

 However, that being said, I agree with Duncan that we probably do not want to 
 canonicalize any particular method of filling in average values for data 
 frame variables. Whatever you do will be statistically dubious (in 
 particular, using the mode of a factor variable gives me the creeps: Do a 
 subgroup analysis and your average person switches from male to female?), 
 so I think it is one of those cases where it is best to provide mechanism, 
 not policy.


That could be satisfied by defining a generic in the core of R without
any methods.  Then individual packages or analyses could provide those
in the way they see fit.  As long as the packages or analyses are
working with objects of different classes they would not conflict.
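A bare generic along the lines Gabor describes could be sketched as follows (the name `newdata` and the example `lm` method are hypothetical illustrations, not an existing R API):

```r
## Hypothetical sketch of a methods-free generic that R core could ship;
## packages would then register methods for their own model classes.
newdata <- function(object, ...) UseMethod("newdata")

## Example of what one package might provide for "lm" objects -- purely
## illustrative, and deliberately naive about factors ("mechanism, not policy"):
newdata.lm <- function(object, ...) {
  mf  <- model.frame(object)
  one <- lapply(mf, function(col)
    if (is.numeric(col)) mean(col, na.rm = TRUE) else col[1])
  as.data.frame(one)  # a one-row data frame of reference values
}
```

Since base R would define no methods, packages attaching `newdata` methods for their own classes could not conflict, which is the point of the suggestion.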


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] standard format for newdata objects

2011-04-27 Thread Terry Therneau

On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote:
 Er... No, I don't think Paul is being particularly rude here (and he
 has been doing us some substantial favors in the past, notably his
 useful Rtips page). I know the kind of functionality he is looking
 for; e.g., SAS JMP has some rather nice interactive displays of
 regression effects for which you'll need to fill in something for
 the other variables. 
 
 However, that being said, I agree with Duncan that we probably do not
 want to canonicalize any particular method of filling in average
 values for data frame variables. Whatever you do will be statistically
 dubious (in particular, using the mode of a factor variable gives me
 the creeps: Do a subgroup analysis and your average person switches
 from male to female?), so I think it is one of those cases where it is
 best to provide mechanism, not policy.
 

  I agree with Peter.  There are two tasks in newdata: deciding what the
default reference levels should be, and building the data frame with
those levels.  It's the first part that is hard. For survival curves
from a Cox model the historical default has been to use the mean of each
covariate, which can be awful (sex coded as 0/1 leads to prediction for
a hermaphrodite?).  Nevertheless, I've not been able to think of a
strategy that would give sensible answers for most of the data I use and
coxph retains the flawed default for lack of a better idea.  When
teaching a class on this, I tell listeners to "bite the bullet" and build
the newdata that makes clinical sense, because package defaults are
always unwise for some of the variables.  How can a package possibly
know that it should use bilirubin=1.0 (upper limit of normal) and AST =
45 when the data set is one of my liver transplant studies?
   Frank Harrell would argue that his "sometimes misguided" default in
cph is better than the "almost always wrong" one in coxph though, and
there is certainly some strength in that position.
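Terry's advice — spell out the reference values yourself rather than trusting package defaults — amounts to something like the following sketch (the model formula, variable names, and values are invented for illustration; `fit` is assumed to be a coxph fit on covariates like these):

```r
library(survival)

## Hypothetical Cox model on made-up liver-study covariates:
## fit <- coxph(Surv(time, status) ~ age + sex + bilirubin + ast, data = mydata)

## Build newdata by hand, choosing clinically meaningful reference values
## rather than covariate means (the mean of a 0/1 sex code predicts for no one):
nd <- data.frame(
  age       = 50,        # a typical patient, chosen by the analyst
  sex       = "female",  # an actual factor level, not the mean of a dummy code
  bilirubin = 1.0,       # upper limit of normal, as in the example above
  ast       = 45
)

## survfit(fit, newdata = nd) would then give the curve for that patient.
```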

Terry Therneau



[Rd] on the Distribution task view

2011-04-27 Thread Christophe Dutang
Hi useRs,

As the maintainer of the Distributions task view (
http://cran.r-project.org/web/views/Distributions.html) for more than two
years, I should have run this feedback exercise earlier. But better late
than never!

I am starting this discussion to get your feedback/suggestions on the task
view, as well as to list new packages that could be added to it.

Kind regards

Christophe

-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France

[[alternative HTML version deleted]]



Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread Christophe Dutang
Among many solutions, I generally use the following code, which avoids the
"ideal average individual" by averaging the predicted values instead:

averagingpredict <- function(model, varname, varseq, type, subset=NULL)
{
    if (is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]

    f <- function(x)
    {
        mydata[, varname] <- x
        mean(predict(model, newdata=mydata, type=type), na.rm=TRUE)
    }

    sapply(varseq, f)
}

It is time-consuming, but it handles non-numeric variables.
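A usage sketch for the function above, assuming a glm fit whose object carries a `$data` component as the function requires (the data and model here are invented for illustration):

```r
## Toy data and a hypothetical logistic model (all names invented here);
## glm() keeps its data argument as fit$data, which averagingpredict() uses.
set.seed(1)
d <- data.frame(y = rbinom(100, 1, 0.5),
                x = rnorm(100),
                g = factor(sample(c("a", "b"), 100, replace = TRUE)))
fit <- glm(y ~ x + g, family = binomial, data = d)

## Average predicted probability as x sweeps over a grid, averaging over
## the observed values of g instead of fixing g at some "typical" level:
averagingpredict(fit, "x", seq(-2, 2, by = 0.5), type = "response")
```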


Christophe


2011/4/26 Paul Johnson pauljoh...@gmail.com

 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

 When using predict, I find it difficult/tedious to create newdata data
 frames when there are many variables. It is necessary to set all
 variables at the mean/mode/median, and then for some variables of
 interest, one has to insert values for which predictions are desired.
 I was at a presentation by Scott Long last week where he discussed
 the increasing emphasis in Stata on calculating marginal
 predictions, in SPost and several other packages. Coincidentally, I
 had a student visit who is learning to use R MASS's
 polr (W. Venables and B. Ripley), and we wrestled for quite a while to
 try to make the same calculations that Stata makes automatically.  It
 spits out predicted probabilities for each independent variable, keeping
 other variables at a reference level.

 I've found R packages that aim to do essentially the same thing.

 In Frank Harrell's Design/rms framework, he uses a data.dist
 function that generates an object that the user has to put into the R
 options.  I think many users trip over the use of options there.  If
 I don't use that for a month or two, I completely forget the fine
 points and have to fight with it.  But it does work to give plots
 and predict functions the information they require.

 In  Zelig ( by Kosuke Imai, Gary King, and Olivia Lau), a function
 setx does the work of creating newdata objects. That appears to be
 about right as a candidate for a generic newdata function. Perhaps
 it could directly generalize to all R regression functions, but right
 now it is tailored to the models in Zelig. It has separate methods for
 the different types of models, and that is a bit confusing to me, since
 the newdata in one model should be the same as the newdata in
 another, I'm guessing. But his code is all there; I'll keep looking.

 In Effects (by John Fox), there are internal functions to create
 newdata and plot the marginal effects. If you load effects and run,
 for example, effects:::effect.lm you see Prof Fox has his own way of
 grabbing information from model columns and calculating predictions.

 I think it is time the R Core Team looked at this and told us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.

 In R's termplot function, such a thing could be put to use.  As far as
 I can tell now, termplot is doing most of the work of creating a
 newdata object, but not exactly.

 It seems like it would be a shame to proliferate more functions that
 do the same thing, when it is such a common task.

 --
 Paul E. Johnson
 Professor, Political Science
 1541 Lilac Lane, Room 504
 University of Kansas





-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France




[Rd] median and data frames

2011-04-27 Thread Patrick Burns

Here are some data frames:

df3.2 <- data.frame(1:3, 7:9)
df4.2 <- data.frame(1:4, 7:10)
df3.3 <- data.frame(1:3, 7:9, 10:12)
df4.3 <- data.frame(1:4, 7:10, 10:13)
df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

Now here are some commands and their answers:

> median(df3.2)
[1] 2 8
> median(df4.2)
[1] 2.5 8.5
> median(df3.3)
  NA
1  7
2  8
3  9
> median(df4.3)
  NA
1  7
2  8
3  9
4 10
> median(df3.4)
[1]  8 11
> median(df4.4)
[1]  8.5 11.5
> median(df3.2[c(1,2,3),])
[1] 2 8
> median(df3.2[c(1,3,2),])
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA



The sessionInfo is below, but it looks
to me like the present behavior started
in 2.10.0.

Sometimes it gets the right answer.  I'd
be grateful to hear how it does that -- I
can't figure it out.

Under the current regime we can get numbers
that are correct, partially correct, or sort
of random (given the intention).

I claim that much better behavior would be
to always get exactly one of the following:

* a numeric answer (that is consistently correct)
* an error

I would think a method in analogy to
'mean.data.frame' would be a logical choice.
But I'm presuming there might be an argument
against that or 'median.data.frame' would already
exist.


> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] graphics  grDevices utils datasets  stats methods   base

other attached packages:
[1] xts_0.8-0 zoo_1.6-5

loaded via a namespace (and not attached):
[1] grid_2.13.0 lattice_0.19-23 tools_2.13.0

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



Re: [Rd] median and data frames

2011-04-27 Thread peter dalgaard

On Apr 27, 2011, at 19:44 , Patrick Burns wrote:

 I would think a method in analogy to
 'mean.data.frame' would be a logical choice.
 But I'm presuming there might be an argument
 against that or 'median.data.frame' would already
 exist.

Only if someone had a better plan. As you are probably well aware, what you are 
currently seeing is a rather exquisite mashup of methods getting applied to 
objects they shouldn't be applied to. Some curious effects are revealed, e.g. 
this little beauty:

> sort(df3.3)
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = 
decreasing)) : 
  undefined columns selected
> names(df3.3) <- NULL
> sort(df3.3)
  NA NA NA   NA   NA   NA   NA   NA   NA
1  1  7 10 NULL NULL NULL NULL NULL NULL
2  2  8 11 NA NA NA NA NA NA
3  3  9 12 NA NA NA NA NA NA
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Thread synchronization [Was: Interrupting C++ code execution]

2011-04-27 Thread Sean Robert McGuffee
Hi Simon,
That makes a lot of sense to me. I'll start reading about R's event loop
signaling. I'm not sure what the best method will be for me to flag the
completeness of a threaded process in my case. In abstract it seems that I
could get R's event loop to look for any type of flag. I think key for me in
this case will be identifying whether a particular file has been completely
produced or not. In principle I could put that type of info into the file
itself, but I think I could also make a temp file somewhere with its full
path and flag info about it. Then the event loop could look for a particular
pattern of temp file names. On the other hand, if I pass in that info when I
start the event loop, that might work too. Regarding the external pointer
idea, I was thinking about passing an object to R as a return value after
launching the thread, and then I might be able to access a pointer inside
that object to reference it from my thread. That could be a binary vector or
any type of object if I can figure out how to get to it from my thread.
Honestly, I don't know much about dynamic referencing of objects from
separate threads, but in principle memory is shared in this case. I'll let
you know if I come up with anything generic... Please keep me posted on your
package. Are any versions of it available yet? It didn't happen to come up
on my list of R packages. I haven't necessarily been maintaining an
up-to-date version of R though. I don't know if that influences the package
list it shows me.
Sean


On 4/26/11 8:51 PM, Simon Urbanek simon.urba...@r-project.org wrote:

 Sean,
 
 On Apr 26, 2011, at 5:06 PM, Sean Robert McGuffee wrote:
 
 I've been thinking about how to handle c++ threads that were started via Rcpp
 calls to some of my c++ libraries from R. My main obstacle is trying to make
 sure that users don't try to process files that are being generated by a
 thread before the thread finishes. One thing I am considering is having my
 threaded code return a class to R that contains a pointer that it remembers.
 Then maybe I could just change the value at that pointer when my thread
 finishes. Does that seem like a reasonable approach? I'm not completely sure
 if this is related to your issue or not, but it might be similar enough to be
 worth asking...
 
 It depends. For a simple flag it's actually much simpler than that - you
 can create a boolean vector (make sure you preserve it) and just update its
 value when it's done - you don't even need an external pointer for that (if
 you're careful).
 
 But the slight problem with that approach is rather that you don't have a way
 to tell R about the status change, so essentially you can only poll on the R
 side. A more proper way to deal with this is to use the event loop signaling
 to signal in R that the flag has changed. I'm working on a threads package
 that should help with that, but it's not complete yet (you can spawn threads
 from R and you can actually even synchronize them with R [so if the result is
 all you want it's there], but semaphores are not implemented yet  --- your
 inquiry should shift it further up on my todo stack ;)).
 
 Cheers,
 Simon



Re: [Rd] Thread synchronization [Was: Interrupting C++ code execution]

2011-04-27 Thread Simon Urbanek
Sean,

On Apr 27, 2011, at 3:21 PM, Sean Robert McGuffee wrote:

 Hi Simon,
 That makes a lot of sense to me. I'll start reading about R's event loop 
 signaling. I'm not sure what the best method will be for me to flag the 
 completeness of a threaded process in my case. In abstract it seems that I 
 could get R's event loop to look for any type of flag. I think key for me in 
 this case will be identifying whether a particular file has been completely 
 produced or not. In principle I could put that type of info into the file 
 itself, but I think I could also make a temp file somewhere with its full 
 path and flag info about it. Then the event loop could look for a particular 
 pattern of temp file names. On the other hand, if I pass in that info when I 
 start the event loop, that might work too.

Usually, the easiest on unix is to register a file handle as an input handler 
(addInputHandler) - in practice a pipe - one end is owned by the thread and the 
other is owned by R. Then all you need is to write anything on the thread's end 
and it will wake up R's event loop and let you handle the read on that end so 
you can do anything. You could even have multiple threads share this one pipe, 
since you could distinguish by payload which thread is calling. One example of 
this is the integrated HTTP server in R - see the Rhttpd sources (it also has a 
variant that works on Windows using synchronization via the OS event loop).


 Regarding the external pointer idea, I was thinking about passing an object 
 to R as a return value after launching the thread, and then I might be able 
 to access a pointer inside that object to reference it from my thread. That 
 could be a binary vector or any type of object if I can figure out how to get 
 to it from my thread. Honestly, I don't know much about dynamic referencing 
 of objects from separate threads, but in principle memory is shared in this 
 case. I'll let you know if I come up with anything generic... Please keep me 
 posted on your  package. Are any versions of it available yet?

Yes, it is not released yet since it's not quite complete, but here we go, at 
your own risk ;):

http://rforge.net/threads

It will work on all platforms, eventually, but currently only unix is 
supported. The idea is sort of taking the multicore paradigm (parallel + 
collect) but using threads (threadEval + yield). The documentation is currently 
non-existent, but I plan to write a vignette for it ... maybe later this week 
...

Cheers,
Simon


 It didn't happen to come up on my list of R packages. I haven't necessarily 
 been maintaining an up-to-date version of R though. I don't know if that 
 influences the package list it shows me.
 Sean
 
 
 On 4/26/11 8:51 PM, Simon Urbanek simon.urba...@r-project.org wrote:
 
 Sean,
 
 On Apr 26, 2011, at 5:06 PM, Sean Robert McGuffee wrote:
 
 I've been thinking about how to handle c++ threads that were started via 
 Rcpp
 calls to some of my c++ libraries from R. My main obstacle is trying to make
 sure that users don't try to process files that are being generated by a
 thread before the thread finishes. One thing I am considering is having my
 threaded code return a class to R that contains a pointer that it remembers.
 Then maybe I could just change the value at that pointer when my thread
 finishes. Does that seem like a reasonable approach? I'm not completely sure
 if this is related to your issue or not, but it might be similar enough to 
 be
 worth asking...
 
 It depends. For a simple flag it's actually much more simple than that - you
 can create a boolean vector (make sure you preserve it) and just update its
 value when it's done - you don't even need an external pointer for that (if
 your'e careful).
 
 But the slight problem with that approach is rather that you don't have a way
 to tell R about the status change, so essentially you can only poll on the R
 side. A more proper way to deal with this is to use the event loop signaling
 to signal in R that the flag has changed. I'm working on a threads package
 that should help with that, but it's not complete yet (you can spawn threads
 from R and you can actually even synchronize them with R [so if the result is
 all you want it's there], but semaphores are not implemented yet  --- your
 inquiry should shift it further up on my todo stack ;)).
 
 Cheers,
 Simon
 
 
 



Re: [Rd] How to create vignette.pdf for R-2.13.0?

2011-04-27 Thread cstrato

Dear Uwe,

As I have already mentioned, R CMD check gives the following output:

* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in 'inst/doc' ... WARNING
Package vignette(s) without corresponding PDF:
   APTvsXPS.Rnw
   xps.Rnw
   xpsClasses.Rnw
   xpsPreprocess.Rnw

* checking running R code from vignettes ... OK
* checking re-building of vignettes ... OK
* checking PDF version of manual ... OK

WARNING: There was 1 warning, see
  '/Volumes/CoreData/CRAN/xps.Rcheck/00check.log'
for details


Although the output says "checking PDF version of manual ... OK", I 
cannot find any result in xps.Rcheck.



When I run R64 CMD build xps I get:

* checking for file 'xps/DESCRIPTION' ... OK
* preparing 'xps':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to re-build vignettes
* creating vignettes ... OK
* cleaning src
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'xps_1.13.1.tar.gz'

However, the resulting file xps_1.13.1.tar.gz also does not contain any 
vignettes.


Best regards
Christian


On 4/27/11 10:16 AM, Uwe Ligges wrote:



On 26.04.2011 21:58, cstrato wrote:

Dear Duncan, dear Uwe,

Just now I have re-run everything, and today xps.Rnw can be converted to
a vignette w/o any problems using:
a, buildVignettes("xps", dir="/Volumes/CoreData/CRAN/xps", quiet=FALSE)
b, R CMD Sweave xps.Rnw

In both cases the vignette xps.pdf is created (maybe my Mac did not like
to work during the Easter holidays).

However, one issue remains:
R64 CMD check xps_1.13.1.tar.gz no longer creates any pdf files for
the vignettes.



Does it give an error or warning? It should check the code; R CMD build
creates the pdf files.

Uwe Ligges



Best regards
Christian


On 4/25/11 9:31 PM, Duncan Murdoch wrote:

On 25/04/2011 3:16 PM, cstrato wrote:

Thank you.

My problem seems to be that at the moment the problem can be seen only
on my Mac, since e.g. the Bioconductor servers have no problems
creating
the vignettes.


Then you are definitely the one in the best position to diagnose the
problem. Use the usual approach: simplify it by cutting out everything
that looks unrelated. Verify that the problem still exists, then cut
some more. Eventually you'll have isolated the error to a particular
small snippet of code, and then you can add print() statements, or use
trace(), or do whatever is necessary to see what's so special about your
system.

I suspect it will turn out to be an assumption in the code that is not
true on your system.

If the assumption is being made by code you wrote, then fix it. If it's
being made by R, let us know.

Duncan Murdoch



Best regards
Christian

On 4/25/11 8:55 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Duncan,

Thank you for your example; however, it is different since it does not
use x and y. What about print(x+y)?


Try it.



Sorry, I do not believe that there is a bug in my code, since:
1. it worked in all versions of R from R-2.6.0 through R-2.12.2;
2. the identical code works in the examples;
3. this code (or similar code) is the starting code which all users
of xps have to use, and there has never been a problem.


This might be a problem in R, or might be a problem in your code. As
far
as I know, it has only shown up in your code, so I'd guess that's
where
the problem is. In any case, you're the one in the best position to
isolate it and debug it.

If it turns out to be a problem in R, put together an example
illustrating the problem that doesn't involve your code, and I'll
take a
look.

Duncan Murdoch



Maybe the reason could be that my code has to import
- the CEL-files from the package dir
- the file SchemeTest3.root from the package dir
??

Best regards
Christian

On 4/25/11 8:00 PM, Duncan Murdoch wrote:

cstrato wrote:

Dear Uwe,

Your suggestion to look at the Sweave manual helped me to solve the
problem. It seems that in R-2.13.0 every chunk can use the code
from
the chunk before but not from an earlier chunk.

I'm either misreading what you wrote, or it's wrong. If I have this
in a
Sweave file:

<<>>=
x <- 1
@

<<>>=
y <- 2
@

<<>>=
print(x)
@

I will see the value of x getting printed, even though it came from
two
chunks earlier.

I think Uwe is right: there is some bug in the code you're running.
Sweave isn't the problem.

Duncan Murdoch


Concretely, the following does not work since chunk 5 needs the
code
from chunk 3 and 4:

###
### chunk number 3:
###
#line 126 "xps.Rnw"
celdir <- file.path(.path.package("xps"), "raw")

###
### chunk number 4:
###
#line 132 "xps.Rnw"
scheme.test3 <- root.scheme(file.path(.path.package("xps"),
"schemes",
"SchemeTest3.root"))

###
### chunk number 5:
###
#line 137 

Re: [Rd] median and data frames

2011-04-27 Thread Paul Johnson
On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
pbu...@pburns.seanet.com wrote:
 Here are some data frames:

 df3.2 <- data.frame(1:3, 7:9)
 df4.2 <- data.frame(1:4, 7:10)
 df3.3 <- data.frame(1:3, 7:9, 10:12)
 df4.3 <- data.frame(1:4, 7:10, 10:13)
 df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
 df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

 Now here are some commands and their answers:

 > median(df4.4)
 [1]  8.5 11.5
 > median(df3.2[c(1,2,3),])
 [1] 2 8
 > median(df3.2[c(1,3,2),])
 [1]  2 NA
 Warning message:
 In mean.default(X[[2L]], ...) :
   argument is not numeric or logical: returning NA



 The sessionInfo is below, but it looks
 to me like the present behavior started
 in 2.10.0.

 Sometimes it gets the right answer.  I'd
 be grateful to hear how it does that -- I
 can't figure it out.


Hello, Pat.

Nice poetry there!  I think I have an actual answer, as opposed to the
usual crap I spew.

I would agree if you said median.data.frame ought to be written to
work columnwise, similar to mean.data.frame.

apply and sapply always give the correct answer:

> apply(df3.3, 2, median)
  X1.3   X7.9 X10.12
     2      8     11

> apply(df3.2, 2, median)
X1.3 X7.9
   2    8

> apply(df3.2[c(1,3,2),], 2, median)
X1.3 X7.9
   2    8

mean.data.frame is now implemented as

mean.data.frame <- function(x, ...) sapply(x, mean, ...)

I think we would suggest this for medians:

??

median.data.frame <- function(x, ...) sapply(x, median, ...)

?

It works, see:

> median.data.frame(df3.2[c(1,3,2),])
X1.3 X7.9
   2    8

Would our next step be to enter that somewhere in R bugzilla? (I'm not
joking--I'm that naive).

I think I can explain why the current median works intermittently in
those cases you mention.  Give it a small set of pre-sorted data, all
is well.  median.default uses a sort function, and it is confused when
it is given a data.frame object rather than just a vector.
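[Editor's note: a minimal sketch, not part of the original mail, contrasting the two behaviours. `sort()` is only defined for atomic vectors, so handing it a data frame silently builds a corrupt object, while a column-wise median via `sapply()`, analogous to `mean.data.frame`, behaves as expected. The helper name `median_by_column` is mine.]

```r
## Editor's sketch: column-wise median works where sort()-on-a-data-frame fails.
df <- data.frame(a = c(1, 3, 2), b = c(7, 9, 8))

## sort() expects an atomic vector; on a data frame it produces the
## corrupt objects seen in the debug traces in this mail:
## sort(df, partial = 2:3)   # "corrupt data frame" warnings, NA/NULL columns

## Column-wise median via sapply(), as proposed above:
median_by_column <- function(x, ...) sapply(x, median, ...)
median_by_column(df)
## a b
## 2 8
```

This is order-independent: permuting the rows of `df` leaves the result unchanged, which is exactly where `median.default` currently goes wrong.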


I put a browser() at the top of median.default

> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at tmp#4: if (is.factor(x)) stop("need numeric data")
Browse[2]> n
debug at tmp#4: NULL
Browse[2]> n
debug at tmp#6: if (length(names(x))) names(x) <- NULL
Browse[2]> n
debug at tmp#6: names(x) <- NULL
Browse[2]> n
debug at tmp#8: if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x)))
    return(x[FALSE][NA])
Browse[2]> n
debug at tmp#8: if (any(is.na(x))) return(x[FALSE][NA])
Browse[2]> n
debug at tmp#8: NULL
Browse[2]> n
debug at tmp#12: n <- length(x)
Browse[2]> n
debug at tmp#13: if (n == 0L) return(x[FALSE][NA])
Browse[2]> n
debug at tmp#13: NULL
Browse[2]> n
debug at tmp#15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at tmp#16: if (n%%2L == 1L) sort(x, partial = half)[half] else mean(sort(x,
    partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at tmp#16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Note the sort there in step 16. I think that's what is killing us.

If you are lucky and give it a small data frame that is in order, like
df3.2, the sort doesn't produce gibberish. When I get to that point, I
will show you the sort's effect.

First, the case that works. I moved the browser() down, because I
got tired of looking at the same old not-yet-erroneous output.


> median(df3.2)
Called from: median.default(df3.2)
Browse[1]> n
debug at tmp#15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at tmp#16: if (n%%2L == 1L) sort(x, partial = half)[half] else mean(sort(x,
    partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at tmp#16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively, type

Browse[2]> sort(x, partial = half + 0L:1L)
  NA NA   NA   NA   NA   NA
1  1  7 NULL NULL NULL NULL
2  2  8   NA   NA   NA   NA
3  3  9   NA   NA   NA   NA
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

But it still gives you the right answer:

Browse[2]> n
[1] 2 8


But if you give it data out of order, the second column turns to NA,
and that causes doom.


> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at tmp#15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at tmp#16: if (n%%2L == 1L) sort(x, partial = half)[half] else mean(sort(x,
    partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at tmp#16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively:

Browse[2]> sort(x, partial = half + 0L:1L)
  NA   NA NA   NA   NA   NA
1  1 NULL  7 NULL NULL NULL
3  3   NA  9   NA   NA   NA
2  2   NA  8   NA   NA   NA
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Here's a larger test case. Note columns 1 and 3 turn to NULL

 df8.8 <-