Re: [R] fitting mixed models to censored data?

2007-04-23 Thread Douglas Grove
Hi Bill,

Thanks for your reply.  The first place I looked was in
the survival package since it can obviously handle 
censored data.  However, I don't have any particular desire
to restrict myself to standard survival models just because
I have some censoring.  Frailties appear to fit in nicely
with the types of models typically used with survival data,
but that's not the only kind of model I'd like to look at.
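
A minimal sketch of the survreg() approach Bill describes below, for assay data with a lower limit of detection. The simulated data, the variable names (id, time, y, observed) and the LLD value are assumptions made up for illustration. Note that this fits a marginal (population-averaged) Gaussian model with robust standard errors, not a mixed model:

```r
library(survival)

set.seed(1)
n_subj <- 30
dat <- data.frame(id = rep(seq_len(n_subj), each = 4),
                  time = rep(0:3, times = n_subj))

# Hypothetical assay values with a subject-level shift and measurement noise
subj_eff <- rnorm(n_subj, sd = 0.5)
dat$y_true <- 2 - 0.4 * dat$time + subj_eff[dat$id] + rnorm(nrow(dat), sd = 0.3)

# Values below the (assumed) LLD are only reported as "< LLD"
lld <- 1
dat$observed <- dat$y_true >= lld     # FALSE marks a left-censored value
dat$y <- pmax(dat$y_true, lld)        # censored rows carry the LLD itself

# Left-censored Gaussian regression; cluster(id) requests sandwich errors
fit <- survreg(Surv(y, observed, type = "left") ~ time + cluster(id),
               data = dat, dist = "gaussian")
summary(fit)
```

Rows with observed == FALSE are treated as left-censored at the reported value, so they contribute P(Y < LLD) to the likelihood rather than a density term.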

Thanks,
Doug


On Mon, 23 Apr 2007, Pikounis, Bill [CNTUS] wrote:

> Doug,
> In perhaps similar situations where there are clusters of measurements
> due to repeated time or space on an individual subject or experimental
> unit, I have used the survreg() function from the survival library.
>
> You can specify left, right, and/or interval censoring within a data set
> through Surv(), and so I have used left censoring for the LOD
> observations. I was just focused on marginal or population-averaged
> estimation, so the use of cluster() in the argument for survreg() and
> the robust option in survreg() to get sandwich error estimates was
> sufficient for me. Depending on your needs to evaluate random effects,
> frailty() in the survival package -- which can be used with survreg() or
> coxph() --- is another alternative to explore, I believe.
>
> Hope that helps,
> Bill
> Nonclinical Statistics, Centocor R & D
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Grove
>> Sent: Monday, April 23, 2007 2:29 PM
>> To: Bert Gunter
>> Cc: r-help@stat.math.ethz.ch
>> Subject: Re: [R] fitting mixed models to censored data?
>>
>>
>> Hi Bert,
>>
>> Yes, I am always wary when one software package offers something that
>> others do not.
>>
>> The censoring I'm faced with (at present) isn't as complicated
>> as with much 'survival' data.  I'm trying to analyze assay data
>> and have a lower limit of detection (LLD) to contend with.
>> Once the level of the analyte gets low enough it can't be
>> accurately quantitated, hence all that is reported is that
>> the level is less than some value (the LLD).
>>
>> So I'm not worried about all the complex assumptions that go along
>> with censoring in clinical trials, etc.
>>
>> Thanks,
>> Doug
>>
>>
>> On Mon, 23 Apr 2007, Bert Gunter wrote:
>>
>>> Douglas:
>>>
>>> AFAIK, this is a subject area of active current research. Diggle,
>>> Heagerty, Liang, and Zeger, 2002, (ANALYSIS OF LONGITUDINAL DATA) say on
>>> p.316: "An emerging consensus is that analysis of data with potentially
>>> informative dropouts necessarily involves assumptions which are
>>> difficult, or even impossible, to check from the observed data."  This
>>> was ca 1994, I believe, so I don't know whether this view is still held
>>> among experts (which I am not). But if it is, you may do well to be
>>> careful of whatever SAS does even if you do have to go running off to it.
>>>
>>> Cheers,
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Statistics
>>>
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Grove
>>> Sent: Monday, April 23, 2007 10:58 AM
>>> To: r-help@stat.math.ethz.ch
>>> Subject: [R] fitting mixed models to censored data?
>>>
>>> Hi,
>>>
>>> I'm trying to figure out if there are any packages allowing
>>> one to fit mixed models (or non-linear mixed models) to data
>>> that includes censoring.
>>>
>>> I've done some searching already on CRAN and through the mailing
>>> list archives, but haven't discovered anything.  Since I may well
>>> have done a poor job searching I thought I'd ask here prior to
>>> giving up.
>>>
>>> I understand that SAS's proc nlmixed can accommodate censoring
>>> (though proc mixed apparently can't), so if I can't find
>>> something available in R, I'll have to break down and use
>>> that.  Please, save me from having to use SAS!
>>>
>>> Thanks much,
>>> Doug
>>>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting mixed models to censored data?

2007-04-23 Thread Douglas Grove
Hi Bert,

Yes, I am always wary when one software package offers something that
others do not.

The censoring I'm faced with (at present) isn't as complicated
as with much 'survival' data.  I'm trying to analyze assay data
and have a lower limit of detection (LLD) to contend with. 
Once the level of the analyte gets low enough it can't be 
accurately quantitated, hence all that is reported is that 
the level is less than some value (the LLD).

So I'm not worried about all the complex assumptions that go along
with censoring in clinical trials, etc.

Thanks,
Doug


On Mon, 23 Apr 2007, Bert Gunter wrote:

> Douglas:
>
> AFAIK, this is a subject area of active current research. Diggle, Heagerty,
> Liang, and Zeger, 2002, (ANALYSIS OF LONGITUDINAL DATA) say on p.316: "An
> emerging consensus is that analysis of data with potentially informative
> dropouts necessarily involves assumptions which are difficult, or even
> impossible, to check from the observed data."  This was ca 1994, I believe,
> so I don't know whether this view is still held among experts (which I am
> not). But if it is, you may do well to be careful of whatever SAS does even
> if you do have to go running off to it.
>
> Cheers,
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Grove
> Sent: Monday, April 23, 2007 10:58 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] fitting mixed models to censored data?
>
> Hi,
>
> I'm trying to figure out if there are any packages allowing
> one to fit mixed models (or non-linear mixed models) to data
> that includes censoring.
>
> I've done some searching already on CRAN and through the mailing
> list archives, but haven't discovered anything.  Since I may well
> have done a poor job searching I thought I'd ask here prior to
> giving up.
>
> I understand that SAS's proc nlmixed can accommodate censoring
> (though proc mixed apparently can't), so if I can't find
> something available in R, I'll have to break down and use
> that.  Please, save me from having to use SAS!
>
> Thanks much,
> Doug
>


[R] fitting mixed models to censored data?

2007-04-23 Thread Douglas Grove
Hi,

I'm trying to figure out if there are any packages allowing
one to fit mixed models (or non-linear mixed models) to data
that includes censoring.

I've done some searching already on CRAN and through the mailing
list archives, but haven't discovered anything.  Since I may well
have done a poor job searching I thought I'd ask here prior to
giving up.

I understand that SAS's proc nlmixed can accommodate censoring
(though proc mixed apparently can't), so if I can't find 
something available in R, I'll have to break down and use
that.  Please, save me from having to use SAS!

Thanks much,
Doug



Re: [R] cannot turn some columns in a data frame into factors

2006-05-11 Thread Douglas Grove
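The assignment inside the function only changes a local copy of 'df',
so you need to create the converted columns and assign them back to 'df'
at the top level.

So you'd do something like this:

# convert the named columns to factors and store them back in df
df[factors] <- lapply(df[factors], factor)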
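
A small sketch of why the original loop had no lasting effect, and of assigning the converted columns back at the top level. The data frame and column names here are made up for illustration:

```r
df <- data.frame(Month = c(1, 2, 2, 3), Sales = c(10, 20, 15, 30))
factors <- c("Month")

# Assigning inside the function modifies a local copy of df only
invisible(sapply(factors, function(name) {
  df[[name]] <- factor(df[[name]])
  is.factor(df[[name]])              # TRUE, but only inside the function
}))
inside_only <- is.factor(df$Month)   # the global df is unchanged

# Assigning the result back at the top level makes the change stick
df[factors] <- lapply(df[factors], factor)
sapply(df, is.factor)
```

A name in 'factors' that is not a column of 'df' makes df[factors] fail with an "undefined columns selected" error, which covers the explicit stop() in the original loop.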
 

Doug






On Thu, 11 May 2006, Sam Steingold wrote:

> > * jim holtman <[EMAIL PROTECTED]> [2006-05-11 12:27:39 -0400]:
> >
> > try '<<-' as the assignment to make it global.
> >
> >  df[[pos]] <<- factor(df[[pos]])
> 
> nothing changed -- I observe the exact same behaviour:
> 
> Month ( 1 ): TRUE 
> factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
> 
> 
> > On 5/11/06, Sam Steingold <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >> I have a data frame df and a list of names of columns that I want to
> >> turn into factors:
> >>
> >> df.names <- attr(df,"names")
> >> sapply(factors, function (name) {
> >>   pos <- match(name,df.names)
> >>   if (is.na(pos)) stop(paste(name,": no such column\n"))
> >>   df[[pos]] <- factor(df[[pos]])
> >>   cat(name,"(",pos,"):",is.factor(df[[pos]]),"\n")
> >> })
> >> cat("factors:",sapply(df,is.factor),"\n")
> >>
> >> the output is:
> >>
> >>
> >> Month ( 1 ): TRUE
> >> factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>
> >>
> >> i.e., there is a column named "Month" (the 1st column), and it is indeed
> >> turned into a factor inside sapply(), but after that it is numerical
> >> again!
> >>
> >> what am I doing wrong?
> 
> -- 
> Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 5 (Bordeaux)
> http://pmw.org.il http://ffii.org http://memri.org http://palestinefacts.org
> http://truepeace.org http://mideasttruth.com http://dhimmi.com
> If you're being passed on the right, you're in the wrong lane.
> 


Re: [R] is there a formatted output in R?

2006-03-10 Thread Douglas Grove
You really need to learn how to do some searching, as you seem to
be constantly asking questions you can answer yourself

help.search("sprintf")
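
For reference, a short sketch of sprintf() in R. The value of 'myresult' is an assumption chosen for illustration:

```r
myresult <- pi

# sprintf() returns a character string; it does not print by itself
msg <- sprintf("the correct result is %3.4f", myresult)

# cat() writes the string and interprets "\n" as a newline,
# whereas print() would show the quotes and the escape sequence
cat(msg, "\n")
```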


On Fri, 10 Mar 2006, Michael wrote:

> something like "sprintf" in C?
> 
> so I can do:
> 
> print(sprintf("the correct result is %3.4f\n", myresult));
> 
> ---
> 
> Also, I am desperately looking for a "clear console screen"  function in
> R...
> 
> thanks a lot!
> 


Re: [R] can I do this with read.table??

2006-01-26 Thread Douglas Grove
I did read the help page, very carefully.   

The colClasses argument can be used if I want
to stop and look through every data set to see
which column I need to protect.  But that's what I 
said that I don't want to do.

As for 'as.is', I wish it did what you suggest, but
it doesn't.  If one reads carefully, as.is protects
a character vector from conversion to a *factor*,
but not from conversion to numeric/logical.
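
One possible workaround, sketched under assumptions: look at the first data line of the file to see which fields were written with quotes, and build colClasses from that. This is only an illustration, not a built-in read.table feature, and it assumes a tab-separated file with a header and no tabs inside quoted fields:

```r
A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))
tmp <- tempfile(fileext = ".txt")
write.table(A, tmp, sep = "\t", row.names = FALSE, quote = TRUE)

# Line 1 is the header, line 2 the first data row
first_row <- readLines(tmp, n = 2)[2]
fields <- strsplit(first_row, "\t", fixed = TRUE)[[1]]
quoted <- startsWith(fields, "\"")

# NA lets read.table pick the type; "character" blocks any conversion
cc <- ifelse(quoted, "character", NA)
B <- read.table(tmp, header = TRUE, sep = "\t", colClasses = cc)
str(B)
```

The column 'b' comes back as plain character rather than with the "AsIs" class, so B is not identical to A, but the leading zeros survive.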

Doug




On Sun, 26 Feb 2006, Kjetil Brinchmann Halvorsen wrote:

> Douglas Grove wrote:
> > Hi,
> > 
> > I'm trying to figure out if there's an automated way to get
> > read.table to read in my data and *not* convert the character
> > columns into anything, just leave them alone.  What I'm referring
> 
> ?Did you read the help page?
> What about argument as.is=TRUE?
> See also argument colClasses
> 
> Kjetil
> 
> > to as 'character columns' are columns in the data that are quoted.
> > For columns of alphabetic strings (that aren't TRUE or FALSE) I can
> > suppress conversion to factor with as.is=TRUE, but what I'd like to
> > stop is the conversion of quoted numbers of the form "01","02",..., into
> > numeric form.
> > 
> > By an 'automated way', I mean one that does not involve me having
> > to know which columns in the data are the ones I want kept as
> > they are.
> > 
> > This doesn't seem like an unreasonable thing to want to do.
> > After all, say I've got the data.frame:
> > 
> > A <- data.frame(a=1:3, b=I(c("01","02","03")))
> > 
> > I can export this to a text file with the simple command
> > 
> > write.table(A, "A.txt", sep="\t", row.names=FALSE, quote=TRUE)
> > 
> > but I cannot find an equally simple mechanism for reading this
> > data back in from A.txt that allows me to reconstruct my
> > data.frame 'A'.  Is this an unreasonable thing to expect?
> > 
> > Thanks,
> > Doug
> > 


[R] can I do this with read.table??

2006-01-26 Thread Douglas Grove
Hi,

I'm trying to figure out if there's an automated way to get
read.table to read in my data and *not* convert the character
columns into anything, just leave them alone.  What I'm referring
to as 'character columns' are columns in the data that are quoted.
For columns of alphabetic strings (that aren't TRUE or FALSE) I can
suppress conversion to factor with as.is=TRUE, but what I'd like to
stop is the conversion of quoted numbers of the form "01","02",..., 
into numeric form.
 
By an 'automated way', I mean one that does not involve me having
to know which columns in the data are the ones I want kept as
they are.

This doesn't seem like an unreasonable thing to want to do.
After all, say I've got the data.frame:

  A <- data.frame(a=1:3, b=I(c("01","02","03")))

I can export this to a text file with the simple command

  write.table(A, "A.txt", sep="\t", row.names=FALSE, quote=TRUE)

but I cannot find an equally simple mechanism for reading this
data back in from A.txt that allows me to reconstruct my
data.frame 'A'.  Is this an unreasonable thing to expect?

Thanks,
Doug



Re: [R] " 'x' must be numeric"

2006-01-20 Thread Douglas Grove
It's much more helpful if you show the actual command you used.

Presumably you have a data frame 'd' and you've done

hist(d), and 'hist' has complained because d is not numeric,
d is a data frame that *contains* a numeric vector.

You need to give hist() that numeric vector, which you can do
in many ways, including: d$V1, d[,"V1"] and d[,1]
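
A short sketch of the difference, using a made-up one-column data frame like the one described below:

```r
d <- data.frame(V1 = c(0.6344, 0.4516, 0.0968, 0.7634, 0.7957))

is.numeric(d)        # FALSE: d is a list of columns
is.numeric(d$V1)     # TRUE: the column itself is a numeric vector

# All three forms extract the same numeric vector
same <- identical(d$V1, d[, "V1"]) && identical(d$V1, d[, 1])

# plot = FALSE only computes the bins; drop it to draw the histogram
h <- hist(d$V1, plot = FALSE)
```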

Doug


On Fri, 20 Jan 2006, Naiara S. Pinto wrote:

> Hello all,
> 
> I am importing data from a txt file and trying to get a histogram, but I get the
> message: "Error in hist: 'x' must be numeric".
> When I use mode() R returns "list".
> However when I use str() I get:
> `data.frame':   456 obs. of  1 variable:
>  $ V1: num  0.6344 0.4516 0.0968 0.7634 0.7957 ...
> My file consists of one column only (no headers) and I can't figure out
> why I am getting this error message. Why does this happen?
> 
> Thanks!
> 
> Naiara.
> 
> 
> Naiara S. Pinto
> Ecology, Evolution and Behavior
> 1 University Station A6700
> Austin, TX, 78712
> 


Re: [R] Selecting data frame components by name - do you know a shorter way?

2006-01-20 Thread Douglas Grove
So you want to create a subset of a data frame?
with components "name1" "name2" "name3" ... 

dframe[, c("name1","name2","name3",...)]   

will do that
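
A small sketch with a made-up data frame. The second part assumes the names arrive as one space-separated string, as in the question below:

```r
dframe <- data.frame(name1 = 1:3,
                     name2 = c("a", "b", "c"),
                     name3 = c(TRUE, FALSE, TRUE),
                     other = 4:6)

# Select columns by a character vector of names
sub1 <- dframe[, c("name1", "name3")]

# If the names come as a single string, split it first
wanted <- strsplit("name1 name2 name3", " ", fixed = TRUE)[[1]]
sub2 <- dframe[, wanted]

# drop = FALSE keeps a data frame even when only one column is selected
sub3 <- dframe[, "name2", drop = FALSE]
```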

Doug



On Fri, 20 Jan 2006, Michael Reinecke wrote:

> Hi! I suspect there must be an easy way to access components of a data frame
> by name, i.e. the input should look like "name1 name2 name3 ..." and the
> output be a data frame of those components with the corresponding names.
> I've been trying for hours, but only found the long way to do it (which is not
> feasible, since I have lots of components to select):
> 
>  
> 
> dframe[names(dframe)=="name1" | dframe=="name2" | dframe=="name3"]
> 
>  
> 
> Do you know a shortcut?
> 
>  
> 
> Michael
> 
> 

Re: [R] discovery (was: data.frame to character)

2005-06-10 Thread Douglas Grove
Help pages are useful, you should try them

e.g. ?pi or ?LETTERS


> How can one discover or list all available built-in objects?

> On Jun 10, 2005, at 7:23 AM, Muhammad Subianto wrote:
> >> L3 <- LETTERS[1:3]
> >>  L10 <- LETTERS[1:10]

> LETTERS is apparently a built-in character vector.  ls() and objects 
> () only lists the ones I've created.  Is there a function that lists  
> all available built-in objects?

> For example, "pi" is another built-in, but "e" is not.  A means to  
> list them would be nice.

> Regards,
> - Robert
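
For the question above, a minimal sketch of listing what the base package provides. Filtering out functions to leave the "constant-like" objects is just one way to narrow the list:

```r
# Attached packages and environments, in search order
search()

# Every visible object in the base package
base_objects <- ls("package:base")

# Keep only non-function objects, such as pi, LETTERS and month.name
is_fun <- vapply(base_objects,
                 function(nm) is.function(get(nm, "package:base")),
                 logical(1))
constants <- base_objects[!is_fun]
constants
```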



Re: [R] problem with dir() in R-2.1.0?

2005-04-25 Thread Douglas Grove
The new version of R has begun enforcing rules on regular expressions.
Your pattern is not a valid regular expression, hence it no longer works.
The meaning of '*' is with respect to a preceding character, hence it is
ill-defined without one.  
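
A short sketch of two patterns that do what the shell-style "*.RData" was meant to do. The temporary directory and file names are made up for illustration:

```r
tmp <- tempfile("rdata-demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("fit.RData", "notes.txt", "RData.csv"))))

# A proper regular expression: a literal dot, then "RData" at the end
by_regex <- dir(tmp, pattern = "\\.RData$")

# Or translate the wildcard into a regular expression with glob2rx()
rx <- glob2rx("*.RData")
by_glob <- dir(tmp, pattern = rx)
```

Here glob2rx() turns the wildcard into "^.*\\.RData$", so the two calls select the same files.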



On Mon, 25 Apr 2005, Ye, Bin wrote:

> Hi,
> 
> I always used dir(pattern="*.RData") in all the earlier versions of R (1.8,
> 1.9, 2.0.1).
> 
> Error message is as below:
> Error in list.files(path, pattern, all.files, full.names, recursive) :
> invalid 'pattern' regular expression
> 
> Does anyone have an idea what's going on? How should I define the pattern I 
> need in R-2.1.0?
> 
> Thanks!
> 
> 
> Bin
> 


Re: [R] How about a mascot for R?

2004-12-04 Thread Douglas Grove
When I think of New Zealand I think "Rabbit" :)

How 'bout something like the Monty Python rabbit from 
"the Holy Grail" ("nasty pointy teeth...", "look at the bones!")

Doug



Re: [R] inverse function of order()

2004-10-04 Thread Douglas Grove
An alternate method that saves having to use order() again is

r[o] <- r
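
A quick check of both ways to undo the ordering, using the setup from the question below. The seed is an arbitrary choice so the example is repeatable:

```r
set.seed(1)
d <- sample(10:100, 9)
o <- order(d)
r <- d[o]

# Method 1: order() applied to o gives the inverse permutation
d1 <- r[order(o)]

# Method 2: assign through the permutation, with no second sort
d2 <- r
d2[o] <- r
```

The second form works because r[i] holds d[o[i]], so writing r[i] into position o[i] puts every value back where it started.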

Doug




On Mon, 2004-10-04 at 15:21, Wolfram Fischer wrote:
> I have:
> 
>  d <- sample(10:100, 9)
>  o <- order(d)
>  r <- d[o]
> 
> How can I get d (in the original order), knowing only r and o?
> 
> Thanks - Wolfram
> 


Re: [R] alternate rank method

2004-06-29 Thread Douglas Grove
I agree.  These are obvious extensions to the options provided
now by rank.  I didn't suggest this as I am not a contributor and
don't feel comfortable asking others to do more work :)

Thanks,
Doug


On Tue, 29 Jun 2004, Martin Maechler wrote:

> >>>>> "Torsten" == Torsten Hothorn <[EMAIL PROTECTED]>
> >>>>> on Mon, 28 Jun 2004 10:59:26 +0200 (CEST) writes:
> 
> Torsten> On Fri, 25 Jun 2004, Douglas Grove wrote:
> 
> >> I should have specified an additional constraint:
> >> 
> >> I'm going to need to use this repeatedly on large vectors
> >> (length 10^6), so something efficient is needed.
> >> 
> 
> Torsten> give function `irank' in package `exactRankTests' a
> Torsten> try.
> 
> As an answer to Torsten (who got it already orally) and Gabor's
> original tricky suggestions:
> 
> I strongly believe this should happen in the same C code on
> which R's base rank() function works and already implements the
> *averaging* of ties.
> Doing the analog of changing "average(..)" to min(..) or max(..)
> shouldn't be hard and certainly will be more efficient than the
> "workarounds" posted here.
> 
> Patches welcome...
> since otherwise I'm not sure I'll get there in time.
> 
> Martin
>



Re: [R] ties in runif() output

2004-06-26 Thread Douglas Grove
On Sat, 26 Jun 2004, Prof Brian Ripley wrote:

> On Fri, 25 Jun 2004, Douglas Grove wrote:
> 
> > I get ties in output from runif() when I generate as few as 10^5
> > variates and get quite a lot when I generate 10^6.  Is this 
> > expected??  
> 
> It should have been.
> 
> > I haven't seen any duplication with rnorm(10^6), but
> > see varying amounts of duplication using rexp(), rbeta() and
> > rgamma().  I would have thought that there'd be enough precision
> > that one wouldn't get ties until generating samples larger than this..
> 
> Did you do the calculations?  Please do so. There are about 2e9 possible
> values of the standard generators.

I know little about the limitations of random number generation 
and didn't realize that only 2e9 values were obtainable.
I could have done the math myself had I known
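
The math can be sketched as follows. The formula n(n-1)/(2N) is the usual approximation for the expected number of tied pairs among n draws from N equally likely values. N = 2e9 is the figure quoted in this thread, and N = 2^32 is an extra assumption (a generator producing 32 random bits) shown for comparison:

```r
# Expected number of tied pairs among n draws from N equally likely values
expected_ties <- function(n, N) n * (n - 1) / (2 * N)

N_thread <- 2e9      # "about 2e9 possible values", as quoted in the thread
N_32bit  <- 2^32     # assumption: a generator that yields 32 random bits

ties <- rbind(
  thread = expected_ties(c(1e5, 1e6), N_thread),
  bits32 = expected_ties(c(1e5, 1e6), N_32bit)
)
colnames(ties) <- c("n = 1e5", "n = 1e6")
ties

# Sample size at which at least one tie becomes more likely than not
qbirthday(classes = N_thread)
```

Either way, a handful of ties at 10^5 draws and on the order of a hundred or more at 10^6 is what the approximation predicts.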

Thanks very much for your help,
Doug


> > qbirthday(classes=2e9)
> [1] 52655
> 
> Statisticians ought to know about the birthday problem!
> 
> (rnorm is different because the default generator uses two uniforms, 
> deliberately to increase the precision.)
> 
> > > set.seed(222)
> > > sum(duplicated(runif(10^5)))
> > [1] 4
> 
> That's unusually high, BTW.
> 
> > > sum(duplicated(runif(10^6)))
> > [1] 140
> 
> -- 
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 


[R] ties in runif() output

2004-06-25 Thread Douglas Grove
I get ties in output from runif() when I generate as few as 10^5
variates and get quite a lot when I generate 10^6.  Is this 
expected??  I haven't seen any duplication with rnorm(10^6), but
see varying amounts of duplication using rexp(), rbeta() and
rgamma().  I would have thought that there'd be enough precision
that one wouldn't get ties until generating samples larger than this..


> set.seed(222)
> sum(duplicated(runif(10^5)))
[1] 4

> sum(duplicated(runif(10^6)))
[1] 140


platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Patched
major    1
minor    9.0
year     2004
month    04
day      13
language R


Thanks,
Doug Grove



Re: [R] alternate rank method

2004-06-25 Thread Douglas Grove
I should have specified an additional constraint:

I'm going to need to use this repeatedly on large
vectors (length 10^6), so something efficient is
needed.
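
For readers on a current version of R: rank() now has a ties.method argument, and "max" gives the ranking asked for here without building an n-by-n matrix. The sketch below compares it with the outer() suggestion quoted below on the example vector:

```r
v <- c(1, 2, 3, 3, 3, 4, 5, 5, 6, 7)

# Each tie group receives the largest rank in the group
rv_rank <- rank(v, ties.method = "max")

# The outer() approach counts, for each element, how many values are <= it;
# it needs length(v)^2 comparisons, so it is impractical for 10^6 elements
rv_outer <- rowSums(outer(v, v, ">="))
```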


On Fri, 25 Jun 2004, Sundar Dorai-Raj wrote:

> Douglas Grove wrote:
> 
> > Hi,
> > 
> > I'm wondering if anyone can point me to a function that will
> > allow me to do a ranking that treats ties differently than
> > rank() provides for?
> > 
> > I'd like a method that will assign to the elements of each 
> > tie group the largest rank. 
> > 
> > An example:  
> > 
> > For the vector 'v', I'd like the method to return 'rv'
> > 
> >  v:  1 2 3 3 3 4 5 5 6 7
> > rv:  1 2 5 5 5 6 8 8 9 10
> > 
> > 
> > Thanks,
> > Doug Grove
> > 
> 
> How about
> 
> rv <- rowSums(outer(v, v, ">="))
> 
> Adapted from Prof. Ripley's reply in the following thread:
> 
> http://finzi.psych.upenn.edu/R/Rhelp02/archive/31993.html
> 
> HTH,
> 
> --sundar
> 


[R] alternate rank method

2004-06-25 Thread Douglas Grove
Hi,

I'm wondering if anyone can point me to a function that will
allow me to do a ranking that treats ties differently than
rank() provides for?

I'd like a method that will assign to the elements of each 
tie group the largest rank. 

An example:  

For the vector 'v', I'd like the method to return 'rv'

 v:  1 2 3 3 3 4 5 5 6 7
rv:  1 2 5 5 5 6 8 8 9 10


Thanks,
Doug Grove



Re: [R] predict function

2004-02-13 Thread Douglas Grove
You can't use this anymore.  The function predict() has a method
for loess objects, but there is no longer an available function
called "predict.loess".   So just replace "predict.loess"
with "predict".
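
A minimal sketch of the same normalisation step using the generic. The simulated data frame 'spots' stands in for the array data below and is purely an assumption. In current versions of R, loess() lives in the stats package, so no library() call is needed:

```r
set.seed(42)
spots <- data.frame(x = runif(200), y = runif(200))
spots$logratio <- 0.5 * spots$x - 0.3 * spots$y + rnorm(200, sd = 0.1)

# Fit a 2-d loess surface over the coordinates
loess2d <- loess(logratio ~ x + y, data = spots)

# predict() dispatches on the class of its first argument ("loess")
spots$logratio2DLoeNorm <- spots$logratio - predict(loess2d, spots)
summary(spots$logratio2DLoeNorm)
```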


On Fri, 13 Feb 2004, Thomas Jagoe wrote:

> I am using R to do a loess normalisation procedure.
> In 1.5.1 I used the following commands to normalise the variable "logratio",
> over a 2d surface (defined by coordinates x and y):
> 
> > array <- read.table("121203B_QCnew.txt", header=T, sep="\t")
> > array$logs555<-log(array$s555)/log(2)
> > array$logs647<-log(array$s647)/log(2)
> > array$logratio<-array$logs555-array$logs647
> > array$logav<-(array$logs555+array$logs647)/2
> > library(modreg)
> > loess2d<-loess(logratio~x+y,data=array)
> > array$logratio2DLoeNorm <-array$logratio - predict.loess(loess2d, array)
> 
> However in 1.8.1 all goes well until the last step when I get an error:
> 
> Error: couldn't find function "predict.loess"
> 
> Can anyone help ?
> 
> Thomas
> 


Re: [R] Windows Memory Issues

2003-12-08 Thread Douglas Grove
On Sat, 6 Dec 2003, Prof Brian Ripley wrote:

> I think you misunderstand how R uses memory.  gc() does not free up all 
> the memory used for the objects it frees, and repeated calls will free 
> more.  Don't speculate about how memory management works: do your 
> homework!

Are you saying that consecutive calls to gc() will free more memory than
a single call, or am I misunderstanding?   Reading ?gc and ?Memory I don't
see anything about this mentioned.  Where should I be looking to find 
more comprehensive info on R's memory management??  I'm not writing any
packages, just would like to have a better handle on efficiently using
memory as it is usually the limiting factor with R.  FYI, I'm running
R1.8.1 and RedHat9 on a P4 with 2GB of RAM in case there is any platform
specific info that may be applicable.
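
A small sketch for watching what gc() itself reports around creating and removing a large object. It does not settle whether repeated calls free more memory, and the matrix size is an arbitrary choice for illustration:

```r
x <- matrix(rnorm(1e6), nrow = 1000)      # roughly 8 MB of doubles
size_mb <- as.numeric(object.size(x)) / 2^20

before <- gc()                            # x is still referenced here
rm(x)
after <- gc()                             # x is now collectable

# Vector cells in use, before and after removing x
c(before = before["Vcells", "used"], after = after["Vcells", "used"])
```

What the operating system shows for the process can differ from these figures, since freed memory is not necessarily handed back to the OS right away.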

Thanks,

Doug Grove
Statistical Research Associate
Fred Hutchinson Cancer Research Center




 
> In any case, you are using an outdated version of R, and your first
> course of action should be to compile up R-devel and try that, as there 
> have been improvements to memory management under Windows.  You could also 
> try compiling using the native malloc (and that *is* described in the 
> INSTALL file) as that has different compromises.
> 
> 
> On Sat, 6 Dec 2003, Richard Pugh wrote:
> 
> > Hi all,
> >  
> > I am currently building an application based on R 1.7.1 (+ compiled
> > C/C++ code + MySql + VB).  I am building this application to work on 2
> > different platforms (Windows XP Professional (500mb memory) and Windows
> > NT 4.0 with service pack 6 (1gb memory)).  This is a very memory
> > intensive application performing sophisticated operations on "large"
> > matrices (typically 5000x1500 matrices).
> >  
> > I have run into some issues regarding the way R handles its memory,
> > especially on NT.  In particular, R does not seem able to recollect some
> > of the memory used following the creation and manipulation of large data
> > objects.  For example, I have a function which receives a (large)
> > numeric matrix, matches against more data (maybe imported from MySql)
> > and returns a large list structure for further analysis.  A typical call
> > may look like this .
> >  
> > > myInputData <- matrix(sample(1:100, 750, T), nrow=5000)
> > > myPortfolio <- createPortfolio(myInputData)
> >  
> > It seems I can only repeat this code process 2/3 times before I have to
> > restart R (to get the memory back).  I use the same object names
> > (myInputData and myPortfolio) each time, so I am not creating more large
> > objects ..
> >  
> > I think the problems I have are illustrated with the following example
> > from a small R session .
> >  
> > > # Memory usage for Rgui process = 19,800k
> > > testData <- matrix(rnorm(1000), 1000) # Create big matrix
> > > # Memory usage for Rgui process = 254,550k
> > > rm(testData)
> > > # Memory usage for Rgui process = 254,550k
> > > gc()
> >          used (Mb) gc trigger  (Mb)
> > Ncells 369277  9.9     667722  17.9
> > Vcells  87650  0.7   24286664 185.3
> > > # Memory usage for Rgui process = 20,200k
> >  
> > In the above code, R cannot recollect all memory used, so the memory
> > usage increases from 19.8k to 20.2.  However, the following example is
> > more typical of the environments I use .
> >  
> > > # Memory 128,100k
> > > myTestData <- matrix(rnorm(1000), 1000)
> > > # Memory 357,272k
> > > rm(myTestData)
> > > # Memory 357,272k
> > > gc()
> >           used (Mb) gc trigger  (Mb)
> > Ncells  478197 12.8     818163  21.9
> > Vcells 9309525 71.1   31670210 241.7
> > > # Memory 279,152k
> >  
> > Here, the memory usage increases from 128.1k to 279.1k
> >  
> > Could anyone point out what I could do to rectify this (if anything), or
> > generally what strategy I could take to improve this?
> >  
> > Many thanks,
> > Rich.
> >  
> > Mango Solutions
> > Tel : (01628) 418134
> > Mob : (07967) 808091
> >  
> > 
> 
> -- 
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
>



Re: [R] A suggestion regarding multiple replies

2003-11-14 Thread Douglas Grove
On Fri, 14 Nov 2003 [EMAIL PROTECTED] wrote:

> I was wondering if it is time to adopt a strategy a-la Splus help whereby 
> people reply to the author and the author summarizes all the replies?


That might be a bit extreme, but it would be nice if people didn't
reply to the list (only to the authors) for very basic questions.

Most of us already know how to e.g. find the position of the 
largest element in a vector.

Doug



Re: [R] Kmeans again

2003-06-06 Thread Douglas Grove
> I'm sorry to insist but I still think there is something wrong with the function 
> kmeans. For instance, let's try the same small example:
>  
> > dados<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
> 
> I will choose observations 3 and 4 for initial centers and just one iteration. The 
> results are
>  
> > A<-kmeans(dados,dados[c(3,4),],1)
> > A
> $cluster
> [1] 1 1 1 1 2 2
> $centers
>    [,1] [,2]
> 1 0.875 2.75
> 2 8.000 2.50
> $withinss
> [1] 38.9375  6.5000
> $size
> [1] 4 2
>  
> If I do it by hand, after one iteration, the results are
>  
> $cluster
> [1] 1 2 1 2 1 2
>  
> So I think that something is wrong with the function kmeans; probably the initial 
> centers given by the user are not being taken into account.


Andy Liaw already gave an example where he specified two different sets of
starting values and kmeans gave different results after 1 iteration, so
clearly your hypothesis is incorrect.

Either your calculations are wrong or you are using the wrong
formulae.  It is very doubtful that anything is wrong with kmeans.
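
For anyone who wants to check the arithmetic, here is a minimal sketch of
one assignment-and-update step done by hand.  It assumes plain squared
Euclidean distance and the textbook (Lloyd-style) update, which is not
necessarily the algorithm kmeans itself uses; the names d2, cluster and
new_centers are just illustrative.

```r
# Data and starting centers from the example quoted above
dados   <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)
centers <- dados[c(3, 4), ]

# Squared Euclidean distance from every observation to each center
# (one column of d2 per center)
d2 <- sapply(seq_len(nrow(centers)), function(k) {
  rowSums(sweep(dados, 2, centers[k, ])^2)
})

# Assignment step: each observation goes to its nearest center
cluster <- apply(d2, 1, which.min)

# Update step: each new center is the mean of its assigned observations
new_centers <- rowsum(dados, cluster) / as.vector(table(cluster))

print(cluster)
print(new_centers)
```

If the by-hand result differs from what kmeans returns, that by itself
only shows that the two procedures differ, not that the starting centers
were ignored.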

Doug Grove



[R] removing leading/trailing blanks

2003-02-19 Thread Douglas Grove
Hi,

What's the best way of dropping leading or trailing
blanks from a character string?  

The only thing I can think of is using sub() to replace
blanks with null strings, but I don't know if there is
a better way (I also don't know how to represent the
trailing blank in a regular expression).
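
For reference, a minimal sketch of the regular-expression approach,
assuming POSIX character classes are acceptable: "^" anchors the pattern
at the start of the string (leading blanks) and "$" anchors it at the end
(trailing blanks).  The vector x is made-up example data.

```r
x <- c("  leading", "trailing   ", "  both  ", "none")

# Remove leading blanks only
no_lead  <- sub("^[[:space:]]+", "", x)

# Remove trailing blanks only
no_trail <- sub("[[:space:]]+$", "", x)

# Remove both in one pass; gsub() is needed because up to two
# separate matches (start and end) can occur in a single string
trimmed  <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)

print(trimmed)
```

Recent versions of R (3.2.0 and later) also provide trimws(), which does
the same job.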

Thanks,
Doug Grove




Re: [R] dataframe subsetting behaviour

2003-01-22 Thread Douglas Grove
> Douglas Grove <[EMAIL PROTECTED]> writes:
> 
> > Hi,
> > 
> > I'm trying to understand a behaviour that I have encountered
> > and can't fathom.
> > 
> > 
> > Here's some code I will use to illustrate the behaviour:
> > 
> > # start with some data frame "a" having some named columns
> > a <- data.frame(a=rep(1,3),c=rep(2,3),d=rep(3,3),e=rep(4,3))
> > 
> > # create a subset of the original data frame, but include a
> > # name "b" that is not present in my original data frame
> > b <- a[,c("a","b","c")]
> > 
> > 
> > ## Up until now no errors are issued, but the following commands
> > ## will give the error shown:
> > 
> > b[1,]  ## "Error in x[[j]] : subscript out of bounds"
> > b[1,2] ## "Error in "names<-.default"(*tmp*, value = cols) : 
> >        ##  names attribute must be the same length as the vector"
> > 
> > 
> > Can anyone explain to me the meaning of these error messages in terms
> > of what R is actually doing?  These error messages had me baffled and 
> > it took me hours to track down that the source of the error was an 
> > incorrect column name in my data frame subsetting.
> 
> Looks like a (semi-)bug. Indexing outside of the data frame creates a
> "column" which is really the single value NULL, e.g. 
> 
> > dput(a[,4:5])
> structure(list(e = c(4, 4, 4), "NA" = NULL), .Names = c("e",
> NA), row.names = c("1", "2", "3"), class = "data.frame")
> 
> This will print because the format.data.frame called inside
> print.data.frame will recycle the NULL and give you
> 
> > a[,4:5]
>   e   NA
> 1 4 NULL
> 2 4 NULL
> 3 4 NULL
> 
> However, it confuses the h*ck out of "[.data.frame"
> 
> > (a[,4:5])[2]
> Error in "[.data.frame"((a[, 4:5]), 2) : undefined columns selected
> > (a[,4:5])[,2]
> NULL
> > (a[,4:5])[,1]
> [1] 4 4 4
> 
> and also the examples you found. However, the main issue is that you
> have managed to construct a corrupt data frame. So indexing outside
> the array should probably either give an error or return a column of
> NA.


Yes, it would be nice if trying to index outside the data frame generated
an error.  That is what happens in Splus (at least in the version I have
access to: 6.0 Release 1 for Linux 2.2.12).
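
Until then, a defensive check along these lines would catch the mistake
early.  This is only a sketch; wanted and missing_cols are names made up
for the example.

```r
a <- data.frame(a = rep(1, 3), c = rep(2, 3), d = rep(3, 3), e = rep(4, 3))

# Columns we intend to keep; "b" is deliberately not in the data frame
wanted <- c("a", "b", "c")

# Compare the requested names against the names actually present
missing_cols <- setdiff(wanted, names(a))
if (length(missing_cols) > 0) {
  message("Not in data frame: ", paste(missing_cols, collapse = ", "))
}

# Subset using only the names that exist; drop = FALSE keeps a data frame
b <- a[, intersect(wanted, names(a)), drop = FALSE]
print(b)
```

One could equally call stop() instead of message() so that a misspelled
column name halts the script immediately.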


> 
> -- 
>    O__  ---- Peter Dalgaard             Blegdamsvej 3  
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907
>




[R] dataframe subsetting behaviour

2003-01-22 Thread Douglas Grove
Hi,

I'm trying to understand a behaviour that I have encountered
and can't fathom.


Here's some code I will use to illustrate the behaviour:

# start with some data frame "a" having some named columns
a <- data.frame(a=rep(1,3),c=rep(2,3),d=rep(3,3),e=rep(4,3))

# create a subset of the original data frame, but include a
# name "b" that is not present in my original data frame
b <- a[,c("a","b","c")]


## Up until now no errors are issued, but the following commands
## will give the error shown:

b[1,]  ## "Error in x[[j]] : subscript out of bounds"
b[1,2] ## "Error in "names<-.default"(*tmp*, value = cols) : 
       ##  names attribute must be the same length as the vector"


Can anyone explain to me the meaning of these error messages in terms
of what R is actually doing?  These error messages had me baffled and 
it took me hours to track down that the source of the error was an 
incorrect column name in my data frame subsetting.

Thanks,
Doug Grove
Statistical Research Associate
Fred Hutchinson Cancer Research Center
Seattle, WA
