Re: [R] New functions supporting GIF file format in R

2005-07-18 Thread Martin Maechler
> "JarekT" == Tuszynski, Jaroslaw W <[EMAIL PROTECTED]>
> on Mon, 18 Jul 2005 16:00:43 -0400 writes:

JarekT> Hi, A minor announcement. I just added two functions
JarekT> for reading and writing GIF files to my caTools
JarekT> package. Input and output is in the form of standard
JarekT> R matrices or arrays, and standard R color-maps
JarekT> (palettes). The functions can read and write both
JarekT> regular GIF images, as well as, multi-frame animated
JarekT> GIFs. Most of the work is done in C level code
JarekT> (included), so functions do not use any external
JarekT> libraries.

 

JarekT> For more info and examples go to
JarekT> http://cran.r-project.org/doc/packages/caTools.pdf
JarekT> 
JarekT> and click GIF.

Wouldn't it make sense to donate these to the 'pixmap' package
which is dedicated to such objects and has been in place for a
very long time?

Regards,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to get dissimilarity matrix

2005-07-18 Thread Martin Maechler
> "Baoqiang" == Baoqiang Cao <[EMAIL PROTECTED]>
> on Mon, 18 Jul 2005 15:02:05 -0400 writes:

Baoqiang> Hello All, I'm learning R. Just wonder, any
Baoqiang> package or function that I can use to get the
Baoqiang> dissimilarity matrix? Thanks.

Yes,
learn to use help.search()  {also read the docu :  ?help.search}

help.search("dissimilarity")

and find daisy() in recommended package 'cluster'.
There's also  dist() in 'stats' which is a bit less versatile.
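For example, a small sketch on a built-in data set:

library(cluster)
d1 <- dist(scale(USArrests))              # Euclidean distances, from 'stats'
d2 <- daisy(USArrests, metric = "gower")  # more versatile; handles mixed variable types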

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to solve the step halving factor problems in gnls and nls

2005-07-18 Thread Dieter Menne
Yimeng Lu  columbia.edu> writes:

> Could you give me some advice in 
> solving the problem of such error message from gnls and nls?
> ## begin error message
> 
> "Problem in gnls(y1 ~ glogit4(b, c, m, t, x), data.frame(x..: Step halving 
> factor reduced below minimum in NLS step "
> 

Try setting nlsTol in the optional control argument (gnlsControl) to a larger 
value, e.g. 0.1 instead of the default 0.001, but be sure to check that your 
results make sense. If this helps, you can then try intermediate values.
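A sketch of where that argument goes (the model call is copied from the error
message above; glogit4, the data and the starting values are whatever you used
originally):

library(nlme)
fit <- gnls(y1 ~ glogit4(b, c, m, t, x), data = mydata,
            start = mystart,
            control = gnlsControl(nlsTol = 0.1))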

Dieter Menne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Stephen
Many thanks, I follow what you say: you can request predicted values at any
sequence of ages - I guess there are plenty of postings on how to do that.

Regards, Stephen

- Original Message -
From: "Frank E Harrell Jr"
To: "Stephen"
Cc: "Prof Brian Ripley"
Sent: Monday, July 18, 2005 6:13 PM
Subject: Re: [R] Survival dummy variables and some questions

> Stephen wrote:
>> Hi 1. Right, perhaps this should clarify. I would like to extract
>> coefficients for different levels of the IVs (covariates). So for
>> instance, for age of onset I would want hazards etc. for every 5 years and so
>> on... The approach I took was to categorize the variables (e.g., age of
>> onset) and then turn the resultant categorical variable into a factor as
>> opposed to a variable... that is when the problems began. An
>> alternative approach to pulling out different values at different levels
>> of the variable is what I seek. 2. I looked for the link, but can't
>
> Your needs don't require categorization. You can request predicted values
> at any sequence of ages. If you want hazard ratios you can take
> differences in predicted log hazards and antilog them.
>
> Frank
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University

Sent via Nana Mail
http://mail.nana.co.il

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] help: how to change the column name of data.frame

2005-07-18 Thread wu sz
Hello,

I have a data frame with 15 variables, and want to exchange the data
of 4th column and 6th column. First I append a column in the data
frame, copy the 4th column data there, then copy the 6th column data
to 4th column, and copy the appended column data to 6th column, but
the names of the 4th and 6th column are still unchanged. How can I
exchange the names too?
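A minimal sketch of one way to do that (assuming the data frame is called df):

df[, c(4, 6)]      <- df[, c(6, 4)]        # swap the data
names(df)[c(4, 6)] <- names(df)[c(6, 4)]   # swap the names as well
# or, equivalently, reorder data and names together in one step:
df <- df[, c(1:3, 6, 5, 4, 7:15)]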

Thank you,
Shengzhe

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Obtaining argument name within a function

2005-07-18 Thread John Sorkin
Francisco,
I had exactly the same question a few days ago. Try the following:
 
z <- function(x)
{
  xName <- deparse(substitute(x))  # capture the argument expression and turn it into a string
  cat(xName)                       # (this just echoes the name, without a newline)
  cat("The parameter name was ", xName, "\n")
}
 
Let me know if this works.
n.b. the \n in the cat function is a signal to print the next piece of data
on the next line.
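For example:

myobject <- 1:10
z(myobject)
# myobjectThe parameter name was  myobject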
 
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
- NOTE NEW EMAIL ADDRESS:
[EMAIL PROTECTED]

>>> "Francisco J. Zagmutt" <[EMAIL PROTECTED]> 7/18/2005 10:16:53
PM >>>


Dear all

How can I obtain the name of the argument passed in a function?  Here
is a 
simplistic example of what I would like to obtain:

myfunction= function(name) {
 print(paste("The parameter name was",unknownFunction(name))
 }

myfunction(myobject)
[1] "The parameter name was myobject"

Thanks

Francisco

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Obtaining argument name within a function

2005-07-18 Thread Sebastian Luque
"Francisco J. Zagmutt" <[EMAIL PROTECTED]> wrote:
> Dear all
>
> How can I obtain the name of the argument passed in a function? Here is a
> simplistic example of what I would like to obtain:
>
> myfunction= function(name) {
> print(paste("The parameter name was",unknownFunction(name))
> }
>
> myfunction(myobject)
> [1] "The parameter name was myobject"


?substitute

myfunction <- function(obj) {
  paste("The parameter name was", deparse(substitute(obj)))
}

myfunction(myobject)

-- 
Sebastian P. Luque

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Obtaining argument name within a function

2005-07-18 Thread Francisco J. Zagmutt
Dear all

How can I obtain the name of the argument passed in a function?  Here is a 
simplistic example of what I would like to obtain:

myfunction= function(name) {
 print(paste("The parameter name was",unknownFunction(name))
 }

myfunction(myobject)
[1] "The parameter name was myobject"

Thanks

Francisco

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] svmlight running error

2005-07-18 Thread Luke
Dear R Users,

When I used svmlight, I got below error:

my command is:
foo <- svmlight(y~., data= myData)

the results:
Error in file(con, "r") : unable to open connection
In addition: Warning messages:
1: svm_learn not found 
2: cannot open file '_model_1.txt'

> myData[1:2,]
  y X1 X2 X3 X4  X5  X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17
1 1 63  1  0  0 145 233  1  1  0 150   0 2.3   1   0   0   1   0
2 0 67  0  1  0 160 286  0  1  0 108   1 1.5   0   1   3   0   1

I wonder what is the possible reason for this error.

-Luke

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Time Series Count Models

2005-07-18 Thread Brett Gordon
Paul,

Thank you so much for your thoughtful reply. I agree - there are many
possible descriptions for my data, and I realize that I don't want to
get bogged down with figuring out the 'best' model if something simple
will work well. For me, I think the difficulty is going to be handling
the cumulative aspect of the lagged variable.

To be clear, suppose that y_1, ..., y_T are the counts. At time t=3, I
want to include the quantity (y_1 + y_2) as an independent variable,
and so on. I wonder if this is as simple as solving a conditional ML
problem..I'll have to look more deeply into it.

Again, thanks for the references.

-Brett 

On 7/18/05, Paul Johnson <[EMAIL PROTECTED]> wrote:
> Dear Brett:
> 
> There are books for this topic that are more narrowly tailored to your
> question. Lindsey's Models for Repeated Measurements and Diggle, et al's
> Analysis of Longitudinal Data.  Lindsey offers an R package on his web
> site. If you dig around, you will find many modeling papers on this,
> although in my mind none coalesced into a completely clear path such as
> "throw in these variables and you will get the right estimates".
> 
> The problem, as you will see, is that there are many possible
> mathematical descriptions of the idea that there is time dependence in a
> count model.
> 
> My political science colleagues John Williams and Pat Brandt published 2
> articles on time series with counts.  My favorite is the second one
> here.  There is R code for the Pests model.
> http://www.utdallas.edu/~pbrandt/pests/pests.htm
> 
> Brandt, Patrick T., John T. Williams, Benjamin O. Fordham and Brian
> Pollins. 2000. "Dynamic Modelling For Persistent Event Count Time
> Series." American Journal of Political Science 44(4): 823-843.
> 
> Brandt, Patrick T. and John T. Williams. 2001. "A Linear Poisson
> Autoregressive Model: the Poisson AR(p) Model." Political Analysis 9(2):
> 164-184.
> 
> I worked really hard on TS counts a while ago because a student was
> trying that.  If you look at J. Lindsey's book Models for Repeated
> Measurements you will make some progress on understanding his method
> kalcount. That's in the repeated library you get from his web site.
> 
> Here are the notes I made a couple of years ago
> 
> http://lark.cc.ku.edu/~pauljohn/stats/TimeSeries/
> 
> Look for files called TSCountData*.pdf.
> 
> 
> It all boils down to the fact that you can't just act like it is an OLS
> model and throw Y_t-1 or something like that on the right hand side.
> Instead, you have to think in a more delicate way about the process you
> are modeling and hit it from that other direction.
> 
> Here are some of the articles for which I kept copies.
> 
> U. Bokenholt, "Mixed INAR(1) Poisson regression models" Journal of
> Econometrics, 89 (1999): 317-338
> 
> A.C. Harvey and C. Fernandes, "Time Series Models for Count or
> Qualitative Observations, " Journal of Business & Economic Statistics, 4
> (1989): 407-
> 
> 
> I recall liking this one a lot
> 
> J E Kelsall and Scott Zeger and J M Samet "Frequency Domain Log-linear
> Models; air pollution and mortality" Appl. Statis 48 1999 331-344.
> 
> Good luck, let me know what you find out.
> 
> pj
> 
> Brett Gordon wrote:
> > Hello,
> >
> > I'm trying to model the entry of certain firms into a larger number of
> > distinct markets over time. I have a short time series, but a large
> > cross section (small T, big N).
> >
> > I have both time varying and non-time varying variables. Additionally,
> > since I'm modeling entry of firms, it seems like the number of
> > existing firms in the market at time t should depend on the number of
> > firms at (t-1), so I would like to include the lagged cumulative count.
> >
> > My basic question is whether it is appropriate (in a statistical
> > sense) to include both the time varying variables and the lagged
> > cumulative count variable. The lagged count aside, I know there are
> > standard extensions to count models to handle time series. However,
> > I'm not sure if anything changes when lagged values of the cumulative
> > dependent variable are added (i.e. are the regular standard errors
> > correct, are estimates consistent, etc).
> >
> > Can I still use one of the time series count models while including
> > this lagged cumulative value?
> >
> > I would greatly appreciate it if anyone can direct me to relevant
> > material on this. As a note, I have already looked at Cameron and
> > Trivedi's book.
> >
> > Many thanks,
> >
> > Brett
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> 
> 
> --
> Paul E. Johnson   email: [EMAIL PROTECTED]
> Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
> 1541 Lilac Lane, Rm 504
> University of Kansas  Office: (785) 864-9086
> Lawrence, Kansas 66044-3177   F

Re: [R] Time Series Count Models

2005-07-18 Thread Spencer Graves
  We are leveraging too far on speculation, at least from what I can 
see.  PLEASE do read the posting guide! 
"http://www.R-project.org/posting-guide.html";.  In particular, try the 
simplest example you can find that illustrates your question, and 
explain your concerns to us in terms of a short series of R commands and 
the resulting output.

  With counts, especially if there were only a few zeros, I'd start by 
taking logarithms (after replacing 0's by something like 0.5 or by 
adding something like 0.5 to avoid sending 0's to (-Inf)) and use "lme", 
if that seemed appropriate.  Then if I got drastically different answers 
from other software, I would suspect a problem.
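A minimal sketch of that log-then-lme idea (the data set and variable names
below are hypothetical):

library(nlme)
d$logy <- log(d$count + 0.5)          # shift to keep zeros away from -Inf
fit <- lme(logy ~ time + x1, random = ~ 1 | market, data = d)
summary(fit)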

  Other possibilities for count data are the following:

  * "lmer" library(lme4) [see Douglas Bates. Fitting linear mixed 
models in R. R News, 5(1):27-30, May 2005, www.r-project.org -> 
Newsletter -> "Volume 5/1, May 2005: PDF"].

  * "glmmPQL" in library(MASS).

  * "glmmML" in library(glmmML)

  However, I don't know if any of these has the capability now to handle 
short time series like you described.

  You might also consider the IEKS package by Bjarke Mirner Klein 
(http://www.stat.sdu.dk/publications/monographs/m001/KleinPhdThesis.pdf and
http://genetics.agrsci.dk/~bmk/IEKS.R).
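Of the options above, a minimal glmmPQL sketch might look like this (again
with hypothetical variable names):

library(MASS)
fit <- glmmPQL(count ~ time + x1, random = ~ 1 | market,
               family = poisson, data = d)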

  spencer graves

Brett Gordon wrote:

> Thanks for the suggestion. Is such a model appropriate for count data?
> The library you reference seems to just be for standard regressions
> (i.e. those with continuous dependent variables).
> 
> Thanks,
> Brett
> 
> On 7/16/05, Spencer Graves <[EMAIL PROTECTED]> wrote:
> 
>>  Have you considered "lme" in library(nlme)?  If you want to go this
>>route, I recommend Pinheiro and Bates (2000) Mixed-Effect Models in S
>>and S-Plus (Springer).
>>
>>  spencer graves
>>
>>Brett Gordon wrote:
>>
>>
>>>Hello,
>>>
>>>I'm trying to model the entry of certain firms into a larger number of
>>>distinct markets over time. I have a short time series, but a large
>>>cross section (small T, big N).
>>>
>>>I have both time varying and non-time varying variables. Additionally,
>>>since I'm modeling entry of firms, it seems like the number of
>>>existing firms in the market at time t should depend on the number of
>>>firms at (t-1), so I would like to include the lagged cumulative count.
>>>
>>>My basic question is whether it is appropriate (in a statistical
>>>sense) to include both the time varying variables and the lagged
>>>cumulative count variable. The lagged count aside, I know there are
>>>standard extensions to count models to handle time series. However,
>>>I'm not sure if anything changes when lagged values of the cumulative
>>>dependent variable are added (i.e. are the regular standard errors
>>>correct, are estimates consistent, etc).
>>>
>>>Can I still use one of the time series count models while including
>>>this lagged cumulative value?
>>>
>>>I would greatly appreciate it if anyone can direct me to relevant
>>>material on this. As a note, I have already looked at Cameron and
>>>Trivedi's book.
>>>
>>>Many thanks,
>>>
>>>Brett
>>>
>>>__
>>>R-help@stat.math.ethz.ch mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
>>--
>>Spencer Graves, PhD
>>Senior Development Engineer
>>PDF Solutions, Inc.
>>333 West San Carlos Street Suite 700
>>San Jose, CA 95110, USA
>>
>>[EMAIL PROTECTED]
>>www.pdf.com 
>>Tel:  408-938-4420
>>Fax: 408-280-7915
>>
> 
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com 
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Time Series Count Models

2005-07-18 Thread Paul Johnson
Dear Brett:

There are books for this topic that are more narrowly tailored to your 
question. Lindsey's Models for Repeated Measurements and Diggle, et al's 
Analysis of Longitudinal Data.  Lindsey offers an R package on his web 
site. If you dig around, you will find many modeling papers on this, 
although in my mind none coalesced into a completely clear path such as 
"throw in these variables and you will get the right estimates".

The problem, as you will see, is that there are many possible 
mathematical descriptions of the idea that there is time dependence in a 
count model.

My political science colleagues John Williams and Pat Brandt published 2 
articles on time series with counts.  My favorite is the second one 
here.  There is R code for the Pests model. 
http://www.utdallas.edu/~pbrandt/pests/pests.htm

Brandt, Patrick T., John T. Williams, Benjamin O. Fordham and Brian 
Pollins. 2000. "Dynamic Modelling For Persistent Event Count Time 
Series." American Journal of Political Science 44(4): 823-843.

Brandt, Patrick T. and John T. Williams. 2001. "A Linear Poisson 
Autoregressive Model: the Poisson AR(p) Model." Political Analysis 9(2): 
164-184.

I worked really hard on TS counts a while ago because a student was 
trying that.  If you look at J. Lindsey's book Models for Repeated 
Measurements you will make some progress on understanding his method 
kalcount. That's in the repeated library you get from his web site.

Here are the notes I made a couple of years ago

http://lark.cc.ku.edu/~pauljohn/stats/TimeSeries/

Look for files called TSCountData*.pdf.


It all boils down to the fact that you can't just act like it is an OLS 
model and throw Y_t-1 or something like that on the right hand side. 
Instead, you have to think in a more delicate way about the process you 
are modeling and hit it from that other direction.

Here are some of the articles for which I kept copies.

U. Bokenholt, "Mixed INAR(1) Poisson regression models" Journal of 
Econometrics, 89 (1999): 317-338

A.C. Harvey and C. Fernandes, "Time Series Models for Count or 
Qualitative Observations, " Journal of Business & Economic Statistics, 4 
(1989): 407-


I recall liking this one a lot

J E Kelsall and Scott Zeger and J M Samet "Frequency Domain Log-linear 
Models; air pollution and mortality" Appl. Statis 48 1999 331-344.

Good luck, let me know what you find out.

pj

Brett Gordon wrote:
> Hello,
> 
> I'm trying to model the entry of certain firms into a larger number of
> distinct markets over time. I have a short time series, but a large
> cross section (small T, big N).
> 
> I have both time varying and non-time varying variables. Additionally,
> since I'm modeling entry of firms, it seems like the number of
> existing firms in the market at time t should depend on the number of
> firms at (t-1), so I would like to include the lagged cumulative count.
> 
> My basic question is whether it is appropriate (in a statistical
> sense) to include both the time varying variables and the lagged
> cumulative count variable. The lagged count aside, I know there are
> standard extensions to count models to handle time series. However,
> I'm not sure if anything changes when lagged values of the cumulative
> dependent variable are added (i.e. are the regular standard errors
> correct, are estimates consistent, etc).
> 
> Can I still use one of the time series count models while including
> this lagged cumulative value?
> 
> I would greatly appreciate it if anyone can direct me to relevant
> material on this. As a note, I have already looked at Cameron and
> Trivedi's book.
> 
> Many thanks,
> 
> Brett
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] definition of index.array and boot.return in the code for boot

2005-07-18 Thread Spencer Graves
  Excellent question.  Try 'getAnywhere("index.array")'.  It's hidden 
in "namespace:boot".  Ditto for "boot.return".

  spencer graves

Obrien, Josh wrote:

> Dear R friends,
> 
> I am reading the code for the function boot in package:boot in an attempt to 
> learn how and where it implements the random resampling used by the 
> non-parametric bootstraps.
> 
> The code contains two (apparent) functions - 'index.array'  and  
> 'boot.return' - for which I can find no documentation, and which don't even 
> seem to exist anywhere on the search path.  What are they?
> 
> Also, if the meanings of those two don't answer my larger question, could you 
> point me to the code that implements the random resampling?
> 
> Thanks very much for your help,
> 
> Josh O'Brien
> UC Davis
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com 
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] how to get dissimilarity matrix

2005-07-18 Thread Baoqiang Cao
Hello All,
   I'm learning R. Just wonder, any package or function that I can use to get 
the dissimilarity matrix? Thanks.

Best regards, 
  Baoqiang Cao

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Package Vegan: species accumlumation functions

2005-07-18 Thread Steven K Friedman

Hi everyone, 

I am working with a data frame consisting of 1009 sampling locations, 138 
species incidence and abundance data, and eight forest community types. 

My goal is to develop species accumulation curves and extrapolated estimates 
for each community type. 

I am using the following approach:
attach(forest_plots)
library(vegan)
# calculate species abundance and species incidence (presence/absence) per plot
# where Point_ID = plot sample unit. 

sp.abund <- table(forest_plots$Latin_Name, forest_plots$Point_ID)
sp.incid <- matrix(ifelse(sp.abund > 0, 1, sp.abund)) 

## now quantify richness using community type as "pool" 

richness.pool <- specpool(sp.incid, type) 

> richness.pool
   Species     Chao  Chao.SE   Jack.1 Jack1.SE   Jack.2     Boot  Boot.SE   n
5       NA       NA       NA       NA       NA       NA       NA       NA  52
8      546 718.3939 28.70646 787.3462 139.0467 858.6446 662.9829 97.92155  26
9       NA       NA       NA       NA       NA       NA       NA       NA  93
10      NA       NA       NA       NA       NA       NA       NA       NA 126
11      NA       NA       NA       NA       NA       NA       NA       NA  36
13      NA       NA       NA       NA       NA       NA       NA       NA 122
15      NA       NA       NA       NA       NA       NA       NA       NA  36
18     364 687.1809 55.49003 591.5556 135.2185 722.6111 463.3126 83.16079   9

Ok, I do not understand this output.  Why is NA reported for all community
types other than type 8 and 18? 

Thanks for helping.
Steve F.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] New functions supporting GIF file format in R

2005-07-18 Thread Tuszynski, Jaroslaw W.
Hi,

 

A minor announcement. I just added two functions for reading and writing GIF
files to my caTools package. Input and output is in the form of standard R
matrices or arrays, and standard R color-maps (palettes). The functions can
read and write both regular GIF images, as well as, multi-frame animated
GIFs. Most of the work is done in C level code (included), so functions do
not use any external libraries. 

 

For more info and examples go to
http://cran.r-project.org/doc/packages/caTools.pdf
  and click GIF.
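A rough usage sketch (untested here; the exact argument names and return
components are documented in the manual linked above):

library(caTools)
m <- matrix(0:255, nrow = 16)                    # a small gray-level test image
write.gif(m, "test.gif", col = gray(0:255/255))  # write it with a 256-gray palette
g <- read.gif("test.gif")                        # read back: image data plus palette
image(g$image, col = g$col)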

 

Jarek Tuszynski 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] listing datasets from all my packages

2005-07-18 Thread Elizabeth Purdom
Hi,
I am using R 2.1.0 on Windows XP and when I type data() to list the 
datasets in R, there is a helpful hint to type 'data(package = 
.packages(all.available = TRUE))' to see the datasets in all of the 
packages -- not just the active ones.

However, when I do this, I get the following message:
 > data(package = .packages(all.available = TRUE))
Error in rbind(...) : number of columns of matrices must match (see arg 2)
In addition: Warning messages:
1: datasets have been moved from package 'base' to package 'datasets' in: 
data(package = .packages(all.available = TRUE))
2: datasets have been moved from package 'stats' to package 'datasets' in: 
data(package = .packages(all.available = TRUE))

I possibly have old libraries in my R libraries because I copy them forward 
and update them with new versions of R, rather than redownload them. Is 
there a way to fix this or do the same another way? (I saw something in 
archives about a problem similar to this with .packages(), but I got the 
impression it was fixed for 2.1.0)

Thanks,
Elizabeth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Time Series Count Models

2005-07-18 Thread Brett Gordon
Thanks for the suggestion. Is such a model appropriate for count data?
The library you reference seems to just be for standard regressions
(i.e. those with continuous dependent variables).

Thanks,
Brett

On 7/16/05, Spencer Graves <[EMAIL PROTECTED]> wrote:
>   Have you considered "lme" in library(nlme)?  If you want to go this
> route, I recommend Pinheiro and Bates (2000) Mixed-Effect Models in S
> and S-Plus (Springer).
> 
>   spencer graves
> 
> Brett Gordon wrote:
> 
> > Hello,
> >
> > I'm trying to model the entry of certain firms into a larger number of
> > distinct markets over time. I have a short time series, but a large
> > cross section (small T, big N).
> >
> > I have both time varying and non-time varying variables. Additionally,
> > since I'm modeling entry of firms, it seems like the number of
> > existing firms in the market at time t should depend on the number of
> > firms at (t-1), so I would like to include the lagged cumulative count.
> >
> > My basic question is whether it is appropriate (in a statistical
> > sense) to include both the time varying variables and the lagged
> > cumulative count variable. The lagged count aside, I know there are
> > standard extensions to count models to handle time series. However,
> > I'm not sure if anything changes when lagged values of the cumulative
> > dependent variable are added (i.e. are the regular standard errors
> > correct, are estimates consistent, etc).
> >
> > Can I still use one of the time series count models while including
> > this lagged cumulative value?
> >
> > I would greatly appreciate it if anyone can direct me to relevant
> > material on this. As a note, I have already looked at Cameron and
> > Trivedi's book.
> >
> > Many thanks,
> >
> > Brett
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> 
> --
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
> 
> [EMAIL PROTECTED]
> www.pdf.com 
> Tel:  408-938-4420
> Fax: 408-280-7915
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] definition of index.array and boot.return in the code for boot

2005-07-18 Thread Obrien, Josh
Dear R friends,

I am reading the code for the function boot in package:boot in an attempt to 
learn how and where it implements the random resampling used by the 
non-parametric bootstraps.

The code contains two (apparent) functions - 'index.array'  and  'boot.return' 
- for which I can find no documentation, and which don't even seem to exist 
anywhere on the search path.  What are they?

Also, if the meanings of those two don't answer my larger question, could you 
point me to the code that implements the random resampling?

Thanks very much for your help,

Josh O'Brien
UC Davis

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] read large amount of data

2005-07-18 Thread Prof Brian Ripley
On Mon, 18 Jul 2005, Thomas Lumley wrote:

> On Mon, 18 Jul 2005, Weiwei Shi wrote:
>
>> Hi,
>> I have a dataset with 2194651x135, in which all the numbers are 0,1,2,
>> and is bar-delimited.
>>
>> I used the following approach which can handle 100,000 lines:
>> t<-scan('fv', sep='|', nlines=100000)
>> t1<-matrix(t, nrow=135, ncol=100000)
>> t2<-t(t1)
>> t3<-as.data.frame(t2)
>>
>> I changed my plan into using stratified sampling with replacement (col
>> 2 is my class variable: 1 or 2). The class distr is like:
>> awk -F\| '{print $2}' fv | sort | uniq -c
>> 2162792 1
>>  31859 2
>>
>> Is it possible to use R to read the whole dataset and do the
>> stratified sampling? Is it really dependent on my memory size?
>
> You may well not be able to read the whole data set into memory at once:
> it would take a bit more than 2Gb memory even to store it.

About 1.2G if stored as an integer (not double) vector.

> You can use readLines to read it in chunks of, say, 10000 lines.
>
> To do stratified sampling I would suggest Bernoulli sampling of slightly
> more than you want. Eg if you want 10000 from class 1, keeping each
> element with probability 10500/2162792 will get you Poisson(10500)
> elements, which will be more than 10000 elements with better than 99.999%
> probability. You can then choose 10000 at random from these. I can't think
> of an approach that is guaranteed to work in one pass over the data,
> but 99.999% is pretty close.

Reservoir sampling methods will work in one pass.  See e.g. my 1987 book 
on Stochastic Simulation.  But Thomas' idea will be easier to implement in 
R, and I would have chosen 20000 not 10500 and be sure I would get enough.
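For what it is worth, a generic sketch of the reservoir idea (this is the
standard "Algorithm R", not code from the book; the field position and class
labels follow the original post):

reservoir <- function(filename, k, wanted.class = "1", sep = "|") {
  con <- file(filename, open = "r")
  on.exit(close(con))
  res <- character(k)
  n <- 0
  repeat {
    lines <- readLines(con, n = 10000)
    if (length(lines) == 0) break
    cls <- sapply(strsplit(lines, sep, fixed = TRUE), `[`, 2)
    for (ln in lines[cls == wanted.class]) {
      n <- n + 1
      if (n <= k) res[n] <- ln                             # fill the reservoir first
      else if (runif(1) < k / n) res[sample(k, 1)] <- ln   # then replace a random slot
    }
  }
  res
}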

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Frank E Harrell Jr
Stephen wrote:
> Hi 1. Right perhaps this should clarify. I would like to extract 
> coefficients for different levels of the IVs (covariates). So for 
> instance, age of onset I would want Hazards etc for every 5 years and so 
> on... The approach I took was to categorize the variables (e.g., age of 
> onset) and then turn the resultant categorical variable into a factor as 
> opposed to a variable... that is when the problems began An 
> alternative approach to pulling out different values at different levels 
> of the variable is what I seek. 2. I looked for the link, but can't 

Your needs don't require categorization.  You can request predicted 
values at any sequence of ages.  If you want hazard ratios you can take 
differences in predicted log hazards and antilog them.
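A rough sketch of that with coxph and predict (the variable names below are
hypothetical, not taken from your data):

library(survival)
fit  <- coxph(Surv(time, event) ~ sex + ageonset, data = Dataset)
newd <- data.frame(sex = 1, ageonset = seq(20, 60, by = 5))
lp   <- predict(fit, newdata = newd, type = "lp")  # predicted log relative hazards
exp(lp - lp[1])                                    # hazard ratios relative to age 20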

Frank


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Frank E Harrell Jr
Stephen wrote:
> Hi 1. To clarify: There is a posting saying that dummy regression using 
> the coxph function is not possible... That posting may be outdated 

That does not make sense.

> 2. Q. You say 'Make sure that eventbefore is a "pre time zero" 
> measurement' please explain: Do you mean that if someone who is not left 
> censored and has no eventbefore then their value is zero? So John is in 
> my study from 1978 to 1992 and has no events prior to 1980 then John's 
> eventbefore is 0 prior to 1980. 3.  just kidding... Thanks S - 

I'm not sure you can handle left censoring by constructing something on 
the right hand side of the model.  But if you have a simple historical 
covariate such as "previous history of disease" before the start of 
follow-up time that's ok.

Frank

> Original Message - From: "Frank E Harrell Jr" To: "Stephen" Cc: 
> Sent: Monday, July 18, 2005 2:46 PM Subject: Re: [R] Survival dummy 
> variables and some questions > Stephen wrote: >> Hi All, >> >> I am 
> currently conducting some survival analyses. I would like to >> extract 
> coefficients at each level of the IVs. I read on a previous >> posting 
> that dummy regression using coxph was not >> possible. > > I'm not sure 
> what that means. > >> >> Therefore I though, hey why not categorize the 
> variables (I realize some >> folks object to categorization but the 
> paper I am >> replicating appears to have done so ...) > > The fact that 
> some people murder doesn't mean we should copy them. And > murdering 
> data, though not as serious, should also be avoided. > > Make sure that 
> eventbefore is a "pre time zero" measurement. > > Frank > >> >> and turn 
> the variables into factors and then try the analysis. >> >> E.g., 
> Dataset <- read.table("categ.dat", header=TRUE) >> >> 
> Dataset$eventbefore2c <- factor(Dataset$eventbefore) >> >> .. other IVs 
> here >> >> ... >> >> surv.mod1 <- coxph(Surv(start, stop, event) ~ sex2 
> + ageonset2c + >> eventbefore2c + daysbefore2c, data=Dataset) >> >> 
> Strangely enough, I receive a warning message when the variables are >> 
> treated in this way: X matrix deemed to be singular; variable 11 in: >> 
> coxph(Surv(start, stop, event) ~ sex2 + ageonset2c + eventbefore2c + I 
>  >> don't receive any warnings just treating the variables in their >> 
> initial continuous format. >> >> I am currently using version >> >> 
> platform i386-pc-mingw32 >> >> arch i386 os mingw32 system i386, mingw32 
>  >> status major 2 minor 1.1 >> year 2005 month 06 day 20 >> language R 
> Is this approach to dummy variable using coxph erroneous? >> >> Is there 
> another way to conduct dummy variable regression with coxph? >> >> Also, 
> if I include frailty (id) does anyone know of a useful way to >> 
> investigate frailty? >> >> If one were to plot recurrent events does 
> anyone know of a way of >> interpreting them? >> >> References & code 
> appreciated. >> >> BTW. not too familiar with R, less so with survival 
> analysis  but >> well worth the effort. >> >> Many thanks in 
> advance... >> >> Regards >> >> Stephen >> >> >> Sent via Nana Mail >> 
> http://mail.nana.co.il >> >> [[alternative HTML version deleted]] >> >> 
>  >> >> 
>  
>  >> >> __ >> 
> R-help@stat.math.ethz.ch mailing list >> 
> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the 
> posting guide! >> http://www.R-project.org/posting-guide.html > > > -- > 
> Frank E Harrell Jr Professor and Chair School of Medicine > Department 
> of Biostatistics Vanderbilt University >
> 
> Sent via Nana Mail
> http://mail.nana.co.il


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] column-wise deletion in data-frames

2005-07-18 Thread Prof Brian Ripley
On Mon, 18 Jul 2005, Peter Dalgaard wrote:

> Prof Brian Ripley <[EMAIL PROTECTED]> writes:
>
>> On Mon, 18 Jul 2005, Peter Dalgaard wrote:
>>
>>> Chuck Cleland <[EMAIL PROTECTED]> writes:
>>>
> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
>
> So only X1, X3 and X5 are vars without any NAs and there are some vars 
> (X2 and
> X4 stacked in between that have NAs). Now, how can I extract those former 
> vars
> in a new dataset or remove all those latter vars in between that have NAs
> (without missing a single row)?
> ...

Someone else will probably suggest something more elegant, but how
 about this:

 newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]
>>>
>>> (I think that's supposed to be any(), not all(), and which() is
>>> crossing the creek to fetch water.)
>>>
>>> This should do it:
>>>
>>> data[,apply(!is.na(data),2,all)]
>>
>> If `data' is a data frame, apply will coerce it to a matrix.
>
> So will is.na()...

Not quite.  is.na on a data frame will create a matrix by cbind-ing 
columns.   I was mainly commenting on Chuck Cleland's version, which 
coerces a data frame to a matrix then pulls out each column of the matrix, 
something that is quite wasteful of space.  Forming the logical matrix 
is.na(data) is also I think wasteful.

>> I would do
>> something like
>>
>> keep <- sapply(data, function(x) all(!is.na(x)))
>> data[keep]
>>
>> to use the list-like structure of a data frame and make the fewest
>> possible copies.
>
> I think the amount of copying is the same, but your version doesn't
> need to store the entire is.na(data) at once.
>
> Nitpick: !any(is.na(x)) should be marginally faster than all(!is.na(x)).

I doubt it is measurably so.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] colnames

2005-07-18 Thread Adaikalavan Ramasamy
This normally happens to me when I read in a table: column names that start
with a digit get prefixed with an "X". Read help(make.names) for more
information. Remember that R is primarily statistical software and expects
column names to be (syntactically valid) character strings.

 mat1 <- matrix( 1:12, nc=3, dimnames=list(NULL, c(0,1,2)) )
 mat1
 0 1  2
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
 colnames(mat1)
[1] "0" "1" "2"
 is.character( colnames(mat1) )
[1] TRUE

However, I am not able to reproduce your problem:

  mat2 <- matrix( 101:108, nc=2, dimnames=list(NULL, c("A", "B")) )
  mat2
A   B
 [1,] 101 105
 [2,] 102 106
 [3,] 103 107
 [4,] 104 108

 cbind(mat1, mat2)
  0 1  2   A   B
 [1,] 1 5  9 101 105
 [2,] 2 6 10 102 106
 [3,] 3 7 11 103 107
 [4,] 4 8 12 104 108

I tried other operations such as mat1[ , 1:2] + mat2 and 
mat1[ ,1] <- mat2[ ,2], but they do not add a preceding "X".
Can you give a reproducible example, please?

If you want to get rid of the preceding "X", try

 colnames( mat1 ) <- c("X0", "X1", "X2") 
 colnames( mat1 ) <- gsub("^X", "", colnames(mat1))
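If the names come from read.table in the first place, an alternative (not part
of the reply above; file name hypothetical) is to keep them unmangled on input:

 dat <- read.table("mydata.txt", header = TRUE, check.names = FALSE)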

Why do you want to do this anyway?

Regards, Adai


On Mon, 2005-07-18 at 16:11 +0100, Gilbert Wu wrote:
> Hi,
>  
> I have a matrix with column names starting with a character in [0-9]. After 
> some matrix operations (e.g. copy to another matrix), R seems to add a 
> character 'X' in front of the column name. Is this a normal default behaviour 
> of R? Why has it got this behaviour? Can it be changed? What would be the 
> side effect?
>  
> Thank you.
>  
> Regards,
>  
> Gilbert
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] read large amount of data

2005-07-18 Thread Thomas Lumley
On Mon, 18 Jul 2005, Weiwei Shi wrote:

> Hi,
> I have a dataset with 2194651x135, in which all the numbers are 0,1,2,
> and is bar-delimited.
>
> I used the following approach which can handle 100,000 lines:
> t<-scan('fv', sep='|', nlines=100000)
> t1<-matrix(t, nrow=135, ncol=100000)
> t2<-t(t1)
> t3<-as.data.frame(t2)
>
> I changed my plan into using stratified sampling with replacement (col
> 2 is my class variable: 1 or 2). The class distr is like:
> awk -F\| '{print $2}' fv | sort | uniq -c
> 2162792 1
>  31859 2
>
> Is it possible to use R to read the whole dataset and do the
> stratified sampling? Is it really dependent on my memory size?

You may well not be able to read the whole data set into memory at once: 
it would take a bit more than 2Gb memory even to store it.

You can use readLines to read it in chunks of, say, 10000 lines.

To do stratified sampling I would suggest Bernoulli sampling of slightly 
more than you want. Eg if you want 10000 from class 1, keeping each 
element with probability 10500/2162792 will get you Poisson(10500) 
elements, which will be more than 10000 elements with better than 99.999% 
probability. You can then choose 10000 at random from these. I can't think 
of an approach that is guaranteed to work in one pass over the data, 
but 99.999% is pretty close.
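A rough sketch of that chunked Bernoulli idea (the file name, delimiter and
the 10000 target come from this thread; the rest is illustrative):

con <- file("fv", open = "r")
keep1 <- keep2 <- character(0)
p1 <- 10500 / 2162792                    # slightly more than 10000 expected from class 1
repeat {
  lines <- readLines(con, n = 10000)
  if (length(lines) == 0) break
  cls <- sapply(strsplit(lines, "|", fixed = TRUE), `[`, 2)
  keep2 <- c(keep2, lines[cls == "2"])   # rare class: keep (or later sample) all of it
  keep1 <- c(keep1, lines[cls == "1" & runif(length(lines)) < p1])
}
close(con)
keep1 <- sample(keep1, 10000)            # thin the Bernoulli sample to exactly 10000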

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Stephen
Hi 1. Right perhaps this should clarify. I would like to extract
coefficients for different levels of the IVs (covariates). So for
instance, age of onset I would want Hazards etc for every 5 years and so
on... The approach I took was to categorize the variables (e.g., age of
onset) and then turn the resultant categorical variable into a factor as
opposed to a variable... that is when the problems began An
alternative approach to pulling out different values at different levels
of the variable is what I seek. 2. I looked for the link, but can't
locate it. Will update. Many thanks S - Original Message - From:
"Prof Brian Ripley" To: "Stephen" Cc: "Frank E Harrell Jr" ; Sent:
Monday, July 18, 2005 4:46 PM Subject: Re: [R] Survival dummy variables
and some questions > This is almost unreadable. > > On Mon, 18 Jul 2005,
Stephen wrote: > >> Hi 1. To clarify: There is a posting saying that
dummy regression using >> the coxph function is not possible... That
posting may be outdated > > Clarification needs > > 1) an
explanation of what you mean by `dummy regression' and > > 2) a link to
the URL of that posting in the archives. > > I would normally use
RSiteSearch for the latter, but it is temporarily > unavailable. > >> 2.
Q. You say 'Make sure that eventbefore is a "pre time zero" >>
measurement' please explain: Do you mean that if someone who is not left
>> censored and has no eventbefore then their value is zero? So John is
in >> my study from 1978 to 1992 and has no events prior to 1980 then
John's >> eventbefore is 0 prior to 1980. 3.  just kidding... Thanks
S - >> Original Message - From: "Frank E Harrell Jr" To:
"Stephen" Cc: >> Sent: Monday, July 18, 2005 2:46 PM Subject: Re: [R]
Survival dummy >> variables and some questions > Stephen wrote: >> Hi
All, >> >> I am >> currently conducting some survival analyses. I would
like to >> extract >> coefficients at each level of the IVs. I read on a
previous >> posting >> that dummy regression using coxph was not >>
possible. > > I'm not sure >> what that means. > >> >> Therefore I
though, hey why not categorize the >> variables (I realize some >> folks
object to categorization but the >> paper I am >> replicating appears to
have done so ...) > > The fact that >> some people murder doesn't mean
we should copy them. And > murdering >> data, though not as serious,
should also be avoided. > > Make sure that >> eventbefore is a "pre time
zero" measurement. > > Frank > >> >> and turn >> the variables into
factors and then try the analysis. >> >> E.g., >> Dataset <-
read.table("categ.dat", header=TRUE) >> >> >> Dataset$eventbefore2c <-
factor(Dataset$eventbefore) >> >> .. other IVs >> here >> >> ... >> >>
surv.mod1 <- coxph(Surv(start, stop, event) ~ sex2 >> + ageonset2c + >>
eventbefore2c + daysbefore2c, data=Dataset) >> >> >> Strangely enough, I
receive a warning message when the variables are >> >> treated in this
way: X matrix deemed to be singular; variable 11 in: >> >>
coxph(Surv(start, stop, event) ~ sex2 + ageonset2c + eventbefore2c + I
 don't receive any warnings just treating the variables in their >>
>> initial continuous format. >> >> I am currently using version >> >>
>> platform i386-pc-mingw32 >> >> arch i386 os mingw32 system i386,
mingw32  status major 2 minor 1.1 >> year 2005 month 06 day 20 >>
language R >> Is this approach to dummy variable using coxph erroneous?
>> >> Is there >> another way to conduct dummy variable regression with
coxph? >> >> Also, >> if I include frailty (id) does anyone know of a
useful way to >> >> investigate frailty? >> >> If one were to plot
recurrent events does >> anyone know of a way of >> interpreting them?
>> >> References & code >> appreciated. >> >> BTW. not too familiar with
R, less so with survival >> analysis  but >> well worth the effort.
>> >> Many thanks in >> advance... >> >> Regards >> >> Stephen >> >> >>
 ?"?   >> >> http://mail.nana.co.il >> >> [[alternative HTML
version deleted]] >> >> >> >>

>> __ >> >>
R-help@stat.math.ethz.ch mailing list >> >>
https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the >>
posting guide! >> http://www.R-project.org/posting-guide.html > > > -- >
>> Frank E Harrell Jr Professor and Chair School of Medicine >
> Department >> of Biostatistics Vanderbilt University > >> >> Sent via Nana Mail
  >> http://mail.nana.co.il >> >> [[alternative HTML version
deleted]] >> >> __ >>
R-help@stat.math.ethz.ch mailing list >>
https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the
posting guide! >> http://www.R-project.org/posting-guide.html >> > > --
> Brian D. Ripley, [EMAIL PROTECTED] > Professor of Applied
Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford,
Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) >
Oxford OX

[R] read large amount of data

2005-07-18 Thread Weiwei Shi
Hi,
I have a dataset with 2194651x135, in which all the numbers are 0,1,2,
and is bar-delimited.

I used the following approach which can handle 100,000 lines:
t<-scan('fv', sep='|', nlines=100000)
t1<-matrix(t, nrow=135, ncol=100000)
t2<-t(t1)
t3<-as.data.frame(t2)

I changed my plan into using stratified sampling with replacement (col
2 is my class variable: 1 or 2). The class distr is like:
awk -F\| '{print $2}' fv | sort | uniq -c
2162792 1
  31859 2

Is it possible to use R to read the whole dataset and do the
stratified sampling? Is it really dependent on my memory size?
Mem:   3111736k total,  1023040k used,  2088696k free,   150160k buffers
Swap:  4008208k total,19040k used,  3989168k free,   668892k cached


Thanks,

weiwei

-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Picking a subset of explanatory variables corresponding to a subset or a transformed subsert of the response

2005-07-18 Thread Luwis Tapiwa Diya
I have a data frame in which I have used a certain procedure to select
a subset of the response and obtained a vector y. Now I need to fit a glm
of the response y given x (the covariates), but I get an error that the
variables are not of equal length. How can I come up with a way to pick
the x values corresponding to only the subset of the response I am
interested in?
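A minimal sketch, assuming the selection procedure gave (or can give) you the
row indices it kept (idx, y.full and x are hypothetical names):

y   <- y.full[idx]                                # the selected subset of the response
fit <- glm(y ~ ., data = x[idx, , drop = FALSE])  # subset the covariates the same way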

Regards,

Luwis

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Quantile Regression, S-Function "Rreg"

2005-07-18 Thread Stefan Hoderlein
Dear Brian,

thanks for your mail. For other reasons I need a local
polynomial. The nonparametric regression code is very
sketchy, but I have used it as a base anyway.

Best

Stefan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] colnames

2005-07-18 Thread Gilbert Wu
Hi,
 
I have a matrix with column names starting with a character in [0-9]. After 
some matrix operations (e.g. copy to another matrix), R seems to add a 
character 'X' in front of the column name. Is this a normal default behaviour 
of R? Why has it got this behaviour? Can it be changed? What would be the side 
effect?
 
Thank you.
 
Regards,
 
Gilbert

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] levels() deletes other attributes

2005-07-18 Thread Heinz Tuechler
At 16:53 18.07.2005 +0200, Heinz Tuechler wrote:
>At 09:29 18.07.2005 -0500, Frank E Harrell Jr wrote:
>>Heinz Tuechler wrote:
>>> Dear All,
>>> 
>>> it seems to me that levels() deletes other attributes. See the following
>>> example:
>>> 
>>> ## example with levels
>>> f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
>>> attr(f1, 'testattribute') <- 'teststring'
>>> attributes(f1)
>>> levels(f1) <- c('L-A', 'L-B', 'L-C')
>>> attributes(f1)
>>> 
>>> If I run it, after assigning new levels, the class is only "factor"
instead
>>> of "ordered" "factor" and the $testattribute "teststring" is gone.
>>> 
>>> The same happens to the label() attribute of Hmisc.
>>> 
>>> ## example with levels and label
>>> library(Hmisc)
>>> f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
>>> label(f1) <- 'factor f1'
>>> attr(f1, 'testattribute') <- 'teststring'
>>> attributes(f1)
>>> levels(f1) <- c('L-A', 'L-B', 'L-C')
>>> attributes(f1)
>>> 
>>> Should I expect this behaviour?
>>
>>Does the same thing happen when you do
>>
>>  attr(f1,'levels') <- c('L-A',...)
>>
>>Frank
>
>No, it does not. With attr(f1,'levels') <- c('L-A', 'L-B', 'L-C') only the
>levels are changed, all other attributes remain as before.
>Heinz
>
I think I know why attr(f1,'levels') behaves differently from levels(f1) <- .
As far as I can see, the levels<- method for factors does not use attr() but
factor().
Heinz
>>> 
>>> Thanks
>>> 
>>> Heinz
>>> 
>>> # R-Version 2.1.0 Patched (2005-05-30)
>>> # Windows98
>>> 
>>> __
>>> R-help@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>http://www.R-project.org/posting-guide.html
>>> 
>>
>>
>>-- 
>>Frank E Harrell Jr   Professor and Chair   School of Medicine
>>  Department of Biostatistics   Vanderbilt University
>>
>>
>
>__
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] column-wise deletion in data-frames

2005-07-18 Thread Peter Dalgaard
Prof Brian Ripley <[EMAIL PROTECTED]> writes:

> On Mon, 18 Jul 2005, Peter Dalgaard wrote:
> 
> > Chuck Cleland <[EMAIL PROTECTED]> writes:
> >
> >>> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
> >>>
> >>> So only X1, X3 and X5 are vars without any NAs and there are some vars 
> >>> (X2 and
> >>> X4 stacked in between that have NAs). Now, how can I extract those former 
> >>> vars
> >>> in a new dataset or remove all those latter vars in between that have NAs
> >>> (without missing a single row)?
> >>> ...
> >>
> >>Someone else will probably suggest something more elegant, but how
> >> about this:
> >>
> >> newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]
> >
> > (I think that's supposed to be any(), not all(), and which() is
> > crossing the creek to fetch water.)
> >
> > This should do it:
> >
> > data[,apply(!is.na(data),2,all)]
> 
> If `data' is a data frame, apply will coerce it to a matrix. 

So will is.na()...

> I would do
> something like
> 
> keep <- sapply(data, function(x) all(!is.na(x)))
> data[keep]
> 
> to use the list-like structure of a data frame and make the fewest
> possible copies.

I think the amount of copying is the same, but your version doesn't
need to store the entire is.na(data) at once.

Nitpick: !any(is.na(x)) should be marginally faster than all(!is.na(x)).

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to change bar colours in plot.stl

2005-07-18 Thread Prof Brian Ripley
On Mon, 18 Jul 2005, Michael Townsley wrote:

> Dear helpeRs,
>
> Is it possible to change the shading colour of the range bars in the plot
> generated by plot.stl?  By default they are grey, but I would prefer them
> white (I am preparing some graphics for a powerpoint presentation so I'm
> inverting all colours).
>
> As far as I can see plot.stl allows you to turn off the range bars, but
> nothing about the shading colour.  I tried to look at the function by typing:
>
> > plot.stl
> Error: Object "plot.stl" not found
>
> but received an error message.

So use

getS3method("plot", "stl")

to see it.  "light gray" is hardcoded, so you will need to make a copy, 
edit it, and use the edited copy.
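A sketch of that copy-and-edit workflow (whether the environment reset is
needed depends on what the function body calls):

my.plot.stl <- getS3method("plot", "stl")
fix(my.plot.stl)                                  # opens an editor: change "light gray" to "white"
environment(my.plot.stl) <- asNamespace("stats")  # keep access to any unexported helpers
my.plot.stl(stl(nottem, "periodic"))              # use the edited copy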

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] levels() deletes other attributes

2005-07-18 Thread Heinz Tuechler
At 09:29 18.07.2005 -0500, Frank E Harrell Jr wrote:
>Heinz Tuechler wrote:
>> Dear All,
>> 
>> it seems to me that levels() deletes other attributes. See the following
>> example:
>> 
>> ## example with levels
>> f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
>> attr(f1, 'testattribute') <- 'teststring'
>> attributes(f1)
>> levels(f1) <- c('L-A', 'L-B', 'L-C')
>> attributes(f1)
>> 
>> If I run it, after assigning new levels, the class is only "factor" instead
>> of "ordered" "factor" and the $testattribute "teststring" is gone.
>> 
>> The same happens to the label() attribute of Hmisc.
>> 
>> ## example with levels and label
>> library(Hmisc)
>> f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
>> label(f1) <- 'factor f1'
>> attr(f1, 'testattribute') <- 'teststring'
>> attributes(f1)
>> levels(f1) <- c('L-A', 'L-B', 'L-C')
>> attributes(f1)
>> 
>> Should I expect this behaviour?
>
>Does the same thing happen when you do
>
>  attr(f1,'levels') <- c('L-A',...)
>
>Frank

No, it does not. With attr(f1,'levels') <- c('L-A', 'L-B', 'L-C') only the
levels are changed, all other attributes remain as before.
Heinz
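
For the record, a minimal check (same toy factor as in the original post):

f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
attr(f1, 'testattribute') <- 'teststring'
attr(f1, 'levels') <- c('L-A', 'L-B', 'L-C')
attributes(f1)  # class stays c("ordered", "factor") and testattribute survives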

>> 
>> Thanks
>> 
>> Heinz
>> 
>> # R-Version 2.1.0 Patched (2005-05-30)
>> # Windows98
>> 
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>> 
>
>
>-- 
>Frank E Harrell Jr   Professor and Chair   School of Medicine
>  Department of Biostatistics   Vanderbilt University
>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Prof Brian Ripley
This is almost unreadable.

On Mon, 18 Jul 2005, Stephen wrote:

> Hi 1. To clarify: There is a posting saying that dummy regression using
> the coxph function is not possible... That posting may be outdated

Clarification needs

1) an explanation of what you mean by `dummy regression' and

2) a link to the URL of that posting in the archives.

I would normally use RSiteSearch for the latter, but it is temporarily 
unavailable.

> 2. Q. You say 'Make sure that eventbefore is a "pre time zero"
> measurement' please explain: Do you mean that if someone who is not left
> censored and has no eventbefore then their value is zero? So John is in
> my study from 1978 to 1992 and has no events prior to 1980 then John's
> eventbefore is 0 prior to 1980. 3.  just kidding... Thanks S -
> Original Message - From: "Frank E Harrell Jr" To: "Stephen" Cc:
> Sent: Monday, July 18, 2005 2:46 PM Subject: Re: [R] Survival dummy
> variables and some questions > Stephen wrote: >> Hi All, >> >> I am
> currently conducting some survival analyses. I would like to >> extract
> coefficients at each level of the IVs. I read on a previous >> posting
> that dummy regression using coxph was not >> possible. > > I'm not sure
> what that means. > >> >> Therefore I though, hey why not categorize the
> variables (I realize some >> folks object to categorization but the
> paper I am >> replicating appears to have done so ...) > > The fact that
> some people murder doesn't mean we should copy them. And > murdering
> data, though not as serious, should also be avoided. > > Make sure that
> eventbefore is a "pre time zero" measurement. > > Frank > >> >> and turn
> the variables into factors and then try the analysis. >> >> E.g.,
> Dataset <- read.table("categ.dat", header=TRUE) >> >>
> Dataset$eventbefore2c <- factor(Dataset$eventbefore) >> >> .. other IVs
> here >> >> ... >> >> surv.mod1 <- coxph(Surv(start, stop, event) ~ sex2
> + ageonset2c + >> eventbefore2c + daysbefore2c, data=Dataset) >> >>
> Strangely enough, I receive a warning message when the variables are >>
> treated in this way: X matrix deemed to be singular; variable 11 in: >>
> coxph(Surv(start, stop, event) ~ sex2 + ageonset2c + eventbefore2c + I
>>> don't receive any warnings just treating the variables in their >>
> initial continuous format. >> >> I am currently using version >> >>
> platform i386-pc-mingw32 >> >> arch i386 os mingw32 system i386, mingw32
>>> status major 2 minor 1.1 >> year 2005 month 06 day 20 >> language R
> Is this approach to dummy variable using coxph erroneous? >> >> Is there
> another way to conduct dummy variable regression with coxph? >> >> Also,
> if I include frailty (id) does anyone know of a useful way to >>
> investigate frailty? >> >> If one were to plot recurrent events does
> anyone know of a way of >> interpreting them? >> >> References & code
> appreciated. >> >> BTW. not too familiar with R, less so with survival
> analysis  but >> well worth the effort. >> >> Many thanks in
> advance... >> >> Regards >> >> Stephen >> >> >>  ?"?   >>
> http://mail.nana.co.il >> >> [[alternative HTML version deleted]] >> >>
>
> 
> __ >>
> R-help@stat.math.ethz.ch mailing list >>
> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the
> posting guide! >> http://www.R-project.org/posting-guide.html > > > -- >
> Frank E Harrell Jr Professor and Chair School of Medicine > Department
> of Biostatistics Vanderbilt University >
>
>  ?"?  
> http://mail.nana.co.il
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] column-wise deletion in data-frames

2005-07-18 Thread Prof Brian Ripley
On Mon, 18 Jul 2005, Peter Dalgaard wrote:

> Chuck Cleland <[EMAIL PROTECTED]> writes:
>
>>> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
>>>
>>> So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 
>>> and
>>> X4 stacked in between that have NAs). Now, how can I extract those former 
>>> vars
>>> in a new dataset or remove all those latter vars in between that have NAs
>>> (without missing a single row)?
>>> ...
>>
>>Someone else will probably suggest something more elegant, but how
>> about this:
>>
>> newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]
>
> (I think that's supposed to be any(), not all(), and which() is
> crossing the creek to fetch water.)
>
> This should do it:
>
> data[,apply(!is.na(data),2,all)]

If `data' is a data frame, apply will coerce it to a matrix.  I would do
something like

keep <- sapply(data, function(x) all(!is.na(x)))
data[keep]

to use the list-like structure of a data frame and make the fewest 
possible copies.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Stephen
Hi

1. To clarify: There is a posting saying that dummy regression using
the coxph function is not possible... That posting may be outdated.

2. Q. You say 'Make sure that eventbefore is a "pre time zero"
measurement', please explain: Do you mean that if someone is not left
censored and has no eventbefore, then their value is zero? So if John is
in my study from 1978 to 1992 and has no events prior to 1980, then
John's eventbefore is 0 prior to 1980.

3.  just kidding...

Thanks
S

- Original Message - From: "Frank E Harrell Jr" To: "Stephen" Cc:
Sent: Monday, July 18, 2005 2:46 PM Subject: Re: [R] Survival dummy
variables and some questions > Stephen wrote: >> Hi All, >> >> I am
currently conducting some survival analyses. I would like to >> extract
coefficients at each level of the IVs. I read on a previous >> posting
that dummy regression using coxph was not >> possible. > > I'm not sure
what that means. > >> >> Therefore I though, hey why not categorize the
variables (I realize some >> folks object to categorization but the
paper I am >> replicating appears to have done so ...) > > The fact that
some people murder doesn't mean we should copy them. And > murdering
data, though not as serious, should also be avoided. > > Make sure that
eventbefore is a "pre time zero" measurement. > > Frank > >> >> and turn
the variables into factors and then try the analysis. >> >> E.g.,
Dataset <- read.table("categ.dat", header=TRUE) >> >>
Dataset$eventbefore2c <- factor(Dataset$eventbefore) >> >> .. other IVs
here >> >> ... >> >> surv.mod1 <- coxph(Surv(start, stop, event) ~ sex2
+ ageonset2c + >> eventbefore2c + daysbefore2c, data=Dataset) >> >>
Strangely enough, I receive a warning message when the variables are >>
treated in this way: X matrix deemed to be singular; variable 11 in: >>
coxph(Surv(start, stop, event) ~ sex2 + ageonset2c + eventbefore2c + I
>> don't receive any warnings just treating the variables in their >>
initial continuous format. >> >> I am currently using version >> >>
platform i386-pc-mingw32 >> >> arch i386 os mingw32 system i386, mingw32
>> status major 2 minor 1.1 >> year 2005 month 06 day 20 >> language R
Is this approach to dummy variable using coxph erroneous? >> >> Is there
another way to conduct dummy variable regression with coxph? >> >> Also,
if I include frailty (id) does anyone know of a useful way to >>
investigate frailty? >> >> If one were to plot recurrent events does
anyone know of a way of >> interpreting them? >> >> References & code
appreciated. >> >> BTW. not too familiar with R, less so with survival
analysis  but >> well worth the effort. >> >> Many thanks in
advance... >> >> Regards >> >> Stephen >> >> >>  ?"?   >>
http://mail.nana.co.il >> >> [[alternative HTML version deleted]] >> >>
>> >>

>> >> __ >>
R-help@stat.math.ethz.ch mailing list >>
https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the
posting guide! >> http://www.R-project.org/posting-guide.html > > > -- >
Frank E Harrell Jr Professor and Chair School of Medicine > Department
of Biostatistics Vanderbilt University > 

http://mail.nana.co.il

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] how to change bar colours in plot.stl

2005-07-18 Thread Michael Townsley
Dear helpeRs,

Is it possible to change the shading colour of the range bars in the plot 
generated by plot.stl?  By default they are grey, but I would prefer them 
white (I am preparing some graphics for a powerpoint presentation so I'm 
inverting all colours).

As far as I can see plot.stl allows you to turn off the range bars, but 
nothing about the shading colour.  I tried to look at the function by typing:

 > plot.stl
Error: Object "plot.stl" not found

but received an error message.

Thanks in advance,

MT



Dr Michael Townsley
Senior Research Fellow
Jill Dando Institute of Crime Science
University College London
Second Floor, Brook House
London, WC1E 7HN

Phone: 020 7679 0820
Fax: 020 7679 0828
Email: [EMAIL PROTECTED]
  
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] levels() deletes other attributes

2005-07-18 Thread Heinz Tuechler
Dear All,

it seems to me that levels() deletes other attributes. See the following
example:

## example with levels
f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
attr(f1, 'testattribute') <- 'teststring'
attributes(f1)
levels(f1) <- c('L-A', 'L-B', 'L-C')
attributes(f1)

If I run it, after assigning new levels, the class is only "factor" instead
of "ordered" "factor" and the $testattribute "teststring" is gone.

The same happens to the label() attribute of Hmisc.

## example with levels and label
library(Hmisc)
f1 <- factor(c('level c','level b','level a','level c'), ordered=TRUE)
label(f1) <- 'factor f1'
attr(f1, 'testattribute') <- 'teststring'
attributes(f1)
levels(f1) <- c('L-A', 'L-B', 'L-C')
attributes(f1)

Should I expect this behaviour?

Thanks

Heinz

# R-Version 2.1.0 Patched (2005-05-30)
# Windows98

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] about 3d surface plot

2005-07-18 Thread Huntsinger, Reid
Have a look at the "rgl" package on CRAN.

Reid Huntsinger
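
Since the points are scattered rather than on a regular grid, one sketch is to
interpolate first (here with the akima package) and then draw the surface; the
numbers below are the posted data, with the decimal commas in z converted to
decimal points:

library(akima)                     # for interp()
dat <- data.frame(x = c(377, 1270, 1060, 386, 360),
                  y = c(713, 552, 476, 715, 177),
                  z = c(0, 2.756785261, 2.583918174, 0.158626133, 2.007595908))
grd <- with(dat, interp(x, y, z))  # regular grid: $x, $y and a z matrix
grd$z[is.na(grd$z)] <- min(dat$z)  # fill cells outside the convex hull of the data
persp(grd$x, grd$y, grd$z, theta = 30, phi = 25,
      xlab = "x", ylab = "y", zlab = "z")
## rgl can then draw the same grid interactively, e.g. with surface3d()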

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, July 18, 2005 8:24 AM
To: r-help@stat.math.ethz.ch
Subject: [R] about 3d surface plot


Hi 
I have a data format like this:
x coordinates   y coordinates z value
3.77E+002 7.13E+002 0,0
1.27E+003 5.52E+002 2,756785261
1.06E+003 4.76E+002 2,583918174
3.86E+002 7.15E+002 0,158626133
3.60E+002 1.77E+002 2,007595908
a pair of x and y corresponds to a z value. Can I use these data to draw a
3d surface plot, in which the surface consists of the z values? If so, how?
thanks
Hao Wu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] column-wise deletion in data-frames

2005-07-18 Thread Peter Dalgaard
Chuck Cleland <[EMAIL PROTECTED]> writes:

> > data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
> > 
> > So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 
> > and
> > X4 stacked in between that have NAs). Now, how can I extract those former 
> > vars
> > in a new dataset or remove all those latter vars in between that have NAs
> > (without missing a single row)?
> > ...
> 
>Someone else will probably suggest something more elegant, but how 
> about this:
> 
> newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]

(I think that's supposed to be any(), not all(), and which() is
crossing the creek to fetch water.)

This should do it:

data[,apply(!is.na(data),2,all)]

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] column-wise deletion in data-frames

2005-07-18 Thread Chuck Cleland
[EMAIL PROTECTED] wrote:
> Hi,
> 
> I have a huge dataframe and would like to delete all those variables from it
> that have NAs. The deletion of vars should be done column-wise, and not
> row-wise as na.omit would do it, because I have some vars that have NAs for
> all rows, thus using na.omit I would end up with no obs. Is there a
> convenient way to do this in R?
> 
> To make the question more explicit. Imagine a dataset that looks something 
> like
> this (but much bigger)
> 
> X1 <- rnorm(1000)
> X2 <- c(rep(NA,1000))
> X3 <- rnorm(1000)
> X4 <- c(rep(NA,499),1,44,rep(NA,499))
> X5 <- rnorm(1000)
> 
> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
> 
> So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 and
> X4 stacked in between that have NAs). Now, how can I extract those former vars
> in a new dataset or remove all those latter vars in between that have NAs
> (without missing a single row)?
> ...

   Someone else will probably suggest something more elegant, but how 
about this:

newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Survival dummy variables and some questions

2005-07-18 Thread Frank E Harrell Jr
Stephen wrote:
> Hi All,
> 
>  
> 
> I am currently conducting some survival analyses. I would like to
> extract coefficients at each level of the IVs. 
> 
> I read on a previous posting that dummy regression using coxph was not
> possible. 

I'm not sure what that means.

> 
> Therefore I thought, hey why not categorize the variables 
> 
> (I realize some folks object to categorization but the paper I am
> replicating appears to have done so ...)

The fact that some people murder doesn't mean we should copy them.  And 
murdering data, though not as serious, should also be avoided.

Make sure that eventbefore is a "pre time zero" measurement.

Frank

> 
> and turn the variables into factors and then try the analysis.
> 
>  
> 
> E.g., 
> 
> Dataset <- read.table("categ.dat", header=TRUE)
> 
> Dataset$eventbefore2c <- factor(Dataset$eventbefore)
> 
> .. other IVs here
> 
> ...
> 
> surv.mod1 <- coxph(Surv(start, stop, event) ~ sex2 + ageonset2c +
> eventbefore2c  + daysbefore2c, data=Dataset)
> 
>  
> 
> Strangely enough, I receive a warning message when the variables are
> treated in this way: X matrix deemed to be singular; variable 11 in:
> coxph(Surv(start, stop, event) ~ sex2 + ageonset2c + eventbefore2c +  
> 
>  
> 
> I don’t receive any warnings just treating the variables in their
> initial continuous format.
> 
> I am currently using version
> 
> platform i386-pc-mingw32
> 
> arch i386   
> 
> os   mingw32
> 
> system   i386, mingw32  
> 
> status  
> 
> major2  
> 
> minor1.1
> 
> year 2005   
> 
> month06 
> 
> day  20 
> 
> language R 
> 
>  
> 
>  
> 
> Is this approach to dummy variable using coxph erroneous?
> 
>  
> 
> Is there another way to conduct dummy variable regression with coxph?
> 
>  
> 
> Also, if I include frailty (id) does anyone know of a useful way to
> investigate frailty?
> 
>  
> 
> If one were to plot recurrent events does anyone know of a way of
> interpreting them?
> 
>  
> 
> References & code appreciated.
> 
>  
> 
> BTW. not too familiar with R, less so with survival analysis  but
> well worth the effort.
> 
>  
> 
> Many thanks in advance...
> 
>  
> 
> Regards
> 
>  
> 
> Stephen
> 
> 
> http://mail.nana.co.il
> 
>   [[alternative HTML version deleted]]
> 
> 
> 
> 
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] column-wise deletion in data-frames

2005-07-18 Thread jhainm
Hi,

I have a huge dataframe and would like to delete all those variables from it that
have NAs. The deletion of vars should be done column-wise, and not row-wise as
na.omit would do it, because I have some vars that have NAs for all rows, thus
using na.omit I would end up with no obs. Is there a convenient way to do this
in R?

To make the question more explicit. Imagine a dataset that looks something like
this (but much bigger)

X1 <- rnorm(1000)
X2 <- c(rep(NA,1000))
X3 <- rnorm(1000)
X4 <- c(rep(NA,499),1,44,rep(NA,499))
X5 <- rnorm(1000)

data <- as.data.frame(cbind(X1,X2,X3,X4,X5))

So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 and
X4 stacked in between that have NAs). Now, how can I extract those former vars
in a new dataset or remove all those latter vars in between that have NAs
(without missing a single row)?

Thank you very much!

Best,
jens

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] about 3d surface plot

2005-07-18 Thread 吴 昊
Hi 
I have a data format like this:

x coordinates   y coordinates z value
3.77E+002 7.13E+002 0,0
1.27E+003 5.52E+002 2,756785261
1.06E+003 4.76E+002 2,583918174
3.86E+002 7.15E+002 0,158626133
3.60E+002 1.77E+002 2,007595908
a pair of x and y corresponds to a z value. Can I use these data to draw a
3d surface plot, in which the surface consists of the z values? If so, how?

thanks
Hao Wu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Proportion test in three-choices experiment

2005-07-18 Thread NOEL Yvonnick

Rafael,

when testing binomial hypotheses with both repeated measures and 
inter-group factors, you should make explicit your model on the 
intra-subject part of the data. You can't do Chi-square comparisons on 
count data that mix independent and dependent measures.


But you can define a Bernoulli logistic model at the level of the individual
response, with a proper subject factor, a stimulus-type factor, and possibly
an item factor (nested within the stimulus-type category). This may be viewed
as a Rasch model of measurement.


Within this model, coefficient estimates on the subject factor are 
measures of individual ability that can be a posteriori introduced in an 
ANOVA.


In some cases, you can model the intra-subject part of the data as a 
temporal model, assuming for instance that subjects are getting more 
efficient as time goes on.
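
As an illustration only, a sketch of such a single-response-level fit with
glm(); the data frame and all the column names below are made up:

set.seed(1)
d <- expand.grid(subject = factor(1:10), item = factor(1:6))
d$stimtype <- factor(ifelse(as.integer(d$item) <= 3, "A", "B"))
d$correct  <- rbinom(nrow(d), 1, 0.6)          # 0/1 responses
fit <- glm(correct ~ subject + stimtype, family = binomial, data = d)
summary(fit)
## the subject coefficients are the ability estimates that can then be
## carried into an ANOVA across the independent groups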


HTH,

Yvonnick.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Quantile Regression, S-Function "Rreg"

2005-07-18 Thread Stefan Hoderlein
I have the following problem: I would like to do a
nonparametric quantile regression. Thus far I have used
the quantreg package and done a local quadratic, but
it does not seem to work well.

Alternatively, I have tried with an older S version I
have the function rreg, and used 

rreg(datax,datay,method=function(u) 
   {(abs(u)+(2*alpha-1)*u)},iter=100)

which gave me pretty acceptable results. What I would
like to do now is to have a similar command in R, but
with the functions

rlm  and   lqs

I do not seem to be able to get anywhere. Can anybody
help? 

I found in the archive under 

Message-ID:
<[EMAIL PROTECTED]>

a reply from Brian Ripley on a similar question, but
was not able to download the experimental file from
his website...

Thanks

Stefan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] rmpi in windows

2005-07-18 Thread Uzuner, Tolga
Hi Uwe,

Thanks for your kind response.

best,
Tolga

Please follow the attached hyperlink to an important disclaimer
http://www.csfb.com/legal_terms/disclaimer_europe.shtml



-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED]
Sent: 16 July 2005 14:29
To: Uzuner, Tolga
Cc: 'r-help@stat.math.ethz.ch'
Subject: Re: [R] rmpi in windows


Uzuner, Tolga wrote:
> Hi Folks,
> Has anyone been able to get rmpi to work under windows ?

Rmpi uses the LAM implementation of MPI.
See
http://www.lam-mpi.org/faq/category12.php3
and read FAQ 2 which implicitly tells us that there is no native port, 
hence you cannot run it under Windows.
The package maintainer may know better.

Uwe Ligges


BTW: Why do you ask twice (in private message and on R-help)? I am 
reading R-help anyway ...




> Thanks,
> Tolga
> 
> Please follow the attached hyperlink to an important disclaimer
> 
> 
> 
> 
> ==
> Please access the attached hyperlink for an important electronic 
> communications disclaimer: 
> 
> http://www.csfb.com/legal_terms/disclaimer_external_email.shtml
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


==
Please access the attached hyperlink for an important electronic communications 
disclaimer: 

http://www.csfb.com/legal_terms/disclaimer_external_email.shtml

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] dataframes of unequal size

2005-07-18 Thread Renuka Sane
I have two dataframes C and C1. Each has three columns viz. state, psu
and weight. The dataframes are of unequal size i.e. C1 could be
2/25/50 rows and C has 42000 rows.  C1 is the master table i.e.
C1$state, C1$psu and C1$weight are never the same. This
is not so for C.

For example
C
state, psu,weight
A. P., Urban, 0
Mah., Rural, 0
W.B., Rural,0
Ass., Rural,0
M. P., Urban,0
A. P., Urban, 0
...

C1
state, psu, weight
A. P., Urban, 1.3
A. P., Rural, 1.2
M. P., Urban, 0.8
..

For every row of C, I want to check if C$state==C1$state and
C$psu==C1$psu. If it is, I want C$weight <- C1$weight, else C$weight
should be zero.

I am doing the following
for( i in 1:length(C$weight)) {
 C$w[C$state[i]==C1$state & C$psu[i]==C1$psu] <- C1$w[C$state[i] ==
C1$state & C$psu[i] == C1$psu]
}

This gives me the correct replacements for the number of rows in C1
and then just repeats the same weights for the remaining rows in C.

Can someone point out the error in what I am doing or show the correct
way of doing this?
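
(For reference, one vectorised way to express the intended lookup, as a sketch
using match() on a combined key; column names as in the example above:)

key  <- paste(C$state, C$psu)    # key for each row of C
key1 <- paste(C1$state, C1$psu)  # keys of the master table C1
idx  <- match(key, key1)         # matching row of C1, or NA if there is none
C$weight <- ifelse(is.na(idx), 0, C1$weight[idx])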

Thanks,
Renuka

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Contingency-Coefficient, Variance

2005-07-18 Thread Christoph Buser
Hi

You can easily implement it by:

Mag. Ferri Leberl writes:
 > Dear Everybody!
 > Excuse me for this newbie-questions, but I have not found answers to these 
 > Question:
 > 
 > - Is there a command calculating the variance if the whole group is known 
 > (thus dividing by n instead of n-1)?

var(x)*(n-1)/n

 > 
 > - Which is the command to calculate the
 > - contingency-coefficient?

Have a look at ?chisq.test
There you can calculate and extract the chi-square statistic and
then it is easy to program the needed formula.
Please note that there is an argument "correct" in
chisq.test(). You can set it to FALSE if you do not want to
apply Yates' continuity correction.
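
For example (a sketch with a made-up 2 x 2 table):

tab <- matrix(c(20, 15, 10, 25), nrow = 2)
n   <- sum(tab)
chi <- chisq.test(tab, correct = FALSE)$statistic
sqrt(chi / (chi + n))      # Pearson's contingency coefficient
## and, for the first question, with x a numeric vector:
x <- rnorm(100)
var(x) * (length(x) - 1) / length(x)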

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


 > 
 > Thank you in advance.
 > Mag. Ferri Leberl
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html