Re: [R] Merge postscript files into ps/pdf

2010-11-11 Thread Joshua Wiley
Hi Ralf,

It is easy to put a bunch of graphs in one file (each on its own
page) using the onefile = TRUE argument to postscript() or pdf()
(depending on what type of file you want).  For tinkering with
already-created PostScript or PDF files, I usually use Ghostscript;
in my view, there is software better suited than R for
editing/merging/manipulating PostScript or PDF files.
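Josh's first suggestion can be sketched like this (file names are illustrative):

```r
# Put every plot on its own page of a single file; onefile = TRUE is
# the default for pdf(), and postscript() works the same way.
pdf("all-plots.pdf", onefile = TRUE)
for (i in 1:3) plot(rnorm(100), main = paste("Plot", i))
dev.off()
```

Merging files that already exist is indeed easier outside R, e.g. with Ghostscript:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf file1.ps file2.ps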

Cheers,

Josh

On Thu, Nov 11, 2010 at 11:07 PM, Ralf B  wrote:
> I created multiple postscript files using ?postscript. How can I merge
> them into a single postscript file using R? How can I merge them into
> a single pdf file?
>
> Thanks a lot,
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



[R] Merge postscript files into ps/pdf

2010-11-11 Thread Ralf B
I created multiple postscript files using ?postscript. How can I merge
them into a single postscript file using R? How can I merge them into
a single pdf file?

Thanks a lot,
Ralf



Re: [R] Adding meta-data when creating objects. e.g: changing "<-" so to (for example) add "creation time" - how-to and pros/cons?

2010-11-11 Thread Tal Galili
Hello Barry,
Regarding identical() returning FALSE: it hadn't occurred to me; great point,
thank you.
One question: how did you end up changing "<-" so as to enable the creation
of the .metadata object?

Cheers,
Tal

Contact details:
tal.gal...@gmail.com | 972-52-7275845
www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)




On Thu, Nov 11, 2010 at 12:30 PM, Barry Rowlingson <
b.rowling...@lancaster.ac.uk> wrote:

> On Thu, Nov 11, 2010 at 9:37 AM, Tal Galili  wrote:
>
> > 4) My real intention is to somehow change the "<-" operator (not simply
> the
> > assign).  I am unsure as to how to do that.
> > 5) Are there any major pros/cons to the adding of such meta-data to
> objects?
> > (for example, excessive overhead on memory/performance)
>
>  I had a go at doing (4) a few years back. The major problem I had was
> that if you do:
>
>  y <- 1:10
>  x <- y
>
>  with a <- operator that sets a timestamp then:
>
>  identical(x,y) is FALSE.
>
> I implemented timestamping by adding an attribute to objects during
> assignment by modifying the C source, and then lots and lots of R's
> tests failed during build because identical things were no longer
> identical.
>
> Might be better to store your metadata in a separate object, .metadata
> in the global env perhaps? Then just do:
>
> .metadata[[name_of_thing]]  = list(modified=Sys.time())
>
> in your modified assign.
>
> Performance will only be a problem if your program is doing nothing
> else except fiddle with metadata, I reckon. Your program does do
> something useful, doesn't it?
>
> Barry
>
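Barry's separate-object idea can be sketched as a small wrapper around assign() (the name assign_ts is illustrative, not from the thread):

```r
# Keep timestamps in a separate .metadata list so the objects themselves
# are untouched and identical() still behaves as expected.
.metadata <- list()

assign_ts <- function(name, value, envir = globalenv()) {
  assign(name, value, envir = envir)
  .metadata[[name]] <<- list(modified = Sys.time())
  invisible(value)
}

assign_ts("x", 1:10)
.metadata[["x"]]$modified   # when x was last assigned
identical(x, 1:10)          # TRUE: no attribute was added to x
```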




Re: [R] Time Delay / Wait

2010-11-11 Thread Michael Sumner
?Sys.sleep
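A minimal sketch of Michael's pointer (plot content and timing are illustrative):

```r
# Show each graph for a while before drawing the next one.
# Sys.sleep() suspends R without the busy-waiting of an empty for-loop.
for (i in 1:3) {
  plot(rnorm(100), main = paste("Graph", i))
  Sys.sleep(120)  # pause for 2 minutes (the argument is in seconds)
}
```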

On Fri, Nov 12, 2010 at 5:54 PM, Santosh Srinivas
 wrote:
> Hi Group,
>
> Is there something like a delay function based on system time or equivalent?
> I am generating a few graphs, and I would like to see each graph
> for, say, 2 minutes before moving on to the next one.
>
> I can always use an empty for-loop, I guess, but is there a better way?
>
> Thanks,
> S
>
>



-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com



[R] Time Delay / Wait

2010-11-11 Thread Santosh Srinivas
Hi Group,

Is there something like a delay function based on system time or equivalent?
I am generating a few graphs, and I would like to see each graph
for, say, 2 minutes before moving on to the next one.

I can always use an empty for-loop, I guess, but is there a better way?

Thanks,
S



Re: [R] font family in R

2010-11-11 Thread David Winsemius


On Nov 12, 2010, at 12:27 AM, sachinthaka.abeyward...@allianz.com.au wrote:

Hi All,

I've been looking around (maybe not extensively) but I couldn't find any
documentation on the list of fonts that are usable with R.


?device
?pdf
?pdfFonts
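As a hedged illustration of those help pages: base graphics accept a family argument, though which families are honoured depends on the device:

```r
# "serif", "sans" and "mono" are the portable family names for most
# devices; anything more specific is device-dependent (see ?pdfFonts).
par(family = "serif")
plot(1:5, 1:5, main = "Title")
mtext("Sub Title", side = 3, line = 0, cex = 1.2, family = "mono")
```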




I'm trying to change the font of the title; it seems like mtext() is the
only way so far, which can actually write text on top of other text
depending on the lines:

plot(1:5,1:5);
mtext("Title",side=3,cex=3.5,line=1);
mtext("Sub Title",side=3,cex=2.1,line=0);

Of course I won't be using titles that big, but it would be great if R
handled that on its own rather than me having to worry about the maximum
height of tables.

Also, are there any packages out there that are better than base R
graphics (mind you, I do like it)?

Thanks,
Sachin



David Winsemius, MD
West Hartford, CT



Re: [R] switching only axis off in plot

2010-11-11 Thread David Winsemius


On Nov 12, 2010, at 12:50 AM, sachinthaka.abeyward...@allianz.com.au wrote:

Hi R,

In the following code my x-axis is formatted in month format, which I'm
happy with. The y-axis is what I want to re-format with something else.


Did you mean x-axis?


My
question is, is it possible just to switch off the x-axis in the plot
function (see below)? If not, how do you get the months to show up as
FEB-2010, MAR-2010 and so on, so I could fit a label on the x-axis.


axis(1, labels=toupper(format(period, format="%b-%Y")), at=period)



Thanks,
Sachin
p.s. sorry about corporate notice.

period<-c(100120,100220,100320);
y<-1:3;
period<-as.Date(strptime(period,("%y%m%d")));

#the x-axis has a nice format with the month names
win.graph();
plot(period,y);

# I don't want to lose this x-axis formatting :-(


Sarcasm does not translate well across text-based communication.
Please be a bit more concrete about what you want (or don't want).



win.graph();
plot(period,y, axes=FALSE, ann=FALSE);
axis(2, at=1:3, labels=c("A","B","C"));

--- Please consider the environment before printing this email ---

Allianz - Best General Insurance Company of the Year 2010*
Allianz - General Insurance Company of the Year 2009+

* Australian Banking and Finance Insurance Awards
+ Australia and New Zealand Insurance Industry Awards

This email and any attachments has been sent by Allianz Australia  
Insurance Limited (ABN 15 000 122 850) and is intended solely for  
the addressee. It is confidential, may contain personal information  
and may be subject to legal professional privilege. Unauthorised use  
is strictly prohibited and may be unlawful. If you have received  
this by mistake, confidentiality and any legal privilege are not  
waived or lost and we ask that you contact the sender and delete and  
destroy this and any other copies. In relation to any legal use you  
may make of the contents of this email, you must ensure that you  
comply with the Privacy Act (Cth) 1988 and you should note that the  
contents may be subject to copyright and therefore may not be  
reproduced, communicated or adapted without the express consent of  
the owner of the copyright.
Allianz will not be liable in connection with any data corruption,  
interruption, delay, computer virus or unauthorised access or  
amendment to the contents of this email. If this email is a  
commercial electronic message and you would prefer not to receive  
further commercial electronic messages from Allianz, please forward  
a copy of this email to unsubscr...@allianz.com.au with the word  
unsubscribe in the subject header.




David Winsemius, MD
West Hartford, CT



Re: [R] switching only axis off in plot

2010-11-11 Thread Dennis Murphy
Hi:

Do you mean to leave the x-axis as is and redo the y? If so, use graphical
parameter yaxt:

x <- y <- 1:10
plot(x, y, yaxt = 'n')
axis(2,  at = ..., lab = ..., ...)# fill in the blanks

In case it matters, xaxt = 'n' suppresses the x-axis labels but not the y
labels.
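Filling in the blanks with the poster's own data (a sketch; the y labels are illustrative):

```r
# Keep the month-formatted x-axis, replace only the y-axis.
period <- as.Date(strptime(c(100120, 100220, 100320), "%y%m%d"))
y <- 1:3
plot(period, y, yaxt = "n")                   # x-axis drawn as before
axis(2, at = 1:3, labels = c("A", "B", "C"))  # custom y-axis
```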

HTH,
Dennis

On Thu, Nov 11, 2010 at 9:50 PM, wrote:

>
> Hi R,
>
> In the following code my x-axis is formatted in month format, which I'm
> happy with. The y-axis is what I want to re-format with something else. My
> question is, is it possible just to switch off the x-axis in the plot function
> (see below)? If not, how do you get the months to show up as FEB-2010,
> MAR-2010 and so on, so I could fit a label on the x-axis.
>
> Thanks,
> Sachin
> p.s. sorry about corporate notice.
>
> period<-c(100120,100220,100320);
> y<-1:3;
> period<-as.Date(strptime(period,("%y%m%d")));
>
> #the x-axis has a nice format with the month names
> win.graph();
> plot(period,y);
>
> #I dont want to lose this x-axis formatting :-(
> win.graph();
> plot(period,y, axes=FALSE, ann=FALSE);
> axis(2, at=1:3, labels=c("A","B","C"));
>




Re: [R] calculate variance with trimm mean

2010-11-11 Thread David Winsemius


On Nov 12, 2010, at 1:07 AM, Jumlong Vongprasert wrote:


Dear All
I have a problem calculating a variance in R.
I compute a trimmed mean with mean(x, trim = .05), and I want to calculate
the variance with trim = .05 as well.
How can I do this?


methods(mean)

Why not take the code for mean.default and substitute var for
.Internal(mean(x))?
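That suggestion can be sketched as a small function (var_trim is a hypothetical name; it mirrors the symmetric trimming done in mean.default):

```r
# Variance of the same observations that mean(x, trim = .05) averages:
# drop floor(n * trim) points from each tail of the sorted sample.
var_trim <- function(x, trim = 0.05, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)
  var(x[(k + 1):(n - k)])
}

var_trim(c(1:10, 100), trim = 0.1)  # 7.5: the points 1 and 100 are trimmed away
```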



Many Thanks.
Jumlong

--
Jumlong Vongprasert, Assist. Prof.
Institute of Research and Development
Ubon Ratchathani Rajabhat University

--

David Winsemius, MD
West Hartford, CT



[R] calculate variance with trimm mean

2010-11-11 Thread Jumlong Vongprasert
Dear All
I have a problem calculating a variance in R.
I compute a trimmed mean with mean(x, trim = .05), and I want to calculate
the variance with trim = .05 as well.
How can I do this?
Many Thanks.
Jumlong

-- 
Jumlong Vongprasert, Assist. Prof.
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000




[R] switching only axis off in plot

2010-11-11 Thread sachinthaka . abeywardana

Hi R,

In the following code my x-axis is formatted in month format, which I'm
happy with. The y-axis is what I want to re-format with something else. My
question is, is it possible just to switch off the x-axis in the plot function
(see below)? If not, how do you get the months to show up as FEB-2010,
MAR-2010 and so on, so I could fit a label on the x-axis.

Thanks,
Sachin
p.s. sorry about corporate notice.

period<-c(100120,100220,100320);
y<-1:3;
period<-as.Date(strptime(period,("%y%m%d")));

#the x-axis has a nice format with the month names
win.graph();
plot(period,y);

# I don't want to lose this x-axis formatting :-(
win.graph();
plot(period,y, axes=FALSE, ann=FALSE);
axis(2, at=1:3, labels=c("A","B","C"));




[R] font family in R

2010-11-11 Thread sachinthaka . abeywardana

Hi All,

I've been looking around (maybe not extensively) but I couldn't find any
documentation on the list of fonts that are usable with R.

I'm trying to change the font of the title; it seems like mtext() is the
only way so far, which can actually write text on top of other text
depending on the lines:

plot(1:5,1:5);
mtext("Title",side=3,cex=3.5,line=1);
mtext("Sub Title",side=3,cex=2.1,line=0);

Of course I won't be using titles that big, but it would be great if R
handled that on its own rather than me having to worry about the maximum
height of tables.

Also, are there any packages out there that are better than base R
graphics (mind you, I do like it)?

Thanks,
Sachin
p.s. sorry about corporate notice.




[R] goodness-of-fit test

2010-11-11 Thread Andrew Halford
Hi All,

I have a dataset consisting of abundance counts of a fish, and I want to
test whether my data are Poisson or normal in distribution.

My first question is whether it is more appropriate to model my data
according to a Poisson distribution (if my test says it conforms) or to
transform the data to normalise its distribution?

I have been using the vcd package

gf <- goodfit(Y, type = "poisson", method = "MinChisq")

but i get the following error message

Warning message:
In optimize(chi2, range(count)) : NA/Inf replaced by maximum positive value


I then binned my count data to see if that might help

   V1 V2
1   5 34
2  10 30
3  15 10
4  20  8
5  25  7
6  30  0
7  35  3
8  40  2
9  45  3
10 50  1
11 55  0
12 60  1

but still received an error message

 Goodness-of-fit test for poisson distribution

             X^2  df P(> X^2)
Pearson  2573372 330

Warning message:
In summary.goodfit(gf) : Chi-squared approximation may be incorrect

Am I getting caught out because of zero counts or frequencies in my data?
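For comparison, a self-contained goodfit() run that should not trigger those warnings (simulated data, illustrative only; goodfit's default ML method avoids the optimize() step that MinChisq uses):

```r
library(vcd)  # assumes the vcd package is installed

set.seed(1)
Y <- rpois(100, lambda = 4)  # counts that really are Poisson
gf <- goodfit(Y, type = "poisson", method = "ML")
summary(gf)  # likelihood-ratio test instead of the Pearson chi-square
```

If the real counts include many zero frequencies after binning, the chi-squared approximation will indeed be unreliable, which is what the second warning is saying.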

Andy

-- 
Andrew Halford Ph.D
Associate Research Scientist
Marine Laboratory
University of Guam
Ph: +1 671 734 2948




[R] JGR install problem

2010-11-11 Thread chinling

When I double-click JGR.exe, it shows "Can't find Java/R interface (JRI)...".
My R version is R 2.12.0.
Does anyone know how to install JGR?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/JGR-install-problem-tp3039013p3039013.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Troubleshooting sweave

2010-11-11 Thread sachinthaka . abeywardana
It seems like texi2dvi doesn't exist in R 2.12.0 anymore?

Sachin
p.s. sorry about corporate notice




Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Kerry
Thanks Ted and Greg. I had actually tried pnorm and, after having
problems, thought maybe I was misunderstanding dnorm as a variable in
ks.test due to over- (more likely under-) thinking it. I'm assuming now
that ks.test will treat my data in cumulative form (makes sense now
that I think about it, but I didn't want to assume any steps that the
R version of the K-S test takes). I plan to explore the ideas and run the
simulations you sent in full over the weekend.

Thanks again!
Kerry
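Greg's earlier suggestion of "simulating the process" (quoted further below) can be sketched as a parametric bootstrap; this is illustrative code, not from the original thread:

```r
# When mean and sd are estimated from the same data, ks.test()'s p-value
# is inaccurate; calibrate the KS statistic by simulation instead.
set.seed(42)
x <- rnorm(200, 100, 3)
obs <- ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic

sim <- replicate(2000, {
  y <- rnorm(length(x), mean(x), sd(x))
  ks.test(y, pnorm, mean = mean(y), sd = sd(y))$statistic
})

mean(sim >= obs)  # bootstrap p-value in place of ks.test()'s
</imports>
```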

On Nov 11, 12:05 pm, Greg Snow  wrote:
> Consider the following simulations (also fixing the pnorm instead of dnorm 
> that Ted pointed out and I missed):
>
> out1 <- replicate(10000, {
>         x <- rnorm(1000, 100, 3);
>         ks.test( x, pnorm, mean=100, sd=3 )$p.value
>         } )
>
> out2 <- replicate(10000, {
>         x <- rnorm(1000, 100, 3);
>         ks.test( x, pnorm, mean=mean(x), sd=sd(x) )$p.value
>         } )
>
> par(mfrow=c(2,1))
> hist(out1)
> hist(out2)
>
> mean(out1 <= 0.05 )
> mean(out2 <= 0.05 )
>
> In both cases the null hypothesis is true (or at least a meaningful 
> approximation to true) so the p-values should follow a uniform distribution.  
> In the case of out1 where the mean and sd are specified as part of the null 
> the p-values are reasonably uniform and the rejection rate is close to alpha 
> (should asymptotically approach alpha as the number of simulations 
> increases).  However looking at out2, where the parameters are set not by 
> outside knowledge or tests, but rather from the observed data, the p-values 
> are clearly not uniform and the rejection rate is far from alpha.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > project.org] On Behalf Of Kerry
> > Sent: Thursday, November 11, 2010 12:02 AM
> > To: r-h...@r-project.org
> > Subject: Re: [R] Kolmogorov Smirnov Test
>
> > Thanks for the feedback. My goal is to run a simple test to show that
> > the data cannot be rejected as either normally or uniformally
> > distributed (depening on the variable), which is what a previous K-S
> > test run using SPSS had shown. The actual distribution I compare to my
> > sample only matters that it would be rejected were my data multi-
> > modal. This way I can suggest the data is from the same population. I
> > later run PCA and cluster analyses to confirm this but I want an easy
> > stat to start with for the individual variables.
>
> > I didn't think I was comparing my data against itself, but rather
> > again a normal distribution with the same mean and standard deviation.
> > Using the mean seems necessary, so is it incorrect to have the same
> > standard deviation too? I need to go back and read on the K-S test to
> > see what the appropriate constraints are before bothering anyone for
> > more help. Sorry, I thought I had it.
>
> > Thanks again,
> > kbrownk
>
> > On Nov 11, 12:40 am, Greg Snow  wrote:
> > > The way you are running the test the null hypothesis is that the data
> > comes from a normal distribution with mean=0 and standard deviation =
> > 1.  If your minimum data value is 0, then it seems very unlikely that
> > the mean is 0.  So the test is being strongly influenced by the mean
> > and standard deviation not just the shape of the distribution.
>
> > > Note that the KS test was not designed to test against a distribution
> > with parameters estimated from the same data (you can do the test, but
> > it makes the p-value inaccurate).  You can do a little better by
> > simulating the process and comparing the KS statistic to the
> > simulations rather than looking at the computed p-value.
>
> > > However you should ask yourself why you are doing the normality tests
> > in the first place.  The common reasons that people do this don't match
> > with what the tests actually test (see the fortunes on normality).
>
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.s...@imail.org
> > > 801.408.8111
>
> > > > -Original Message-
> > > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > > > project.org] On Behalf Of Kerry
> > > > Sent: Wednesday, November 10, 2010 9:23 PM
> > > > To: r-h...@r-project.org
> > > > Subject: [R] Kolmogorov Smirnov Test
>
> > > > I'm using ks.test (mydata, dnorm) on my data. I know some of my
> > > > different variable samples (mydata1, mydata2, etc) must be normally
> > > > distributed but the p value is always < 2.0^-16 (the 2.0 can change
> > > > but not the exponent).
>
> > > > I want to test mydata against a normal distribution. What could I
> > be
> > > > doing wrong?
>
> > > > I tried instead using rnorm to create a normal distribution: y =
> > rnorm
> > > > (68,mean=mydata, sd=mydata), where N= t

Re: [R] Troubleshooting sweave

2010-11-11 Thread Duncan Murdoch

sachinthaka.abeyward...@allianz.com.au wrote:

Hi All,

I've reproduced the example from Prof. Friedrich Leisch's webpage. When I
run Sweave("Example-1.Snw") or Sweave("Example-1.Rnw") (yes, I renamed
them), I get the following error:

Writing to file example-1.tex
Processing code chunks ...
 1 : echo term verbatim

Error:  chunk 1
Error in library(ctest) : there is no package called 'ctest'

Also, while I'm at it, is there an R command to compile the .tex file to a
PDF, or does the Sweave() function do that for me?


The patchDVI package on R-forge includes functions to process (possibly 
multiple) Sweave inputs into .pdf or .dvi files.


E.g.

SweavePDF("Example.Rnw")


Duncan Murdoch
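For completeness, base R can also compile the .tex that Sweave() writes, assuming a LaTeX installation is available on the system:

```r
# tools::texi2dvi ships with base R (it is still present in 2.12.0);
# it shells out to (pdf)latex, so TeX must be installed separately.
Sweave("Example-1.Rnw")                       # writes Example-1.tex
tools::texi2dvi("Example-1.tex", pdf = TRUE)  # produces Example-1.pdf
```

As for the error itself: the ctest package was merged into the stats package long ago, so library(ctest) fails on current R; deleting that line from the .Rnw fixes chunk 1 (kruskal.test lives in stats).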



Thanks,
Sachin
p.s. sorry about the corporate notice.

example-1.Rnw: from http://www.statistik.lmu.de/~leisch/Sweave/

\documentclass[a4paper]{article}

\title{Sweave Example 1}
\author{Friedrich Leisch}

\begin{document}

\maketitle

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality)
library(ctest)
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:

\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}






Re: [R] create a pairwise coocurrence matrix

2010-11-11 Thread Dennis Murphy
Hi:

Another alternative is

crossprod(A)

which is meant to produce an optimized A'A  (not the vector cross-product
from intro physics :)

Example:

A <- matrix(rpois(9, 10), ncol = 3)
> A
     [,1] [,2] [,3]
[1,]    6   10   14
[2,]    7    5   16
[3,]   12   16   10
> t(A) %*% A
     [,1] [,2] [,3]
[1,]  229  287  316
[2,]  287  381  380
[3,]  316  380  552
> crossprod(A)
     [,1] [,2] [,3]
[1,]  229  287  316
[2,]  287  381  380
[3,]  316  380  552

A little test:

f <- function(A) {
 U <- crossprod(A)
 diag(U) <- 0
 U
  }
g <- function(A) {
 U <- t(A) %*% A
 diag(U) <- 0
 U
  }

# matrix multiplication on a 100 x 10 matrix
> system.time(replicate(1000, g(matrix(rpois(1000, 10), ncol = 10))))
   user  system elapsed
   0.24    0.00    0.23

# crossprod
> system.time(replicate(1000, f(matrix(rpois(1000, 10), ncol = 10))))
   user  system elapsed
   0.20    0.00    0.21

# matrix multiplication on a 10 x 100 matrix
> system.time(replicate(1000, g(matrix(rpois(1000, 10), ncol = 100))))
   user  system elapsed
   1.72    0.17    1.89

# crossprod
> system.time(replicate(1000, f(matrix(rpois(1000, 10), ncol = 100))))
   user  system elapsed
   0.47    0.09    0.56


HTH,
Dennis

On Thu, Nov 11, 2010 at 5:19 PM, Michael Bedward
wrote:

> On 12 November 2010 02:21, David Winsemius  wrote:
> >
> >> The fastest and easiest solution is
> >>
> >>  t(A) %*% A
> >
> > That is really elegant. (Wish I could remember my linear algebra lessons
> as
> > well from forty years ago.) I checked it against the specified output and
> > found that with one exception that the OP had planned for the diagonal to
> be
> > filled with zeroes. So that could be completed by a simple modification:
> >
> > temp <- t(A) %*% A
> > diag(temp) <- 0
> > temp
> >
>
> Excellent solution ! Small is beautiful :)
>
> Michael
>
>




Re: [R] Adding meta-data when creating objects. e.g: changing "<-" so to (for example) add "creation time" - how-to and pros/cons?

2010-11-11 Thread Michael Bedward
Hi Ivan,

It doesn't work because there is no object called "x" in the
function's local scope.

Try this...

function(x, ...) {
 xname <- deparse(substitute(x))
 assign(xname, ...)
 x <- get(xname)
 attr(x, "creation.time") <- Sys.time()
 assign(xname, x, pos=.GlobalEnv)
}
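A quick usage sketch (an editor's addition, not from the original mail) showing the attribute that results; it assumes the function above is bound to the name assign2:

```r
# the function suggested above
assign2 <- function(x, ...) {
  xname <- deparse(substitute(x))
  assign(xname, ...)
  x <- get(xname)
  attr(x, "creation.time") <- Sys.time()
  assign(xname, x, pos = .GlobalEnv)
}

assign2(y, 1:4)            # note: unquoted name, picked up via substitute()
attr(y, "creation.time")   # the timestamp recorded at assignment
```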

Michael


On 11 November 2010 22:17, Ivan Calandra  wrote:
> Hi,
>
> I have tried this (just to practice):
> assign2 <- function(x, ...){
>  assign(x, ..., envir=.GlobalEnv)
>  attr(get(x, envir=.GlobalEnv), "creation.time") <- Sys.time()
> }
>
> assign2("y", 1:4)
> Error in attr(get(x), "creation.time") <- Sys.time() :
>  could not find function "get<-"
>
> Why doesn't it work?
> If I remove the attr() part,
> identical(y, get("y")) returns TRUE, so why can't attr() work with it?
>
> Thanks in advance for the clarification,
>
> Ivan
>
>
> Le 11/11/2010 11:16, Michael Bedward a écrit :
>>
>> Hi Tal,
>>
>> Here's a way of doing the first bit...
>>
>> assign2<- function(x, ...) {
>>   xname<- deparse(substitute(x))
>>   assign(xname, ...)
>>   x<- get(xname)
>>   attr(x, "creation.time")<- Sys.time()
>>   assign(xname, x, pos=.GlobalEnv)
>> }
>>
>> Michael
>>
>>
>> On 11 November 2010 20:37, Tal Galili  wrote:
>>>
>>> My objective is to start having meta-data on objects that I create.
>>> For example, consider the following function:
>>>
>>> assign2<- function(x, ...)
>>> {
>>>  assign("x", ...)
>>> attr(x, "creation time")<- Sys.time()
>>>  x<<- x
>>> }
>>>
>>> assign2("x", 1:4)
>>>
>>> "assign2" assigns to x the vector 1:4, and it then also adds the creation
>>> time of the object.
>>>
>>> (Hat tip goes to Peter Alspach for pointing me to the concept of adding
>>> meta
>>> data to an object using attr)
>>>
>>>
>>> But this function has several major limitations:
>>> 1) It will not work for any assignment other than "x".  For example
>>> assign2("y", 1:4)
>>> Doesn't work.
>>> How might this be fixed ?
>>> 2) This function will probably need to also search the parent environment
>>> if
>>> the variable already exists.  If it does, then there should be a "update
>>> date" instead of "creation date".  But for that to work, I'll need a
>>> solution for problem 1.
>>> 3) How will this handle a case when we are updating only a subset of the
>>> items?  (for example:  assign2("x[1:2]", 8:9) )
>>> 4) My real intention is to somehow change the "<-" operator (not simply
>>> the
>>> assign).  I am unsure as to how to do that.
>>> 5) Are there any major pros/cons to the adding of such meta-data to
>>> objects?
>>> (for example, excessive overhead on memory/performance)
>>> 6) Is there already some system that knows how to do this in R (which I
>>> am
>>> simply ignorant about)?
>>>
>>> Thanks for following through, and for any suggestions/thoughts you might
>>> have.
>>>
>>> Best,
>>> Tal
>>>
>>>
>>>
>>>
>>>
>>> Contact
>>> Details:---
>>> Contact me: tal.gal...@gmail.com |  972-52-7275845
>>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
>>> www.r-statistics.com (English)
>>>
>>> --
>>>
>>>
>>>
>>
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calan...@uni-hamburg.de
>
> **
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
>
>



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Noah Silverman
David,

Great solution.  While a bit longer to enter, it lets me explicitly
define a type for each column. 

Thanks!!!

-N

On 11/11/10 4:02 PM, David Winsemius wrote:
>
> On Nov 11, 2010, at 6:38 PM, Noah Silverman wrote:
>
>> That makes perfect sense.  All of my numbers are being coerced into
>> strings by the c() function.  Subsequently, my data.frame contains all
>> strings.
>>
>> I can't know the length of the data.frame ahead of time, so can't
>> predefine it like your example.
>> One thought would be to make it arbitrarily long filled with 0 and
>> delete off the unused rows.  But this seems rather wasteful.
>
> It might be faster, though. Here is a non-c() method using the list
> function instead (with options(stringsAsFactors=FALSE)). A list does
> not coerce everything to the same mode, and rbind.data.frame will
> accept a list as a row argument:
>
> results <- data.frame(a=vector(mode="character", length=0) ,
>   b=vector(mode="numeric", length=0),
>  cc=vector(mode="numeric", length=0), # note: 
> avoid "c" as name
>   d=vector(mode="numeric", length=0))
>  n = 10
>  for(i in 1:n){
>  a = LETTERS[i];
>  b = i;
>  cc = 3*i + 2
>  d = rnorm(1);
>  results <- rbind(results, list(a=a,b=b,cc=cc,d=c))
>   }
>  results
>a  b cc  d
> 2  A  1  5  5
> 21 B  2  8  8
> 3  C  3 11 11
> 4  D  4 14 14
> 5  E  5 17 17
> 6  F  6 20 20
> 7  G  7 23 23
> 8  H  8 26 26
> 9  I  9 29 29
> 10 J 10 32 32
>
> OOOPs used d=c and there was a "c" vector hanging around to be picked up.
>



Re: [R] exploratory analysis of large categorical datasets

2010-11-11 Thread Dennis Murphy
Hi:

A good place to start would be package vcd and its suite of demos and
vignettes, as well as the vcdExtra package, which adds a few more goodies
and a very nice introductory vignette by Michael Friendly. You can't fault
the package for a lack of documentation :)

You might also find the following link useful:  http://www.datavis.ca/R/
Scroll down to 'vcd and vcdExtra', and further down to 'tableplot', which
was recently released on CRAN.

HTH,
Dennis
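One concrete way to see which pairs of unordered categorical variables are associated (an editor's sketch, assuming the vcd package is installed; assocstats() is vcd's) is Cramér's V on each pair of columns, which can then be visualized with image() or symnum() just like a correlation matrix:

```r
library(vcd)

# toy multistate matrix in the shape the poster describes
m <- matrix(sample(1:15, 250, replace = TRUE), nrow = 50,
            dimnames = list(NULL, paste("V.", 1:5, sep = "")))

p <- ncol(m)
V <- matrix(NA, p, p, dimnames = list(colnames(m), colnames(m)))
for (i in 1:p) for (j in 1:p)
  V[i, j] <- assocstats(table(m[, i], m[, j]))$cramer  # Cramer's V for the pair

image(V)  # or symnum(V), as one would do with cor()
```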

On Thu, Nov 11, 2010 at 2:09 PM, Lara Poplarski wrote:

> Dear List,
>
>
> I am looking to perform exploratory analyses of two (relatively) large
> datasets of categorical data. The first one is a binary 80x100 matrix, in
> the form:
>
>
> matrix(sample(c(0,1),25,replace=TRUE), nrow = 5, ncol=5, dimnames = list(c(
> "group1", "group2","group3", "group4","group5"), c("V.1", "V.2", "V.3",
> "V.4", "V.5")))
>
>
> and the second one is a multistate 750x1500 matrix, with up to 15
> *unordered* states per variable, in the form:
>
>
> matrix(sample(c(1:15),25,replace=TRUE), nrow = 5, ncol=5, dimnames =
> list(c(
> "group1", "group2","group3", "group4","group5"), c("V.1", "V.2", "V.3",
> "V.4", "V.5")))
>
>
> Specifically, I am looking to see which pairs of variables are correlated.
> For continuous data, I would use cor() and cov() to generate the correlation
> matrix and the variance-covariance matrix, which I would then visualize
> with
> symnum() or image(). However, it is not clear to me whether this approach
> is
> suitable for categorical data of this kind.
>
>
> Since I am new to R, I would greatly appreciate any input on how to
> approach
> this task and on efficient visualization of the results.
>
>
> Many thanks in advance,
>
> Lara
>
>[[alternative HTML version deleted]]
>
>




Re: [R] Troubleshooting sweave

2010-11-11 Thread Brian Diggs

On 11/11/2010 3:42 PM, sachinthaka.abeyward...@allianz.com.au wrote:


Hi All,

I've reproduced the example from Prof. Friedrich Leisch's webpage. When I
write sweave("Example-1.Snw") OR sweave("Example-1.Rnw"), (yes, I renamed
them). I get the following error:

Writing to file example-1.tex
Processing code chunks ...
  1 : echo term verbatim

Error:  chunk 1
Error in library(ctest) : there is no package called 'ctest'


That error message indicates that you attempted to load a package 
('ctest') that is not installed.  Normally, you would just need to 
install the package using the install.packages() command. However, there 
is (no longer) a ctest package to install.  Digging through the email 
archives, it appears to have been merged into the stats package back in 
version 1.9.0.  So you can just delete the "library(ctest)" line in the 
Rnw file.


You can search for examples that are more recent; this one must have 
been from before April 2004 (when R 1.9.0 was released).  Though I don't 
blame you for being confused; I would also have expected the examples on 
what seems to be the canonical homepage to work and not be out-of-date.



Also while I'm at it, is there an R command to compile the tex file as a
pdf or does the Sweave() function do that for me?


Check out the texi2dvi() function in the tools package.  If you want a 
PDF, be sure to use the pdf option.
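For example (an editor's sketch; the file name is only illustrative), the whole Rnw-to-PDF round trip can be done from within R:

```r
library(tools)
Sweave("example-1.Rnw")                # writes example-1.tex
texi2dvi("example-1.tex", pdf = TRUE)  # runs (pdf)latex to produce example-1.pdf
```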



Thanks,
Sachin
p.s. sorry about the corporate notice.

example-1.Rnw: from http://www.statistik.lmu.de/~leisch/Sweave/

\documentclass[a4paper]{article}

\title{Sweave Example 1}
\author{Friedrich Leisch}

\begin{document}

\maketitle

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality)
library(ctest)
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:

\begin{center}
<<fig=TRUE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University



Re: [R] create a pairwise coocurrence matrix

2010-11-11 Thread Michael Bedward
On 12 November 2010 02:21, David Winsemius  wrote:
>
>> The fastest and easiest solution is
>>
>>  t(A) %*% A
>
> That is really elegant. (Wish I could remember my linear algebra lessons as
> well from forty years ago.) I checked it against the specified output and
> found that with one exception that the OP had planned for the diagonal to be
> filled with zeroes. So that could be completed by a simple modification:
>
> temp <- t(A) %*% A
> diag(temp) <- 0
> temp
>

Excellent solution ! Small is beautiful :)

Michael



Re: [R] can not produce graph using "splom"

2010-11-11 Thread Ben Bolker
jim holtman  gmail.com> writes:

> 
> Did you have 'dev.off()' at the end?
> 
> On Thu, Nov 11, 2010 at 7:27 PM, Xiaoqi Cui  mtu.edu> wrote:
> > Hi,
> >
> > I wrote a function basically to first read an input data file
> , then open an pdf file and draw graph using
> "splom". When testing, I ran the function line by line, 
> it can produce nice plot, but with like 50 warnings.
> However, whenever I ran this function as a whole,
> it can not produce any plot, the pdf file has nothing in it.
> It seems the "splom" function even hasn't been run. 
> Does anybody encountered similar problems, and would
> kindly share any possible reasons?
> 

  Does 

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f

  have anything to do with it?
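That FAQ entry boils down to this: lattice functions return trellis objects that only draw when printed, and auto-printing is disabled inside functions and sourced scripts. A minimal sketch (editor's addition; the file name is illustrative):

```r
library(lattice)

make_plot <- function(d) {
  pdf("splom.pdf")
  print(splom(d))  # print() is required here; a bare splom(d) draws nothing
  dev.off()        # close the device so the PDF is written out
}

make_plot(iris[1:4])
```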



Re: [R] what's wrong with this 'length' in function?

2010-11-11 Thread Peter Langfelder
You change x from a single value to a vector of size 2, for example here:

if (j==2) {x=x+c(-1,1)*0.5}

That makes

c(   qchisq(1-alpha/2,df=2*x)/2,
  qchisq(alpha/2,df=2*x+2)/2)

a vector of 4 numbers that you are trying to assign to a row of a
matrix with two columns.

Keep x a single number and things will run smoothly.
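In other words (editor's sketch), with x left as a scalar the exact branch yields exactly two numbers, which is what a row of ci can hold:

```r
x <- 89; alpha <- 0.05
length(c(qchisq(1 - alpha/2, df = 2*x)/2,
         qchisq(alpha/2, df = 2*x + 2)/2))
# length is 2 when x is a scalar, but 4 when length(x) == 2,
# which is what triggers the replacement-length error
```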

Peter


On Thu, Nov 11, 2010 at 4:42 PM, casperyc  wrote:
>
> Hi all,
>
> I am having a trouble with this function I wrote
>
> ###
> p26=function(x,alpha){
>
>        # dummy variable
>        j=1
>
>        ci=matrix(ncol=2,nrow=3)
>
>        while (j<4){
>                if (j==2) {x=x+c(-1,1)*0.5}
>
>                ci[j,]=
>                x+qnorm(1-alpha/2)^2/2+
>                c(-1,1)*qnorm(1-alpha/2)*
>                sqrt(x+qnorm(1-alpha/2)^2/4)
>
>                j=j+1
>
>                if (j==3) { # exact
>                        x=x-c(-1,1)*0.5
>                        ci[j,]=c(
>                        qchisq(1-alpha/2,df=2*x)/2,
>                        qchisq(alpha/2,df=2*x+2)/2)
>                j=j+1
>                }
>        }
>                rownames(ci)=c('without','with','exact')
>                colnames(ci)=c('lower','upper')
>                return(round(ci,2))
> }
> ###
>
> Most bits are fine.
>
> The problem part is
> ###
>                if (j==3) { # exact
>                        x=x-c(-1,1)*0.5
>                        ci[j,]=c(
>                        qchisq(1-alpha/2,df=2*x)/2,
>                        qchisq(alpha/2,df=2*x+2)/2)
>                j=j+1
>                }
> ###
> in the middle, when I run the function with
>
> p26(89,0.05)
>
> I got the following
>
> ###
> Error in ci[j, ] = c(qchisq(1 - alpha/2, df = 2 * x)/2, qchisq(alpha/2,  :
>  number of items to replace is not a multiple of replacement length
> ###
>
> I have been looking at it for a long time, still dont know why the 'length'
> differ??
>
> can someone spot it?
>
> Thanks.
>
> casper
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/what-s-wrong-with-this-length-in-function-tp3038908p3038908.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



Re: [R] Troubleshooting sweave

2010-11-11 Thread Marc Schwartz
On Nov 11, 2010, at 5:42 PM, sachinthaka.abeyward...@allianz.com.au wrote:

> 
> Hi All,
> 
> I've reproduced the example from Prof. Friedrich Leisch's webpage. When I
> write sweave("Example-1.Snw") OR sweave("Example-1.Rnw"), (yes, I renamed
> them). I get the following error:
> 
> Writing to file example-1.tex
> Processing code chunks ...
> 1 : echo term verbatim
> 
> Error:  chunk 1
> Error in library(ctest) : there is no package called 'ctest'
> 
> Also while I'm at it, is there an R command to compile the tex file as a
> pdf or does the Sweave() function do that for me?
> 
> Thanks,
> Sachin
> p.s. sorry about the corporate notice.
> 
> example-1.Rnw: from http://www.statistik.lmu.de/~leisch/Sweave/
> 
> \documentclass[a4paper]{article}
> 
> \title{Sweave Example 1}
> \author{Friedrich Leisch}
> 
> \begin{document}
> 
> \maketitle
> 
> In this example we embed parts of the examples from the
> \texttt{kruskal.test} help page into a \LaTeX{} document:
> 
> <<>>=
> data(airquality)
> library(ctest)
> kruskal.test(Ozone ~ Month, data = airquality)
> @
> which shows that the location parameter of the Ozone
> distribution varies significantly from month to month. Finally we
> include a boxplot of the data:
> 
> \begin{center}
> <<fig=TRUE>>=
> boxplot(Ozone ~ Month, data = airquality)
> @
> \end{center}
> 
> \end{document}


'ctest' was a package that was merged into the 'stats' package in R version 
1.9.0, which was released back in 2004. So it would appear that Fritz (cc'd 
here) has not updated the examples to reflect that change.

A simple fix would be to remove the 'library(ctest)' line from the file.

If you look at:

  require(utils)
  ?Sweave

there are some examples of using both internal and external R commands to 
process the file into a PDF. Sweave itself just generates the TeX source file 
(.tex), which then requires a LaTeX installation to process that file into the 
PDF (or EPS as may be required). 

I would also be sure to review ?RweaveLatex for more information, including 
some potential environment issues that you may have to address in the Details 
section. Fritz' Sweave FAQ and manual on his site are also good references as 
may be required.

Presuming that you have LaTeX installed on your computer, the easiest thing to 
do may to use:

  pdflatex YourFileName.tex

at the command line, to process the resultant TeX file into a PDF. The steps 
can be a bit more complicated if you are using any LaTeX packages that might 
require the intermediate creation of a DVI and PS file. That would include 
packages such as PSTricks, etc.
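As a sketch of the simple command-line round trip (assuming R and pdflatex are on the PATH; file names are illustrative):

```shell
R CMD Sweave example-1.Rnw   # Rnw -> example-1.tex
pdflatex example-1.tex       # tex -> example-1.pdf
```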

HTH,

Marc Schwartz



[R] what's wrong with this 'length' in function?

2010-11-11 Thread casperyc

Hi all,

I am having a trouble with this function I wrote

###
p26=function(x,alpha){

# dummy variable
j=1

ci=matrix(ncol=2,nrow=3)

while (j<4){
if (j==2) {x=x+c(-1,1)*0.5}

ci[j,]=
x+qnorm(1-alpha/2)^2/2+
c(-1,1)*qnorm(1-alpha/2)*
sqrt(x+qnorm(1-alpha/2)^2/4)

j=j+1

if (j==3) { # exact
x=x-c(-1,1)*0.5
ci[j,]=c(
qchisq(1-alpha/2,df=2*x)/2,
qchisq(alpha/2,df=2*x+2)/2)
j=j+1
}
}
rownames(ci)=c('without','with','exact')
colnames(ci)=c('lower','upper')
return(round(ci,2))
}
###

Most bits are fine.

The problem part is 
###
if (j==3) { # exact
x=x-c(-1,1)*0.5
ci[j,]=c(
qchisq(1-alpha/2,df=2*x)/2,
qchisq(alpha/2,df=2*x+2)/2)
j=j+1
}
###
in the middle, when I run the function with

p26(89,0.05)

I got the following

###
Error in ci[j, ] = c(qchisq(1 - alpha/2, df = 2 * x)/2, qchisq(alpha/2,  : 
  number of items to replace is not a multiple of replacement length
###

I have been looking at it for a long time, still dont know why the 'length'
differ??

can someone spot it?

Thanks.

casper

-- 
View this message in context: 
http://r.789695.n4.nabble.com/what-s-wrong-with-this-length-in-function-tp3038908p3038908.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] can not produce graph using "splom"

2010-11-11 Thread jim holtman
Did you have 'dev.off()' at the end?

On Thu, Nov 11, 2010 at 7:27 PM, Xiaoqi Cui  wrote:
> Hi,
>
> I wrote a function basically to first read an input data file, then open an 
> pdf file and draw graph using "splom". When testing, I ran the function line 
> by line, it can produce nice plot, but with like 50 warnings. However, 
> whenever I ran this function as a whole, it can not produce any plot, the pdf 
> file has nothing in it. It seems the "splom" function even hasn't been run. 
> Does anybody encountered similar problems, and would kindly share any 
> possible reasons?
>
> Thanks,
>
> Xiaoqi
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] can not produce graph using "splom"

2010-11-11 Thread Xiaoqi Cui
Hi,

I wrote a function basically to first read an input data file, then open an pdf 
file and draw graph using "splom". When testing, I ran the function line by 
line, it can produce nice plot, but with like 50 warnings. However, whenever I 
ran this function as a whole, it can not produce any plot, the pdf file has 
nothing in it. It seems the "splom" function hasn't even been run. Has anybody 
encountered similar problems, and would you kindly share any possible reasons? 

Thanks,

Xiaoqi



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 6:38 PM, Noah Silverman wrote:


That makes perfect sense.  All of my numbers are being coerced into
strings by the c() function.  Subsequently, my data.frame contains all
strings.

I can't know the length of the data.frame ahead of time, so can't
predefine it like your example.
One thought would be to make it arbitrarily long filled with 0 and
delete off the unused rows.  But this seems rather wasteful.


It might be faster, though. Here is a non-c() method using the list  
function instead (with options(stringsAsFactors=FALSE)). A list does  
not coerce everything to the same mode, and rbind.data.frame will  
accept a list as a row argument:


results <- data.frame(a=vector(mode="character", length=0) ,
  b=vector(mode="numeric", length=0),
 cc=vector(mode="numeric", length=0), # note:   
avoid "c" as name

  d=vector(mode="numeric", length=0))
 n = 10
 for(i in 1:n){
 a = LETTERS[i];
 b = i;
 cc = 3*i + 2
 d = rnorm(1);
 results <- rbind(results, list(a=a,b=b,cc=cc,d=c))
  }
 results
   a  b cc  d
2  A  1  5  5
21 B  2  8  8
3  C  3 11 11
4  D  4 14 14
5  E  5 17 17
6  F  6 20 20
7  G  7 23 23
8  H  8 26 26
9  I  9 29 29
10 J 10 32 32

OOOPs used d=c and there was a "c" vector hanging around to be picked  
up.


--
David.



-N

On 11/11/10 2:02 PM, Peter Langfelder wrote:
On Thu, Nov 11, 2010 at 1:19 PM, William Dunlap   
wrote:

Peter,

Your example doesn't work for me unless I
set options(stringsAsFactors=TRUE) first.
(If I do set that, then all columns of 'results'
have class "character", which I doubt the user
wants.)

You probably mean stringsAsFactors=FALSE.

What you say makes sense, because the c() function produces a vector
in which all components have the same type, and it will be character.
If you don't want to have characters, my solution would be

n = 10
results <- data.frame(a = rep("", n), b = rep(0, n), c = rep(0, n), d
= rep(0, n))
for(i in 1:n){
  a = LETTERS[i];
  b = i;
  c = 3*i + 2
  d = rnorm(1);
  results$a[i] = a
  results$b[i] = b
  results$c[i] = c
  results$d[i] = d
}


results

  a  b  c   d
1  A  1  5 -1.31553805
2  B  2  8  0.09198054
3  C  3 11 -0.05860804
4  D  4 14  0.77796136
5  E  5 17  1.28924697
6  F  6 20  0.47631483
7  G  7 23 -1.23727076
8  H  8 26  0.83595295
9  I  9 29  0.69435349
10 J 10 32 -0.30922930


mode(results[, 1])

[1] "character"

mode(results[, 2])

[1] "numeric"

mode(results[, 3])

[1] "numeric"

mode(results[, 4])

[1] "numeric"



or alternatively

n = 10
num <- data.frame(b = rep(0, n), c = rep(0, n), d = rep(0, n))
labels = rep("", n);
for(i in 1:n){
  a = LETTERS[i];
  b = i;
  c = 3*i + 2
  d = rnorm(1);
  labels[i] = a
  num[i, ] = c(b, c, d)
}
results = data.frame(a = labels, num)


results

  a  b  c   d
1  A  1  5 -0.47150097
2  B  2  8 -1.30507313
3  C  3 11 -1.09860425
4  D  4 14  0.91326330
5  E  5 17 -0.09732841
6  F  6 20 -0.75134162
7  G  7 23  0.31360908
8  H  8 26 -1.54406716
9  I  9 29 -0.36075743
10 J 10 32 -0.23758269

mode(results[, 1])

[1] "character"

mode(results[, 2])

[1] "numeric"

mode(results[, 3])

[1] "numeric"

mode(results[, 4])

[1] "numeric"


Peter





David Winsemius, MD
West Hartford, CT



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Peter Langfelder
I see 4 ways to write the code:

1. make the frame very long at the start and use my code - this is
practical if you know that your data frame will not be longer than a
certain number of rows, be it a million;

2a. use something like

result1 = data.frame(a=a, b=b, c=c, d=d)

within the loop to create a 1x4 data frame that you can rbind to
results within the loop;

2b. make the code a bit more intelligent, for example by allocating
blocks of say n=1000 at a time as needed and rbind-ing them to result;

3. fill up results with characters using your rbind(results,
c(a,b,c,d)), then use something like

results[, c(2:4)] = apply(apply(results[, c(2:4)], 2, as.character), 2,
as.numeric)

to convert the characters in columns 2:4 to numbers (this construct
also works with factors)

The difference between 2a and 2b is that 2b may be faster if n is
large, because 2a grows 4 objects by 1 unit n times, which is quite
slow. The same holds for solution 3. In that sense solution 1 may be
less wasteful than solutions 2a or 3 although it may not look like
that.
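Option 2a, sketched (an editor's addition): build each row as a one-row data frame, so that rbind preserves each column's type:

```r
results <- NULL
n <- 10
for (i in 1:n) {
  row <- data.frame(a = LETTERS[i], b = i, cc = 3*i + 2, d = rnorm(1),
                    stringsAsFactors = FALSE)
  results <- rbind(results, row)  # rbind of data frames keeps column modes
}
sapply(results, mode)  # "a" stays character, the numeric columns stay numeric
```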

Peter



On Thu, Nov 11, 2010 at 3:38 PM, Noah Silverman  wrote:
> That makes perfect sense.  All of my numbers are being coerced into
> strings by the c() function.  Subsequently, my data.frame contains all
> strings.
>
> I can't know the length of the data.frame ahead of time, so can't
> predefine it like your example.
> One thought would be to make it arbitrarily long filled with 0 and
> delete off the unused rows.  But this seems rather wasteful.
>
> -N



Re: [R] comma separated format

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 5:51 PM, sachinthaka.abeyward...@allianz.com.au  
wrote:




Hi All,

I'm trying to create labels to plot such that it doesn't show up as
scientific notation. So for example how do I get R to show 1e06 as
$1,000,000.


Knowing how these things usually have even more compact solutions than  
I usually uncover at first, I will not be at all surprised if there  
are improvements offered:


fc <- formatC(1.234 * 10^(0:8), format="fg", big.mark = ",")
sprintf("$%s", fc)

#
# [1] "$1.234"       "$12.34"       "$123.4"       "$1,234"
# [5] "$12,340"      "$123,400"     "$1,234,000"
# [8] "$12,340,000"  "$123,400,000"

So the vectorized  one liner would be
> commaUS <- function(x) sprintf("$%s", formatC(x, format="fg",  
big.mark = ","))

> commaUS(c(1e06, 1e07, 1e08))
[1] "$1,000,000"   "$10,000,000"  "$100,000,000"

I could not find the right specs to formatC to get the "$" in the  
right place.


I see ggplot2 has a dollar() function from which a big.mark argument  
will be passed through to format(), but this suggests to me that there  
might not be a simpler solution in base R. The rrv package has a  
money() function.
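For what it's worth, a compact base-R variant (an editor's sketch; note formatC's "d" format rounds to whole dollars):

```r
# paste a "$" in front of a comma-grouped integer rendering
dollars <- function(x) paste("$", formatC(x, format = "d", big.mark = ","),
                             sep = "")
dollars(c(1e6, 2.5e7))  # "$1,000,000"  "$25,000,000"
```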


--
David.




I was wondering if there was a single function which allows you to  
do that,
the same way that as.Date() allows you to show in date format on a  
plot.


Thanks,
Sachin





David Winsemius, MD
West Hartford, CT



[R] Troubleshooting sweave

2010-11-11 Thread sachinthaka . abeywardana

Hi All,

I've reproduced the example from Prof. Friedrich Leisch's webpage. When I
run Sweave("Example-1.Snw") or Sweave("Example-1.Rnw") (yes, I renamed
them), I get the following error:

Writing to file example-1.tex
Processing code chunks ...
 1 : echo term verbatim

Error:  chunk 1
Error in library(ctest) : there is no package called 'ctest'

Also while I'm at it, is there an R command to compile the tex file as a
pdf or does the Sweave() function do that for me?
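
A note for the archive (not part of the original post): Sweave() only writes the .tex file; it does not run LaTeX. A minimal sketch, assuming a LaTeX toolchain such as TeX Live is on the PATH:

```r
## The example-1.Rnw chunk calls library(ctest); the 'ctest' package was
## merged into the standard 'stats' package long ago (R >= 1.9.0), which
## is why library(ctest) fails.  The line can simply be dropped, since
## kruskal.test() lives in stats:
data(airquality)
kruskal.test(Ozone ~ Month, data = airquality)

## Sweave() produces example-1.tex; compile it to PDF from within R:
Sweave("example-1.Rnw")
tools::texi2dvi("example-1.tex", pdf = TRUE)
```
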

Thanks,
Sachin
p.s. sorry about the corporate notice.

example-1.Rnw: from http://www.statistik.lmu.de/~leisch/Sweave/

\documentclass[a4paper]{article}

\title{Sweave Example 1}
\author{Friedrich Leisch}

\begin{document}

\maketitle

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality)
library(ctest)
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:

\begin{center}
<>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}




Re: [R] comma separated format

2010-11-11 Thread jim holtman
Is this what you want:

> paste("$", format(1e6, big.mark = ',', format = 'f'), sep='')
[1] "$1,000,000"
>
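
A caveat worth noting here (an editorial aside, not part of the original reply): depending on options("scipen"), format() can still fall back to scientific notation for large values, and format() has no `format = 'f'` argument (that belongs to formatC()). Passing scientific = FALSE makes the intent explicit. A minimal sketch:

```r
## format() may print 1e6 as "1e+06"; scientific = FALSE together with
## big.mark gives the grouped form reliably, and trim = TRUE drops the
## leading padding before the "$" is pasted on.
dollars <- function(x) {
  paste0("$", format(x, big.mark = ",", scientific = FALSE, trim = TRUE))
}
dollars(1e6)   # "$1,000,000"
```
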


On Thu, Nov 11, 2010 at 5:51 PM,
 wrote:
>
> Hi All,
>
> I'm trying to create labels for a plot such that they don't show up in
> scientific notation. So, for example, how do I get R to show 1e06 as
> $1,000,000?
>
> I was wondering if there was a single function which allows you to do that,
> the same way that as.Date() allows you to show in date format on a plot.
>
> Thanks,
> Sachin
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Noah Silverman
That makes perfect sense.  All of my numbers are being coerced into
strings by the c() function.  Subsequently, my data.frame contains all
strings.

I can't know the length of the data.frame ahead of time, so can't
predefine it like your example.
One thought would be to make it arbitrarily long filled with 0 and
delete off the unused rows.  But this seems rather wasteful.
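
A hedged suggestion not spelled out in the thread: when the final length is unknown, a common idiom is to collect each row in a list and bind once at the end, which preserves per-column types and avoids both preallocation and repeated copying:

```r
## Grow a list of one-row data.frames, then rbind once at the end.
## Each column keeps its own type (character vs. numeric), and the
## final number of rows does not need to be known in advance.
rows <- list()
for (i in 1:10) {
  rows[[i]] <- data.frame(a = LETTERS[i],
                          b = i,
                          c = 3 * i + 2,
                          d = rnorm(1),
                          stringsAsFactors = FALSE)
}
results <- do.call(rbind, rows)
sapply(results, class)   # a: character; b, c, d: numeric
```
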

-N

On 11/11/10 2:02 PM, Peter Langfelder wrote:
> On Thu, Nov 11, 2010 at 1:19 PM, William Dunlap  wrote:
>> Peter,
>>
>> Your example doesn't work for me unless I
>> set options(stringsAsFactors=TRUE) first.
>> (If I do set that, then all columns of 'results'
>> have class "character", which I doubt the user
>> wants.)
> You probably mean stringsAsFactors=FALSE.
>
> What you say makes sense, because the c() function produces a vector
> in which all components have the same type, and it will be character.
> If you don't want to have characters, my solution would be
>
> n = 10
> results <- data.frame(a = rep("", n), b = rep(0, n), c = rep(0, n), d
> = rep(0, n))
> for(i in 1:n){
>a = LETTERS[i];
>b = i;
>c = 3*i + 2
>d = rnorm(1);
>results$a[i] = a
>results$b[i] = b
>results$c[i] = c
>results$d[i] = d
> }
>
>> results
>a  b  c   d
> 1  A  1  5 -1.31553805
> 2  B  2  8  0.09198054
> 3  C  3 11 -0.05860804
> 4  D  4 14  0.77796136
> 5  E  5 17  1.28924697
> 6  F  6 20  0.47631483
> 7  G  7 23 -1.23727076
> 8  H  8 26  0.83595295
> 9  I  9 29  0.69435349
> 10 J 10 32 -0.30922930
>
>> mode(results[, 1])
> [1] "character"
>> mode(results[, 2])
> [1] "numeric"
>> mode(results[, 3])
> [1] "numeric"
>> mode(results[, 4])
> [1] "numeric"
>
>
>
> or alternatively
>
> n = 10
> num <- data.frame(b = rep(0, n), c = rep(0, n), d = rep(0, n))
> labels = rep("", n);
> for(i in 1:n){
>a = LETTERS[i];
>b = i;
>c = 3*i + 2
>d = rnorm(1);
>labels[i] = a
>num[i, ] = c(b, c, d)
> }
> results = data.frame(a = labels, num)
>
>> results
>a  b  c   d
> 1  A  1  5 -0.47150097
> 2  B  2  8 -1.30507313
> 3  C  3 11 -1.09860425
> 4  D  4 14  0.91326330
> 5  E  5 17 -0.09732841
> 6  F  6 20 -0.75134162
> 7  G  7 23  0.31360908
> 8  H  8 26 -1.54406716
> 9  I  9 29 -0.36075743
> 10 J 10 32 -0.23758269
>> mode(results[, 1])
> [1] "character"
>> mode(results[, 2])
> [1] "numeric"
>> mode(results[, 3])
> [1] "numeric"
>> mode(results[, 4])
> [1] "numeric"
>
>
> Peter
>



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Noah Silverman
Your errors look exactly like mine.

Changing the option flag does allow me to create the data.frame without
any errors.  A quick look confirms that all the values are there and
correct.

However, R has coerced all of my numeric values to strings. 

Using your sample code also turns all the numeric values to strings. 
Perhaps it has something to do with the way R is interpreting the first
column?

Thanks!

-N



On 11/11/10 1:59 PM, William Dunlap wrote:
> You are right, I mistyped it.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
>
>> -Original Message-
>> From: John Kane [mailto:jrkrid...@yahoo.ca] 
>> Sent: Thursday, November 11, 2010 1:58 PM
>> To: Peter Langfelder; r-help@r-project.org; William Dunlap
>> Subject: Re: [R] Populating then sorting a matrix and/or data.frame
>>
>>
>>
>> --- On Thu, 11/11/10, William Dunlap  wrote:
>>
>>> From: William Dunlap 
>>> Subject: Re: [R] Populating then sorting a matrix and/or data.frame
>>> To: "Peter Langfelder" , 
>> r-help@r-project.org
>>> Received: Thursday, November 11, 2010, 4:19 PM
>>> Peter,
>>>
>>> Your example doesn't work for me unless I
>>> set options(stringsAsFactors=TRUE) first.
>>
>> Don't you mean stringsAsFactors=FALSE here?  At least I get 
>> the same results you do but with stringsAsFactors=FALSE  The 
>> TRUE condition is giving multiple NAs and error messages
>>
>>
>>
>>
>>
>>> (If I do set that, then all columns of 'results'
>>> have class "character", which I doubt the user
>>> wants.)
>>>
 results <- data.frame()

 n = 10
 for(i in 1:n){
>>> +a = LETTERS[i];
>>> +b = i;
>>> +c = 3*i + 2
>>> +d = rnorm(1);
>>> +results <- rbind(results, c(a,b,c,d))
>>> + }
>>> There were 36 warnings (use warnings() to see them)
 warnings()[1:5]
>>> $`invalid factor level, NAs generated`
>>> `[<-.factor`(`*tmp*`, ri, value = "B")
>>>
>>> $`invalid factor level, NAs generated`
>>> `[<-.factor`(`*tmp*`, ri, value = "2")
>>>
>>> $`invalid factor level, NAs generated`
>>> `[<-.factor`(`*tmp*`, ri, value = "8")
>>>
>>> $`invalid factor level, NAs generated`
>>> `[<-.factor`(`*tmp*`, ri, value = "-0.305558353507095")
>>>
>>> $`invalid factor level, NAs generated`
>>> `[<-.factor`(`*tmp*`, ri, value = "C")
>>>
>>> > results
>>>    X.A. X.1. X.5. X.1.43055780028799.
>>> 1     A    1    5    1.43055780028799
>>> 2  <NA> <NA> <NA>                <NA>
>>> 3  <NA> <NA> <NA>                <NA>
>>> 4  <NA> <NA> <NA>                <NA>
>>> 5  <NA> <NA> <NA>                <NA>
>>> 6  <NA> <NA> <NA>                <NA>
>>> 7  <NA> <NA> <NA>                <NA>
>>> 8  <NA> <NA> <NA>                <NA>
>>> 9  <NA> <NA> <NA>                <NA>
>>> 10 <NA> <NA> <NA>                <NA>
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com  
>>>
 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Peter Langfelder
 Sent: Thursday, November 11, 2010 12:25 PM
 To: Noah Silverman
 Cc: r-help@r-project.org
 Subject: Re: [R] Populating then sorting a matrix
>>> and/or data.frame
 On Thu, Nov 11, 2010 at 11:33 AM, Noah Silverman
 
>>> wrote:
> Still doesn't work.
>
> When using rbind to build the data.frame, it get
>>> a 
 structure mostly full of
> NA.
> The data is correct, so something about pushing
>>> into the 
 data.frame is
> breaking.
>
> Example code:
> results <- data.frame()
>
> for(i in 1:n){
>#do all the work
>#a is a test label. b,c,d are numeric.
>results <- rbind(results, c(a,b,c,d))
> }
 Works for me:

 results <- data.frame()

 n = 10
 for(i in 1:n){
 a = LETTERS[i];
 b = i;
 c = 3*i + 2
 d = rnorm(1);
 results <- rbind(results, c(a,b,c,d))
 }


>>> > results
>>>    X.A. X.1. X.5. X.0.142223304589023.
>>> 1     A    1    5    0.142223304589023
>>> 2     B    2    8    0.243612305595176
>>> 3     C    3   11    0.476795513990516
>>> 4     D    4   14      1.0278220664213
>>> 5     E    5   17    0.916608672305205
>>> 6     F    6   20     1.61075985995586
>>> 7     G    7   23    0.370423691258896
>>> 8     H    8   26  -0.0528603547004191
>>> 9     I    9   29    -2.07888666920403
>>> 10    J   10   32    -1.87980721733655
 Maybe there's something wrong with the calculation you
>>> do?
 Peter


[R] comma separated format

2010-11-11 Thread sachinthaka . abeywardana

Hi All,

I'm trying to create labels for a plot such that they don't show up in
scientific notation. So, for example, how do I get R to show 1e06 as
$1,000,000?

I was wondering if there was a single function which allows you to do that,
the same way that as.Date() allows you to show in date format on a plot.

Thanks,
Sachin




[R] Develop, Scale and Deploy your analyses more easily with Spotfire S+ 8.2

2010-11-11 Thread Louis Bajuk-Yorgan
Continuing our focus on providing the best commercial platform for
creating and sharing statistical analyses, TIBCO is proud to announce
the general availability of Spotfire S+ 8.2. This release focuses on
helping our customers scale their analyses to solve larger analytic
problems and to share and deploy their analyses more easily.

The Spotfire S+ Product Family

This release builds on the features of TIBCO Spotfire Statistics
Services 3.1 (formerly S+Server) to make it even easier to create S+ and
R-powered analytic applications for deployment through TIBCO Spotfire.
We have also released a new version of Spotfire Miner, which combines
drag-and-drop analytic workflows with S+ scripts. 

Scaling to solve larger analytic problems 

Spotfire S+ now supports Windows 64-bit desktop platforms, and delivers
much faster matrix operations for higher overall performance.  The new
Statistics Services view in the Eclipse-based Spotfire S+ Workbench
provides a way to monitor the allocation and status of your Spotfire S+
and R analytic jobs run through Statistics Services, helping you utilize
the resources of your cluster or grid as efficiently as possible. 

Easier, More Secure Deployment

Spotfire S+ 8.1 introduced the ability to submit
computationally-intensive scripts to a server. This release provides the
ability to track and manage those jobs directly from within the Spotfire
S+ Workbench. The new Statistics Services View also lets you deploy
packages and data files easily and securely to a server from the
Spotfire S+ Workbench. 

Updated Spotfire S+ Workbench

The Spotfire S+ Workbench, our modern Eclipse-based Integrated
Development Environment (IDE) for developing and debugging S+ code, has
been updated to the latest version of the Eclipse framework to provide a
seamless user experience between multiple platforms (Unix, Linux, and
32-bit and 64-bit versions of Windows).

Focus on Customer Feedback

Based on many customer requests, this release provides numerous
improvements, including data access enhancements (such as the ability to
import from specific sheets in an Excel workbook), improved consistency
of colors and shapes between graphics devices, and support for Windows
7.

Enhanced R Compatibility
This version includes new functions (and new arguments to existing
functions) to improve compatibility with R, making it easier to migrate
packages from R to Spotfire S+ and simplifying writing scripts
compatible with both Spotfire S+ and R.

For more information on Spotfire S+ 8.2, please visit our website at:

http://spotfire.tibco.com/products/s-plus/statistical-analysis-software.aspx

For more information on deploying S+ and R-powered applications through
Statistics Services, please visit:

http://spotfire.tibco.com/products/statistics-services/predictive-analytics.aspx

--
Lou Bajuk-Yorgan
Sr. Director, Product Management
Spotfire, TIBCO Software
206-802-2328
lba...@tibco.com
http://spotfire.tibco.com



[R] exploratory analysis of large categorical datasets

2010-11-11 Thread Lara Poplarski
Dear List,


I am looking to perform exploratory analyses of two (relatively) large
datasets of categorical data. The first one is a binary 80x100 matrix, in
the form:


matrix(sample(c(0,1),25,replace=TRUE), nrow = 5, ncol=5, dimnames = list(c(
"group1", "group2","group3", "group4","group5"), c("V.1", "V.2", "V.3",
"V.4", "V.5")))


and the second one is a multistate 750x1500 matrix, with up to 15
*unordered* states per variable, in the form:


matrix(sample(c(1:15),25,replace=TRUE), nrow = 5, ncol=5, dimnames = list(c(
"group1", "group2","group3", "group4","group5"), c("V.1", "V.2", "V.3",
"V.4", "V.5")))


Specifically, I am looking to see which pairs of variables are correlated.
For continuous data, I would use cor() and cov() to generate the correlation
matrix and the variance-covariance matrix, which I would then visualize with
symnum() or image(). However, it is not clear to me whether this approach is
suitable for categorical data of this kind.


Since I am new to R, I would greatly appreciate any input on how to approach
this task and on efficient visualization of the results.
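
One hedged approach (an editorial aside, not from the thread): for pairs of unordered categorical variables, a chi-squared-based association measure such as Cramér's V plays the role that cor() plays for continuous data, and the resulting matrix can still be visualized with image(). A small sketch on a binary example shaped like the first dataset:

```r
## Pairwise Cramér's V for the columns of a categorical matrix.
## V = sqrt(chisq / (n * (min(r, c) - 1))) lies in [0, 1]; for 2x2
## tables it reduces to the phi coefficient.  Columns with only one
## observed level would give NaN, so each column should show at
## least two levels.
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

set.seed(1)
m <- matrix(sample(c(0, 1), 40 * 5, replace = TRUE), nrow = 40,
            dimnames = list(NULL, paste0("V.", 1:5)))

p <- ncol(m)
V <- matrix(1, p, p, dimnames = list(colnames(m), colnames(m)))
for (i in 1:(p - 1)) for (j in (i + 1):p)
  V[i, j] <- V[j, i] <- cramers_v(m[, i], m[, j])

image(V)   # quick visual scan for strongly associated pairs
```
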


Many thanks in advance,

Lara



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Peter Langfelder
On Thu, Nov 11, 2010 at 1:19 PM, William Dunlap  wrote:
> Peter,
>
> Your example doesn't work for me unless I
> set options(stringsAsFactors=TRUE) first.

Yes, you need to set options(stringsAsFactors=FALSE) (note the FALSE).
I do it always so I forgot about that, sorry.



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread William Dunlap
You are right, I mistyped it.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: John Kane [mailto:jrkrid...@yahoo.ca] 
> Sent: Thursday, November 11, 2010 1:58 PM
> To: Peter Langfelder; r-help@r-project.org; William Dunlap
> Subject: Re: [R] Populating then sorting a matrix and/or data.frame
> 
> 
> 
> --- On Thu, 11/11/10, William Dunlap  wrote:
> 
> > From: William Dunlap 
> > Subject: Re: [R] Populating then sorting a matrix and/or data.frame
> > To: "Peter Langfelder" , 
> r-help@r-project.org
> > Received: Thursday, November 11, 2010, 4:19 PM
> > Peter,
> > 
> > Your example doesn't work for me unless I
> > set options(stringsAsFactors=TRUE) first.
> 
> 
> Don't you mean stringsAsFactors=FALSE here?  At least I get 
> the same results you do but with stringsAsFactors=FALSE  The 
> TRUE condition is giving multiple NAs and error messages
> 
> 
> 
> 
> 
> > (If I do set that, then all columns of 'results'
> > have class "character", which I doubt the user
> > wants.)
> > 
> > > results <- data.frame()
> > >
> > > n = 10
> > > for(i in 1:n){
> > +    a = LETTERS[i];
> > +    b = i;
> > +    c = 3*i + 2
> > +    d = rnorm(1);
> > +    results <- rbind(results, c(a,b,c,d))
> > + }
> > There were 36 warnings (use warnings() to see them)
> > > warnings()[1:5]
> > $`invalid factor level, NAs generated`
> > `[<-.factor`(`*tmp*`, ri, value = "B")
> > 
> > $`invalid factor level, NAs generated`
> > `[<-.factor`(`*tmp*`, ri, value = "2")
> > 
> > $`invalid factor level, NAs generated`
> > `[<-.factor`(`*tmp*`, ri, value = "8")
> > 
> > $`invalid factor level, NAs generated`
> > `[<-.factor`(`*tmp*`, ri, value = "-0.305558353507095")
> > 
> > $`invalid factor level, NAs generated`
> > `[<-.factor`(`*tmp*`, ri, value = "C")
> > 
> > > results
> >    X.A. X.1. X.5. X.1.43055780028799.
> > 1     A    1    5    1.43055780028799
> > 2  <NA> <NA> <NA>                <NA>
> > 3  <NA> <NA> <NA>                <NA>
> > 4  <NA> <NA> <NA>                <NA>
> > 5  <NA> <NA> <NA>                <NA>
> > 6  <NA> <NA> <NA>                <NA>
> > 7  <NA> <NA> <NA>                <NA>
> > 8  <NA> <NA> <NA>                <NA>
> > 9  <NA> <NA> <NA>                <NA>
> > 10 <NA> <NA> <NA>                <NA>
> > 
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com  
> > 
> > > -Original Message-
> > > From: r-help-boun...@r-project.org
> > 
> > > [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Peter Langfelder
> > > Sent: Thursday, November 11, 2010 12:25 PM
> > > To: Noah Silverman
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] Populating then sorting a matrix
> > and/or data.frame
> > > 
> > > On Thu, Nov 11, 2010 at 11:33 AM, Noah Silverman
> > > 
> > wrote:
> > > > Still doesn't work.
> > > >
> > > > When using rbind to build the data.frame, it get
> > a 
> > > structure mostly full of
> > > > NA.
> > > > The data is correct, so something about pushing
> > into the 
> > > data.frame is
> > > > breaking.
> > > >
> > > > Example code:
> > > > results <- data.frame()
> > > >
> > > > for(i in 1:n){
> > > >    #do all the work
> > > >    #a is a test label. b,c,d are numeric.
> > > >    results <- rbind(results, c(a,b,c,d))
> > > > }
> > > 
> > > Works for me:
> > > 
> > > results <- data.frame()
> > > 
> > > n = 10
> > > for(i in 1:n){
> > >    a = LETTERS[i];
> > >    b = i;
> > >    c = 3*i + 2
> > >    d = rnorm(1);
> > >    results <- rbind(results, c(a,b,c,d))
> > > }
> > > 
> > > 
> > > results
> > >    X.A. X.1. X.5. X.0.142223304589023.
> > > 1     A    1    5    0.142223304589023
> > > 2     B    2    8    0.243612305595176
> > > 3     C    3   11    0.476795513990516
> > > 4     D    4   14      1.0278220664213
> > > 5     E    5   17    0.916608672305205
> > > 6     F    6   20     1.61075985995586
> > > 7     G    7   23    0.370423691258896
> > > 8     H    8   26  -0.0528603547004191
> > > 9     I    9   29    -2.07888666920403
> > > 10    J   10   32    -1.87980721733655
> > > 
> > > Maybe there's something wrong with the calculation you
> > do?
> > > 
> > > Peter
> > > 
> > 
> 
> 
> 



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Peter Langfelder
On Thu, Nov 11, 2010 at 1:19 PM, William Dunlap  wrote:
> Peter,
>
> Your example doesn't work for me unless I
> set options(stringsAsFactors=TRUE) first.
> (If I do set that, then all columns of 'results'
> have class "character", which I doubt the user
> wants.)

You probably mean stringsAsFactors=FALSE.

What you say makes sense, because the c() function produces a vector
in which all components have the same type, and it will be character.
If you don't want to have characters, my solution would be

n = 10
results <- data.frame(a = rep("", n), b = rep(0, n), c = rep(0, n), d
= rep(0, n))
for(i in 1:n){
   a = LETTERS[i];
   b = i;
   c = 3*i + 2
   d = rnorm(1);
   results$a[i] = a
   results$b[i] = b
   results$c[i] = c
   results$d[i] = d
}

> results
   a  b  c   d
1  A  1  5 -1.31553805
2  B  2  8  0.09198054
3  C  3 11 -0.05860804
4  D  4 14  0.77796136
5  E  5 17  1.28924697
6  F  6 20  0.47631483
7  G  7 23 -1.23727076
8  H  8 26  0.83595295
9  I  9 29  0.69435349
10 J 10 32 -0.30922930

> mode(results[, 1])
[1] "character"
> mode(results[, 2])
[1] "numeric"
> mode(results[, 3])
[1] "numeric"
> mode(results[, 4])
[1] "numeric"



or alternatively

n = 10
num <- data.frame(b = rep(0, n), c = rep(0, n), d = rep(0, n))
labels = rep("", n);
for(i in 1:n){
   a = LETTERS[i];
   b = i;
   c = 3*i + 2
   d = rnorm(1);
   labels[i] = a
   num[i, ] = c(b, c, d)
}
results = data.frame(a = labels, num)

> results
   a  b  c   d
1  A  1  5 -0.47150097
2  B  2  8 -1.30507313
3  C  3 11 -1.09860425
4  D  4 14  0.91326330
5  E  5 17 -0.09732841
6  F  6 20 -0.75134162
7  G  7 23  0.31360908
8  H  8 26 -1.54406716
9  I  9 29 -0.36075743
10 J 10 32 -0.23758269
> mode(results[, 1])
[1] "character"
> mode(results[, 2])
[1] "numeric"
> mode(results[, 3])
[1] "numeric"
> mode(results[, 4])
[1] "numeric"


Peter



Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread John Kane


--- On Thu, 11/11/10, William Dunlap  wrote:

> From: William Dunlap 
> Subject: Re: [R] Populating then sorting a matrix and/or data.frame
> To: "Peter Langfelder" , r-help@r-project.org
> Received: Thursday, November 11, 2010, 4:19 PM
> Peter,
> 
> Your example doesn't work for me unless I
> set options(stringsAsFactors=TRUE) first.


Don't you mean stringsAsFactors=FALSE here?  At least I get the same results 
you do but with stringsAsFactors=FALSE  The TRUE condition is giving multiple 
NAs and error messages





> (If I do set that, then all columns of 'results'
> have class "character", which I doubt the user
> wants.)
> 
> > results <- data.frame()
> >
> > n = 10
> > for(i in 1:n){
> +    a = LETTERS[i];
> +    b = i;
> +    c = 3*i + 2
> +    d = rnorm(1);
> +    results <- rbind(results, c(a,b,c,d))
> + }
> There were 36 warnings (use warnings() to see them)
> > warnings()[1:5]
> $`invalid factor level, NAs generated`
> `[<-.factor`(`*tmp*`, ri, value = "B")
> 
> $`invalid factor level, NAs generated`
> `[<-.factor`(`*tmp*`, ri, value = "2")
> 
> $`invalid factor level, NAs generated`
> `[<-.factor`(`*tmp*`, ri, value = "8")
> 
> $`invalid factor level, NAs generated`
> `[<-.factor`(`*tmp*`, ri, value = "-0.305558353507095")
> 
> $`invalid factor level, NAs generated`
> `[<-.factor`(`*tmp*`, ri, value = "C")
> 
> > results
>    X.A. X.1. X.5. X.1.43055780028799.
> 1     A    1    5    1.43055780028799
> 2  <NA> <NA> <NA>                <NA>
> 3  <NA> <NA> <NA>                <NA>
> 4  <NA> <NA> <NA>                <NA>
> 5  <NA> <NA> <NA>                <NA>
> 6  <NA> <NA> <NA>                <NA>
> 7  <NA> <NA> <NA>                <NA>
> 8  <NA> <NA> <NA>                <NA>
> 9  <NA> <NA> <NA>                <NA>
> 10 <NA> <NA> <NA>                <NA>
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
> 
> > -Original Message-
> > From: r-help-boun...@r-project.org
> 
> > [mailto:r-help-boun...@r-project.org]
> On Behalf Of Peter Langfelder
> > Sent: Thursday, November 11, 2010 12:25 PM
> > To: Noah Silverman
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Populating then sorting a matrix
> and/or data.frame
> > 
> > On Thu, Nov 11, 2010 at 11:33 AM, Noah Silverman
> > 
> wrote:
> > > Still doesn't work.
> > >
> > > When using rbind to build the data.frame, I get a 
> > > structure mostly full of NA.
> > > The data is correct, so something about pushing into the 
> > > data.frame is breaking.
> > >
> > > Example code:
> > > results <- data.frame()
> > >
> > > for(i in 1:n){
> > >    #do all the work
> > >    #a is a test label. b,c,d are numeric.
> > >    results <- rbind(results, c(a,b,c,d))
> > > }
> > 
> > Works for me:
> > 
> > results <- data.frame()
> > 
> > n = 10
> > for(i in 1:n){
> >    a = LETTERS[i];
> >    b = i;
> >    c = 3*i + 2
> >    d = rnorm(1);
> >    results <- rbind(results, c(a,b,c,d))
> > }
> > 
> > 
> > > results
> > 
>    X.A. X.1. X.5. X.0.142223304589023.
> 1     A    1    5    0.142223304589023
> 2     B    2    8    0.243612305595176
> 3     C    3   11    0.476795513990516
> 4     D    4   14      1.0278220664213
> 5     E    5   17    0.916608672305205
> 6     F    6   20     1.61075985995586
> 7     G    7   23    0.370423691258896
> 8     H    8   26  -0.0528603547004191
> 9     I    9   29    -2.07888666920403
> 10    J   10   32    -1.87980721733655
> > 
> > Maybe there's something wrong with the calculation you do?
> > 
> > Peter
> > 
> > __
> > R-help@r-project.org
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> > 
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread William Dunlap
Peter,

Your example doesn't work for me unless I
set options(stringsAsFactors=TRUE) first.
(If I do set that, then all columns of 'results'
have class "character", which I doubt the user
wants.)

> results <- data.frame()
>
> n = 10
> for(i in 1:n){
+a = LETTERS[i];
+b = i;
+c = 3*i + 2
+d = rnorm(1);
+results <- rbind(results, c(a,b,c,d))
+ }
There were 36 warnings (use warnings() to see them)
> warnings()[1:5]
$`invalid factor level, NAs generated`
`[<-.factor`(`*tmp*`, ri, value = "B")

$`invalid factor level, NAs generated`
`[<-.factor`(`*tmp*`, ri, value = "2")

$`invalid factor level, NAs generated`
`[<-.factor`(`*tmp*`, ri, value = "8")

$`invalid factor level, NAs generated`
`[<-.factor`(`*tmp*`, ri, value = "-0.305558353507095")

$`invalid factor level, NAs generated`
`[<-.factor`(`*tmp*`, ri, value = "C")

> results
   X.A. X.1. X.5. X.1.43055780028799.
1     A    1    5    1.43055780028799
2  <NA> <NA> <NA>                <NA>
3  <NA> <NA> <NA>                <NA>
4  <NA> <NA> <NA>                <NA>
5  <NA> <NA> <NA>                <NA>
6  <NA> <NA> <NA>                <NA>
7  <NA> <NA> <NA>                <NA>
8  <NA> <NA> <NA>                <NA>
9  <NA> <NA> <NA>                <NA>
10 <NA> <NA> <NA>                <NA>

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  
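An alternative that sidesteps the factor coercion entirely — a sketch added for illustration, not code from the original thread — is to build each row as a one-row data.frame (so every column keeps its own class) and bind them once at the end:

```r
# Sketch: each iteration makes a one-row data.frame, so 'a' stays character
# and 'b', 'c', 'd' stay numeric; do.call(rbind, ...) binds all rows at once.
rows <- vector("list", 10)
for (i in 1:10) {
  rows[[i]] <- data.frame(a = LETTERS[i], b = i, c = 3 * i + 2, d = rnorm(1),
                          stringsAsFactors = FALSE)
}
results <- do.call(rbind, rows)
str(results)  # 10 obs. of 4 variables; no factors, no NAs
```

This also gives the columns proper names (a, b, c, d) instead of names derived from the first row's values (the X.A., X.1., ... headers above).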

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Langfelder
> Sent: Thursday, November 11, 2010 12:25 PM
> To: Noah Silverman
> Cc: r-help@r-project.org
> Subject: Re: [R] Populating then sorting a matrix and/or data.frame
> 
> On Thu, Nov 11, 2010 at 11:33 AM, Noah Silverman
>  wrote:
> > Still doesn't work.
> >
> > When using rbind to build the data.frame, I get a 
> structure mostly full of
> > NA.
> > The data is correct, so something about pushing into the 
> data.frame is
> > breaking.
> >
> > Example code:
> > results <- data.frame()
> >
> > for(i in 1:n){
> >    #do all the work
> >    #a is a test label. b,c,d are numeric.
> >    results <- rbind(results, c(a,b,c,d))
> > }
> 
> Works for me:
> 
> results <- data.frame()
> 
> n = 10
> for(i in 1:n){
>a = LETTERS[i];
>b = i;
>c = 3*i + 2
>d = rnorm(1);
>results <- rbind(results, c(a,b,c,d))
> }
> 
> 
> > results
> 
>    X.A. X.1. X.5. X.0.142223304589023.
> 1     A    1    5    0.142223304589023
> 2     B    2    8    0.243612305595176
> 3     C    3   11    0.476795513990516
> 4     D    4   14      1.0278220664213
> 5     E    5   17    0.916608672305205
> 6     F    6   20     1.61075985995586
> 7     G    7   23    0.370423691258896
> 8     H    8   26  -0.0528603547004191
> 9     I    9   29    -2.07888666920403
> 10    J   10   32    -1.87980721733655
> 
> Maybe there's something wrong with the calculation you do?
> 
> Peter
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R error using Survr function with gcmrec

2010-11-11 Thread Emily C
Sorry for the lack of context - I found a forum (
http://r.789695.n4.nabble.com/R-help-f789696.html) that I thought was easier
to navigate and was using it for the first time. Hit the reply button, but
forgot that the mailing list recipients would not see previous message in
the thread. I've pasted it at the bottom for reference.

I actually need my data as single year entries as I am assessing the effect
of a time-varying covariate ("var") that is measured (and changes) on an
annual basis. However, my events are assessed on a daily basis. But thank
you for pointing out the condition in Survr(). To correct the problem, I
believe it would still be valid to change the condition to:

if (length(unique(id)) >= length(event[event == 0])) {
  stop("Data doesn't match. Every subject must have a censored time")

Thank you for the reply!
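As a quick pre-check before calling Survr(), one can count the censored records per subject — a sketch using the sample data from this thread, with the counting code added for illustration:

```r
# Sketch: Survr() compares length(unique(id)) against the total number of
# zeros in 'event'. With yearly rows, each subject contributes several
# censored (event == 0) records, so the counts below show why the check fails.
id    <- c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 5))
event <- c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0)
cens.per.id <- tapply(event == 0, id, sum)
cens.per.id      # every subject has 3 censored rows here, not 1
sum(event == 0)  # 12 zeros in total, versus 4 unique ids
```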


On Thu, Nov 11, 2010 at 2:54 PM, David Winsemius wrote:

>
> On Nov 11, 2010, at 2:50 PM, David Winsemius wrote:
>
>
>> On Nov 11, 2010, at 2:09 PM, Emily wrote:
>>
>>
>>> I'm having the same problem
>>>
>>
>> (???: from a three year-old posting for which you didn't copy any
>> context.)
>>
>>  and was wondering whether you ever found a
>>> solution? It gives me the error "Error in Survr(id, time, event) : Data
>>> doesn't match. Every subject must have a censored time" even though all
>>> my
>>> subjects are right-censored, and to be sure, I've even used the
>>> addCenTime
>>> function. Any input appreciated!
>>>
>>
>> Your data has a lot of 0 events at the end of calendar years. That does
>> not seem to be the expected format for the Survr records. It appears to
>> define an invalid record as one where the only censoring event is at the
> ^valid^
>> time of the last observation. Here's the first line in Survr that is
>> throwing the error:
>>
>> if (length(unique(id)) != length(event[event == 0])) {
>>   stop("Data doesn't match. Every subject must have a censored time")
>>
>> I suspect you need to collapse your single-year entries with 0 events into
>> multiple year entries with an event.
>>
>> --
>> David.
>>
>>
>>> Here's my sample data:
>>>
>>> id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))
>>>
>>>
>>> start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")
>>>
>>>
>>> stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")
>>>
>>> time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)
>>>
>>> event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)
>>>
>>> enum=c(rep(seq(1,4,1),4),5)
>>>
>>>
>>> var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)
>>>
>>> data=data.frame(id,start,stop,time,event,enum,var)
>>>
>>> dataOK=addCenTime(data)
>>> m<-gcmrec(Survr(id,time,event)~var, data=dataOK)
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/R-error-using-Survr-function-with-gcmrec-tp858931p3038374.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> David Winsemius, MD
> West Hartford, CT
>

Original message in thread:

Would someone be able to help with this question?  I'm using the
Gcmrec, Survrec, and Design packages to do a power analysis on
simulated data.  I'm receiving an error after using the Survr function
that all data must have a censoring time even after using the gcmrec
function:  newdata<-addCenTime(olddata).  My program is below.  I'd
greatly appreciate any help!

id<-c(seq(1,288,by=1),seq(1,79,by=1),seq(1,11,by=1))
x<-c(rep(0,5),rep(1,6),rep(0,45),rep(1,23),rep(0,124),rep(1,85),
  rep(0,4),rep(1,1),rep(0,1),rep(1,5),rep(0,31),rep(1,14),rep(0,5),
  rep(1,18),rep(0,8),rep(1,3))
myrates<-((1-x)*0.0639 + (x)*0.0320)
y<-c(rexp(378,rate=myrates))
cen<-c(rexp(378,0.0385))
time<-pmin(y,cen)
event<-as.numeric(y<=cen)
x2<-(x-1)*(-1)
bvdata<-data.frame(id,event,time,x2)
bvdata2<-addCenTime(bvdata)
fit<-cph(Survr(id,event,time)~x2,data=bvdata2)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https

Re: [R] randomForest can not handle categorical predictors with more than 32 categories

2010-11-11 Thread Sven Garbade
You can try ctree in package party, but anyway: what is the deeper
sense in a binary split for a variable with more than 32 levels?

Regards, Sven
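If the variable cannot simply be dropped, one common workaround is to lump its infrequent levels together before fitting — a sketch with a hypothetical helper, not part of the original thread:

```r
# Sketch: keep the 31 most frequent levels and pool the rest into "Other",
# so the resulting factor has at most 32 levels and randomForest accepts it.
collapse.levels <- function(f, max.levels = 31) {
  f <- as.factor(f)
  keep <- names(sort(table(f), decreasing = TRUE))[seq_len(min(max.levels, nlevels(f)))]
  factor(ifelse(f %in% keep, as.character(f), "Other"))
}

set.seed(1)
f <- factor(sample(paste0("lvl", 1:50), 500, replace = TRUE))
nlevels(collapse.levels(f))  # at most 32
```

Whether pooling is sensible depends on the predictor, which is Sven's point about the interpretability of binary splits over so many levels.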

2010/11/10 Erik Iverson :
> Well, the error message seems relatively straightforward.
>
> When you run str(x) (you did not provide the data)
>
> you should see 1 or more components are factors that have more than 32
> levels.  Apparently you can't include those predictors in a call
> to randomForest.
>
> You might find the following line of code useful:
>
> which(sapply(x, function(y) nlevels(y) > 32))
>
> Mai Dang wrote:
>>
>> I received this error
>> Error in randomForest.default(m, y, ...) :
>> Can not handle categorical predictors with more than 32 categories.
>>
>> using below code
>>
>> library(randomForest)
>> library(MASS)
>> memory.limit(size=12999)
>> x <- read.csv("D:/train_store_title_view.csv", header=TRUE)
>> x <- na.omit(x)
>> set.seed(131)
>> sales.rf <- randomForest(sales ~ ., data=x, mtry=3,
>> importance=TRUE)
>>
>> My machine (i7) running on 64 bit R with 12 gigs of RAM.
>>
>> Would anyone know how to avoid this error ?
>> Thank You for your reply,
>>
>> Mai Dang
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Peter Langfelder
On Thu, Nov 11, 2010 at 11:33 AM, Noah Silverman
 wrote:
> Still doesn't work.
>
> When using rbind to build the data.frame, I get a structure mostly full of
> NA.
> The data is correct, so something about pushing into the data.frame is
> breaking.
>
> Example code:
> results <- data.frame()
>
> for(i in 1:n){
>    #do all the work
>    #a is a test label. b,c,d are numeric.
>    results <- rbind(results, c(a,b,c,d))
> }

Works for me:

results <- data.frame()

n = 10
for(i in 1:n){
   a = LETTERS[i];
   b = i;
   c = 3*i + 2
   d = rnorm(1);
   results <- rbind(results, c(a,b,c,d))
}


> results

   X.A. X.1. X.5. X.0.142223304589023.
1     A    1    5    0.142223304589023
2     B    2    8    0.243612305595176
3     C    3   11    0.476795513990516
4     D    4   14      1.0278220664213
5     E    5   17    0.916608672305205
6     F    6   20     1.61075985995586
7     G    7   23    0.370423691258896
8     H    8   26  -0.0528603547004191
9     I    9   29    -2.07888666920403
10    J   10   32    -1.87980721733655

Maybe there's something wrong with the calculation you do?

Peter
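When all the per-row values can be computed vectorised, the loop can be avoided altogether — a sketch added for illustration, not code from the thread:

```r
# Sketch: build the whole table in one data.frame() call; this keeps the
# intended column names and classes and avoids growing the object row by row.
n <- 10
results <- data.frame(a = LETTERS[1:n],
                      b = 1:n,
                      c = 3 * (1:n) + 2,
                      d = rnorm(n),
                      stringsAsFactors = FALSE)
head(results, 3)
```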

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] overlap histogram and density

2010-11-11 Thread Ted Harding
[OOPS!! I accidentally reproduced my second example below
 as my third example. Now corrected. See below.]

On 11-Nov-10 20:02:29, Ted Harding wrote:
 
 On 11-Nov-10 18:39:34, Roslina Zakaria wrote:
> Hi,
> Does anybody encounter the same problem when we overlap histogram
> and density that the density line seem to shift to the right a
> little bit?
> 
> If you do have the same problem, what should we do to correct that?
> Thank you.
> 
> par(mar=c(4,4,2,1.2),oma=c(0,0,0,0))
> hist(datobs,prob=TRUE,
>  main ="Volume of a catchment from four stations",
>  col="yellowgreen", cex.axis=1, xlab="rainfall",
>  ylab="Relative frequency", ylim= c(0,.003), xlim=c(0,1200))
> 
> lines(density(dd), lwd=3,col="red")
> 
>#legend("topright",c("observed","generated"),
>#   lty=c(0,1),fill=c("blue",""),bty="n")
> 
> legend("topright", legend = c("observed","generated"),
> col = c("yellowgreen", "red"), pch=c(15,NA), lty = c(0, 1), 
> lwd=c(0,3),bty="n", pt.cex=2)
> box()
> 
> Thank you.
 
In theory that is not a problem. The density() function will
estimate a density whose integral over each of the intervals
in the histogram is equal to the probability of that interval,
and the proportion of the data expected in that interval will
also be its probability.

In practice, the extent to which you observe what you describe
(or a displacement to the left) will depend on how your data
are distributed within the intervals, and on the precision
with which density() happens to estimate the true density.

The following 3 cases of the same data sampled from a log-Normal
distribution, illustrate different impressions of the kind that
one might get, depending on the details of the histogram. Note
that there is no overall effect of "displacement to the right"
in any histogram, while the extent to which one observes it
varies according to the histogram. Without knowledge of your
data it is not possible to comment further on the extent to
which you have observed it yourself!

set.seed(54321)
N  <- 1000
X  <- exp(rnorm(N,sd=0.4))
dd <- density(X)

# A coarse histogram
H  <- hist(X,prob=TRUE,
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.5*(0:8))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
 
## A finer histogram
H  <- hist(X,prob=TRUE,
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
 
## A still finer histogram
H  <- hist(X,prob=TRUE,
## OOPS!!  xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.20*(0:20))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
 
 
 Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 11-Nov-10   Time: 20:12:27
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] overlap histogram and density

2010-11-11 Thread Ted Harding

On 11-Nov-10 18:39:34, Roslina Zakaria wrote:
> Hi,
> Does anybody encounter the same problem when we overlap histogram
> and density that the density line seem to shift to the right a
> little bit?
> 
> If you do have the same problem, what should we do to correct that?
> Thank you.
> 
> par(mar=c(4,4,2,1.2),oma=c(0,0,0,0))
> hist(datobs,prob=TRUE,
>  main ="Volume of a catchment from four stations",
>  col="yellowgreen", cex.axis=1, xlab="rainfall",
>  ylab="Relative frequency", ylim= c(0,.003), xlim=c(0,1200))
> 
> lines(density(dd), lwd=3,col="red")
> 
>#legend("topright",c("observed","generated"),
>#   lty=c(0,1),fill=c("blue",""),bty="n")
> 
> legend("topright", legend = c("observed","generated"),
> col = c("yellowgreen", "red"), pch=c(15,NA), lty = c(0, 1), 
> lwd=c(0,3),bty="n", pt.cex=2)
> box()
> 
> Thank you.

In theory that is not a problem. The density() function will
estimate a density whose integral over each of the intervals
in the histogram is equal to the probability of that interval,
and the proportion of the data expected in that interval will
also be its probability.

In practice, the extent to which you observe what you describe
(or a displacement to the left) will depend on how your data
are distributed within the intervals, and on the precision
with which density() happens to estimate the true density.

The following 3 cases of the same data sampled from a log-Normal
distribution, illustrate different impressions of the kind that
one might get, depending on the details of the histogram. Note
that there is no overall effect of "displacement to the right"
in any histogram, while the extent to which one observes it
varies according to the histogram. Without knowledge of your
data it is not possible to comment further on the extent to
which you have observed it yourself!

set.seed(54321)
N  <- 1000
X  <- exp(rnorm(N,sd=0.4))
dd <- density(X)

## A coarse histogram
H  <- hist(X,prob=TRUE,
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.5*(0:8))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)

## A finer histogram
H  <- hist(X,prob=TRUE,
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)

## A still finer histogram
H  <- hist(X,prob=TRUE,
   xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)


Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 11-Nov-10   Time: 20:02:24
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R error using Survr function with gcmrec

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 2:50 PM, David Winsemius wrote:



On Nov 11, 2010, at 2:09 PM, Emily wrote:



I'm having the same problem


(???: from a three year-old posting for which you didn't copy any  
context.)



and was wondering whether you ever found a
solution? It gives me the error "Error in Survr(id, time, event) : Data
doesn't match. Every subject must have a censored time" even though all my
subjects are right-censored, and to be sure, I've even used the addCenTime
function. Any input appreciated!


Your data has a lot of 0 events at the end of calendar years. That does not
seem to be the expected format for the Survr records. It appears to define an
invalid record as one where the only censoring event is at the time of the
   ^valid^
last observation. Here's the first line in Survr that is throwing the error:


if (length(unique(id)) != length(event[event == 0])) {
    stop("Data doesn't match. Every subject must have a censored time")


I suspect you need to collapse your single-year entries with 0  
events into multiple year entries with an event.


--
David.



Here's my sample data:

id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))

start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")


stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")


time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)


event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)

enum=c(rep(seq(1,4,1),4),5)

var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)


data=data.frame(id,start,stop,time,event,enum,var)

dataOK=addCenTime(data)
m<-gcmrec(Survr(id,time,event)~var, data=dataOK)
--
View this message in context: 
http://r.789695.n4.nabble.com/R-error-using-Survr-function-with-gcmrec-tp858931p3038374.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] overlap histogram and density

2010-11-11 Thread Ben Bolker
Roslina Zakaria  yahoo.com> writes:

> 
> Hi,
> 
> Does anybody encounter the same problem when we overlap histogram and density 
>     that the density line seem to shift to the right a little bit?
>      

>     par(mar=c(4,4,2,1.2),oma=c(0,0,0,0))
>     hist(datobs,prob=TRUE, main ="Volume of a catchment from four 
>     stations",col="yellowgreen", cex.axis=1,
>     xlab="rainfall",ylab="Relative frequency", ylim= c(0,.003), 
> xlim=c(0,1200))
>     lines(density(dd), lwd=3,col="red")
>     legend("topright", legend = c("observed","generated"),
>        col = c("yellowgreen", "red"), pch=c(15,NA), lty = c(0, 1), 
>        lwd=c(0,3),bty="n", pt.cex=2)
>     box()

   Are dd and datobs the same?
   There is nothing obviously (to me) wrong here.
   Density estimation by definition smears out sharp peaks, which
can lead to differences between the histogram and density estimate.
   Hard to say any more without a reproducible example.

z <- rnorm(5000)
hist(z,prob=TRUE,col="gray",breaks=100)
lines(density(z),col="red")

  looks fine to me.
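A related thing worth checking is the kernel bandwidth: heavy smoothing of skewed data can move the apparent peak of the density curve relative to the histogram bars. A sketch (illustrative only, not from the thread):

```r
# Sketch: for right-skewed data, the location of the density estimate's peak
# depends on the bandwidth; comparing several 'adjust' values shows how much
# the apparent mode can move.
set.seed(1)
x <- exp(rnorm(1000, sd = 0.4))
for (a in c(0.5, 1, 2)) {
  d <- density(x, adjust = a)
  cat("adjust =", a, " peak at x =", round(d$x[which.max(d$y)], 3), "\n")
}
```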

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
David, Mattia, James -- thanks so much for all your helpful comments!
I now have a much better understanding of how to calculate what I'm
interested in ... and what the risks are of doing so.
Thanks and all the best,
Michael


On Thu, Nov 11, 2010 at 7:33 PM, David Winsemius wrote:

>
> On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:
>
>  Thanks for the comment, James!
>>
>> The problem is that my initial sample (Dataset 1) is truncated. That means
>> I
>> only observe "time to death" for those individuals who actually died
>> before
>> end of my observation period. It is my understanding that this type of
>> truncation creates a bias when I use a "normal" regression analysis. Hence
>> my idea to use some form of survival model.
>>
>> I had another look at predict.survreg and I think the option "response"
>> could work for me.
>> When I run the following code I get ptime = 290.3648.
>> I assume this means that an individual with ph.ecog=2 can be expected to
>> live another 290.3648 days before death occurs (days is the time scale of
>> the time variable).
>>
>
> It is a prediction under specific assumptions underpinning a parametric
> estimate.
>
>
>  Could someone confirm whether this makes sense?
>>
>
> You ought to confirm that it "makes sense" by comparing to your data:
> require(Hmisc); require(survival)
> 
>
> > describe(lung[lung$status==1&lung$ph.ecog==2,"time"])
> lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
>   n missing  unique    Mean
>   6       0       6   293.7
>
>  92 105 211 292 511 551
> Frequency  1   1   1   1   1   1
> % 17  17  17  17  17  17
>
> > ?lung
>
> So status==1 is a censored case and the observed times are status==2
> > describe(lung[lung$status==2&lung$ph.ecog==2,"time"])
> lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
>   n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
>  44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85
>
> lowest :  11  12  13  26  30, highest: 524 533 654 707 814
>
> And the mean time to death (in a group that had only 6 censored individual
> at times from 92 to 551)  was 226 and median time to death among 44
> individuals is 178 with a right skewed distribution. You need to decide
> whether you want to make that particular prediction when you know that you
> forced a specific distributional form on the regression machinery by
> accepting the default.
>
>
>
>
>> lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
>> ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')
>>
>>
>>
>> On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger
>> wrote:
>>
>>  Michael,
>>>
>>> You are looking to compute an estimated time to death -- rather than the
>>> odds of death conditional upon time.  Thus, you will want to use "time to
>>> death" as your dependent variable rather than a dichotomous outcome (
>>> 0=alive, 1=death).   You can accomplish this with a straight forward
>>> regression analysis.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>> On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein <
>>> haenl...@escpeurope.eu>wrote:
>>>
>>>  Dear all,

 I'm struggling with predicting "expected time until death" for a coxph
 and
 survreg model.

 I have two datasets. Dataset 1 includes a certain number of people for
 which
 I know a vector of covariates (age, gender, etc.) and their event times
 (i.e., I know whether they have died and when if death occurred prior to
 the
 end of the observation period). Dataset 2 includes another set of people
 for
 which I only have the covariate vector. I would like to use Dataset 1 to
 calibrate either a coxph or survreg model and then use this model to
 determine an "expected time until death" for the individuals in Dataset
 2.
 For example, I would like to know when a person in Dataset 2 will die,
 given his/her age and gender.

 I checked predict.coxph and predict.survreg as well as the document "A
 Package for Survival Analysis in S" written by Terry M. Therneau but I
 have
 to admit that I'm a bit lost here.

 Could anyone give me some advice on how this could be done?

 Thanks very much in advance,

 Michael



 Michael Haenlein
 Professor of Marketing

>>>
>
> David Winsemius, MD
> West Hartford, CT
>
>

[[alternative HTML version deleted]]
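David's caveat about accepting the default distribution can be probed directly — a sketch assuming the lung data from the survival package, added for illustration rather than taken from the thread:

```r
# Sketch: survreg() defaults to a Weibull model (the fit that gave the
# ~290-day prediction discussed above); refitting under other distributions
# shows how sensitive the predicted time is to that choice.
library(survival)
for (d in c("weibull", "lognormal", "exponential")) {
  fit <- survreg(Surv(time, status) ~ ph.ecog, data = lung, dist = d)
  p <- predict(fit, newdata = data.frame(ph.ecog = 2), type = "response")
  cat(format(d, width = 12), round(p, 1), "days\n")
}
```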

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Ben Bolker
Albyn Jones  reed.edu> writes:

> 
> do you have factors (categorical variables) in the model?  it could be
> just a parameterization difference.
> 
> albyn
> 
> On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
> > Dear R developers,
> > 
> > I have noticed a discrepancy between the coefficients returned by R's glm()
> > for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
> > and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
> > Fisher's scoring, but the difference is something like 100 SE on the
> > intercept.  What accounts for such a huge difference?


  As previous posters said.  Specifically:

* a huge change in the intercept is very unlikely to be caused by
a change in underlying algorithm unless the data are pathological
(separation, convergence issues, etc.)

* R's default 'intercept' is the value of the first factor
level; SAS's is the value of the last factor level. See ?contr.SAS ...
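The contrast difference is easy to demonstrate on simulated data — a sketch, not code from the thread:

```r
# Sketch: the same logistic regression under R's default treatment contrasts
# (baseline = first factor level) and under SAS-style contrasts (baseline =
# last level). The intercepts differ, but the fitted probabilities agree.
set.seed(1)
d <- data.frame(y = rbinom(200, 1, 0.5),
                g = factor(sample(c("a", "b", "c"), 200, replace = TRUE)))
fit.r   <- glm(y ~ g,               family = binomial, data = d)
fit.sas <- glm(y ~ C(g, contr.SAS), family = binomial, data = d)
coef(fit.r)["(Intercept)"]    # log-odds for the first level, "a"
coef(fit.sas)["(Intercept)"]  # log-odds for the last level, "c"
all.equal(fitted(fit.r), fitted(fit.sas))  # TRUE
```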

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R error using Survr function with gcmrec

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 2:09 PM, Emily wrote:



I'm having the same problem


(???: from a three year-old posting for which you didn't copy any  
context.)



and was wondering whether you ever found a
solution? It gives me the error "Error in Survr(id, time, event) : Data
doesn't match. Every subject must have a censored time" even though all my
subjects are right-censored, and to be sure, I've even used the addCenTime
function. Any input appreciated!


Your data has a lot of 0 events at the end of calendar years. That  
does not seem to be the expected format for the Survr records. It  
appears to define an invalid record as one where the only censoring  
event is at the time of the last observation. Here's the first line in  
Survr that is throwing the error:


 if (length(unique(id)) != length(event[event == 0])) {
     stop("Data doesn't match. Every subject must have a censored time")


I suspect you need to collapse your single-year entries with 0 events  
into multiple year entries with an event.


--
David.



Here's my sample data:

id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))

start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")


stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")


time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)


event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)

enum=c(rep(seq(1,4,1),4),5)

var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)


data=data.frame(id,start,stop,time,event,enum,var)

dataOK=addCenTime(data)
m<-gcmrec(Survr(id,time,event)~var, data=dataOK)
--
View this message in context: 
http://r.789695.n4.nabble.com/R-error-using-Survr-function-with-gcmrec-tp858931p3038374.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Populating then sorting a matrix and/or data.frame

2010-11-11 Thread Noah Silverman

Still doesn't work.

When using rbind to build the data.frame, I get a structure mostly full
of NA.
The data is correct, so something about pushing it into the data.frame is
breaking.


Example code:
results <- data.frame()

for(i in 1:n){
#do all the work
#a is a test label. b,c,d are numeric.
results <- rbind(results, c(a,b,c,d))
}
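
As an editorial aside, one pattern that avoids both the all-NA rows and the factor coercion is to keep every row as a small one-row data.frame (so each column keeps its own type) and bind everything once at the end. A minimal sketch, with made-up placeholder values for a, b, c, d:

```r
# Hypothetical stand-in for the real loop; a is a text label, the rest numeric.
n <- 5
rows <- vector("list", n)
for (i in 1:n) {
    a <- paste0("label", i)   # example text label
    b <- i
    d <- sqrt(i)
    # one-row data.frame preserves column types; avoid factors for the label
    rows[[i]] <- data.frame(one = a, two = b, three = b * 2, four = d,
                            stringsAsFactors = FALSE)
}
results <- do.call(rbind, rows)
results[results$two < 100, ]  # numeric filtering now works as expected
```

Binding once at the end is also much faster than growing the data.frame inside the loop.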



On 11/11/10 2:00 AM, Michael Bedward wrote:

You can use rbind as in your original post, but if you've got a mix of
character and numeric data start with a data.frame rather than a
matrix.

Michael

On 11 November 2010 20:30, Noah Silverman  wrote:

That makes perfect sense.

Since I need to build up the results table sequentially as I iterate through
the data, how would you recommend doing it?

Thanks,

-N

On 11/11/10 12:03 AM, Michael Bedward wrote:

All values in a matrix are the same type, so if you've set up a matrix
with a character column then your numeric values will also be stored
as character. That would explain why they are being converted to
factors.  It would also explain why your query isn't working.

Michael
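
Michael's point that a matrix holds a single type (so one character column silently coerces everything) can be seen directly in this small illustration:

```r
# cbind() of character and numeric produces a character matrix
m <- cbind(c("a", "b"), c(1, 2))
typeof(m)   # the numbers were coerced to character

# a data.frame keeps a separate class per column
df <- data.frame(label = c("a", "b"), value = c(1, 2),
                 stringsAsFactors = FALSE)
sapply(df, class)
```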


On 11 November 2010 18:40, Noah Silverman wrote:

That was a typo.

It should have read:
results[results$one<100,]

It does still fail.

There is ONE column that is text.  So my guess is that R is seeing that
and
assuming that the entire data.frame should be factors.

-N

On 11/10/10 11:16 PM, Michael Bedward wrote:

Hello Noah,

If you set these names...

names(results)<- c("one", "two", "three")

this won't work...

results[results$c<  100,]

because you don't have a column called "c" (unless that's just a typo
in your post).


I tried making it a data.frame with
foo<- data.frame(results)

But that converted all the numeric values to factors!!!

Not sure what's going on there. If 'results' is a numeric matrix you
should get a data.frame with numeric cols since under the hood this is
just calling the as.data.frame function.

Michael





On 11 November 2010 16:02, Noah Silverman
  wrote:

Hi,

I have a process in R that produces a lot of output.  My plan was to
build
up a matrix or data.frame "row by row", so that I'll have a nice object
with
all the resulting data.

I started with:
results<- matrix(ncol=3)
names(results)<- c("one", "two", "three")

Then, when looping through the data:
results<- rbind(results, c(a,b,c))

This seems to work fine. BUT, my problem arises when I want to filter,
sort,
etc.

I tried (thinking like a data.frame):
results[results$c<  100,]

But that fails.

I tried making it a data.frame with
foo<- data.frame(results)

But that converted all the numeric values to factors!!!  Which causes a
whole mess of problems.

Any ideas??

-N

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R error using Survr function with gcmrec

2010-11-11 Thread Emily

I'm having the same problem and was wondering whether you ever found a
solution? It gives me the error "Error in Survr(id, time, event) : Data
doesn't match. Every subject must have a censored time" even though all my
subjects are right-censored, and to be sure, I've even used the addCenTime
function. Any input appreciated!

Here's my sample data:

id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))

start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")

stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")

time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)

event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)

enum=c(rep(seq(1,4,1),4),5)

var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)

data=data.frame(id,start,stop,time,event,enum,var)

dataOK=addCenTime(data)
m<-gcmrec(Survr(id,time,event)~var, data=dataOK)
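
The "every subject must have a censored time" error from Survr() is driven by the last record of each id. A quick editorial check, using the sample data defined above, that each subject's final row is censored (event == 0):

```r
# for each id, take the last event indicator; all should be 0 (censored)
last.event <- tapply(event, id, function(e) tail(e, 1))
all(last.event == 0)
```

If any subject's final row has event == 1, gcmrec's addCenTime() is meant to append the missing censored record.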
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-error-using-Survr-function-with-gcmrec-tp858931p3038374.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Albyn Jones
do you have factors (categorical variables) in the model?  it could be
just a parameterization difference.

albyn

On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
> Dear R developers,
> 
> I have noticed a discrepancy between the coefficients returned by R's glm()
> for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
> and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
> Fisher's scoring, but the difference is something like 100 SE on the
> intercept.  What accounts for such a huge difference?
> 
> Thank you for your time.
> 
> Ben Godlove
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
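
The parameterization point is worth illustrating: R's default treatment contrasts take the first factor level as the reference, while SAS's PROC LOGISTIC defaults differ, so coefficients (especially the intercept) can look very different while the fitted probabilities agree. A hedged sketch with simulated data:

```r
set.seed(1)
x <- factor(sample(c("a", "b", "c"), 200, replace = TRUE))
y <- rbinom(200, 1, ifelse(x == "a", 0.2, 0.6))

f1 <- glm(y ~ x, family = binomial)                      # reference level "a"
f2 <- glm(y ~ relevel(x, ref = "c"), family = binomial)  # reference level "c"

coef(f1); coef(f2)                  # intercepts and slopes differ...
all.equal(fitted(f1), fitted(f2))   # ...but the fitted probabilities agree
```

Checking which outcome level each package models (SAS may model the probability of the lower-ordered response by default) is another common source of sign flips.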


Re: [R] User input after opening graphing device

2010-11-11 Thread Bert Gunter
PLEASE follow the posting guide and give us the output of sessionInfo().

I was unable to duplicate your problem -- it worked fine for me.

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] datasets  splines   grid  tcltk stats graphics  grDevices
 [8] utils methods   base

other attached packages:
[1] TinnR_1.0.3 R2HTML_2.2  Hmisc_3.8-3 survival_2.35-8
[5] svSocket_0.9-48 lattice_0.19-13 MASS_7.3-8

loaded via a namespace (and not attached):
[1] cluster_1.13.1 svMisc_0.9-60  tools_2.12.0

-- Bert

On Thu, Nov 11, 2010 at 9:52 AM,   wrote:
>
> If I run the following:
>
>> windows()
>>
>> bringToTop(-1)
>>
>> interactive()
> [1] TRUE
>>
>> run <- readline(prompt = "Continue (Yes = 1, No = 2):")
> Continue (Yes = 1, No = 2):
>> dummy <- 1
>> run
> [1] ""
>
> it does not allow user input though the session is interactive (it jumps 
> right over the readline command).
>
> It would be great if you would have any suggestions of how to solve the 
> problem.
>
> Thanks,
>
> Jan
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Erik Iverson

Is the algorithm converging? Is there separation (i.e.,
perfect predictor) in the model?
Are you getting a warning about fitted probabilities of
0 or 1?, etc.

We would need much more information (preferably a reproducible
example) before we can help.

Benjamin Godlove wrote:

Dear R developers,

I have noticed a discrepancy between the coefficients returned by R's glm()
for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
Fisher's scoring, but the difference is something like 100 SE on the
intercept.  What accounts for such a huge difference?

Thank you for your time.

Ben Godlove

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] overlap histogram and density

2010-11-11 Thread Roslina Zakaria
Hi,

Does anybody encounter the same problem when we overlay a histogram and a
density curve, that the density line seems to shift to the right a little bit?

If you do have the same problem, what should we do to correct that?

Thank you.
     
par(mar=c(4,4,2,1.2), oma=c(0,0,0,0))
hist(datobs, prob=TRUE, main="Volume of a catchment from four stations",
     col="yellowgreen", cex.axis=1,
     xlab="rainfall", ylab="Relative frequency", ylim=c(0,.003), xlim=c(0,1200))
lines(density(dd), lwd=3, col="red")

# legend("topright",c("observed","generated"),lty=c(0,1),fill=c("blue",""),bty="n")

legend("topright", legend=c("observed","generated"),
       col=c("yellowgreen","red"), pch=c(15,NA), lty=c(0,1),
       lwd=c(0,3), bty="n", pt.cex=2)
box()

Thank you.
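
One thing to check in the code above: the histogram is drawn from datobs but the density from dd; if those two objects differ, an apparent horizontal shift is exactly what one would expect. When both come from the same vector they normally line up, as in this self-contained sketch with simulated rainfall-like data:

```r
# histogram and kernel density of the *same* vector align
set.seed(42)
x <- rgamma(500, shape = 2, scale = 100)  # simulated right-skewed data
hist(x, prob = TRUE, col = "yellowgreen",
     main = "Histogram with density overlay", xlab = "rainfall")
lines(density(x), lwd = 3, col = "red")   # peaks sit over the tallest bars
```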


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] User input after opening graphing device

2010-11-11 Thread jeitel

If I run the following:  
 
> windows()  
> 
> bringToTop(-1)
> 
> interactive()
[1] TRUE
> 
> run <- readline(prompt = "Continue (Yes = 1, No = 2):")  
Continue (Yes = 1, No = 2):
> dummy <- 1
> run
[1] ""

it does not allow user input though the session is interactive (it jumps right 
over the readline command). 

It would be great if you would have any suggestions of how to solve the problem.

Thanks,
 
Jan




  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Consistency of Logistic Regression

2010-11-11 Thread Benjamin Godlove
Dear R developers,

I have noticed a discrepancy between the coefficients returned by R's glm()
for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
Fisher's scoring, but the difference is something like 100 SE on the
intercept.  What accounts for such a huge difference?

Thank you for your time.

Ben Godlove

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:


Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is truncated. That means I
only observe "time to death" for those individuals who actually died before
the end of my observation period. It is my understanding that this type of
truncation creates a bias when I use a "normal" regression analysis. Hence
my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response"
could work for me.
When I run the following code I get ptime = 290.3648.
I assume this means that an individual with ph.ecog=2 can be expected to
live another 290.3648 days before death occurs (days is the time scale of
the time variable).


It is a prediction under specific assumptions underpinning a  
parametric estimate.



Could someone confirm whether this makes sense?


You ought to confirm that it "makes sense" by comparing to your data:
require(Hmisc); require(survival)


> describe(lung[lung$status==1&lung$ph.ecog==2,"time"])
lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
  n missing  unique    Mean
  6       0       6   293.7

  92 105 211 292 511 551
Frequency  1   1   1   1   1   1
% 17  17  17  17  17  17

> ?lung

So status==1 is a censored case and the observed times are status==2
> describe(lung[lung$status==2&lung$ph.ecog==2,"time"])
lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
  n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
 44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85


lowest :  11  12  13  26  30, highest: 524 533 654 707 814

And the mean time to death (in a group that had only 6 censored
individuals at times from 92 to 551) was 226, and the median time to death
among 44 individuals is 178, with a right-skewed distribution. You need
to decide whether you want to make that particular prediction when you
know that you forced a specific distributional form on the regression
machinery by accepting the default.





lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')



On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger
wrote:


Michael,

You are looking to compute an estimated time to death -- rather  
than the
odds of death conditional upon time.  Thus, you will want to use  
"time to

death" as your dependent variable rather than a dichotomous outcome (
0=alive, 1=death).   You can accomplish this with a straight forward
regression analysis.

Best,

Jim

On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein wrote:



Dear all,

I'm struggling with predicting "expected time until death" for a  
coxph and

survreg model.

I have two datasets. Dataset 1 includes a certain number of people  
for

which
I know a vector of covariates (age, gender, etc.) and their event  
times
(i.e., I know whether they have died and when if death occurred  
prior to

the
end of the observation period). Dataset 2 includes another set of  
people

for
which I only have the covariate vector. I would like to use  
Dataset 1 to

calibrate either a coxph or survreg model and then use this model to
determine an "expected time until death" for the individuals in  
Dataset 2.
For example, I would like to know when a person in Dataset 2 will  
die,

given
his/ her age and gender.

I checked predict.coxph and predict.survreg as well as the  
document "A
Package for Survival Analysis in S" written by Terry M. Therneau  
but I

have
to admit that I'm a bit lost here.

Could anyone give me some advice on how this could be done?

Thanks very much in advance,

Michael



Michael Haenlein
Professor of Marketing



David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] standardized/studentized residuals with loess

2010-11-11 Thread Bert Gunter
This is a simple and sensible question that does not have a simple
answer. It's a research issue, and you should go to the literature to
see what approaches seem appropriate.

HOWEVER, one simple descriptive approach -- which may have
important statistical flaws -- is to run loess on the absolute values
(or, perhaps, their square roots) of the raw residuals. I think this
might be asymptotically reasonable with a fixed smoothing window, but
small sample behavior could well be awful. Carefully done simulations
might help determine this.
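
Bert's suggestion -- smoothing the absolute residuals to obtain a local scale estimate -- might be sketched like this, using the cars data from the earlier messages (purely illustrative; it ignores leverage and assumes the smoothed scale stays safely above zero):

```r
cars.lo <- loess(dist ~ speed, cars)
r <- residuals(cars.lo)

# smooth |residuals| against the predictor to estimate a local spread
scale.lo <- loess(abs(r) ~ cars$speed)
local.s <- predict(scale.lo)

stdres.local <- r / local.s   # residuals scaled by the local spread estimate
```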

As I said, no simple answer. Perhaps others with real expertise might
comment/correct -- maybe offlist, as this seems to be wandering from
R.

-- Bert



On Thu, Nov 11, 2010 at 5:08 AM, Oliver Frings
 wrote:
> Hi Josh,
>
> many thank's for your reply. I tried to read up on this more and to be frank
> I got a bit confused about the exact definition of residual standardization.
> It occurs to me that different people have different definitions and that it
> can be done with and without the leverage of each point. Anyhow, the way you
> did it seems correct to me! My only problem is now that it assumes the same
> standard error for each point. My data have definitely different standard
> deviations at different points. So I was wondering if there is a way to do
> it that accounts for the different standard deviations at different points?
>
> Many thanks!
>
> /Oliver
>
> On Wed, Nov 10, 2010 at 8:21 PM, Joshua Wiley wrote:
>
>> Hi Oliver,
>>
>> As a warning, I may be missing something too.  I did not see something
>> explicit in base R or MASS.  In a quick scan of the fourth edition of
>> the MASS book, I did not read anything that it is
>> illogical/unreasonable to try to find standardized residuals (but my
>> knowledge of local regression approaches nil).  With that background,
>> I proceeded to blithely scavenge from other functions until I came up
>> with this:
>>
>> loess.stdres <- function(model) {
>>  res <- model$residuals
>>  s <- sqrt(sum(res^2)/(length(res) - model$enp))
>>  stdres <- res/(sqrt(1 - hat(res)) * s)
>>  return(stdres)
>> }
>>
>> ## now for a half-baked check
>>
>> ## fit linear model and local regression
>> cars.lm <- lm(dist ~ speed, cars)
>> cars.lo <- loess(dist ~ speed, cars)
>>
>> ## these seem somewhat similar
>> rstandard(cars.lm)
>> c(scale(residuals(cars.lm)))
>>
>> ## these seem somewhat similar too
>> loess.stdres(cars.lo)
>> c(scale(cars.lo$residuals))
>>
>>
>> Cheers,
>>
>> Josh
>>
>>
>>
>> On Wed, Nov 10, 2010 at 9:24 AM, Oliver Frings
>>  wrote:
>> > Hi all,
>> >
>> > I'm trying to apply loess regression to my data and then use the fitted
>> > model to get the standardized/studentized residuals. I understood that
>> for
>> > linear regression (lm) there are functions to do that:
>> >
>> > fit1 = lm(y~x)
>> > stdres.fit1 = rstandard(fit1)
>> > studres.fit1 = rstudent(fit1)
>> >
>> > I was wondering if there is an equally simple way to get
>> > the standardized/studentized residuals for a loess model? BTW
>> > my apologies if there is something here that I'm missing.
>> >
>> > All the best,
>> >
>> > Oliver
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is truncated. That means I
only observe "time to death" for those individuals who actually died before
the end of my observation period. It is my understanding that this type of
truncation creates a bias when I use a "normal" regression analysis. Hence
my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response"
could work for me.
When I run the following code I get ptime = 290.3648.
I assume this means that an individual with ph.ecog=2 can be expected to
live another 290.3648 days before death occurs (days is the time scale of
the time variable).
Could someone confirm whether this makes sense?

lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')
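
Given David's point about the right-skewed distribution, a predicted median (or other quantile) from the same survreg fit may be more informative than the mean-type "response" prediction. A sketch using type="quantile":

```r
library(survival)  # provides survreg() and the lung data

lfit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)

# quartiles of the predicted survival-time distribution for ph.ecog = 2;
# p = 0.5 gives the predicted median time to death
predict(lfit, newdata = data.frame(ph.ecog = 2),
        type = "quantile", p = c(0.25, 0.5, 0.75))
```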



On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger
wrote:

> Michael,
>
> You are looking to compute an estimated time to death -- rather than the
> odds of death conditional upon time.  Thus, you will want to use "time to
> death" as your dependent variable rather than a dichotomous outcome (
> 0=alive, 1=death).   You can accomplish this with a straight forward
> regression analysis.
>
> Best,
>
> Jim
>
> On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein 
> wrote:
>
>> Dear all,
>>
>> I'm struggling with predicting "expected time until death" for a coxph and
>> survreg model.
>>
>> I have two datasets. Dataset 1 includes a certain number of people for
>> which
>> I know a vector of covariates (age, gender, etc.) and their event times
>> (i.e., I know whether they have died and when if death occurred prior to
>> the
>> end of the observation period). Dataset 2 includes another set of people
>> for
>> which I only have the covariate vector. I would like to use Dataset 1 to
>> calibrate either a coxph or survreg model and then use this model to
>> determine an "expected time until death" for the individuals in Dataset 2.
>> For example, I would like to know when a person in Dataset 2 will die,
>> given
>> his/ her age and gender.
>>
>> I checked predict.coxph and predict.survreg as well as the document "A
>> Package for Survival Analysis in S" written by Terry M. Therneau but I
>> have
>> to admit that I'm a bit lost here.
>>
>> Could anyone give me some advice on how this could be done?
>>
>> Thanks very much in advance,
>>
>> Michael
>>
>>
>>
>> Michael Haenlein
>> Professor of Marketing
>> ESCP Europe
>> Paris, France
>>
>>[[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> *James C. Whanger
> Research Consultant
> 2 Wolf Ridge Gap
> Ledyard, CT  06339
>
> Phone: 860.389.0414*
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Greg Snow
Consider the following simulations (also fixing the pnorm instead of dnorm that 
Ted pointed out and I missed):

out1 <- replicate(10000, {
x <- rnorm(1000, 100, 3);
ks.test( x, pnorm, mean=100, sd=3 )$p.value
} )

out2 <- replicate(10000, {
x <- rnorm(1000, 100, 3);
ks.test( x, pnorm, mean=mean(x), sd=sd(x) )$p.value
} )

par(mfrow=c(2,1))
hist(out1)
hist(out2)

mean(out1 <= 0.05 )
mean(out2 <= 0.05 )


In both cases the null hypothesis is true (or at least a meaningful 
approximation to true) so the p-values should follow a uniform distribution.  
In the case of out1 where the mean and sd are specified as part of the null the 
p-values are reasonably uniform and the rejection rate is close to alpha 
(should asymptotically approach alpha as the number of simulations increases).  
However looking at out2, where the parameters are set not by outside knowledge 
or tests, but rather from the observed data, the p-values are clearly not 
uniform and the rejection rate is far from alpha.
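
Greg's earlier suggestion of simulating the process -- a parametric bootstrap of the KS statistic, in the spirit of Lilliefors' correction for estimated parameters -- could be sketched as follows (illustrative; the observed sample is simulated here):

```r
set.seed(123)
x <- rnorm(100, 100, 3)   # stand-in for the observed data
d.obs <- ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic

# re-estimate the parameters on every simulated sample, as was done for x
d.sim <- replicate(1000, {
    y <- rnorm(length(x), mean(x), sd(x))
    ks.test(y, pnorm, mean = mean(y), sd = sd(y))$statistic
})

mean(d.sim >= d.obs)   # bootstrap p-value for the KS statistic
```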


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Kerry
> Sent: Thursday, November 11, 2010 12:02 AM
> To: r-help@r-project.org
> Subject: Re: [R] Kolmogorov Smirnov Test
> 
> Thanks for the feedback. My goal is to run a simple test to show that
> the data cannot be rejected as either normally or uniformly
> distributed (depending on the variable), which is what a previous K-S
> test run using SPSS had shown. The actual distribution I compare to my
> sample only matters that it would be rejected were my data multi-
> modal. This way I can suggest the data is from the same population. I
> later run PCA and cluster analyses to confirm this but I want an easy
> stat to start with for the individual variables.
> 
> I didn't think I was comparing my data against itself, but rather
> again a normal distribution with the same mean and standard deviation.
> Using the mean seems necessary, so is it incorrect to have the same
> standard deviation too? I need to go back and read on the K-S test to
> see what the appropriate constraints are before bothering anyone for
> more help. Sorry, I thought I had it.
> 
> Thanks again,
> kbrownk
> 
> On Nov 11, 12:40 am, Greg Snow  wrote:
> > The way you are running the test the null hypothesis is that the data
> comes from a normal distribution with mean=0 and standard deviation =
> 1.  If your minimum data value is 0, then it seems very unlikely that
> the mean is 0.  So the test is being strongly influenced by the mean
> and standard deviation not just the shape of the distribution.
> >
> > Note that the KS test was not designed to test against a distribution
> with parameters estimated from the same data (you can do the test, but
> it makes the p-value inaccurate).  You can do a little better by
> simulating the process and comparing the KS statistic to the
> simulations rather than looking at the computed p-value.
> >
> > However you should ask yourself why you are doing the normality tests
> in the first place.  The common reasons that people do this don't match
> with what the tests actually test (see the fortunes on normality).
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >
> > > -Original Message-
> > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > > project.org] On Behalf Of Kerry
> > > Sent: Wednesday, November 10, 2010 9:23 PM
> > > To: r-h...@r-project.org
> > > Subject: [R] Kolmogorov Smirnov Test
> >
> > > I'm using ks.test (mydata, dnorm) on my data. I know some of my
> > > different variable samples (mydata1, mydata2, etc) must be normally
> > > distributed but the p value is always < 2.0^-16 (the 2.0 can change
> > > but not the exponent).
> >
> > > I want to test mydata against a normal distribution. What could I
> be
> > > doing wrong?
> >
> > > I tried instead using rnorm to create a normal distribution: y =
> rnorm
> > > (68,mean=mydata, sd=mydata), where N= the sample size from mydata.
> > > Then I ran the k-s: ks.test (mydata,y). Should this work?
> >
> > > One issue I had was that some of my data has a minimum value of 0,
> but
> > > rnorm ran as I have it above will potentially create negative
> numbers.
> >
> > > Also some of my variables will likely be better tested against non-
> > > normal distributions (uniform etc.), but if I figure I should learn
> > > how to even use ks.test first.
> >
> > > I used to use SPSS but am really trying to jump into R instead, but
> I
> > > find the help to assume too heavy of statistical knowledge.
> >
> > > I'm guessing I have a long road before I get this, so any bits of
> > > information that may help me get a bit further will be appreciated!
> >
> > > Thanks,
> > > kbr

[R] R2WinBUGS: Error in bugs(program="openbugs") but not openbugs()

2010-11-11 Thread Yuelin Li
I get an error when I call bugs(..., program="OpenBUGS",
bugs.directory="c:/Program Files/OpenBUGS/OpenBUGS312"),
expecting, as suggested in help(bugs), that it would fit the
model with openbugs() via BRugs.

 > help(bugs)
 ... either winbugs/WinBUGS or openbugs/OpenBUGS, the latter
 makes use of function openbugs and requires the CRAN package
 BRugs. ...

However, it works fine when I directly call openbugs().  All
other things are exactly the same.  It seems that (in my
settings) bugs(program="OpenBUGS") works differently than
openbugs().  Am I doing something wrong with bugs()?   Or
there is something wrong with my OpenBUGS installation?

I am using R-2.12.0, R2WinBUGS 2.1-16 (2009-11-06), OpenBUGS
3.1.2 rev 668 (2010-09-28), and BRugs 0.5-3 (2009-11-06)  on
a Windows XP machine.

Yuelin.

--- R file -
require(R2WinBUGS)
require(BRugs)
# Example in Albert (2007).  Bayesian Computation with R.  Springer.
# pp. 237-238.  Prior = beta(0.5, 0.5), observe Binom(n, p) 
# y=7 successes out of a sample of n=50.  Estimate p.
y <- 7
n <- 50
alpha <- 1.0 
beta <- 1.0 
data <- list("y", "n", "alpha", "beta")
inits <- function() { list(p = runif(1)) }
param <- "p"
# this works
Albert.bugs <- openbugs(data=data, inits=inits, parameters.to.save=param, 
model.file = "C:/tryR/WinBUGS/Albert11.txt", n.chains=3, n.iter=500)
print(Albert.bugs, digits.summary = 4)
# this fails
Albert.bugs <- bugs(data=data, inits=inits, parameters.to.save=param, 
model.file="C:/tryR/WinBUGS/Albert11.txt", n.chains=3, n.iter=500, 
program="OpenBUGS")

--- BUGS file: Albert11.txt --
model 
{ 
y ~ dbin(p, n)
p ~ dbeta( alpha, beta )
}



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [lattice] densityplot label the peak.

2010-11-11 Thread Joon Yeong Kim
I tried using panel function but wasn't sure how to get the y value for the
peak of the densityplot.

I'm thinking there should be a way to retrieve the densityplot object so
that I can get the (x,y) values of the curve directly from the object.

Anyone know how to do this?

Thank you.

Joon
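
A panel-function sketch that labels the peak by recomputing density() inside the panel (untested against the original data; the rnorm() vector is a placeholder for whatever is being plotted):

```r
library(lattice)

densityplot(~ rnorm(200), panel = function(x, ...) {
    panel.densityplot(x, ...)
    d <- density(x)             # default bandwidth, matching the panel's curve
    i <- which.max(d$y)         # index of the peak
    panel.text(d$x[i], d$y[i], labels = round(d$x[i], 2), pos = 3)
})
```

The same idea gives the mean instead: replace d$x[i] with mean(x) and look up the corresponding height with approx(d$x, d$y, xout = mean(x))$y.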

On Thu, Nov 11, 2010 at 12:29 AM, baptiste auguie <
baptiste.aug...@googlemail.com> wrote:

> Hi,
>
> The easiest way might be the directlabels package from R-forge.
> Otherwise, you could write your own panel function.
>
> HTH,
>
> baptiste
>
>
> On 10 November 2010 23:01, Joon Yeong Kim  wrote:
> > Hi,
> >
> > I've been trying to find a way to label the the peak or mean of a
> > densityplot for a while but haven't been successful. Does anyone know how
> to
> > accomplish this?
> >
> >
> > Thank you for your help in advance.
> >
> >
> > Joon
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread James C. Whanger
Michael,

You are looking to compute an estimated time to death -- rather than the
odds of death conditional upon time.  Thus, you will want to use "time to
death" as your dependent variable rather than a dichotomous outcome (
0=alive, 1=death).  You can accomplish this with a straightforward
regression analysis.

Best,

Jim

On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein wrote:

> Dear all,
>
> I'm struggling with predicting "expected time until death" for a coxph and
> survreg model.
>
> I have two datasets. Dataset 1 includes a certain number of people for
> which
> I know a vector of covariates (age, gender, etc.) and their event times
> (i.e., I know whether they have died and when if death occurred prior to
> the
> end of the observation period). Dataset 2 includes another set of people
> for
> which I only have the covariate vector. I would like to use Dataset 1 to
> calibrate either a coxph or survreg model and then use this model to
> determine an "expected time until death" for the individuals in Dataset 2.
> For example, I would like to know when a person in Dataset 2 will die,
> given
> his/ her age and gender.
>
> I checked predict.coxph and predict.survreg as well as the document "A
> Package for Survival Analysis in S" written by Terry M. Therneau but I have
> to admit that I'm a bit lost here.
>
> Could anyone give me some advice on how this could be done?
>
> Thanks very much in advance,
>
> Michael
>
>
>
> Michael Haenlein
> Professor of Marketing
> ESCP Europe
> Paris, France
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
*James C. Whanger
Research Consultant
2 Wolf Ridge Gap
Ledyard, CT  06339

Phone: 860.389.0414*



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks very much for your answers, David and Mattia.

I understand that the baseline hazard in a Cox model is unknown and that
this makes the calculation of expected survival difficult.
Does this change when I move to a survreg model instead?

I think I'm OK with estimating a Cox model (or a survreg model) as I've done
so in the past.
But I'm lost with the different options in the prediction part (e.g.,
linear, quantile, risk, expected, ...).
Is there any document that can provide an explanation what these options
mean?

Sorry in case these questions are naive ... hope they're not too stupid ;-)
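For survreg the baseline distribution is fully parametric, so predicted event times come straight from type = "quantile"; a minimal sketch using the lung data shipped with survival (the variables here belong to that example data set, not to the original question):

```r
library(survival)

fit <- survreg(Surv(time, status) ~ age + sex, data = lung)

# predicted median survival time for a new subject
predict(fit, newdata = data.frame(age = 60, sex = 1),
        type = "quantile", p = 0.5)

# several quantiles give a predicted survival profile
predict(fit, newdata = data.frame(age = 60, sex = 1),
        type = "quantile", p = c(0.1, 0.5, 0.9))
```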


On Thu, Nov 11, 2010 at 5:03 PM, Mattia Prosperi  wrote:

> Indeed, from the predict() function of the coxph you cannot get
> directly "time" predictions, but only linear and exponential risk
> scores. This is because, in order to get the time, a baseline hazard
> has to be computed and it is not straightforward since it is implicit
> in the Cox model.
>
> 2010/11/11 David Winsemius :
> >
> > On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:
> >
> >> Dear all,
> >>
> >> I'm struggling with predicting "expected time until death" for a coxph
> and
> >> survreg model.
> >>
> >> I have two datasets. Dataset 1 includes a certain number of people for
> >> which
> >> I know a vector of covariates (age, gender, etc.) and their event times
> >> (i.e., I know whether they have died and when if death occurred prior to
> >> the
> >> end of the observation period). Dataset 2 includes another set of people
> >> for
> >> which I only have the covariate vector. I would like to use Dataset 1 to
> >> calibrate either a coxph or survreg model and then use this model to
> >> determine an "expected time until death" for the individuals in Dataset
> 2.
> >> For example, I would like to know when a person in Dataset 2 will die,
> >> given
> >> his/ her age and gender.
> >>
> >> I checked predict.coxph and predict.survreg as well as the document "A
> >> Package for Survival Analysis in S" written by Terry M. Therneau but I
> >> have
> >> to admit that I'm a bit lost here.
> >
> > The first step would be creating a Surv-object, followed by running a
> > regression that created a coxph-object,  using dataset1 as input. So you
> > should be looking at:
> >
> > ?Surv
> > ?coxph
> >
> > There are worked examples in the help pages. You would then run predict()
> on
> > the coxph fit with "dataset2" as the newdata argument. The default output
> is
> > the linear predictor for the log-hazard relative to a mean survival
> estimate
> > but other sorts of estimates are possible. The survfit function provides
> > survival curve suitable for plotting.
> >
> > (You may want to inquire at a local medical school to find statisticians
> who
> > have experience with this approach. This is ordinary biostatistics these
> > days.)
> >
> > --
> > David.
> >
> >>
> >> Could anyone give me some advice on how this could be done?
> >>
> >> Thanks very much in advance,
> >>
> >> Michael
> >>
> >>
> >>
> >> Michael Haenlein
> >> Professor of Marketing
> >> ESCP Europe
> >> Paris, France
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>



Re: [R] Rdindex truncating titles?

2010-11-11 Thread Duncan Murdoch

On 07/11/2010 9:17 PM, Rolf Turner wrote:

On 8/11/2010, at 1:24 PM, Duncan Murdoch wrote:



>  ...  The docs do say to
>  limit the length of titles to 65 characters if possible; this is one
>  place where it matters.



The 65 character limit had not previously impinged itself upon my consciousness.
Adhering to this limit certainly seems to ``fix'' the problem.

Actually some experimentation seems to indicate that 72 characters is the
effective limit.  If a word spans the 72 character point, then the line is
truncated just before that point.

Again, I'm sure this didn't happen in the past; something has changed.  But we
move on when the world moves on.


I've taken a closer look at this, and I think we don't need to impose 
the 65 character limit in most places, so I'm going to remove it.  (The 
title shown in the title bar of your browser will still be truncated, 
but not the one shown in the main text of the help topic, or the one you 
noticed in the help index.)


Duncan Murdoch



Re: [R] log-transformed linear regression

2010-11-11 Thread Matt Shotwell
Servet,

These data do look linear in log space. Fortunately, the model

log(y) = a + b * log(x)

does have intercept zero in linear space. To see this, consider

log(y) = a + b * log(x)
 y = 10^(a + b * log(x))
 y = 10^a * 10^(b * log(x))
 y = 10^a * 10^(log(x^b))
 y = 10^a * x^b

Hence, y = 0 when x = 0 (assuming b > 0). The code below estimates a and b.

Of course,

y = 10^a * x^b 

is not a line, so we can't directly compare slopes. However, in the
region of your data, the estimated mean is _nearly_ linear. In fact, you
could consider looking at a linear approximation, say at the median of
your x values. The median of your x values is 0.958. For simplicity,
let's just say it's 1.0. The linear approximation (first order Taylor
expansion) of 

y = 10^a * x^b

at x = 1 is

y = 10^a + 10^a * b * (x - 1)
y = 10^a * (1 - b) + 10^a * b * x

So, the slope of the linear approximation is 10^a * b, and the intercept
is 10^a * (1 - b). Taking a and b from the analysis below, the
approximate intercept is -0.00442, and slope 0.22650. You could argue
that these values are consistent with the literature, but that the log
linear model is more appropriate for these data. You could even
construct a bootstrap confidence interval for the approximate slope.
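The algebra can be checked numerically; the sketch below uses synthetic stand-in data, since the posted mydata.txt is not reproduced here:

```r
set.seed(1)
X <- exp(rnorm(150))                              # synthetic stand-in for X
Y <- 10^-1.6 * X^0.9 * 10^rnorm(150, sd = 0.03)   # power law plus log-scale noise

bb <- lm(log10(Y) ~ log10(X))
a  <- coef(bb)[1]
b  <- coef(bb)[2]

# tangent-line approximation of y = 10^a * x^b at x = 1
slope     <- 10^a * b
intercept <- 10^a * (1 - b)
c(slope = slope, intercept = intercept)
```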

-Matt

On Wed, 2010-11-10 at 19:27 -0500, servet cizmeli wrote:
> Dear List,
> 
> I would like to take another chance and see if there if someone has 
> anything to say to my last post...
> 
> bump
> 
> servet
> 
> 
> On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > Hello,
> >
> > I have a basic question. Sorry if it is so evident
> >
> > I have the following data file :
> > http://ekumen.homelinux.net/mydata.txt
> >
> > I need to model Y~X-1 (simple linear regression through the origin) with
> > these data :
> >
> > load(file="mydata.txt")
> > X=k[,1]
> > Y=k[,2]
> >
> > aa=lm(Y~X-1)
> > dev.new()
> > plot(X,Y,log="xy")
> > abline(aa,untf=T)
> > abline(b=0.0235, a=0,col="red",untf=T)
> > abline(b=0.031, a=0,col="green",untf=T)
> >
> > Other people did the same kind of analysis with their data and found the
> > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> >
> > Regression with my own data, though, yields a slope of 0.0458 (black
> > line) which is too high. Clearly my regression is too much influenced by
> > the single point with high values (X>100). I would not like to discard
> > this point, though, because I know that the measurement is correct. I
> > just would like to give it less weight...
> >
> > When I log-transform X and Y data, I obtain :
> >
> > dev.new()
> > plot(log10(X),log10(Y))
> > abline(v=0,h=0,col="cyan")
> > bb=lm(log10(Y)~log10(X))
> > abline(bb,col="blue")
> > bb
> >
> > I am happy with this regression. Now the slope is at the log-log domain.
> > I have to convert it back so that I can obtain a number comparable with
> > the literature (0.0235 and 0.031).  How to do it? I can't force the
> > second regression through the origin as the log-transformed data does
> > not go through the origin anymore.
> >
> > at first it seemed like an easy problem but I am at loss :o((
> > thanks a lot for your kindly help
> > servet
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Mattia Prosperi
Indeed, from the predict() function of the coxph you cannot get
directly "time" predictions, but only linear and exponential risk
scores. This is because, in order to get the time, a baseline hazard
has to be computed and it is not straightforward since it is implicit
in the Cox model.

2010/11/11 David Winsemius :
>
> On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:
>
>> Dear all,
>>
>> I'm struggling with predicting "expected time until death" for a coxph and
>> survreg model.
>>
>> I have two datasets. Dataset 1 includes a certain number of people for
>> which
>> I know a vector of covariates (age, gender, etc.) and their event times
>> (i.e., I know whether they have died and when if death occurred prior to
>> the
>> end of the observation period). Dataset 2 includes another set of people
>> for
>> which I only have the covariate vector. I would like to use Dataset 1 to
>> calibrate either a coxph or survreg model and then use this model to
>> determine an "expected time until death" for the individuals in Dataset 2.
>> For example, I would like to know when a person in Dataset 2 will die,
>> given
>> his/ her age and gender.
>>
>> I checked predict.coxph and predict.survreg as well as the document "A
>> Package for Survival Analysis in S" written by Terry M. Therneau but I
>> have
>> to admit that I'm a bit lost here.
>
> The first step would be creating a Surv-object, followed by running a
> regression that created a coxph-object,  using dataset1 as input. So you
> should be looking at:
>
> ?Surv
> ?coxph
>
> There are worked examples in the help pages. You would then run predict() on
> the coxph fit with "dataset2" as the newdata argument. The default output is
> the linear predictor for the log-hazard relative to a mean survival estimate
> but other sorts of estimates are possible. The survfit function provides
> survival curve suitable for plotting.
>
> (You may want to inquire at a local medical school to find statisticians who
> have experience with this approach. This is ordinary biostatistics these
> days.)
>
> --
> David.
>
>>
>> Could anyone give me some advice on how this could be done?
>>
>> Thanks very much in advance,
>>
>> Michael
>>
>>
>>
>> Michael Haenlein
>> Professor of Marketing
>> ESCP Europe
>> Paris, France
>
> David Winsemius, MD
> West Hartford, CT
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] log-transformed linear regression

2010-11-11 Thread Mike Marchywka




I had a few more thoughts, but briefly: it almost
looks like exponential at low X and linear at higher X may
be something to test. Anyway, I'd be hard pressed to
see how a linear fit, that you are happy to turn into a
log-log fit, could give you "a" number of any relevance
except maybe over a few limited ranges.

> plot(X,log(Y))
> sel=(X>2)
> plot(X[sel],log(Y[sel]))
> sel=(X<2)
> plot(X[sel],log(Y[sel]))

I had some longer message ideas but you can keep running models on
selected subsets of your data and see if that one point is your
real problem. And, above all, I wouldn't try to find an error measure
that just helps you match prior literature numbers found via a different 
method...



> From: marchy...@hotmail.com
> To: sa.cizm...@usherbrooke.ca; r-help@r-project.org
> Date: Thu, 11 Nov 2010 08:03:57 -0500
> Subject: Re: [R] log-transformed linear regression
>
>
>
>
>
>
> 
> > Date: Wed, 10 Nov 2010 19:27:20 -0500
> > From: sa.cizm...@usherbrooke.ca
> > To: r-help@r-project.org
> > Subject: Re: [R] log-transformed linear regression
> >
> > Dear List,
> >
> > I would like to take another chance and see if there if someone has
> > anything to say to my last post...
>
> ok, I took a look at it to help myself re-learn some R so
> I solicit better alts from R users in comments below.
>
> If I expand scale down where you have lots of points, the visual
> is something like a plateau for x<1 and then a general noisy
> trend upward- the slope you get for the aggregate blob of
> data probably depends on how your sampling is weighted. I wouldn't
> worry about the one point at large X in isolation quite yet but
> you may want to consider a piecewise linear or other way to handle
> the low X valued data. You may just want to spend more time staring
> at pictures before churning out p-values LOL.
>
> To be clear about your log issues, lm will fit a linear model to the stuff you
> give it and of course y=a*x means something different from log(y) =a*log(x).
> Overall, you need to figure out what to do with the models depending on
> what you think is a candidate for being real- indeed your points at large
> X and Y may be out of the linear range for this "system" but who knows.
>
> If you just try to fit a bunch of data to a line,
> the slope of course depends on which part of this bunch you have
> sampled. Generally to explore sensitivity, try things like this,
> with ".0" of course replaced by whatever threshold you want.
> It is easy to automate, put this in a loop and plot histograms
> of the resultant slopes etc.
>
> > sel=runif(length(X))>.0
> > aasel=lm(Y[sel]~X[sel])
> > summary(aasel)
>
> Call:
> lm(formula = Y[sel] ~ X[sel])
>
> Residuals:
>   Min1QMedian3Q   Max
> -0.221957 -0.004207  0.004055  0.013395  0.232362
>
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.0256642  0.0033254  -7.718 1.54e-12 ***
> X[sel]   0.0463599  0.0003146 147.365  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0399 on 150 degrees of freedom
> Multiple R-squared: 0.9931, Adjusted R-squared: 0.9931
> F-statistic: 2.172e+04 on 1 and 150 DF,  p-value: < 2.2e-16
>
> if you increase ".0" to something like .5 or so and keep repeating
> this you get some idea of what can happen, make changes for the various
> models and then decide what you are trying to do etc.
>
>
>
> > sel=runif(length(X))>.9
> > aasel=lm(Y[sel]~X[sel])
> > summary(aasel)
>
> Call:
> lm(formula = Y[sel] ~ X[sel])
>
> Residuals:
>   Min1QMedian3Q   Max
> -0.199943 -0.002839  0.010626  0.019608  0.149099
>
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.035546   0.013741  -2.587   0.0162 *
> X[sel]   0.052033   0.003401  15.301 7.04e-14 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.05878 on 24 degrees of freedom
> Multiple R-squared: 0.907,  Adjusted R-squared: 0.9031
> F-statistic: 234.1 on 1 and 24 DF,  p-value: 7.037e-14
>
>
> >
> > bump
> >
> > servet
> >
> >
> > On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > > Hello,
> > >
> > > I have a basic question. Sorry if it is so evident
> > >
> > > I have the following data file :
> > > http://ekumen.homelinux.net/mydata.txt
> > >
> > > I need to model Y~X-1 (simple linear regression through the origin) with
> > > these data :
> > >
> > > load(file="mydata.txt")
> > > X=k[,1]
> > > Y=k[,2]
> > >
> > > aa=lm(Y~X-1)
> > > dev.new()
> > > plot(X,Y,log="xy")
> > > abline(aa,untf=T)
> > > abline(b=0.0235, a=0,col="red",untf=T)
> > > abline(b=0.031, a=0,col="green",untf=T)
> > >
> > > Other people did the same kind of analysis with their data and found the
> > > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> > >
> > > Regression with my own data, though, yields a
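The loop-and-histogram idea described above might be automated as follows (again with synthetic stand-in data, since the posted file is not reproduced here):

```r
set.seed(2)
X <- c(runif(140, 0, 2), runif(12, 2, 120))   # synthetic stand-in data
Y <- 0.046 * X + rnorm(152, sd = 0.04)

# refit on random subsets and collect the fitted slopes
slopes <- replicate(1000, {
  sel <- runif(length(X)) > 0.5               # drop about half the points each time
  coef(lm(Y[sel] ~ X[sel]))[2]
})

hist(slopes, main = "Slope sensitivity to subsampling")
quantile(slopes, c(0.025, 0.5, 0.975))
```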

Re: [R] NA handling problem in capscale (vegan)

2010-11-11 Thread Gavin Simpson
On Thu, 2010-11-11 at 18:14 +1100, Nevil Amos wrote:
> I am having a problem with NA handling in capscale function.
> 
> as I understand it from the help, capscale should permit NA values when 
> na.action = na.omit or na.exclude,
> 
> however I am getting  the error
> 
> Error in X[nas, , drop = FALSE] : incorrect number of dimensions
> 
> trivial example is pasted below.
> 
> library(vegan)
> mydist<- matrix(1:25, 5, 5)
> mydata<-matrix(rnorm(1:25),5,5)
> mydata[3,1]<-NA
> mydata<-data.frame(mydata)
> 
> capscale(mydist~X1+X2+Condition(X3),data=mydata,na.action = na.omit)
> 
> 
> Thanks
> 
> Nevil Amos

It is failing within the formula parsing code because ordiParseFormula
thinks the response is a matrix but it was getting a dissimilarity
matrix in class "dist" form.

I think Jari has just fixed this in the R-Forge svn sources, but that
revision has not yet been built on the R-Forge build system. Check back
here later:

https://r-forge.r-project.org/R/?group_id=68

Thanks for the report.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Help with slope comparisons

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 4:16 AM, Ravi Kulkarni wrote:



Hello Forum,
 I have data for two groups of subjects for a ratio variable (cortisol
level) assessed three times during the day (at 0600, 0900 and 2100 hours).
So I have three means for each group. I have been asked to do a "slope
analysis" to compare the two groups.
 According to my understanding, this means that I regress the three means
on the time for each group and compare the slopes using a t-test (there is
a post on this: Etienne Toffin, etoffin at ulb.ac.be, Tue Jan 27 20:08:36
CET 2009).
 My question: (if what I said is correct) it seems strange to me that we
calculate a regression line with just three data points - is this really
what is meant by a "slope analysis"?


Doubtful. What was probably meant was to do a regression with an
interaction model based on the original data. Sounds like either you or
your supervisor need some consulting resources.
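Such an interaction model, sketched with hypothetical variable names (cortisol, hour, and group in long format, one row per subject per sampling time; the data here are simulated):

```r
set.seed(3)
# hypothetical long-format data: 20 subjects per group, 3 sampling times each
d <- data.frame(
  cortisol = rnorm(120, mean = 10),
  hour     = rep(c(6, 9, 21), times = 40),
  group    = rep(c("A", "B"), each = 60)
)

fit <- lm(cortisol ~ hour * group, data = d)
summary(fit)   # the hour:groupB term tests whether the two slopes differ
```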


--
David.


 Any help most appreciated...

 Ravi
--



David Winsemius, MD
West Hartford, CT



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:


Dear all,

I'm struggling with predicting "expected time until death" for a coxph and
survreg model.

I have two datasets. Dataset 1 includes a certain number of people for which
I know a vector of covariates (age, gender, etc.) and their event times
(i.e., I know whether they have died and when if death occurred prior to the
end of the observation period). Dataset 2 includes another set of people for
which I only have the covariate vector. I would like to use Dataset 1 to
calibrate either a coxph or survreg model and then use this model to
determine an "expected time until death" for the individuals in Dataset 2.
For example, I would like to know when a person in Dataset 2 will die, given
his/ her age and gender.

I checked predict.coxph and predict.survreg as well as the document "A
Package for Survival Analysis in S" written by Terry M. Therneau but I have
to admit that I'm a bit lost here.


The first step would be creating a Surv-object, followed by running a
regression that created a coxph-object, using dataset1 as input. So you
should be looking at:

?Surv
?coxph

There are worked examples in the help pages. You would then run predict()
on the coxph fit with "dataset2" as the newdata argument. The default
output is the linear predictor for the log-hazard relative to a mean
survival estimate, but other sorts of estimates are possible. The survfit
function provides a survival curve suitable for plotting.

(You may want to inquire at a local medical school to find statisticians
who have experience with this approach. This is ordinary biostatistics
these days.)


--
David.



Could anyone give me some advice on how this could be done?

Thanks very much in advance,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France


David Winsemius, MD
West Hartford, CT



[R] bootstrap/boot unknown distribution

2010-11-11 Thread francogrex

Hi, this is a question about bootstrapping, it relates more to the concept
than to the R package boot. But I wonder below if boot can help me. I have
the below to calculate a certain point estimate:

estimate= (0.9 * 0.03 * 70 *
(((77 * (76 / 76.0)) / 83107) -
((174 * (154 / 154.0)) / 376354)))

= 8.77311

Now I want to calculate confidence intervals around that estimate so I
thought that since some are binomial proportions I would use random binomial
estimates around those within the formula such as to have:

boot_data <- (0.9 * 0.03 * 70 *
(((rbinom(1, 83107, (77 / 83107)) * (rbinom(1, 76, (70 /
76)) / 76.0)) / 83107) -
((rbinom(1, 376354, (174 / 376354)) * (rbinom(1, 154, (138 /
154)) / 154.0)) / 376354)))

and then plotting the histogram and getting the quantiles to derive the 95%
confidence intervals. In this case I would get:
> quantile(boot_data, probs=c(0.025, 0.5, 0.975))
 2.5%   50% 97.5%
 4.476521  8.258544 12.434737

Is this a good way of using the bootstrap? What other ways could I use to
get a better confidence interval, estimated using better methods than
the quantile method? I know the boot package can give the CI using ABC, BCa,
and bootstrap-t for parametric and non-parametric situations. I think since
I don't have any (known) distribution on my formula to get the data values,
what is the most appropriate way to use boot?
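One caveat: each rbinom(1, ...) call above draws a single replicate, so the expression as posted yields one number rather than a distribution. Wrapping it in replicate() produces the empirical distribution that quantile() is meant to summarize (the constants are copied from the post; the multiplier 0.9 * 0.03 * 70 is taken as given):

```r
set.seed(42)
boot_data <- replicate(10000,
  0.9 * 0.03 * 70 *
    ((rbinom(1, 83107, 77 / 83107) * (rbinom(1, 76, 70 / 76) / 76)) / 83107 -
     (rbinom(1, 376354, 174 / 376354) * (rbinom(1, 154, 138 / 154) / 154)) / 376354))

quantile(boot_data, probs = c(0.025, 0.5, 0.975))
```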
-- 
View this message in context: 
http://r.789695.n4.nabble.com/bootstrap-boot-unknown-distribution-tp3038020p3038020.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Simple Function

2010-11-11 Thread rnick

Thanks!

It worked fine.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Simple-Function-tp3035585p3038017.html
Sent from the R help mailing list archive at Nabble.com.



[R] Rserve Installing error on AIX5.3..

2010-11-11 Thread nabsan_jp

Hello all! I am nabsan.
I get the error below when I try to install Rserve 0.6.3 on AIX 5.3.

I also tried several versions of Rserve, from 0.4.3 to 0.6.3, but got the
same error message:
[ld: 0711-317 ERROR: Undefined symbol: .main]

Any help please.

--
OS:AIX5.3
R:R2.8.1(--enable-R-shlib)
Rserve: I tried .. 0.4.3 - 0.6.3
---

..console logging below...

# R CMD INSTALL Rserve_0.6-3.tar.gz -d 
* Installing to library '/usr/local/lib/R/library' 
'R CMD INSTALL': in startdir= /src/nsm/baldy/R_modules with tmpdir=
/tmp/R.INSTALL2199648-31295 
   lib= '/usr/local/lib/R/library', pkgs= '
"/tmp/R.INSTALL2199648-31295/Rserve"' 
   before checking lockdir= '/usr/local/lib/R/library/00LOCK' 
   after checking lockdir 
   build_help_opts= '--debug  --txt --html --latex --example' 
'R CMD INSTALL': now doing 'eval ... do_install': 
* Installing *source* package 'Rserve' ... 
checking whether to compile the server... yes 
checking whether to compile the client... no 
checking for gcc... gcc -std=gnu99 
checking whether the C compiler works... yes 
checking for C compiler default output file name... a.out 
checking for suffix of executables... 
checking whether we are cross compiling... no 
checking for suffix of object files... o 
checking whether we are using the GNU C compiler... yes 
checking whether gcc -std=gnu99 accepts -g... yes 
checking for gcc -std=gnu99 option to accept ISO C89... none needed 
checking how to run the C preprocessor... gcc -std=gnu99 -E 
checking for grep that handles long lines and -e... /usr/bin/grep 
checking for egrep... /usr/bin/grep -E 
checking for ANSI C header files... yes 
checking for sys/wait.h that is POSIX.1 compatible... yes 
checking for sys/types.h... yes 
checking for sys/stat.h... yes 
checking for stdlib.h... yes 
checking for string.h... yes 
checking for memory.h... yes 
checking for strings.h... yes 
checking for inttypes.h... yes 
checking for stdint.h... yes 
checking for unistd.h... yes 
checking for string.h... (cached) yes 
checking for memory.h... (cached) yes 
checking sys/time.h usability... yes 
checking sys/time.h presence... yes 
checking for sys/time.h... yes 
checking for unistd.h... (cached) yes 
checking for sys/stat.h... (cached) yes 
checking for sys/types.h... (cached) yes 
checking sys/socket.h usability... yes 
checking sys/socket.h presence... yes 
checking for sys/socket.h... yes 
checking sys/un.h usability... yes 
checking sys/un.h presence... yes 
checking for sys/un.h... yes 
checking netinet/in.h usability... yes 
checking netinet/in.h presence... yes 
checking for netinet/in.h... yes 
checking netinet/tcp.h usability... yes 
checking netinet/tcp.h presence... yes 
checking for netinet/tcp.h... yes 
checking for an ANSI C-conforming const... yes 
checking whether byte ordering is bigendian... yes 
checking whether time.h and sys/time.h may both be included... yes 
checking for pid_t... yes 
checking vfork.h usability... no 
checking vfork.h presence... no 
checking for vfork.h... no 
checking for fork... yes 
checking for vfork... yes 
checking for working fork... yes 
checking for working vfork... (cached) yes 
checking return type of signal handlers... void 
checking for memset... yes 
checking for mkdir... yes 
checking for rmdir... yes 
checking for select... yes 
checking for socket... yes 
checking for library containing crypt... none required 
checking crypt.h usability... yes 
checking crypt.h presence... yes 
checking for crypt.h... yes 
checking for socklen_t... yes 
checking for connect... yes 
checking for dlopen in -ldl... yes 
configure: creating ./config.status 
config.status: creating src/Makefile 
config.status: creating src/client/cxx/Makefile 
config.status: creating src/config.h 
** libs 
+ mkdir -p /usr/local/lib/R/library/Rserve/libs 
+ test -f src/Makefile 
+ + /usr/bin/sed -e s+^/++ 
+ echo 
arch= 
+ message arch - 
** arch - 
+ cd src 
+ makefiles=-f "/usr/local/lib/R/etc/Makeconf" -f Makefile 
+ test -r /home/baldy/.R/Makevars-powerpc-ibm-aix5.3.0.0 
+ test -r /home/baldy/.R/Makevars 
+ eval make -f "/usr/local/lib/R/etc/Makeconf" -f Makefile 
+ make -f /usr/local/lib/R/etc/Makeconf -f Makefile 
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include 
-mno-fp-in-toc -DDAEMON -Iinclude -I. -I/usr/local/lib/R/include 
  -g -O2 -c Rserv.c -o Rserv.o 
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include 
-mno-fp-in-toc -DDAEMON -Iinclude -I. -I/usr/local/lib/R/include 
  -g -O2 -c session.c -o session.o 
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include 
-mno-fp-in-toc -DDAEMON -Iinclude -I. -I/usr/local/lib/R/include 
  -g -O2 -c md5.c -o md5.o 
gcc -std=gnu99  -o Rserve -L/usr/local/lib/R/lib -lR -ldl 
ld: 0711-317 ERROR: Undefined symbol: .main 
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more
information. 
collect2: ld returned 8 exit status 
make: The error code from the

Re: [R] clustering association rules

2010-11-11 Thread Michael Hahsler

Jüri,

How did you create the output?

An example to cluster transactions with arules can be found in:

Michael Hahsler and Kurt Hornik. Building on the arules infrastructure 
for analyzing transaction data with R. In R. Decker and H.-J. Lenz, 
editors, /Advances in Data Analysis, Proceedings of the 30th Annual 
Conference of the Gesellschaft für Klassifikation e.V., Freie 
Universität Berlin, March 8-10, 2006/, Studies in Classification, Data 
Analysis, and Knowledge Organization, pages 449-456. Springer-Verlag, 2007.


URL: http://michael.hahsler.net/research/arules_gfkl2006/arules_gfkl2006.pdf
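For the related task in the subject line, clustering the rules themselves, a minimal sketch using arules' own dissimilarity() method (the support/confidence thresholds here are arbitrary choices of mine, not from the paper):

```r
library(arules)
data(Groceries)

## mine a manageable set of rules (thresholds are illustrative only)
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.4))

## pairwise dissimilarity between rules, then ordinary hierarchical clustering
d  <- dissimilarity(rules, method = "jaccard")
hc <- hclust(d, method = "average")
plot(hc)
```

The same dissimilarity()/hclust() pattern applies to transactions, as in the paper cited above.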

Hope this helps,
-Michael

--
  Dr. Michael Hahsler, Visiting Assistant Professor
  Department of Computer Science and Engineering
  Lyle School of Engineering
  Southern Methodist University, Dallas, Texas

  (214) 768-8878 * mhahs...@lyle.smu.edu * http://lyle.smu.edu/~mhahsler

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with uroot package: ADF.test

2010-11-11 Thread Inger

Hi all,

I have a problem while working with the uroot package, more specifically
with the ADF.test. The smallest p-value it prints is 0.0100 but I want
to know the exact p-value. R does warn you about the problem by saying the
following: In interpolpval(code = code, stat = adfreg[, 3], N = N) : 
p-value is smaller than printed p-value.
But I do not know how to solve this issue. Does anyone know how to find the
exact p-value?
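One possible workaround (an assumption on my part, not something checked against ADF.test's internals): the urca package ships MacKinnon-style response-surface approximations, so you can feed it the test statistic that ADF.test prints and recover a p-value below 0.01:

```r
library(urca)

## hypothetical values: the t-statistic reported by ADF.test and your sample size
adf.stat <- -4.2
n <- 120

## trend = "c" corresponds to a regression with a constant
punitroot(adf.stat, N = n, trend = "c", statistic = "t")
```

Make sure the trend specification matches the deterministic terms used in your ADF regression, otherwise the p-value refers to the wrong distribution.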

Thanks,
Inger
-- 
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-uroot-package-ADF-test-tp3037893p3037893.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] create a pairwise coocurrence matrix

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 4:44 AM, Stefan Evert wrote:

Pasted and realigned from original posting:

term1 term2 term3 term4 term5
term1 0 2 0 1 3
term2 2 0 0 1 2
term3 0 0 0 0 0
term4 1 1 0 0 1
term5 3 2 0 1 1
Any ideas on how to do that?


If I understood you correctly, you have this matrix of indicator  
variables for occurrences of terms in documents:


 A <- matrix(c(1,1,0,0,1,1,1,0,1,1,1,0,0,0,1), nrow=3, byrow=TRUE,  
dimnames=list(paste("doc",1:3), paste("term",1:5)))

 A

and want to determine co-occurrence counts for pairs of terms,  
right? (The formatting of your matrices was messed up, and some of  
your co-occurrence counts don't make sense to me.)


The fastest and easiest solution is

 t(A) %*% A


That is really elegant. (Wish I could remember my linear algebra  
lessons as well from forty years ago.) I checked it against the  
specified output and found that with one exception that the OP had  
planned for the diagonal to be filled with zeroes. So that could be  
completed by a simple modification:


temp <- t(A) %*% A
diag(temp) <- 0
temp

--
David Winsemius, MD
West Hartford, CT



[R] plotrix: reverse length scale

2010-11-11 Thread Pep Serra

Hi!

I wanted to know if there could be a way to reverse lengths in 
polar.plot or radial.plot.


As far as I can tell, larger lengths imply lines or points further 
from the center of the circle, and I want it the other way around: if 
my length limits are from 0 to 2000, I want points with length near 2000 
to be closer to the circle center.
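A workaround sketch (the data are made up): plot the transformed lengths so that large values land near the center. Note that the radial axis then shows the transformed scale, so its labels would need relabelling by hand.

```r
library(plotrix)

## hypothetical measurements
lengths <- c(100, 800, 1500, 2000)
angles  <- c(20, 110, 200, 300)

## reverse the scale: a value of 2000 becomes radius 0 (the center)
polar.plot(2000 - lengths, angles,
           radial.lim = c(0, 2000), rp.type = "s")
```

If the printed radial labels matter, draw them separately so they read 2000 at the center and 0 at the rim.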


Thanx a lot,

pep


Re: [R] Export lmList

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 6:04 AM, Rosario Garcia Gil wrote:


Hello

I am trying to export and lmList by write.table(x, "x.csv") and it  
complains.

I have tried to convert the lmList into data.frame but also complains.

Any suggestion?


The most important suggestion would to provide context. Why are you  
doing this?


The output of lmList, a list of fits, (assuming this to be from  
lme4 ... and the lack of specification would generate another  
suggestion) is not in a form suitable for coercion to a dataframe and  
write.table only works on objects that are coercible to a dataframe.  
If you are hoping to save an object that can be restored then the save/ 
load combination is the route to success.
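A sketch of the save/load route, and of exporting just the coefficients (which do coerce cleanly), using the sleepstudy example from lme4 since the original model is unspecified:

```r
library(lme4)

## a list of per-subject fits
fits <- lmList(Reaction ~ Days | Subject, sleepstudy)

## save/load preserves the whole lmList object for later restoration
save(fits, file = "fits.RData")
load("fits.RData")               # restores 'fits' as-is

## the per-group coefficients are a data frame, so write.csv works on them
write.csv(coef(fits), "fits-coefs.csv")
```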


--

David Winsemius, MD
West Hartford, CT



Re: [R] r-2.12.0 - Hmisc

2010-11-11 Thread Uwe Ligges



On 11.11.2010 14:57, R Heberto Ghezzo, Dr wrote:

Hello, I am working on a HP Laptop with R 2.12 just installed, I tried:

R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)
...
library(Hmisc)
Loading required package: survival
Loading required package: splines
Error in library.dynam(lib, package, package.lib) :
   DLL 'cluster' not found: maybe not installed for this architecture?
Error: package/namespace load failed for 'Hmisc'


library(cluster)

Error: package 'cluster' is not installed for 'arch=i386'



but of course "cluster" is in the 
R-2.12.0/library/cluster/libs/i386/cluster.dll.
Does the loader of required packages not look into i386?



It does, but presumably you have another (outdated, compiled for R < 
2.12.x) instance of cluster installed in a library that is (in the 
search path)  in front of the base library you cited above.
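A quick way to see every installed copy of cluster, where it lives, and which R it was built for (run this in the affected session):

```r
## library trees on the search path, in the order R consults them
.libPaths()

## every installed copy of 'cluster', with location and build version
ip <- installed.packages()
ip[ip[, "Package"] == "cluster", c("LibPath", "Version", "Built")]
```

A copy whose Built entry predates 2.12.0 and whose LibPath comes first in .libPaths() is the likely culprit; remove or reinstall it.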


Uwe Ligges



Thanks
Heberto Ghezzo




Re: [R] r-2.12.0 - Hmisc

2010-11-11 Thread Duncan Murdoch

On 11/11/2010 8:57 AM, R Heberto Ghezzo, Dr wrote:

Hello, I am working on a HP Laptop with R 2.12 just installed, I tried:

R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)
...
library(Hmisc)
Loading required package: survival
Loading required package: splines
Error in library.dynam(lib, package, package.lib) :
   DLL 'cluster' not found: maybe not installed for this architecture?
Error: package/namespace load failed for 'Hmisc'
>
>  library(cluster)
Error: package 'cluster' is not installed for 'arch=i386'
>
but of course "cluster" is in the 
R-2.12.0/library/cluster/libs/i386/cluster.dll.
Does the loader of required packages not look into i386?


I would guess that you have another copy of it somewhere else, installed 
for an earlier version of R.  Look at sessionInfo() or .libPaths() to 
find where.


Duncan Murdoch



[R] r-2.12.0 - Hmisc

2010-11-11 Thread R Heberto Ghezzo, Dr
Hello, I am working on a HP Laptop with R 2.12 just installed, I tried:

R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)
...
library(Hmisc)
Loading required package: survival
Loading required package: splines
Error in library.dynam(lib, package, package.lib) :
  DLL 'cluster' not found: maybe not installed for this architecture?
Error: package/namespace load failed for 'Hmisc'
>
> library(cluster)
Error: package 'cluster' is not installed for 'arch=i386'
>
but of course "cluster" is in the 
R-2.12.0/library/cluster/libs/i386/cluster.dll.
Does the loader of required packages not look into i386?
Thanks
Heberto Ghezzo


Re: [R] Format table help

2010-11-11 Thread cameron

Phil

you're right, Mthticker is not a table.
Mthticker is a list. I am trying to reformat all elements in the list, or
I can reformat one element and do a loop around it.

Thanks 
Cameron


str(Mthticker)
List of 5
 $ :'zoo' series from 01/02/04 to 11/01/10
  Data: chr [1:83] "LHG04   " "LHJ04   " "LHJ04   " "LHK04   " ...
  Index: Classes 'dates', 'times'  atomic [1:83] 12419 12450 12478 12509
12541 ...
  .. ..- attr(*, "format")= chr "m/d/y"
  .. ..- attr(*, "origin")= num [1:3] 1 1 1970
 $ :'zoo' series from 01/02/04 to 11/01/10
  Data: chr [1:83] "LHJ04   " "LHK04   " "LHK04   " "LHM04   " ...
  Index: Classes 'dates', 'times'  atomic [1:83] 12419 12450 12478 12509
12541 ...
  .. ..- attr(*, "format")= chr "m/d/y"
  .. ..- attr(*, "origin")= num [1:3] 1 1 1970
 $ :'zoo' series from 01/02/04 to 11/01/10
  Data: chr [1:83] "LHK04   " "LHM04   " "LHM04   " "LHN04   " ...
  Index: Classes 'dates', 'times'  atomic [1:83] 12419 12450 12478 12509
12541 ...
  .. ..- attr(*, "format")= chr "m/d/y"
  .. ..- attr(*, "origin")= num [1:3] 1 1 1970
 $ :'zoo' series from 01/02/04 to 11/01/10
  Data: chr [1:83] "LHM04   " "LHN04   " "LHN04   " "LHQ04   " ...
  Index: Classes 'dates', 'times'  atomic [1:83] 12419 12450 12478 12509
12541 ...
  .. ..- attr(*, "format")= chr "m/d/y"
  .. ..- attr(*, "origin")= num [1:3] 1 1 1970
 $ :'zoo' series from 01/02/04 to 11/01/10
  Data: chr [1:83] "LHN04   " "LHQ04   " "LHQ04   " "LHV04   " ...
  Index: Classes 'dates', 'times'  atomic [1:83] 12419 12450 12478 12509
12541 ...
  .. ..- attr(*, "format")= chr "m/d/y"
  .. ..- attr(*, "origin")= num [1:3] 1 1 1970
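One way to reformat every element without an explicit loop is lapply(). A sketch, assuming (from the str() output above) that each zoo series holds character data padded with trailing blanks, and that stripping those blanks is the reformatting wanted:

```r
library(zoo)

## apply the same clean-up to each zoo series in the list
Mthticker2 <- lapply(Mthticker, function(z) {
  coredata(z) <- sub(" +$", "", coredata(z))  # drop trailing spaces
  z                                           # return the modified series
})
```

Substitute whatever per-element transformation you actually need for the sub() call; lapply() returns a list of the same length and structure.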




-- 
View this message in context: 
http://r.789695.n4.nabble.com/Format-table-help-tp3036945p3037822.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] standardized/studentized residuals with loess

2010-11-11 Thread Oliver Frings
Hi Josh,

many thanks for your reply. I tried to read up on this more and, to be frank,
I got a bit confused about the exact definition of residual standardization.
It occurs to me that different people have different definitions and that it
can be done with and without the leverage of each point. Anyhow, the way you
did it seems correct to me! My only problem is now that it assumes the same
standard error for each point. My data have definitely different standard
deviations at different points. So I was wondering if there is a way to do
it that accounts for the different standard deviations at different points?
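One rough way to allow for a varying scale is to estimate it from the data, for example by smoothing the absolute residuals and dividing by that local estimate. A sketch on the cars data (the object names are mine, and this is a heuristic, not a canonical studentization):

```r
## fit the local regression and extract residuals
cars.lo <- loess(dist ~ speed, cars)
r <- residuals(cars.lo)

## estimate a local scale by smoothing the absolute residuals
scale.fit <- loess(abs(r) ~ speed, data = cars)

## guard against a fitted scale near zero, then standardize pointwise
stdres.local <- r / pmax(fitted(scale.fit), 1e-8)
```

The span of the second loess controls how quickly the assumed standard deviation is allowed to change across x.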

Many thanks!

/Oliver

On Wed, Nov 10, 2010 at 8:21 PM, Joshua Wiley wrote:

> Hi Oliver,
>
> As a warning, I may be missing something too.  I did not see something
> explicit in base R or MASS.  In a quick scan of the fourth edition of
> the MASS book, I did not read anything that it is
> illogical/unreasonable to try to find standardized residuals (but my
> knowledge of local regression approaches nil).  With that background,
> I proceeded to blithely scavenge from other functions until I came up
> with this:
>
> loess.stdres <- function(model) {
>  res <- model$residuals
>  s <- sqrt(sum(res^2)/(length(res) - model$enp))
>  stdres <- res/(sqrt(1 - hat(res)) * s)
>  return(stdres)
> }
>
> ## now for a half-baked check
>
> ## fit linear model and local regression
> cars.lm <- lm(dist ~ speed, cars)
> cars.lo <- loess(dist ~ speed, cars)
>
> ## these seem somewhat similar
> rstandard(cars.lm)
> c(scale(residuals(cars.lm)))
>
> ## these seem somewhat similar too
> loess.stdres(cars.lo)
> c(scale(cars.lo$residuals))
>
>
> Cheers,
>
> Josh
>
>
>
> On Wed, Nov 10, 2010 at 9:24 AM, Oliver Frings
>  wrote:
> > Hi all,
> >
> > I'm trying to apply loess regression to my data and then use the fitted
> > model to get the *standardized/studentized residuals. I understood that
> for
> > linear regression (lm) there are functions to do that:*
> > *
> > *
> > fit1 = lm(y~x)
> > stdres.fit1 = rstandard(fit1)
> > studres.fit1 = rstudent(fit1)
> >
> > I was wondering if there is an equally simple way to get
> > the standardized/studentized residuals for a loess model? BTW
> > my apologies if there is something here that I'm missing.
> >
> > All the best,
> > *
> > *
> > *Oliver *
> >
> >[[alternative HTML version deleted]]
> >
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>

[[alternative HTML version deleted]]



[R] Area-Interaction Process

2010-11-11 Thread Neba Funwi-Gabga
Hello,
Does the area-interaction process built into Spatstat actually work for data
in a polygonal window? When I try to run it, I get the error message that W
must be a rectangular object, but my window is a polygon of my study area.

Can anyone help, please?

neba

[[alternative HTML version deleted]]



Re: [R] plot options including formatting axes

2010-11-11 Thread Jannis

Hi Sachin,

please read the posting guide and include a reproducible example of what 
you want to do.


For your first question you should have a look at ?axis. Supplying the 
'at' argument with the positions of the desired marks and the 'labels' 
with text strings like '10.000$' should do what you want. There may be 
easier ways though...


The second problem should be solved with adding your xlab and ylab 
arguments to the call of plot() and not to title().
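A sketch of both fixes together (the cash values are made up):

```r
## hypothetical cash series
y <- seq(2e6, 1.2e7, length.out = 20)

## suppress the default y axis and set labels in the plot() call itself
plot(y, type = "l", main = "Cash vs Time",
     xlab = "Period", ylab = "", yaxt = "n")

## draw a y axis with dollar-formatted, comma-separated labels
ticks <- pretty(range(y))
axis(2, at = ticks,
     labels = paste("$", formatC(ticks, format = "d", big.mark = ","),
                    sep = ""))
```

Passing xlab/ylab to plot() (rather than title()) avoids the double-labelling problem described above.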



HTH
Jannis

sachinthaka.abeyward...@allianz.com.au schrieb:

Hi All,

Currently my plot shows the y-axis in scientific notation (1e07 and so on).
I want to be able to display this in dollars such that it shows $10,000,000
(including the commas). How do I do this.

Also with the xlabel and ylabel. I've specified: 'title('Cash vs
Time',xlab='Period',ylab=''); Im hoping that this will also not display
anything on the y-axis. Instead it is actually overwriting the default
x-label and the default y-label remains the same.

Is there something I was supposed to do. I thought by specifying the 'xlab'
option in title it gets rid of the default one.

Thanks,
Sachin








[R] Nuisance parameters for Geyer Saturation process.

2010-11-11 Thread Neba Funwi-Gabga
Hello group,
Can someone please walk me through how to estimate the nuisance parameters
(saturation parameter and radius) of the Geyer saturation process?
It is quite unclear to me how these parameters are chosen, but they
are very important in determining the interaction between points in a point
process.
I am using Spatstat package.
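In spatstat the standard tool for such irregular ("nuisance") parameters is profilepl(), which maximises the profile pseudolikelihood over a user-supplied grid. A sketch, where X stands for your point pattern and the r values are hypothetical and must be in the units of your window:

```r
library(spatstat)

## candidate values for the interaction radius r and saturation parameter
grid <- expand.grid(r = seq(5, 25, by = 5), sat = 1:3)

## profile the pseudolikelihood over the grid for a Geyer model
prof <- profilepl(grid, Geyer, X)

plot(prof)      # inspect the profile; the peak picks (r, sat)
summary(prof)   # reports the optimal parameter values
```

The fitted model at the optimum is then refitted with ppm() using those values, or extracted from the profilepl object.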

Thanks.

neba

[[alternative HTML version deleted]]



Re: [R] log-transformed linear regression

2010-11-11 Thread Mike Marchywka






> Date: Wed, 10 Nov 2010 19:27:20 -0500
> From: sa.cizm...@usherbrooke.ca
> To: r-help@r-project.org
> Subject: Re: [R] log-transformed linear regression
>
> Dear List,
>
> I would like to take another chance and see if there if someone has
> anything to say to my last post...

ok, I took a look at it to help myself re-learn some R so
I solicit better alts from R users in comments below.

If I expand scale down where you have lots of points, the visual
is something like a plateau for x<1 and then a general noisy
trend upward- the slope you get for the aggregate blob of
data probably depends on how your sampling is weighted. I wouldn't
worry about the one point at large X in isolation quite yet but
you may want to consider a piecewise linear or other way to handle
the low X valued data. You may just want to spend more time staring
at pictures before churning out p-values LOL. 

To be clear about your log issues, lm will fit a linear model to the stuff you
give it and of course y=a*x means something different from log(y) =a*log(x).
Overall, you need to figure out what to do with the models depending on
what you think is a candidate for being real- indeed your points at large
X and Y may be out of the linear range for this "system" but who knows. 

If you just try to fit a bunch of data to a line,
the slope of course depends on which part of this bunch you have
sampled. Generally to explore sensitivity, try things like this,
with ".0" of course replaced by whatever threshold you want.
It is easy to automate: put this in a loop and plot histograms
of the resultant slopes, etc. 

> sel=runif(length(X))>.0
> aasel=lm(Y[sel]~X[sel])
> summary(aasel)

Call:
lm(formula = Y[sel] ~ X[sel])

Residuals:
  Min    1Q    Median    3Q   Max
-0.221957 -0.004207  0.004055  0.013395  0.232362

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0256642  0.0033254  -7.718 1.54e-12 ***
X[sel]   0.0463599  0.0003146 147.365  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0399 on 150 degrees of freedom
Multiple R-squared: 0.9931, Adjusted R-squared: 0.9931
F-statistic: 2.172e+04 on 1 and 150 DF,  p-value: < 2.2e-16

if you increase ".0" to something like .5 or so and keep repeating
this, you get some idea of what can happen; make changes for the various
models and then decide what you are trying to do, etc. 



> sel=runif(length(X))>.9
> aasel=lm(Y[sel]~X[sel])
> summary(aasel)

Call:
lm(formula = Y[sel] ~ X[sel])

Residuals:
  Min    1Q    Median    3Q   Max
-0.199943 -0.002839  0.010626  0.019608  0.149099

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.035546   0.013741  -2.587   0.0162 *
X[sel]   0.052033   0.003401  15.301 7.04e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05878 on 24 degrees of freedom
Multiple R-squared: 0.907,  Adjusted R-squared: 0.9031
F-statistic: 234.1 on 1 and 24 DF,  p-value: 7.037e-14


>
> bump
>
> servet
>
>
> On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > Hello,
> >
> > I have a basic question. Sorry if it is so evident
> >
> > I have the following data file :
> > http://ekumen.homelinux.net/mydata.txt
> >
> > I need to model Y~X-1 (simple linear regression through the origin) with
> > these data :
> >
> > load(file="mydata.txt")
> > X=k[,1]
> > Y=k[,2]
> >
> > aa=lm(Y~X-1)
> > dev.new()
> > plot(X,Y,log="xy")
> > abline(aa,untf=T)
> > abline(b=0.0235, a=0,col="red",untf=T)
> > abline(b=0.031, a=0,col="green",untf=T)
> >
> > Other people did the same kind of analysis with their data and found the
> > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> >
> > Regression with my own data, though, yields a slope of 0.0458 (black
> > line) which is too high. Clearly my regression is too much influenced by
> > the single point with high values (X>100). I would not like to discard
> > this point, though, because I know that the measurement is correct. I
> > just would like to give it less weight...
> >
> > When I log-transform X and Y data, I obtain :
> >
> > dev.new()
> > plot(log10(X),log10(Y))
> > abline(v=0,h=0,col="cyan")
> > bb=lm(log10(Y)~log10(X))
> > abline(bb,col="blue")
> > bb
> >
> > I am happy with this regression. Now the slope is at the log-log domain.
> > I have to convert it back so that I can obtain a number comparable with
> > the literature (0.0235 and 0.031). How to do it? I can't force the
> > second regression through the origin as the log-transformed data does
> > not go through the origin anymore.
> >
> > at first it seemed like an easy problem but I am at loss :o((
> > thanks a lot for your kindly help
> > servet
> >
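Picking up the back-transformation question from the quoted post, a sketch using servet's X and Y: a log-log fit does not produce a slope on the original scale unless its exponent is close to 1.

```r
## refit on the log-log scale, as in the original post
bb <- lm(log10(Y) ~ log10(X))
a <- coef(bb)[1]
b <- coef(bb)[2]

## the fitted model is Y = 10^a * X^b; only when b is near 1 does 10^a
## act like a single slope through the origin, comparable to 0.0235 / 0.031
plot(X, Y, log = "xy")
curve(10^a * x^b, add = TRUE, col = "blue")
```

If down-weighting the large-X point is the real goal, weighted least squares through the origin, e.g. lm(Y ~ X - 1, weights = 1/X^2), is another option worth exploring.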

[R] metafor: including phylogenetic non-independence among species?

2010-11-11 Thread Scott Chamberlain
Hello, Is it possible to include information about phylogenetic relatedness
among species (when species are replicates for each study within a
meta-analysis) in a meta-analysis with the metafor package?

Alternatively, I wonder if the method of Lajeunesse (2009, American Naturalist)
has been implemented in R in any fashion?

Thanks, Scott Chamberlain

[[alternative HTML version deleted]]



Re: [R] Plot Axes

2010-11-11 Thread Gavin Simpson
On Thu, 2010-11-11 at 01:41 -0800, dpender wrote:
> Thanks  Jim,
> 
> The next issue is how i get the RP lines to cut the axes. The lines either
> cut the x axis at D=35 or the y axis at H=4.5.
> 
> How do I set the axes so that the lines start and end on the axes?
> 
> http://r.789695.n4.nabble.com/file/n3037496/plot.jpeg 

See ?par and in particular 'xaxs' and 'yaxs'

e.g.:

plot(1:10, xaxs = "i", yaxs = "i")

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



[R] Survey on Statistical Computing Email Lists

2010-11-11 Thread ohri2007
If you have trouble viewing or submitting this form, you can fill it out  
online:
https://spreadsheets.google.com/viewform?formkey=dGczV2J4ai03NXhqZVZKZHR4YWFZSWc6MQ


Survey on Statistical Computing Email Lists


This is a survey for research purposes. All results will be aggregated and  
publicly shared. The hypothesis we are trying to test concerns  
the representation of women and minorities in statistical computing.


How many years have you worked with statistical software? * Please enter the  
approximate number of years you have worked with (not just learnt) statistical software

1-3 years
3-5 years
5-10 years
> 10 years


Please name your ethnic affiliation * Describe race/color

White
Black
Yellow (East Asian)
Brown (South East Asian)
Mixed
Other


Describe your gender * Sex -Male or Female

Male
Female


Describe your sexual orientation * L=Lesbian G=Gay B=Bisexual  
T=Transvestite Q=Questioning S=Straight

L
G
B
T
Q
Straight


Powered by Google Docs Report Abuse - Terms of Service - Additional Terms



[[alternative HTML version deleted]]



Re: [R] Number above the bar?

2010-11-11 Thread Marianne Promberger
Hi Joel,

> I got an barplot, and I would like to have the exact number of the bars just
> above the bars anyone know how to do this?

Google "r number above bars barplot" for threads on this list about how
to do it and why you might not want to.

?text
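A minimal sketch of the ?text approach (the heights are made up):

```r
## hypothetical bar heights
h <- c(3, 7, 5, 9)

## leave headroom above the tallest bar for its label
bp <- barplot(h, ylim = c(0, max(h) * 1.15))

## barplot() returns the bar midpoints; pos = 3 places text above each bar
text(bp, h, labels = h, pos = 3)
```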

Marianne

-- 
Marianne Promberger PhD, King's College London
http://promberger.info
R version 2.12.0 (2010-10-15)
Ubuntu 9.04



Re: [R] randomForest parameters for image classification

2010-11-11 Thread Liaw, Andy
Please show us the code you used to run randomForest, the output, as
well as what you get with other algorithms (on the same random subset
for comparison).  I have yet to see a dataset where randomForest does
_far_ worse than other methods.

Andy 

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Deschamps, Benjamin
> Sent: Tuesday, November 09, 2010 10:52 AM
> To: r-help@r-project.org
> Subject: [R] randomForest parameters for image classification
> 
> I am implementing an image classification algorithm using the
> randomForest package. The training data consists of 31000+ training
> cases over 26 variables, plus one factor predictor variable (the
> training class). The main issue I am encountering is very low overall
> classification accuracy (a lot of confusion between classes). 
> However, I
> know from other classifications (including a regular decision tree
> classifier) that the training and validation data is sound and capable
> of producing good accuracies.
> 
>  
> 
> Currently, I am using the default parameters (500 trees, mtry not set
> (default), nodesize = 1, replace=TRUE). Does anyone have experience
> using this with large datasets? Currently I need to randomly sample my
> training data because giving it the full 31000+ cases returns 
> an out of
> memory error; the same thing happens with large numbers of 
> trees.  From
> what I read in the documentation, perhaps I do not have 
> enough trees to
> fully capture the training data?
> 
>  
> 
> Any suggestions or ideas will be greatly appreciated.
> 
>  
> 
> Benjamin
> 
> 
>   [[alternative HTML version deleted]]
> 
> 
Notice:  This e-mail message, together with any attachme...{{dropped:11}}
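A sketch of memory-oriented settings Benjamin could try (the object and column names are placeholders, not from the original post):

```r
library(randomForest)

## fit on the full training set, but cap how many rows each tree sees;
## for classification, a vector sampsize gives per-class (stratified) draws
rf <- randomForest(x = train[, vars], y = train$class,
                   ntree = 500,
                   sampsize = rep(1000, nlevels(train$class)),
                   nodesize = 5,     # larger terminal nodes -> smaller trees
                   do.trace = 50)    # progress report every 50 trees
```

Capping sampsize and raising nodesize shrinks each tree, which usually resolves the out-of-memory error without discarding training cases outright.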



Re: [R] Tinn-R_2.3.6.0

2010-11-11 Thread Uwe Ligges

Please tell the authors of Tinn-R.

Uwe Ligges

On 11.11.2010 12:12, Hannu Kahra wrote:

Hi,

I just found out that Tinn-R_2.3.6.0 is now available on
http://sourceforge.net/projects/tinn-r/.

When configuring Tinn-R: R>  Configure>  Permanent (Rprofile.site) I get
error

C:\Program Files\R\R-2.12.0\bin\etc\RProfile.site
The above file was not found.
Please try to repeat the procedure!

The correct path for RProfile.site is

C:\Program Files\R\R-2.12.0\etc\Rprofile.site

I was able to configure the file using the previous version of Tinn-R.

-Hannu

[[alternative HTML version deleted]]




