[R] S-Plus "resample" package and associated functions
Are there any packages in R that reproduce the "resample" package of S-Plus? The sample() function in R doesn't provide the flexibility of bootstrap() and bootstrap2().

========
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
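For instance, a minimal sketch of an ordinary bootstrap with the "boot" package (only a rough analogue; it does not claim to cover everything bootstrap() and bootstrap2() did in S-Plus):

library(boot)                            # ships with R
set.seed(1)
x <- rnorm(30, mean = 10)
boot.mean <- function(d, i) mean(d[i])   # statistic takes the data and resampled indices
b <- boot(x, statistic = boot.mean, R = 2000)
boot.ci(b, type = c("perc", "bca"))      # percentile and BCa intervals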
Re: [R] Excel
At 01:21 AM 8/28/2007, David wrote: >On Tue, 28 Aug 2007, Robert A LaBudde wrote: > >>If you format the column as "Text", you won't have this problem. By >>leaving the cells as "General", you leave it up to Excel to guess at >>the correct interpretation. > >Not true actually. I had converted the column to Text because I saw >the interpretation as a date in the .xls file. I saved the .csv file >*after* the column had been converted to Text. Looking at the .csv >file in a text editor, the entry is correct. > You need to convert the column to Text before you enter the data. This tells Excel the presumption to use. Converting to Text after you enter the values has no effect on previously entered values. I've tested this using Excel 2000. ======== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Excel
If you format the column as "Text", you won't have this problem. By leaving the cells as "General", you leave it up to Excel to guess at the correct interpretation. You will note that the conversion to a date occurs immediately in Excel when you enter the value. There are many formats to enter dates. Either pre-format the column as Text, or prefix the individual entry with an ' to indicate text. A similar problem occurs in R's read.table() function when a factor has levels that can be interpreted as numbers. At 10:11 PM 8/27/2007, David wrote: >A common process when data is obtained in an Excel spreadsheet is to save >the spreadsheet as a .csv file then read it into R. Experienced users >might have learned to be wary of dates (as I have) but possibly have not >experienced what just happened to me. I thought I might just share it with >r-help as a cautionary tale. > >I received an Excel file giving patient details. Each patient had an ID >code in the form of three letters followed by four digits. (Actually a New >Zealand National Health Identification.) I saved the .xls file as .csv. >Then I opened up the .csv (with Excel) to look at it. In the column of ID >codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699. > >In a column of character data, Excel had interpreted AUG2699 as a date. > >The .csv did not actually have a date in that cell, but if I had saved the >.csv file it would have. > >David Scott Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
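On the R side of the same problem, a minimal sketch of reading such a file while forcing the ID column to stay character (the file name and column name are made up):

# colClasses keeps read.csv() from reinterpreting the IDs as numbers or factors
pat <- read.csv("patients.csv", colClasses = c(ID = "character"))
str(pat$ID)    # stays character, e.g. "AUG2699"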
Re: [R] chi-square statistics
At 12:41 AM 8/24/2007, Karen wrote: >I'm wondering if R has functions to return pvalues with given x-squares and >df. I googled for it but couldn't seem to find one. Appreciate any helps >here. >An example: df=4, x<- c(33.69, 32.96, 30.90) which are the statistic for >chi-square, I'd like to get the corresponding pvalues for each values in x. ? pchisq E.g. pchisq(x, df, lower.tail=FALSE) ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
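For the example given:

x <- c(33.69, 32.96, 30.90)
pchisq(x, df = 4, lower.tail = FALSE)
# roughly 8.6e-07, 1.2e-06, 3.2e-06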
[R] Conditional logistic regression on n:m matched "cohort" data [Corrected]
[Corrected the model formula to include "method".] I am designing an interlaboratory validation study for a presence/absence alternative method test kit vs. a presence/absence reference method test kit. There will be 10 laboratories conducting tests using both methods. In each laboratory, there will be 5 specimens tested, each of the 5 specimens twice by both methods (alternative, standard). The total number of data are 10 x 5 x 4 = 200. The general structure is as follows: id: sequential result label, 1 to 200. lab: a factor which has 10 levels, one for each participating lab. specimen: 1 to 5 in each lab, no connection between labs. method: A or R. result: 0 or 1 (absence or presence). This experiment appears to be equivalent to a matched cohort study. The matching is done on specimens, and there are 2 alternative and 2 reference results for each specimen. The sketchy description for clogit() of package "survival" presumes a matched case-control study is being analyzed. I am looking for the correct syntax to use in clogit() for this experiment. Is library('survival') clogit(result ~ lab + method + strata(specimen), method="exact", data=mydata) all that is required? Thanks. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Conditional logistic regression on n:m matched "cohort" data
I am designing an interlaboratory validation study for a presence/absence alternative method test kit vs. a presence/absence reference method test kit. There will be 10 laboratories conducting tests using both methods. In each laboratory, there will be 5 specimens tested, each of the 5 specimens twice by both methods (alternative, standard). The total number of data are 10 x 5 x 4 = 200. The general structure is as follows: id: sequential result label, 1 to 200. lab: a factor which has 10 levels, one for each participating lab. specimen: 1 to 5 in each lab, no connection between labs. method: A or R. result: 0 or 1 (absence or presence). This experiment appears to be equivalent to a matched cohort study. The matching is done on specimens, and there are 2 alternative and 2 reference results for each specimen. The sketchy description for clogit() of package "survival" presumes a matched case-control study is being analyzed. I am looking for the correct syntax to use in clogit() for this experiment. Is library('survival') clogit(result ~ lab + strata(specimen), method="exact", data=mydata) all that is required? Thanks. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Effect size, interactions, and main effects (Stats Question) [ffmanova]
Statistical significance is "detectability", and depends upon the size of the sample as well as the effect. A large enough experiment will result in statistical detectability of almost every interaction term allowed. This is why estimation, not testing, has become the consensus recommendation in statistics.

As a practical matter, evaluate the combined effect of your model terms with and without the interaction term(s) you are worried about. Is the reduction in accuracy of physical importance? If so, the interaction terms are required for scientific reasons. If not, present both results and indicate the acceptability (for interpolation) of the simpler model.

You should also make it your first priority to hypothesize why the interaction terms are meaningful and expected. If a cause can be found, it may suggest an alternate model that will eliminate the interactions, or allay your anxiety. If not, it may support your argument to simplify.

At 08:58 AM 7/21/2007, Mark wrote:
>Dear List Members,
>
>I would very much appreciate any pointers you could give me on the following matter:
>
>Main Question: To what extent does the "rule" that it is unreasonable to talk about main effects if there are significant interactions in a model depend upon the effect size [of the significant interaction terms]? Or is this not an issue?
>
>More practically: Suppose I were to carry out a so-called Type-II MANOVA (using ffmanova) and were to find that the interaction term in a 2-way analysis has borderline significance (say p = 0.045) and a small effect size, whereas one of the main effects is highly significant (say p = 6.8e-10) and has a large effect size.
>
>Would it in this case be reasonable for me to ignore the interaction term, and talk only about main effects? And, presuming the main question is fair, are there general guidelines concerning the relationship between level of significance and effect size for interaction terms?
>
>Thank you in advance for your help,

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] No convergence using ADAPT
What versions of "adapt" and R are you using? The current package was built with R-2.5.1. I tried your program with R-2.5.0, and got the answer 0.1501053 in just a few seconds. At 03:20 PM 7/7/2007, Philip wrote: >I am trying calculate a probability using numerical integration. The first >program I ran spit out an answer in a very short time. The program is below: > >## START PROGRAM > >trial <- function(input) > >{ >pmvnorm(lower = c(0,0), upper = c(2, 2), mean = input, sigma = >matrix(c(.1, 0, >0, .1), nrow = 2, ncol = 2, byrow = FALSE)) >} > >require(mvtnorm) >require(adapt) > >bottomB <- -5*sqrt(.1) >topB <- 2 + 5*sqrt(.1) >areaB <- (topB - bottomB)^2 > >unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), >minpts = 1000, eps = 1e-4, functn = trial) > >(1/areaB)*unscaled.Po.in.a$value > >## FINISH PROGRAM > >I tried to run the program again changing a.) sigma in the trial >function, b.) >upper in the trial function, and c.) the bounds of integration; that is, >bottomB and topB. The new program is below: > >## START PROGRAM > >trial <- function(input) > >{ >pmvnorm(lower = c(0,0), upper = c(10, 10), mean = input, sigma = >matrix(c(.01, >0, 0, .01), nrow = 2, ncol = 2, byrow = FALSE)) >} > >require(mvtnorm) >require(adapt) > >bottomB <- -5*sqrt(.01) >topB <- 10 + 5*sqrt(.01) >areaB <- (topB - bottomB)^2 > >unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), >minpts = 1000, eps = 1e-4, functn = trial) > >(1/areaB)*unscaled.Po.in.a$value > >## FINISH PROGRAM > >Now, the program just runs and runs (48 hours at last count!). By playing >around with the program, I have deduced the program is highly sensitive to >changing the upper option in the trial function. For example, using a vector >like (4, 4) causes no problems and the program quickly yields an answer. I >have a couple of other programs where I can easily obtain a simulation-based >answer, but I would ultimately like to know what's up with this >program before >I give up on it so I can learn a thing or two. Does anyone have any clues or >tricks to get around this problem? My guess is that it will simply be very >difficult (impossible?) to obtain this type of relative error (eps = >1e-4) and >I will have no choice but to pursue the simulation approach. > >Thanks for any responses ([EMAIL PROTECTED])! > >-- Phil > >Philip Turk >Assistant Professor of Statistics >Northern Arizona University >Department of Mathematics and Statistics >PO Box 5717 >Flagstaff, AZ 86011 >Phone: 928-523-6884 >Fax: 928-523-5847 >E-mail: [EMAIL PROTECTED] >Web Site: http://jan.ucc.nau.edu/~stapjt-p/ > >__ >R-help@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] outlying
At 05:29 AM 6/19/2007, elyakhlifi wrote: >hello, >are there functions to detecte outlying observations in samples? >thanks. library('car') ? outlier.test library('outliers') ? grubbs.test ? dixon.test ? cochran.test ? chisq.out.test ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
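A quick sketch with the "outliers" package (it must be installed from CRAN first):

library(outliers)
set.seed(42)
x <- c(rnorm(20), 6)    # one planted outlier
grubbs.test(x)          # Grubbs test for the single most extreme value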
Re: [R] Iterative Solver [Converting Matlab's solve()]
At 12:24 AM 6/19/2007, Boris wrote: >I can't for the life of me figure out how to get the roots for this simple >(but sovable only iteratively) equation in R: > >x = z*(1-(1-2/z)^y > >where x and y are known, and z is unknown. In Matlab, this amounts to: > >[my.solution] = solve('x = z*(1-(1-2/z)^y') >my.solution.real = solution(y-1,y) > >% bottom line displays non-imaginary solution (last element) > 1. Your equation is syntactically incorrect. It is missing a ")". Ditto for your Matlab example. 2. Are you looking for an analytical (symbolic) solution? If so, I don't believe one exists (if you place a ")" after y). 3. To find numerical roots, see uniroot(). 4. If you want more help, you'll have to specify domains or values for x and y. ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
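For item 3, a minimal uniroot() sketch (the x and y values here are made up, and the interval is kept above z = 2 so the power is defined for non-integer y; the bracketing interval must give the function opposite signs at its endpoints):

x <- 3; y <- 5
f <- function(z) z * (1 - (1 - 2/z)^y) - x
uniroot(f, interval = c(2.0001, 1000))$root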
Re: [R] Optimization
You don't need optimization for the solution to your problem. You just need an understanding of the meaning of qnorm() and some simple algebra. Try: x<- (0.01-0.0032)/qnorm(0.7,0,1) At 12:01 PM 6/18/2007, you wrote: >Hi, I would like to minimize the value of x1-x2, x2 is a fixed value of 0.01, >x1 is the quantile of normal distribution (0.0032,x) with probability of >0.7, and the changing value should be x. Initial value for x is 0.0207. I am >using the following codes, but it does not work. > >fr <- function(x) { > x1<-qnorm(0.7,0.0032,x) > x2=0.01 > x1-x2 >} >xsd <- optim(0.0207, fr, NULL,method="BFGS") > >It is the first time I am trying to use optimization. Could anyone give me >some advice? >-- >View this message in context: >http://www.nabble.com/Optimization-tf3941212.html#a11178663 >Sent from the R help mailing list archive at Nabble.com. > >__ >R-help@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
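A quick check:

x <- (0.01 - 0.0032) / qnorm(0.7)    # qnorm(0.7,0,1) is about 0.5244
x                                    # about 0.01297
qnorm(0.7, mean = 0.0032, sd = x)    # recovers 0.01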
Re: [R] [Not R question]: Better fit for order probit model
At 01:29 PM 6/17/2007, adschai wrote:
>Thank you so much Robert. Please find the information below.
>
>The scale 1-10 are subjective physical condition ratings scored by inspection engineers at the site. 1-5 are in very bad conditions (bridge close down to seriously deteriorated). The rest from 6-10 are categorized as good condition. However, the proportion of samples in my data are as follows. Bridges with 1-5 scale covers about 2-3% of total population. The majority of the bridges are in 7-8. Certainly, I have enough data for model estimation but the mix of the proportion is good. I attached the detail of condition rating scale at the end of this message.

My guess is that you really have two distributions here: 1) bridges that are basically under proper supervision and repair (categories 6-10), and 2) those that are not (categories 1-5). These two classes would probably have dramatically different relations to the covariates you are using.

So my recommendation would be to consider splitting your response into two classes (0/1), each with 5 subcategories. Keeping the two classes merged together in a single group is equivalent to merging two different distributions with a weighting factor. This may be causing a bimodal distribution which is giving you problems.

You could try your modeling on each of the two classes separately before continuing with your full-dataset modeling. You may learn something useful to help you with your problems. For the full model, you would need to include a full set of interaction terms with "class".

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] [Not R question]: Better fit for order probit model
At 03:17 AM 6/16/2007, adschai wrote:
>Thank you so much Robert. I haven't thought about the idea of clumping categories together. One of the reasons is that these categories are bridge condition rating scores. They indeed represent different meanings and serviceability conditions. They vary from 0-9. I have about 300,000 data in which the first 5 labels, i.e. 0-4, are bad-condition bridges, and there are only less than 1000 instances in total. In the worst case, for example score 0 (meaning the bridge is not operable), I have 60 instances. Score 1 I have about 100.
>
>I would appreciate it if you could provide some opinion as to how you would make the ordered probit fit better in this case? Thank you so much in advance.

You certainly appear to have enough data to populate these categories. Your problems in getting a good fit may relate to other issues. You need to supply more information in order to say more. What are the definitions of each category? Is the ordering consistent, or are there really two different scales, one for bridges with essentially no problems, and another for those with serious damage? What evidence do you have that your fit is poor? What model are you fitting? Etc.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] [Not R question]: Better fit for order probit model
At 09:31 PM 6/15/2007, adschai wrote: >I have a model which tries to fit a set of data with 10-level >ordered responses. Somehow, in my data, the majority of the >observations are from level 6-10 and leave only about 1-5% of total >observations contributed to level 1-10. As a result, my model tends >to perform badly on points that have lower level than 6. > >I would like to ask if there's any way to circumvent this problem or >not. I was thinking of the followings ideas. But I am opened to any >suggestions if you could please. > >1. Bootstrapping with small size of samples each time. Howevever, in >each sample basket, I intentionally sample in such a way that there >is a good mix between observations from each level. Then I have to >do this many times. But I don't know how to obtain the true standard >error of estimated parameters after all bootstrapping has been done. >Is it going to be simply the average of all standard errors >estimated each time? > >2. Weighting points with level 1-6 more. But it's unclear to me how >to put this weight back to maximum likelihood when estimating >parameters. It's unlike OLS where your objective is to minimize >error or, if you'd like, a penalty function. But MLE is obviously >not a penalty function. > >3. Do step-wise regression. I will segment the data into two >regions, first points with response less than 6 and the rest with >those above 6. The first step is a binary regression to determine if >the point belongs to which of the two groups. Then in the second >step, estimate ordered probit model for each group separately. The >question here is then, why I am choosing 6 as a cutting point >instead of others? > >Any suggestions would be really appreciated. Thank you. You could do the obvious, and lump categories such as 1-6 or 1-7 together to make a composite category. You don't mention the size of your dataset. If there are 10,000 data, you might live with a 1% category. If you only have 100 data, you have too many categories. Also, next time plan your study and training better so that next time your categories are fully utilized. And don't use so many categories. People have trouble even selecting responses on a 5-level scale. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pretty report
At 09:13 PM 6/12/2007, Don wrote: >At 5:01 PM -0400 6/12/07, Weiwei Shi wrote: > >Dear Listers: > > > >I have a couple of data frames to report and each corresponds to > >different condtions, e.g. conditions=c(10, 15, 20, 25). In this > >examples, four data frames need to be exported in a "pretty" report. > > > >I knew Perl has some module for exporting data to Excel and after > >googling, I found R does not. > >I use write.table(), name the file with ".xls" as the suffix, then >outside R I double-click on it and it opens in Excel. Granted, it's a >text file, and Excel is opening a text file, but none the less, I > Note that files with a ".csv" extension are also associated with Excel and can opened with a double-click. Comma-separated-value files also can be unambiguously loaded by Excel without parsing. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
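A minimal sketch for the original question, writing one .csv per condition (dummy data frames stand in for the real ones):

conditions <- c(10, 15, 20, 25)
results <- lapply(conditions, function(k) data.frame(condition = k, value = rnorm(5)))
for (i in seq_along(results))
  write.csv(results[[i]],
            file = paste("report_", conditions[i], ".csv", sep = ""),
            row.names = FALSE)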
Re: [R] Appropriate regression model for categorical variables
At 01:45 PM 6/12/2007, Tirtha wrote:
>Dear users,
>In my psychometric test I have applied logistic regression on my data. My data consists of 50 predictors (22 continuous and 28 categorical) plus a binary response.
>
>Using glm() and stepAIC() I didn't get a satisfactory result, as the misclassification rate is too high. I think the categorical variables are responsible for this debacle. Some of them have more than 6 levels (one has 10 levels).
>
>Please suggest some better regression model for this situation. If possible you can suggest some article.

1. Check whether a factor with many levels has a natural order to its levels. If so, consider fitting the factor as an ordered factor.

2. Break the factor levels into 2 or 3 groups that have some rational connection, and then fit the factor with a smaller number of levels. E.g., "race" might have levels "white", "black", "asian", "pacific", "Spanish surname", "other". Consider a change to "white", "nonwhite".

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
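Two small sketches of those suggestions (the variable names and levels are only illustrative):

# 1. declare a natural ordering so the factor enters the model as ordered
edu <- factor(c("low", "mid", "high", "mid"),
              levels = c("low", "mid", "high"), ordered = TRUE)

# 2. collapse many levels into two broad groups
race  <- factor(c("white", "black", "asian", "white", "other"))
race2 <- factor(ifelse(race == "white", "white", "nonwhite"))
table(race2)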
Re: [R] p-value from GEE why factor 2*pnorm?
At 11:21 AM 6/11/2007, Carmen wrote:
>In this thread there is a hint on how to calculate the p-value of a GEE:
>  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/74150.html
>
> > Then, get the P values using a normal approximation for the
> > distribution of z:
> >
> > 2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)
> > (Intercept) TPTLD 0. 0.04190831
>
>1. Why is the result multiplied by 2? There is a P-value between 1 and 2 with the results below when multiplied by 2:
>
>2*pnorm(c(1.8691945,0.5882351,2.4903091,1.9287802,2.3172983,2.2092593,2.2625959,1.6395695), lower.tail=TRUE)

1. The formula given in the thread mentioned was:

2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)

2. The reason for the "2" at the front is to make it an "equal-tails" or "2-sided" confidence interval. Pedantically, you should use 1.96 instead of 2.0 for consistency, but 2.0 = 1.96 rounded to one decimal place.

3. This is what is usually called a "Wald" type confidence interval, as it is simply the normal quantile (+/- 1.96) multiplied by the standard error of estimate to get the +/- widths for the interval. These would be added to the estimate itself to get the final Wald confidence interval, which obviously assumes a normal distribution applies.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
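With some made-up z statistics, the two-sided calculation looks like this:

z <- c(1.87, 0.59, 2.49)
2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
pnorm(abs(z), lower.tail = FALSE)       # one-sided, i.e. half as large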
Re: [R] Windows vista's early terminate Rgui execution
At 03:28 PM 6/10/2007, [EMAIL PROTECTED] wrote: >Hi,I have a frustrating problem from vista that I wonder if anyone >has come across the same problem. I wrote a script that involves >long computational time (although, during the calculation, it spits >out text on the gui to notify me the progress of the calculation >periodically). Windows vista always stopped my calculation and >claimed that 'Rgui is stop-working. Windows is checking for >solution.' And when I looked into task manager, windows already >stopped my Rgui process. I am quite disappointed with this. I would >really appreciate if anyone finds a solution to go around this >windows vista problem? Particularly, how to turn off this feature in >vista? Any help would be really appreciated. Thank you!- adschai You probably need to contact Vista periodically so it knows you are awake. Just include a line that does a call to Vista that doesn't do output, such as useless <- dir() placed in some outer loop that satisfies the "drop dead" time between calls. Alternatively, you can attempt to find out how to change the registry entry corresponding to the wait time and increase it to a value you can live with. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What ECDF function?
At 06:36 PM 6/9/2007, Marco wrote: >On 6/9/07, Robert A LaBudde <[EMAIL PROTECTED]> wrote: > > At 12:57 PM 6/9/2007, Marco wrote: > > > > > >Hmmm I'm a bit confused, but very interested! >So you don't use the R "ecdf", do you? Only when an i/n edf is needed (some tests, such as ks.test() are based on this). Also, I frequently do modeling in Excel as well, where you need to enter your own formulas. > > > Also remember that edfs are not very accurate, so the differences > > between these formulae are difficult to justify in practice. > > > >I will bear in min! My first interpretation was that using some >different from i/n (e.g. i/(n+1)), let to better individuate tail >differences (maybe...) The chief advantage to i/(n+1) is that you don't generate 1.0 as an abscissa, as you do with i/n. But the same is true of (i-0.5)/n, and it's more accurate. Unless you need to do otherwise, just use ecdf(), because it matches the theory for most uses, and it almost always doesn't matter that it's slightly less accurate than other choices. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What ECDF function?
At 12:57 PM 6/9/2007, Marco wrote: > >2.I found various version of P-P plot where instead of using the >"ecdf" function use ((1:n)-0.5)/n > After investigation I found there're different definition of ECDF >(note "i" is the rank): > * Kaplan-Meier: i/n > * modified Kaplan-Meier: (i-0.5)/n > * Median Rank: (i-0.3)/(n+0.4) > * Herd Johnson i/(n+1) > * ... > Furthermore, similar expressions are used by "ppoints". > So, > 2.1 For P-P plot, what shall I use? > 2.2 In general why should I prefer one kind of CDF over another one? > This is an age-old debate in statistics. There are many different formulas, some of which are optimal for particular distributions. Using i/n (which I would call the Kolmogorov method), (i-1)/n or i/(n+1) is to be discouraged for general ECDF modeling. These correspond in quality to the rectangular rule method of integration of the bins, and assume only that the underlying density function is piecewise constant. There is no disadvantage to using these methods, however, if the pdf has multiple discontinuities. I tend to use (i-0.5)/n, which corresponds to integrating with the "midpoint rule", which is a 1-point Gaussian quadrature, and which is exact for linear behavior with derivative continuous. It's simple, it's accurate, and it is near optimal for a wide range of continuous alternatives. The formula (i- 3/8)/(n + 1/4) is optimal for the normal distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so there is no real benefit to using it. Similarly, there is a formula (i-.44)/(N+.12) for a Gumbel distribution. If you do know for sure (don't need to test) the form of the distribution, you're better off fitting that distribution function directly and not worrying about the edf. Also remember that edfs are not very accurate, so the differences between these formulae are difficult to justify in practice. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
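A small sketch comparing the plotting positions (ppoints() uses (i - 3/8)/(n + 1/4) for n <= 10 and (i - 0.5)/n otherwise):

set.seed(1)
x <- sort(rnorm(20))
n <- length(x); i <- seq_len(n)
cbind(kolmogorov = i/n, midpoint = (i - 0.5)/n, ppoints = ppoints(n))[1:3, ]
# P-P plot against a fitted normal, using the midpoint positions
plot(pnorm(x, mean(x), sd(x)), (i - 0.5)/n,
     xlab = "theoretical CDF", ylab = "empirical CDF")
abline(0, 1)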
[R] glm() for log link and Weibull family
I need to be able to run a generalized linear model with a log() link and a Weibull family, or something similar to deal with an extreme value distribution. I actually have a large dataset where this is apparently necessary. It has to do with recovery of forensic samples from surfaces, where as much powder as possible is collected. This apparently causes the results to conform to some type of extreme value distribution, so Weibull is a reasonable starting point for exploration. I have tried ('surface' and 'team' are factors) glm(surfcount ~ surface*team, data=powderd, family=Gamma(link='log')) but this doesn't quite do the trick. The standardized deviance residuals are still curved away from normal at the tails. Thanks for any info you can give on this nonstandard model. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
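One route sometimes suggested for this kind of data (a sketch only, not necessarily the right model here, since glm() has no Weibull family) is an accelerated-failure-time fit with Weibull errors via survreg() in the survival package; 'powderd', 'surfcount', 'surface' and 'team' are the objects from the post and are assumed to be available:

library(survival)
# every observation treated as an exact (uncensored) response
fit <- survreg(Surv(surfcount, rep(1, length(surfcount))) ~ surface * team,
               data = powderd, dist = "weibull")
summary(fit)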
Re: [R] pnorm how to decide lower-tail true or false
At 01:31 PM 6/8/2007, Carmen wrote:
>Hi to all,
>maybe the last question was not clear enough. I did not find any hints on how to decide whether one should use lower.tail or not. As it is an extra R feature (written in http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html ) I do not find anything about it in any of my statistics books.
>Regards Carmen

pnorm(z, lower.tail=TRUE) (the R default) gives the probability of a normal variate being at or below z. This is the value commonly called the cumulative distribution function at the point z, or the integral from -Inf to z of the gaussian density.

pnorm(z, lower.tail=FALSE) gives the complement of the above, or 1 - cdf(z), and is the integral from z to Inf of the gaussian density. E.g.,

> pnorm(1.96, lower.tail=TRUE)
[1] 0.9750021
> pnorm(1.96, lower.tail=FALSE)
[1] 0.02499790

Use lower.tail=TRUE if you are, e.g., finding the probability at the lower tail of a confidence interval, or if you want the probability of values no larger than z.

Use lower.tail=FALSE if you are, e.g., calculating the significance of a test value or working at the upper confidence limit, or you want the probability of values z or larger.

You should use pnorm(z, lower.tail=FALSE) instead of 1-pnorm(z) because the former returns a more accurate answer for large z.

This is a really simple issue, and has no inherent complexity associated with it.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Finding density curve peak
At 07:34 PM 6/2/2007, Davendra wrote:
>I have a dataset where I get a density curve of a continuous variable with two peaks.
>How can I get the peaks?
>Any simple solutions?

Plot the curve, and then use identify() and the mouse to click on the peaks.

========
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
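A non-interactive alternative (a small sketch that picks out local maxima of the density() estimate):

set.seed(1)
x <- c(rnorm(200, 0), rnorm(200, 4))              # bimodal example data
d <- density(x)
peaks <- which(diff(sign(diff(d$y))) == -2) + 1   # points higher than both neighbours
d$x[peaks]                                        # approximate peak locations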
[R] Curve crosses back to origin in plot
Another sample problem: In the Windows version of R-2.5.0, data(GHQ,package='HSAUR') layout(1) GHQ_glm_1<- glm(cbind(cases,non.cases) ~ GHQ, data=GHQ, family=binomial()) summary(GHQ_glm_1) yfit_glm_1<- predict(GHQ_glm_1, type='response') layout(1) plot(probs ~ GHQ,pch=1,col=1,ylab='Probability(Case)', data=GHQ) lines(yfit_glm_1 ~ GHQ, pch=3,col=3, data=GHQ) legend("topleft", c("linear", "logistic"), pch=c(2,3), col=c(2,3)) Everything is fine, but the predicted values curve in the lines() statement becomes closed by a straight line segment connecting the last point to the first. How can this be avoided? It appears to be wrapping due to the change from GHQ=10 for female to GHQ=0 again for male, bring the curve back to the beginning value. One way to avoid this is to plot each sex's data in a separate lines() call. Is there a way of doing this in a single lines() statement? If not, I'd like to know. If two lines() are needed, I'd like to know an efficient syntax. My attempt would be lines(yfit_glm_1[0:10] ~ GHQ[0:10], pch=3,col=3, data=GHQ) lines(yfit_glm_1[11:22] ~ GHQ[11:22], pch=3,col=3, data=GHQ) which seems inelegant, as it involves numerical ranges that have to be determined by inspection. Thanks again for answering these simple questions that seem to be the hardest to find answers for. ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to reference or sort rownames in a data frame
Thanks, Gabor. I have to say I wouldn't have figured this out easily. I'd summarize your comments by: 1. Remember to use arrays of logicals as indices. 2. Remember %in% for combination matches. 3. Remember which() to get indices. It is the small tasks which appear most difficult to figure out in R. At 10:29 PM 5/27/2007, Gabor wrote: >On 5/27/07, Robert A. LaBudde <[EMAIL PROTECTED]> wrote: >>As I was working through elementary examples, I was using dataset >>"plasma" of package "HSAUR". >> >>In performing a logistic regression of the data, and making the >>diagnostic plots (R-2.5.0) >> >>data(plasma,package='HSAUR') >>plasma_1<- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial()) >>layout(matrix(1:4,nrow=2)) >>plot(plasma_1) >> >>I find that data points corresponding to rownames 17 and 23 are >>outliers and high leverage. >> >>I would then like to perform a fit without these two rows. >> >>In principle this should be easy, using an update() with subset=-c(17,23). >> >>The problem is that the rownames in this dataset are not ordered, >>and, in fact, the relevant rows are 30 and 31, not 17 and 23. >> >>This brings up the following (elementary?) questions: >> >>1. How do you reference rows in "subset=" for which you know the >>rownames, but not the row numbers? > >Use a logical vector: > > rownames(plasma) %in% c(17, 23) > >> >>2. How do you discovery the rows corresponding to particular >>rownames? (Using plasma[rownames(plasma)==17,] shows the data, but >>NOT the row number!) (Probably the same answer as in Q. 1 above.) > > which(rownames(plasma) %in% c(17, 23)) # 30, 31 > >> >>3. How do you sort (order) the rows of an existing data frame so that >>the rownames are in order? > > > plasma[order(as.numeric(rownames(plasma))), ] Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to reference or sort rownames in a data frame
As I was working through elementary examples, I was using dataset "plasma" of package "HSAUR".

In performing a logistic regression of the data, and making the diagnostic plots (R-2.5.0):

data(plasma, package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
layout(matrix(1:4, nrow=2))
plot(plasma_1)

I find that data points corresponding to rownames 17 and 23 are outliers and high leverage. I would then like to perform a fit without these two rows. In principle this should be easy, using an update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not ordered, and, in fact, the relevant rows are 30 and 31, not 17 and 23.

This brings up the following (elementary?) questions:

1. How do you reference rows in "subset=" for which you know the rownames, but not the row numbers?

2. How do you discover the rows corresponding to particular rownames? (Using plasma[rownames(plasma)==17,] shows the data, but NOT the row number!) (Probably the same answer as in Q. 1 above.)

3. How do you sort (order) the rows of an existing data frame so that the rownames are in order?

I don't seem to know the magic words to find the answers to these questions in the help systems. Obviously this can be done by writing new, brute-force functions scanning the subscripts, but there must be an (obvious?) direct way of doing this more elegantly.

Thanks for any pointers.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Survival statistics--displaying multiple plots
The layout() function below states there are to be 3 graphs on the first row, side by side. Each time you plot, R applies the graph to the next free position on the layout. After 3 plots with your different boolean subsets, you end up with 3 graphs side by side.

If this is not what you want (you asked for this in your first request), and instead you want 3 curves on the same, single graph, use plot() for the first curve(s) and then use lines() to add the additional curves. Check the survfit() object to see how to pick off individual times and survival curves by subscripting.

At 12:09 AM 5/3/2007, Greg wrote:
>Thanks for replying Robert. Forgive me, it might be the hour or my limitations, but I am a little unclear on how to implement your suggestion.
>
>In my original example,
>
> plot(survfit(Surv(days,status==1),subset(tips,meld<10))
>
>a plot of the fraction of patients surviving following the procedure against the number of days since the procedure would be generated for patients with meld scores of less than 10.
>
>Similarly, if I wanted to generate a survival curve of patients with scores of between 10 and 20, I can with the following:
>
> plot(survfit(Surv(days,status==1),subset(tips,meld>10 & meld<20))
>
>And for patients with meld>20,
>
> plot(survfit(Surv(days,status==1),subset(tips,meld>20))
>
>But how do I display the curves in each cadre (meld<10, 10<meld<20, meld>20) on the same chart?
>
>-----Original Message-----
>From: Robert A LaBudde [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, May 02, 2007 11:48 PM
>To: Gregory Pierce
>Subject: Re: [R] Survival statistics--displaying multiple plots
>
>? layout()
>? par()
>
>E.g.,
>
>layout(matrix(c(1,2,3),1,3,byrow=TRUE))   # 3 plots side-by-side
>
>Then use plot() three times to generate each of your graphs.
>
>At 11:14 PM 5/2/2007, Greg wrote:
> >I should clarify. I can generate plots for each category individually but not for all three on the same chart.
> >
> >Greg
> >
> >-----Original Message-----
> >From: Gregory Pierce [mailto:[EMAIL PROTECTED]]
> >Sent: Wednesday, May 02, 2007 10:21 PM
> >To: 'r-help@stat.math.ethz.ch'
> >Subject: Survival statistics--displaying multiple plots
> >
> >Hello all!
> >
> >I am once again analyzing patient survival data with chronic liver disease.
> >
> >The severity of the liver disease is given by a number which is continuously variable. I have referred to this number as "meld"--model for end stage liver disease--which is the result of a mathematical calculation on underlying laboratory values. So, for example, I can generate a Kaplan-Meier plot of patients undergoing a TIPS procedure with the following:
> >
> > plot(survfit(Surv(days,status==1),subset(tips,meld<10))
> >
> >where "tips" is my data set, "days" is the number of days alive, and meld is the meld score.
> >
> >What I would like to do is display the survival graphs of patients with meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.
> >
> >Any suggestions would be appreciated.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
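A sketch of the single-plot approach, using cut() to form the meld groups and a grouping formula in survfit() ('tips', 'days', 'status' and 'meld' are the objects from the posting and are assumed to be available):

library(survival)
tips$meldgrp <- cut(tips$meld, breaks = c(-Inf, 10, 20, Inf),
                    labels = c("meld<10", "10-20", "meld>20"))
fit <- survfit(Surv(days, status == 1) ~ meldgrp, data = tips)
plot(fit, lty = 1:3, col = 1:3, xlab = "Days", ylab = "Fraction surviving")
legend("bottomleft", legend = levels(tips$meldgrp), lty = 1:3, col = 1:3)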
Re: [R] Survival statistics--displaying multiple plots
? layout()
? par()

E.g.,

layout(matrix(c(1,2,3),1,3,byrow=TRUE))   # 3 plots side-by-side

Then use plot() three times to generate each of your graphs.

At 11:14 PM 5/2/2007, Greg wrote:
>I should clarify. I can generate plots for each category individually but not for all three on the same chart.
>
>Greg
>
>-----Original Message-----
>From: Gregory Pierce [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, May 02, 2007 10:21 PM
>To: 'r-help@stat.math.ethz.ch'
>Subject: Survival statistics--displaying multiple plots
>
>Hello all!
>
>I am once again analyzing patient survival data with chronic liver disease.
>
>The severity of the liver disease is given by a number which is continuously variable. I have referred to this number as "meld"--model for end stage liver disease--which is the result of a mathematical calculation on underlying laboratory values. So, for example, I can generate a Kaplan-Meier plot of patients undergoing a TIPS procedure with the following:
>
> plot(survfit(Surv(days,status==1),subset(tips,meld<10))
>
>where "tips" is my data set, "days" is the number of days alive, and meld is the meld score.
>
>What I would like to do is display the survival graphs of patients with meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.
>
>Any suggestions would be appreciated.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Is R's fast fourier transform function different from "fft2" in Matlab?
Discrete Fourier transforms can be normalized in different ways. Some apply the whole normalization to the forward transform, some to the reverse transform, some apply the square root to each, and some don't normalize at all (in which case the reverse of the forward transform will need scaling). The latter is apparently the case with R, according to your values. Note that the R and the MatLab answers agree to within a scale factor for each row.

At 10:53 PM 5/2/2007, Li-Li wrote:
>Thanks for both replies.
>Then I found the "ifft2" from Matlab gives a different result from "fft( , inverse=T)" from R.
>An example, in R:
>
> > temp <- matrix(c(1,4,2,20), nrow=2)
> > fft(temp)
>        [,1]    [,2]
> [1,]  27+0i  -17+0i
> [2,] -21+0i   15+0i
> > fft(temp, inverse=T)
>        [,1]    [,2]
> [1,]  27+0i  -17+0i
> [2,] -21+0i   15+0i
>
>In Matlab:
>
> >> A = [1,2;4,20];
> >> fft2(A)
> ans =
>     27   -17
>    -21    15
> >> ifft2(A)
> ans =
>     6.7500   -4.2500
>    -5.2500    3.7500
>
>I also tried mvfft with inverse but can't get the same result as "ifft2". Does any function work?

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
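A quick check of the scaling in R:

temp <- matrix(c(1, 4, 2, 20), nrow = 2)
f <- fft(temp)                          # forward transform, unnormalized
fft(f, inverse = TRUE) / length(temp)   # dividing by N recovers the original matrix
# MatLab's ifft2() builds the 1/N factor in; R's fft(..., inverse=TRUE) does not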
Re: [R] upgrade to 2.5
At 01:41 PM 5/2/2007, you wrote: >On 5/2/07, Sundar Dorai-Raj <[EMAIL PROTECTED]> wrote: > > > > > > Iasonas Lamprianou said the following on 5/2/2007 8:25 AM: > > > Hi I am using R version 2.4.1. How can I upgrade to version 2.5 > without having to install all the packages again? > > > Thanks > > > Jason > > > > > > > You may find the following link relevant. > > > > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75359.html > > > >if you use Windows XP. This link was useful to me, as I am new to R. (Win2000, R-2.5.0) What I have been doing is using a file compare utility (Beyond Compare in my case) to move files in the old "library" directory to the new one, if the files are missing in the new one. Then I perform an update.packages command. This procedure appears to work without problem. It would seem much preferable to have all packages saved in an installation-independent directory, instead of a "library" directory under R's installation directory. Then, of course, no update would be necessary. I can't find how this option is settable in R, other than a direct argument to library() or install.package(). How does one shift the R default libraries location to a particular directory? Thanks. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
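One way to do that (a sketch; the directory is just an example and must already exist) is to keep a personal library outside the R installation tree and point each R version at it:

.libPaths("C:/Rlibs")         # prepend a version-independent library to the search path
.libPaths()                   # verify
install.packages("outliers", lib = "C:/Rlibs")
library(outliers, lib.loc = "C:/Rlibs")
# setting the R_LIBS environment variable to C:/Rlibs makes this automatic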
Re: [R] Log-likelihood function
At 07:30 AM 5/2/2007, Doxastic wrote: >Thanks. I used this and it gave me the same result as the "logLik" function. >The reason I ask is the SAS output gives me a loglik = 1089. R gives me >-298.09583. Both for my reduced model. For the saturated (or complex) >model, SAS gives me an loglik = 1143. R gives me -298.1993. The problem is >these give two very different pictures about whether I can drop the >interaction. However, I think the residual deviance in the R output is >equal to G^2. So, I can just take the difference between those two. If I >do this, I get a difference with an interpretation similar to that of what >comes from SAS. So I think I'll just go with that. But who knows if I'm >right (not me)? Some comments: 1. Use summary() on your glm() object to get a fuller display of post-fit statistics, including the starting ("null") and residual deviances. 2. The "deviance" is - 2 L, where L = ln(likelihood). 3. To test two nested models for the difference in covariates, subtract the two residual deviances and two d.f. and perform a chi-square test. This can be done nicely by anova() on the two glm() objects. 4. Check the coefficients in your SAS and R models and make sure you are performing the same fit in both cases. ======== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
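A small sketch of point 3, with made-up data:

set.seed(1)
dat <- data.frame(a = gl(2, 50), b = gl(2, 25, 100), y = rbinom(100, 1, 0.4))
fit1 <- glm(y ~ a + b, family = binomial, data = dat)   # reduced model
fit2 <- glm(y ~ a * b, family = binomial, data = dat)   # adds the interaction
anova(fit1, fit2, test = "Chisq")   # chi-square test on the drop in deviance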
Re: [R] bivariate normal distribution in likelihood
At 11:32 PM 4/30/2007, Deepankar wrote: >Hi all, > >I am trying to do a maximum likelihood estimation. In my likelihood >function, I have to evaluate a double integral of the bivariate >normal density over a subset of the points of the plane. For >instance, if I denote by y the y co-ordinate and by x, the x >co-ordinate then the area over which I have to integrate is the >following set of points, A: >A = {x>=0 & y < 3x+10} > >I have used the following code to calculate this double integral: > >x <- rmvnorm(10, mean=me, sigma=sig) >c <- nrow(x) >x1 <- x[ ,1] >x2 <- x[ ,2] >e1 <- as.numeric(x2 < 3*x1 + 10 & x1>0) >p1 <- sum(e1)/c > >In this code, I provide the mean and covariance while drawing the >bivariate random normal variables and get "p1" as the required >answer. The problem is that I have to draw at least 100,000 >bivariate random normals to get a reasonable answer; even then it is >not very accurate. > >Is there some other way to do the same thing more accurately and >more efficiently? For instance, can this be done using the bivariate >normal distribution function "pmvnorm"? Also feel free to point our >errors if you see one. Simple random sampling is a poor way to evaluate an integral (expectation). It converges on the order of 1/sqrt(N). Stratified random sampling would be better, as it converges on the order of 1/N. Even better is product Gauss-Hermite quadrature which will give a very accurate answer with a few dozen points. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
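Another deterministic route (a sketch, not the Gauss-Hermite rule suggested above): condition on x, which reduces the probability to a one-dimensional integral that integrate() handles quickly. Here 'me' and 'sig' play the same roles as in the original code, with example values filled in:

me  <- c(1, 2)
sig <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
f <- function(x) {
  mu.y <- me[2] + sig[1, 2]/sig[1, 1] * (x - me[1])   # E(y | x)
  sd.y <- sqrt(sig[2, 2] - sig[1, 2]^2/sig[1, 1])     # SD(y | x)
  dnorm(x, me[1], sqrt(sig[1, 1])) * pnorm(3*x + 10, mu.y, sd.y)
}
integrate(f, lower = 0, upper = Inf)   # P(x >= 0 and y < 3x + 10)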
Re: [R] How to solve difficult equations?
At 03:15 AM 4/25/2007, francogrex wrote: >This below is not solvable with uniroot to find "a": >fn=function(a){ >b=(0.7/a)-a >(1/(a+b+1))-0.0025 >} >uniroot(fn,c(-500,500)) gives >"Error in uniroot(fn, c(-500, 500)) : f() values at end points not of >opposite sign" > >I read R-help posts and someone wrote a function: >http://finzi.psych.upenn.edu/R/Rhelp02a/archive/92407.html >but it is not very precise. Is there any '"standard" function in R that can >solve this? thanks. Actually, if you're solving fn(a)==0, then some trivial algebra leads to a linear equation with a=0.001754. Why are you trying to solve this numerically? Is it a single instance of a larger, more general problem? ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
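The algebra, with a quick check:

# 1/(a + b + 1) = 0.0025 with b = 0.7/a - a  gives  0.7/a + 1 = 400, so a = 0.7/399
a <- 0.7/399
a        # 0.001754386
fn <- function(a) { b <- (0.7/a) - a; 1/(a + b + 1) - 0.0025 }
fn(a)    # essentially zero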
[R] Use of Lexis function to convert survival data to counting format
I'm trying to convert a dataset from the time-independent analysis form

> head(addicts)
  id clinic status survtime prison meth clinic01
1  1      1      1      428      0   50        1
2  2      1      1      275      1   55        1
3  3      1      1      262      0   55        1

into the "counting data format" necessary to perform extended Cox regression.

I got survSplit() to work, but it appears to have a problem with end times on the boundaries of the intervals. (My intervals are [0,183], [184,365], [366,548], [549,730], [730,Inf].) I looked in the searchable archives and found that others had also discovered problems with survSplit() and suggested Lexis() in the Epi package.

When I try to get Lexis to work, I get an error message that I cannot interpret:

> addictscdf <- Lexis(entry=start, exit=stop, fail='status', breaks=c(184,366,549,730),
+   include=c('id','clinic','prison','meth','clinic01'), data=addicts)
The following object(s) are masked from addicts :
 clinic id meth prison status survtime
Error: object is not subsettable

I'm sure I'm making some type of trivial error here, but I don't know how to find it. I would guess that it has something to do with 'start' and 'stop', of which arguments I am apparently clueless as to the meaning.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Using R to create pdf's from each file in a directory
At 11:06 PM 4/20/2007, I wrote: >At 09:40 PM 4/20/2007, gecko951 wrote: > > > >list <- dir("/tmp/data") > >for(x in list){ > >d <- read.table(x, sep="\t", header=TRUE) # read data > >pdf("/tmp/graph/x.pdf") # file for graph > > > >I'm a tyro at R, but it's obvious here that the line > >pdf("/tmp/graph/x.pdf") > >has the intended file name string 'x' embedded with the literal >string enclosed in quotation marks. Obviously he needs "/tmp/graph/" >string concatenated with x and then ".pdf". > >I would suggest a fix, but I am unable to find in the documentation >how to concatenate two strings into a single string. > >So I will amplify gecko951's question to include "How you you >concatenate two strings in R?". I.e., > >x<-"abc" >y<-"def" > >What operator or function returns "abcdef" as a result? > I found that paste() can be used for string concatenation, so I can now suggest a fix to gecko951's problem. Replace pdf("/tmp/graph/x.pdf") with pdf(paste(paste('/tmp/graph/',x,sep=''),'.pdf',sep='')) and the program should now work for the different file names. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
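Putting the whole loop together (a sketch; note that dir() returns bare file names, so full.names = TRUE, or a setwd() first, is needed before read.table() can find the files; the plot itself is just a placeholder):

files <- dir("/tmp/data", full.names = TRUE)
for (f in files) {
  d    <- read.table(f, sep = "\t", header = TRUE)
  base <- sub("\\.[^.]*$", "", basename(f))        # file name without its extension
  pdf(paste("/tmp/graph/", base, ".pdf", sep = ""))
  plot(d)                                          # whatever graph is wanted
  dev.off()
}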
Re: [R] Using R to create pdf's from each file in a directory
At 09:40 PM 4/20/2007, gecko951 wrote:
>
>list <- dir("/tmp/data")
>for(x in list){
>d <- read.table(x, sep="\t", header=TRUE) # read data
>pdf("/tmp/graph/x.pdf") # file for graph
>

I'm a tyro at R, but it's obvious here that the line

pdf("/tmp/graph/x.pdf")

has the intended file-name variable 'x' embedded inside a quoted literal string. Obviously he needs the "/tmp/graph/" string concatenated with x and then ".pdf".

I would suggest a fix, but I am unable to find in the documentation how to concatenate two strings into a single string.

So I will amplify gecko951's question to include "How do you concatenate two strings in R?". I.e.,

x <- "abc"
y <- "def"

What operator or function returns "abcdef" as a result?

Thanks.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"