[R] S-Plus "resample" package and associated functions
Are there any packages in R that reproduce the "resample" package of S-Plus? The sample() function in R doesn't provide the flexibility of bootstrap() and bootstrap2().

========
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
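For instance, a minimal sketch of an ordinary bootstrap with the "boot" package (only a rough analogue; it does not claim to cover everything bootstrap() and bootstrap2() did in S-Plus):

library(boot)                            # ships with R
set.seed(1)
x <- rnorm(30, mean = 10)
boot.mean <- function(d, i) mean(d[i])   # statistic takes the data and resampled indices
b <- boot(x, statistic = boot.mean, R = 2000)
boot.ci(b, type = c("perc", "bca"))      # percentile and BCa intervals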
Re: [R] Excel
At 01:21 AM 8/28/2007, David wrote: >On Tue, 28 Aug 2007, Robert A LaBudde wrote: > >>If you format the column as "Text", you won't have this problem. By >>leaving the cells as "General", you leave it up to Excel to guess at >>the correct interpretation. > >Not true actually. I had converted the column to Text because I saw >the interpretation as a date in the .xls file. I saved the .csv file >*after* the column had been converted to Text. Looking at the .csv >file in a text editor, the entry is correct. > You need to convert the column to Text before you enter the data. This tells Excel the presumption to use. Converting to Text after you enter the values has no effect on previously entered values. I've tested this using Excel 2000. ======== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Excel
If you format the column as "Text", you won't have this problem. By leaving the cells as "General", you leave it up to Excel to guess at the correct interpretation. You will note that the conversion to a date occurs immediately in Excel when you enter the value. There are many formats to enter dates. Either pre-format the column as Text, or prefix the individual entry with an ' to indicate text. A similar problem occurs in R's read.table() function when a factor has levels that can be interpreted as numbers. At 10:11 PM 8/27/2007, David wrote: >A common process when data is obtained in an Excel spreadsheet is to save >the spreadsheet as a .csv file then read it into R. Experienced users >might have learned to be wary of dates (as I have) but possibly have not >experienced what just happened to me. I thought I might just share it with >r-help as a cautionary tale. > >I received an Excel file giving patient details. Each patient had an ID >code in the form of three letters followed by four digits. (Actually a New >Zealand National Health Identification.) I saved the .xls file as .csv. >Then I opened up the .csv (with Excel) to look at it. In the column of ID >codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699. > >In a column of character data, Excel had interpreted AUG2699 as a date. > >The .csv did not actually have a date in that cell, but if I had saved the >.csv file it would have. > >David Scott Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
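On the R side of the same problem, a minimal sketch of reading such a file while forcing the ID column to stay character (the file name and column name are made up):

# colClasses keeps read.csv() from reinterpreting the IDs as numbers or factors
pat <- read.csv("patients.csv", colClasses = c(ID = "character"))
str(pat$ID)    # stays character, e.g. "AUG2699"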
Re: [R] chi-square statistics
At 12:41 AM 8/24/2007, Karen wrote: >I'm wondering if R has functions to return pvalues with given x-squares and >df. I googled for it but couldn't seem to find one. Appreciate any helps >here. >An example: df=4, x<- c(33.69, 32.96, 30.90) which are the statistic for >chi-square, I'd like to get the corresponding pvalues for each values in x. ? pchisq E.g. pchisq(x, df, lower.tail=FALSE) ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
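For the example given:

x <- c(33.69, 32.96, 30.90)
pchisq(x, df = 4, lower.tail = FALSE)
# roughly 8.6e-07, 1.2e-06, 3.2e-06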
[R] Conditional logistic regression on n:m matched "cohort" data [Corrected]
[Corrected the model formula to include "method".] I am designing an interlaboratory validation study for a presence/absence alternative method test kit vs. a presence/absence reference method test kit. There will be 10 laboratories conducting tests using both methods. In each laboratory, there will be 5 specimens tested, each of the 5 specimens twice by both methods (alternative, standard). The total number of data are 10 x 5 x 4 = 200. The general structure is as follows: id: sequential result label, 1 to 200. lab: a factor which has 10 levels, one for each participating lab. specimen: 1 to 5 in each lab, no connection between labs. method: A or R. result: 0 or 1 (absence or presence). This experiment appears to be equivalent to a matched cohort study. The matching is done on specimens, and there are 2 alternative and 2 reference results for each specimen. The sketchy description for clogit() of package "survival" presumes a matched case-control study is being analyzed. I am looking for the correct syntax to use in clogit() for this experiment. Is library('survival') clogit(result ~ lab + method + strata(specimen), method="exact", data=mydata) all that is required? Thanks. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Conditional logistic regression on n:m matched "cohort" data
I am designing an interlaboratory validation study for a presence/absence alternative method test kit vs. a presence/absence reference method test kit. There will be 10 laboratories conducting tests using both methods. In each laboratory, there will be 5 specimens tested, each of the 5 specimens twice by both methods (alternative, standard). The total number of data are 10 x 5 x 4 = 200. The general structure is as follows: id: sequential result label, 1 to 200. lab: a factor which has 10 levels, one for each participating lab. specimen: 1 to 5 in each lab, no connection between labs. method: A or R. result: 0 or 1 (absence or presence). This experiment appears to be equivalent to a matched cohort study. The matching is done on specimens, and there are 2 alternative and 2 reference results for each specimen. The sketchy description for clogit() of package "survival" presumes a matched case-control study is being analyzed. I am looking for the correct syntax to use in clogit() for this experiment. Is library('survival') clogit(result ~ lab + strata(specimen), method="exact", data=mydata) all that is required? Thanks. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Effect size, interactions, and main effects (Stats Question) [ffmanova]
Statistical significance is "detectability", and depends upon the size of the sample as well as the effect. A large enough experiment will result in statistical detectability of almost every interaction term allowed. This is why estimation, not testing, has become the consensus recommendation in statistics.

As a practical matter, evaluate the combined effect of your model terms with and without the interaction term(s) you are worried about. Is the reduction in accuracy of physical importance? If so, the interaction terms are required for scientific reasons. If not, present both results and indicate the acceptability (for interpolation) of the simpler model.

You should also make it your first priority to hypothesize why the interaction terms are meaningful and expected. If a cause can be found, it may suggest an alternate model that will eliminate the interactions, or allay your anxiety. If not, it may support your argument to simplify.

At 08:58 AM 7/21/2007, Mark wrote:
>Dear List Members,
>
>I would very much appreciate any pointers you could give me on the following matter:
>
>Main Question: To what extent does the "rule" that it is unreasonable to talk about main effects if there are significant interactions in a model depend upon the effect size [of the significant interaction terms]? Or is this not an issue?
>
>More practically: Suppose I were to carry out a so-called Type-II MANOVA (using ffmanova) and were to find that the interaction term in a 2-way analysis has borderline significance (say p = 0.045) and a small effect size, whereas one of the main effects is highly significant (say p = 6.8e-10) and has a large effect size.
>
>Would it in this case be reasonable for me to ignore the interaction term, and talk only about main effects? And, presuming the main question is fair, are there general guidelines concerning the relationship between level of significance and effect size for interaction terms?
>
>Thank you in advance for your help,

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] No convergence using ADAPT
What versions of "adapt" and R are you using? The current package was built with R-2.5.1. I tried your program with R-2.5.0, and got the answer 0.1501053 in just a few seconds. At 03:20 PM 7/7/2007, Philip wrote: >I am trying calculate a probability using numerical integration. The first >program I ran spit out an answer in a very short time. The program is below: > >## START PROGRAM > >trial <- function(input) > >{ >pmvnorm(lower = c(0,0), upper = c(2, 2), mean = input, sigma = >matrix(c(.1, 0, >0, .1), nrow = 2, ncol = 2, byrow = FALSE)) >} > >require(mvtnorm) >require(adapt) > >bottomB <- -5*sqrt(.1) >topB <- 2 + 5*sqrt(.1) >areaB <- (topB - bottomB)^2 > >unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), >minpts = 1000, eps = 1e-4, functn = trial) > >(1/areaB)*unscaled.Po.in.a$value > >## FINISH PROGRAM > >I tried to run the program again changing a.) sigma in the trial >function, b.) >upper in the trial function, and c.) the bounds of integration; that is, >bottomB and topB. The new program is below: > >## START PROGRAM > >trial <- function(input) > >{ >pmvnorm(lower = c(0,0), upper = c(10, 10), mean = input, sigma = >matrix(c(.01, >0, 0, .01), nrow = 2, ncol = 2, byrow = FALSE)) >} > >require(mvtnorm) >require(adapt) > >bottomB <- -5*sqrt(.01) >topB <- 10 + 5*sqrt(.01) >areaB <- (topB - bottomB)^2 > >unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB), >minpts = 1000, eps = 1e-4, functn = trial) > >(1/areaB)*unscaled.Po.in.a$value > >## FINISH PROGRAM > >Now, the program just runs and runs (48 hours at last count!). By playing >around with the program, I have deduced the program is highly sensitive to >changing the upper option in the trial function. For example, using a vector >like (4, 4) causes no problems and the program quickly yields an answer. I >have a couple of other programs where I can easily obtain a simulation-based >answer, but I would ultimately like to know what's up with this >program before >I give up on it so I can learn a thing or two. Does anyone have any clues or >tricks to get around this problem? My guess is that it will simply be very >difficult (impossible?) to obtain this type of relative error (eps = >1e-4) and >I will have no choice but to pursue the simulation approach. > >Thanks for any responses ([EMAIL PROTECTED])! > >-- Phil > >Philip Turk >Assistant Professor of Statistics >Northern Arizona University >Department of Mathematics and Statistics >PO Box 5717 >Flagstaff, AZ 86011 >Phone: 928-523-6884 >Fax: 928-523-5847 >E-mail: [EMAIL PROTECTED] >Web Site: http://jan.ucc.nau.edu/~stapjt-p/ > >__ >R-help@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] outlying
At 05:29 AM 6/19/2007, elyakhlifi wrote: >hello, >are there functions to detecte outlying observations in samples? >thanks. library('car') ? outlier.test library('outliers') ? grubbs.test ? dixon.test ? cochran.test ? chisq.out.test ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
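A quick sketch with the "outliers" package (it must be installed from CRAN first):

library(outliers)
set.seed(42)
x <- c(rnorm(20), 6)    # one planted outlier
grubbs.test(x)          # Grubbs test for the single most extreme value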
Re: [R] Iterative Solver [Converting Matlab's solve()]
At 12:24 AM 6/19/2007, Boris wrote: >I can't for the life of me figure out how to get the roots for this simple >(but sovable only iteratively) equation in R: > >x = z*(1-(1-2/z)^y > >where x and y are known, and z is unknown. In Matlab, this amounts to: > >[my.solution] = solve('x = z*(1-(1-2/z)^y') >my.solution.real = solution(y-1,y) > >% bottom line displays non-imaginary solution (last element) > 1. Your equation is syntactically incorrect. It is missing a ")". Ditto for your Matlab example. 2. Are you looking for an analytical (symbolic) solution? If so, I don't believe one exists (if you place a ")" after y). 3. To find numerical roots, see uniroot(). 4. If you want more help, you'll have to specify domains or values for x and y. ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
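For item 3, a minimal uniroot() sketch (the x and y values here are made up, and the interval is kept above z = 2 so the power is defined for non-integer y; the bracketing interval must give the function opposite signs at its endpoints):

x <- 3; y <- 5
f <- function(z) z * (1 - (1 - 2/z)^y) - x
uniroot(f, interval = c(2.0001, 1000))$root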
Re: [R] Optimization
You don't need optimization for the solution to your problem. You just need an understanding of the meaning of qnorm() and some simple algebra. Try: x<- (0.01-0.0032)/qnorm(0.7,0,1) At 12:01 PM 6/18/2007, you wrote: >Hi, I would like to minimize the value of x1-x2, x2 is a fixed value of 0.01, >x1 is the quantile of normal distribution (0.0032,x) with probability of >0.7, and the changing value should be x. Initial value for x is 0.0207. I am >using the following codes, but it does not work. > >fr <- function(x) { > x1<-qnorm(0.7,0.0032,x) > x2=0.01 > x1-x2 >} >xsd <- optim(0.0207, fr, NULL,method="BFGS") > >It is the first time I am trying to use optimization. Could anyone give me >some advice? >-- >View this message in context: >http://www.nabble.com/Optimization-tf3941212.html#a11178663 >Sent from the R help mailing list archive at Nabble.com. > >__ >R-help@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
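A quick check:

x <- (0.01 - 0.0032) / qnorm(0.7)    # qnorm(0.7,0,1) is about 0.5244
x                                    # about 0.01297
qnorm(0.7, mean = 0.0032, sd = x)    # recovers 0.01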
Re: [R] [Not R question]: Better fit for order probit model
At 01:29 PM 6/17/2007, adschai wrote:
>Thank you so much Robert. Please find the information below.
>
>The scale 1-10 are subjective physical condition ratings scored by inspection engineers at the site. 1-5 are in very bad conditions (bridge close down to seriously deteriorated). The rest from 6-10 are categorized as good condition. However, the proportion of samples in my data are as follows. Bridges with 1-5 scale covers about 2-3% of total population. The majority of the bridges are in 7-8. Certainly, I have enough data for model estimation but the mix of the proportion is good. I attached the detail of condition rating scale at the end of this message.

My guess is that you really have two distributions here: 1) bridges that are basically under proper supervision and repair (categories 6-10), and 2) those that are not (categories 1-5). These two classes would probably have dramatically different relations to the covariates you are using.

So my recommendation would be to consider splitting your response into two classes (0/1), each with 5 subcategories. Keeping the two classes merged together in a single group is equivalent to merging two different distributions with a weighting factor. This may be causing a bimodal distribution which is giving you problems.

You could try your modeling on each of the two classes separately before continuing with your full-dataset modeling. You may learn something useful to help you with your problems. For the full model, you would need to include a full set of interaction terms with "class".

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] [Not R question]: Better fit for order probit model
At 03:17 AM 6/16/2007, adschai wrote:
>Thank you so much Robert. I haven't thought about the idea of clumping categories together. One of the reasons is that these categories are bridge condition rating scores. They indeed represent different meanings and serviceability conditions. They vary from 0-9. I have about 300,000 data in which the first 5 labels, i.e. 0-4, are bad-condition bridges, and there are only less than 1000 instances in total. In the worst case, for example score 0 (meaning the bridge is not operable), I have 60 instances. Score 1 I have about 100.
>
>I would appreciate it if you could provide some opinion as to how you would make the ordered probit fit better in this case? Thank you so much in advance.

You certainly appear to have enough data to populate these categories. Your problems in getting a good fit may relate to other issues. You need to supply more information in order to say more. What are the definitions of each category? Is the ordering consistent, or are there really two different scales, one for bridges with essentially no problems, and another for those with serious damage? What evidence do you have that your fit is poor? What model are you fitting? Etc.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] [Not R question]: Better fit for order probit model
At 09:31 PM 6/15/2007, adschai wrote: >I have a model which tries to fit a set of data with 10-level >ordered responses. Somehow, in my data, the majority of the >observations are from level 6-10 and leave only about 1-5% of total >observations contributed to level 1-10. As a result, my model tends >to perform badly on points that have lower level than 6. > >I would like to ask if there's any way to circumvent this problem or >not. I was thinking of the followings ideas. But I am opened to any >suggestions if you could please. > >1. Bootstrapping with small size of samples each time. Howevever, in >each sample basket, I intentionally sample in such a way that there >is a good mix between observations from each level. Then I have to >do this many times. But I don't know how to obtain the true standard >error of estimated parameters after all bootstrapping has been done. >Is it going to be simply the average of all standard errors >estimated each time? > >2. Weighting points with level 1-6 more. But it's unclear to me how >to put this weight back to maximum likelihood when estimating >parameters. It's unlike OLS where your objective is to minimize >error or, if you'd like, a penalty function. But MLE is obviously >not a penalty function. > >3. Do step-wise regression. I will segment the data into two >regions, first points with response less than 6 and the rest with >those above 6. The first step is a binary regression to determine if >the point belongs to which of the two groups. Then in the second >step, estimate ordered probit model for each group separately. The >question here is then, why I am choosing 6 as a cutting point >instead of others? > >Any suggestions would be really appreciated. Thank you. You could do the obvious, and lump categories such as 1-6 or 1-7 together to make a composite category. You don't mention the size of your dataset. If there are 10,000 data, you might live with a 1% category. If you only have 100 data, you have too many categories. Also, next time plan your study and training better so that next time your categories are fully utilized. And don't use so many categories. People have trouble even selecting responses on a 5-level scale. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pretty report
At 09:13 PM 6/12/2007, Don wrote: >At 5:01 PM -0400 6/12/07, Weiwei Shi wrote: > >Dear Listers: > > > >I have a couple of data frames to report and each corresponds to > >different condtions, e.g. conditions=c(10, 15, 20, 25). In this > >examples, four data frames need to be exported in a "pretty" report. > > > >I knew Perl has some module for exporting data to Excel and after > >googling, I found R does not. > >I use write.table(), name the file with ".xls" as the suffix, then >outside R I double-click on it and it opens in Excel. Granted, it's a >text file, and Excel is opening a text file, but none the less, I > Note that files with a ".csv" extension are also associated with Excel and can opened with a double-click. Comma-separated-value files also can be unambiguously loaded by Excel without parsing. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
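A minimal sketch for the original question, writing one .csv per condition (dummy data frames stand in for the real ones):

conditions <- c(10, 15, 20, 25)
results <- lapply(conditions, function(k) data.frame(condition = k, value = rnorm(5)))
for (i in seq_along(results))
  write.csv(results[[i]],
            file = paste("report_", conditions[i], ".csv", sep = ""),
            row.names = FALSE)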
Re: [R] Appropriate regression model for categorical variables
At 01:45 PM 6/12/2007, Tirtha wrote:
>Dear users,
>In my psychometric test I have applied logistic regression on my data. My data consists of 50 predictors (22 continuous and 28 categorical) plus a binary response.
>
>Using glm() and stepAIC() I didn't get a satisfactory result, as the misclassification rate is too high. I think the categorical variables are responsible for this debacle. Some of them have more than 6 levels (one has 10 levels).
>
>Please suggest some better regression model for this situation. If possible you can suggest some article.

1. Check whether a factor with many levels has a natural order to its levels. If so, consider fitting the factor as an ordered factor.

2. Break the factor levels into 2 or 3 groups that have some rational connection, and then fit the factor with a smaller number of levels. E.g., "race" might have levels "white", "black", "asian", "pacific", "Spanish surname", "other". Consider a change to "white", "nonwhite".

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
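Two small sketches of those suggestions (the variable names and levels are only illustrative):

# 1. declare a natural ordering so the factor enters the model as ordered
edu <- factor(c("low", "mid", "high", "mid"),
              levels = c("low", "mid", "high"), ordered = TRUE)

# 2. collapse many levels into two broad groups
race  <- factor(c("white", "black", "asian", "white", "other"))
race2 <- factor(ifelse(race == "white", "white", "nonwhite"))
table(race2)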
Re: [R] p-value from GEE why factor 2*pnorm?
At 11:21 AM 6/11/2007, Carmen wrote:
>In this thread there is a hint on how to calculate the p-value of a GEE:
>  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/74150.html
>
> > Then, get the P values using a normal approximation for the
> > distribution of z:
> >
> > 2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)
> > (Intercept) TPTLD 0. 0.04190831
>
>1. Why is the result multiplied by 2? There is a P-value between 1 and 2 with the results below when multiplied by 2:
>
>2*pnorm(c(1.8691945,0.5882351,2.4903091,1.9287802,2.3172983,2.2092593,2.2625959,1.6395695), lower.tail=TRUE)

1. The formula given in the thread mentioned was:

2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)

2. The reason for the "2" at the front is to make it an "equal-tails" or "2-sided" confidence interval. Pedantically, you should use 1.96 instead of 2.0 for consistency, but 2.0 = 1.96 rounded to one decimal place.

3. This is what is usually called a "Wald" type confidence interval, as it is simply the normal quantile (+/- 1.96) multiplied by the standard error of estimate to get the +/- widths for the interval. These would be added to the estimate itself to get the final Wald confidence interval, which obviously assumes a normal distribution applies.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
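With some made-up z statistics, the two-sided calculation looks like this:

z <- c(1.87, 0.59, 2.49)
2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
pnorm(abs(z), lower.tail = FALSE)       # one-sided, i.e. half as large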
Re: [R] Windows vista's early terminate Rgui execution
At 03:28 PM 6/10/2007, [EMAIL PROTECTED] wrote: >Hi,I have a frustrating problem from vista that I wonder if anyone >has come across the same problem. I wrote a script that involves >long computational time (although, during the calculation, it spits >out text on the gui to notify me the progress of the calculation >periodically). Windows vista always stopped my calculation and >claimed that 'Rgui is stop-working. Windows is checking for >solution.' And when I looked into task manager, windows already >stopped my Rgui process. I am quite disappointed with this. I would >really appreciate if anyone finds a solution to go around this >windows vista problem? Particularly, how to turn off this feature in >vista? Any help would be really appreciated. Thank you!- adschai You probably need to contact Vista periodically so it knows you are awake. Just include a line that does a call to Vista that doesn't do output, such as useless <- dir() placed in some outer loop that satisfies the "drop dead" time between calls. Alternatively, you can attempt to find out how to change the registry entry corresponding to the wait time and increase it to a value you can live with. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What ECDF function?
At 06:36 PM 6/9/2007, Marco wrote: >On 6/9/07, Robert A LaBudde <[EMAIL PROTECTED]> wrote: > > At 12:57 PM 6/9/2007, Marco wrote: > > > > > >Hmmm I'm a bit confused, but very interested! >So you don't use the R "ecdf", do you? Only when an i/n edf is needed (some tests, such as ks.test() are based on this). Also, I frequently do modeling in Excel as well, where you need to enter your own formulas. > > > Also remember that edfs are not very accurate, so the differences > > between these formulae are difficult to justify in practice. > > > >I will bear in min! My first interpretation was that using some >different from i/n (e.g. i/(n+1)), let to better individuate tail >differences (maybe...) The chief advantage to i/(n+1) is that you don't generate 1.0 as an abscissa, as you do with i/n. But the same is true of (i-0.5)/n, and it's more accurate. Unless you need to do otherwise, just use ecdf(), because it matches the theory for most uses, and it almost always doesn't matter that it's slightly less accurate than other choices. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What ECDF function?
At 12:57 PM 6/9/2007, Marco wrote: > >2.I found various version of P-P plot where instead of using the >"ecdf" function use ((1:n)-0.5)/n > After investigation I found there're different definition of ECDF >(note "i" is the rank): > * Kaplan-Meier: i/n > * modified Kaplan-Meier: (i-0.5)/n > * Median Rank: (i-0.3)/(n+0.4) > * Herd Johnson i/(n+1) > * ... > Furthermore, similar expressions are used by "ppoints". > So, > 2.1 For P-P plot, what shall I use? > 2.2 In general why should I prefer one kind of CDF over another one? > This is an age-old debate in statistics. There are many different formulas, some of which are optimal for particular distributions. Using i/n (which I would call the Kolmogorov method), (i-1)/n or i/(n+1) is to be discouraged for general ECDF modeling. These correspond in quality to the rectangular rule method of integration of the bins, and assume only that the underlying density function is piecewise constant. There is no disadvantage to using these methods, however, if the pdf has multiple discontinuities. I tend to use (i-0.5)/n, which corresponds to integrating with the "midpoint rule", which is a 1-point Gaussian quadrature, and which is exact for linear behavior with derivative continuous. It's simple, it's accurate, and it is near optimal for a wide range of continuous alternatives. The formula (i- 3/8)/(n + 1/4) is optimal for the normal distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so there is no real benefit to using it. Similarly, there is a formula (i-.44)/(N+.12) for a Gumbel distribution. If you do know for sure (don't need to test) the form of the distribution, you're better off fitting that distribution function directly and not worrying about the edf. Also remember that edfs are not very accurate, so the differences between these formulae are difficult to justify in practice. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
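A small sketch comparing the plotting positions (ppoints() uses (i - 3/8)/(n + 1/4) for n <= 10 and (i - 0.5)/n otherwise):

set.seed(1)
x <- sort(rnorm(20))
n <- length(x); i <- seq_len(n)
cbind(kolmogorov = i/n, midpoint = (i - 0.5)/n, ppoints = ppoints(n))[1:3, ]
# P-P plot against a fitted normal, using the midpoint positions
plot(pnorm(x, mean(x), sd(x)), (i - 0.5)/n,
     xlab = "theoretical CDF", ylab = "empirical CDF")
abline(0, 1)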
[R] glm() for log link and Weibull family
I need to be able to run a generalized linear model with a log() link and a Weibull family, or something similar to deal with an extreme value distribution. I actually have a large dataset where this is apparently necessary. It has to do with recovery of forensic samples from surfaces, where as much powder as possible is collected. This apparently causes the results to conform to some type of extreme value distribution, so Weibull is a reasonable starting point for exploration. I have tried ('surface' and 'team' are factors) glm(surfcount ~ surface*team, data=powderd, family=Gamma(link='log')) but this doesn't quite do the trick. The standardized deviance residuals are still curved away from normal at the tails. Thanks for any info you can give on this nonstandard model. ============ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
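One route sometimes suggested for this kind of data (a sketch only, not necessarily the right model here, since glm() has no Weibull family) is an accelerated-failure-time fit with Weibull errors via survreg() in the survival package; 'powderd', 'surfcount', 'surface' and 'team' are the objects from the post and are assumed to be available:

library(survival)
# every observation treated as an exact (uncensored) response
fit <- survreg(Surv(surfcount, rep(1, length(surfcount))) ~ surface * team,
               data = powderd, dist = "weibull")
summary(fit)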
Re: [R] pnorm how to decide lower-tail true or false
At 01:31 PM 6/8/2007, Carmen wrote:
>Hi to all,
>maybe the last question was not clear enough. I did not find any hints on how to decide whether one should use lower.tail or not. As it is an extra R feature (written in http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html ) I do not find anything about it in any of my statistics books.
>Regards Carmen

pnorm(z, lower.tail=TRUE) (the R default) gives the probability of a normal variate being at or below z. This is the value commonly called the cumulative distribution function at the point z, or the integral from -Inf to z of the gaussian density.

pnorm(z, lower.tail=FALSE) gives the complement of the above, or 1 - cdf(z), and is the integral from z to Inf of the gaussian density. E.g.,

> pnorm(1.96, lower.tail=TRUE)
[1] 0.9750021
> pnorm(1.96, lower.tail=FALSE)
[1] 0.02499790

Use lower.tail=TRUE if you are, e.g., finding the probability at the lower tail of a confidence interval, or if you want the probability of values no larger than z.

Use lower.tail=FALSE if you are, e.g., calculating the significance of a test value or working at the upper confidence limit, or you want the probability of values z or larger.

You should use pnorm(z, lower.tail=FALSE) instead of 1-pnorm(z) because the former returns a more accurate answer for large z.

This is a really simple issue, and has no inherent complexity associated with it.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Finding density curve peak
At 07:34 PM 6/2/2007, Davendra wrote:
>I have a dataset where I get a density curve of a continuous variable with two peaks.
>How can I get the peaks?
>Any simple solutions?

Plot the curve, and then use identify() and the mouse to click on the peaks.

========
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
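A non-interactive alternative (a small sketch that picks out local maxima of the density() estimate):

set.seed(1)
x <- c(rnorm(200, 0), rnorm(200, 4))              # bimodal example data
d <- density(x)
peaks <- which(diff(sign(diff(d$y))) == -2) + 1   # points higher than both neighbours
d$x[peaks]                                        # approximate peak locations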
[R] Curve crosses back to origin in plot
Another sample problem: In the Windows version of R-2.5.0, data(GHQ,package='HSAUR') layout(1) GHQ_glm_1<- glm(cbind(cases,non.cases) ~ GHQ, data=GHQ, family=binomial()) summary(GHQ_glm_1) yfit_glm_1<- predict(GHQ_glm_1, type='response') layout(1) plot(probs ~ GHQ,pch=1,col=1,ylab='Probability(Case)', data=GHQ) lines(yfit_glm_1 ~ GHQ, pch=3,col=3, data=GHQ) legend("topleft", c("linear", "logistic"), pch=c(2,3), col=c(2,3)) Everything is fine, but the predicted values curve in the lines() statement becomes closed by a straight line segment connecting the last point to the first. How can this be avoided? It appears to be wrapping due to the change from GHQ=10 for female to GHQ=0 again for male, bring the curve back to the beginning value. One way to avoid this is to plot each sex's data in a separate lines() call. Is there a way of doing this in a single lines() statement? If not, I'd like to know. If two lines() are needed, I'd like to know an efficient syntax. My attempt would be lines(yfit_glm_1[0:10] ~ GHQ[0:10], pch=3,col=3, data=GHQ) lines(yfit_glm_1[11:22] ~ GHQ[11:22], pch=3,col=3, data=GHQ) which seems inelegant, as it involves numerical ranges that have to be determined by inspection. Thanks again for answering these simple questions that seem to be the hardest to find answers for. ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to reference or sort rownames in a data frame
Thanks, Gabor. I have to say I wouldn't have figured this out easily. I'd summarize your comments by: 1. Remember to use arrays of logicals as indices. 2. Remember %in% for combination matches. 3. Remember which() to get indices. It is the small tasks which appear most difficult to figure out in R. At 10:29 PM 5/27/2007, Gabor wrote: >On 5/27/07, Robert A. LaBudde <[EMAIL PROTECTED]> wrote: >>As I was working through elementary examples, I was using dataset >>"plasma" of package "HSAUR". >> >>In performing a logistic regression of the data, and making the >>diagnostic plots (R-2.5.0) >> >>data(plasma,package='HSAUR') >>plasma_1<- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial()) >>layout(matrix(1:4,nrow=2)) >>plot(plasma_1) >> >>I find that data points corresponding to rownames 17 and 23 are >>outliers and high leverage. >> >>I would then like to perform a fit without these two rows. >> >>In principle this should be easy, using an update() with subset=-c(17,23). >> >>The problem is that the rownames in this dataset are not ordered, >>and, in fact, the relevant rows are 30 and 31, not 17 and 23. >> >>This brings up the following (elementary?) questions: >> >>1. How do you reference rows in "subset=" for which you know the >>rownames, but not the row numbers? > >Use a logical vector: > > rownames(plasma) %in% c(17, 23) > >> >>2. How do you discovery the rows corresponding to particular >>rownames? (Using plasma[rownames(plasma)==17,] shows the data, but >>NOT the row number!) (Probably the same answer as in Q. 1 above.) > > which(rownames(plasma) %in% c(17, 23)) # 30, 31 > >> >>3. How do you sort (order) the rows of an existing data frame so that >>the rownames are in order? > > > plasma[order(as.numeric(rownames(plasma))), ] Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to reference or sort rownames in a data frame
As I was working through elementary examples, I was using dataset "plasma" of package "HSAUR".

In performing a logistic regression of the data, and making the diagnostic plots (R-2.5.0):

data(plasma, package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
layout(matrix(1:4, nrow=2))
plot(plasma_1)

I find that data points corresponding to rownames 17 and 23 are outliers and high leverage. I would then like to perform a fit without these two rows. In principle this should be easy, using an update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not ordered, and, in fact, the relevant rows are 30 and 31, not 17 and 23.

This brings up the following (elementary?) questions:

1. How do you reference rows in "subset=" for which you know the rownames, but not the row numbers?

2. How do you discover the rows corresponding to particular rownames? (Using plasma[rownames(plasma)==17,] shows the data, but NOT the row number!) (Probably the same answer as in Q. 1 above.)

3. How do you sort (order) the rows of an existing data frame so that the rownames are in order?

I don't seem to know the magic words to find the answers to these questions in the help systems. Obviously this can be done by writing new, brute-force functions scanning the subscripts, but there must be an (obvious?) direct way of doing this more elegantly.

Thanks for any pointers.

============
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Survival statistics--displaying multiple plots
The layout() function below states there are to be 3 graphs on the first row, side by side. Each time you plot, R applies the graph to the next free position on the layout. After 3 plots with your different boolean subsets, you end up with 3 graphs side by side.

If this is not what you want (you asked for this in your first request), and instead you want 3 curves on the same, single graph, use plot() for the first curve(s) and then use lines() to add the additional curves. Check the survfit() object to see how to pick off individual times and survival curves by subscripting.

At 12:09 AM 5/3/2007, Greg wrote:
>Thanks for replying Robert. Forgive me, it might be the hour or my limitations, but I am a little unclear on how to implement your suggestion.
>
>In my original example,
>
> plot(survfit(Surv(days,status==1),subset(tips,meld<10))
>
>a plot of the fraction of patients surviving following the procedure against the number of days since the procedure would be generated for patients with meld scores of less than 10.
>
>Similarly, if I wanted to generate a survival curve of patients with scores of between 10 and 20, I can with the following:
>
> plot(survfit(Surv(days,status==1),subset(tips,meld>10 & meld<20))
>
>And for patients with meld>20,
>
> plot(survfit(Surv(days,status==1),subset(tips,meld>20))
>
>But how do I display the curves in each cadre (meld<10, 10<meld<20, meld>20) on the same chart?
>
>-----Original Message-----
>From: Robert A LaBudde [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, May 02, 2007 11:48 PM
>To: Gregory Pierce
>Subject: Re: [R] Survival statistics--displaying multiple plots
>
>? layout()
>? par()
>
>E.g.,
>
>layout(matrix(c(1,2,3),1,3,byrow=TRUE))   # 3 plots side-by-side
>
>Then use plot() three times to generate each of your graphs.
>
>At 11:14 PM 5/2/2007, Greg wrote:
> >I should clarify. I can generate plots for each category individually but not for all three on the same chart.
> >
> >Greg
> >
> >-----Original Message-----
> >From: Gregory Pierce [mailto:[EMAIL PROTECTED]]
> >Sent: Wednesday, May 02, 2007 10:21 PM
> >To: 'r-help@stat.math.ethz.ch'
> >Subject: Survival statistics--displaying multiple plots
> >
> >Hello all!
> >
> >I am once again analyzing patient survival data with chronic liver disease.
> >
> >The severity of the liver disease is given by a number which is continuously variable. I have referred to this number as "meld"--model for end stage liver disease--which is the result of a mathematical calculation on underlying laboratory values. So, for example, I can generate a Kaplan-Meier plot of patients undergoing a TIPS procedure with the following:
> >
> > plot(survfit(Surv(days,status==1),subset(tips,meld<10))
> >
> >where "tips" is my data set, "days" is the number of days alive, and meld is the meld score.
> >
> >What I would like to do is display the survival graphs of patients with meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.
> >
> >Any suggestions would be appreciated.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
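A sketch of the single-plot approach, using cut() to form the meld groups and a grouping formula in survfit() ('tips', 'days', 'status' and 'meld' are the objects from the posting and are assumed to be available):

library(survival)
tips$meldgrp <- cut(tips$meld, breaks = c(-Inf, 10, 20, Inf),
                    labels = c("meld<10", "10-20", "meld>20"))
fit <- survfit(Surv(days, status == 1) ~ meldgrp, data = tips)
plot(fit, lty = 1:3, col = 1:3, xlab = "Days", ylab = "Fraction surviving")
legend("bottomleft", legend = levels(tips$meldgrp), lty = 1:3, col = 1:3)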
Re: [R] Survival statistics--displaying multiple plots
? layout()
? par()

E.g.,

layout(matrix(c(1,2,3),1,3,byrow=TRUE))   # 3 plots side-by-side

Then use plot() three times to generate each of your graphs.

At 11:14 PM 5/2/2007, Greg wrote:
>I should clarify. I can generate plots for each category individually but not for all three on the same chart.
>
>Greg
>
>-----Original Message-----
>From: Gregory Pierce [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, May 02, 2007 10:21 PM
>To: 'r-help@stat.math.ethz.ch'
>Subject: Survival statistics--displaying multiple plots
>
>Hello all!
>
>I am once again analyzing patient survival data with chronic liver disease.
>
>The severity of the liver disease is given by a number which is continuously variable. I have referred to this number as "meld"--model for end stage liver disease--which is the result of a mathematical calculation on underlying laboratory values. So, for example, I can generate a Kaplan-Meier plot of patients undergoing a TIPS procedure with the following:
>
> plot(survfit(Surv(days,status==1),subset(tips,meld<10))
>
>where "tips" is my data set, "days" is the number of days alive, and meld is the meld score.
>
>What I would like to do is display the survival graphs of patients with meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.
>
>Any suggestions would be appreciated.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Is R's fast fourier transform function different from "fft2" in Matlab?
Discrete Fourier transforms can be normalized in different ways. Some apply the whole normalization to the forward transform, some to the reverse transform, some apply the square root to each, and some don't normalize at all (in which case the reverse of the forward transform will need scaling). The latter is apparently the case with R, according to your values. Note that the R and the MatLab answers agree to within a scale factor for each row.

At 10:53 PM 5/2/2007, Li-Li wrote:
>Thanks for both replies.
>Then I found the "ifft2" from Matlab gives a different result from "fft( , inverse=T)" from R.
>An example, in R:
>
> > temp <- matrix(c(1,4,2,20), nrow=2)
> > fft(temp)
>        [,1]    [,2]
> [1,]  27+0i  -17+0i
> [2,] -21+0i   15+0i
> > fft(temp, inverse=T)
>        [,1]    [,2]
> [1,]  27+0i  -17+0i
> [2,] -21+0i   15+0i
>
>In Matlab:
>
> >> A = [1,2;4,20];
> >> fft2(A)
> ans =
>     27   -17
>    -21    15
> >> ifft2(A)
> ans =
>     6.7500   -4.2500
>    -5.2500    3.7500
>
>I also tried mvfft with inverse but can't get the same result as "ifft2". Does any function work?

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
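A quick check of the scaling in R:

temp <- matrix(c(1, 4, 2, 20), nrow = 2)
f <- fft(temp)                          # forward transform, unnormalized
fft(f, inverse = TRUE) / length(temp)   # dividing by N recovers the original matrix
# MatLab's ifft2() builds the 1/N factor in; R's fft(..., inverse=TRUE) does not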
Re: [R] upgrade to 2.5
At 01:41 PM 5/2/2007, you wrote: >On 5/2/07, Sundar Dorai-Raj <[EMAIL PROTECTED]> wrote: > > > > > > Iasonas Lamprianou said the following on 5/2/2007 8:25 AM: > > > Hi I am using R version 2.4.1. How can I upgrade to version 2.5 > without having to install all the packages again? > > > Thanks > > > Jason > > > > > > > You may find the following link relevant. > > > > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75359.html > > > >if you use Windows XP. This link was useful to me, as I am new to R. (Win2000, R-2.5.0) What I have been doing is using a file compare utility (Beyond Compare in my case) to move files in the old "library" directory to the new one, if the files are missing in the new one. Then I perform an update.packages command. This procedure appears to work without problem. It would seem much preferable to have all packages saved in an installation-independent directory, instead of a "library" directory under R's installation directory. Then, of course, no update would be necessary. I can't find how this option is settable in R, other than a direct argument to library() or install.package(). How does one shift the R default libraries location to a particular directory? Thanks. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
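One way to do that (a sketch; the directory is just an example and must already exist) is to keep a personal library outside the R installation tree and point each R version at it:

.libPaths("C:/Rlibs")         # prepend a version-independent library to the search path
.libPaths()                   # verify
install.packages("outliers", lib = "C:/Rlibs")
library(outliers, lib.loc = "C:/Rlibs")
# setting the R_LIBS environment variable to C:/Rlibs makes this automatic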
Re: [R] Log-likelihood function
At 07:30 AM 5/2/2007, Doxastic wrote: >Thanks. I used this and it gave me the same result as the "logLik" function. >The reason I ask is the SAS output gives me a loglik = 1089. R gives me >-298.09583. Both for my reduced model. For the saturated (or complex) >model, SAS gives me an loglik = 1143. R gives me -298.1993. The problem is >these give two very different pictures about whether I can drop the >interaction. However, I think the residual deviance in the R output is >equal to G^2. So, I can just take the difference between those two. If I >do this, I get a difference with an interpretation similar to that of what >comes from SAS. So I think I'll just go with that. But who knows if I'm >right (not me)? Some comments: 1. Use summary() on your glm() object to get a fuller display of post-fit statistics, including the starting ("null") and residual deviances. 2. The "deviance" is - 2 L, where L = ln(likelihood). 3. To test two nested models for the difference in covariates, subtract the two residual deviances and two d.f. and perform a chi-square test. This can be done nicely by anova() on the two glm() objects. 4. Check the coefficients in your SAS and R models and make sure you are performing the same fit in both cases. ======== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
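A small sketch of point 3, with made-up data:

set.seed(1)
dat <- data.frame(a = gl(2, 50), b = gl(2, 25, 100), y = rbinom(100, 1, 0.4))
fit1 <- glm(y ~ a + b, family = binomial, data = dat)   # reduced model
fit2 <- glm(y ~ a * b, family = binomial, data = dat)   # adds the interaction
anova(fit1, fit2, test = "Chisq")   # chi-square test on the drop in deviance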
Re: [R] bivariate normal distribution in likelihood
At 11:32 PM 4/30/2007, Deepankar wrote: >Hi all, > >I am trying to do a maximum likelihood estimation. In my likelihood >function, I have to evaluate a double integral of the bivariate >normal density over a subset of the points of the plane. For >instance, if I denote by y the y co-ordinate and by x, the x >co-ordinate then the area over which I have to integrate is the >following set of points, A: >A = {x>=0 & y < 3x+10} > >I have used the following code to calculate this double integral: > >x <- rmvnorm(10, mean=me, sigma=sig) >c <- nrow(x) >x1 <- x[ ,1] >x2 <- x[ ,2] >e1 <- as.numeric(x2 < 3*x1 + 10 & x1>0) >p1 <- sum(e1)/c > >In this code, I provide the mean and covariance while drawing the >bivariate random normal variables and get "p1" as the required >answer. The problem is that I have to draw at least 100,000 >bivariate random normals to get a reasonable answer; even then it is >not very accurate. > >Is there some other way to do the same thing more accurately and >more efficiently? For instance, can this be done using the bivariate >normal distribution function "pmvnorm"? Also feel free to point our >errors if you see one. Simple random sampling is a poor way to evaluate an integral (expectation). It converges on the order of 1/sqrt(N). Stratified random sampling would be better, as it converges on the order of 1/N. Even better is product Gauss-Hermite quadrature which will give a very accurate answer with a few dozen points. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
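Another deterministic route (a sketch, not the Gauss-Hermite rule suggested above): condition on x, which reduces the probability to a one-dimensional integral that integrate() handles quickly. Here 'me' and 'sig' play the same roles as in the original code, with example values filled in:

me  <- c(1, 2)
sig <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
f <- function(x) {
  mu.y <- me[2] + sig[1, 2]/sig[1, 1] * (x - me[1])   # E(y | x)
  sd.y <- sqrt(sig[2, 2] - sig[1, 2]^2/sig[1, 1])     # SD(y | x)
  dnorm(x, me[1], sqrt(sig[1, 1])) * pnorm(3*x + 10, mu.y, sd.y)
}
integrate(f, lower = 0, upper = Inf)   # P(x >= 0 and y < 3x + 10)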
Re: [R] How to solve difficult equations?
At 03:15 AM 4/25/2007, francogrex wrote: >This below is not solvable with uniroot to find "a": >fn=function(a){ >b=(0.7/a)-a >(1/(a+b+1))-0.0025 >} >uniroot(fn,c(-500,500)) gives >"Error in uniroot(fn, c(-500, 500)) : f() values at end points not of >opposite sign" > >I read R-help posts and someone wrote a function: >http://finzi.psych.upenn.edu/R/Rhelp02a/archive/92407.html >but it is not very precise. Is there any '"standard" function in R that can >solve this? thanks. Actually, if you're solving fn(a)==0, then some trivial algebra leads to a linear equation with a=0.001754. Why are you trying to solve this numerically? Is it a single instance of a larger, more general problem? ==== Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
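The algebra, with a quick check:

# 1/(a + b + 1) = 0.0025 with b = 0.7/a - a  gives  0.7/a + 1 = 400, so a = 0.7/399
a <- 0.7/399
a        # 0.001754386
fn <- function(a) { b <- (0.7/a) - a; 1/(a + b + 1) - 0.0025 }
fn(a)    # essentially zero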
[R] Use of Lexis function to convert survival data to counting format
I'm trying to convert a dataset from the time-independent analysis form

> head(addicts)
  id clinic status survtime prison meth clinic01
1  1      1      1      428      0   50        1
2  2      1      1      275      1   55        1
3  3      1      1      262      0   55        1

into the "counting data format" necessary to perform extended Cox regression.

I got survSplit() to work, but it appears to have a problem with end times on the boundaries of the intervals. (My intervals are [0,183], [184,365], [366,548], [549,730], [730,Inf].) I looked in the searchable archives and found that others had also discovered problems with survSplit() and suggested Lexis() in the Epi package.

When I try to get Lexis to work, I get an error message that I cannot interpret:

> addictscdf <- Lexis(entry=start, exit=stop, fail='status', breaks=c(184,366,549,730),
+   include=c('id','clinic','prison','meth','clinic01'), data=addicts)
The following object(s) are masked from addicts :
 clinic id meth prison status survtime
Error: object is not subsettable

I'm sure I'm making some type of trivial error here, but I don't know how to find it. I would guess that it has something to do with 'start' and 'stop', of which arguments I am apparently clueless as to the meaning.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"
Re: [R] Using R to create pdf's from each file in a directory
At 11:06 PM 4/20/2007, I wrote: >At 09:40 PM 4/20/2007, gecko951 wrote: > > > >list <- dir("/tmp/data") > >for(x in list){ > >d <- read.table(x, sep="\t", header=TRUE) # read data > >pdf("/tmp/graph/x.pdf") # file for graph > > > >I'm a tyro at R, but it's obvious here that the line > >pdf("/tmp/graph/x.pdf") > >has the intended file name string 'x' embedded with the literal >string enclosed in quotation marks. Obviously he needs "/tmp/graph/" >string concatenated with x and then ".pdf". > >I would suggest a fix, but I am unable to find in the documentation >how to concatenate two strings into a single string. > >So I will amplify gecko951's question to include "How you you >concatenate two strings in R?". I.e., > >x<-"abc" >y<-"def" > >What operator or function returns "abcdef" as a result? > I found that paste() can be used for string concatenation, so I can now suggest a fix to gecko951's problem. Replace pdf("/tmp/graph/x.pdf") with pdf(paste(paste('/tmp/graph/',x,sep=''),'.pdf',sep='')) and the program should now work for the different file names. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd.URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
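Putting the whole loop together (a sketch; note that dir() returns bare file names, so full.names = TRUE, or a setwd() first, is needed before read.table() can find the files; the plot itself is just a placeholder):

files <- dir("/tmp/data", full.names = TRUE)
for (f in files) {
  d    <- read.table(f, sep = "\t", header = TRUE)
  base <- sub("\\.[^.]*$", "", basename(f))        # file name without its extension
  pdf(paste("/tmp/graph/", base, ".pdf", sep = ""))
  plot(d)                                          # whatever graph is wanted
  dev.off()
}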
Re: [R] Using R to create pdf's from each file in a directory
At 09:40 PM 4/20/2007, gecko951 wrote:
>
>list <- dir("/tmp/data")
>for(x in list){
>d <- read.table(x, sep="\t", header=TRUE) # read data
>pdf("/tmp/graph/x.pdf") # file for graph
>

I'm a tyro at R, but it's obvious here that the line

pdf("/tmp/graph/x.pdf")

has the intended file-name variable 'x' embedded inside a quoted literal string. Obviously he needs the "/tmp/graph/" string concatenated with x and then ".pdf".

I would suggest a fix, but I am unable to find in the documentation how to concatenate two strings into a single string.

So I will amplify gecko951's question to include "How do you concatenate two strings in R?". I.e.,

x <- "abc"
y <- "def"

What operator or function returns "abcdef" as a result?

Thanks.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd., http://lcfltd.com/
"Vere scire est per causas scire"