[R] How to get robust M-estimator of multivariate scatter using Huber's psi?
How can I get robust M-estimators of multivariate scatter using Huber's psi function? Which package/function should I look into? Ideally, I would like to be able to set the threshold of Huber's psi function myself. Thanks a lot! -- View this message in context: http://www.nabble.com/How-to-get-robust-M-estimator-of-multivariate-scatter-using-Huber%27s-psi--tp20585755p20585755.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get robust M-estimator of multivariate scatter using Huber's psi?
Yes, I did. But the closest I found is the covMest function in the rrcov package, which uses the biweight function, which is redescending. Unfortunately, I need to use Huber's psi function to derive the multivariate scatter, and I would like to try different thresholds. Any help? Thanks a lot! David Winsemius wrote: Have you looked at the obvious Task View yet? http://cran.r-project.org/web/views/Robust.html -- David Winsemius On Nov 19, 2008, at 1:07 PM, rlearner309 wrote: [original question; see the first post above] -- View this message in context: http://www.nabble.com/How-to-get-robust-M-estimator-of-multivariate-scatter-using-Huber%27s-psi--tp20585755p20587324.html
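No Huber-specific function is named in this thread. As a minimal sketch in base R, assuming a Maronna-style reweighting scheme is acceptable, a Huber-type M-estimator of location and scatter can be computed by iteratively reweighting the observations with Huber weights on their Mahalanobis distances, with the threshold `k` as a user-chosen argument. The helper `huber_scatter()` is hypothetical (not from any package), and it applies no consistency factor, so its scatter matrix comes out smaller than the classical covariance even for clean normal data.

```r
# Hypothetical helper (not from any package): Huber-type M-estimator of
# multivariate location and scatter by iterative reweighting.
# k is the Huber threshold on the Mahalanobis distance; a fixed point is
# only guaranteed to exist when k^2 > ncol(x).
huber_scatter <- function(x, k = sqrt(qchisq(0.9, ncol(x))),
                          tol = 1e-8, maxit = 500) {
  x <- as.matrix(x)
  n <- nrow(x)
  mu <- colMeans(x)            # start from the classical estimates
  V <- cov(x)
  for (it in seq_len(maxit)) {
    d <- sqrt(mahalanobis(x, mu, V))   # distances under the current fit
    w1 <- pmin(1, k / d)               # Huber weights for the location
    w2 <- pmin(1, (k / d)^2)           # Huber weights for the scatter
    mu_new <- colSums(w1 * x) / sum(w1)
    xc <- sweep(x, 2, mu_new)          # center at the new location
    V_new <- crossprod(sqrt(w2) * xc) / n
    done <- max(abs(V_new - V)) < tol && max(abs(mu_new - mu)) < tol
    mu <- mu_new
    V <- V_new
    if (done) break
  }
  # No consistency factor is applied to V
  list(center = mu, cov = V, iterations = it)
}

# Usage: 200 bivariate normal points plus 10 gross outliers (made-up data)
set.seed(1)
x <- rbind(matrix(rnorm(400), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
fit_k1 <- huber_scatter(x, k = 1.5)   # smaller k: heavier downweighting
fit_k2 <- huber_scatter(x, k = 3)
fit_k1$cov
```

With a very large `k` every weight is 1 and the sketch reduces to the classical maximum-likelihood covariance (divisor n), which is a handy sanity check when experimenting with thresholds.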
[R] A question about positive definite matrix
I know this is a forum about R, but I am desperate about this problem (BTW, does anyone know a good statistics/math forum for questions like this?): A and B are both n x n positive definite matrices. Write A > B if A - B is positive definite. I know this is true: if A > B, then A^{-1} < B^{-1}. But how do I prove it? I tried to diagonalize A and B, but since they can have different eigenstructures... I am stuck here. Thanks a lot for any help. -- View this message in context: http://www.nabble.com/A-question-about-positive-definite-matrix-tp20063054p20063054.html
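A standard argument avoids diagonalizing A and B separately: a congruence reduces the problem to the case B = I. The sketch below assumes A and B are symmetric (the usual convention for positive definite matrices), writes $\succ$ for the ordering defined in the post, and uses $B^{1/2}$ for the symmetric positive definite square root of B.

```latex
\begin{aligned}
&\text{Assume } A \succ B \succ 0 \text{ and let } C = B^{-1/2} A B^{-1/2} \ (\text{symmetric}). \\
&\text{For } x \neq 0 \text{ put } y = B^{-1/2}x \neq 0:\quad
  x^\top (C - I)\,x = y^\top A\,y - y^\top B\,y = y^\top (A - B)\,y > 0,
  \text{ so } C \succ I. \\
&\text{Hence every eigenvalue of } C \text{ exceeds } 1, \text{ so every eigenvalue of }
  C^{-1} = B^{1/2} A^{-1} B^{1/2} \text{ lies in } (0, 1), \\
&\text{i.e. } I - B^{1/2} A^{-1} B^{1/2} \succ 0. \\
&\text{Congruence by the invertible } B^{-1/2} \text{ preserves definiteness:}\quad
  B^{-1/2}\bigl(I - B^{1/2} A^{-1} B^{1/2}\bigr)B^{-1/2} = B^{-1} - A^{-1} \succ 0, \\
&\text{i.e. } A^{-1} \prec B^{-1}.
\end{aligned}
```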
[R] how to add notes to the graph?
Hi, I have a simple graph: x <- c(1,2,3); plot(x, pch=16, type="b"). I would like to add some notes just beside these 3 dots, and the notes are stored in a vector: a <- c(12,54,84). So in the result there should be a 12 below the first dot (or next to it, but not replacing the solid dot), a 54 next to the second dot, and so on. What if a is not numeric, e.g. a <- c("a","b","c")? Thank you very much! -- View this message in context: http://www.nabble.com/how-to-add-notes-to-the-graph--tp18689195p18689195.html
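A minimal sketch using base graphics `text()`: since `plot(x)` puts the values on the y axis and their indices 1, 2, 3 on the x axis, the labels are drawn at the same coordinates, and `pos` moves them off the dot (1 = below, 2 = left, 3 = above, 4 = right). The widened axis limits are an added assumption so that labels at the edge points are not clipped.

```r
x <- c(1, 2, 3)
# Extra room around the points so labels at the edges are not clipped
plot(x, pch = 16, type = "b", xlim = c(0.5, 3.5), ylim = c(0.5, 3.5))

# Numeric notes, drawn just below each point (pos = 1)
a <- c(12, 54, 84)
text(seq_along(x), x, labels = a, pos = 1)

# Character notes work the same way; labels are converted with as.character()
a_chr <- c("a", "b", "c")
text(seq_along(x), x, labels = a_chr, pos = 4)   # pos = 4: right of each point
```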
[R] How to control the memory?
Hi, I have a huge data set to deal with. Sometimes I get a warning message about the memory limit; other times R reads in the data but is very slow. Is there anything I can do to lift the memory limit (I already ran memory.limit(4095); is there anything else I can do?). I know I can also drop objects that R has stored, but I forgot the command. Thanks a lot! Oh, BTW, is there a function that can generate weekdays (skipping weekends automatically)? -- View this message in context: http://www.nabble.com/How-to-control-the-memory--tp18611155p18611155.html
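A minimal sketch of both parts, using only base R: `rm()` is the command for dropping objects, `gc()` lets R reclaim the freed memory, and `object.size()` shows what an object costs. For weekdays, one way (an assumption, not the only option) is to build a daily sequence and drop Saturdays and Sundays via `as.POSIXlt()$wday`, which avoids the locale-dependent names returned by `weekdays()`. The dates below are arbitrary.

```r
# Create a large object and check its size
big <- matrix(0, nrow = 1000, ncol = 1000)
print(object.size(big), units = "Mb")

# rm() drops objects from the workspace; gc() then lets R reclaim the memory
rm(big)
invisible(gc())
# rm(list = ls()) would remove every object in the global environment

# Weekdays only: a daily sequence minus Sundays (wday 0) and Saturdays (wday 6)
days <- seq(as.Date("2008-07-21"), as.Date("2008-08-03"), by = "day")
workdays <- days[!(as.POSIXlt(days)$wday %in% c(0, 6))]
format(workdays, "%a %Y-%m-%d")
```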
[R] How to filter a data frame?
I have a question about how to filter a data frame: suppose my data frame has variables like gender, age, ... How do I get a subset of the data frame with only females (or males) and/or age 50...? What is the typical syntax? I tried several conditional expressions, but none of them worked... Thanks a lot! -- View this message in context: http://www.nabble.com/How-to-filter-a-data-frame--tp18587502p18587502.html
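A minimal sketch with a small made-up data frame: rows can be selected either by logical indexing (note the trailing comma, which keeps all columns) or with `subset()`, combining conditions with the vectorized `&` (and) and `|` (or). The comparison operator on age did not survive in the post, so `age > 50` is only an illustrative assumption.

```r
# Hypothetical example data (column names taken from the question)
df <- data.frame(gender = c("F", "M", "F", "M"),
                 age    = c(45, 62, 58, 30))

# Logical indexing: condition before the comma selects rows
females <- df[df$gender == "F", ]

# subset() evaluates the condition inside the data frame
f_and_over50 <- subset(df, gender == "F" & age > 50)   # both conditions
f_or_over50  <- subset(df, gender == "F" | age > 50)   # either condition
f_or_over50
```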
[R] Does R have SQL interface in windows?
It seems that the RMySQL package does not support Windows... Thanks a lot! -- View this message in context: http://www.nabble.com/Does-R-have-SQL-interface-in-windows--tp18587733p18587733.html
Re: [R] How to filter a data frame?
Thank you all!! :-) rlearner309 wrote: [original question; see the post above] -- View this message in context: http://www.nabble.com/How-to-filter-a-data-frame--tp18587502p18598469.html
[R] Can I do regression by steps?
I saw this type of model in some of my company's projects. To simplify: Y is regressed on X1 and X2, but the regression is done in two steps: first Y is regressed on X1 with an intercept, and then the residuals from the first step are regressed on X2 without a constant. The reason for doing so is that some observations have X1 data but not X2 data, so I guess the person wanted to use as much information as possible to estimate the coefficient for X1, and then use the part of the residuals that has X2 data to capture what is left to be explained by X2. But my concern is: shouldn't we consider the correlation between X1 and X2? If the residuals from the first step are used, then the X1 effect has been removed. Then what does it really mean to regress the residuals on X2, which is correlated with X1? Should X2 be adjusted for X1 too (regress X2 on X1 and use the residuals)? What if both X1 and X2 are dummy variables? Dummy variables can have a meaningful correlation too, right? Thanks a lot! -- View this message in context: http://www.nabble.com/Can-I-do-regression-by-steps--tp18338562p18338562.html
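A small simulation with made-up coefficients compares one multiple regression with the two-step procedure described above. It also illustrates the Frisch-Waugh-Lovell theorem: if X2 is residualized on X1 as well (the adjustment the post asks about), the second-step slope reproduces the multiple-regression coefficient exactly. This sketch uses complete data only, so it does not address the missing-X2 motivation.

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)          # x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

full <- coef(lm(y ~ x1 + x2))      # one multiple regression

# Two-step procedure from the post: residuals of y on x1,
# then regress them on x2 without an intercept
r_y   <- resid(lm(y ~ x1))
naive <- coef(lm(r_y ~ x2 - 1))

# Frisch-Waugh-Lovell: also residualize x2 on x1 before the second step
r_x2 <- resid(lm(x2 ~ x1))
fwl  <- coef(lm(r_y ~ r_x2 - 1))

round(c(full = full[["x2"]], naive = naive[["x2"]], fwl = fwl[["r_x2"]]), 3)
```

Because x2 shares variation with x1, the naive second step attributes part of x2's effect to x1 and biases the x2 slope toward zero; the residualized version does not.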
Re: [R] Can I do regression by steps?
Thanks for the reply. I am aware of the difference, but can I do regression in steps at all? I don't feel comfortable about it. John Sorkin wrote: Be very careful! When regression is performed in steps, you often will not get the same results as you would get from a single multivariable regression. The explanation for this is not simple, but a simplified explanation is that when you do your first regression, y=f(x1), all the total variance that can be accounted for is sucked up by x1, leaving little variance to be accounted for by your second regression, residuals=f(x2). In contrast, when you perform a multivariable regression, y=f(x1,x2), the total variance is apportioned between x1 and x2. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) rlearner309 [EMAIL PROTECTED] 7/8/2008 8:53 AM [original question; see the post above] -- View this message in context: http://www.nabble.com/Can-I-do-regression-by-steps--tp18338562p18350475.html
[R] A quick question about lm()
I have a simple regression using lm(). If I just want to check the coefficients, I can use summary(lm())$coef; if I need the standard error, I can use summary(lm())$s; if I need the residuals, I can use summary(lm())$res. OK. How can I get the R-squared and adjusted R-squared using $...? Is there a function, like objects(), that can show all the components available this way? Thanks a lot! -- View this message in context: http://www.nabble.com/A-quick-question-about-lm%28%29-tp18316864p18316864.html
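A minimal sketch on the built-in `mtcars` data: `names()` (or `str()`) lists every component of the summary object, and the R-squared values are `r.squared` and `adj.r.squared`. Note that `$s` partially matches `sigma`, which is the residual standard error; the standard errors of the individual coefficients are a column of the coefficient table.

```r
fit <- lm(mpg ~ wt, data = mtcars)   # built-in example data
s <- summary(fit)

names(s)                  # every component available via s$...
s$r.squared               # R-squared
s$adj.r.squared           # adjusted R-squared
s$sigma                   # residual standard error (what s$s partially matches)
coef(s)[, "Std. Error"]   # standard errors of the coefficients
str(s, max.level = 1)     # compact overview of all components
```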
Re: [R] A regression problem using dummy variables
Sorry, I made a stupid mistake. I got it. Thanks a lot! Peter Dalgaard wrote: rlearner309 wrote: I think it is zero, because you have lots of zeros there. It is not like continuous variables. Think again. The sum of products may be zero, but that is not the covariance. And don't dismiss Thomas, he is usually right. Anyway, the coefs of dummy variables represent differences to the same base level, and choosing a poorly determined base level (essentially: one whose mean is determined by only a few observations) will cause high parameter correlation. It should only affect those parameters, though, and it is not really clear what VIF means for dummy variables. One often chooses to relevel() to make the largest group the base level, but it really comes down to which group contrasts you want to look at. Thomas Lumley wrote: On Wed, 2 Jul 2008, rlearner309 wrote: I think the covariance between dummy variables or between dummy variables and intercept should always be zero. meaning: no singularity problem?? No. You can easily check that this is not true using the cov() function. Indicator variables for mutually exclusive groups are negatively correlated. -thomas rlearner309 wrote: [original question; see the thread-starting post below] Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle -- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 ([EMAIL PROTECTED]) -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18260470.html
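Thomas Lumley's and Peter Dalgaard's points can be checked directly. For two mutually exclusive indicators the products are all zero, but the sample covariance is -n p_B p_C / (n - 1), where p_B and p_C are the group proportions, so it is always negative. The sketch below uses made-up group sizes that mimic the question (one level with only 3 observations) and then uses `relevel()` to make the largest group the base level.

```r
# Made-up group sizes: one level with very few observations
g <- factor(c(rep("A", 1000), rep("B", 1200), rep("C", 3)))
X <- model.matrix(~ g)          # intercept plus dummies gB and gC (base level "A")

# Indicators of mutually exclusive groups are negatively correlated
sum(X[, "gB"] * X[, "gC"])      # the sum of products is zero...
cov(X[, "gB"], X[, "gC"])       # ...but the covariance is negative

# Make the largest group the base level
g2 <- relevel(g, ref = names(which.max(table(g))))
levels(g2)
```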
Re: [R] A regression problem using dummy variables
Yes, because the slopes are supposed to be the same; only the level shifts need to be modeled. Moshe Olshansky-2 wrote: Do you have a reason to treat all 3 levels together and not have a separate regression for each level? --- On Tue, 1/7/08, rlearner309 [EMAIL PROTECTED] wrote: From: rlearner309 [EMAIL PROTECTED] Subject: [R] A regression problem using dummy variables To: r-help@r-project.org Received: Tuesday, 1 July, 2008, 11:38 PM [original question; see the thread-starting post below] -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18230346.html
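The model described in the reply (one common slope, with a level shift per group) corresponds to adding the factor additively, `y ~ x + grp`; a slope per level, close to separate regressions except that the residual variance is pooled, is `y ~ x * grp`. A minimal sketch with simulated data and made-up shifts, including an F-test of the common-slope assumption:

```r
set.seed(7)
n     <- 300
grp   <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x     <- rnorm(n)
shift <- c(A = 0, B = 2, C = -1)[as.character(grp)]   # made-up level shifts
y     <- 1 + 0.5 * x + shift + rnorm(n, sd = 0.3)

common   <- lm(y ~ x + grp)   # one slope, separate intercepts (level shifts)
separate <- lm(y ~ x * grp)   # a slope per level
coef(common)
anova(common, separate)       # F-test: do the slopes really differ?
```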
Re: [R] A regression problem using dummy variables
I think the covariance between dummy variables, or between dummy variables and the intercept, should always be zero, meaning no singularity problem?? rlearner309 wrote: [original question; see the thread-starting post below] -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
Re: [R] A regression problem using dummy variables
I think it is zero, because you have lots of zeros there. It is not like continuous variables. Thomas Lumley wrote: On Wed, 2 Jul 2008, rlearner309 wrote: I think the covariance between dummy variables or between dummy variables and intercept should always be zero. meaning: no singularity problem?? No. You can easily check that this is not true using the cov() function. Indicator variables for mutually exclusive groups are negatively correlated. -thomas rlearner309 wrote: [original question; see the thread-starting post below] Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18248187.html
[R] A regression problem using dummy variables
This is actually more of a statistics problem: I have a dataset with two dummy variables coding three levels. The problem is that one level has very few observations compared with the other two (a couple of data points, versus 1000+ points for each of the other levels). When I run the regression, the result is bad: I get unbalanced SEs and VIFs. Is this kind of problem also a near-singularity problem? Does it make any difference if I code the level that lacks data as (0,0) instead of (0,1)? Thanks a lot! -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html
[R] how to get the distribution curve from a data set?
I have a question. I have a data set (about 100,000 observations). How can I get the distribution curve? For example, hist(x, freq=TRUE, breaks=1000) gives me the histogram, but I don't need the histogram itself; I just need the curve that connects the tops of the histogram bins. Thank you very much! -- View this message in context: http://www.nabble.com/how-to-get-the-distribution-curve-from-a-data-set--tp17678286p17678286.html
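A minimal sketch: `hist(..., plot = FALSE)` returns the bin midpoints and counts without drawing bars, so the curve through the bar tops is just a line plot of `mids` against `counts`. A kernel density estimate via `density()` is a smoother alternative; it is on the density scale, so here it is rescaled to counts (n times bin width) to overlay it. Simulated normal data stand in for the real data set, and note that `breaks = 1000` is only a suggestion that `hist()` rounds to "pretty" break points.

```r
set.seed(1)
x <- rnorm(1e5)                     # stand-in for the real data

# Bin counts without drawing the bars
h <- hist(x, breaks = 1000, plot = FALSE)
plot(h$mids, h$counts, type = "l", xlab = "x", ylab = "Frequency")

# Smooth alternative: kernel density estimate, rescaled from density to counts
d <- density(x)
lines(d$x, d$y * length(x) * diff(h$breaks[1:2]), col = "red")
```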
[R] Which lib/func should I use to get S-estimates of multivariate scatter?
I have a multivariate data set, and I would like to get S-estimates of the scatter (robust estimates of the variance-covariance matrix). Which library/function should I use? Thank you very much! -- View this message in context: http://www.nabble.com/Which-lib-func-should-I-use-to-get-S-estimates-of-multivariate-scatter--tp17678386p17678386.html
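One option, assuming the rrcov package (the one mentioned earlier in this archive) is installed, is its `CovSest()` function, which computes S-estimates of multivariate location and scatter; `getCov()` extracts the robust covariance matrix. The sketch below is guarded with `requireNamespace()` so it still runs without rrcov, and the data are made up. The CRAN Robust task view lists further alternatives.

```r
# Made-up data: 100 bivariate normal points plus 5 outliers
set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))

S_cov <- NULL
if (requireNamespace("rrcov", quietly = TRUE)) {
  fit <- rrcov::CovSest(x)        # S-estimate of location and scatter
  S_cov <- rrcov::getCov(fit)     # robust variance-covariance matrix
  print(S_cov)
} else {
  message("rrcov is not installed; install.packages(\"rrcov\") to run this")
}
```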