[R] How to get robust M-estimator of multivariate scatter using Huber's psi?

2008-11-19 Thread rlearner309

How can I get robust M-estimators of multivariate scatter using Huber's psi?
Which package/function should I look into?  Ideally, I would like to set the
thresholds of Huber's psi function myself.
Thanks a lot!!!
-- 
View this message in context: 
http://www.nabble.com/How-to-get-robust-M-estimator-of-multivariate-scatter-using-Huber%27s-psi--tp20585755p20585755.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get robust M-estimator of multivariate scatter using Huber's psi?

2008-11-19 Thread rlearner309

Yes, I did.  But the closest I found is the covMest function in the rrcov
package, which uses the biweight function, which is redescending.
Unfortunately, I need to use Huber's psi function to derive the multivariate
scatter, and I hope I can try different thresholds.
Any help?  Thanks a lot!!!  
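
For readers of this thread: a Huber-type M-estimator of location and scatter
can be sketched in base R as an iteratively reweighted mean and covariance.
The function name huber_scatter, its arguments, the default threshold k (a
cutoff on the Mahalanobis distance) and the consistency factor beta are all
assumptions of this sketch; none of them come from rrcov or any other package.

```r
# Sketch of a Huber-type M-estimator of multivariate location and scatter.
# huber_scatter() and its arguments are hypothetical names for illustration.
# k is the user-chosen threshold on the Mahalanobis distance d:
#   location weight: w(d)   = min(1, k / d)      (Huber's psi(d) / d)
#   scatter weight:  w(d)^2 = min(1, k^2 / d^2)
huber_scatter <- function(x, k = sqrt(qchisq(0.95, ncol(x))),
                          tol = 1e-8, max_iter = 500) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Assumed scaling so that the estimate targets the covariance matrix at
  # the multivariate normal: beta = E[min(chisq_p, k^2)] / p.
  beta <- pchisq(k^2, p + 2) + (k^2 / p) * pchisq(k^2, p, lower.tail = FALSE)
  mu <- colMeans(x)
  sigma <- cov(x)
  for (iter in seq_len(max_iter)) {
    d <- sqrt(mahalanobis(x, mu, sigma))       # mahalanobis() returns d^2
    w <- pmin(1, k / pmax(d, .Machine$double.eps))
    mu_new <- colSums(w * x) / sum(w)          # weighted mean
    xc <- sweep(x, 2, mu_new)                  # centre at the new location
    sigma_new <- crossprod(w * xc) / (n * beta)
    delta <- max(abs(sigma_new - sigma), abs(mu_new - mu))
    mu <- mu_new
    sigma <- sigma_new
    if (delta < tol) break
  }
  list(center = mu, scatter = sigma, weights = w, iterations = iter)
}

# Usage: simulated data with a few gross outliers, threshold k = 2.
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1:10, ] <- x[1:10, ] + 8
fit <- huber_scatter(x, k = 2)
fit$scatter
```

Trying different thresholds is then just a matter of changing k; a very
large k gives every point weight 1 and reproduces the ordinary (divide by n)
covariance.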


 

David Winsemius wrote:
 
 Have you looked at the obvious Task View yet?
 
 http://cran.r-project.org/web/views/Robust.html
 
 
 --  
 David Winsemius
 On Nov 19, 2008, at 1:07 PM, rlearner309 wrote:
 

 How to get robust M-estimators of multivariate scatter using Huber's  
 psi?
 Which package/function should I look into?  Ideally, I hope I can
 self-define thresholds of Huber's psi function.
 Thanks a lot!!!
 

-- 
View this message in context: 
http://www.nabble.com/How-to-get-robust-M-estimator-of-multivariate-scatter-using-Huber%27s-psi--tp20585755p20587324.html
Sent from the R help mailing list archive at Nabble.com.



[R] A question about positive definite matrix

2008-10-20 Thread rlearner309

I know this is a forum about R, but I am desperate about this problem (BTW,
does anyone know a good Statistics/Math forum to post questions like this?):

A and B are both n x n positive definite matrices.
Write A > B if A - B is positive definite.
I know this is true: if A > B, then A^{-1} < B^{-1}.  But how to prove this?
I tried to diagonalize A and B, but since they can have different eigen
structure,... I am stuck here.
Thanks a lot for any help here.
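
One standard argument avoids diagonalizing A and B separately by working with
the symmetric positive definite square root of B. The notation B^{1/2} and
the auxiliary matrix C are introduced only for this sketch; the ordering is
the positive definite ordering described in the question, and the key fact
used is that X > 0 if and only if S X S > 0 for any invertible symmetric S.

```latex
\begin{align*}
A - B > 0
  &\iff B^{-1/2}(A - B)B^{-1/2} > 0
   \iff C := B^{-1/2} A B^{-1/2} > I \\
  &\iff \text{every eigenvalue of } C \text{ exceeds } 1 \\
  &\iff \text{every eigenvalue of } C^{-1} = B^{1/2} A^{-1} B^{1/2}
        \text{ lies in } (0, 1) \\
  &\iff I - B^{1/2} A^{-1} B^{1/2} > 0 \\
  &\iff B^{-1/2}\bigl(I - B^{1/2} A^{-1} B^{1/2}\bigr)B^{-1/2}
        = B^{-1} - A^{-1} > 0 .
\end{align*}
```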
-- 
View this message in context: 
http://www.nabble.com/A-question-about-positive-definite-matrix-tp20063054p20063054.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to add notes to the graph?

2008-07-28 Thread rlearner309

Hi,  
I have a simple graph:
x <- c(1,2,3)
plot(x, pch=16, type="b")

I would like to add some notes just beside these 3 dots, and the notes are
stored in a vector:

a <- c(12,54,84)

So the result will be: there should be a 12 below the first dot (or next
to it, but not replacing the solid dot), a 54 next to the second dot...

What if a is not numeric? a <- c("a","b","c")

Thank you very much!!
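
A minimal sketch of one way to do this with base graphics: text() draws
labels at given coordinates, and its pos argument moves them off the points.
The ylim values and the second label vector b are assumptions added for the
illustration.

```r
x <- c(1, 2, 3)
a <- c(12, 54, 84)

# Widen the y-range a little so labels next to the extreme points are not
# clipped at the edge of the plot region.
plot(x, pch = 16, type = "b", ylim = c(0.7, 3.3))

# pos = 1 puts each label below its point (2 = left, 3 = above, 4 = right),
# so the solid dots stay visible.
text(seq_along(x), x, labels = a, pos = 1)

# Character labels work the same way; here they are drawn above the points.
b <- c("a", "b", "c")
text(seq_along(x), x, labels = b, pos = 3, col = "blue")
```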
-- 
View this message in context: 
http://www.nabble.com/how-to-add-notes-to-the-graph--tp18689195p18689195.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to control the memory?

2008-07-23 Thread rlearner309

Hi, I have a huge data set to deal with.  Sometimes I get a warning message
about the memory limit; sometimes R reads in the data but it is very
slow.
My question is: is there anything I can do to lift the memory limitation (I
did memory.limit(4095); is there anything else I can do)?  I know I can also
drop variables that R remembers, but I forgot the command.
Thanks a lot!
Oh, BTW, is there a function that can be used to generate weekdays (skip
weekends automatically)?
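
A short sketch covering both side questions, using only base R. The object
name big and the date range are made up for the illustration.

```r
# Drop objects that are no longer needed, then trigger garbage collection.
big <- numeric(1e6)
rm(big)              # removes the object from the workspace
invisible(gc())      # frees the memory it occupied
ls()                 # shows what is still in the workspace

# Weekdays only: build every day in the range, then keep Monday to Friday.
days <- seq(as.Date("2008-07-21"), as.Date("2008-08-01"), by = "day")
wday <- as.POSIXlt(days)$wday          # 0 = Sunday, ..., 6 = Saturday
business_days <- days[wday %in% 1:5]
business_days
```

Using the numeric wday field rather than weekdays() keeps the filter
independent of the locale's day names.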
-- 
View this message in context: 
http://www.nabble.com/How-to-control-the-memory--tp18611155p18611155.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to filter a data frame?

2008-07-22 Thread rlearner309

I have a question about how to filter the data frame:
Suppose my data frame has variables like gender, age,... How do I get a subset
of the data frame, with only female (or male) and/or age > 50...?  What is
the typical syntax?  I tried several condition expressions, but none of them
worked...
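
A minimal sketch of the two usual idioms, logical indexing and subset(). The
data frame df, its values and the cutoff of 50 are invented for the example.

```r
df <- data.frame(gender = c("F", "M", "F", "M", "F"),
                 age    = c(34, 61, 57, 45, 72))

# Logical indexing: keep rows where both conditions hold.
# The trailing comma means "all columns".
women_over_50 <- df[df$gender == "F" & df$age > 50, ]

# subset() evaluates the condition inside the data frame; | means "or".
male_or_over_50 <- subset(df, gender == "M" | age > 50)

women_over_50
male_or_over_50
```

A common reason such conditions fail is using = instead of ==, or && and ||
(which compare single values) instead of the vectorised & and |.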

Thanks a lot!
-- 
View this message in context: 
http://www.nabble.com/How-to-filter-a-data-frame--tp18587502p18587502.html
Sent from the R help mailing list archive at Nabble.com.



[R] Does R have SQL interface in windows?

2008-07-22 Thread rlearner309

It seems that the RMySQL package does not support Windows...
Thanks  a lot!
-- 
View this message in context: 
http://www.nabble.com/Does-R-have-SQL-interface-in-windows--tp18587733p18587733.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to filter a data frame?

2008-07-22 Thread rlearner309

Thank you all!!   :-) 



rlearner309 wrote:
 
 I have a question about how to filter the data frame:
 Suppose my data frame has variables like gender, age,... How to get a
 subset of the data frame, with only female (or male) and/or age > 50...? 
 What is the typical syntax?  I tried several condition expressions, but
 none of them worked...
 
 Thanks a lot!
 

-- 
View this message in context: 
http://www.nabble.com/How-to-filter-a-data-frame--tp18587502p18598469.html
Sent from the R help mailing list archive at Nabble.com.



[R] Can I do regression by steps?

2008-07-08 Thread rlearner309

I saw this type of models in some of my company projects.  

To simplify:
Y is regressed on X1 and X2.  But the regression is done by two steps: 
First Y is regressed on X1 with intercept, and the residuals from the first
step are used to regress on X2, without the constant.  The reason to do so
is some observations have X1 data but do not have X2, so I guess the person
wants to use as much information as he can to get the coef. for X1, and then
use part of the residuals (that have X2 data) to catch what is left to be
explained by X2.

But my concern is, should we consider the correlation between X1 and X2?  If
residuals from the first step are used, then X1 effect has been removed. 
Then what does it really mean by regressing residuals on X2, which has some
X1 effect correlated with?? should X2 be adjusted by X1, too (regress X2 on
X1 and use the residuals)?  

What if both X1 and X2 are dummy variables?  Dummy variables can have a
meaningful correlation, too, right?
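
The concern can be checked on simulated data. The sketch below assumes
complete data (no missing X2), made-up coefficients and a made-up correlation
between x1 and x2. It compares the joint fit, the two-step fit described
above, and a two-step fit in which x2 is also adjusted for x1 (the
Frisch-Waugh-Lovell result).

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)               # x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

# Single multivariable regression.
joint <- coef(lm(y ~ x1 + x2))

# Two-step procedure from the post: residuals of y ~ x1 regressed on x2,
# without a constant.
step1 <- lm(y ~ x1)
naive <- coef(lm(resid(step1) ~ x2 - 1))

# Same second step, but with x2 also adjusted for x1 first.
x2_adj <- resid(lm(x2 ~ x1))
fwl <- coef(lm(resid(step1) ~ x2_adj - 1))

joint["x2"]   # coefficient of x2 in the joint fit
naive         # shrunk towards zero when x1 and x2 are correlated
fwl           # matches the joint coefficient of x2
```

In this setup the unadjusted second step underestimates the x2 effect, while
adjusting x2 for x1 recovers the joint coefficient. When x1 and x2 are
uncorrelated the two second steps agree.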

Thanks a lot!
-- 
View this message in context: 
http://www.nabble.com/Can-I-do-regression-by-steps--tp18338562p18338562.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can I do regression by steps?

2008-07-08 Thread rlearner309

Thanks for the reply.
I am aware of the difference, but can I do regression by steps at all?  I
am not comfortable with it.



John Sorkin wrote:
 
 Be very careful!
 When regression is performed by steps, you often will not get the same
 results as you would get from a single multivariable regression. The
 explanation for this is not simple, but a simplified explanation is that
 when you do your first regression,
 y=f(x1)
 all the total variance that can be accounted for is sucked up by x1
 leaving little variance to be accounted for by your second regression,
 residuals=f(x2). In contrast when you perform a multivariable regression,
 y=f(x1,x2) the total variance is proportioned between x1 and x2.
 John
 
 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
 
 rlearner309 [EMAIL PROTECTED] 7/8/2008 8:53 AM 
 
 I saw this type of models in some of my company projects.  
 
 To simplify:
 Y is regressed on X1 and X2.  But the regression is done by two steps: 
 First Y is regressed on X1 with intercept, and the residuals from the first
 step are used to regress on X2, without the constant.  The reason to do so
 is some observations have X1 data but do not have X2, so I guess the person
 wants to use as much information as he can to get the coef. for X1, and then
 use part of the residuals (that have X2 data) to catch what is left to be
 explained by X2.
 
 But my concern is, should we consider the correlation between X1 and X2?  If
 residuals from the first step are used, then X1 effect has been removed. 
 Then what does it really mean by regressing residuals on X2, which has some
 X1 effect correlated with?? should X2 be adjusted by X1, too (regress X2 on
 X1 and use the residuals)?  
 
 What if both X1 and X2 are dummy variables?  Dummy variables can have a
 meaningful correlation, too, right?
 
 Thanks a lot!
 

-- 
View this message in context: 
http://www.nabble.com/Can-I-do-regression-by-steps--tp18338562p18350475.html
Sent from the R help mailing list archive at Nabble.com.



[R] A quick question about lm()

2008-07-07 Thread rlearner309

I have a simple regression using lm().
If I just want to check the coefficient, I can use summary(lm())$coef; if I
need the standard error, I can use summary(lm())$s, if I need the residuals,
I can use summary(lm())$res.  OK.  How can I get the R-squares and Adjusted
R-squares using $...?
Is there a function, like objects(), that can show all the references for
values?
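
A short sketch using the built-in cars data set (chosen only for the
illustration): the summary object is a list, so names() and str() show
everything that can be reached with $.

```r
fit <- lm(dist ~ speed, data = cars)    # cars ships with R
s <- summary(fit)

s$r.squared          # R-squared
s$adj.r.squared      # adjusted R-squared

names(s)             # all components of the summary object
names(fit)           # all components of the fitted model itself
str(s, max.level = 1)

# Full names of the components abbreviated in the question:
s$coefficients       # what $coef matches
s$sigma              # what $s matches (residual standard error)
s$residuals          # what $res matches
```

The short forms $coef, $s and $res work through partial matching of list
names; spelling the names out is safer in scripts.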

Thanks a lot!
-- 
View this message in context: 
http://www.nabble.com/A-quick-question-about-lm%28%29-tp18316864p18316864.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] A regression problem using dummy variables

2008-07-03 Thread rlearner309

sorry, made a stupid mistake.
I got it.
thanks a lot!

Peter Dalgaard wrote:
 
 rlearner309 wrote:
 I think it is zero, because you have lots of zeros there.  It is not like
 continuous variables.

   
 Think again. The sum of products may be zero, but that is not the 
 covariance. And don't dismiss Thomas, he is usually right.
 
 Anyways, the coefs of dummy variables represent differences to the same 
 base level, and choosing a poorly determined base level (essentially: 
 whose mean is determined by only a few observations) will cause high 
 parameter correlation. It should only affect those parameters though, 
 and it is not really clear what VIF means for dummy variables. One often 
 chooses to relevel() to make the largest group the base level, but it 
 really comes down to which group contrasts you want to look at.
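
The effect of the base level can be seen on simulated data. In this sketch
the group names, group sizes and response are invented; it compares the
correlation between the two dummy coefficients when the tiny group is the
base level and after relevel() makes the largest group the base level.

```r
set.seed(42)
g <- factor(rep(c("small", "mid", "large"), times = c(3, 1000, 1200)),
            levels = c("small", "mid", "large"))
y <- as.integer(g) + rnorm(length(g))

# Base level = "small" (only 3 observations).
fit_small <- lm(y ~ g)
cor_small <- cov2cor(vcov(fit_small))["gmid", "glarge"]

# Base level = "large" (the biggest group).
g2 <- relevel(g, ref = "large")
fit_large <- lm(y ~ g2)
cor_large <- cov2cor(vcov(fit_large))["g2small", "g2mid"]

cor_small   # close to 1: both coefficients share the noisy base mean
cor_large   # close to 0
```

The fitted group means are identical in both parameterisations; only the
contrasts, and hence their standard errors and correlations, change.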
 
 

 Thomas Lumley wrote:
   
 On Wed, 2 Jul 2008, rlearner309 wrote:

 
 I think the covariance between dummy variables or between dummy variables
 and intercept should always be zero.  meaning: no singularity problem??

   
 No.  You can easily check that this is not true using the cov() function.
 Indicator variables for mutually exclusive groups are negatively
 correlated.

  -thomas



 
 rlearner309 wrote:
   
 This is actually more like a Statistics problem:
 I have a dataset with two dummy variables controlling three levels.  The
 problem is, one level does not have many observations compared with other
 two levels (a couple of data points compared with 1000+ points on other
 levels).  When I run the regression, the result is bad.  I have unbalanced
 SE and VIF.  Does this kind of problem also belong to near singularity
 problem?  Does it make any difference if I code the level that lacks data
 (0,0) instead of (0,1)?

 thanks a lot!

 
   
 Thomas Lumley   Assoc. Professor, Biostatistics
 [EMAIL PROTECTED]   University of Washington, Seattle


 

   
 
 
 -- 
O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
 ~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
 
 
 

-- 
View this message in context: 
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18260470.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] A regression problem using dummy variables

2008-07-02 Thread rlearner309

Yes.  Because the slopes are supposed to be the same.
Level shifts are needed to be modeled.


Moshe Olshansky-2 wrote:
 
 Do you have a reason to treat all 3 levels together and not have a
 separate regression for each level?
 
 
 --- On Tue, 1/7/08, rlearner309 [EMAIL PROTECTED] wrote:
 
 From: rlearner309 [EMAIL PROTECTED]
 Subject: [R]  A regression problem using dummy variables
 To: r-help@r-project.org
 Received: Tuesday, 1 July, 2008, 11:38 PM
 This is actually more like a Statistics problem:
 I have a dataset with two dummy variables controlling three levels.  The
 problem is, one level does not have many observations compared with other
 two levels (a couple of data points compared with 1000+ points on other
 levels).  When I run the regression, the result is bad.  I have unbalanced
 SE and VIF.  Does this kind of problem also belong to near singularity
 problem?  Does it make any difference if I code the level that lacks data
 (0,0) instead of (0,1)?
 
 thanks a lot!
 

-- 
View this message in context: 
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18230346.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] A regression problem using dummy variables

2008-07-02 Thread rlearner309

I think the covariance between dummy variables or between dummy variables and
intercept should always be zero.  meaning: no singularity problem??



rlearner309 wrote:
 
 This is actually more like a Statistics problem:
 I have a dataset with two dummy variables controlling three levels.  The
 problem is, one level does not have many observations compared with other
 two levels (a couple of data points compared with 1000+ points on other
 levels).  When I run the regression, the result is bad.  I have unbalanced
 SE and VIF.  Does this kind of problem also belong to near singularity
 problem?  Does it make any difference if I code the level that lacks data
 (0,0) instead of (0,1)?
 
 thanks a lot!
 

-- 
View this message in context: 
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] A regression problem using dummy variables

2008-07-02 Thread rlearner309

I think it is zero, because you have lots of zeros there.  It is not like
continuous variables.



Thomas Lumley wrote:
 
 On Wed, 2 Jul 2008, rlearner309 wrote:
 

 I think the covariance between dummy variables or between dummy variables
 and intercept should always be zero.  meaning: no singularity problem??

 
 No.  You can easily check that this is not true using the cov() function.
 Indicator variables for mutually exclusive groups are negatively
 correlated.
 
  -thomas
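
 The check suggested here takes a few lines. The group labels and sizes
 below are invented to mimic a design with one very small level.

```r
# Three mutually exclusive groups, one of them tiny.
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))
X <- model.matrix(~ g)                 # columns: (Intercept), gb, gc

cov(X[, c("gb", "gc")])                # off-diagonal entries are negative
cor(X[, "gb"], X[, "gc"])              # so is the correlation

# The sum of products is zero, because no row belongs to both groups,
# but the covariance also subtracts the product of the means.
sum(X[, "gb"] * X[, "gc"])
```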
 
 
 

 rlearner309 wrote:

 This is actually more like a Statistics problem:
 I have a dataset with two dummy variables controlling three levels.  The
 problem is, one level does not have many observations compared with other
 two levels (a couple of data points compared with 1000+ points on other
 levels).  When I run the regression, the result is bad.  I have unbalanced
 SE and VIF.  Does this kind of problem also belong to near singularity
 problem?  Does it make any difference if I code the level that lacks data
 (0,0) instead of (0,1)?

 thanks a lot!



 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle
 
 
 

-- 
View this message in context: 
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18248187.html
Sent from the R help mailing list archive at Nabble.com.



[R] A regression problem using dummy variables

2008-07-01 Thread rlearner309

This is actually more like a Statistics problem:
I have a dataset with two dummy variables controlling three levels.  The
problem is, one level does not have many observations compared with other
two levels (a couple of data points compared with 1000+ points on other
levels).  When I run the regression, the result is bad.  I have unbalanced
SE and VIF.  Does this kind of problem also belong to near singularity
problem?  Does it make any difference if I code the level that lacks data
(0,0) instead of (0,1)?

thanks a lot!
-- 
View this message in context: 
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to get the distribution curve from a data set?

2008-06-05 Thread rlearner309

I have a question.
I have a data set (about 100,000 observations).  How would I get the
distribution curve graph?  This is like,  if I use hist(x, freq=TRUE,
breaks=1000) to get the histogram, now the question is, I don't need the
histogram itself, I just need the curve that connects the top of each
histogram bin.
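
A minimal sketch of two ways to get such a curve in base R. The simulated
vector x stands in for the real data and is an assumption of the example.

```r
set.seed(1)
x <- rnorm(100000)                      # stand-in for the real data set

# Compute the histogram without drawing it, then connect the bin tops.
h <- hist(x, breaks = 1000, plot = FALSE)
plot(h$mids, h$counts, type = "l", xlab = "x", ylab = "Frequency")

# Smoothed alternative: a kernel density estimate.
plot(density(x), main = "Kernel density estimate")
```

Note that hist() treats a single number given as breaks as a suggestion, so
the actual number of bins can differ from 1000; length(h$counts) gives the
number used.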
Thank you very much!

-- 
View this message in context: 
http://www.nabble.com/how-to-get-the-distribution-curve-from-a-data-set--tp17678286p17678286.html
Sent from the R help mailing list archive at Nabble.com.



[R] Which lib/func should I use to get S-estimates of multivariate scatter?

2008-06-05 Thread rlearner309

I have a multivariate data set, and I would like to get the S-estimates of
the scatter (robust estimates of variance-covariance matrix).  Which
library/function should I use?  
Thank you very much!
-- 
View this message in context: 
http://www.nabble.com/Which-lib-func-should-I-use-to-get-S-estimates-of-multivariate-scatter--tp17678386p17678386.html
Sent from the R help mailing list archive at Nabble.com.
