Re: [R] randomForest outlier return NA

2010-07-15 Thread Liaw, Andy
There's a bug in the code.  If you add row names to the X matrix before
you call randomForest(), you'd get:

R> summary(outlier(mdl.rf))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.0580 -0.5957  0.      0.6406  1.2650  9.5200 

I'll fix this in the next release.  Thanks for reporting.
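
The workaround described above can be sketched as follows. This is only a
minimal sketch built on the poster's example, not the package fix itself: it
assumes, as stated above, that giving the predictor matrix row names before
the call is enough for outlier() to return non-missing values, and it uses
rowSums() in place of the original loop for brevity.

```r
library(randomForest)
set.seed(0)

## Same toy data as in the original post: 40 random rows plus one row of ones.
X <- rbind(matrix(runif(n = 400, min = -1, max = 1), ncol = 10),
           rep(1, times = 10))
Y <- sign(rowSums(X))   # class label: sign of the row sum (assumed equivalent to the loop)

## Workaround: give X explicit row names before fitting.
rownames(X) <- seq_len(nrow(X))

mdl.rf <- randomForest(x = X, y = as.factor(Y), proximity = TRUE,
                       mtry = 10, ntree = 500)
summary(outlier(mdl.rf))
```

With row names in place, the summary should no longer consist only of NA's.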

Best,
Andy 

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Pau Carrio Gaspar
Sent: Wednesday, July 14, 2010 6:36 AM
To: r-help@r-project.org
Subject: [R] randomForest outlier return NA

Dear R-users,

I have a problem with outlier() from the randomForest package.
After running the following code (which produces a silly data set and
builds a model with randomForest):

###
library(randomForest)
set.seed(0)

## build data set
X <- rbind(matrix(runif(n = 400, min = -1, max = 1), ncol = 10),
           rep(1, times = 10))
Y <- matrix(nrow = nrow(X), ncol = 1)
for (i in 1:nrow(X)) { Y[i, 1] <- sign(sum(X[i, ])) }

## build model
mdl.rf <- randomForest(x = X, y = as.factor(Y), proximity = TRUE,
                       mtry = 10, ntree = 500)
summary(outlier(mdl.rf))
###

I get the following output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
                                                     41 


Can anyone explain why outlier() returns only NAs?

Thanks
Pau

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] randomForest outlier return NA

2010-07-15 Thread Pau Carrio Gaspar
Hi Andy,

thanks for your reply and for the forthcoming fix.

Until the next release is available, I have rewritten my code following your
suggestion, in case it helps anyone.

###
library(randomForest)
set.seed(0)

## build data set in data frame
X <- rbind(matrix(runif(n = 400, min = -1, max = 1), ncol = 10),
           rep(1, times = 10))
Y <- matrix(nrow = nrow(X), ncol = 1)
for (i in 1:nrow(X)) { Y[i, 1] <- sign(sum(as.numeric(X[i, ]))) }

df <- data.frame(X, Y)
## remove the matrices; the model below uses only df
rm(X, Y)
## build model
mdl.rf <- randomForest(formula = as.factor(Y) ~ ., data = df,
                       proximity = TRUE, mtry = 10, ntree = 500)
summary(outlier(mdl.rf))
##

Regards
Pau





[R] randomForest outlier return NA

2010-07-14 Thread Pau Carrio Gaspar
Dear R-users,

I have a problem with outlier() from the randomForest package.
After running the following code (which produces a silly data set and builds
a model with randomForest):

###
library(randomForest)
set.seed(0)

## build data set
X <- rbind(matrix(runif(n = 400, min = -1, max = 1), ncol = 10),
           rep(1, times = 10))
Y <- matrix(nrow = nrow(X), ncol = 1)
for (i in 1:nrow(X)) { Y[i, 1] <- sign(sum(X[i, ])) }

## build model
mdl.rf <- randomForest(x = X, y = as.factor(Y), proximity = TRUE,
                       mtry = 10, ntree = 500)
summary(outlier(mdl.rf))
###

I get the following output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
                                                     41 


Can anyone explain why outlier() returns only NAs?

Thanks
Pau



Re: [R] randomForest outlier

2008-07-16 Thread Liaw, Andy
Perhaps if you follow the posting guide more closely, you might get more
(useful) replies, but without looking at your data, I doubt there's much
anyone can do for you.

The fact that the range of the outlying measures is -1 to 2 would tell
me there are no potential outliers by this measure.  Please see the
value section of ?outlier to see how this measure is computed.
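
As a rough illustration of the computation referred to above: the Value
section of ?outlier describes the outlying measure of a case as n divided by
the sum of its squared proximities, normalized within each class by
subtracting the median and dividing by the MAD. The sketch below follows that
description in base R. The function name outlying_measure and the toy
proximity matrix are hypothetical and not part of the randomForest package,
and including the diagonal of the proximity matrix in the sum is an
assumption.

```r
## Hypothetical helper (not part of randomForest): prox is a square
## proximity matrix, cls an optional vector of class labels.
outlying_measure <- function(prox, cls = NULL) {
  stopifnot(is.matrix(prox), nrow(prox) == ncol(prox))
  n <- nrow(prox)
  if (is.null(cls)) cls <- rep(1, n)   # no classes given: treat all cases as one class
  cls <- factor(cls)
  out <- rep(NA_real_, n)
  for (lvl in levels(cls)) {
    in_class <- cls == lvl
    ## Sum of squared proximities to cases of the same class
    ## (diagonal included, which is an assumption of this sketch).
    ss <- rowSums(prox[in_class, in_class, drop = FALSE]^2)
    raw <- n / ss
    ## Centre by the class median and scale by the class MAD.
    out[in_class] <- (raw - median(raw)) / mad(raw)
  }
  out
}

## Toy proximities: four points close together and one far away.
pts  <- c(0, 0.1, 0.2, 0.3, 5)
prox <- exp(-as.matrix(dist(pts)))   # similarity in (0, 1], equal to 1 on the diagonal
om   <- outlying_measure(prox)
which.max(om)                        # the isolated fifth point has the largest measure
```

Because the measure is centred and scaled within each class, a narrow range
such as -1 to 2 means that no case stands far from the rest of its class by
this measure.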

Andy 

From: Birgitle
>
> Still the same question:
>
> Birgitle wrote:
> >
> > I am trying to use randomForest to find the variables that are most
> > important for dividing my dataset (continuous and categorical variables)
> > into two given groups.
> >
> > But when I plot the outlier:
> >
> > plot(outlier(rfObject, cls=groupingVariable),
> > type="p", col=c("red","green")[as.numeric(groupingVariable)])
> >
> > it seems to me that all my values appear as outliers.
> > Does anybody have suggestions as to what is going wrong in my analysis?
> >
>
> Additional remark:
> The range of the y-axis is quite small, between -1 and 2.
>
> -----
> The art of living is more like wrestling than dancing.
> (Marcus Aurelius)
> --
> View this message in context:
> http://www.nabble.com/randomForest-outlier-tp17979182p18466832.html
> Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] randomForest outlier

2008-07-16 Thread Birgit Lemcke

Thanks anyway for your answer.
That was also an option that I had considered (no potential
outliers), and I will have a look at the value section of ?outlier.


B.


===
Birgit Lemcke
Institut of Systematic Botany
University of Zurich
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
mail: [EMAIL PROTECTED]
===



Re: [R] randomForest outlier

2008-07-16 Thread Birgit Lemcke

I use a different dissimilarity measure (Gower's index, via library(analogue)).
I just wanted to see whether there are similar values in both tables.

I am mainly trying to find the best model to explain my
predefined groups (using a bunch of different variables:
factors, counts, numeric variables, ordered factors).

I am also fiddling around with a logistic regression.

B.


On 16.07.2008 at 14:58, Liaw, Andy wrote:

> Note that I did say "by this measure": what you may want to
> consider as an outlier may not be what this measure picks out.
> After all, RF proximities are a bit unusual as a similarity measure.





Re: [R] randomForest outlier

2008-07-15 Thread Birgitle

Still the same question:


Birgitle wrote:
>
> I am trying to use randomForest to find the variables that are most
> important for dividing my dataset (continuous and categorical variables)
> into two given groups.
>
> But when I plot the outlier:
>
> plot(outlier(rfObject, cls=groupingVariable),
> type="p", col=c("red","green")[as.numeric(groupingVariable)])
>
> it seems to me that all my values appear as outliers.
> Does anybody have suggestions as to what is going wrong in my analysis?
>

Additional remark:
The range of the y-axis is quite small, between -1 and 2.


-
The art of living is more like wrestling than dancing.
(Marcus Aurelius)
-- 
View this message in context: 
http://www.nabble.com/randomForest-outlier-tp17979182p18466832.html
Sent from the R help mailing list archive at Nabble.com.



[R] randomForest outlier

2008-06-18 Thread Birgitle

I am trying to use randomForest to find the variables that are most important
for dividing my dataset (continuous and categorical variables) into two given
groups.

But when I plot the outliers:

plot(outlier(FemMalSex_NAavoid88.rf33, cls=FemMalSex_NAavoid88$Sex),
type="h", col=c("red","green")[as.numeric(FemMalSex_NAavoid88$Sex)])

it seems to me that all my values appear as outliers.
Does anybody have suggestions as to what is going wrong in my analysis?




-
The art of living is more like wrestling than dancing.
(Marcus Aurelius)
-- 
View this message in context: 
http://www.nabble.com/randomForest-outlier-tp17979182p17979182.html
Sent from the R help mailing list archive at Nabble.com.
