[R] Problem with user defined split function in Rpart

2007-03-02 Thread Paolo Radaelli
Dear all,
I'm trying to work with a user-defined split function in rpart. I have (I hope) 
correctly defined the three functions I need (eval, split, init).
I tested these functions by executing them step by step, and they work 
correctly.
When I build the tree with the rpart command (passing the list of my three 
functions as method=), I get no errors, but the resulting tree contains only 
the root!
I have tried in different ways to understand why the tree does not grow, 
without success.
The only strange thing I observed is that the CP value reported by 
summary.rpart is negative:

         CP nsplit rel error
1 -1.011038      0         1

Where does the problem lie ?
Thank you for any helpful suggestion.

Paolo Radaelli
Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali
Facoltà di Economia
Università degli Studi di Milano-Bicocca
P.zza dell'Ateneo Nuovo, 1
20126 Milano
Italy
e-mail [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
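
For comparison, here is a minimal sketch of a complete user-defined method that does grow a tree. It is modelled on the anova example in rpart's tests/usersplits.R; the names my_init, my_eval and my_split and the use of the built-in cars data are choices made for this illustration only, and the shapes of the return values should be checked against that file. It does not establish what causes the negative CP, but a list that is known to split gives something to compare the return values of one's own eval and split functions against.

```r
library(rpart)

# init: check the response and return the pieces rpart expects from a
# user-defined method (shapes assumed from the usersplits.R example).
my_init <- function(y, offset, parms, wt) {
  if (is.matrix(y) && ncol(y) > 1) stop("matrix response not allowed")
  if (length(offset)) y <- y - offset
  sfun <- function(yval, dev, wt, ylevel, digits) {
    paste0("  mean=", format(signif(yval, digits)),
           ", MSE=", format(signif(dev / wt, digits)))
  }
  environment(sfun) <- .GlobalEnv
  list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = sfun)
}

# eval: node label (weighted mean) and node deviance (weighted sum of squares).
my_eval <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: anova-style goodness for a continuous predictor only.
my_split <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("categorical predictors are not handled in this sketch")
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)          # center y
  temp <- cumsum(y * wt)[-n]
  left.wt <- cumsum(wt)[-n]
  right.wt <- sum(wt) - left.wt
  lmean <- temp / left.wt
  rmean <- -temp / right.wt
  goodness <- (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2)
  list(goodness = goodness, direction = sign(lmean))
}

fit <- rpart(dist ~ speed, data = cars,
             method = list(init = my_init, eval = my_eval, split = my_split))
fit$frame[, c("var", "n", "dev", "yval")]
```

Calling my_eval and my_split by hand on the root-node data and comparing their output with that of one's own functions is one place to start when a tree stays at the root.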


[R] User defined split function in Rpart

2007-01-03 Thread Paolo Radaelli
Dear all,
 I'm trying to work with a user-defined split function in rpart
(file rpart\tests\usersplits.R in 
http://cran.r-project.org/src/contrib/rpart_3.1-34.tar.gz; see the bottom of 
this email).
Suppose we have the following data frame (note that the values of x are 
already sorted):
> D
   y     x
1  7 0.428
2  3 0.876
3  1 1.467
4  6 1.492
5  3 1.703
6  4 2.406
7  8 2.628
8  6 2.879
9  5 3.025
10 3 3.494
11 2 3.496
12 6 4.623
13 4 4.824
14 6 4.847
15 2 6.234
16 7 7.041
17 2 8.600
18 4 9.225
19 5 9.381
20 8 9.986

Running rpart and setting minbucket=1 and maxdepth=1 we get the following 
tree (which uses, by default, deviance):
> rpart(D$y~D$x,control=rpart.control(minbucket=1,maxdepth=1))
n= 20

node), split, n, deviance, yval
      * denotes terminal node

1) root 20 84.8 4.60
  2) D$x< 9.6835 19 72.63158 4.421053 *
  3) D$x>=9.6835 1 0.0 8.00 *

This means that the first 19 observations have been sent to the left side of 
the tree and one observation to the right.
This agrees with the goodness vector, whose maximum is its last element.

What I really don't understand is the direction vector:
# direction= -1 = send "y< cutpoint" to the left side of the tree
#             1 = send "y< cutpoint" to the right

What does it mean?
In the example considered here we have
> sign(lmean)
 [1]  1  1 -1 -1 -1 -1 -1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1

Which criterion is used?
In my opinion all the values should be equal to -1, given that the 
observations have to be sent to the left side of the tree.
Can someone help me?
Thank you

###
# The split function, where most of the work occurs.
#   Called once per split variable per node.
# If continuous=T (the case here considered)
#   The actual x variable is ordered
#   y is supplied in the sort order of x, with no missings,
#   return two vectors of length (n-1):
#      goodness = goodness of the split, larger numbers are better.
#                 0 = couldn't find any worthwhile split
#        the ith value of goodness evaluates splitting obs 1:i vs (i+1):n
#      direction= -1 = send "y< cutpoint" to the left side of the tree
#                  1 = send "y< cutpoint" to the right
#        this is not a big deal, but making larger "mean y's" move towards
#        the right of the tree, as we do here, seems to make it easier to
#        read
# If continuous=F, x is a set of integers defining the groups for an
#   unordered predictor. In this case:
#      direction = a vector of length m= "# groups". It asserts that the
#        best split can be found by lining the groups up in this order
#        and going from left to right, so that only m-1 splits need to
#        be evaluated rather than 2^(m-1)
#      goodness = m-1 values, as before.
#
# The reason for returning a vector of goodness is that the C routine
#   enforces the "minbucket" constraint. It selects the best return value
#   that is not too close to an edge.
The vector wt of weights in our case is:
> wt
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

temp2 <- function(y, wt, x, parms, continuous) {
    # Center y
    n <- length(y)
    y <- y - sum(y*wt)/sum(wt)
    if (continuous) {
        # continuous x variable
        temp <- cumsum(y*wt)[-n]
        left.wt <- cumsum(wt)[-n]
        right.wt <- sum(wt) - left.wt
        lmean <- temp/left.wt
        rmean <- -temp/right.wt
        goodness <- (left.wt*lmean^2 + right.wt*rmean^2)/sum(wt*y^2)
        list(goodness = goodness, direction = sign(lmean))
    }
}

Paolo Radaelli
Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali
Facoltà di Economia
Università degli Studi di Milano-Bicocca
P.zza dell'Ateneo Nuovo, 1
20126 Milano
Italy
e-mail [EMAIL PROTECTED]
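
On one reading of the comment block quoted in the post, entry i of direction belongs to the i-th candidate split (observations 1:i versus (i+1):n), not to observation i, so only the entry at the split that is finally chosen would matter. The sketch below, which assumes that reading, calls the posted temp2 on the posted data and looks at the entry for the best split. The variable names best and cutpoint are introduced here for illustration, and taking the cutpoint as the midpoint of the two neighbouring x values is an assumption that happens to reproduce the 9.6835 printed by rpart.

```r
# Data from the post (x is already sorted)
D <- data.frame(
  y = c(7, 3, 1, 6, 3, 4, 8, 6, 5, 3, 2, 6, 4, 6, 2, 7, 2, 4, 5, 8),
  x = c(0.428, 0.876, 1.467, 1.492, 1.703, 2.406, 2.628, 2.879, 3.025, 3.494,
        3.496, 4.623, 4.824, 4.847, 6.234, 7.041, 8.600, 9.225, 9.381, 9.986)
)

# Split function as given in the post (continuous case only)
temp2 <- function(y, wt, x, parms, continuous) {
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)
  if (continuous) {
    temp <- cumsum(y * wt)[-n]
    left.wt <- cumsum(wt)[-n]
    right.wt <- sum(wt) - left.wt
    lmean <- temp / left.wt
    rmean <- -temp / right.wt
    goodness <- (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2)
    list(goodness = goodness, direction = sign(lmean))
  }
}

res <- temp2(D$y, wt = rep(1, nrow(D)), x = D$x, parms = NULL, continuous = TRUE)

best <- which.max(res$goodness)       # index of the best candidate split
cutpoint <- mean(D$x[best + 0:1])     # midpoint of x[best] and x[best + 1]
res$direction[best]                   # direction entry for that split only
```

Here the best candidate is the last one, the left group (the first 19 observations) has the smaller mean, and its direction entry is -1, which matches the printed tree sending x < 9.6835 to the left. The +1 entries elsewhere in sign(lmean) belong to candidate splits whose left group has the larger mean; they were not chosen.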



[R] Using a specified splitting criterion in tree and rpart

2006-11-27 Thread Paolo Radaelli
Dear all
I'm interested in fitting a classification tree using a user-specified 
splitting criterion. If I am right, the tree package allows only 
"deviance" or "gini", while the rpart package lets you set, in the optional 
parms argument for the splitting function, the component split (the index), 
which can be either gini or information.
Can someone suggest some documentation I could look at to define 
my own splitting criterion and pass it to rpart?
Thanks
Paolo

Paolo Radaelli
Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali
Facoltà di Economia
Università degli Studi di Milano-Bicocca
P.zza dell'Ateneo Nuovo, 1
20126 Milano
Italy
e-mail [EMAIL PROTECTED]
Tel +39 02 6448 3163
Fax +39 02 6448 3105
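
As an illustration of the built-in choice described above, the following sketch fits the same classification tree twice, once with each index, through the split component of parms. The iris data and the object names fit_gini and fit_info are used here only as an example; a criterion outside these two would need a user-defined method list instead.

```r
library(rpart)

# Classification tree with the Gini index
fit_gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "gini"))

# Same model with the information index
fit_info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"))

print(fit_info)
```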



[R] Splitting criterion in tree and rpart

2006-11-24 Thread Paolo Radaelli
Dear all,
I'm interested in fitting a classification tree using a user-specified 
splitting criterion. If I am right, the tree package allows only "deviance" 
or "gini", while the rpart package offers the methods "anova", "poisson", 
"class" and "exp", and tries to make an intelligent guess if method is missing. 
I also found that, in rpart, method can be a list of functions named init, 
split and eval, but I really don't know how to do it. Can someone 
suggest some documentation I could look at to define my own splitting 
criterion and pass it to rpart?
Thanks
Paolo

Paolo Radaelli
Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali
Facoltà di Economia
Università degli Studi di Milano-Bicocca
e-mail [EMAIL PROTECTED]



Re: [R] Compute quantiles with values and correspondent frequencies

2006-03-03 Thread Paolo Radaelli
Yes, it should work.
Thanks

- Original Message - 
From: "Sean Davis" <[EMAIL PROTECTED]>
To: "Paolo Radaelli" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: "r-help" 
Sent: Friday, March 03, 2006 1:49 PM
Subject: Re: [R] Compute quantiles with values and correspondent frequencies


> How about this:
>
> > x <- rpois(100,4)
> > vals <- as.numeric(names(table(x)))
> > vals
>  [1]  0  1  2  3  4  5  6  7  8  9 10
> > tx.freq <- as.numeric(table(x))/100
> > tx.freq
>  [1] 0.01 0.07 0.12 0.16 0.27 0.10 0.07 0.07 0.11 0.01 0.01
> > quantile(x)
>   0%  25%  50%  75% 100%
>    0    3    4    6   10
> > quantile.from.freq <- function(vals,freq,quant) {
> + ord <- order(vals)
> + cs <- cumsum(freq[ord])
> + return(vals[max(which(cs<quant))+1])
> + }
> > quantile.from.freq(vals,tx.freq,0.5)
> [1] 4
> > quantile.from.freq(vals,tx.freq,0.25)
> [1] 3
> > quantile.from.freq(vals,tx.freq,0.75)
> [1] 6
> > quantile.from.freq(vals,tx.freq,1)
> [1] 10
>
> Sean



Re: [R] Compute quantiles with values and correspondent frequencies

2006-03-03 Thread Paolo Radaelli
Yes, but it works only when you have integer counts for each value.
When I have only relative frequencies I can't repeat each value a
non-integer number of times.
Paolo

- Original Message - 
From: "Roger Bivand" <[EMAIL PROTECTED]>
To: "Paolo Radaelli" <[EMAIL PROTECTED]>
Cc: 
Sent: Friday, March 03, 2006 1:18 PM
Subject: Re: [R] Compute quantiles with values and correspondent frequencies


> On Fri, 3 Mar 2006, Paolo Radaelli wrote:
>
> > Dear List, quantile(x) function allows to obtain specified quantiles of
> > a vector of observations x.
> >
> > Is there an analogous function to compute quantiles in the case one have
> > the vector of the observations x and the correspondent vector f of
> > relative frequencies ?
>
> Just use rep():
>
> x <- rpois(100, 4) # data
> quantile(x)
> tx <- table(x) # frequencies (need to be integer counts)
> tx
> v <- as.numeric(names(tx)) # values
> quantile(rep(v, as.integer(tx)))
>
> You have v and tx, so this should work.
>
> >
> > Thank you
> >
> > Paolo Radaelli
> >
> >
>
> -- 
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: [EMAIL PROTECTED]
>



[R] Compute quantiles with values and correspondent frequencies

2006-03-03 Thread Paolo Radaelli
Dear List, 
the quantile(x) function returns specified quantiles of a vector of 
observations x. 

Is there an analogous function to compute quantiles when one has the 
vector of observations x and the corresponding vector f of relative 
frequencies?

Thank you

Paolo Radaelli
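
One way to get quantiles directly from values and relative frequencies, without repeating values, is to invert the cumulative frequencies: take the smallest value whose cumulative relative frequency reaches the requested probability. The function below is a minimal sketch of that idea; the name quantile_from_freq and the tolerance argument tol are introduced here for illustration. This is the inverse-empirical-CDF definition, so results can differ from the interpolated values that quantile() returns by default.

```r
# Quantiles from distinct values and their (relative) frequencies.
# Returns, for each p, the smallest value whose cumulative frequency is >= p.
quantile_from_freq <- function(vals, freq, probs, tol = 1e-12) {
  stopifnot(length(vals) == length(freq),
            all(freq >= 0),
            all(probs >= 0 & probs <= 1))
  ord <- order(vals)
  vals <- vals[ord]
  cs <- cumsum(freq[ord]) / sum(freq)   # normalise so the last entry is 1
  # tol guards against rounding error in the cumulative sums
  idx <- vapply(probs, function(p) which(cs >= p - tol)[1], integer(1))
  vals[idx]
}

# Values and relative frequencies taken from the thread
vals <- 0:10
freq <- c(0.01, 0.07, 0.12, 0.16, 0.27, 0.10, 0.07, 0.07, 0.11, 0.01, 0.01)
quantile_from_freq(vals, freq, c(0.25, 0.5, 0.75, 1))
```

Because the frequencies are normalised by their sum, the same function also accepts non-integer weights or raw counts.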





[R] Hierarchical clustering with centroid method

2005-07-26 Thread Paolo Radaelli
Dear everybody! 
In the function hclust, at each stage the distances between clusters are 
recomputed by the Lance-Williams dissimilarity update formula according to the
particular clustering method being used.
With the "centroid" method, the Lance-Williams recurrence formula works 
properly only for Euclidean distance. 
How can the centroid method be used properly with Manhattan distance?
Thanks
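
One possible workaround, sketched below, is to bypass the Lance-Williams update altogether: at each stage recompute every cluster centroid from its members and measure Manhattan distances between the centroids directly. This is a naive illustration, not part of hclust; the function name centroid_manhattan is hypothetical, the centroid is assumed to be the component-wise mean (the component-wise median would be another choice), and the cost grows quickly with the number of observations.

```r
# Naive agglomerative clustering: repeatedly merge the two clusters whose
# centroids are closest in Manhattan distance, recomputing centroids from
# the member rows at every stage (no Lance-Williams update).
centroid_manhattan <- function(X) {
  X <- as.matrix(X)
  clusters <- as.list(seq_len(nrow(X)))   # each cluster is a vector of row indices
  merges <- list()
  while (length(clusters) > 1) {
    # One centroid per cluster, stacked as rows
    cents <- do.call(rbind, lapply(clusters, function(idx) {
      colMeans(X[idx, , drop = FALSE])
    }))
    d <- as.matrix(dist(cents, method = "manhattan"))
    diag(d) <- Inf
    ij <- which(d == min(d), arr.ind = TRUE)[1, ]   # first closest pair
    i <- min(ij)
    j <- max(ij)
    merges[[length(merges) + 1]] <- list(
      members = sort(c(clusters[[i]], clusters[[j]])),
      height = d[i, j]
    )
    clusters[[i]] <- c(clusters[[i]], clusters[[j]])
    clusters[[j]] <- NULL
  }
  merges
}

# Two well-separated pairs of points
X <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
m <- centroid_manhattan(X)
sapply(m, function(s) s$height)
```

As with the centroid method in general, merge heights produced this way are not guaranteed to be monotone.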



[R] R-estimators

2004-04-19 Thread Paolo Radaelli
Dear all,
could you please suggest which package I need to install in order to use rank 
statistics in regression models (R-estimators, Wilcoxon scores, ...)?
Thanks in advance

Paolo Radaelli


[R] R-estimators

2004-04-09 Thread Paolo Radaelli
Dear all,
could you please suggest which package I need to install in order to use rank 
statistics in regression models (R-estimators, Wilcoxon scores, ...)?
Thanks and Happy Easter

Paolo Radaelli