[R] mob(party) formula question

2008-08-13 Thread Birgitle

I am trying to use mob() with my data.frame ('data.frame': 288 obs. of 81
variables; factors, numerics and ordered factors).
My response is a binary variable, so I should model it with a logistic
regression (family=binomial).

I read in the MOB vignette that I could use a formula like this if I
want only partitioning variables besides the response:

Test.mob <- mob(Resp~1|Var1+Var2+, data=dataframe, model=glinearModel,
family=binomial())

but this gives me back an error-message:

Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected

But Var1, Var2 and Resp are in my dataframe. Why do I get this error?

I am also wondering how I can find out which variables I should use for
partitioning and which for modelling?

There are correlations between some variables in my dataframe. Would it be
possible to always use one variable of each correlated pair for
partitioning and the other for modelling?

I would be very happy if somebody could give me some hints or answers to my
questions.

Many thanks in advance.

B.



-
The art of living is more like wrestling than dancing.
(Marcus Aurelius)
-- 
View this message in context: 
http://www.nabble.com/mob%28party%29-formula-question-tp18959898p18959898.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mob(party) formula question (example)

2008-08-13 Thread Birgitle

Here is an example that produces the same error:

Read in the following as textfile (save as DFExample.txt):

1   2   3   4   7   8   9   10  12  
13  14  15  16  17  18  19  21  22  23  
25  27  28  29  30  31  33  34
35  36  37  38  39  40  41  42  43  44  
45  46  47  48  49  50  51  52  53  54  
55  56  58  59  60
61  62  63  64  65  66  67  68  69  70  
71  72  73  74  75  76  77  78  79  80
AX  1   1   0   0   1   0   0   1   0   
0   0   0   0   0   0   1   0   1   0   
0   1   1   0   0   0   0   0   1   0   
0   0   1   0   0   0   0   0
1   0   0   0   0   0   0   0   0   0   
1   1   0   0   1   0   1   25  5   9   
1   8.5 2.5 3   5   2   2   3   3   1   
1   1   2   1   2
BX  1   1   0   0   1   0   0   1   NA  
NA  NA  0   0   0   0   1   0   0   1   
0   0   1   0   NA  NA  NA  NA  NA  NA  
NA  NA
0   0   0   1   0   NA  NA  NA  NA  NA  
NA  NA  0   0   0   0   1   1   0   0   
0   1   1   NA  NA  6   1   3.25 2.25 5   
5
2   2   3   3   1   1   1   1   1   1
CX  1   1   0   0   1   0   0   1   1   
0   0   0   1   0   1   0   0   1   0   
1   0   0   0   0   1   1   0   0   0   
0   0   0   1   0   0   0   0
1   0   0   0   0   0   0   0   0   1   
0   0   0   0   1   0   0   15  3.5 6   
1   5.5 5.5 5   5   2   2   1   2   1   
1   1   1   2   2
DX  1   1   0   0   1   0   0   1   0   
0   0   0   0   0   0   1   0   0   1   
0   0   1   0   0   1   0   1   0   0   
0   1   0   0   0   0   1   1
0   0   0   0   0   0   0   0   0   1   
0   1   0   1   0   1   0   50  17.5 7.5 
2.5 8.5 5   5   5   2   2   2   3   1   
1   1   1
3   3
EX  1   0   1   0   1   0   0   1   NA  
NA  NA  0   0   0   1   1   0   1   1   
0   1   0   0   0   0   0   0   1   0   
0   0   0   1   0   0
0   0   0   0   0   1   0   0   0   0   
0   1   0   1   0   0   0   1   0   NA  
NA  14.5 30  13  2.5 3   3   1   1   4   
4   1   1   1
1   1   1
FX  1   0   1   0   1   0   0   1   0   
0   0   0   0   0   0   1   0   0   1   
0   1   0   0   0   1   1   1   0   0   
1   1   0   0   0   1   1   1
0   0   0   1   0   0   0   0   0   1   
0   1   0   1   1   0   0   165 25  11.5
15  12  6.5 5   5   1   1   3   3   1   
1   1   1
4   5
GX  1   0   1   0   1   0   0   1   0   
0   1   0   0   0   0   1   1   1   0   
0   1   0   0   0   0   0   1   0   0   
1   1   0   1   0   0   1   0
0   0   0   1   0   0   0   1   0   0   
0   0   0   1   0   1   0   40  20  14.5
9.5 11  10  3   3   1   1   1   3   1   
1   3   4   1
3
HX  1   1   0   0   1   0   0   1   0   
0   0   0   0   0   0   1   0   1   0   
0   1   0   0   NA  NA  NA  NA  NA  NA  
NA  NA  NA
NA  NA  NA  NA  1  

Re: [R] mob(party) formula question

2008-08-13 Thread Achim Zeileis

On Wed, 13 Aug 2008, Birgitle wrote:


> I am trying to use mob() with my data.frame ('data.frame': 288 obs. of 81
> variables; factors, numerics and ordered factors).
> My response is a binary variable, so I should model it with a logistic
> regression (family=binomial).
>
> I read in the MOB vignette that I could use a formula like this if I
> want only partitioning variables besides the response:
>
> Test.mob <- mob(Resp~1|Var1+Var2+, data=dataframe, model=glinearModel,
> family=binomial())


This works for me. Considering an example that is easily reproducible: 
classifying just two (out of three) species in the iris data.


library("party")
iris2 <- iris[-(1:50), ]
iris2$Species <- factor(iris2$Species)
mb <- mob(Species ~ 1 | Petal.Length + Petal.Width + Sepal.Length +
   Sepal.Width, data = iris2, model = glinearModel, family = binomial())

and this runs fine, just selecting a single split

R> mb
1) Petal.Width <= 1.7; criterion = 1, statistic = 81.818
   2)*  weights = 54
Terminal node model
Binomial GLM with coefficients:
(Intercept)
     -2.282

1) Petal.Width > 1.7
   3)*  weights = 46
Terminal node model
Binomial GLM with coefficients:
(Intercept)
      3.807
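
As a cross-check that needs only base R (a sketch, not part of the original reply): the two node models are intercept-only logistic regressions, so the same coefficients should come out of glm() fitted separately on each side of the split. The object names left, right, fit_left and fit_right are made up for this illustration.

```r
# Two-species subset of iris, as in the example above
iris2 <- iris[-(1:50), ]
iris2$Species <- factor(iris2$Species)

# Split the data by hand at the cut point reported by mob()
left  <- subset(iris2, Petal.Width <= 1.7)
right <- subset(iris2, Petal.Width >  1.7)

# Intercept-only logistic regression in each part
fit_left  <- glm(Species ~ 1, data = left,  family = binomial())
fit_right <- glm(Species ~ 1, data = right, family = binomial())

round(coef(fit_left), 3)    # -2.282, i.e. log(5/49)
round(coef(fit_right), 3)   #  3.807, i.e. log(45/1)

# Back on the probability scale: estimated share of virginica per node
plogis(coef(fit_left))      # 5/54,  about 0.093
plogis(coef(fit_right))     # 45/46, about 0.978
```

The intercepts are log-odds for the second factor level (virginica); applying the inverse link turns them into the node-wise class proportions.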


> but this gives me back an error-message:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>  undefined columns selected
>
> But Var1, Var2 and Resp are in my dataframe. Why do I get this error?


More importantly, when do you get this error? My guess is that this is 
during plotting, right?


If so, then the problem is that the plot() method for mob objects by 
default calls node_bivplot() in each terminal node, which is designed for 
generating partial regressor plots. In this situation this does not make 
sense because you don't have regressors in the terminal nodes.


We haven't got a panel function for the type of model you are looking at 
but I've just hacked a simple one that should be sufficient for your 
purposes. It is essentially like node_barplot() but exploits the binomial 
model. It is attached below. With this you can do

   plot(mb, terminal_panel = myplot, tnex = 2)


> I am also wondering how I can find out which variables I should use for
> partitioning and which for modelling?


Variables for which a linear specification makes sense (at least 
within each component) should be included for modeling. Variables 
for which it is not clear a priori what a useful parametric 
specification would be should be used as partitioning variables.
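
As a sketch of how this division looks in a formula (an illustration, not taken from the original reply): regressors go to the left of "|" and partitioning variables to the right. Using Sepal.Length as the regressor is an arbitrary assumption made only for this example, and mb2 is a made-up object name. The mob() call is run only if party is installed.

```r
# Two-species subset of iris, as in the example above
iris2 <- iris[-(1:50), ]
iris2$Species <- factor(iris2$Species)

if (requireNamespace("party", quietly = TRUE)) {
  library("party")
  # Left of "|":  Sepal.Length enters the logistic regression in every node.
  # Right of "|": the remaining variables can only be used for splitting.
  mb2 <- mob(Species ~ Sepal.Length | Petal.Length + Petal.Width + Sepal.Width,
             data = iris2, model = glinearModel, family = binomial())
  print(mb2)
}
```

Each terminal node then reports an intercept and a Sepal.Length slope instead of an intercept only.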



> There are correlations between some variables in my dataframe. Would it be
> possible to always use one variable of each correlated pair for
> partitioning and the other for modelling?


You can do that, but you could also do other combinations. That probably 
depends on your application.


hth,
Z

myplot <- function(ctreeobj,
                   col = "black",
                   fill = NULL,
                   beside = NULL,
                   ymax = NULL,
                   ylines = NULL,
                   widths = 1,
                   gap = NULL,
                   reverse = NULL,
                   id = TRUE)
{
  getMaxPred <- function(x) {
    mp <- max(x$prediction)
    mpl <- ifelse(x$terminal, 0, getMaxPred(x$left))
    mpr <- ifelse(x$terminal, 0, getMaxPred(x$right))
    return(max(c(mp, mpl, mpr)))
  }

  y <- response(ctreeobj)[[1]]

  if(is.factor(y) || class(y) == "was_ordered") {
    ylevels <- levels(y)
    if(is.null(beside)) beside <- if(length(ylevels) < 3) FALSE else TRUE
    if(is.null(ymax)) ymax <- if(beside) 1.1 else 1
    if(is.null(gap)) gap <- if(beside) 0.1 else 0
  } else {
    if(is.null(beside)) beside <- FALSE
    if(is.null(ymax)) ymax <- getMaxPred([EMAIL PROTECTED]) * 1.1
    ylevels <- seq(along = [EMAIL PROTECTED])
    if(length(ylevels) < 2) ylevels <- ""
    if(is.null(gap)) gap <- 1
  }
  if(is.null(reverse)) reverse <- !beside
  if(is.null(fill)) fill <- gray.colors(length(ylevels))
  if(is.null(ylines)) ylines <- if(beside) c(3, 2) else c(1.5, 2.5)

  ### panel function for barplots in nodes
  rval <- function(node) {

    ## parameter setup
    fm <- node$model
    pred <- fm$family$linkinv(coef(fm))
    if(reverse) {
      pred <- rev(pred)
      ylevels <- rev(ylevels)
    }
    np <- length(pred)
    nc <- if(beside) np else 1

    fill <- rep(fill, length.out = np)
    widths <- rep(widths, length.out = nc)
    col <- rep(col, length.out = nc)
    ylines <- rep(ylines, length.out = 2)

    gap <- gap * sum(widths)
    yscale <- c(0, ymax)
    xscale <- c(0, sum(widths) + (nc+1)*gap)

    top_vp <- viewport(layout = grid.layout(nrow = 2, ncol = 3,
                       widths = unit(c(ylines[1], 1, ylines[2]),
                                     c("lines", "null", "lines")),
                       heights = unit(c(1, 1), c("lines", "null"))),
                       width = unit(1, "npc"),

Re: [R] mob(party) formula question

2008-08-13 Thread Birgitle

Many thanks for your answer and the code that you offered me.

I get this error message after calling mob (see the example I posted).
I guess it has something to do with the missing values?

The iris example also works fine for me.

Sorry, I don't know enough statistics to really understand the
following:


Achim Zeileis wrote:
> ...
> Variables for which a linear specification makes sense (at least
> within each component) should be included for modeling. Variables
> for which it is not clear a priori what a useful parametric
> specification would be should be used as partitioning variables.
> ...

What do you mean by linear specification? I would be very happy if you
could explain.

Thanks again

B.





Re: [R] mob(party) formula question

2008-08-13 Thread Achim Zeileis

On Wed, 13 Aug 2008, Birgitle wrote:


> Many thanks for your answer and the code that you offered me.
>
> I get this error message after calling mob (see the example I posted).
> I guess it has something to do with the missing values?


Yes, you have to handle NAs in advance if you want to fit that model. 
We'll try to fix that in future versions.



> The iris example also works fine for me.
>
> Sorry, I don't know enough statistics to really understand the
> following:
>
> Achim Zeileis wrote:
>> ...
>> Variables for which a linear specification makes sense (at least
>> within each component) should be included for modeling. Variables
>> for which it is not clear a priori what a useful parametric
>> specification would be should be used as partitioning variables.
>> ...
>
> What do you mean by linear specification? I would be very happy if you
> could explain.


Well, in each node you fit a logistic regression model. This is a 
(generalized) linear model, hence the variables included have a linear 
influence (on the link scale) within each node. The partitioning variables 
on the other hand capture step-shaped influences (if they are selected by 
the algorithm). See the references on ?mob for further details.
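
To make the contrast concrete, here is a small base R sketch (not part of the original reply). It assumes, purely for illustration, that Sepal.Length is used as a regressor; the step function reuses the split point and the two node intercepts printed for the iris tree earlier in this thread. The names fit, grid, eta and step_eta are made up here.

```r
iris2 <- iris[-(1:50), ]
iris2$Species <- factor(iris2$Species)

# Regressor: linear influence on the link (log-odds) scale
fit  <- glm(Species ~ Sepal.Length, data = iris2, family = binomial())
grid <- data.frame(Sepal.Length = seq(5, 8, by = 0.5))
eta  <- predict(fit, newdata = grid, type = "link")
diff(eta)   # constant: every 0.5 step adds 0.5 * slope to the log-odds

# Partitioning variable: step-shaped influence, the log-odds jump at the
# split point and are flat on either side of it
step_eta <- function(petal_width) ifelse(petal_width <= 1.7, -2.282, 3.807)
step_eta(c(1.0, 1.7, 1.8, 2.5))
```

On the probability scale the regressor effect is the usual S-shaped curve, while the partitioning effect stays a single jump.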


Best,
Z



Re: [R] mob(party) formula question

2008-08-13 Thread Birgitle

Thanks again.
Unfortunately, I always have this missing-values problem.
But the missing values also carry meaning, and it is impossible to code
them differently or to impute them.

Also thanks for the explanation. Now I understand.

B.




Re: [R] mob(party) formula question

2008-08-13 Thread Achim Zeileis

On Wed, 13 Aug 2008, Birgitle wrote:


> Thanks again.
> Unfortunately, I always have this missing-values problem.
> But the missing values also carry meaning, and it is impossible to code
> them differently or to impute them.


That's ok. Just to clarify: NAs are not allowed in the response or the 
modeling variables. In principle, it would be possible to have NAs in the 
partitioning variables and try to handle them with surrogate splits. 
Surrogates are not yet implemented in mob(), but we are currently 
working on infrastructure for this.


So the only work-around easily available at the moment is to call 
na.omit() (on the relevant variables only).
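
A minimal sketch of that work-around in base R (the toy data, the column Other and the name dataframe_cc are made up for illustration; Resp, Var1 and Var2 follow the naming used in the question):

```r
# Hypothetical data: NAs in the variables of interest and in an unrelated one
dataframe <- data.frame(
  Resp  = factor(c("a", "b", "a", "b", "a")),
  Var1  = c(1, NA, 3, 4, 5),
  Var2  = c(2, 3, 4, NA, 6),
  Other = c(NA, NA, 1, 2, NA)
)

# Drop incomplete rows, looking only at the columns that enter the formula
vars <- c("Resp", "Var1", "Var2")
dataframe_cc <- na.omit(dataframe[, vars])

nrow(dataframe_cc)         # 3 rows survive
nrow(na.omit(dataframe))   # only 1 row would survive if all columns were checked

# The reduced data could then be passed on, e.g.
# mob(Resp ~ 1 | Var1 + Var2, data = dataframe_cc, model = glinearModel,
#     family = binomial())
```

Restricting na.omit() to the relevant columns matters with 81 variables: a missing value in any unused column would otherwise remove the whole row.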


Best,
Z

