[R] mob(party) formula question
I am trying to use mob() with my data.frame ('data.frame': 288 obs. of 81 variables; factors, numerics and ordered factors). My response is a binary variable, so the model should be a logistic regression (family = binomial).

I read in the MOB vignette that I could use a formula like this if I want only partitioning variables apart from the response:

    Test.mob <- mob(Resp ~ 1 | Var1 + Var2 + ..., data = dataframe, model = glinearModel, family = binomial())

but this gives me an error message:

    Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected

But Var1, Var2 and Resp are in my data frame. Why do I get this error?

I am also wondering how I can find out which variables I should use for partitioning and which for modelling. There are correlations between some variables in my data frame. Would it be possible to always use one variable of each correlated pair for partitioning and the other for modelling?

I would be very happy if somebody could give me some hints or answers to my questions.

Many thanks in advance.

B.

-----
The art of living is more like wrestling than dancing. (Marcus Aurelius)
--
View this message in context: http://www.nabble.com/mob%28party%29-formula-question-tp18959898p18959898.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] mob(party) formula question (example)
Here is an example that produces the same error. Read in the following as a text file (save as DFExample.txt):

1 2 3 4 7 8 9 10 12 13 14 15 16 17 18 19 21 22 23 25 27 28 29 30 31 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
AX 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 25 5 9 1 8.5 2.5 3 5 2 2 3 3 1 1 1 2 1 2
BX 1 1 0 0 1 0 0 1 NA NA NA 0 0 0 0 1 0 0 1 0 0 1 0 NA NA NA NA NA NA NA NA 0 0 0 1 0 NA NA NA NA NA NA NA 0 0 0 0 1 1 0 0 0 1 1 NA NA 6 1 3.25 2.25 5 5 2 2 3 3 1 1 1 1 1 1
CX 1 1 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 15 3.5 6 1 5.5 5.5 5 5 2 2 1 2 1 1 1 1 2 2
DX 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 50 17.5 7.5 2.5 8.5 5 5 5 2 2 2 3 1 1 1 1 3 3
EX 1 0 1 0 1 0 0 1 NA NA NA 0 0 0 1 1 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 NA NA 14.5 30 13 2.5 3 3 1 1 4 4 1 1 1 1 1 1
FX 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 1 0 0 165 25 11.5 15 12 6.5 5 5 1 1 3 3 1 1 1 1 4 5
GX 1 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 40 20 14.5 9.5 11 10 3 3 1 1 1 3 1 1 3 4 1 3
HX 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA 1
Re: [R] mob(party) formula question
On Wed, 13 Aug 2008, Birgitle wrote:

> I try to use mob() with my data.frame ('data.frame': 288 obs. of 81
> variables; factors, numerics and ordered factors). My response is a
> binary variable and I should use for modelling a logistic regression
> (family=binomial).
>
> I read in the MOB Vignette that I could use a formula like this if I
> would like to have only partitioning variables apart from the response.
>
> Test.mob <- mob(Resp ~ 1 | Var1 + Var2 + ..., data = dataframe,
>   model = glinearModel, family = binomial())

This works for me. Consider an example that is easily reproducible: classifying just two (out of three) species in the iris data.

    iris2 <- iris[-(1:50), ]
    iris2$Species <- factor(iris2$Species)
    mb <- mob(Species ~ 1 | Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
      data = iris2, model = glinearModel, family = binomial())

This runs fine, selecting just a single split:

    R> mb
    1) Petal.Width <= 1.7; criterion = 1, statistic = 81.818
      2)* weights = 54
    Terminal node model
    Binomial GLM with coefficients:
    (Intercept)
         -2.282

    1) Petal.Width > 1.7
      3)* weights = 46
    Terminal node model
    Binomial GLM with coefficients:
    (Intercept)
          3.807

> but this gives me back an error-message:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>   undefined columns selected
>
> But Var1, Var2 and Resp are in my dataframe. Why do I get this error?

More importantly, when do you get this error? My guess is that this is during plotting, right? If so, the problem is that the plot() method for mob objects by default calls node_bivplot() in each terminal node, which is designed for generating partial regressor plots. In this situation that does not make sense because you don't have regressors in the terminal nodes.

We haven't got a panel function for the type of model you are looking at, but I've just hacked a simple one that should be sufficient for your purposes. It is essentially like node_barplot() but exploits the binomial model. It is attached below. With this you can do

    plot(mb, terminal_panel = myplot, tnex = 2)

> I am also wondering how I can find out which variables I should use for
> partitioning and which for modelling?

For the variables for which a linear specification makes sense (at least in each component), you should include them for modeling. And those variables for which it is not clear a priori what a useful parametric specification would be should be used as partitioning variables.

> There are correlations between some variables in my dataframe. Would it
> be a possibility to use always one variable of the correlated
> variable-pairs for partitioning and one for modelling?

You can do that, but you could also do other combinations. That probably depends on your application.

hth,
Z

    myplot <- function(ctreeobj, col = "black", fill = NULL, beside = NULL,
      ymax = NULL, ylines = NULL, widths = 1, gap = NULL, reverse = NULL,
      id = TRUE)
    {
      ## largest predicted value anywhere in the tree (recursive)
      getMaxPred <- function(x) {
        mp <- max(x$prediction)
        mpl <- ifelse(x$terminal, 0, getMaxPred(x$left))
        mpr <- ifelse(x$terminal, 0, getMaxPred(x$right))
        return(max(c(mp, mpl, mpr)))
      }

      y <- response(ctreeobj)[[1]]

      ## defaults depend on whether the response is categorical
      if(is.factor(y) || class(y) == "was_ordered") {
        ylevels <- levels(y)
        if(is.null(beside)) beside <- if(length(ylevels) < 3) FALSE else TRUE
        if(is.null(ymax)) ymax <- if(beside) 1.1 else 1
        if(is.null(gap)) gap <- if(beside) 0.1 else 0
      } else {
        if(is.null(beside)) beside <- FALSE
        if(is.null(ymax)) ymax <- getMaxPred([EMAIL PROTECTED]) * 1.1
        ylevels <- seq(along = [EMAIL PROTECTED])
        if(length(ylevels) < 2) ylevels <- ""
        if(is.null(gap)) gap <- 1
      }
      if(is.null(reverse)) reverse <- !beside
      if(is.null(fill)) fill <- gray.colors(length(ylevels))
      if(is.null(ylines)) ylines <- if(beside) c(3, 2) else c(1.5, 2.5)

      ### panel function for barplots in nodes
      rval <- function(node) {

        ## parameter setup: fitted probability from the node's binomial GLM
        fm <- node$model
        pred <- fm$family$linkinv(coef(fm))
        if(reverse) {
          pred <- rev(pred)
          ylevels <- rev(ylevels)
        }
        np <- length(pred)
        nc <- if(beside) np else 1

        fill <- rep(fill, length.out = np)
        widths <- rep(widths, length.out = nc)
        col <- rep(col, length.out = nc)
        ylines <- rep(ylines, length.out = 2)

        gap <- gap * sum(widths)
        yscale <- c(0, ymax)
        xscale <- c(0, sum(widths) + (nc+1)*gap)

        top_vp <- viewport(layout = grid.layout(nrow = 2, ncol = 3,
                             widths = unit(c(ylines[1], 1, ylines[2]),
                                           c("lines", "null", "lines")),
                             heights = unit(c(1, 1), c("lines", "null"))),
                           width = unit(1, "npc"),
Re: [R] mob(party) formula question
Many thanks for your answer and the code that you offered me.

I get this error message after calling mob() (see the example I posted). I guess it has something to do with the missing values?

The iris example also works fine for me.

Sorry that I am not deep enough into statistics to really understand the following:

Achim Zeileis wrote:
> For the variables for which a linear specification makes sense (at least
> in each component), you should include them for modeling. And those
> variables for which it is not clear a priori what a useful parametric
> specification would be should be used as partitioning variables.

What do you mean by linear specification? I would be very happy if you could explain.

Thanks again,
B.
Re: [R] mob(party) formula question
On Wed, 13 Aug 2008, Birgitle wrote:

> Many thanks for your answer and the code that you offered me.
>
> I get this error message after calling mob (look at my given example).
> I guess it has something to do with the missings?

Yes, you have to handle NAs in advance if you want to fit that model. We'll try to fix that in future versions.

> The iris example works also fine for me.
>
> Sorry that I am not enough into statistics to really understand the
> following:
>
> Achim Zeileis wrote:
>> For the variables for which a linear specification makes sense (at
>> least in each component), you should include them for modeling. And
>> those variables for which it is not clear a priori what a useful
>> parametric specification would be should be used as partitioning
>> variables.
>
> What do you mean with linear specification? I would be very happy if
> you could explain.

Well, in each node you fit a logistic regression model. This is a (generalized) linear model, hence the variables included have a linear influence (on the link scale) within each node. The partitioning variables, on the other hand, capture step-shaped influences (if they are selected by the algorithm). See the references on ?mob for further details.

Best,
Z
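
The difference between a linear influence on the link scale and a step-shaped influence can be seen on simulated data. The following is only a rough sketch in base R and does not call mob(): the variables x and z, the cut point 0.5 and all coefficients are invented for illustration, and the split is imposed by hand, whereas mob() would search for it.

```r
## Simulated data (all names and numbers are made up for this sketch):
## x enters linearly on the logit scale, z only through a step at 0.5.
set.seed(1)
n <- 400
x <- rnorm(n)                      # candidate modeling variable
z <- runif(n)                      # candidate partitioning variable
eta <- ifelse(z <= 0.5, -1, 1) + 0.8 * x
y <- rbinom(n, 1, plogis(eta))
d <- data.frame(y, x, z)

## One logistic regression per segment of z. The split is chosen by hand
## here; a model-based tree would try to find it from the data.
segments <- split(d, ifelse(d$z <= 0.5, "z <= 0.5", "z > 0.5"))
fits <- sapply(segments, function(s) {
  coef(glm(y ~ x, data = s, family = binomial()))
})

## Columns are the two segments, rows the intercept and the slope of x.
print(round(fits, 2))
```

In this setup the slope of x should come out similar in both segments, while the intercept jumps between them. That jump is the step-shaped effect a partitioning variable is meant to capture.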
Re: [R] mob(party) formula question
Thanks again. Unfortunately I always have this missing values problem. But the missing values also have a meaning, and it is impossible to code them differently or to impute them.

Also thanks for the explanation. Now I understand.

B.
Re: [R] mob(party) formula question
On Wed, 13 Aug 2008, Birgitle wrote:

> Thanks again. Unfortunately I always have this missing values problem.
> But the missing values also have a meaning, and it is impossible to
> code them differently or to impute them.

That's ok. Just to clarify: NAs are not allowed in the response or the modeling variables. In principle, it would be possible to have NAs in the partitioning variables and to handle them with surrogate splits. Surrogates are not yet implemented in mob(), but we are working on the infrastructure for this. So the only workaround easily available at the moment is to call na.omit() (on the relevant variables only).

Best,
Z

> Also thanks for the explanation. Now I understand.
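
The na.omit() workaround "on the relevant variables only" could look roughly like this. It is a minimal base-R sketch: the data frame below is a small made-up stand-in for the poster's data (the column Unused and all values are hypothetical), and the formula mirrors the one from the original question.

```r
## Hypothetical stand-in for the poster's data frame
dataframe <- data.frame(
  Resp   = factor(c(1, 0, 1, 0, 1, 0)),
  Var1   = c(2.5, NA, 3.0, 1.5, 4.0, 2.0),
  Var2   = c(1, 0, 1, NA, 0, 1),
  Unused = c(NA, NA, NA, 1, 2, 3)   # not in the formula, so its NAs do not matter
)

fml <- Resp ~ 1 | Var1 + Var2

## all.vars() lists every variable named in the formula
## (response, modeling and partitioning variables alike)
vars <- all.vars(fml)

## Drop incomplete rows, looking only at the columns the model uses
complete <- na.omit(dataframe[, vars])

## For comparison: na.omit() on the whole data frame discards more rows
nrow(complete)
nrow(na.omit(dataframe))
```

The reduced data frame would then be passed on as in the thread, e.g. mob(fml, data = complete, model = glinearModel, family = binomial()). Restricting na.omit() to the formula's variables matters with 81 columns, because a missing value in any unrelated column would otherwise remove the whole row.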