Very good. Could you point me toward a couple of potential directions for variable reduction, e.g. correlation analysis?
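For instance, would something along these lines be a reasonable start? This is only a minimal sketch of correlation-based filtering, not anything from this thread: it assumes the respondent-level predictors sit in the respLevel data frame from the J48 run below, and that the caret package (with its findCorrelation() helper) is installed; the 0.75 cutoff is an arbitrary illustrative choice.

# Sketch: correlation-based variable reduction before J48.
# Assumes the respondent-level data frame `respLevel` used below;
# caret's findCorrelation() flags columns to drop so that no remaining
# pair of predictors is too highly correlated.
library(caret)

preds  <- respLevel[, c("PRI", "PROM", "FORM", "FAMI", "DRRE",
                        "FREC", "MODE", "SPED", "REVW")]
corMat <- cor(preds)                              # pairwise correlations
drop   <- findCorrelation(corMat, cutoff = 0.75)  # columns to remove
reduced <- if (length(drop)) preds[, -drop] else preds
names(reduced)                                    # predictors to keep for J48

Alternatively, principal components (prcomp) could collapse the correlated predictors instead of dropping them, at the cost of less interpretable splits in the tree.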
On Sep 20, 2012, at 10:36 PM, Bhupendrasinh Thakre wrote:

> One possible way to think of it is to use "variable reduction" before
> going for J48. You may want to use one of the several methods available
> for that. Again, prediction for brands is more of a business question
> to me.
>
> Two solutions I can think of:
> 1. Variable reduction before the decision tree.
> 2. Let intuition decide how many of them are "really" important.
>
> Please let us know your findings. All the best.
>
> Best Regards,
>
> Bhupendrasinh Thakre
> Sent from my iPhone
>
> On Sep 21, 2012, at 12:16 AM, Vik Rubenfeld <v...@mindspring.com> wrote:
>
>> Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data
>> set and got a 61.75% correct classification rate!
>>
>> Correctly Classified Instances         988              61.75   %
>> Incorrectly Classified Instances       612              38.25   %
>> Kappa statistic                          0.5651
>> Mean absolute error                      0.0432
>> Root mean squared error                  0.1469
>> Relative absolute error                 52.7086 %
>> Root relative squared error             72.6299 %
>> Coverage of cases (0.95 level)          99.6875 %
>> Mean rel. region size (0.95 level)      15.4915 %
>> Total Number of Instances             1600
>>
>> When I plot it I get an enormous chart. Running:
>>
>> > respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>> +     MODE + SPED + REVW, data = respLevel)
>> > respLevelTree
>>
>> ...reports:
>>
>> J48 pruned tree
>> ------------------
>>
>> Is there a way to further prune the tree so that I can present a chart
>> that would fit on a single page or two?
>>
>> Thanks very much in advance for any thoughts.
>>
>> -Vik
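Not from the thread, but a sketch of one standard answer to the pruning question above: RWeka's J48() forwards Weka's pruning options through Weka_control(), so raising the minimum leaf size M or lowering the pruning confidence C should yield a much smaller tree. The M = 50 / C = 0.1 values below are illustrative guesses, and WOW("J48") lists the full option set. Assuming the same respLevel data frame:

# Sketch: a smaller J48 tree via Weka's pruning options (RWeka assumed).
library(RWeka)

smallTree <- J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
                   MODE + SPED + REVW,
                 data = respLevel,
                 control = Weka_control(M = 50,    # >= 50 instances per leaf
                                        C = 0.1))  # prune more aggressively
smallTree  # should print far fewer leaves than the default fit

Larger M is usually the simpler knob: it directly caps how finely the tree can partition the 1600 respondents, so the printed tree shrinks accordingly.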
>> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
>>
>>> I'm not very sure what the problem is, as I was not able to take your
>>> data for a run. You might want to use the dput() command to present
>>> the data.
>>>
>>> Now on the programming side. As we can see, we have more than 2 levels
>>> for the brands, and hence method = "class" is not able to understand
>>> what you actually want from it.
>>>
>>> Suggestion: for predictions having more than 2 levels I would go for
>>> Weka, specifically the C4.5 algorithm. You also have the RWeka package
>>> for it.
>>>
>>> Best Regards,
>>>
>>> Bhupendrasinh Thakre
>>> Sent from my iPhone
>>>
>>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <v...@mindspring.com> wrote:
>>>
>>>> I'm working with some data from which a client would like to make a
>>>> decision tree predicting brand preference based on inputs such as
>>>> price, speed, etc. After running the decision tree analysis using
>>>> rpart, it appears that this data is not capable of predicting brand
>>>> preference.
>>>>
>>>> Here's the data set:
>>>>
>>>> BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
>>>> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
>>>> Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
>>>> Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
>>>> Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
>>>> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
>>>> Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
>>>> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
>>>> Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
>>>> Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
>>>> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
>>>> Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
>>>> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
>>>> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
>>>> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
>>>> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
>>>> Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
>>>> Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
>>>> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
>>>> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
>>>> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
>>>> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
>>>> Brand 22  0.75    0.3333  0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167
>>>>
>>>> Here are my R commands:
>>>>
>>>> > test.df = read.csv("test.csv")
>>>> > head(test.df)
>>>>      BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
>>>> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
>>>> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
>>>> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
>>>> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
>>>> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
>>>> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
>>>>
>>>> > testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>>> +     MODE + SPED + REVW, method = "class", data = test.df)
>>>> > printcp(testTree)
>>>>
>>>> Classification tree:
>>>> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>>>     MODE + SPED + REVW, data = test.df, method = "class")
>>>>
>>>> Variables actually used in tree construction:
>>>> [1] FORM
>>>>
>>>> Root node error: 21/22 = 0.95455
>>>>
>>>> n= 22
>>>>
>>>>         CP nsplit rel error xerror xstd
>>>> 1 0.047619      0   1.00000 1.0476    0
>>>> 2 0.010000      1   0.95238 1.0476    0
>>>>
>>>> I note that only one variable (FORM) was actually used in tree
>>>> construction. When I run a plot using:
>>>>
>>>> > plot(testTree)
>>>> > text(testTree)
>>>>
>>>> ...I get a tree with one branch.
>>>>
>>>> It looks to me like I'm doing everything right, and this data is just
>>>> not capable of predicting brand preference.
>>>>
>>>> Am I missing anything?
>>>>
>>>> Thanks very much in advance for any thoughts!
>>>>
>>>> -Vik
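A closing editorial note on the rpart run above, not something from the thread: the brand-level table has 22 rows and 22 classes, one observation per class, so rpart's default minsplit of 20 leaves it almost nothing to split; that is why the root node error is 21/22 and only FORM appears. A sketch of what happens if you force it to grow anyway, assuming the same test.csv:

# Sketch: forcing rpart to grow on the 22-row brand-level table.
# With one observation per class the tree can only memorize the data,
# so the cross-validated error (xerror) stays near 1.
library(rpart)

test.df <- read.csv("test.csv")            # the brand-level table above

forced <- rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
                  MODE + SPED + REVW,
                data = test.df, method = "class",
                control = rpart.control(minsplit = 2, cp = 0))
printcp(forced)  # in-sample rel error drops, but xerror stays near 1

In other words the one-branch tree is a property of the aggregated data, not a bug in the call; the respondent-level data set used earlier in the thread (1600 instances) is what gives the tree something to generalize from.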