Very good. Could you point me in a couple of potential directions for variable
reduction, e.g. correlation analysis?
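
For instance, is correlation-based pruning the kind of thing you mean? Just a
sketch -- the 0.8 cutoff is an arbitrary number I picked, and I'm assuming the
respondent-level predictors from my earlier mail:

# drop one variable from each highly correlated pair (cutoff is a guess)
preds = respLevel[, c("PRI", "PROM", "FORM", "FAMI", "DRRE",
                      "FREC", "MODE", "SPED", "REVW")]
cm = cor(preds)
cm[upper.tri(cm, diag = TRUE)] = 0        # consider each pair only once
dropVars = names(which(apply(abs(cm) > 0.8, 2, any)))
keepVars = setdiff(names(preds), dropVars)
keepVars
# (caret::findCorrelation(cor(preds), cutoff = 0.8) does much the same)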


On Sep 20, 2012, at 10:36 PM, Bhupendrasinh Thakre wrote:

> One possible way to think of it is to use "variable reduction" before going 
> for J48. You may want to use one of the several methods available for that. 
> Again, predicting brands is more of a business question to me. 
> 
> Two solutions I can think of:
> 1. Variable reduction before the decision tree (see the sketch below).
> 2. Let intuition decide how many of the variables are "really" important.
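> 
> A minimal sketch of option 1 using principal components, assuming your
> respondent-level data frame is called respLevel as in your mail; keeping
> three components is an arbitrary choice:
> 
> # PCA on the nine ratings, scaled so no one rating dominates
> preds <- respLevel[, c("PRI", "PROM", "FORM", "FAMI", "DRRE",
>                        "FREC", "MODE", "SPED", "REVW")]
> pc <- prcomp(preds, scale. = TRUE)
> summary(pc)                    # variance explained per component
> # replace the raw ratings with, say, the first three components
> respPC <- data.frame(BRAND_NAME = respLevel$BRAND_NAME, pc$x[, 1:3])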
> 
> Please let us know your findings. All the best.
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
> 
> On Sep 21, 2012, at 12:16 AM, Vik Rubenfeld <v...@mindspring.com> wrote:
> 
>> Bhupendrasinh, thanks very much!  I ran J48 on a respondent-level data set 
>> and got a 61.75% correct classification rate!
>> 
>> Correctly Classified Instances         988               61.75   %
>> Incorrectly Classified Instances       612               38.25   %
>> Kappa statistic                          0.5651
>> Mean absolute error                      0.0432
>> Root mean squared error                  0.1469
>> Relative absolute error                 52.7086 %
>> Root relative squared error             72.6299 %
>> Coverage of cases (0.95 level)          99.6875 %
>> Mean rel. region size (0.95 level)      15.4915 %
>> Total Number of Instances             1600     
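>> 
>> (Side question: if the figures above are on the training data, would
>> cross-validated numbers be more honest? I see RWeka has
>> evaluate_Weka_classifier(); guessing at the call:)
>> 
>> # 10-fold cross-validation of the fitted J48 model
>> evaluate_Weka_classifier(respLevelTree, numFolds = 10)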
>> 
>> When I plot it I get an enormous chart.  Running:
>> 
>> >respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
>> >MODE + SPED + REVW, data = respLevel)
>> >respLevelTree
>> 
>> ...reports:
>> 
>> J48 pruned tree
>> ------------------
>> [enormous tree output omitted here]
>> 
>> Is there a way to further prune the tree so that I can present a chart that 
>> would fit on a single page or two?
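>> 
>> Would tightening the pruning options through Weka_control() be the right
>> direction? WOW("J48") lists them; the values below are just guesses (M is
>> the minimum instances per leaf, C the pruning confidence):
>> 
>> smallerTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>                   MODE + SPED + REVW, data = respLevel,
>>                   control = Weka_control(M = 50, C = 0.1))
>> smallerTree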
>> 
>> Thanks very much in advance for any thoughts.
>> 
>> 
>> -Vik
>> 
>> 
>> 
>> 
>> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
>> 
>>> Not very sure what the problem is, as I was not able to run your data 
>>> myself. You may want to use the dput() command to present the data. 
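>>> 
>>> For example (assuming your data frame is called test.df, as in your code
>>> below), others can rebuild the data exactly from the output of:
>>> 
>>> dput(head(test.df, 10))   # self-contained text form of the first 10 rows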
>>> 
>>> Now, on the programming side: we have more than 2 levels for the brands, 
>>> and hence method = "class" is not able to understand what you actually 
>>> want from it.
>>> 
>>> Suggestion: for predictions having more than 2 levels I would go for Weka, 
>>> specifically the C4.5 algorithm. The RWeka package gives you that from R; 
>>> a sketch follows.
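>>> 
>>> A minimal sketch using the data frame and column names from your message
>>> below (J48 is RWeka's interface to C4.5):
>>> 
>>> library(RWeka)
>>> test.df$BRND <- factor(test.df$BRND)   # J48 needs a factor response
>>> brandTree <- J48(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>>                  MODE + SPED + REVW, data = test.df)
>>> brandTree
>>> summary(brandTree)   # training-set confusion matrix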
>>> 
>>> Best Regards,
>>> 
>>> Bhupendrasinh Thakre
>>> Sent from my iPhone
>>> 
>>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <v...@mindspring.com> wrote:
>>> 
>>>> I'm working with some data from which a client would like to make a 
>>>> decision tree predicting brand preference based on inputs such as price, 
>>>> speed, etc.  After running the decision tree analysis using rpart, it 
>>>> appears that this data is not capable of predicting brand preference.  
>>>> 
>>>> Here's the data set:
>>>> 
>>>> BRND        PRI    PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
>>>> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
>>>> Brand 2   0.8621  0.3793  0.8621  0.9310  0.7586  0.6897  0.8966  0.9655  0.8276
>>>> Brand 3   0.6000  0.1000  0.6000  0.7000  0.9000  0.7000  0.7000  0.8000  0.6000
>>>> Brand 4   0.6429  0.2500  0.5714  0.5000  0.6071  0.5000  0.7500  0.8214  0.5000
>>>> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
>>>> Brand 6   0.7500  0.0833  0.5833  0.4167  0.5000  0.4167  0.7500  0.6667  0.5000
>>>> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
>>>> Brand 8   0.6429  0.2679  0.6964  0.7143  0.8750  0.5536  0.8036  0.9464  0.6607
>>>> Brand 9   0.5750  0.1750  0.6500  0.5500  0.6250  0.3750  0.8250  0.8500  0.4750
>>>> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
>>>> Brand 11  0.6308  0.3000  0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6000
>>>> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.5030  0.8061  0.8909  0.6000
>>>> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.8710  0.9677  0.3226
>>>> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
>>>> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
>>>> Brand 16  0.7000  0.4000  0.6000  0.4000  1.0000  0.4000  0.9000  0.9000  0.5000
>>>> Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
>>>> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.6190
>>>> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
>>>> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.7170  0.8679  0.4717
>>>> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
>>>> Brand 22  0.7500  0.3333  0.6667  0.5000  0.6667  0.5833  0.9167  0.9167  0.4167
>>>> 
>>>> Here are my R commands:
>>>> 
>>>>> test.df = read.csv("test.csv")
>>>>> head(test.df)
>>>>    BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
>>>> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
>>>> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
>>>> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
>>>> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
>>>> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
>>>> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
>>>> 
>>>>> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>>>> MODE + SPED + REVW, method = "class", data = test.df)
>>>> 
>>>>> printcp(testTree)
>>>> 
>>>> Classification tree:
>>>> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
>>>>   MODE + SPED + REVW, data = test.df, method = "class")
>>>> 
>>>> Variables actually used in tree construction:
>>>> [1] FORM
>>>> 
>>>> Root node error: 21/22 = 0.95455
>>>> 
>>>> n= 22 
>>>> 
>>>>       CP nsplit rel error xerror xstd
>>>> 1 0.047619      0   1.00000 1.0476    0
>>>> 2 0.010000      1   0.95238 1.0476    0
>>>> 
>>>> I note that only one variable (FORM) was actually used in tree 
>>>> construction. When I run a plot using:
>>>> 
>>>>> plot(testTree)
>>>>> text(testTree)
>>>> 
>>>> ...I get a tree with one branch.  
>>>> 
>>>> It looks to me like I'm doing everything right, and this data is just not 
>>>> capable of predicting brand preference. 
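>>>> 
>>>> Would loosening rpart's defaults make sense here? Just a guess: with only
>>>> 22 rows, the default minsplit of 20 may leave almost nothing to split on.
>>>> 
>>>> testTree2 = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>>>                   MODE + SPED + REVW, method = "class", data = test.df,
>>>>                   control = rpart.control(minsplit = 2, cp = 0.001))
>>>> testTree2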
>>>> 
>>>> Am I missing anything?
>>>> 
>>>> Thanks very much in advance for any thoughts!
>>>> 
>>>> -Vik
>>>> 

