Re: [R] 'R' Software Output Plagiarism

2015-09-24 Thread Vik Rubenfeld
Your professor should immediately recognize that the quoted code is standard 
regression input/output and that the Urkund results in this case are without 
merit.
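
To illustrate the point, here is a minimal sketch using R's built-in mtcars and iris data sets (stand-ins chosen for illustration, not the thesis data). It fits two unrelated regressions and lists the lines that their printed summaries share, which is the kind of fixed text a string-matching tool might plausibly flag.

```r
# Two unrelated regressions on built-in data sets (illustrative stand-ins).
fit_a <- lm(mpg ~ wt, data = mtcars)
fit_b <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Capture the printed summaries as character vectors, one element per line.
out_a <- capture.output(print(summary(fit_a)))
out_b <- capture.output(print(summary(fit_b)))

# Lines that are identical in both outputs: headings and other fixed text.
shared <- setdiff(intersect(out_a, out_b), "")
print(shared)
```

Any two lm() summaries will share these headings regardless of who ran them or on what data.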


> On Sep 22, 2015, at 7:27 AM, BARRETT, Oliver  wrote:
> 
> 
> Dear 'R' community support,
> 
> 
> I am a student at Skema business school and I have recently submitted my MSc 
> thesis/dissertation. This has been passed on to an external plagiarism 
> service provider, Urkund, who have scanned my document and returned a 
> plagiarism report to my professor having detected 32% plagiarism.
> 
> 
> I have contacted Urkund regarding this issue having committed no such 
> plagiarism and they have told me that all the plagiarism detected in my 
> document comes from the last 25% which consists only of 'R' regressions like 
> the one I have pasted below:
> 
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>     Fed.t.4., data = OLS_CAR, x = TRUE)
> 
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
> 
> I have produced all of these regressions myself and pasted them directly from 
> the 'R' software package. My regression methodology is entirely my own, along 
> with the sourcing and preparation of the data used to produce these 
> statistics.
> 
> I would be very grateful if you could provide me with some clarity as to why 
> this output from 'R' is reading as plagiarism.
> 
> I would like to thank you in advance,
> 
> Kind regards,
> 
> Oliver Barrett
> (+44) 7341 834 217
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Using choiceDes Package to Design MaxDiff?

2015-03-12 Thread Vik Rubenfeld
I’m seeking to design a MaxDiff experiment that will have a number of blocks of 
this type:

Which of these items is the
most important?

Which of these items is the 
least important?

Item 1
Item 2
Item 3
Item 4

I’m seeking to use the choiceDes package 
http://cran.r-project.org/web/packages/choiceDes/choiceDes.pdf to design the 
experiment. The relevant function is tradeoff.des. Usage:

tradeoff.des(items, shown, vers, tasks, fname=NULL, Rd=20, Rc=NULL, print=TRUE)

I believe I understand the items, shown and vers parameters:
items: number of total items in the experiment
shown: number of items shown per block
vers: number of blocks
...but I’m not quite sure what the tasks parameter is yet. For example, let’s 
say I have 20 items total in the study. I want to show 4 items per block, with 
10 blocks total. I enter:

tempDes <- tradeoff.des(20, 4, 10, 2, "tempDesign.txt", 20, NULL, TRUE)

…hoping that the 2 is the number of questions (most important/least important) 
per block. I get an error:

Error in optBlock(~., des.d, rep(tasks, vers), nRepeats = Rd) : 
  The number of withinData rows is not large enough to support the blocked 
model.

If I put the number of blocks up to 50:

tempDes <- tradeoff.des(20, 4, 50, 2, "tempDesign.txt", 20, NULL, TRUE)

…I get the same error. 

What is the correct way to use choiceDes to design a MaxDiff experiment of this 
kind?

Thanks very much in advance to all for any thoughts or info!

Best,


-Vik
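
A quick arithmetic check, offered as a sketch rather than a diagnosis: if tasks is the number of choice screens per version (a reading of the argument names that is not confirmed in this thread), then tasks = 2 with shown = 4 gives only 8 item slots per version, fewer than the 20 items in the study. The helper name below is hypothetical.

```r
items <- 20  # total items in the study
shown <- 4   # items shown per screen

# Hypothetical helper: item slots available in one version of the design.
slots_per_version <- function(tasks, shown) tasks * shown

slots_with_2_tasks <- slots_per_version(2, shown)

# Smallest number of screens that lets every item appear at least once.
min_tasks <- ceiling(items / shown)

cat("slots with tasks = 2:", slots_with_2_tasks, "\n")
cat("smallest tasks value that shows every item once:", min_tasks, "\n")
```

Whether this shortfall is what triggers the optBlock error is an assumption. It would at least be consistent with the error persisting when only the third argument is raised from 10 to 50.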




[R] Conjoint Package

2014-08-29 Thread Vik Rubenfeld
I’m very glad to see the Conjoint Package for R. The documentation for it does 
not appear to specify methods for data acquisition. Are the cards to be 
individually scored by each respondent (most clients would rather see a 
choice-based methodology)?

SurveyGizmo, an excellent online survey host which I use, has in beta a 
Conjoint question type. However, it does not appear to calculate 
respondent-level utility values at this time. 

SurveyGizmo supports a conjoint question design in which each respondent is 
shown 3 cards at a time, and permitted to identify one of the three as Best, 
and one as Worst. (SG supports additional conjoint question designs as well).

Data acquired by SurveyGizmo conjoint looks like this for each respondent:

 Set #1
 Model Attribute   Model Value
 Price             $300
 Size              7
 Memory            128 gb
 Score:            50
 
 Set #2
 Model Attribute   Model Value
 Price             $100
 Size              4
 Memory            16 gb
 Score:            0
 
 Set #3
 Model Attribute   Model Value
 Price             $200
 Size              6
 Memory            64 gb
 Score:            100
 
 Set #4
 Model Attribute   Model Value
 Price             $100
 Size              5
 Memory            32 gb
 Score:            100
 
 Set #5
 Model Attribute   Model Value
 Price             $200
 Size              5
 Memory            32 gb
 Score:            0

Score 100 = Best 
Score 50 = Not selected
Score 0 = Worst

Is it possible to use the R-Project Conjoint Package with a data file like 
this, to calculate respondent-level utility values? In other words, are the 
scores (100, 50, 0) input that the Conjoint Package can use?

Thanks very much in advance to all for any info!

Best,


-Vik
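
For reference, the five cards above can be written as an R data frame, which is the shape most modelling functions expect. This is only a sketch of the data layout: the column names are made up for illustration, and it does not establish whether the conjoint package accepts 100/50/0 scores as input.

```r
# The five cards listed above; column names are hypothetical.
cards <- data.frame(
  set       = 1:5,
  price     = c(300, 100, 200, 100, 200),  # dollars
  size      = c(7, 4, 6, 5, 5),
  memory_gb = c(128, 16, 64, 32, 32),
  score     = c(50, 0, 100, 100, 0)
)

# Recode the numeric scores into the labels given in the message.
cards$choice <- factor(cards$score,
                       levels = c(0, 50, 100),
                       labels = c("worst", "not selected", "best"))

print(cards)
print(table(cards$choice))
```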


Re: [R] Decision Tree: Am I Missing Anything?

2012-09-22 Thread Vik Rubenfeld
Bhupendrasinh, thanks again for telling me about RWeka.  That made a big 
difference in a job I was working on this week. 

Have a great weekend.


-Vik


Re: [R] Decision Tree: Am I Missing Anything?

2012-09-21 Thread Vik Rubenfeld
Max, I installed C50. I have a question about the syntax. Per the C50 manual:

## Default S3 method:
C5.0(x, y, trials = 1, rules= FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL, ...)

## S3 method for class ’formula’
C5.0(formula, data, weights, subset,
na.action = na.pass, ...)

I believe I need the method for class 'formula'. But I don't yet see in the 
manual how to tell C50 that I want to use that method. If I run:

respLevel = read.csv("Resp Level Data.csv")
respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI + DRRE 
+ FREC + SPED, data = respLevel)

...I get an error message:

Error in gsub(":", ".", x, fixed = TRUE) : 
  input string 18 is invalid in this locale

What is the correct way to use the C5.0 method for class 'formula'?


-Vik
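
On the dispatch question, a general R fact may help: an S3 generic picks its method from the class of its first argument, so passing a formula first is enough and no extra switch is needed. The toy generic below (fit_tree is a hypothetical name, not part of C50) shows this. The second half is only a guess at the error: "invalid in this locale" typically points to text that is not valid in the session's encoding, so it may be worth checking the strings read from the CSV. The sample labels are made up, and the Latin-1 source encoding is an assumption.

```r
# Toy S3 generic; fit_tree is hypothetical and only mimics how dispatch works.
fit_tree <- function(x, ...) UseMethod("fit_tree")
fit_tree.default <- function(x, ...) "default method"
fit_tree.formula <- function(x, ...) "formula method"

# A formula as first argument selects the formula method automatically.
dispatched <- fit_tree(BRAND_NAME ~ PRI + PROM)
print(dispatched)

# Made-up labels: the second contains a Latin-1 byte that is not valid UTF-8.
labels <- c("Brand A", "Caf\xe9 Brand")
bad <- !validUTF8(labels)
print(which(bad))

# Re-encode, assuming the file was saved as Latin-1 (an assumption).
fixed <- iconv(labels, from = "latin1", to = "UTF-8")
print(validUTF8(fixed))
```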

On Sep 21, 2012, at 4:18 AM, mxkuhn wrote:

> There is also C5.0 in the C50 package. It tends to have smaller trees than 
> C4.5 and much smaller trees than J48 when there are factor predictors. Also, 
> it has an optional feature selection (winnow) step that can be used. 
> 
> Max
> 
> On Sep 21, 2012, at 2:18 AM, Achim Zeileis achim.zeil...@uibk.ac.at wrote:
> 
>> Hi,
>> 
>> just to add a few points to the discussion:
>> 
>> - rpart() is able to deal with responses with more than two classes. Setting 
>> method="class" explicitly is not necessary if the response is a factor (as 
>> in this case).
>> 
>> - If your tree on this data is so huge that it can't even be plotted, I 
>> wouldn't be surprised if it overfitted the data set. You should check for 
>> this and possibly try to avoid unnecessary splits.
>> 
>> - There are various ways to do so for J48 trees without variable reduction. 
>> One could require a larger minimal leaf size (default is 2) or one can use 
>> reduced error pruning, see WOW(J48) for more options. They can be easily 
>> used as e.g. J48(..., control = Weka_control(R = TRUE, M = 10)) etc.
>> 
>> - There are various other ways of fitting decision trees, see for example 
>> http://CRAN.R-project.org/view=MachineLearning for an overview. In 
>> particular, you might like the partykit package which additionally 
>> provides the ctree() method and has a unified plotting interface for ctree, 
>> rpart, and J48.
>> 
>> hth,
>> Z
 


[R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
I'm working with some data from which a client would like to make a decision 
tree predicting brand preference based on inputs such as price, speed, etc.  
After running the decision tree analysis using rpart, it appears that this data 
is not capable of predicting brand preference.  

Here's the data set:

BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
Brand 22  0.75    0.      0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167

Here are my R commands:

> test.df = read.csv("test.csv")
> head(test.df)
     BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000

> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE +
+     SPED + REVW, method = "class", data = test.df)

> printcp(testTree)

Classification tree:
rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
    MODE + SPED + REVW, data = test.df, method = "class")

Variables actually used in tree construction:
[1] FORM

Root node error: 21/22 = 0.95455

n= 22 

        CP nsplit rel error xerror xstd
1 0.047619      0   1.0     1.0476    0
2 0.01          1   0.95238 1.0476    0

I note that only one variable (FORM) was actually used in tree construction. 
When I run a plot using:

> plot(testTree)
> text(testTree)

...I get a tree with one branch.  

It looks to me like I'm doing everything right, and this data is just not 
capable of predicting brand preference. 

Am I missing anything?

Thanks very much in advance for any thoughts!

-Vik
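
One thing the printcp() output already hints at: the root node error of 21/22 follows directly from having 22 rows and 22 distinct brands, so every class occurs exactly once. The base-R sketch below reproduces that number from the brand labels alone. It is an illustration of the arithmetic, not a re-run of the rpart model.

```r
# One row per brand, as in the data set above.
brands <- factor(paste("Brand", 1:22))
counts <- table(brands)

# With a single observation per class, predicting any one class
# gets 1 of 22 rows right, which is the root node error rpart reports.
root_error <- 1 - max(counts) / length(brands)

print(max(counts))
print(round(root_error, 5))
```

With one example per class there is nothing for a classification tree to generalize from, which may be why the respondent-level data discussed elsewhere in this thread behaves so differently.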







Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Thanks! Here's the dput output:

> dput(test.df)
structure(list(BRND = structure(c(1L, 12L, 16L, 17L, 18L, 19L, 
20L, 21L, 22L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 
14L, 15L), .Label = c("Brand 1", "Brand 10", "Brand 11", "Brand 12", 
"Brand 13", "Brand 14", "Brand 15", "Brand 16", "Brand 17", "Brand 18", 
"Brand 19", "Brand 2", "Brand 20", "Brand 21", "Brand 22", "Brand 3", 
"Brand 4", "Brand 5", "Brand 6", "Brand 7", "Brand 8", "Brand 9"
), class = "factor"), PRI = c(0.6989, 0.8621, 0.6, 0.6429, 0.7586, 
0.75, 0.7742, 0.6429, 0.575, 0.8095, 0.6308, 0.7212, 0.7419, 
0.7176, 0.7287, 0.7, 0.7193, 0.7778, 0.8028, 0.7736, 0.8481, 
0.75), PROM = c(0.4731, 0.3793, 0.1, 0.25, 0.4224, 0.0833, 0.4839, 
0.2679, 0.175, 0.5238, 0.3, 0.3152, 0.2258, 0.2706, 0.3437, 0.4, 
0., 0.4127, 0.2817, 0.2453, 0.2152, 0.), FORM = c(0.7849, 
0.8621, 0.6, 0.5714, 0.7328, 0.5833, 0.6129, 0.6964, 0.65, 0.6667, 
0.6077, 0.7152, 0.6129, 0.6353, 0.5995, 0.6, 0.6667, 0.6508, 
0.6197, 0.6226, 0.6329, 0.6667), FAMI = c(0.6989, 0.931, 0.7, 
0.5, 0.6638, 0.4167, 0.5161, 0.7143, 0.55, 0.6429, 0.5846, 0.6545, 
0.5806, 0.5647, 0.5788, 0.4, 0.6667, 0.6349, 0.4366, 0.3774, 
0.4051, 0.5), DRRE = c(0.7419, 0.7586, 0.9, 0.6071, 0.7328, 0.5, 
0.8065, 0.875, 0.625, 0.6667, 0.6769, 0.6606, 0.7097, 0.6941, 
0.8527, 1, 0.7018, 0.7937, 0.7042, 0.5849, 0.6329, 0.6667), FREC = c(0.6022, 
0.6897, 0.7, 0.5, 0.6379, 0.4167, 0.6452, 0.5536, 0.375, 0.5952, 
0.5231, 0.503, 0.6129, 0.4471, 0.5478, 0.4, 0.5263, 0.6032, 0.4366, 
0.3019, 0.4557, 0.5833), MODE = c(0.8817, 0.8966, 0.7, 0.75, 
0.8621, 0.75, 0.7742, 0.8036, 0.825, 0.8571, 0.7462, 0.8061, 
0.871, 0.7176, 0.8217, 0.9, 0.7719, 0.8571, 0.7183, 0.717, 0.6962, 
0.9167), SPED = c(0.9032, 0.9655, 0.8, 0.8214, 0.8621, 0.6667, 
0.9032, 0.9464, 0.85, 0.8095, 0.8846, 0.8909, 0.9677, 0.9412, 
0.8941, 0.9, 0.8596, 0.9206, 0.9155, 0.8679, 0.8481, 0.9167), 
REVW = c(0.6452, 0.8276, 0.6, 0.5, 0.6897, 0.5, 0.6129, 0.6607, 
0.475, 0.5714, 0.6, 0.6, 0.3226, 0.5176, 0.6227, 0.5, 0.7018, 
0.619, 0.5634, 0.4717, 0.3418, 0.4167)), .Names = c("BRND", 
"PRI", "PROM", "FORM", "FAMI", "DRRE", "FREC", "MODE", "SPED", 
"REVW"), class = "data.frame", row.names = c(NA, -22L))


I've downloaded RWeka and am looking at the documentation.





Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Bhupendrasinh, thanks very much!  I ran J48 on a respondent-level data set and 
got a 61.75% correct classification rate!

Correctly Classified Instances         988               61.75   %
Incorrectly Classified Instances       612               38.25   %
Kappa statistic                          0.5651
Mean absolute error                      0.0432
Root mean squared error                  0.1469
Relative absolute error                 52.7086 %
Root relative squared error             72.6299 %
Coverage of cases (0.95 level)          99.6875 %
Mean rel. region size (0.95 level)      15.4915 %
Total Number of Instances             1600

When I plot it I get an enormous chart.  Running :

respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE 
+ SPED + REVW, data = respLevel)
respLevelTree

...reports:

J48 pruned tree
--

Is there a way to further prune the tree so that I can present a chart that 
would fit on a single page or two?

Thanks very much in advance for any thoughts.


-Vik




On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:

> Not very sure what the problem is, as I was not able to run your data. 
> You might want to use the dput() command to present the data. 
> 
> Now on the programming side. As we can see, we have more than 2 levels 
> for the brands, and hence method = "class" is not able to understand 
> what you actually want from it.
> 
> Suggestion: For predictions having more than 2 levels I would go for Weka and 
> specifically the C4.5 algorithm. You also have the RWeka package for it.
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
 

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Very good. Could  you point me in a couple of potential directions for variable 
reduction? E.g. correlation analysis?
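
As a starting point for the correlation idea, here is a minimal base-R sketch. It uses four of the nine predictors, copied from the dput() output posted in this thread, and lists pairs whose absolute correlation exceeds a cutoff. The 0.8 cutoff is an arbitrary illustrative choice, and the brand-level table is only a stand-in for the respondent-level data.

```r
# Four predictors copied from the dput() output in this thread.
preds <- data.frame(
  FORM = c(0.7849, 0.8621, 0.6, 0.5714, 0.7328, 0.5833, 0.6129, 0.6964,
           0.65, 0.6667, 0.6077, 0.7152, 0.6129, 0.6353, 0.5995, 0.6,
           0.6667, 0.6508, 0.6197, 0.6226, 0.6329, 0.6667),
  FAMI = c(0.6989, 0.931, 0.7, 0.5, 0.6638, 0.4167, 0.5161, 0.7143,
           0.55, 0.6429, 0.5846, 0.6545, 0.5806, 0.5647, 0.5788, 0.4,
           0.6667, 0.6349, 0.4366, 0.3774, 0.4051, 0.5),
  FREC = c(0.6022, 0.6897, 0.7, 0.5, 0.6379, 0.4167, 0.6452, 0.5536,
           0.375, 0.5952, 0.5231, 0.503, 0.6129, 0.4471, 0.5478, 0.4,
           0.5263, 0.6032, 0.4366, 0.3019, 0.4557, 0.5833),
  REVW = c(0.6452, 0.8276, 0.6, 0.5, 0.6897, 0.5, 0.6129, 0.6607,
           0.475, 0.5714, 0.6, 0.6, 0.3226, 0.5176, 0.6227, 0.5,
           0.7018, 0.619, 0.5634, 0.4717, 0.3418, 0.4167)
)

# Pairwise Pearson correlations between the predictors.
r <- cor(preds)
print(round(r, 2))

# Pairs above the (arbitrary) cutoff; upper triangle avoids duplicates.
cutoff <- 0.8
high <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(r)[high[, 1]],
                    var2 = colnames(r)[high[, 2]],
                    r    = r[high])
print(pairs)
```

Dropping one variable from each highly correlated pair is one simple form of variable reduction; other approaches exist and may suit the data better.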


On Sep 20, 2012, at 10:36 PM, Bhupendrasinh Thakre wrote:

> One possible way to think of it is using variable reduction before going 
> for J48. You may want to use several methods available for that. Again, 
> prediction for brands is more of a business question to me. 
> 
> Two solutions which I can think of:
> 1. Variable reduction before the decision tree.
> 2. Let intuition decide how many of them are really important.
> 
> Please let us know your findings. All the best.
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
 

[R] Does A Choice-Based Conjoint Study Have To Be Full Profile?

2012-08-10 Thread Vik Rubenfeld
In a Conjoint study, it's difficult for respondents to evaluate more than 6 
product attributes at a time. Some studies require more attributes. 

Often this is solved via the use of Adaptive Conjoint Analysis (ACA), in which 
the questionnaire is modified for each individual respondent as the survey is 
being taken. In ACA, it is not necessary to show the full profile -- i.e. all 
attributes -- of each product. Partial profiles are shown. A study can include 
up to 30 attributes, but respondents are never asked to consider more than 5 at 
a time.

However, in order to do ACA, as far as I know at this time, one must have 
the survey hosted by a very expensive (e.g. $10,000) conjoint-oriented survey 
host.

My question is, is it possible to do partial profile Conjoint in R, without 
using ACA?

Thanks in advance to all for any info.
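
To make the idea of a partial profile concrete, here is a minimal base-R sketch that draws 5 of 30 attributes at random for each task. The attribute names, the number of tasks, and the seed are made up, and random selection is a deliberately simple stand-in: it is not ACA and not a statistically efficient design.

```r
set.seed(1)  # only so the draw is repeatable

# 30 placeholder attribute names (hypothetical).
attributes <- paste0("attr", 1:30)

# For each of 12 tasks, pick the 5 attributes the respondent will see.
n_tasks <- 12
tasks <- replicate(n_tasks, sort(sample(attributes, 5)), simplify = FALSE)

# How often each attribute was selected across the tasks.
coverage <- table(factor(unlist(tasks), levels = attributes))
print(tasks[[1]])
print(coverage)
```

A real study would also need to balance how often each attribute and level appears, which this sketch does not attempt.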



[R] Correct Place to Seek an R-Project Consultant?

2012-08-06 Thread Vik Rubenfeld
I would like to find out how to apply commands found in the bayesm package, to 
analyze data gathered via a choice-based conjoint study. Is there a web 
resource where I can seek an R-Project consultant experienced in this, who I 
could hire to walk me through the appropriate bayesm commands to use for this 
purpose?

Thanks in advance to all for any info.


[R] Can't Run Conjoint Package - Could not find function caFactorialDesign?

2012-08-03 Thread Vik Rubenfeld
I'm trying to run the Conjoint package, and I receive the error:

 Error: could not find function "caFactorialDesign"

I'm running R version 2.15.1 on Mac OS X.  I have installed the Conjoint 
package with the Install Dependencies checkbox checked. I have clicked the 
Update All button in the R Package Installer.

How can I correct this error?

Thanks in advance to all for any info.

(Please respond to this email address as well as to the list -- Thanks!). 



Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?

2012-08-03 Thread Vik Rubenfeld
Thanks very much for this info, Sarah!

I have used library(conjoint). Here are the commands used:

> library(conjoint)
> experiment = expand.grid(
+ price = c("low", "medium", "high"),
+ variety = c("black", "green", "red"),
+ kind = c("bags", "granulated", "leafy"),
+ aroma = c("yes", "no"))
> design <- caFactorialDesign(data=experiment, type="orthogonal")
Error: could not find function "caFactorialDesign"

What could I be missing?

Best,


-Vik
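
One way to narrow this down is to ask R directly whether the installed package exports the function. The sketch below defines a small helper (has_function is a hypothetical name) and runs it against the stats package so that it works on any installation. Substituting "conjoint" for "stats" would apply the same check to the package in question. One possible explanation, which is only an assumption here, is that the installed version predates the function.

```r
# Hypothetical helper: TRUE if package pkg is installed and exports fn.
has_function <- function(fn, pkg) {
  requireNamespace(pkg, quietly = TRUE) &&
    fn %in% getNamespaceExports(pkg)
}

# stats is used here only because it ships with every R installation.
found_lm <- has_function("lm", "stats")
found_ca <- has_function("caFactorialDesign", "stats")

print(found_lm)
print(found_ca)
```

If the check comes back FALSE for the conjoint package itself, comparing packageVersion("conjoint") with the version the documentation was written for would be a reasonable next step.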


On Aug 3, 2012, at 10:28 AM, Sarah Goslee wrote:

> Hi Vik,
> 
> You don't need to post to nabble and to the R-help list. Just skip the
> nabble step!
> 
> Have you loaded the package with:
> 
> library(conjoint) # not Conjoint
> 
> before you try to use any of its functions?
> 
> Sarah
> 
> On Fri, Aug 3, 2012 at 1:23 PM, Vik Rubenfeld v...@mindspring.com wrote:
>> I'm trying to run the Conjoint package, and I receive the error:
>> 
>> Error: could not find function "caFactorialDesign"
>> 
>> I'm running R version 2.15.1 on Mac OS X.  I have installed the Conjoint 
>> package with the Install Dependencies checkbox checked. I have clicked the 
>> Update All button in the R Package Installer.
>> 
>> How can I correct this error?
>> 
>> Thanks in advance to all for any info.
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org



Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?

2012-08-03 Thread Vik Rubenfeld
Here is the output of sessionInfo():

 sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] conjoint_1.33     clusterSim_0.41-5 mlbench_2.1-1     MASS_7.3-20
 [5] rgl_0.92.892      e1071_1.6         class_7.3-4       R2HTML_2.2
 [9] cluster_1.14.2    ade4_1.4-17       AlgDesign_1.1-7

loaded via a namespace (and not attached):
[1] tools_2.15.1
 

Per your recommendation, I have read the Posting Guide, and have sent an email 
to the maintainers of this package as well.

Best,


-Vik


On Aug 3, 2012, at 11:15 AM, Sarah Goslee wrote:

 Hi,
 
 On Fri, Aug 3, 2012 at 1:57 PM, Vik Rubenfeld v...@mindspring.com wrote:
 Thanks very much for this info, Sarah!
 
 I have used library(conjoint). Here are the commands used:
 
 library(conjoint)
 experiment = expand.grid(
 + price = c("low", "medium", "high"),
 + variety = c("black", "green", "red"),
 + kind = c("bags", "granulated", "leafy"),
 + aroma = c("yes", "no"))
 design <- caFactorialDesign(data=experiment, type="orthogonal")
 Error: could not find function "caFactorialDesign"
 
 What could I be missing?
 
 That's hard to say. We need at least the output of sessionInfo() to
 begin to decide. You've got the reproducible example, but a look at
 the posting guide might still offer you some advice.
 
 Sarah
 
 
 -- 
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?

2012-08-03 Thread Vik Rubenfeld
Got it. Thanks so much for your help, Michael and Sarah!

Best,


-Vik

On Aug 3, 2012, at 11:50 AM, R. Michael Weylandt wrote:

 On Fri, Aug 3, 2012 at 1:34 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 On Fri, Aug 3, 2012 at 2:23 PM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:
 With conjoint_1.33 and rather up to date dependencies, I don't see
 caFactorialDesign and neither does getAnywhere().
 
 The function is present in conjoint 1.34, the current version on CRAN.
 
 Rarely do I have to remind respondents to update their installation. :)
 
 Sarah
 
 It looks like 1.34 was uploaded this morning and hasn't made it to my
 mirror yet: sorry for the noise.
 
 Vik, it looks like you're in the same boat. Try switching to the
 Austrian CRAN master and update -- then it'll be there.
 
 Best,
 Michael
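
 The diagnosis above (an installed version that predates the function)
 can be checked directly from the R prompt. What follows is only a
 minimal sketch: has_export() is a hypothetical helper written for this
 note, not part of conjoint or of base R, and it is demonstrated on the
 'stats' package so that it runs without installing anything. The
 commented-out install.packages() call assumes the main CRAN site at
 https://cran.r-project.org is the master Michael refers to.

```r
# Hypothetical helper (not part of any package): TRUE if the installed
# copy of `pkg` exports a function or object called `name`.
has_export <- function(pkg, name) {
  requireNamespace(pkg, quietly = TRUE) && name %in% getNamespaceExports(pkg)
}

# Demonstrated on 'stats' so the sketch runs without extra installs.
has_export("stats", "lm")                 # an exported function
has_export("stats", "caFactorialDesign")  # a name 'stats' does not export
as.character(packageVersion("stats"))     # the installed version string

# For the case in this thread one would check
#   has_export("conjoint", "caFactorialDesign")
#   packageVersion("conjoint")
# and, if the function is missing, reinstall from the main CRAN site
# instead of a mirror that may lag behind:
#   install.packages("conjoint", repos = "https://cran.r-project.org")
```

 If the check on conjoint returns FALSE while CRAN already lists a newer
 version, the mirror is the likely culprit, as it was here.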



[R] Newbie Correspondence Analysis Question

2010-09-26 Thread Vik Rubenfeld
I'm experienced in statistics, but I am a first-time R user.  I would like to 
use R for correspondence analysis.  I have installed R (Mac OSX). I have used 
the package installer to install the CA package.  I have run the following line 
with no errors to read in the data for a table:

NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")

The R online help appears to suggest that the following line should come next:

 corresp(NonLuxury)

However, I get the error message:

Error: could not find function "corresp"
 
The CA manual appears to suggest that the following line should come next:

 ca(NonLuxury)

Again, I get the error message:

Error: could not find function "ca"

What am I missing? Thanks very much in advance to all for any info.


-Vik


Re: [R] Newbie Correspondence Analysis Question

2010-09-26 Thread Vik Rubenfeld
Thanks very much. 


-Vik


On Sep 26, 2010, at 9:45 AM, Chris Mcowen wrote:

 Have you loaded the library after installing it?
 
 Either use  library(ca) 
 
 Or
 
 Through the package manager tab
 
 Hth
 Sent from my iPhone
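
 The point above is that installing a package and attaching it are
 separate steps: an installed package's functions are not visible by
 name until library() has been called in the current session. A small
 sketch of that difference follows. It uses the 'tools' package, which
 ships with every R installation but is not attached by default, purely
 as a stand-in; the same pattern is assumed to apply to ca once it is
 installed.

```r
# 'tools' is installed with R but not attached in a fresh session, so
# its functions are normally not found by name at this point.
visible_before <- exists("file_ext")

# Attaching the package puts its exported functions on the search path.
library(tools)
visible_after <- exists("file_ext")

# A function can also be called without attaching, via pkg::name.
ext <- tools::file_ext("nonLuxury.data.txt")

print(c(before = visible_before, after = visible_after))
print(ext)
```

 For this thread the equivalent would be library(ca) followed by
 ca(NonLuxury), or ca::ca(NonLuxury) without attaching.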
 
 On 26 Sep 2010, at 17:41, Vik Rubenfeld v...@mindspring.com wrote:
 
 I'm experienced in statistics, but I am a first-time R user.  I would like 
 to use R for correspondence analysis.  I have installed R (Mac OSX). I have 
 used the package installer to install the CA package.  I have run the 
 following line with no errors to read in the data for a table:
 
    NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
 
 The R online help appears to suggest that the following line should come 
 next:
 
 corresp(NonLuxury)
 
 However, I get the error message:
 
    Error: could not find function "corresp"
 
 The CA manual appears to suggest that the following line should come next:
 
 ca(NonLuxury)
 
 Again, I get the error message:
 
    Error: could not find function "ca"
 
 What am I missing? Thanks very much in advance to all for any info.
 
 
 -Vik


[R] Storing CA Results to a Data Frame?

2010-09-26 Thread Vik Rubenfeld
I am successfully performing a correspondence analysis using the commands:

NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
ca(NonLuxury)

I would like to store the results to a data frame so that I can write them to 
disk using write.table.  I have tried several things such as:

df <- data.frame(ca(NonLuxury))
df <- data.frame(data(ca(NonLuxury)))
etc.

...but clearly this is incorrect as it generates an error message. 

Is it possible to store the results of a CA to a dataframe, and if so, what is 
the correct way to do this?

Thanks in advance to all for any info.


-Vik


[R] Storing CA Results to a Data Frame?

2010-09-26 Thread Vik Rubenfeld
[Sorry- somehow the first time I posted this it got attached to another thread 
-Vik]

I am successfully performing a correspondence analysis using the commands: 

NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt") 
ca(NonLuxury) 

I would like to store the results to a data frame so that I can write them to 
disk using write.table.  I have tried several things such as: 

df <- data.frame(ca(NonLuxury)) 
df <- data.frame(data(ca(NonLuxury))) 
etc. 

...but clearly this is incorrect as it generates an error message. 

Is it possible to store the results of a CA to a dataframe, and if so, what is 
the correct way to do this? 

Thanks in advance to all for any info. 


-Vik 


Re: [R] Storing CA Results to a Data Frame?

2010-09-26 Thread Vik Rubenfeld
Thanks very much for this great info, Ista.

Best,


-Vik

On Sep 26, 2010, at 12:10 PM, Ista Zahn wrote:

 Hi Vik,
 I suggest reading through some of the introductory documentation. R
 has several classes of objects, including matrix, list, data.frame
 etc. and a basic understanding of what these are is essential for
 effectively using R. An essential function is str() which shows you
 the structure of an object. Other essential functions include names(),
 help(), help.search(), and methods()
 
 An example session that is similar to your case:
 
 library(ca) # load the ca package
 data(author) # load the author dataset
 str(author) # examine the author data
 auth.ca <- ca(author) # run the ca function on the author data
 str(auth.ca) # examine the structure of the auth.ca results; note that
              # it is a list with class "ca"
 methods(class=class(auth.ca)) # see what methods are defined for this
                               # type of object
 ?plot.ca ## look up the documentation for the plot method for objects
          ## of class "ca"
 plot(auth.ca) ## call the plot method
 auth.ca.sum <- summary(auth.ca) ## call the summary method
 str(auth.ca.sum) # examine the structure of the auth.ca.sum object
 methods(class=class(auth.ca.sum)) ## find out what methods are defined for it
 ## Hmmn ok, so suppose I want to extract the rows and columns
 ## data.frames from auth.ca.sum but don't know how
 help.search("extract") ## first result is base::Extract
 ?Extract ## look up documentation for extract
 auth.ca.rows <- auth.ca.sum[["rows"]] ## extract the rows data.frame
 auth.ca.cols <- auth.ca.sum[["columns"]] ## extract the columns data.frame
 write.csv(auth.ca.rows, file="auth.ca.rows.csv") ## write results to a .csv file
 write.csv(auth.ca.cols, file="auth.ca.cols.csv") ## (file names are arbitrary)
 
 HTH,
 Ista
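
 The general pattern in the session above is: fit, inspect the result
 with str() or names(), pull out the one component that is tabular, and
 only then write it to disk. The sketch below shows that pattern end to
 end with write.table, which is what the original question asked about.
 To keep it runnable without the ca package it uses prcomp() from
 'stats' on the built-in USArrests data as a stand-in for ca(NonLuxury);
 the component name "x" belongs to prcomp results and is not a ca
 component, and the temporary file path is an assumption made for the
 example.

```r
# Stand-in for ca(NonLuxury): a fitted object that is a list with a
# class attribute, which data.frame() cannot convert in one step.
fit <- prcomp(USArrests, scale. = TRUE)

# Inspect what the result contains before extracting anything.
print(class(fit))
print(names(fit))

# Extract the tabular component and convert that piece to a data frame.
scores <- as.data.frame(fit[["x"]])

# Write it out tab-separated; a temporary file is used for the sketch.
out_file <- tempfile(fileext = ".txt")
write.table(scores, file = out_file, sep = "\t", quote = FALSE)

# Read it back to confirm that the file round-trips.
round_trip <- read.table(out_file, header = TRUE, sep = "\t")
print(dim(round_trip))
```

 With a ca result, the same steps would apply to whichever components
 str() shows to be matrices or data frames, such as the "rows" and
 "columns" elements of the summary used above.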
 
 On Sun, Sep 26, 2010 at 6:10 PM, Vik Rubenfeld v...@mindspring.com wrote:
 I am successfully performing a correspondence analysis using the commands:
 
 NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
 ca(NonLuxury)
 
 I would like to store the results to a data frame so that I can write them 
 to disk using write.table.  I have tried several things such as:
 
 df <- data.frame(ca(NonLuxury))
 df <- data.frame(data(ca(NonLuxury)))
etc.
 
 ...but clearly this is incorrect as it generates an error message.
 
 Is it possible to store the results of a CA to a dataframe, and if so, what 
 is the correct way to do this?
 
 Thanks in advance to all for any info.
 
 
 -Vik
 
 
 
 
 -- 
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology
 http://yourpsyche.org
