Re: [R] 'R' Software Output Plagiarism
Your professor should immediately recognize that the quoted code is standard regression input/output and that the Urkund results in this case are without merit.

> On Sep 22, 2015, at 7:27 AM, BARRETT, Oliver wrote:
>
> Dear 'R' community support,
>
> I am a student at Skema business school and I have recently submitted my MSc
> thesis/dissertation. This has been passed on to an external plagiarism
> service provider, Urkund, who have scanned my document and returned a
> plagiarism report to my professor having detected 32% plagiarism.
>
> I have contacted Urkund regarding this issue having committed no such
> plagiarism and they have told me that all the plagiarism detected in my
> document comes from the last 25% which consists only of 'R' regressions like
> the one I have pasted below:
>
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>     Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629, Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>
> I have produced all of these regressions myself and pasted them directly from
> the 'R' software package. My regression methodology is entirely my own along
> with the sourcing and preparation of the data used to produce these
> statistics.
>
> I would be very grateful if you could provide me with some clarity as to why
> this output from 'R' is reading as plagiarism.
>
> I would like to thank you in advance,
>
> Kind regards,
>
> Oliver Barrett
> (+44) 7341 834 217

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
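To illustrate the point made in the reply: a good share of a printed regression summary is fixed text that R emits for every model, whatever the data. The following is only a minimal sketch; the built-in cars and mtcars data sets are stand-ins chosen for convenience and have nothing to do with the thesis data. It lists the non-empty lines that two unrelated fits print verbatim.

```r
# Two unrelated regressions on built-in data sets (stand-ins, not the thesis data)
fit_cars   <- lm(dist ~ speed, data = cars)
fit_mtcars <- lm(mpg ~ wt, data = mtcars)

# Capture the printed summaries as character vectors, one element per line
out_cars   <- capture.output(print(summary(fit_cars)))
out_mtcars <- capture.output(print(summary(fit_mtcars)))

# Non-empty lines that appear verbatim in both printouts
shared <- intersect(out_cars, out_mtcars)
shared <- shared[nzchar(shared)]
print(shared)
```

Section labels such as "Call:", "Residuals:" and "Coefficients:" are shared by every lm summary, so a text-matching tool may well flag them even though the numbers are the author's own.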
[R] Using choiceDes Package to Design MaxDiff?
I’m seeking to design a MaxDiff experiment that will have a number of blocks of this type:

    Which of these items is the most important?
    Which of these items is the least important?

    Item 1
    Item 2
    Item 3
    Item 4

I’m seeking to use the choiceDes package (http://cran.r-project.org/web/packages/choiceDes/choiceDes.pdf) to design the experiment. The relevant function is tradeoff.des. Usage:

    tradeoff.des(items, shown, vers, tasks, fname=NULL, Rd=20, Rc=NULL, print=TRUE)

I believe I understand the items, shown and vers parameters:

    items: total number of items in the experiment
    shown: number of items shown per block
    vers:  number of blocks

...but I’m not yet sure what the tasks parameter is.

For example, let’s say I have 20 items total in the study. I want to show 4 items per block, with 10 blocks total. I enter:

    tempDes <- tradeoff.des(20, 4, 10, 2, "tempDesign.txt", 20, NULL, TRUE)

...hoping that the 2 is the number of questions (most important/least important) per block. I get an error:

    Error in optBlock(~., des.d, rep(tasks, vers), nRepeats = Rd) :
      The number of withinData rows is not large enough to support the blocked model.

If I put the number of blocks up to 50:

    tempDes <- tradeoff.des(20, 4, 50, 2, "tempDesign.txt", 20, NULL, TRUE)

...I get the same error.

What is the correct way to use choiceDes to design a MaxDiff experiment of this kind?

Thanks very much in advance to all for any thoughts or info!

Best,

-Vik
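For comparison, the layout described above (20 items, 4 shown per block, 10 blocks) can be sketched in base R without choiceDes. This is a naive randomized design and an assumption-laden stand-in: it is not what tradeoff.des computes, it makes no attempt at statistical optimality, and the names n_passes and one_pass are made up for the illustration. It only guarantees that every item appears equally often and never twice in the same block.

```r
set.seed(123)    # arbitrary seed, for reproducibility only

n_items  <- 20   # total items in the study
n_shown  <- 4    # items shown per block
n_passes <- 2    # hypothetical: each pass uses every item exactly once

# One pass: a random permutation of all items, cut into rows of n_shown
one_pass <- function() matrix(sample(n_items), ncol = n_shown, byrow = TRUE)

# Stack the passes: 2 passes x 5 blocks per pass = 10 blocks
design <- do.call(rbind, replicate(n_passes, one_pass(), simplify = FALSE))
colnames(design) <- paste0("item", seq_len(n_shown))

# How often each item appears across the whole design
appearances <- tabulate(as.vector(design), nbins = n_items)

print(design)
print(appearances)
```

Each row is one block; the respondent would be asked the "most important" and "least important" questions about the four items in that row.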
[R] Conjoint Package
I’m very glad to see the Conjoint Package for R. The documentation for it does not appear to specify methods for data acquisition. Are the cards to be individually scored by each respondent (most clients would rather see a choice-based methodology)?

SurveyGizmo, an excellent online survey host which I use, has in beta a Conjoint question type. However, it does not appear to calculate respondent-level utility values at this time.

SurveyGizmo supports a conjoint question design in which each respondent is shown 3 cards at a time, and permitted to identify one of the three as Best, and one as Worst. (SG supports additional conjoint question designs as well.) Data acquired by SurveyGizmo conjoint looks like this for each respondent:

    Set #1
    Model Attribute   Model Value
    Price             $300
    Size              7
    Memory            128 gb
    Score: 50

    Set #2
    Model Attribute   Model Value
    Price             $100
    Size              4
    Memory            16 gb
    Score: 0

    Set #3
    Model Attribute   Model Value
    Price             $200
    Size              6
    Memory            64 gb
    Score: 100

    Set #4
    Model Attribute   Model Value
    Price             $100
    Size              5
    Memory            32 gb
    Score: 100

    Set #5
    Model Attribute   Model Value
    Price             $200
    Size              5
    Memory            32 gb
    Score: 0

    Score 100 = Best
    Score 50  = Not selected
    Score 0   = Worst

Is it possible to use the R-Project Conjoint Package with a data file like this, to calculate respondent-level utility values? In other words, are the scores (100, 50, 0) input that the Conjoint Package can use?

Thanks very much in advance to all for any info!

Best,

-Vik
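Whatever the answer turns out to be, a first step would probably be to get the export into a rectangular form. The sketch below types in the five sets shown above and recodes the 100/50/0 scores into a labelled factor plus best/worst indicator columns. The column names (set, price, size, memory_gb, choice, best, worst) are made up for the illustration, and whether the conjoint package accepts data in this shape is an open question, not something this example establishes.

```r
# The five cards from the example above, one row per card
cards <- data.frame(
  set       = 1:5,
  price     = c(300, 100, 200, 100, 200),   # dollars
  size      = c(7, 4, 6, 5, 5),
  memory_gb = c(128, 16, 64, 32, 32),
  score     = c(50, 0, 100, 100, 0)
)

# Recode the SurveyGizmo scores: 100 = Best, 50 = Not selected, 0 = Worst
cards$choice <- factor(cards$score,
                       levels = c(0, 50, 100),
                       labels = c("worst", "not selected", "best"))

# 0/1 indicators, the usual shape for best-worst choice models
cards$best  <- as.integer(cards$score == 100)
cards$worst <- as.integer(cards$score == 0)

print(cards)
```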
Re: [R] Decision Tree: Am I Missing Anything?
Bhupendrasinh, thanks again for telling me about RWeka. That made a big difference in a job I was working on this week. Have a great weekend.

-Vik
Re: [R] Decision Tree: Am I Missing Anything?
Max, I installed C50. I have a question about the syntax. Per the C50 manual:

    ## Default S3 method:
    C5.0(x, y, trials = 1, rules = FALSE, weights = NULL, control = C5.0Control(), costs = NULL, ...)

    ## S3 method for class 'formula'
    C5.0(formula, data, weights, subset, na.action = na.pass, ...)

I believe I need the method for class 'formula', but I don't yet see in the manual how to tell C50 that I want to use that method. If I run:

    respLevel = read.csv("Resp Level Data.csv")
    respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI + DRRE + FREC + SPED, data = respLevel)

...I get an error message:

    Error in gsub(":", ".", x, fixed = TRUE) :
      input string 18 is invalid in this locale

What is the correct way to use the C5.0 method for class 'formula'?

-Vik

> On Sep 21, 2012, at 4:18 AM, mxkuhn wrote:
>
> There is also C5.0 in the C50 package. It tends to have smaller trees than C4.5 and much smaller trees than J48 when there are factor predictors. Also, it has an optional feature selection ("winnow") step that can be used.
>
> Max
>
>> On Sep 21, 2012, at 2:18 AM, Achim Zeileis achim.zeil...@uibk.ac.at wrote:
>>
>> Hi,
>>
>> just to add a few points to the discussion:
>>
>> - rpart() is able to deal with responses with more than two classes. Setting method="class" explicitly is not necessary if the response is a factor (as in this case).
>>
>> - If your tree on this data is so huge that it can't even be plotted, I wouldn't be surprised if it overfitted the data set. You should check for this and possibly try to avoid unnecessary splits.
>>
>> - There are various ways to do so for J48 trees without variable reduction. One could require a larger minimal leaf size (default is 2) or one can use "reduced error pruning", see WOW("J48") for more options. They can be easily used as e.g. J48(..., control = Weka_control(R = TRUE, M = 10)) etc.
>>
>> - There are various other ways of fitting decision trees, see for example http://CRAN.R-project.org/view=MachineLearning for an overview. In particular, you might like the "partykit" package which additionally provides the ctree() method and has a unified plotting interface for ctree, rpart, and J48.
>>
>> hth,
>> Z
>>
>>> On Thu, 20 Sep 2012, Vik Rubenfeld wrote:
>>>
>>> Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and got a 61.75% correct classification rate!
>>>
>>> Correctly Classified Instances         988       61.75   %
>>> Incorrectly Classified Instances       612       38.25   %
>>> Kappa statistic                          0.5651
>>> Mean absolute error                      0.0432
>>> Root mean squared error                  0.1469
>>> Relative absolute error                 52.7086 %
>>> Root relative squared error             72.6299 %
>>> Coverage of cases (0.95 level)          99.6875 %
>>> Mean rel. region size (0.95 level)      15.4915 %
>>> Total Number of Instances             1600
>>>
>>> When I plot it I get an enormous chart. Running:
>>>
>>>     respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, data = respLevel)
>>>     respLevelTree
>>>
>>> ...reports:
>>>
>>>     J48 pruned tree
>>>     --
>>>
>>> Is there a way to further prune the tree so that I can present a chart that would fit on a single page or two?
>>>
>>> Thanks very much in advance for any thoughts.
>>>
>>> -Vik
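On the question of how to "tell" a function which method to use: with S3 generics there is normally nothing to tell, because R picks the method from the class of the first argument. The toy generic below is a sketch of that mechanism only. The names fit, fit.default and fit.formula are invented for the illustration and are not part of C50; the built-in mtcars data is a stand-in.

```r
# A toy S3 generic with a default method and a formula method
fit <- function(x, ...) UseMethod("fit")

fit.default <- function(x, y, ...) {
  "default method: called with predictors x and outcome y"
}

fit.formula <- function(formula, data, ...) {
  "formula method: called with a formula and a data frame"
}

# First argument is a formula, so fit.formula is selected automatically
res_formula <- fit(mpg ~ wt, data = mtcars)

# First argument is a data frame; there is no fit.data.frame, so fit.default runs
res_default <- fit(mtcars[, "wt", drop = FALSE], mtcars$mpg)

print(res_formula)
print(res_default)
```

If C5.0 follows this standard pattern, the call in the message above already reaches the formula method, which would suggest looking at the data for the cause of the "invalid in this locale" error (for example a column name or factor level with unexpected encoding) rather than at the call syntax. That is an inference from the error text, not something confirmed in this thread.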
[R] Decision Tree: Am I Missing Anything?
I'm working with some data from which a client would like to make a decision tree predicting brand preference based on inputs such as price, speed, etc. After running the decision tree analysis using rpart, it appears that this data is not capable of predicting brand preference.

Here's the data set:

    BRND       PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
    Brand 1    0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
    Brand 2    0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
    Brand 3    0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
    Brand 4    0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
    Brand 5    0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
    Brand 6    0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
    Brand 7    0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
    Brand 8    0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
    Brand 9    0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
    Brand 10   0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
    Brand 11   0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
    Brand 12   0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
    Brand 13   0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
    Brand 14   0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
    Brand 15   0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
    Brand 16   0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
    Brand 17   0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
    Brand 18   0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
    Brand 19   0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
    Brand 20   0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
    Brand 21   0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
    Brand 22   0.75    0.      0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167

Here are my R commands:

    library(rpart)
    test.df = read.csv("test.csv")
    head(test.df)

         BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
    1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
    2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
    3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
    4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
    5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
    6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000

    testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, method="class", data=test.df)
    printcp(testTree)

    Classification tree:
    rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
        MODE + SPED + REVW, data = test.df, method = "class")

    Variables actually used in tree construction:
    [1] FORM

    Root node error: 21/22 = 0.95455

    n= 22

            CP nsplit rel error xerror xstd
    1 0.047619      0   1.0     1.0476    0
    2 0.01          1   0.95238 1.0476    0

I note that only one variable (FORM) was actually used in tree construction. When I run a plot using:

    plot(testTree)
    text(testTree)

...I get a tree with one branch.

It looks to me like I'm doing everything right, and this data is just not capable of predicting brand preference.

Am I missing anything?

Thanks very much in advance for any thoughts!

-Vik
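One way to read the "Root node error: 21/22" line is that each brand occurs exactly once, so always predicting any single brand is wrong for the other 21 rows. The sketch below reproduces that arithmetic on synthetic data; the predictors x1 and x2 are random numbers invented for the illustration, not the survey values. The rpart call is guarded so the snippet still runs where rpart is not installed.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Synthetic stand-in: 22 rows, 22 distinct classes, two random predictors
toy <- data.frame(
  brand = factor(paste("Brand", 1:22)),
  x1    = runif(22),
  x2    = runif(22)
)

# Error rate of always predicting the most frequent class.
# Every class occurs once, so this is 1 - 1/22.
root_error <- 1 - max(table(toy$brand)) / nrow(toy)
print(root_error)   # equals 21/22

# Fit a tree on the same layout, if rpart is available
if (requireNamespace("rpart", quietly = TRUE)) {
  tree <- rpart::rpart(brand ~ x1 + x2, data = toy, method = "class")
  rpart::printcp(tree)
}
```

With one row per class there is very little for any tree to generalize from, which is consistent with the much better result reported later in the thread once respondent-level data (1600 rows) was used.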
Re: [R] Decision Tree: Am I Missing Anything?
Thanks! Here's the dput output:

    dput(test.df)

    structure(list(BRND = structure(c(1L, 12L, 16L, 17L, 18L, 19L,
    20L, 21L, 22L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L,
    14L, 15L), .Label = c("Brand 1", "Brand 10", "Brand 11", "Brand 12",
    "Brand 13", "Brand 14", "Brand 15", "Brand 16", "Brand 17", "Brand 18",
    "Brand 19", "Brand 2", "Brand 20", "Brand 21", "Brand 22", "Brand 3",
    "Brand 4", "Brand 5", "Brand 6", "Brand 7", "Brand 8", "Brand 9"
    ), class = "factor"),
    PRI = c(0.6989, 0.8621, 0.6, 0.6429, 0.7586, 0.75, 0.7742, 0.6429,
    0.575, 0.8095, 0.6308, 0.7212, 0.7419, 0.7176, 0.7287, 0.7, 0.7193,
    0.7778, 0.8028, 0.7736, 0.8481, 0.75),
    PROM = c(0.4731, 0.3793, 0.1, 0.25, 0.4224, 0.0833, 0.4839, 0.2679,
    0.175, 0.5238, 0.3, 0.3152, 0.2258, 0.2706, 0.3437, 0.4, 0.,
    0.4127, 0.2817, 0.2453, 0.2152, 0.),
    FORM = c(0.7849, 0.8621, 0.6, 0.5714, 0.7328, 0.5833, 0.6129, 0.6964,
    0.65, 0.6667, 0.6077, 0.7152, 0.6129, 0.6353, 0.5995, 0.6, 0.6667,
    0.6508, 0.6197, 0.6226, 0.6329, 0.6667),
    FAMI = c(0.6989, 0.931, 0.7, 0.5, 0.6638, 0.4167, 0.5161, 0.7143,
    0.55, 0.6429, 0.5846, 0.6545, 0.5806, 0.5647, 0.5788, 0.4, 0.6667,
    0.6349, 0.4366, 0.3774, 0.4051, 0.5),
    DRRE = c(0.7419, 0.7586, 0.9, 0.6071, 0.7328, 0.5, 0.8065, 0.875,
    0.625, 0.6667, 0.6769, 0.6606, 0.7097, 0.6941, 0.8527, 1, 0.7018,
    0.7937, 0.7042, 0.5849, 0.6329, 0.6667),
    FREC = c(0.6022, 0.6897, 0.7, 0.5, 0.6379, 0.4167, 0.6452, 0.5536,
    0.375, 0.5952, 0.5231, 0.503, 0.6129, 0.4471, 0.5478, 0.4, 0.5263,
    0.6032, 0.4366, 0.3019, 0.4557, 0.5833),
    MODE = c(0.8817, 0.8966, 0.7, 0.75, 0.8621, 0.75, 0.7742, 0.8036,
    0.825, 0.8571, 0.7462, 0.8061, 0.871, 0.7176, 0.8217, 0.9, 0.7719,
    0.8571, 0.7183, 0.717, 0.6962, 0.9167),
    SPED = c(0.9032, 0.9655, 0.8, 0.8214, 0.8621, 0.6667, 0.9032, 0.9464,
    0.85, 0.8095, 0.8846, 0.8909, 0.9677, 0.9412, 0.8941, 0.9, 0.8596,
    0.9206, 0.9155, 0.8679, 0.8481, 0.9167),
    REVW = c(0.6452, 0.8276, 0.6, 0.5, 0.6897, 0.5, 0.6129, 0.6607,
    0.475, 0.5714, 0.6, 0.6, 0.3226, 0.5176, 0.6227, 0.5, 0.7018,
    0.619, 0.5634, 0.4717, 0.3418, 0.4167)),
    .Names = c("BRND", "PRI", "PROM", "FORM", "FAMI", "DRRE", "FREC",
    "MODE", "SPED", "REVW"), class = "data.frame", row.names = c(NA, -22L))

I've downloaded RWeka and am looking at the documentation.
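For readers wondering why dput() output is requested on the list: the text it prints is itself R code, so anyone can paste it back and get the same object. A minimal sketch of that round trip, using a two-row data frame built from the first two PRI values above (the object names small and restored are invented here):

```r
# A small data frame using the first two rows' BRND and PRI values
small <- data.frame(
  BRND = factor(c("Brand 1", "Brand 2")),
  PRI  = c(0.6989, 0.8621)
)

# dput() prints R code that recreates the object; capture it as text
txt <- capture.output(dput(small))

# Evaluating that text rebuilds the data frame, as a reader pasting it would
restored <- eval(parse(text = txt))

print(all.equal(small, restored))
```

In practice the reader simply assigns the pasted text to a name, e.g. test.df <- structure(...).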
Re: [R] Decision Tree: Am I Missing Anything?
Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and got a 61.75% correct classification rate!

    Correctly Classified Instances         988       61.75   %
    Incorrectly Classified Instances       612       38.25   %
    Kappa statistic                          0.5651
    Mean absolute error                      0.0432
    Root mean squared error                  0.1469
    Relative absolute error                 52.7086 %
    Root relative squared error             72.6299 %
    Coverage of cases (0.95 level)          99.6875 %
    Mean rel. region size (0.95 level)      15.4915 %
    Total Number of Instances             1600

When I plot it I get an enormous chart. Running:

    respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, data = respLevel)
    respLevelTree

...reports:

    J48 pruned tree
    --

Is there a way to further prune the tree so that I can present a chart that would fit on a single page or two?

Thanks very much in advance for any thoughts.

-Vik

> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
>
> Not very sure what the problem is as I was not able to take your data for a run. You might want to use the dput() command to present the data.
>
> Now on the programming side. As we can see, we have more than 2 levels for the brands and hence method = "class" is not able to understand what you actually want from it.
>
> Suggestion: For predictions having more than 2 levels I will go for Weka and specifically the C4.5 algorithm. You also have the RWeka package for it.
>
> Best Regards,
>
> Bhupendrasinh Thakre
> Sent from my iPhone
>
>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld v...@mindspring.com wrote:
>>
>> I'm working with some data from which a client would like to make a decision tree predicting brand preference based on inputs such as price, speed, etc. After running the decision tree analysis using rpart, it appears that this data is not capable of predicting brand preference.
>>
>> Here's the data set:
>>
>> [...]
Re: [R] Decision Tree: Am I Missing Anything?
Very good. Could you point me in a couple of potential directions for variable reduction? E.g. correlation analysis?

> On Sep 20, 2012, at 10:36 PM, Bhupendrasinh Thakre wrote:
>
> One possible way to think of it is using variable reduction before going for J48. You may want to use several methods available for that. Again, prediction for brands is more of a business question to me.
>
> Two solutions which I can think of:
>
> 1. Variable reduction before decision tree.
> 2. Let the intuition decide how many of them are really important.
>
> Please let us know your findings. All the best.
>
> Best Regards,
>
> Bhupendrasinh Thakre
> Sent from my iPhone
>
>> On Sep 21, 2012, at 12:16 AM, Vik Rubenfeld v...@mindspring.com wrote:
>>
>> Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and got a 61.75% correct classification rate!
>>
>> Correctly Classified Instances         988       61.75   %
>> Incorrectly Classified Instances       612       38.25   %
>> Kappa statistic                          0.5651
>> Mean absolute error                      0.0432
>> Root mean squared error                  0.1469
>> Relative absolute error                 52.7086 %
>> Root relative squared error             72.6299 %
>> Coverage of cases (0.95 level)          99.6875 %
>> Mean rel. region size (0.95 level)      15.4915 %
>> Total Number of Instances             1600
>>
>> When I plot it I get an enormous chart. Running:
>>
>>     respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, data = respLevel)
>>     respLevelTree
>>
>> ...reports:
>>
>>     J48 pruned tree
>>     --
>>
>> Is there a way to further prune the tree so that I can present a chart that would fit on a single page or two?
>>
>> Thanks very much in advance for any thoughts.
>>
>> -Vik
>>
>>> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
>>>
>>> Not very sure what the problem is as I was not able to take your data for a run. You might want to use the dput() command to present the data.
>>>
>>> Now on the programming side. As we can see, we have more than 2 levels for the brands and hence method = "class" is not able to understand what you actually want from it.
>>>
>>> Suggestion: For predictions having more than 2 levels I will go for Weka and specifically the C4.5 algorithm. You also have the RWeka package for it.
>>>
>>> Best Regards,
>>>
>>> Bhupendrasinh Thakre
>>> Sent from my iPhone
>>>
>>>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld v...@mindspring.com wrote:
>>>>
>>>> I'm working with some data from which a client would like to make a decision tree predicting brand preference based on inputs such as price, speed, etc. After running the decision tree analysis using rpart, it appears that this data is not capable of predicting brand preference.
>>>>
>>>> Here's the data set:
>>>>
>>>> [...]
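As one possible starting point for the correlation analysis asked about above, the sketch below computes pairwise correlations for three of the predictors (FORM, FAMI and REVW, values copied from the dput output posted earlier in the thread) and lists pairs above a cutoff. The 0.8 cutoff and the object names are assumptions made for the illustration; nobody in the thread recommends a particular threshold, and this is only one of several reasonable screening approaches.

```r
# Three predictors from the brand-level data posted earlier in the thread
preds <- data.frame(
  FORM = c(0.7849, 0.8621, 0.6, 0.5714, 0.7328, 0.5833, 0.6129, 0.6964,
           0.65, 0.6667, 0.6077, 0.7152, 0.6129, 0.6353, 0.5995, 0.6,
           0.6667, 0.6508, 0.6197, 0.6226, 0.6329, 0.6667),
  FAMI = c(0.6989, 0.931, 0.7, 0.5, 0.6638, 0.4167, 0.5161, 0.7143,
           0.55, 0.6429, 0.5846, 0.6545, 0.5806, 0.5647, 0.5788, 0.4,
           0.6667, 0.6349, 0.4366, 0.3774, 0.4051, 0.5),
  REVW = c(0.6452, 0.8276, 0.6, 0.5, 0.6897, 0.5, 0.6129, 0.6607,
           0.475, 0.5714, 0.6, 0.6, 0.3226, 0.5176, 0.6227, 0.5,
           0.7018, 0.619, 0.5634, 0.4717, 0.3418, 0.4167)
)

# Pairwise Pearson correlations between the predictors
cmat <- cor(preds)
print(round(cmat, 2))

# Flag pairs whose absolute correlation exceeds the cutoff (an arbitrary choice)
cutoff <- 0.8
high   <- which(abs(cmat) > cutoff & upper.tri(cmat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cmat)[high[, "row"]],
  var2 = colnames(cmat)[high[, "col"]],
  r    = cmat[high]
)
print(flagged)
```

A flagged pair suggests the two predictors carry overlapping information, so one of them might be dropped before fitting the tree; the same idea extends to all nine predictors.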
[R] Does A Choice-Based Conjoint Study Have To Be Full Profile?
In a Conjoint study, it's difficult for respondents to evaluate more than 6 product attributes at a time. Some studies require more attributes. Often this is solved via the use of Adaptive Conjoint Analysis (ACA), in which the questionnaire is modified for each individual respondent as the survey is being taken.

In ACA, it is not necessary to show the full profile -- i.e. all attributes -- of each product. Partial profiles are shown. A study can include up to 30 attributes, but respondents are never asked to consider more than 5 at a time.

However, as far as I know, in order to do ACA one must have the survey hosted by a very expensive (e.g. $10,000) conjoint-oriented survey host.

My question is: is it possible to do partial-profile Conjoint in R, without using ACA?

Thanks in advance to all for any info.
[R] Correct Place to Seek an R-Project Consultant?
I would like to find out how to apply commands found in the bayesm package to analyze data gathered via a choice-based conjoint study. Is there a web resource where I can seek an R-Project consultant experienced in this, whom I could hire to walk me through the appropriate bayesm commands to use for this purpose?

Thanks in advance to all for any info.
[R] Can't Run Conjoint Package - Could not find function caFactorialDesign?
I'm trying to run the Conjoint package, and I receive the error:

    Error: could not find function "caFactorialDesign"

I'm running R version 2.15.1 on Mac OS X. I have installed the Conjoint package with the Install Dependencies checkbox checked. I have clicked the Update All button in the R Package Installer.

How can I correct this error?

Thanks in advance to all for any info.

(Please respond to this email address as well as to the list -- Thanks!)
Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?
Thanks very much for this info, Sarah! I have used library(conjoint). Here are the commands used:

    library(conjoint)
    experiment = expand.grid(
      price = c("low", "medium", "high"),
      variety = c("black", "green", "red"),
      kind = c("bags", "granulated", "leafy"),
      aroma = c("yes", "no"))
    design <- caFactorialDesign(data=experiment, type="orthogonal")

    Error: could not find function "caFactorialDesign"

What could I be missing?

Best,

-Vik

> On Aug 3, 2012, at 10:28 AM, Sarah Goslee wrote:
>
> Hi Vik,
>
> You don't need to post to nabble and to the R-help list. Just skip the nabble step!
>
> Have you loaded the package with:
>
>     library(conjoint) # not Conjoint
>
> before you try to use any of its functions?
>
> Sarah
>
>> On Fri, Aug 3, 2012 at 1:23 PM, Vik Rubenfeld v...@mindspring.com wrote:
>>
>> I'm trying to run the Conjoint package, and I receive the error:
>>
>>     Error: could not find function "caFactorialDesign"
>>
>> I'm running R version 2.15.1 on Mac OS X. I have installed the Conjoint package with the Install Dependencies checkbox checked. I have clicked the Update All button in the R Package Installer.
>>
>> How can I correct this error?
>>
>> Thanks in advance to all for any info.
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?
Here is the output of sessionInfo():

    sessionInfo()

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] conjoint_1.33     clusterSim_0.41-5 mlbench_2.1-1     MASS_7.3-20       rgl_0.92.892      e1071_1.6         class_7.3-4
     [8] R2HTML_2.2        cluster_1.14.2    ade4_1.4-17       AlgDesign_1.1-7

    loaded via a namespace (and not attached):
    [1] tools_2.15.1

Per your recommendation, I have read the Posting Guide, and have sent an email to the maintainers of this package as well.

Best,

-Vik

> On Aug 3, 2012, at 11:15 AM, Sarah Goslee wrote:
>
> Hi,
>
>> On Fri, Aug 3, 2012 at 1:57 PM, Vik Rubenfeld v...@mindspring.com wrote:
>>
>> Thanks very much for this info, Sarah! I have used library(conjoint). Here are the commands used:
>>
>>     library(conjoint)
>>     experiment = expand.grid(
>>       price = c("low", "medium", "high"),
>>       variety = c("black", "green", "red"),
>>       kind = c("bags", "granulated", "leafy"),
>>       aroma = c("yes", "no"))
>>     design <- caFactorialDesign(data=experiment, type="orthogonal")
>>
>>     Error: could not find function "caFactorialDesign"
>>
>> What could I be missing?
>
> That's hard to say. We need at least the output of sessionInfo() to begin to decide. You've got the reproducible example, but a look at the posting guide might still offer you some advice.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Re: [R] Can't Run Conjoint Package - Could not find function caFactorialDesign?
Got it. Thanks so much for your help, Michael and Sarah!

Best,

-Vik

On Aug 3, 2012, at 11:50 AM, R. Michael Weylandt wrote:

> On Fri, Aug 3, 2012 at 1:34 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
>> On Fri, Aug 3, 2012 at 2:23 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
>>> With conjoint_1.33 and rather up to date dependencies, I don't see caFactorialDesign and neither does getAnywhere().
>>
>> The function is present in conjoint 1.34, the current version on CRAN. Rarely do I have to remind respondents to update their installation. :)
>>
>> Sarah
>
> It looks like 1.34 was uploaded this morning and hasn't made it to my mirror yet: sorry for the noise. Vik, it looks like you're in the same boat. Try switching to the Austrian CRAN master and update -- then it'll be there.
>
> Best,
> Michael
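The diagnosis in this thread (the function is missing from conjoint 1.33 but present in 1.34) can be checked from within R. What follows is a minimal sketch, not code posted by anyone in the thread: the helper name exports_function is hypothetical, and the repository URL assumes that the "Austrian CRAN master" Michael mentions is https://cran.r-project.org.

```r
# Hypothetical helper (not from the thread): TRUE if package `pkg` is
# installed and exports an object named `fn`, FALSE otherwise.
exports_function <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # package is not installed or cannot be loaded
  }
  fn %in% getNamespaceExports(pkg)
}

# Report the installed version (if any) and whether the function is exported.
if (requireNamespace("conjoint", quietly = TRUE)) {
  message("conjoint version: ", as.character(packageVersion("conjoint")))
}
message("caFactorialDesign available: ",
        exports_function("conjoint", "caFactorialDesign"))

# Updating from a specific repository (assumed URL). Left commented out
# because it modifies the installed package library:
# update.packages(repos = "https://cran.r-project.org", ask = FALSE)
```

If the version printed is older than the one listed on CRAN, the local mirror may simply not have synced yet, which is what happened here.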
[R] Newbie Correspondence Analysis Question
I'm experienced in statistics, but I am a first-time R user. I would like to use R for correspondence analysis.

I have installed R (Mac OSX). I have used the package installer to install the CA package. I have run the following line with no errors to read in the data for a table:

    NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")

The R online help appears to suggest that the following line should come next:

    corresp(NonLuxury)

However, I get the error message:

    Error: could not find function "corresp"

The CA manual appears to suggest that the following line should come next:

    ca(NonLuxury)

Again, I get the error message:

    Error: could not find function "ca"

What am I missing?

Thanks very much in advance to all for any info.

-Vik
Re: [R] Newbie Correspondence Analysis Question
Thanks very much.

-Vik

On Sep 26, 2010, at 9:45 AM, Chris Mcowen wrote:

> Have you loaded the library after installing it?
>
> Either use
>
>     library(CA)
>
> Or
>
> Through the package manager tab
>
> Hth
>
> Sent from my iPhone
>
> On 26 Sep 2010, at 17:41, Vik Rubenfeld v...@mindspring.com wrote:
>> I'm experienced in statistics, but I am a first-time R user. I would like to use R for correspondence analysis.
>>
>> I have installed R (Mac OSX). I have used the package installer to install the CA package. I have run the following line with no errors to read in the data for a table:
>>
>>     NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
>>
>> The R online help appears to suggest that the following line should come next:
>>
>>     corresp(NonLuxury)
>>
>> However, I get the error message:
>>
>>     Error: could not find function "corresp"
>>
>> The CA manual appears to suggest that the following line should come next:
>>
>>     ca(NonLuxury)
>>
>> Again, I get the error message:
>>
>>     Error: could not find function "ca"
>>
>> What am I missing?
>>
>> Thanks very much in advance to all for any info.
>>
>> -Vik
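Installing a package and attaching it to the session are separate steps, which is the point of Chris's reply. A small sketch of how one might check this, using base R only: the helper is_attached is hypothetical, and the code assumes the CRAN package is named ca in lower case (package names are case-sensitive).

```r
# Hypothetical helper: TRUE if package `pkg` is on the search path,
# i.e. it has been attached with library() in this session.
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

print(is_attached("stats"))  # base packages such as stats are attached at startup

# Assumption: the package is named "ca" (lower case).
if (requireNamespace("ca", quietly = TRUE)) {
  library(ca)                             # attach it so that ca() can be found
  print(is_attached("ca"))
  print(exists("ca", mode = "function"))
} else {
  message("ca is not installed; install.packages(\"ca\") would install it")
}
```

Until library() has been called, an installed package's functions are not visible by their bare names, which produces exactly the "could not find function" error quoted above.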
[R] Storing CA Results to a Data Frame?
I am successfully performing a correspondence analysis using the commands:

    NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
    ca(NonLuxury)

I would like to store the results to a data frame so that I can write them to disk using write.table. I have tried several things such as:

    df <- data.frame(ca(NonLuxury))
    df <- data.frame(data(ca(NonLuxury)))

etc.

...but clearly this is incorrect as it generates an error message.

Is it possible to store the results of a CA to a dataframe, and if so, what is the correct way to do this?

Thanks in advance to all for any info.

-Vik
[R] Storing CA Results to a Data Frame?
[Sorry- somehow the first time I posted this it got attached to another thread -Vik]

I am successfully performing a correspondence analysis using the commands:

    NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
    ca(NonLuxury)

I would like to store the results to a data frame so that I can write them to disk using write.table. I have tried several things such as:

    df <- data.frame(ca(NonLuxury))
    df <- data.frame(data(ca(NonLuxury)))

etc.

...but clearly this is incorrect as it generates an error message.

Is it possible to store the results of a CA to a dataframe, and if so, what is the correct way to do this?

Thanks in advance to all for any info.

-Vik
Re: [R] Storing CA Results to a Data Frame?
Thanks very much for this great info, Ista.

Best,

-Vik

On Sep 26, 2010, at 12:10 PM, Ista Zahn wrote:

> Hi Vik,
>
> I suggest reading through some of the introductory documentation. R has several classes of objects, including matrix, list, data.frame etc. and a basic understanding of what these are is essential for effectively using R. An essential function is str() which shows you the structure of an object. Other essential functions include names(), help(), help.search(), and methods()
>
> An example session that is similar to your case:
>
>     library(ca)  # load the ca package
>     data(author)  # load the authors dataset
>     str(author)  # examine the authors data
>     auth.ca <- ca(author)  # run the ca function on the authors data
>     str(auth.ca)  # examine the structure of the auth.ca results. Note that it is a list with class of "ca"
>     methods(class=class(auth.ca))  # see what methods are defined for this type of object
>     ?plot.ca  ## look up the documentation for the plot method for objects of class ca
>     plot(auth.ca)  ## call the plot method
>     auth.ca.sum <- summary(auth.ca)  ## call the summary method
>     str(auth.ca.sum)  # examine the structure of the auth.ca.sum object
>     methods(class=class(auth.ca.sum))  ## find out what methods are defined for it
>     ## Hmmn ok, so suppose I want to extract the rows and columns data.frames from auth.ca.sum but don't know how
>     help.search("extract")  ## first result is base::Extract
>     ?Extract  ## look up documentation for extract
>     auth.ca.rows <- auth.ca.sum[["rows"]]  ## extract the rows data.frame
>     auth.ca.cols <- auth.ca.sum[["columns"]]  ## extract the columns data.frame
>     write.csv(auth.ca.rows, file="auth.ca.rows.csv")  ## write results to a .csv file
>     write.csv(auth.ca.cols, file="auth.ca.cols.csv")  ##
>
> HTH,
> Ista
>
> On Sun, Sep 26, 2010 at 6:10 PM, Vik Rubenfeld v...@mindspring.com wrote:
>> I am successfully performing a correspondence analysis using the commands:
>>
>>     NonLuxury <- read.table("/Users/myUserName/Desktop/nonLuxury.data.txt")
>>     ca(NonLuxury)
>>
>> I would like to store the results to a data frame so that I can write them to disk using write.table. I have tried several things such as:
>>
>>     df <- data.frame(ca(NonLuxury))
>>     df <- data.frame(data(ca(NonLuxury)))
>>
>> etc.
>>
>> ...but clearly this is incorrect as it generates an error message.
>>
>> Is it possible to store the results of a CA to a dataframe, and if so, what is the correct way to do this?
>>
>> Thanks in advance to all for any info.
>>
>> -Vik
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
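The pattern Ista describes (the result is a list, so pull out one component and write that component to disk) can be wrapped in a small function. This is only a sketch under stated assumptions: the helper save_component and the stand-in list demo are hypothetical and not part of the ca package, and the final block relies on the "rows" component of summary(ca(...)) being a data.frame, as described in Ista's message.

```r
# Hypothetical helper: extract one named component from a list-like
# result, coerce it to a data.frame, and write it to a CSV file.
save_component <- function(result, name, file) {
  part <- as.data.frame(result[[name]])
  write.csv(part, file = file, row.names = TRUE)
  invisible(part)
}

# Stand-in for a fitted object (made-up values, for illustration only):
# a plain list holding a small matrix of coordinates.
demo <- list(coords = matrix(1:4, nrow = 2,
                             dimnames = list(c("a", "b"), c("Dim1", "Dim2"))))
out <- tempfile(fileext = ".csv")
save_component(demo, "coords", out)

# With the ca package installed, the same helper applies to the summary
# object from the example session above.
if (requireNamespace("ca", quietly = TRUE)) {
  data("author", package = "ca")
  auth.sum <- summary(ca::ca(author))
  save_component(auth.sum, "rows", tempfile(fileext = ".csv"))
}
```

Calling data.frame() directly on the fitted object fails because it is a list of components with different shapes; picking a single component first avoids that.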