[R] Intercept and slope of GLS model
Dear list - A simple question, i hope! With this truncated output from a GLS model the intercept is 0.004634487 but where is the slope? Coefficients: Value Std.Error t-valuep-value (Intercept) 0.004634487 0.0006773396 6.842192 0. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extract subset of rows
Hi, I have two matrices with a common field = species what i want to do is make a matrix that combines the data held in the other two based on the species name. I.e ( simple example) Matrix 1 - monocot SPECIES V1 A 2 B 3 C 4 D 5 Matrix 2 - PCAresults SPECIES V2 A 0.2 B 0.3 C 4.1 D 3.2 matrix 3 - What i want SPECIES V1 V2 A 2 0.2 B 3 0.3 C 4 4.1 D 5 3.2 my attempt has been this test -monocot[which(monocot$SPECIES%in%PCAresults$SPECIES),] But this returns matrix with only those found in each but doesn't import the data from PCAresults Any help would be greatly appreciated John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extract subset of rows
Hi all, Sorry i forgot to mention there a species present in one matrix not in the other hence the problem i.e matrix 1 may have species E which isnt present in matrix 2 and matrix 2 may have species F not present in matrix 1. Sorry for the lack of clarification in the original post! John On 17 Dec 2010, at 15:19, John Haart wrote: Hi, I have two matrices with a common field = species what i want to do is make a matrix that combines the data held in the other two based on the species name. I.e ( simple example) Matrix 1 - monocot SPECIES V1 A 2 B 3 C 4 D 5 Matrix 2 - PCAresults SPECIES V2 A 0.2 B 0.3 C 4.1 D 3.2 matrix 3 - What i want SPECIES V1 V2 A 2 0.2 B 3 0.3 C 4 4.1 D 5 3.2 my attempt has been this test -monocot[which(monocot$SPECIES%in%PCAresults$SPECIES),] But this returns matrix with only those found in each but doesn't import the data from PCAresults Any help would be greatly appreciated John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help getting data in correct format
Dear All, I am having trouble getting my data into R as i need it! I am used to using read.delim() to open .txt files to do work on. The function i am using requires a matrix like the one below. My data is from excel and then saved as a txt file. I have tried the usual read.delim() approach but the function doesn't like it, i have also tried as.matrix(data =) but that doesn't work. What i require is the row names to be the first column of the txt file , as below This was made using this nTraits = 5 nSpecies = 20 a=matrix(runif(nTraits*nSpecies),nSpecies,nTraits) rownames(a) = letters[1:nSpecies] However i want my data in their not simulated [,1] [,2] [,3][,4] [,5] a 0.72869780 0.36399621 0.82468338 0.850754150 0.76943833 b 0.12871015 0.85622370 0.82368054 0.108350500 0.79933805 c 0.66413022 0.24059396 0.74626595 0.748063466 0.14243494 d 0.54113172 0.97769199 0.10538211 0.184683363 0.29701420 e 0.44891743 0.38594751 0.01475637 0.492245059 0.87616253 f 0.09938324 0.62971605 0.79302032 0.156930575 0.09860269 g 0.30953025 0.35611074 0.70155464 0.149412029 0.26068831 h 0.15825945 0.43603713 0.23960806 0.617240274 0.97756739 i 0.56588912 0.34942257 0.51724529 0.360303540 0.78650285 j 0.96083252 0.44307860 0.17529919 0.007925761 0.51184681 k 0.09452434 0.49068549 0.80419267 0.401200731 0.70748547 l 0.26930842 0.65879761 0.76251526 0.315836610 0.72541148 m 0.52158352 0.77047295 0.81276164 0.641264799 0.83530547 n 0.21039165 0.63698953 0.94877972 0.329354647 0.08512338 o 0.80294849 0.85471292 0.75379324 0.310917453 0.71528901 p 0.01494773 0.03056341 0.91496579 0.817269310 0.29831036 q 0.56848121 0.50864998 0.59944638 0.046270295 0.24808518 r 0.30511696 0.79790561 0.83210599 0.334429013 0.50195706 s 0.96551698 0.95807309 0.26016434 0.629840741 0.47213370 t 0.86062917 0.09867528 0.24388087 0.167937283 0.70620760 Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Randomly shuffle an array multiple times
Dear List, I have a table i have read into R: NameYes/No John0 Frank 1 Ann 0 James 1 Alex1 etc - 800 different times. What i want to do is shuffle yes/no and randomly re-assign them to the name. I have used sample() and permute(), however there is no way to do this 1000 times. Furthermore, i want to copy the data into a excel spreadsheet in the same order as the data was input so i can build up a distribution of the statistic for each name. When i use shuffle the date gets returned like this - [1] 1 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 [34] 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 0 0 0 [67] 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 0 0 1 1 1 1 [100] 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 [133] 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 [166] 0 0 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 1 0 0 1 1 0 1 [199] 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 [232] 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 1 [265] 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 [298] 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 [331] 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1 etc Rather than like this, is there a way to change the output? John0 Frank 1 Ann 0 James 1 Alex1 Can anyone suggest a script that would achieve this? Thanks Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random assignment
Hi Denis and list Thanks for this , and sorry for not providing enough information First let me put the study into a bit more context : - I know the number of species at risk in each family, what i am asking is Is risk random according to family or do certain families have a disproportionate number of at risk species? My idea was to randomly allocate risk to the families based on the criteria below (binomial(nspecies, 0.0748)) and then compare this to the true data and see if there was a significant difference. So in answer to your questions, (assuming my method is correct !) Is this over all families, or within a particular family? If the former, why does a distinction of family matter? Within a particular family - this is because i am looking to see if risk in the observed data set is random in respect to family so this will provide the baseline to compare against. I guess you've stated the p, but what's the n? The number of species in each family? This varies largely, for instance i have some families that are monotypic (with 1 species) and then i have other families with 100+ species Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? I am assuming i want some sort of weighting. This is because i am wanting to calculate the number of species expected to be at risk in EACH family under the random binomial distribution ( assuming every species has a 7.48% chance of being at risk. Thanks John On 15 Oct 2010, at 11:19, Dennis Murphy wrote: Hi: I don't believe you've provided quite enough information just yet... On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote: Dear List, I am doing some simulation in R and need basic help! I have a list of animal families for which i know the number of species in each family. I am working under the assumption that a species has a 7.48% chance of being at risk. Is this over all families, or within a particular family? If the former, why does a distinction of family matter? I want to simulate the number of species expected to be at risk under a random binomial distribution with 10,000 randomizations. I guess you've stated the p, but what's the n? The number of species in each family? If you're simulating on a family by family basis, then it would seem that a binomial(nspecies, 0.0748) distribution would be the reference. Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? The latter is doable, but it would require a two-stage simulation: one to randomly select a family and then to randomly select a species. Dennis I am relatively knew to this field and would greatly appreciate a idiot-proof response, I.e how should the data be entered into R? I was thinking of using read.table, header = T, where the table has F = Family Name, and SP = Number of species in that family? John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random assignment
Hi Michael, Thanks for this - the reason i am following this approach is that it appeared in a paper i was reading, and i thought it was a interesting angle to take The paper is Vamosi Wilson, 2008. Nonrandom extinction leads to elevated loss of angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047–1053. and the specific method i am following states :- We calculated the number of species expected to be at risk in each family under a random binomial distribution in 10 000 randomizations [generated using R version 2.6.0 (R Development Team 2007)] assuming every species has a 7.48% chance of being at risk. I guess the reason i am doing the simulation is because i am not hugely statistically minded and the paper was asking the same question i am interested in answering :). So following your approach - if family F has Fn species, your random expectation is that p * Fn of them will be at risk (p = 0.0748). The variance on that expectation will be p * (1-p) * Fn. Family f = Bromeliaceae , with Fn = 80, p=0.0748 random expectation = p*Fn = (0.0748*80) = 5.984 variance = p * (1-p) * Fn = (0.0748*0.9252) *80 = 5.5363968 So the random expectation is that the Bromeliaceae will have 6 species at risk, if risk is assigned randomly? So if i do this for all the families it will be the same as doing the simulation experiment outline in the method above? Thanks John On 15 Oct 2010, at 12:49, Michael Bedward wrote: Hi John, The word species attracted my attention :) Like Dennis, I'm not sure I understand your idea properly. In particular, I don't see what you need the simulation for. If family F has Fn species, your random expectation is that p * Fn of them will be at risk (p = 0.0748). The variance on that expectation will be p * (1-p) * Fn. If you do your simulation that's the result you'll get. Perhaps to initial identify families with disproportionate observed extinction rates all you need is the dbinom function ? Michael On 15 October 2010 22:29, John Haart anothe...@me.com wrote: Hi Denis and list Thanks for this , and sorry for not providing enough information First let me put the study into a bit more context : - I know the number of species at risk in each family, what i am asking is Is risk random according to family or do certain families have a disproportionate number of at risk species? My idea was to randomly allocate risk to the families based on the criteria below (binomial(nspecies, 0.0748)) and then compare this to the true data and see if there was a significant difference. So in answer to your questions, (assuming my method is correct !) Is this over all families, or within a particular family? If the former, why does a distinction of family matter? Within a particular family - this is because i am looking to see if risk in the observed data set is random in respect to family so this will provide the baseline to compare against. I guess you've stated the p, but what's the n? The number of species in each family? This varies largely, for instance i have some families that are monotypic (with 1 species) and then i have other families with 100+ species Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? I am assuming i want some sort of weighting. This is because i am wanting to calculate the number of species expected to be at risk in EACH family under the random binomial distribution ( assuming every species has a 7.48% chance of being at risk. Thanks John On 15 Oct 2010, at 11:19, Dennis Murphy wrote: Hi: I don't believe you've provided quite enough information just yet... On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote: Dear List, I am doing some simulation in R and need basic help! I have a list of animal families for which i know the number of species in each family. I am working under the assumption that a species has a 7.48% chance of being at risk. Is this over all families, or within a particular family? If the former, why does a distinction of family matter? I want to simulate the number of species expected to be at risk under a random binomial distribution with 10,000 randomizations. I guess you've stated the p, but what's the n? The number of species in each family? If you're simulating on a family by family basis, then it would seem that a binomial(nspecies, 0.0748) distribution would be the reference. Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? The latter is doable, but it would require a two-stage simulation: one to randomly select a family and then to randomly select a species. Dennis I am relatively knew to this field and would
[R] Random assignment
Dear List, I am doing some simulation in R and need basic help! I have a list of animal families for which i know the number of species in each family. I am working under the assumption that a species has a 7.48% chance of being at risk. I want to simulate the number of species expected to be at risk under a random binomial distribution with 10,000 randomizations. I am relatively knew to this field and would greatly appreciate a idiot-proof response, I.e how should the data be entered into R? I was thinking of using read.table, header = T, where the table has F = Family Name, and SP = Number of species in that family? John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Dear List and Frank, I have calculated the log-odds for my models but maybe i am not getting something but i am not understanding how for a categorical factor this helps? On all the examples i have see it relates to continuous factors where moving from one number to another shows either a increase or decrease, not as in my case a change of catagory. Furthermore, this gives the values for each factor independent of each other, how do i get the log-odds for the entire model? I appreciate i maybe trying to put things in boxes again, i am not i am happy to report the log odds of moving from one response level to the next but would like it for all the factors together not independently. John Low HighDiff. Effect S.E.Lower Upper WO Woody:Non_woody 1 2 NA 0.280.16-0.04 0.6 Odds Ratio 1 2 NA 1.32NA 0.961.82 PD Abiotic:Biotic 2 1 NA -1.21 0.13-1.47 -0.96 Odds Ratio 2 1 NA 0.3 NA 0.230.38 ALT All:Low 3 1 NA 0.470.190.110.84 Odds Ratio 3 1 NA 1.6 NA 1.112.31 ALT High:Low3 2 NA -0.07 0.14-0.35 0.21 Odds Ratio 3 2 NA 0.93NA 0.7 1.24 ALT Mid:Low 3 4 NA 0.390.150.1 0.67 Odds Ratio 3 4 NA 1.48NA 1.111.96 REG Two_plus:One1 2 NA -0.59 0.13-0.84 -0.34 Odds Ratio 1 2 NA 0.55NA 0.430.72 BIO Arctic:Subtropical/Tropical 4 1 NA -1.02 0.81-2.61 0.58 Odds Ratio 4 1 NA 0.36NA 0.071.78 BIO Boreal:Subtropical/Tropical 4 2 NA -1.21 0.81-2.79 0.37 Odds Ratio 4 2 NA 0.3 NA 0.061.44 BIO Mediterranean:Subtropical/Tropical 4 3 NA -1.89 0.48-2.83 -0.95 Odds Ratio 4 3 NA 0.15NA 0.060.39 BIO Temperate:Subtropical/Tropical 4 5 NA -0.09 0.16-0.41 0.23 Odds Ratio 4 5 NA 0.91NA 0.661.26 On 3 Oct 2010, at 15:29, Frank Harrell wrote: You still seem to be hung up on making arbitrary classifications. Instead, look at tendencies using odds ratios or rank correlation measures. My book Regression Modeling Strategies covers this. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2953220.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Thanks Frank and Greg, This makes alot more sense to me now. I appreciate you are both very busy, but i was wondering if i could trouble you for one last piece of advice. As my data is a little complicated for a first effort at R let alone modelling! The response is on a range from 1-6, which indicates extinction risk - 1 being least concern and 6 being critical - hence using a ordinal model The factors (6) are categorical - FRUIT TYPE - fleshy/dry HABITAT - terrestrial, aquatic, epiphyte etc etc I am asking the question - How do different combinations of factors effect extinction risk. Based on what you have both said i have called predict(model1, type=fitted) Would this be the best way predicting the probability of falling into each response category - y=2y=3 y=4 y=5 y=6 10.502220616 0.410236021 0.2892270912 0.2191420568 0.1774250519 20.745221699 0.668501579 0.5412223837 0.4486151612 0.3847379442 30.720381333 0.639796647 0.5095814746 0.4174618165 0.3551631876 40.752321112 0.676811675 0.5505781183 0.4579680710 0.3937100283 50.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585 60.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585 70.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585 80.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585 90.526291649 0.433739868 0.3094355120 0.2360800803 0.1919312111 I have 100 species for which i have their factors and i want to predict their response, so if i do the above and use the newdata function, and present the probabilities as above rather than trying to classify them? I tried polr and that classified each response as either 1 or 6 i.e no 2,3,4,5 - as did calling predict(model1, type=fitted.ind) which resulted in the probabilities of being 1 or 6 far outweighing 2,3,4,5 (Below) - this may just be that my model is not powefull enough to discrimate effectively as i know that is incorrect ( Brier score 2.01, AUC 66.9)? EXTINCTION=1 EXTINCTION=2 EXTINCTION=3 EXTINCTION=4 EXTINCTION=5 EXTINCTION=6 1 0.4977794 0.0919845942 0.121008930 0.070085034 0.0417170048 0.1774250519 2 0.2547783 0.0767201200 0.127279196 0.092607223 0.0638772170 0.3847379442 3 0.2796187 0.0805846862 0.130215173 0.092119658 0.0622986289 0.3551631876 4 0.2476789 0.0755094367 0.126233557 0.092610047 0.0642580427 0.3937100283 5 0.1756117 0.0604319173 0.109577572 0.088069011 0.0654116601 0.5008981585 6 0.1756117 0.0604319173 0.109577572 0.088069011 0.0654116601 0.5008981585 7 0.1756117 0.0604319173 0.109577572 0.088069011 0.0654116601 0.5008981585 8 0.1756117 0.0604319173 0.109577572 0.088069011 0.0654116601 0.5008981585 9 0.4737084 0.0925517814 0.124304356 0.073355432 0.0441488692 0.1919312111 10 0.2489307 0.0757263892 0.126424896 0.092614323 0.0641934484 0.3921102030 Thanks very much for any advice given, John 10 0.751069260 0.675342871 0.5489179746 0.4563036514 0.3921102030 On 1 Oct 2010, at 23:13, Frank Harrell wrote: Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). Those cannot be translated into action without subject-specific utility functions. Classification during the analysis or publication stage is not necessary. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2951976.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Dear list, I am relatively new to ordinal models and have been working through the example given by Frank Harrell in the predict.lrm {Design} help All of this makes sense to me, except for the responses, i,e how do i interpret them? i would be extremely grateful if someone could explain the results? First i establish the date and model - y - factor(sample(1:3, 400, TRUE), 1:3, c('good','better','best')) x1 - runif(400) x2 - runif(400) f - lrm(y ~ rcs(x1,4)*x2, x=TRUE) Get 0.95 confidence limits for Prob[better or best # How do i interpret this on the y scale i.e good,better,best? L - predict(f, se.fit=TRUE) #omitted kint= so use 1st intercept plogis(with(L, linear.predictors + 1.96*cbind(-se.fit,se.fit))) se.fit 1 0.6430994 0.8305201 2 0.5812662 0.7919122 3 0.5692593 0.7976906 4 0.5600308 0.7278637 5 0.6845250 0.8819143 6 0.5518848 0.7228657 7 0.5876031 0.7717215 8 0.6291766 0.8354423 9 0.5839353 0.8333790 10 0.5631326 0.8314051 Get Prob(better) than all others - # Does this mean that for data point 1, y= best as it has the higher probability? predict(f, type=fitted.ind)[1:10,] y=good y=bettery=best 1 0.2517915 0.3469692 0.4012392 2 0.3031733 0.3554471 0.3413796 3 0.3046236 0.3555365 0.3398398 4 0.3514780 0.3546880 0.2938340 5 0.1989827 0.3251784 0.4758390 6 0.3581265 0.3540297 0.2878438 7 0.3130150 0.3559091 0.3310759 8 0.2541324 0.3476007 0.3982669 9 0.2740127 0.3519713 0.3740160 10 0.2839907 0.3535331 0.3624763 Establish data frame to use as newdata d - data.frame(x1=c(.1,.5),x2=c(.5,.15)) Predict newdata - Prob(Y=j) for new observation predict(f, d, type=fitted) # Does this mean that for data point 1, y= better as it has the higher probability? y=better y=best 1 0.6800290 0.3239935 2 0.5846743 0.2409657 # Prob(Y=j) # Again - Does this mean that for data point 1, y= better as it has the higher probability? predict(f, d, type=fitted.ind) y=good y=bettery=best 1 0.3199710 0.3560355 0.3239935 2 0.4153257 0.3437086 0.2409657 predict mean(y) using codes 1,2,3 # How do i interpret this on the y scale i.e good,better,best? predict(f, d, type='mean', codes=TRUE) 12 2.004022 1.825640 Thanks for any advice it is greatly appreciated John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Frank, Thats great thanks for the advice, i appreciate that brier score, AUC etc are a better method of validation and discrimination but when it comes to predictions of new data d - data.frame(x1=c(.1,.5),x2=c(.5,.15)) predict(f, d, type=fitted.ind) y=good y=bettery=best 1 0.3199710 0.3560355 0.3239935 2 0.4153257 0.3437086 0.2409657 predict mean(y) using codes 1,2,3 predict(f, d, type='mean', codes=TRUE) 12 2.004022 1.825640 How do i use this information to assign x1 and x2 into a category on the response scale (good,better,best?) Thanks John On 1 Oct 2010, at 12:14, Frank Harrell wrote: John, Don't conclude that one category is the most probable when its probability of being equaled or exceeded is a maximum. The first category would always be the winner if that were the case. When you say y=best remember that you are dealing with a probability model. Nothing is forcing you to classify an observation, and unless the category's probability is high, this may be dangerous. You might do well to consider a more smooth approach such as using the generalized roc area (C-index) or its related rank correlation measure Dxy. Also there are odds ratios. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2891623.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Frank and list, The reason I am trying to assign them is because I have a data set where i have arrived at the most likely model that describes the data and now I have another dataset where I know the factors but not the response. Therefore, surely I need to assign the predicted values to a response in order to say something like: Based on the model I believe unknown 1 is good, where as unknown 2 is very good etc? Maybe I am missing something or using the wrong approach but I thought the main purpose of using the predict function on new data was to predict the response? John On 1 Oct 2010, at 14:51, Frank Harrell f.harr...@vanderbilt.edu wrote: Why assign them at all? Is this a forced choice at gunpoint problem? Remember what probabilities mean. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2909713.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.