[R] Intercept and slope of GLS model

2011-03-29 Thread John Haart
Dear list - 

A simple question, i hope!

With this truncated output from a GLS model the intercept is  0.004634487 but 
where is the slope?

Coefficients:
   Value Std.Error   
t-valuep-value
(Intercept)  0.004634487 0.0006773396  6.842192  0.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Extract subset of rows

2010-12-17 Thread John Haart
Hi,

I have two matrices with a common field = species what i want to do is make a 
matrix that combines the data held in the other two based on the species name.

I.e ( simple example)

Matrix 1 - monocot

SPECIES V1

A   2
B   3
C   4   
D   5

Matrix 2 - PCAresults

SPECIES V2

A   0.2
B   0.3
C   4.1 
D   3.2

matrix 3 - What i want

SPECIES V1  V2

A   2   0.2
B   3   0.3
C   4   4.1
D   5   3.2



my attempt has been this 

test -monocot[which(monocot$SPECIES%in%PCAresults$SPECIES),] 

But this returns matrix with only those found in each but doesn't import the 
data from PCAresults

Any help would be greatly appreciated

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extract subset of rows

2010-12-17 Thread John Haart
Hi all,

Sorry i forgot to mention there a species present in one matrix not in the 
other hence the problem i.e matrix 1 may have species E which isnt present in 
matrix 2 and matrix 2 may have species F not present in matrix 1.

Sorry for the lack of clarification in the original post!

John
On 17 Dec 2010, at 15:19, John Haart wrote:

Hi,

I have two matrices with a common field = species what i want to do is make a 
matrix that combines the data held in the other two based on the species name.

I.e ( simple example)

Matrix 1 - monocot

SPECIES V1

A   2
B   3
C   4   
D   5

Matrix 2 - PCAresults

SPECIES V2

A   0.2
B   0.3
C   4.1 
D   3.2

matrix 3 - What i want

SPECIES V1  V2

A   2   0.2
B   3   0.3
C   4   4.1
D   5   3.2



my attempt has been this 

test -monocot[which(monocot$SPECIES%in%PCAresults$SPECIES),] 

But this returns matrix with only those found in each but doesn't import the 
data from PCAresults

Any help would be greatly appreciated

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help getting data in correct format

2010-12-06 Thread John Haart

Dear All,

I am having trouble getting my data into R as i need it! I am used to using 
read.delim() to open .txt files to do work on. The function i am using 
requires a matrix like the one below.

My data is from excel and then saved as a txt file. I have tried the usual 
read.delim() approach but the function doesn't like it, i have also tried 
as.matrix(data =) but that doesn't work.

What i require is the row names to be the first column of the txt file , as 
below

This was made using this

nTraits = 5
nSpecies = 20
a=matrix(runif(nTraits*nSpecies),nSpecies,nTraits)
rownames(a) = letters[1:nSpecies]

However i want my data in their not simulated




 [,1]   [,2]   [,3][,4]   [,5]
a 0.72869780 0.36399621 0.82468338 0.850754150 0.76943833
b 0.12871015 0.85622370 0.82368054 0.108350500 0.79933805
c 0.66413022 0.24059396 0.74626595 0.748063466 0.14243494
d 0.54113172 0.97769199 0.10538211 0.184683363 0.29701420
e 0.44891743 0.38594751 0.01475637 0.492245059 0.87616253
f 0.09938324 0.62971605 0.79302032 0.156930575 0.09860269
g 0.30953025 0.35611074 0.70155464 0.149412029 0.26068831
h 0.15825945 0.43603713 0.23960806 0.617240274 0.97756739
i 0.56588912 0.34942257 0.51724529 0.360303540 0.78650285
j 0.96083252 0.44307860 0.17529919 0.007925761 0.51184681
k 0.09452434 0.49068549 0.80419267 0.401200731 0.70748547
l 0.26930842 0.65879761 0.76251526 0.315836610 0.72541148
m 0.52158352 0.77047295 0.81276164 0.641264799 0.83530547
n 0.21039165 0.63698953 0.94877972 0.329354647 0.08512338
o 0.80294849 0.85471292 0.75379324 0.310917453 0.71528901
p 0.01494773 0.03056341 0.91496579 0.817269310 0.29831036
q 0.56848121 0.50864998 0.59944638 0.046270295 0.24808518
r 0.30511696 0.79790561 0.83210599 0.334429013 0.50195706
s 0.96551698 0.95807309 0.26016434 0.629840741 0.47213370
t 0.86062917 0.09867528 0.24388087 0.167937283 0.70620760

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Randomly shuffle an array multiple times

2010-10-18 Thread John Haart
Dear List,

I have a table i have read into R:

NameYes/No

John0
Frank   1
Ann 0
James   1
Alex1

etc  - 800 different times.

What i want to do is shuffle yes/no and randomly re-assign them to the name.

I have used sample() and permute(), however there is no way to do this 1000 
times. Furthermore, i want to copy the data into a excel spreadsheet in the 
same order as the data was input so i can build up a distribution of the 
statistic for each name. When i use shuffle the date gets returned like this -

[1] 1 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1
 [34] 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 0 0 0
 [67] 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 0 0 1 1 1 1
[100] 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0
[133] 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0
[166] 0 0 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 1 0 0 1 1 0 1
[199] 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1
[232] 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 1
[265] 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1
[298] 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0
[331] 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1

etc

Rather than like this, is there a way to change the output?

John0
Frank   1
Ann 0
James   1
Alex1

Can anyone suggest a script that would achieve this?

Thanks

Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Denis and list

Thanks for this , and sorry for not providing enough information

First let me put the study into a bit more context : -

I know the number of species at risk in each family, what i am asking  is Is 
risk random according to family or do certain families have a disproportionate 
number of at risk species?

My idea was to randomly allocate risk to the families based on the criteria 
below (binomial(nspecies, 0.0748)) and then compare this to the true data and 
see if there was a significant difference.

So in answer to your questions, (assuming my method is correct !)

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

Within a particular family  - this is because i am looking to see if risk in 
the observed data set is random in respect to family so this will provide the 
baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

This varies largely, for instance i have some families that are monotypic  
(with 1 species) and then i have other families with 100+ species 


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

I am assuming i want some sort of weighting. This is because i am wanting to 
calculate the number of species expected to be at risk in EACH family under the 
random binomial distribution ( assuming every species has a 7.48% chance of 
being at risk.

Thanks

John




On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,
 
 I am doing some simulation in R and need basic help!
 
 I have a list of animal families for which i know the number of species in
 each family.
 
 I am working under the assumption that a species has a 7.48% chance of
 being at risk.
 

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

 
 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.
 

I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.

Dennis


 
 I am relatively knew to this field and would greatly appreciate a
 idiot-proof response, I.e how should the data be entered into R? I was
 thinking of using read.table, header = T, where the table has F = Family
 Name, and SP = Number of species in that family?
 
 John
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Michael,

Thanks for this - the reason i am following this approach is that it appeared 
in a paper i was reading, and i thought it was a interesting angle to take 

The paper is 

Vamosi  Wilson, 2008. Nonrandom extinction leads to elevated loss of 
angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047–1053.

and the specific method i am following states :- 

 We calculated the number of species expected to be at risk in each family 
 under a random binomial distribution in 10 000 randomizations [generated 
 using R version 2.6.0 (R Development Team 2007)] assuming every species has a 
 7.48% chance of being at risk. 

I guess the reason i am doing the simulation is because i am not hugely 
statistically minded and the paper was asking the same question i am interested 
in answering :).

So following your approach -

 if family F has Fn species, your random expectation is that p * Fn of
 them will be at risk (p = 0.0748). The variance on that expectation
 will be p * (1-p) * Fn.


Family f = Bromeliaceae , with Fn = 80, p=0.0748
random expectation = p*Fn = (0.0748*80) = 5.984
variance = p * (1-p) * Fn = (0.0748*0.9252) *80 = 5.5363968

So the random expectation is that the Bromeliaceae will have 6 species at risk, 
if risk is assigned randomly?

So if i do this for all the families it will be the same as doing the 
simulation experiment outline in the method above?

Thanks

John




On 15 Oct 2010, at 12:49, Michael Bedward wrote:

Hi John,

The word species attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation that's the result you'll get.  Perhaps to
initial identify families with disproportionate observed extinction
rates all you need is the dbinom function ?

Michael


On 15 October 2010 22:29, John Haart anothe...@me.com wrote:
 Hi Denis and list
 
 Thanks for this , and sorry for not providing enough information
 
 First let me put the study into a bit more context : -
 
 I know the number of species at risk in each family, what i am asking  is Is 
 risk random according to family or do certain families have a 
 disproportionate number of at risk species?
 
 My idea was to randomly allocate risk to the families based on the criteria 
 below (binomial(nspecies, 0.0748)) and then compare this to the true data 
 and see if there was a significant difference.
 
 So in answer to your questions, (assuming my method is correct !)
 
 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?
 
 Within a particular family  - this is because i am looking to see if risk in 
 the observed data set is random in respect to family so this will provide 
 the baseline to compare against.
 
 I guess you've stated the p, but what's the n? The number of species in each
 family?
 
 This varies largely, for instance i have some families that are monotypic  
 (with 1 species) and then i have other families with 100+ species
 
 
 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?
 
 I am assuming i want some sort of weighting. This is because i am wanting to 
 calculate the number of species expected to be at risk in EACH family under 
 the random binomial distribution ( assuming every species has a 7.48% chance 
 of being at risk.
 
 Thanks
 
 John
 
 
 
 
 On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
 
 Hi:
 
 I don't believe you've provided quite enough information just yet...
 
 On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:
 
 Dear List,
 
 I am doing some simulation in R and need basic help!
 
 I have a list of animal families for which i know the number of species in
 each family.
 
 I am working under the assumption that a species has a 7.48% chance of
 being at risk.
 
 
 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?
 
 
 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.
 
 
 I guess you've stated the p, but what's the n? The number of species in each
 family? If you're simulating on a family by family basis, then it would seem
 that a binomial(nspecies, 0.0748) distribution would be the reference.
 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families? The latter is doable, but it would require a
 two-stage simulation: one to randomly select a family and then to randomly
 select a species.
 
 Dennis
 
 
 
 I am relatively knew to this field and would

[R] Random assignment

2010-10-15 Thread John Haart
Dear List,

I am doing some simulation in R and need basic help! 

I have a list of animal families for which i know the number of species in each 
family.

I am working under the assumption that a species has a 7.48% chance of being at 
risk. 

I want to simulate the number of species expected to be at risk under a random 
binomial distribution with 10,000 randomizations. 

I am relatively knew to this field and would greatly appreciate a idiot-proof 
response, I.e how should the data be entered into R? I was thinking of using 
read.table, header = T, where the table has F = Family Name, and SP = Number of 
species in that family?

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-04 Thread John Haart
Dear List and Frank,

I have calculated the log-odds for my models but maybe i am not getting 
something but i am not understanding how for a categorical factor this helps? 
On all the examples i have see it relates to continuous factors where moving 
from one number to another shows either a increase or decrease, not as in my 
case a change of catagory.

Furthermore, this gives the values for each factor independent of each other, 
how do i get the log-odds for the entire model? I appreciate i maybe trying to 
put things in boxes again, i am not i am happy to report the log odds  of 
moving from one response level to the next but would like it for all the 
factors together not independently.

John

Low HighDiff.   Effect  S.E.Lower   Upper
WO  Woody:Non_woody 1   2   
NA  0.280.16-0.04   0.6
Odds Ratio  1   
2   NA  1.32NA  0.961.82
PD  Abiotic:Biotic  2   
1   NA  -1.21   0.13-1.47   -0.96
Odds Ratio  2   
1   NA  0.3 NA  0.230.38
ALT All:Low 3   
1   NA  0.470.190.110.84
Odds Ratio  3   
1   NA  1.6 NA  1.112.31
ALT High:Low3   
2   NA  -0.07   0.14-0.35   0.21
Odds Ratio  3   
2   NA  0.93NA  0.7 1.24
ALT Mid:Low 3   
4   NA  0.390.150.1 0.67
Odds Ratio  3   
4   NA  1.48NA  1.111.96
REG Two_plus:One1   2   
NA  -0.59   0.13-0.84   -0.34
Odds Ratio  1   
2   NA  0.55NA  0.430.72
BIO Arctic:Subtropical/Tropical 4   
1   NA  -1.02   0.81-2.61   0.58
Odds Ratio  4   
1   NA  0.36NA  0.071.78
BIO Boreal:Subtropical/Tropical 4   2   
NA  -1.21   0.81-2.79   0.37
Odds Ratio  4   
2   NA  0.3 NA  0.061.44
BIO Mediterranean:Subtropical/Tropical  4   3   
NA  -1.89   0.48-2.83   -0.95
Odds Ratio  4   
3   NA  0.15NA  0.060.39
BIO Temperate:Subtropical/Tropical  4   5   
NA  -0.09   0.16-0.41   0.23
Odds Ratio  4   
5   NA  0.91NA  0.661.26
On 3 Oct 2010, at 15:29, Frank Harrell wrote:


You still seem to be hung up on making arbitrary classifications.  Instead,
look at tendencies using odds ratios or rank correlation measures.  My book
Regression Modeling Strategies covers this.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2953220.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-03 Thread John Haart
Thanks Frank and Greg, 

This makes alot more sense to me now. I appreciate you are both very busy, but 
i was wondering if i could trouble you for one last piece of advice. As my data 
is a little complicated for a first effort at R let alone modelling!

The response is on a range from 1-6, which indicates extinction risk - 1 being 
least concern and 6 being critical - hence using a ordinal model

The factors (6) are categorical - FRUIT TYPE - fleshy/dry
 HABITAT - terrestrial, 
aquatic, epiphyte

etc etc 

I am asking the question - How do different combinations of factors effect 
extinction risk.

Based on what you have both said i have called

 predict(model1, type=fitted)

Would this be the best way predicting the probability of falling into each 
response category  - 


y=2y=3 y=4 y=5 y=6
10.502220616 0.410236021 0.2892270912 0.2191420568 0.1774250519
20.745221699 0.668501579 0.5412223837 0.4486151612 0.3847379442
30.720381333 0.639796647 0.5095814746 0.4174618165 0.3551631876
40.752321112 0.676811675 0.5505781183 0.4579680710 0.3937100283
50.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
60.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
70.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
80.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
90.526291649 0.433739868 0.3094355120 0.2360800803 0.1919312111

I have 100 species for which i have their factors and i want to predict their 
response, so if i do the above and use the newdata function, and present the 
probabilities  as above rather than trying to classify them?

I  tried polr and that classified each response as either 1 or 6 i.e no 
2,3,4,5 - as did calling predict(model1, type=fitted.ind) which resulted in 
the probabilities of being 1 or 6 far outweighing 2,3,4,5 (Below) - this may 
just be that my model is not powefull enough to discrimate effectively as i 
know that is incorrect ( Brier score 2.01, AUC 66.9)?

 EXTINCTION=1 EXTINCTION=2 EXTINCTION=3 EXTINCTION=4 EXTINCTION=5 
EXTINCTION=6
1   0.4977794 0.0919845942  0.121008930  0.070085034 0.0417170048   
0.1774250519
2   0.2547783 0.0767201200  0.127279196  0.092607223 0.0638772170   
0.3847379442
3   0.2796187 0.0805846862  0.130215173  0.092119658 0.0622986289   
0.3551631876
4   0.2476789 0.0755094367  0.126233557  0.092610047 0.0642580427   
0.3937100283
5   0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601   
0.5008981585
6   0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601   
0.5008981585
7   0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601   
0.5008981585
8   0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601   
0.5008981585
9   0.4737084 0.0925517814  0.124304356  0.073355432 0.0441488692   
0.1919312111
10  0.2489307 0.0757263892  0.126424896  0.092614323 0.0641934484   
0.3921102030

 
Thanks very much for any advice given,

John


10   0.751069260 0.675342871 0.5489179746 0.4563036514 0.3921102030
On 1 Oct 2010, at 23:13, Frank Harrell wrote:


Well put Greg.  The job of the statistician is to produce good estimates
(probabilities in this case).  Those cannot be translated into action
without subject-specific utility functions.  Classification during the
analysis or publication stage is not necessary.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2951976.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread John Haart
Dear list,

I am relatively new to ordinal models and have been working through the example 
given by Frank Harrell in the predict.lrm {Design} help 

All of this makes sense to me, except for the responses, i,e how do i interpret 
them? i would be extremely grateful if someone could explain the results?

First i establish the date and model - 

 y - factor(sample(1:3, 400, TRUE), 1:3, c('good','better','best'))
 x1 - runif(400)
 x2 - runif(400)
 f - lrm(y ~ rcs(x1,4)*x2, x=TRUE) 

Get 0.95 confidence limits for Prob[better or best

# How do i interpret this on the y scale i.e good,better,best?

 
 L - predict(f, se.fit=TRUE)   #omitted kint= so use 1st intercept
 plogis(with(L, linear.predictors + 1.96*cbind(-se.fit,se.fit)))

se.fit
 1   0.6430994 0.8305201
 2   0.5812662 0.7919122
 3   0.5692593 0.7976906
 4   0.5600308 0.7278637
 5   0.6845250 0.8819143
 6   0.5518848 0.7228657
 7   0.5876031 0.7717215
 8   0.6291766 0.8354423
 9   0.5839353 0.8333790
 10  0.5631326 0.8314051




 Get Prob(better) than all others - 

# Does this mean that for data point 1, y= best as it has the higher 
probability?

 predict(f, type=fitted.ind)[1:10,]

  y=good  y=bettery=best
1  0.2517915 0.3469692 0.4012392
2  0.3031733 0.3554471 0.3413796
3  0.3046236 0.3555365 0.3398398
4  0.3514780 0.3546880 0.2938340
5  0.1989827 0.3251784 0.4758390
6  0.3581265 0.3540297 0.2878438
7  0.3130150 0.3559091 0.3310759
8  0.2541324 0.3476007 0.3982669
9  0.2740127 0.3519713 0.3740160
10 0.2839907 0.3535331 0.3624763

Establish data frame to use as newdata

 d - data.frame(x1=c(.1,.5),x2=c(.5,.15))

Predict newdata - Prob(Y=j) for new observation

 predict(f, d, type=fitted)

# Does this mean that for data point 1, y= better as it has the higher 
probability?

  y=better   y=best
1 0.6800290 0.3239935
2 0.5846743 0.2409657

# Prob(Y=j)

# Again - Does this mean that for data point 1, y= better as it has the higher 
probability?

predict(f, d, type=fitted.ind)
 
y=good  y=bettery=best
1 0.3199710 0.3560355 0.3239935
2 0.4153257 0.3437086 0.2409657

predict mean(y) using codes 1,2,3

# How do i interpret this on the y scale i.e good,better,best?

  predict(f, d, type='mean', codes=TRUE)

   12 
2.004022 1.825640 

Thanks for any advice it is greatly appreciated

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread John Haart
Frank,

Thats great thanks for the advice, i appreciate that brier score, AUC etc are a 
better method of validation and discrimination  but when it comes to 
predictions of new data 

 d - data.frame(x1=c(.1,.5),x2=c(.5,.15))

 predict(f, d, type=fitted.ind)
  
 y=good  y=bettery=best
 1 0.3199710 0.3560355 0.3239935
 2 0.4153257 0.3437086 0.2409657
 
 predict mean(y) using codes 1,2,3
 
 
  predict(f, d, type='mean', codes=TRUE)
 
12 
 2.004022 1.825640 

How do i use this information  to assign x1 and x2 into a category on the 
response scale (good,better,best?)

Thanks

John




On 1 Oct 2010, at 12:14, Frank Harrell wrote:


John,

Don't conclude that one category is the most probable when its probability
of being equaled or exceeded is a maximum.  The first category would always
be the winner if that were the case.

When you say y=best remember that you are dealing with a probability model. 
Nothing is forcing you to classify an observation, and unless the category's
probability is high, this may be dangerous.  You might do well to consider a
more smooth approach such as using the generalized roc area (C-index) or its
related rank correlation measure Dxy.  Also there are odds ratios.

Frank

















-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2891623.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread John Haart
Frank and list,


The reason I am trying to assign them is because I have a data set where i have 
arrived at  the most likely model that describes the data and now I have 
another dataset where I know the factors but not the response.

Therefore, surely I need to assign the predicted values to a response in order 
to say something like: 

Based on the model I believe unknown 1 is good, where as unknown 2 is very good 
etc?

Maybe I am missing something or using the wrong approach but I thought the main 
purpose of using the predict function on new data was to predict the response?

John

On 1 Oct 2010, at 14:51, Frank Harrell f.harr...@vanderbilt.edu wrote:

 
 Why assign them at all?  Is this a forced choice at gunpoint problem? 
 Remember what probabilities mean.
 
 Frank
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2909713.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.