[R] How to remove NAs and lme function

2008-05-28 Thread Jen_mp3

I am working on a project to find a model for the concentration of dissolved
oxygen in the River Clyde. I've fitted a linear mixed model as
lme(DOW~Temperature+Salinity+Year+factor(Station)*factor(Depth),
random=~1|id), where id is an identifier of the day over 20 years defined as
Day*1 + Month*100 + (1900 - Year).
Anyway, there are some NAs for the concentration of dissolved oxygen in the
water, so I added na.action = na.omit. That omits the NAs, so the model uses
9008 observations, but the data set itself still has 10965, including the
observations with NAs. I would like to plot the residuals from the model
against Salinity, Temperature and Year, but when I try, it seems to take the
values of these variables from the full data set and the residuals from the
model, which of course doesn't work. I have tried using
data1 <- data[data$DOW != "NA",]
on the whole data set, but it doesn't work.
How can I remove the NAs from a data set?
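The comparison data$DOW != "NA" fails because a missing value is not the string "NA": comparing NA with anything gives NA rather than FALSE, and indexing rows with NA returns a row of NAs instead of dropping it. is.na() is the test for missing values. A minimal sketch, using a small made-up data frame dat in place of the real data:

```r
# Hypothetical stand-in for the river data; DOW has two missing values
dat <- data.frame(
  DOW         = c(8.1, NA, 7.4, NA, 6.9),
  Temperature = c(10.2, 11.0, 9.8, 12.1, 13.4),
  Salinity    = c(30.1, 29.5, 31.2, 28.7, 27.9)
)

# Comparing with the string "NA" gives NA (not FALSE) for missing values,
# and indexing with NA keeps a row of NAs instead of dropping it
bad <- dat[dat$DOW != "NA", ]
nrow(bad)   # still 5

# is.na() is the correct test for missing values
dat1 <- dat[!is.na(dat$DOW), ]

# Alternatives: drop rows with an NA in any of the listed columns...
dat2 <- dat[complete.cases(dat[, c("DOW", "Temperature", "Salinity")]), ]
# ...or in any column at all
dat3 <- na.omit(dat)
```

If the other model variables have no NAs of their own, a model fitted to dat1 has one residual per row of dat1, so the residuals line up with dat1$Temperature and dat1$Salinity.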

-- 
View this message in context: 
http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove NAs and lme function

2008-05-28 Thread Jen_mp3

Thanks, that worked!


Andrew Robinson-6 wrote:
> 
> Jen,
> 
> try
> 
> na.action = na.exclude
> 
> Andrew
> 
> 
> On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote:
> 
> 
> Andrew Robinson
> Senior Lecturer in Statistics   Tel: +61-3-8344-6410
> Department of Mathematics and Statistics   Fax: +61-3-8344 4599
> University of Melbourne, VIC 3010 Australia
> Email: [EMAIL PROTECTED]   Website: http://www.ms.unimelb.edu.au
> 
> 
> 
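With na.action = na.omit the residuals only cover the rows actually used in the fit, so they cannot be plotted against the full-length columns of the data. na.exclude drops the same rows when fitting, but residuals() pads its output with NA at the dropped positions, so it lines up row for row with the original data. A minimal sketch with nlme's lme() on simulated data; the data frame dat and its values are made up for illustration, and the model is simplified from the one in the question (no Year, Station or Depth terms):

```r
library(nlme)

# Simulated stand-in: 30 sampling days (id), 4 measurements per day
set.seed(42)
n_days <- 30
dat <- data.frame(
  id          = factor(rep(seq_len(n_days), each = 4)),
  Temperature = runif(4 * n_days, 5, 15),
  Salinity    = runif(4 * n_days, 20, 35)
)
day_effect <- rnorm(n_days, sd = 0.5)
dat$DOW <- 10 - 0.2 * dat$Temperature - 0.05 * dat$Salinity +
  day_effect[as.integer(dat$id)] + rnorm(nrow(dat), sd = 0.3)
dat$DOW[c(3, 17, 50, 88)] <- NA   # some missing responses

# The random-effects formula is passed as random = ~ 1 | id
fit <- lme(DOW ~ Temperature + Salinity, random = ~ 1 | id,
           data = dat, na.action = na.exclude)

# With na.exclude, residuals() has one entry per row of dat,
# with NA where DOW was missing
res <- residuals(fit)

op <- par(mfrow = c(1, 2))
plot(dat$Temperature, res, xlab = "Temperature", ylab = "Residual")
plot(dat$Salinity, res, xlab = "Salinity", ylab = "Residual")
par(op)
```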

-- 
View this message in context: 
http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17514479.html
Sent from the R help mailing list archive at Nabble.com.



[R] Couple of Questions about Classification trees

2009-03-11 Thread Jen_mp3

So I have 2 sets of data: a training data set and a test data set. I've been
doing the analysis on the training data set and then using predict to feed
the test data through the tree. There are 114 rows in the training data
and 117 in the test data, and 1024 columns in both. It's actually the same
set of data split into two. Each row is labelled with one of 5 different
numbers. They do represent something, but it would take too long to explain.

I want to try to find a classification rule for the 5 numbers in the rows
based on the columns, so I created a classification tree, plotted it and
then pruned it. My first question is how to print the misclassification rate
at each node on the actual diagram of the classification tree. I can't seem
to get it up there. My notes use gmistext(), but I have a feeling that
it's for S-PLUS rather than R, as gmistext() doesn't work for me either.

My second question: when I use predict.tree to put the test data into the
tree and then plot it, it comes up with a really weird and wrong-looking
plot. Here is the code I'm using:
tree1 <- tree(row~.,data=train)
pruned.tree <- prune.tree(tree1, best = 5, method = "misclass")
predict.tree1 <- predict.tree(prune.tree, data = main)
plot(predict.tree);text(predict.tree)
Instead of a classification tree, I get the x axis labelled 1, the y axis
labelled 2 and about 4 small black rectangles scattered across the plot.
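Several things in the code above would stop it drawing a tree of the test data. As posted, it passes prune.tree (the function) rather than the pruned.tree object, and plots predict.tree rather than predict.tree1. predict.tree() also has no data argument: the new cases go in newdata, and without it the predictions are for the training data. Finally, a prediction is a vector or matrix, not a tree, so plot() cannot draw a tree from it; plot the pruned tree instead, or ask for type = "tree", which returns the tree with its node counts and deviances recomputed for the new data. A sketch using the built-in iris data as a stand-in for the real train and test sets (the split and best = 3 are arbitrary choices for illustration):

```r
library(tree)

# Stand-in data: split iris into training and test halves
set.seed(1)
idx   <- sample(nrow(iris), nrow(iris) / 2)
train <- iris[idx, ]
test  <- iris[-idx, ]

tree1  <- tree(Species ~ ., data = train)
pruned <- prune.tree(tree1, best = 3, method = "misclass")

# Draw the pruned tree itself
plot(pruned); text(pruned)

# Predicted class for each test case (a factor, not a tree)
pred <- predict(pruned, newdata = test, type = "class")

# Or pass the test data down the tree; the result is a tree object.
# type = "uniform" spaces the levels evenly instead of by deviance.
test_tree <- predict(pruned, newdata = test, type = "tree")
plot(test_tree, type = "uniform"); text(test_tree)
```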

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22461673.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Couple of Questions about Classification trees

2009-03-11 Thread Jen_mp3


Okay, perhaps I should have been clearer about the data. I'm actually working
on spectroscopic measurements from food authenticity testing. I have five
different types of meat: 55 of chicken, 55 of turkey, 55 of pork, 34 of beef
and 32 of lamb, 231 in total. On each of these 231 meats, 1024
spectroscopic measurements were taken, giving a 231 by 1024 matrix. The
questions I want answered are which of the 1024 measurements are important
for predicting meat type, and which types of meat are incorrectly
classified, i.e. can we tell the difference between chicken and turkey?
To carry out a multivariate analysis I've split the data into two halves, a
training data set and a test data set, although I think the larger half of
each group of 55 (split into 27 and 28) went into the test data set, which
explains the unequal row numbers. By the way, the 1024 is standard and can't
be changed; neither can the 231.
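For the question of which meat types get confused with which, a cross-tabulation of predicted against actual class on the test set is the usual summary: the off-diagonal cells show, for example, how many turkeys were classified as chicken. A sketch with made-up labels; in practice actual would be test$meat.type and predicted the output of predict(..., type = "class"):

```r
# Made-up labels for illustration only
lv <- c("beef", "chicken", "lamb", "pork", "turkey")
actual    <- factor(c("chicken", "chicken", "turkey", "turkey",
                      "pork", "beef", "lamb"), levels = lv)
predicted <- factor(c("chicken", "turkey", "turkey", "chicken",
                      "pork", "beef", "beef"), levels = lv)

# Rows: predicted class; columns: actual class
conf <- table(predicted = predicted, actual = actual)
conf

# Overall test misclassification rate
overall <- 1 - sum(diag(conf)) / sum(conf)

# Misclassification rate within each actual meat type
by_type <- 1 - diag(conf) / colSums(conf)
by_type
```

Both factors share the same levels in the same order, which keeps the table square so that diag() picks out the correctly classified counts.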

So I created a new column, meat.type, holding the meat type for each row.

I end up with the following R code:
library(tree)
meat.tree <- tree(meat.type~., data=train)
Using cv.tree, the lowest misclassification rate is at 5 terminal nodes, so I
cut the tree down to 5 nodes using prune.tree:
prunedtree <- prune.tree(meat.tree, best = 5, method = "misclass")
Then I want to use predict.tree with the test data set:
predicttree <- predict.tree(prunedtree, data = test)
I already said what it produces.
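The size-selection step can be written with cv.tree(); passing FUN = prune.misclass makes it report misclassification counts rather than deviance. A sketch, again with iris as a stand-in for the meat data (set.seed() only makes the random folds reproducible, and picking the smallest size among ties is one common convention, not the only one):

```r
library(tree)

set.seed(2)
fit <- tree(Species ~ ., data = iris)

# 10-fold cross-validation over the pruning sequence;
# cv$dev holds misclassification counts because FUN = prune.misclass
cv <- cv.tree(fit, FUN = prune.misclass)

# Smallest tree size (number of terminal nodes) with the fewest
# cross-validated misclassifications
best_size <- min(cv$size[cv$dev == min(cv$dev)])

# prune.misclass() is equivalent to prune.tree(..., method = "misclass")
pruned <- prune.misclass(fit, best = best_size)
plot(pruned); text(pruned)
```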

Again, how would I display the misclassification rate at each node on the
diagram? I know about misclass.tree(prunedtree, detail = TRUE), but that
doesn't actually display them on the classification tree; it just prints a
bunch of numbers on the worksheet, and it wouldn't look very neat if I had
to add them later.
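One way to get the rates onto the diagram is to compute them from the tree's frame component and write them at the node positions with text(). For a classification tree, frame$yprob holds the class proportions at each node, so the training misclassification rate at a node is one minus its largest proportion; this should match misclass.tree(..., detail = TRUE) divided by the node sizes. The sketch relies on plot() returning the node coordinates as a list with components x and y, as documented for plot.tree; iris stands in for the meat data and best = 4 is an arbitrary choice:

```r
library(tree)

fit    <- tree(Species ~ ., data = iris)
pruned <- prune.misclass(fit, best = 4)

# plot() returns the node coordinates (x, y), one per row of pruned$frame
xy <- plot(pruned)
text(pruned)

# Training misclassification rate at every node: 1 - largest class proportion
node_err <- 1 - apply(pruned$frame$yprob, 1, max)

# Write the rates just below each node
text(xy$x, xy$y, labels = sprintf("%.2f", node_err),
     pos = 1, cex = 0.8, col = "red")
```

Rates on the test data would need the per-node counts from the test cases instead, so this labels the training error only.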

-- 
View this message in context: 
http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22464302.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] MANOVA

2009-03-12 Thread Jen_mp3

No, MANOVA (multivariate analysis of variance) is used when there are
multiple response variables as well as explanatory variables, but you have
just one response, which is blood pressure. You should just have
model <- lm(BP ~ Weight + Height)
anova(model)
If Weight and Height are strongly related, one of them may add little once
the other is in the model, so you can drop it from the model and then use
anova again to compare the two models.
The collinearity between Weight and Height is really a separate question and
shouldn't stop you fitting a model for BP.
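A sketch of that workflow on simulated data (bp_data and its values are made up for illustration). Note that anova() on a single lm fit gives sequential tests, so the line for Height is adjusted for Weight but not the other way round; anova() on two nested fits tests whether the dropped variable adds anything once the other is in the model:

```r
set.seed(3)
n <- 100
Height <- rnorm(n, 170, 10)
Weight <- 0.9 * Height - 85 + rnorm(n, sd = 8)   # correlated with Height
BP     <- 80 + 0.4 * Weight + rnorm(n, sd = 10)
bp_data <- data.frame(BP, Weight, Height)

full <- lm(BP ~ Weight + Height, data = bp_data)
anova(full)                     # sequential (Type I) F tests

reduced <- lm(BP ~ Weight, data = bp_data)
cmp <- anova(reduced, full)     # does Height add anything given Weight?
cmp

cor(bp_data$Weight, bp_data$Height)   # size of the collinearity
```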





-- 
View this message in context: 
http://www.nabble.com/MANOVA-tp22470559p22485695.html
Sent from the R help mailing list archive at Nabble.com.
