[R] How to remove NAs and lme function
I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years defined as Day*1 + Month*100 + (1900 - Year).

Anyway, there are some NAs for the concentration of dissolved oxygen in the water. I know that adding na.action = na.omit omits the NAs, so there are 9008 observations in the model, but it doesn't do the same for the whole data set, which has 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != "NA",] on the whole data set but it doesn't work. How can I remove the NAs from a data set?

--
View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
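On the last question: the comparison data$DOW != "NA" tests against the string "NA" rather than a missing value, and any comparison involving NA returns NA, so the subset keeps those rows (as all-NA rows) instead of dropping them. A minimal sketch on a made-up data frame (the name river and all values are illustrative, not the Clyde data), using is.na() and complete.cases():

```r
# Toy stand-in for the real data; values are invented for illustration.
river <- data.frame(
  DOW         = c(8.1, NA, 7.4, NA, 9.0),
  Temperature = c(12.0, 13.5, 11.2, 14.1, 10.8),
  Salinity    = c(30.1, 29.8, NA, 31.0, 30.5)
)

# Comparing with the string "NA" yields NA (not FALSE) for missing values,
# so those rows come back as all-NA rows and the row count is unchanged.
attempt <- river[river$DOW != "NA", ]

# Drop rows where the response is missing.
no_na_response <- river[!is.na(river$DOW), ]

# Drop rows with a missing value in any of the listed columns.
complete_rows <- river[complete.cases(river[, c("DOW", "Temperature", "Salinity")]), ]

nrow(attempt)         # 5
nrow(no_na_response)  # 3
nrow(complete_rows)   # 2
```

Subsetting on the response alone keeps rows whose covariates are missing; complete.cases() is the stricter choice when every model variable must be present.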
Re: [R] How to remove NAs and lme function
Thanks, that worked!

Andrew Robinson-6 wrote:
>
> Jen,
>
> try
>
> na.action = na.exclude
>
> Andrew
>
> On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote:
>> [...]
>
> Andrew Robinson
> Senior Lecturer in Statistics                       Tel: +61-3-8344-6410
> Department of Mathematics and Statistics            Fax: +61-3-8344 4599
> University of Melbourne, VIC 3010 Australia
> Website: http://www.ms.unimelb.edu.au

--
View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17514479.html
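The difference between the two settings can be seen on a small example. This sketch uses lm() on simulated data (all names and values are made up) only to stay self-contained; the thread reports that the same na.action argument worked for the lme() fit. With na.omit the residual vector has one entry per observation used in the fit, while na.exclude pads it with NA back to the length of the original data, so it lines up with the covariates for plotting:

```r
set.seed(1)
n <- 20
sim <- data.frame(Temperature = rnorm(n, mean = 12, sd = 2))
sim$DOW <- 10 - 0.2 * sim$Temperature + rnorm(n, sd = 0.5)
sim$DOW[c(3, 8, 15)] <- NA   # knock out a few responses

fit_omit    <- lm(DOW ~ Temperature, data = sim, na.action = na.omit)
fit_exclude <- lm(DOW ~ Temperature, data = sim, na.action = na.exclude)

length(residuals(fit_omit))     # rows used in the fit only
length(residuals(fit_exclude))  # one entry per row of sim, NA where DOW is missing

# Residuals now align row-for-row with the covariate.
plot(sim$Temperature, residuals(fit_exclude),
     xlab = "Temperature", ylab = "Residual")
```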
[R] Couple of Questions about Classification trees
So I have 2 sets of data: a training data set and a test data set. I've been doing the analysis on the training data set and then using predict and feeding the test data through that. There are 114 rows in the training data and 117 in the test data, and 1024 columns in both. It's actually the same set of data split into two. The rows are made of 5 different numbers. They do represent something but it would take too long to explain. I want to try and find a classification rule for the 5 numbers in the rows based on the columns, so I created a classification tree, plotted it and then pruned it.

My first question is how do you print the misclassification rate at each node on the actual diagram of the classification tree? I can't seem to get it up there. My notes use gmistext(), but I have a feeling that it's for Splus rather than R, as gmistext() doesn't work for me either.

My second question: when I try using predict.tree to put the test data into the tree and then plot it, it comes up with a really weird and wrong-looking plot. Here is the code I'm using:

tree1 <- tree(row~.,data=train)
pruned.tree <- prune.tree(tree1, best = 5, method = "misclass")
predict.tree1 <- predict.tree(prune.tree, data = main)
plot(predict.tree);text(predict.tree)

I don't get a classification tree. I get the x axis labelled 1, the y axis labelled 2 and then about 4 small black rectangles scattered across the plot.

Thanks in advance.

--
View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22461673.html
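One plausible reading of the plot described above (a guess from the posted code, not something confirmed in the thread): predict.tree() expects the fitted tree object and a newdata argument, and for a classification tree its default type = "vector" returns a matrix of class probabilities. Calling plot() on a matrix draws its first column against its second, which would give axes labelled with the first two class names. A hedged sketch, with the built-in iris data standing in for the real data (the split and object names are illustrative), that asks for predicted classes and tabulates them against the truth:

```r
# Assumes the 'tree' package is installed; iris stands in for the real data.
if (requireNamespace("tree", quietly = TRUE)) {
  library(tree)
  set.seed(42)
  idx   <- sample(nrow(iris), 75)
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  fit    <- tree(Species ~ ., data = train)
  pruned <- prune.misclass(fit, best = 3)   # same as prune.tree(..., method = "misclass")

  # Pass the pruned tree object and use newdata; type = "class" gives labels.
  pred <- predict(pruned, newdata = test, type = "class")

  confusion <- table(predicted = pred, actual = test$Species)
  print(confusion)
  test_error <- mean(pred != test$Species)  # test misclassification rate
  print(test_error)

  # The tree diagram itself is drawn from the fitted object, not the predictions.
  plot(pruned)
  text(pruned)
}
```

The off-diagonal cells of the confusion table show which classes are mistaken for which.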
Re: [R] Couple of Questions about Classification trees
Okay, perhaps I should've been more clear about the data. I'm actually working on spectroscopic measurements from food authenticity testing. I have five different types of meat: 55 of chicken, 55 of turkey, 55 of pork, 34 of beef and 32 of lamb, 231 in total. On each of these 231 meats, 1024 spectroscopic measurements were taken, giving a matrix of 231 by 1024. The questions I want answered are which of the 1024 measurements are important for predicting meat type, and which of the different types of meat are incorrectly classified, i.e. can we tell the difference between chicken and turkey.

So to carry out a multivariate analysis on the data I've split it into two: a training data set and a test data set, half and half, although I think the larger half (55 goes into 27 and 28) went into the test data set, which explains the unequal row numbers. By the way, 1024 is standard, so I can't change that. I can't change the 231 either. So I created a new column with the meat type for each row, and ended up with the following R code:

library(tree)
meat.tree <- tree(meat.type~., data=train)

Using tree.cv (or cv.tree), the lowest misclassification rate is at 5, so I cut the number of nodes down to 5 using prune.tree:

prunedtree <- prune.tree(meat.tree, best = 5, method = "misclass")

Then I want to use predict.tree and the test data set:

predicttree <- predict.tree(prunedtree, data = test)

I already said what it produces. Again, how would I display the misclassification rate at each node on the diagram? I know about misclass.tree(prunedtree, detail = TRUE), but that doesn't actually display them on the classification tree. It just gives a bunch of numbers on the worksheet, and it wouldn't look very neat if I had to add them later.

--
View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22464302.html
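For the per-node figures, one possible approach (a sketch, not a documented recipe for annotating the plot) is to divide the counts from misclass.tree(..., detail = TRUE) by the node sizes stored in the tree's frame component, which gives a table of training misclassification rates indexed by node number. The iris data and object names below are stand-ins for the meat spectra:

```r
# Assumes the 'tree' package is installed; iris stands in for the real data.
if (requireNamespace("tree", quietly = TRUE)) {
  library(tree)
  fit    <- tree(Species ~ ., data = iris)
  pruned <- prune.misclass(fit, best = 3)

  counts <- misclass.tree(pruned, detail = TRUE)  # misclassified cases at each node
  node_rates <- data.frame(
    node     = rownames(pruned$frame),
    var      = pruned$frame$var,      # split variable, "<leaf>" for terminal nodes
    n        = pruned$frame$n,        # number of training cases reaching the node
    misclass = as.vector(counts),
    rate     = as.vector(counts) / pruned$frame$n
  )
  print(node_rates)
}
```

This gives the numbers in a tidy table keyed by node; placing them on the diagram itself is not covered here.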
Re: [R] MANOVA
No, MANOVA is multivariate analysis of variance, which is used when there are multiple responses as well as multiple explanatory variables, but you have just one response, which is blood pressure. You should just have

model <- lm(BP ~ Weight + Height)
anova(model)

If Weight is related to Height, only one of them should be significant, so you can drop the non-significant one from the model and then use anova again to compare the two models. The collinearity between Weight and Height is really a separate question and shouldn't be a problem when fitting a model for BP.

--
View this message in context: http://www.nabble.com/MANOVA-tp22470559p22485695.html
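The drop-one-term comparison described above can be sketched as follows. The data are simulated (the variable names follow the post, the numbers are invented), with Weight made to depend on Height so that the two predictors are correlated:

```r
set.seed(123)
n <- 60
Height <- rnorm(n, mean = 170, sd = 10)
Weight <- 0.9 * Height - 85 + rnorm(n, sd = 6)   # correlated with Height
BP     <- 80 + 0.5 * Weight + rnorm(n, sd = 8)   # simulated response
bp_data <- data.frame(BP, Weight, Height)

full    <- lm(BP ~ Weight + Height, data = bp_data)
reduced <- lm(BP ~ Weight, data = bp_data)

anova(full)                          # sequential F tests, one row per term
comparison <- anova(reduced, full)   # F test for dropping Height
comparison
```

Note that anova() on a single lm fit tests terms in the order they appear in the formula, so the row for Height is adjusted for Weight but not the other way round.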