[R] Some coefficients are doubled when I use the step() function
Hello-

Such a strange problem, can't figure it out at all. Using binomial glm models, and the step() function, so the call looks like this:

  sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +
      S1Q7_NUM.1 + S1Q8_NUM.1 + S1Q6_NUM.1 + S1Q10_NUM.1 + S1Q12_BURG.1 +
      S1Q12_CD.1 + S1Q4.1 + S1Q12_OTHVIOL.1 + S1Q8.1 + S1Q12_GBH.1 + S1Q11.1 +
      S1Q7.1 + S1Q12_THEFT.1 + S1Q12_DRIV.1 + S1Q5.1 + S1Q9.1 + S1Q12_DRUG.1,
      family = binomial, data = moddata)

But when I run step() on the resulting model, some of the coefficients are doubled when it comes back, with a 2 at the end, e.g. like this:

  mymodel = step(sectionmodel, direction="backward", test="F")
  summary(mymodel)

returns this:

  Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
  (Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
  S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
  S1Q4.12           0.56893    0.40281   1.412   0.1578
  S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
  S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
  S1Q7.11          -1.27330    1.12897  -1.128   0.2594
  S1Q7.12          -1.83927    1.16909  -1.573   0.1157
  S1Q5.11           0.91742    1.19489   0.768   0.4426
  S1Q5.12           2.16861    1.19864   1.809   0.0704 .
  S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055

As you can see S1Q7.1 and S1Q5.1 are duplicated as S1Q7.11 and S1Q7.12 etc. I've googled and read and re-read the step() and stepAIC() documentation and I just can't figure out what it could mean. Removing the test="F" bit also generates the same behaviour.

Any help greatly appreciated.

Chris Beeley
Institute of Mental Health, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
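One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: when a predictor is stored as a factor, R names each coefficient by pasting the factor level onto the variable name, so a factor called S1Q7.1 with levels 0, 1 and 2 would appear as S1Q7.11 and S1Q7.12. The data below are made up (y and f.1 are hypothetical names); only the naming behaviour is the point.

```r
# Hypothetical data: a three-level factor whose name happens to end in ".1"
d <- data.frame(
  y   = rep(c(0, 1), 30),
  f.1 = factor(rep(0:2, 20))
)

fit <- glm(y ~ f.1, family = binomial, data = d)

# One coefficient per non-reference level; the level is appended to the name
coef_names <- names(coef(fit))
print(coef_names)

# Checking the storage type of a suspect column in the real data would be
# something like: str(moddata$S1Q7.1)
```

If the variables are meant to be numeric scores rather than categories, converting them before fitting would give a single coefficient each; whether that is appropriate depends on the questionnaire items.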
Re: [R] anova of lme objects (model1, model2) gives different results depending on order of models
Well that's that cleared up then. Thanks to all.

Chris B.

On 31/05/2012 17:51, Albyn Jones wrote:
> No, both yield the same result: reject the null hypothesis, which always corresponds to the restricted (smaller) model.
>
> albyn
>
> On Thu, May 31, 2012 at 12:47:30PM +0100, Chris Beeley wrote:
> > Hello-
> >
> > I understand that it's convention, when comparing two models using the anova function anova(model1, model2), to put the more complicated (for want of a better word) model as the second model. However, I'm using lme in the nlme package and I've found that the order of the models actually gives opposite results. I'm not sure if this is supposed to be the case or if I have missed something important, and I can't find anything in the Pinheiro and Bates book or in ?anova, or in Google for that matter which unfortunately only returns results about ANOVA which isn't much help. I'm using the latest version of R and nlme, just checked both.
> >
> > Here is the code and output:
> >
> >   PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
> >   PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
> >       correlation=corAR1(form=~Date|Case))
> >   anova(PHQmodel1, PHQmodel2) # accept model 2
> >
> >             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
> >   PHQmodel1     1  8 48784.57 48840.43 -24384.28
> >   PHQmodel2     2  9 48284.68 48347.51 -24133.34 1 vs 2 501.8926  <.0001
> >
> >   PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
> >       correlation=corAR1(form=~Date|Case))
> >   PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
> >   anova(PHQmodel1, PHQmodel2) # accept model 2
> >
> >             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
> >   PHQmodel1     1  9 48284.68 48347.51 -24133.34
> >   PHQmodel2     2  8 48784.57 48840.43 -24384.28 1 vs 2 501.8926  <.0001
> >
> > In both cases I am led to accept model 2 even though they are opposite models. Is it really just that you have to put them in the right order? It just seems like if there were say four models you wouldn't necessarily be able to determine the correct order.
> >
> > Many thanks,
> > Chris Beeley, Institute of Mental Health, UK
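Albyn's point can be checked on a public dataset. The sketch below is an illustration under assumptions, not the original analysis: it uses the Orthodont data shipped with nlme instead of compfinal, and the model names m_plain and m_ar1 are made up. The likelihood ratio statistic and p-value should come out the same whichever way round the two models are passed, because the test always treats the model with fewer parameters as the null.

```r
library(nlme)

# Smaller model: random intercept only
m_plain <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)

# Larger model: same fixed and random effects plus AR(1) within-subject errors
m_ar1 <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject,
             correlation = corAR1(form = ~ 1 | Subject))

a_small_first <- anova(m_plain, m_ar1)
a_large_first <- anova(m_ar1, m_plain)

# The test row is the second row of each table; only the row labels swap
lr_small_first <- a_small_first[["L.Ratio"]][2]
lr_large_first <- a_large_first[["L.Ratio"]][2]
p_small_first  <- a_small_first[["p-value"]][2]
p_large_first  <- a_large_first[["p-value"]][2]

print(c(lr_small_first, lr_large_first))
print(c(p_small_first, p_large_first))
```

So a small p-value is evidence for the larger model regardless of which row it is printed on. The AIC and BIC columns can be read directly for the same comparison, and they do not depend on order either.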
[R] anova of lme objects (model1, model2) gives different results depending on order of models
Hello-

I understand that it's convention, when comparing two models using the anova function anova(model1, model2), to put the more complicated (for want of a better word) model as the second model. However, I'm using lme in the nlme package and I've found that the order of the models actually gives opposite results. I'm not sure if this is supposed to be the case or if I have missed something important, and I can't find anything in the Pinheiro and Bates book or in ?anova, or in Google for that matter which unfortunately only returns results about ANOVA which isn't much help. I'm using the latest version of R and nlme, just checked both.

Here is the code and output:

  PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
  PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
      correlation=corAR1(form=~Date|Case))
  anova(PHQmodel1, PHQmodel2) # accept model 2

            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
  PHQmodel1     1  8 48784.57 48840.43 -24384.28
  PHQmodel2     2  9 48284.68 48347.51 -24133.34 1 vs 2 501.8926  <.0001

  PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
      correlation=corAR1(form=~Date|Case))
  PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
  anova(PHQmodel1, PHQmodel2) # accept model 2

            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
  PHQmodel1     1  9 48284.68 48347.51 -24133.34
  PHQmodel2     2  8 48784.57 48840.43 -24384.28 1 vs 2 501.8926  <.0001

In both cases I am led to accept model 2 even though they are opposite models. Is it really just that you have to put them in the right order? It just seems like if there were say four models you wouldn't necessarily be able to determine the correct order.

Many thanks,
Chris Beeley, Institute of Mental Health, UK

...session info follows

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] gridExtra_0.9 RColorBrewer_1.0-5 car_2.0-12 nnet_7.3-1 MASS_7.3-17
 [6] xtable_1.7-0 psych_1.2.4 languageR_1.4 nlme_3.1-104 ggplot2_0.9.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 labeling_0.1 lattice_0.20-6 memoise_0.1
 [7] munsell_0.3 plyr_1.7.1 proto_0.3-9.2 reshape2_1.2.1 scales_0.2.1 stringr_0.6
[13] tools_2.15.0

packageDescription("nlme")
Package: nlme
Version: 3.1-104
Date: 2012-05-21
Priority: recommended
Title: Linear and Nonlinear Mixed Effects Models
Authors@R: c(person("Jose", "Pinheiro", comment = "S version"), person("Douglas", "Bates", comment = "up to 2007"), person("Saikat", "DebRoy", comment = "up to 2002"), person("Deepayan", "Sarkar", comment = "up to 2005"), person("R-core", email = "r-c...@r-project.org", role = c("aut", "cre")))
Author: Jose Pinheiro (S version), Douglas Bates (up to 2007), Saikat DebRoy (up to 2002), Deepayan Sarkar (up to 2005), the R Core team.
Maintainer: R-core <r-c...@r-project.org>
Description: Fit and compare Gaussian linear and nonlinear mixed-effects models.
Depends: graphics, stats, R (>= 2.13)
Imports: lattice
Suggests: Hmisc, MASS
LazyLoad: yes
LazyData: yes
License: GPL (>= 2)
BugReports: http://bugs.r-project.org
Packaged: 2012-05-23 07:28:59 UTC; ripley
Repository: CRAN
Date/Publication: 2012-05-23 07:37:45
Built: R 2.15.0; x86_64-pc-mingw32; 2012-05-29 12:36:01 UTC; windows
[R] Extract fitted values with and without offset from glm
Hello-

In the notes for the lm function it states "Offsets specified by offset will not be included in predictions by predict.lm, whereas those specified by an offset term in the formula will be."

I would like to extract fitted values in just this way from a glm model, those with the offset and those without. I have tried doing things like this:

  predict.glm(glm(Incident~Numbers, offset=logit(Numbers), family=binomial, data=violdata))
  predict.glm(glm(Incident~Numbers+offset(logit(Numbers)), family=binomial, data=violdata))

As well as like this:

  glm(Incident~Numbers, offset=logit(Numbers), family=binomial, data=violdata)$fitted.values
  glm(Incident~Numbers+offset(logit(Numbers)), family=binomial, data=violdata)$fitted.values

But they return the same result. The first 50 lines of my data look like this:

  structure(list(Incident = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), Numbers = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L)), .Names = c("Incident", "Numbers"), row.names = c(NA, 50L), class = "data.frame")

Any assistance gratefully received.

Many thanks,
Chris Beeley, Institute of Mental Health, UK
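A minimal sketch of one way to get both sets of values, under stated assumptions: it uses a made-up Poisson example with a log(exposure) offset rather than the binomial model above (d, events, x and exposure are hypothetical), and it relies on the fitted glm object storing the offset in its offset component. Without newdata, predict() on a glm returns the stored linear predictor, which already includes the offset however it was specified, which would explain why the calls above agree. Subtracting the stored offset gives the linear predictor without it.

```r
# Hypothetical count data with an exposure term
d <- data.frame(
  events   = c(2, 3, 6, 7, 8, 9, 10, 12, 15, 20),
  x        = 1:10,
  exposure = c(10, 12, 15, 14, 13, 12, 11, 12, 14, 16)
)

fit <- glm(events ~ x + offset(log(exposure)), family = poisson, data = d)

# Linear predictor including the offset (what predict() returns by default)
eta_with <- predict(fit, type = "link")

# Remove the offset that glm stored on the fitted object
eta_without <- eta_with - fit$offset

# Back-transform through the inverse link to the response scale
mu_with    <- fit$family$linkinv(eta_with)
mu_without <- fit$family$linkinv(eta_without)

print(head(cbind(mu_with, mu_without)))
```

The same subtraction should apply to a binomial fit, with linkinv then being the inverse logit.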
Re: [R] Nested brew call yields Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
Many thanks for this. I have a follow-up question. The output that I have from the nested brew call includes output like this:

NANANANANANANANAN ... then a graph or a table ... then more NANANANANANANANANNANANANANANANANANNANANANANANANANANNANANANANANANANAN ... etc.

It only occurs in the nested brew calls, not in the top level document, which is absolutely fine. There are functions defined in the top level file which the lower level files make use of. I assumed the problem was caused by my not understanding the documentation to do with nested brew calls; evidently this is not the case.

I have several functions within the top file, one for drawing graphs, one for tables, another for wordclouds, etc. They all generate this NANANANANANANA behaviour, I have tested by putting them in and out of the code.

I tried to produce a minimal self contained example containing a function defined in the top level file used by a file called in a nested brew: however this file worked fine. I realise this isn't a lot to go on, but the functions are fairly long and it clearly isn't a specific issue with a particular function because they all do it.

Has anyone else ever had this happen to them? If so did you find a solution (other than manually removing the NAs using a final piece of code, which admittedly is not too arduous).

Many thanks,
Chris Beeley, Institute of Mental Health, UK

On 30/03/2012 02:27, Matt Shotwell wrote:
> On Wed, 2012-03-28 at 11:40 +0100, Chris Beeley wrote:
> > I am writing several webpages using the brew package and R2HTML. I would like to work off one script so I am using nested brew calls. The documentation for brew states that:
> >
> > NOTE: brew calls can be nested and rely on placing a function named '.brew.cat' in the environment in which it is passed. Each time brew is called, a check for the existence of this function is made. If it exists, then it is replaced with a new copy that is lexically scoped to the current brew frame. Once the brew call is done, the function is replaced with the previous function. The function is finally removed from the environment once all brew calls return.
> >
> > I'm afraid I can't quite figure out what it is I'm supposed to do here. I've tried loading the brew library within the script which I pass to brew, and I've tried defining brew cat like this:
>
> The paragraph above describes what brew is doing behind the scenes. It's not necessary to modify or set the .brew.cat function. A nested (or recursive) brew call occurs when brew() is called from a document currently being processed by brew(). To illustrate further, suppose there are two brew documents, example-1.brew and example-2.brew, where example-1.brew contains the following text (delimited by '''):
>
> '''
> This text is in example-1.brew.
> <%= brew::brew("example-2.brew") %>
> '''
>
> and the example-2.brew contains
>
> '''
> This text is in example-2.brew.
> <%= date() -%>
> '''
>
> Then from the R prompt we have:
>
> R> brew::brew("example-1.brew")
> This text is in example-1.brew.
> This text is in example-2.brew.
> Thu Mar 29 20:24:52 2012
>
> > .brew.cat=function(){}
> >
> > This generates the following error message:
> >
> > Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
> >
> > I think perhaps it is more likely that I need to insert into the script the actual content of .brew.cat, but I can't seem to get R to tell me what it is and Googling throws up a lot of stuff about beer and not much else (drew a blank also from RSiteSearch("Nested brew"))
> >
> > Any help gratefully received.
> >
> > Chris Beeley
> > Institute of Mental Health, UK
[R] Nested brew call yields Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
I am writing several webpages using the brew package and R2HTML. I would like to work off one script so I am using nested brew calls. The documentation for brew states that:

NOTE: brew calls can be nested and rely on placing a function named '.brew.cat' in the environment in which it is passed. Each time brew is called, a check for the existence of this function is made. If it exists, then it is replaced with a new copy that is lexically scoped to the current brew frame. Once the brew call is done, the function is replaced with the previous function. The function is finally removed from the environment once all brew calls return.

I'm afraid I can't quite figure out what it is I'm supposed to do here. I've tried loading the brew library within the script which I pass to brew, and I've tried defining brew cat like this:

  .brew.cat=function(){}

This generates the following error message:

  Error in .brew.cat(26, 28) : unused argument(s) (26, 28)

I think perhaps it is more likely that I need to insert into the script the actual content of .brew.cat, but I can't seem to get R to tell me what it is and Googling throws up a lot of stuff about beer and not much else (drew a blank also from RSiteSearch("Nested brew"))

Any help gratefully received.

Chris Beeley
Institute of Mental Health, UK
[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Hello-

I have rather a messy SPSS file which I have imported to R, I've dput'd some of the columns at the end of this message. I wish to get rid of all the labels and have numeric values using as.numeric. The funny thing is it works like this:

  as.numeric(mydata[,2]) # generates correct numbers

however, if I pass the whole dataframe at once like this:

  apply(mydata, 1:2, function(x) as.numeric(x))

This same column, column 2, generates NAs with an "In FUN(newX[, i], ...) : NAs introduced by coercion" warning message. Meanwhile column 3 works fine like this:

  as.numeric(mydata[,3]) # generates correct numbers

And generates numeric results out of the apply function.

I think I basically know why, the str() command tells me that the variables which work okay are "labelled" whereas the ones that don't are "Factor". However, I can't figure out what's special about the apply call that generates the NAs when as.numeric(mydata[,2]) doesn't and I'm not sure what to do about it in future. I realise I can just loop over the columns, but I would rather get to the bottom of this if I can so I know for future.

Thanks in advance for any advice

Chris Beeley
Institute of Mental Health, UK

dput() gives-

  structure(list(id = structure(1:79, label = structure("Participant", .Names = "id"), class = "labelled"), item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L, 2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L, 4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L, 3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L, 6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L, 4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all", "a little", "somewhat", "quite a lot", "very much", "missing data"), class = c("labelled", "factor"), label = structure("The patients care for each other", .Names = "item2_jan11")), item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L, 2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L, 5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L, 4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L, 4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L, 4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L, 999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L, 0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data", "very much", "quite a lot", "somewhat", "a little", "Not at all")), label = structure("At times, members of staff are afraid of some of the patients", .Names = "item12_jan11"), class = "labelled")), .Names = c("id", "item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA, -79L))
Re: [R] as.numeric() generates NAs inside an apply call, but fine outside of it
Perfect, many thanks for explanation and correct line of code.

On 09/01/2012 14:29, peter dalgaard wrote:
> as.data.frame(lapply(mydata, as.numeric))
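The explanation itself is not quoted in this thread, so the following is a hedged reconstruction of the usual reason, on made-up data (df, f and n are hypothetical names). apply() first converts a data frame to a matrix; with a factor column present that matrix is character, holding the level labels, and as.numeric() on a label such as "a little" gives NA. Calling as.numeric() on the factor column directly returns the integer codes instead, which is what the lapply() line above does column by column.

```r
# Hypothetical data frame with one factor column and one numeric column
df <- data.frame(
  f = factor(c("a little", "somewhat", "a little")),
  n = c(5, 3, 999)
)

# Directly on the column: integer codes of the factor levels
codes_direct <- as.numeric(df$f)

# Via apply(): the data frame becomes a character matrix first, so the
# factor cells are labels and cannot be coerced to numbers
via_apply <- suppressWarnings(apply(df, 1:2, function(x) as.numeric(x)))

# Column-wise conversion keeps each column's own type until it is coerced
fixed <- as.data.frame(lapply(df, as.numeric))

print(codes_direct)
print(via_apply)
print(fixed)
```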
[R] Basic question about re-writing for loop as a function
Hello-

Sorry to ask a basic question, but I've spent many hours on this now and seem to be missing something. I have a loop that looks like this:

  mainmat=data.frame(matrix(data=0, ncol=92, nrow=length(predata$Words_MH)))
  for(i in 1:length(predata$Words_MH)){
    for(j in 1:92){
      mainmat[i,j]=ifelse(j %in% as.numeric(unlist(strsplit(predata$Words_MH[i], split=","))), 1, 0)
    }
  }

What it's doing is creating a matrix with 92 columns, that's the number of different codes, and then for every row of my data it looks to see if the code (code 1, code 2, etc.) is in the string and if it is, returns a 1 in the relevant column (column 1 for code 1, column 2 for code 2, etc.)

There are 1000 rows in the database, and I have to run several versions of this code, so it just takes way too long, I have been trying to rewrite using lapply. I tried this:

  myfunction=function(x, y) ifelse(x %in% as.numeric(unlist(strsplit(predata$Words_MH[y], split=","))), 1, 0)
  for(j in 1:92){
    mainmat[,j]= lapply(predata$Words, myfunction)
  }

but I don't think I can use something that takes two inputs, and I can't seem to remove either.

Here's a dput of the first 10 rows of the variable in case that's helpful:

  predata$Words=c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")

Given these data, I want the function to return, for the first column, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0 (because those are the values of Words which contain a 1) and for the second column return 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 (because the fifth value is the only one that contains a 2).

Any suggestions gratefully received!

Chris Beeley
Institute of Mental Health, UK
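One way the double loop could be collapsed, sketched on the ten example values from the post (the names words, code_list and n_codes are introduced here for illustration): split every string once, then test the whole set of codes 1:92 against each row in a single %in% call. This is only one possible approach, not a reply from the list.

```r
# The ten example values given in the post
words <- c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")
n_codes <- 92

# Split each string once and convert the pieces to numbers
code_list <- lapply(strsplit(words, ",", fixed = TRUE), as.numeric)

# For each row, flag which of the codes 1..92 are present.
# vapply() returns a 92 x n matrix, so transpose to get one row per case.
mainmat <- t(vapply(code_list,
                    function(codes) as.numeric(seq_len(n_codes) %in% codes),
                    numeric(n_codes)))

print(mainmat[, 1])
print(mainmat[, 2])
```

The work per row is a single vectorised comparison rather than 92 separate calls to strsplit(), which is where most of the time in the original loop is likely to go.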
[R] odfWeave repeats output
Hello all-

I'm having a problem with odfWeave. I'm still testing it out, and have used both of these code chunks, which I copied off a blog:

Number 1:

  A sample document last processed \Sexpr{Sys.time()}. This simply illustrates the output from an R command inserted into our document. This is using \Sexpr{version$version.string}.

Number 2:

  <<Sample1>>=
  summary(iris)
  @

Both do the same thing, which is generate the document using this code:

  odfWeave("/media/Windows7/temp/GCAMT_in.odt", "/media/Windows7/temp/GCAMT_out2.odt")

But the output repeats over and over in the document in a bizarre way, stretching out over about 9 pages, like this (abbreviated):

  A sample document last processed 2011-08-12 09:55:51. This simply illustrates the output from an R command inserted into our document. This is using R version 2.12.1 (2010-12-16).
  ...
  A sample document last processedA sample document last processed 2011-08-12 09:55:51.2011-08-12 09:55:51. This simply illustrates the output from anThis simply illustrates the output from an R command inserted into our document.R command inserted into our document. This is using R version 2.12.1 (2010-12-16).This is using R version 2.12.1 (2010-12-16).
  ... etc.

The really weird thing is that I have replicated the problem across two operating systems (dual boot on the same computer), windows 7 64bit and Linux Mint 11 (which is Ubuntu, not sure which version I'm afraid). I've been unable to find anyone on any forums or anything with the same problem. Using R v2.13 on Windows, v 2.12 on Linux, was using RStudio but just tested it without (just in case) and it does the same thing.

Any suggestions gratefully received.

Chris Beeley
Institute of Mental Health, UK.

Output of the operation is below.

With this output:

  odfWeave("/media/Windows7/temp/GCAMT_in.odt", "/media/Windows7/temp/GCAMT_out2.odt")
  Copying /media/Windows7/temp/GCAMT_in.odt
  Setting wd to /tmp/RtmpAwd1Bm/odfWeave12095551677
  Unzipping ODF file using unzip -o GCAMT_in.odt
  Archive: GCAMT_in.odt
   extracting: mimetype
     creating: Configurations2/statusbar/
    inflating: Configurations2/accelerator/current.xml
     creating: Configurations2/floater/
     creating: Configurations2/popupmenu/
     creating: Configurations2/progressbar/
     creating: Configurations2/toolpanel/
     creating: Configurations2/menubar/
     creating: Configurations2/toolbar/
     creating: Configurations2/images/Bitmaps/
    inflating: content.xml
    inflating: manifest.rdf
    inflating: styles.xml
   extracting: meta.xml
    inflating: Thumbnails/thumbnail.png
    inflating: settings.xml
    inflating: META-INF/manifest.xml
  Removing GCAMT_in.odt
  Creating a Pictures directory
  Pre-processing the contents
  Sweaving content.Rnw
  Writing to file content_1.xml
  Processing code chunks ...
  'content_1.xml' has been Sweaved
  Removing content.xml
  Post-processing the contents
  Removing content.Rnw
  Removing styles.xml
  Renaming styles_2.xml to styles.xml
  Removing manifest.xml
  Renaming manifest_2.xml to manifest.xml
  Removing extra files
  Packaging file using zip -r GCAMT_in.odt .
    adding: mimetype (stored 0%)
    adding: content.xml (deflated 98%)
    adding: settings.xml (deflated 84%)
    adding: meta.xml (deflated 57%)
    adding: META-INF/ (stored 0%)
    adding: META-INF/manifest.xml (deflated 83%)
    adding: styles.xml (deflated 93%)
    adding: manifest.rdf (deflated 54%)
    adding: Pictures/ (stored 0%)
    adding: Thumbnails/ (stored 0%)
    adding: Thumbnails/thumbnail.png (deflated 23%)
    adding: Configurations2/ (stored 0%)
    adding: Configurations2/progressbar/ (stored 0%)
    adding: Configurations2/images/ (stored 0%)
    adding: Configurations2/images/Bitmaps/ (stored 0%)
    adding: Configurations2/toolbar/ (stored 0%)
    adding: Configurations2/menubar/ (stored 0%)
    adding: Configurations2/statusbar/ (stored 0%)
    adding: Configurations2/popupmenu/ (stored 0%)
    adding: Configurations2/accelerator/ (stored 0%)
    adding: Configurations2/accelerator/current.xml (stored 0%)
    adding: Configurations2/floater/ (stored 0%)
    adding: Configurations2/toolpanel/ (stored 0%)
  Copying GCAMT_in.odt
  Resetting wd
  Removing /tmp/RtmpAwd1Bm/odfWeave12095551677
  Done
[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe
Hello-

Sorry, this is a bit of a noob question, but I can't seem to progress it any further. I have two dataframes which contain a series of strings which exactly match. The problem is one has more rows than the other (more cases have been added) and they have been sorted so that they are not in the same order.

The smaller dataframe, though, contains another column which has codes classifying the strings. So, for every row of the larger dataframe, I want to look up the string in the smaller dataframe, and then use that row number to copy across the code for the string into the larger dataframe.

Here's my idea so far:

  # comments is the smaller dataframe with the codes, mydata is the larger dataframe to which I would like to copy it.

  commvec=charmatch(comments$ImproveOne, mydata$Improve) # this is the match between the strings one way
  datavec=charmatch(mydata$Improve, comments$ImproveOne) # this is the match the other way

  mydata$ImproveCat1=NA # produce a variable to hold the copied codes

  mydata$ImproveCat1[datavec[!is.na(datavec)]]= comments$ImproveCat[commvec[!is.na(commvec)]]
  # for all the non missing row numbers identified in the larger dataframe-
  # copy the corresponding code from the smaller dataframe (which lives in comments$ImproveCat)

However, the last command doesn't work because the variables are not the same length. They nearly are though, not sure if that's coincidence or shows I'm close

  length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567
  length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512

I'm sorry, I did try to construct an example dataframe, but ironically I can't make that work either! Sorry! Any help gratefully received. Many thanks!

Chris Beeley
Institute of Mental Health, UK
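A lookup of this kind can be sketched with match(), which returns, for each string in the larger dataframe, the row of the smaller dataframe where that exact string occurs (or NA if it does not). The two small dataframes below are invented for illustration and reuse the column names from the post; the strings and codes are hypothetical.

```r
# Hypothetical smaller dataframe: strings plus their category codes
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "quieter ward"),
  ImproveCat = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical larger dataframe: same strings in a different order,
# one of them repeated, plus one string that has not been coded yet
mydata <- data.frame(
  Improve = c("better food", "more staff", "new comment",
              "quieter ward", "more staff"),
  stringsAsFactors = FALSE
)

# Row of `comments` matching each row of `mydata` (NA when there is none)
idx <- match(mydata$Improve, comments$ImproveOne)

# Indexing by idx copies the code across; unmatched rows stay NA
mydata$ImproveCat1 <- comments$ImproveCat[idx]

print(mydata)
```

Because the result of match() always has the same length as the larger dataframe, the length mismatch described above does not arise. merge() with all.x = TRUE would be an alternative, at the cost of reordering rows.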
[R] Replace selected columns of a dataframe with NA
I am using the following command to replace all the missing values and assorted typos in a dataframe with NA:

  mydata[mydata>80]=NA

The problem is that the first column contains values which should be more than 80, so really I want to do it just for mydata[,2:length(mydata)]

I can't seem to re-write the code to fit:

  mydata[,2:length(mydata)>80]=NA # no error message, but doesn't work- doesn't do anything, it would seem

I realise I can just keep the first column somewhere safe and copy it back again when I'm done, but I wondered if there was a more elegant solution, which would be much more important, if say I just wanted to replace the odd columns, or something like that.

I found this code on the internet too:

  idx <- which(foo>80, arr.ind=TRUE)
  foo[idx[1], idx[2]] <- NA

But I can't seem to rewrite that either, for the same reason

Many thanks!

Chris Beeley
Institute of Mental Health
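A minimal sketch of one way to restrict the replacement to chosen columns, on made-up data (the columns id, a and b are hypothetical): select the columns first, then index that selection with the logical test. In the attempt above, 2:length(mydata)>80 is evaluated as a comparison on the column numbers themselves, which is all FALSE, so no columns are selected.

```r
# Hypothetical data: the first column legitimately holds values above 80
mydata <- data.frame(
  id = c(101, 102, 103),
  a  = c(1, 99, 3),
  b  = c(888, 2, 5)
)

# Columns to clean; any index vector works here, e.g. seq(1, ncol(mydata), 2)
cols <- 2:ncol(mydata)

# Select the columns, then replace the offending cells within that selection
mydata[cols][mydata[cols] > 80] <- NA

print(mydata)
```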
[R] Subset command and the : operator
Hello-

I have some code that looks like this:

  with(mydatalocal, sum(table(Service[Time==5:8])))

This is designed to add up the numbers of responses between the Time codes 5 to 8 (which are integers and refer to quarters). Service is just one of the variables, I'm just trying to count the number of responses so I picked any of the variables.

However, there is something wrong, it returns far too low a number for the number of responses. Indeed, if I run this:

  with(mydatalocal, sum(table(Service[Time==5|Time==6|Time==7|Time==8])))

I get 4 times as many responses.

I've tried to recreate the problem with the following code:

  mydata=data.frame(matrix(c(rep(1, 10), rep(2, 10), rep(3, 10), seq(1, 10, 1), seq(11, 20, 1), seq(21, 30, 1)), ncol=2))

  with(mydata, sum(table(X1[X2==9:12])))
  with(mydata, sum(table(X1[X2==9|X2==10|X2==11|X2==12])))

but to my immense frustration it actually seems to work fine there, the same number, 4, both times. However, it does generate the following error message:

  In X2 == 9:12 : longer object length is not a multiple of shorter object length

I know I can use X1[Time<9 & Time>3] but I would like to know what is wrong with the 5:8 usage in case I put it somewhere else and don't notice the problem.

Many thanks!

Chris Beeley
Institute of Mental Health
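A short illustration of the likely cause, on a made-up vector: == compares element by element and recycles the shorter side, so Time == 5:8 asks whether the first value is 5, the second is 6, the third is 7, the fourth is 8, the fifth is 5 again, and so on. Only values that happen to sit in the matching position count, which would explain getting roughly a quarter of the expected total. %in% asks the intended question, namely whether each value is any of 5, 6, 7 or 8.

```r
# Hypothetical Time codes, all of them within 5..8
Time <- c(5, 6, 7, 8, 8, 7, 6, 5)

# Elementwise comparison against the recycled pattern 5,6,7,8,5,6,7,8
n_equals <- sum(Time == 5:8)

# Set membership: is each value one of 5, 6, 7, 8?
n_in <- sum(Time %in% 5:8)

print(c(n_equals, n_in))
```

In the reproduction attempt in the post the values 9 to 12 happen to line up with the recycled pattern, which is presumably why both versions returned 4 there.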
Re: [R] How to remove rows based on frequency of factor and then difference date scores
Many thanks to you both. I have now filed away for future reference the 2 factor tapply as well as the extremely useful looking plyr library. And the code worked beautifully :-)

On 24 Aug 2010, at 19:47, Abhijit Dasgupta, PhD <aikidasgu...@gmail.com> wrote:
> The paste-y argument is my usual trick in these situations. I forget that tapply can take multiple ordering arguments :)
>
> Abhijit
>
> On 08/24/2010 02:17 PM, David Winsemius wrote:
> > On Aug 24, 2010, at 1:59 PM, Abhijit Dasgupta, PhD wrote:
> > > The only problem with this is that Chris's unique individuals are a combination of Type and ID, as I understand it. So Type=A, ID=1 is a different individual from Type=B,ID=1. So we need to create a unique identifier per person, simplistically by uniqueID=paste(Type, ID, sep=''). Then, using this new identifier, everything follows.
> >
> > I see your point. I agree that a tapply method should present both factors in the indices argument.
> >
> >   new.df <- txt.df[ -which( txt.df$nn <=1), ]
> >   new.df <- new.df[ with(new.df, order(Type, ID) ), ] # and possibly needs to be ordered?
> >   new.df$diffdays <- unlist( tapply(new.df$dt2, list(new.df$ID, new.df$Type), function(x) x[1] -x) )
> >   new.df
> >     Type ID       Date Value        dt2 nn diffdays
> >   1    A  1 16/09/2020     8 2020-09-16  3        0
> >   2    A  1 23/09/2010     9 2010-09-23  3     3646
> >   4    B  1  13/5/2010     6 2010-05-13  3        0
> >
> > But do not agree that you need, in this case at least, to create a paste()-y index.
>
> Agreed, however, such a construction can be useful in other situations.
>
> --
> Abhijit Dasgupta, PhD
> Director and Principal Statistician
> ARAASTAT
> Ph: 301.385.3067
> E: adasgu...@araastat.com
> W: http://www.araastat.com
[R] How to remove rows based on frequency of factor and then difference date scores
Hello-

A basic question which has nonetheless floored me entirely. I have a dataset which looks like this:

    Type  ID  Date        Value
    A     1   16/09/2020  8
    A     1   23/09/2010  9
    B     3   18/8/2010   7
    B     1   13/5/2010   6

There are two Types, which correspond to different individuals in different conditions, and loads of ID labels (1:50) corresponding to the different individuals in each condition, and measurements at different times (from 1 to 10 measurements) for each individual. I want to perform the following operations:

1) Delete all individuals for whom only one measurement is available. In the dataset above, you can see that I want to delete the row Type B ID 3, and Type B ID 1, but without deleting the Type A ID 1 data because there is more than one measurement for Type A ID 1 (but not for Type B ID 1)

2) Produce difference scores for each of the Dates, so each individual (Type A ID 1 and all the others for whom more than one measurement exists) starts at Date 1 and goes up in integers according to how many days have elapsed.

I just know there's some incredibly cunning R-ish way of doing this but after many hours of fiddling I have had to admit defeat. I would be very grateful for any words of advice.

Many thanks,
Chris Beeley, Institute of Mental Health, UK
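A minimal base-R sketch of the two steps asked for above, on made-up data in the same shape (visits, n_obs, kept and Days are hypothetical names, and the dates are invented). It assumes an individual is identified by the combination of Type and ID, and that the first measurement should count as day 0; add 1 if the count should start at 1. It uses ave() for the per-group work, which is a different route from the tapply and plyr approaches mentioned elsewhere in the thread.

```r
# Hypothetical data in the shape described in the post.
visits <- data.frame(
  Type  = c("A", "A", "B", "B", "A", "A"),
  ID    = c(1, 1, 3, 1, 2, 2),
  Date  = c("16/09/2010", "23/09/2010", "18/8/2010",
            "13/5/2010", "01/03/2010", "11/03/2010"),
  Value = c(8, 9, 7, 6, 5, 4),
  stringsAsFactors = FALSE
)
visits$Date <- as.Date(visits$Date, format = "%d/%m/%Y")

# One group per observed Type/ID combination (unused combinations dropped).
grp <- interaction(visits$Type, visits$ID, drop = TRUE)

# Step 1: count measurements per individual and keep those with more than one.
n_obs <- ave(seq_along(grp), grp, FUN = length)
kept  <- visits[n_obs > 1, ]
kept  <- kept[order(kept$Type, kept$ID, kept$Date), ]

# Step 2: days elapsed since each individual's earliest measurement.
grp_kept  <- interaction(kept$Type, kept$ID, drop = TRUE)
kept$Days <- ave(as.numeric(kept$Date), grp_kept,
                 FUN = function(d) d - min(d))

kept
```

Because ave() returns its result in the original row order, the sort by Type, ID and Date is only for readability and is not needed for the day counts to be correct.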
Re: [R] Odp: Problem with aggregating data across time points
on until 31-12-2009. The last four variables which you can see at the end of the email are my dependent variables, they are different types of violent and self harming behaviour shown by patients in a psychiatric hospital. What I want to do is: A) sum each of the dependent variables for each of the dates (so e.g. in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and 3+4=7 for each of the variables) B) do this sum, but only in each location this time (location is the first variable)- so the sum for 1-4-2007 in location A, sum for 1-4-2007 in location B, and so on and so on. Because this is divided across locations, some dates will have no data going into them and will return 0 sums. Crucially I still want these dates to appear- so e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 0 0, then 23-5-2008 0 0 0 0 again, and etc. I've had several abortive attempts and done some Googling but have got nowhere. I'd greatly appreciate any advice. Many thanks, Chris Beeley (Institute of Mental Health, UK) structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c(, A, B, C, D, E, F), class = factor), Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 
1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c(, 01/04/07, 02/04/07, 03/04/07 ), class = factor), Time = structure(c(1L, 28L, 1L, 1L, 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, 7L, 12L, 1L, 15L), .Label = c(, , 02:24:00, 03:44:00, 04:30:00, 07:00:00, 08:35:00, 09:20:00, 09:30:00, 10:00:00, 10:15:00, 10:45:00, 11:00:00, 11:20:00, 11:30:00, 11:35:00, 11:50:00, 12:00:00, 12:25:00, 12:30:00, 12:45:00, 15:00:00, 15:15:00, 15:30:00, 15:35:00, 17:15:00, 17:50:00, 18:00:00, 19:00:00, 19:30:00, 19:50:00, 20:00:00, 20:30:00, 20:55:00, 22:15:00, 22:30:00, 22:35:00, 22:40:00, 23:10:00 ), class = factor), verbal = c(NA, 3L, NA, NA, NA, 3L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA, 0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 3L, 0L, NA, 0L, 
0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names = c(Location, Sex, Date, Time, verbal, self.harm, violence_objects, violence), class = data.frame, row.names = c(NA, -73L))

David Winsemius, MD West Hartford, CT
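A minimal sketch of one way to approach the aggregation asked about in this thread, using base R on a small made-up data frame (incidents, all_dates, by_date and by_loc are hypothetical names, and this is not the answer given on the list). It assumes the Date column has already been converted with as.Date(). The idea is to sum with aggregate() and then merge against a complete sequence of dates, so that days with no records still appear with zero sums.

```r
# Hypothetical data in the shape described in the post.
incidents <- data.frame(
  Location         = c("A", "A", "D", "A"),
  Date             = as.Date(c("2007-04-01", "2007-04-01",
                               "2007-04-02", "2007-04-04")),
  verbal           = c(3, 2, 0, 1),
  self.harm        = c(0, 1, 4, 0),
  violence_objects = c(1, 2, 0, 0),
  violence         = c(3, 4, 0, 2),
  stringsAsFactors = FALSE
)
outcomes <- c("verbal", "self.harm", "violence_objects", "violence")

# Every day in the observed window, including days with no records.
all_dates <- seq(min(incidents$Date), max(incidents$Date), by = "day")

# (A) Sums per date, then pad the missing dates with zeros.
by_date <- aggregate(incidents[outcomes],
                     by = list(Date = incidents$Date),
                     FUN = sum, na.rm = TRUE)
by_date <- merge(data.frame(Date = all_dates), by_date, all.x = TRUE)
by_date[outcomes][is.na(by_date[outcomes])] <- 0

# (B) Sums per date within each location, padded the same way.
grid <- expand.grid(Date = all_dates,
                    Location = sort(unique(incidents$Location)),
                    stringsAsFactors = FALSE)
by_loc <- aggregate(incidents[outcomes],
                    by = list(Date = incidents$Date,
                              Location = incidents$Location),
                    FUN = sum, na.rm = TRUE)
by_loc <- merge(grid, by_loc, all.x = TRUE)
by_loc[outcomes][is.na(by_loc[outcomes])] <- 0
```

Rows whose Date or Location is NA are dropped by aggregate(), and na.rm = TRUE is passed through to sum() so that missing outcome values do not turn a whole day's total into NA. Whether that is the right treatment of the blank rows in the real data is an assumption.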
[R] Problem with aggregating data across time points
Hello-

I have a dataset which basically looks like this:

    Location  Sex  Date      Time  Verbal  Self harm  Violence_objects  Violence
    A         1    1-4-2007  1800  3       0          1                 3
    A         1    1-4-2007  1230  2       1          2                 4
    D         2    2-4-2007  1100  0       4          0                 0
    ...

I've put a dput of the first section of the data at the end of this email. Basically I have these data for several days across all of the dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on until 31-12-2009. The last four variables which you can see at the end of the email are my dependent variables, they are different types of violent and self harming behaviour shown by patients in a psychiatric hospital.

What I want to do is:

A) sum each of the dependent variables for each of the dates (so e.g. in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and 3+4=7 for each of the variables)

B) do this sum, but only in each location this time (location is the first variable)- so the sum for 1-4-2007 in location A, sum for 1-4-2007 in location B, and so on and so on.

Because this is divided across locations, some dates will have no data going into them and will return 0 sums. Crucially I still want these dates to appear- so e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 0 0, then 23-5-2008 0 0 0 0 again, and etc.

I've had several abortive attempts and done some Googling but have got nowhere. I'd greatly appreciate any advice.
Many thanks, Chris Beeley (Institute of Mental Health, UK) structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c(, A, B, C, D, E, F), class = factor), Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c(, 01/04/07, 02/04/07, 03/04/07 ), class = factor), Time = structure(c(1L, 28L, 1L, 1L, 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, 7L, 12L, 1L, 15L), .Label = c(, , 02:24:00, 03:44:00, 04:30:00, 07:00:00, 08:35:00, 09:20:00, 09:30:00, 10:00:00, 10:15:00, 10:45:00, 11:00:00, 11:20:00, 11:30:00, 11:35:00, 11:50:00, 12:00:00, 12:25:00, 12:30:00, 12:45:00, 15:00:00, 15:15:00, 15:30:00, 15:35:00, 17:15:00, 17:50:00, 18:00:00, 19:00:00, 19:30:00, 19:50:00, 20:00:00, 20:30:00, 20:55:00, 22:15:00, 22:30:00, 22:35:00, 22:40:00, 23:10:00 ), class = factor), verbal = c(NA, 3L, NA, NA, NA, 3L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 
0L, NA, 0L, 0L, NA, 0L, NA, NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA, 0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names = c(Location, Sex, Date, Time, verbal, self.harm