Re: [R] Word cloud based on a specific site
Hi Jim and Bert!

Thank you so much! I had already searched for "word cloud r". However, it returns a lot of resources that show how to make a word cloud from a dataset. For example, [1] is one of the best resources, and I understand how to make a word cloud from a CSV file.

However, I am interested in another approach. I would like to:

1. access a site, e.g. www.foo.bar
2. search inside it for a word, e.g. "tree"
3. make a word cloud from all the other words on that page (e.g. index.html)
4. download all the words on that page
5. end up with a word cloud that shows which words are most frequently linked to "tree" on a specific site.

Is this possible in R? If it is, I will do a Google search. Do you have a suggestion for search terms, like the one Jim gave me ("word cloud R", for example)? I am not a native English speaker, and I find it difficult to come up with the correct terms to search for.

[1] http://www.datascribble.com/blog/data-science/r/building-word-cloud-r/

Thank you! Have a nice weekend!

Marcelo

On 29/03/21 at 07:16, Bert Gunter wrote:
> Also (I think):
> https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
> (get to know the CRAN resources!).
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Mon, Mar 29, 2021 at 7:12 PM Jim Lemon <drjimle...@gmail.com> wrote:
> > Hi Marcelo,
> > Just google for "word cloud r". Too much information.
> > Jim

-- 
Marcelo

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Word cloud based on a specific site
Hi,

I would like to make a word cloud from a specific site, related to a specific word.

For example, I might be interested in discovering which words are linked to the word "tree" on a site like www.foo.bar, and in having the result as a word cloud image.

Could someone please point me to a package, a reference, a tutorial, or something else?

Thank you so much!

-- 
Marcelo
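The steps described in this thread (fetch a page, check that it mentions the keyword, count the other words) could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the `html` lines are made up, the tag stripping is a crude regular expression rather than a real HTML parser, and the final plotting step is left as a comment because it needs an add-on package such as wordcloud.

```r
# Hypothetical page content; in practice it could come from something like
# readLines("http://www.foo.bar/index.html", warn = FALSE).
html <- c(
  "<html><body><h1>Tree care</h1>",
  "<p>A tree needs water. A tree needs light and good soil.</p>",
  "<p>Water the soil around the tree.</p></body></html>"
)

keyword <- "tree"

# 1. Drop the HTML tags and lower-case the text (crude, assumes simple markup).
text <- tolower(gsub("<[^>]+>", " ", paste(html, collapse = " ")))

# 2. Split into words on anything that is not a letter.
words <- strsplit(text, "[^a-z]+")[[1]]
words <- words[nchar(words) > 2]   # drop empty strings and very short words

# 3. Only count the other words if the keyword occurs on the page.
if (keyword %in% words) {
  freq <- sort(table(words[words != keyword]), decreasing = TRUE)
  print(freq)
  # With the wordcloud package installed, the counts could then be drawn with
  # wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
}
```

Useful search terms for the fetching step are "web scraping in R" and "text mining in R".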
Re: [R] ggplot2 stat_smooth formula different units
Hi Rui, You are very welcome! On 11/11/20 at 11:10, Rui Barradas wrote: > > dput(head(dat, 20)) > structure(list(Bloco = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Espacamento = c("3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1", "3 x 1"), Clone = c("AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020", "AEC 0020"), Sulco = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L), Arvore = c(1L, 3L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 1L, 2L), DAP = c(7, 7.73, 7.64, 9.61, 11.94, 11.46, 11.68, 11.84, 13.37, 11.14, 10.5, 12.19, 7.23, 8.94, 9.99, 12.67, 5.09, 6.37, 10.28, 8.12), Altura = c(14.8, 17.2, 14.8, 17.2, 18.5, 19.2, 19.2, 18, 19.3, 18.2, 18.1, 18.1, 15.7, 17.1, 19.3, 19.2, 10.9, 13.2, 17.1, 16.5), Observacao = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "")), row.names = c(1L, 3L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L, 21L, 22L, 23L), class = "data.frame") -- Marcelo __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2 stat_smooth formula different units
Hi,

I am running these two approaches:

Model 1

ggplot(dat, aes(x = DAP, y = Altura, color = as.factor(Espacamento))) +
  geom_point(size = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
  facet_grid(Espacamento ~ Clone) +
  theme(legend.position = "none")

Model 2

ggplot(dat, aes(x = DAP, y = Altura, color = as.factor(Espacamento))) +
  geom_point(size = 0.5) +
  stat_smooth(method = "lm", formula = I(log(y)) ~ I(1/x), size = 1) +
  facet_grid(Espacamento ~ Clone) +
  theme(legend.position = "none")

In model 1, both the original values and the fitted values are plotted in the same units. In model 2, however, the points are plotted in the original units while the fitted curve is on the log scale.

I know that exp(fitted(model2)) does the trick and returns the fitted values to the original units, but I don't know how to do this inside the stat_smooth function.

Do you have a tip to help me?

Thank you!

-- 
Marcelo
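One way around the units problem is to fit model 2 outside ggplot2 and back-transform the predictions before plotting. The sketch below uses base R only and a handful of the DAP/Altura values posted in this thread; the object names `fit` and `grid` are just illustrative, and a single curve is fitted here rather than one per facet.

```r
# Small illustrative data set: first ten DAP/Altura values from the dput() in this thread.
dat <- data.frame(
  DAP    = c(7, 7.73, 7.64, 9.61, 11.94, 11.46, 11.68, 11.84, 13.37, 11.14),
  Altura = c(14.8, 17.2, 14.8, 17.2, 18.5, 19.2, 19.2, 18, 19.3, 18.2)
)

# Model 2 fitted outside ggplot2: log(height) as a linear function of 1/diameter.
fit <- lm(log(Altura) ~ I(1 / DAP), data = dat)

# Predict on a grid of diameters and back-transform to the original units.
grid <- data.frame(DAP = seq(min(dat$DAP), max(dat$DAP), length.out = 50))
grid$Altura <- exp(predict(fit, newdata = grid))

print(head(grid))
```

The resulting `grid` data frame is in the same units as the points, so it could be drawn on top of them with a line layer such as `geom_line(data = grid)`. For the faceted plot, the same fit would have to be repeated for each Espacamento and Clone combination.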
Re: [R] Error in Cairo::Cairo(file = imgName, unit = "in", dpi = dpi, width = w, : Failed to create Cairo backend!
1. https://github.com/jsychong/MetaboAnalystR 2. https://www.dropbox.com/s/rchhnjg0gziwr2l/heatmap_1_dpi72.png?dl=0 3. https://www.dropbox.com/s/rchhnjg0gziwr2l/heatmap_1_dpi72.png?dl=0 On 04/03/19 at 02:41, Marcelo Laia wrote: > Hi, > > I'm trying to do a MetaboAnalystR [1]'s analysis with a large dataset. All > works > great except PlotHeatMap function. This functions plot two type of image > output: > "overview" and "detail". In "overview" mode, we can do plot the image in png > or > pdf. However, in this mode, we do not could see the heatmap genes labels [2]. > If > I try to "detail" mode, in pdf graphics device, an output image [3] is > generated. However, it wasn't opened in acroread, evince. It is only viewed in > xpdf. If I try to "detail" mode, in png graphics device, I got the error: > > Error in Cairo::Cairo(file = imgName, unit = "in", dpi = dpi, width = w, : > Failed to create Cairo backend! > > I figured out that this error is not MetaboAnalystR related. Maybe it is > related > with Cairo package/library. > > Someone already/yet having had this issue? Are there workaround for that? > > Best Wishes! > > -- > Marcelo -- Marcelo __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in Cairo::Cairo(file = imgName, unit = "in", dpi = dpi, width = w, : Failed to create Cairo backend!
Hi,

I'm trying to run a MetaboAnalystR [1] analysis with a large dataset. Everything works great except the PlotHeatMap function. This function produces two types of image output: "overview" and "detail". In "overview" mode, we can plot the image as PNG or PDF. However, in this mode we cannot see the heatmap gene labels [2]. If I try "detail" mode with the PDF graphics device, an output image [3] is generated. However, it cannot be opened in acroread or evince; it can only be viewed in xpdf. If I try "detail" mode with the PNG graphics device, I get the error:

Error in Cairo::Cairo(file = imgName, unit = "in", dpi = dpi, width = w, :
  Failed to create Cairo backend!

I figured out that this error is not related to MetaboAnalystR. Maybe it is related to the Cairo package/library.

Has anyone had this issue? Is there a workaround?

Best wishes!

-- 
Marcelo
[R] To compare and filter text (mining data)
Hi,

I have an experiment like this:

Trat  Rep  Peak  CAS
1     1    1     123-92-2
1     1    2     109-21-7
1     1    3     2867-05-2
1     1    ...   ...
1     1    33    99-86-5
1     2    1     562-74-3
1     2    2     123-92-2
1     2    3     109-21-7
1     2    ...   ...
1     2    45    2867-05-2
...
14    3    18    2867-05-2

Trat = Treatment, ranging from 1 to 14
Rep  = Biological replicate, ranging from 1 to 3
Peak = Peak from the GC/MS chromatogram, ranging from 1 to n (n > 1)
CAS  = oil CAS number [1]

I would like to compare all 14 treatments (3 replicates each) and print only the Trat, Rep and Peak that have an exclusive CAS, along with the CAS number, of course. In other words, I would like to know whether there are CAS numbers exclusive to a specific treatment.

Is it possible to do this in R? Could you share some code, a paper or a tutorial that does this, or point me to an R package/library?

Thank you very much!

1. https://www.cas.org/content/chemical-substances/faqs

-- 
Marcelo
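A minimal base R sketch of the comparison asked for above, assuming "exclusive" means a CAS number that occurs in exactly one treatment (in any replicate). The `peaks` data frame is a made-up toy example in the same layout as the table, not the real data.

```r
# Hypothetical toy data in the same layout as the table above.
peaks <- data.frame(
  Trat = c(1, 1, 1, 2, 2, 2, 3, 3),
  Rep  = c(1, 2, 1, 1, 2, 3, 1, 2),
  Peak = c(1, 1, 2, 1, 2, 1, 1, 2),
  CAS  = c("123-92-2", "123-92-2", "109-21-7",
           "123-92-2", "2867-05-2", "2867-05-2",
           "123-92-2", "109-21-7")
)

# Count in how many distinct treatments each CAS number occurs.
n_trat <- tapply(peaks$Trat, peaks$CAS, function(x) length(unique(x)))

# A CAS number is treated as exclusive when it occurs in exactly one treatment.
exclusive_cas <- names(n_trat)[n_trat == 1]

# Keep the Trat, Rep, Peak and CAS of every row carrying an exclusive CAS.
exclusive <- peaks[peaks$CAS %in% exclusive_cas, ]
print(exclusive)
```

If "exclusive" should also require the CAS to appear in all three replicates of that treatment, an extra count over Rep within each treatment would be needed.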
[R] qRT-PCR Sample Maximization software analysis and proper experimental design advice
Hello,

After some very good advice here [1], I have been learning about qRT-PCR experimental design, and I found Hellemans et al., 2007 [2], as suggested by Jo, and Rieu and Powers, 2009 [3].

So I made an experimental design of my own [4]. However, I am not sure what software I could use to do the analysis afterwards. There is qBase+, but we don't have a grant to buy it. So we are trying a workaround in R, which is open source.

Do you have any suggestions about:

A. My experimental design (http://goo.gl/68h1ul): is it correct? If not, what would you suggest?

B. How could I analyse my experimental design? What software could I use? Is there an R package for that? I have been learning about the EasyqPCR package, but it may need an IRC (control) in every run (plate).

1. https://groups.yahoo.com/neo/groups/qpcrlistserver/conversations/topics/11975
2. http://genomebiology.com/2007/8/2/r19
3. http://dx.doi.org/10.1105%2Ftpc.109.066001
4. http://goo.gl/68h1ul

Thank you very much!

Marcelo Luiz de Laia
Universidade Federal dos Vales do Jequitinhonha e Mucuri
www.ufvjm.edu.br
Brazil

-- 
Laia, M. L.
Re: [R] [UPDATE] grofit issues with replicates - probit or logit or glmm
I made a mistake when I pasted the script into the message. Here is an update:

url.csv <- "https://dl.dropboxusercontent.com/u/34009642/cepajabo07_wide_acumulado.csv"
data02 <- read.table(url.csv, header = TRUE, sep = "\t", dec = ",")
head(data02)

timepoints <- 1:5  # 5 days
# 5 days and 120 experiments (6 iso * 4 doses * 5 rep)
time <- t(matrix(rep(timepoints, 120), c(5, 120)))
time

MyOpt1 <- grofit.control(smooth.gc = 0.5, parameter = 28, interactive = FALSE)
MyOpt2 <- grofit.control(smooth.gc = 0.5, parameter = 28, interactive = FALSE,
                         log.x.dr = TRUE)

TestRun1 <- grofit(time, data02, TRUE, MyOpt1)
TestRun2 <- grofit(time, data02, TRUE, MyOpt2)

TestRun1$drFit
TestRun2$drFit

colData <- c("black", "cyan", "magenta", "blue")
plot(TestRun1$gcFit, opt = "s", colData = colData, colSpline = 1, pch = 1:4, cex = 1)
plot(TestRun2$gcFit, opt = "s", colData = colData, colSpline = 1, pch = 1:4, cex = 1)
plot(TestRun1$drFit$drFittedSplines[[1]], colData = colData, pch = 1:4, cex = 1)
plot(TestRun2$drFit$drFittedSplines[[1]], colData = colData, pch = 1:4, cex = 1)

Thank you very much!

Marcelo
[R] grofit issues with replicates - probit or logit or glmm
Hello,

I tried to use the grofit package on our data set.

We provide a subset of our data with X isolates (iso) and 4 doses; dead insects were counted each day for 5 days. We started with Y insects per dish. When an insect died, it was counted and removed. The dead-insect count is cumulative over the following days: e.g., if 1 insect died on day 1 and none died on day 2, day 2 is still assigned 1 dead (from day 1).

Here is the script:

library(lattice)
library(grofit)
library(repmis)

url.csv <- "https://dl.dropboxusercontent.com/u/34009642/cepajabo07_wide_acumulado.csv"
data02 <- read.table(url.csv, header = TRUE, sep = "\t", dec = ",")
head(data02)

timepoints <- 1:5  # 5 days
# 5 days and 120 experiments (6 iso * 4 doses * 5 rep)
time <- t(matrix(rep(timepoints, 120), c(5, 120)))
time

TestRun1$drFit
TestRun2$drFit

colData <- c("black", "cyan", "magenta", "blue")
plot(TestRun1$gcFit, opt = "s", colData = colData, colSpline = 1, pch = 1:4, cex = 1)
plot(TestRun2$gcFit, opt = "s", colData = colData, colSpline = 1, pch = 1:4, cex = 1)
plot(TestRun1$drFit$drFittedSplines[[1]], colData = colData, pch = 1:4, cex = 1)
plot(TestRun2$drFit$drFittedSplines[[1]], colData = colData, pch = 1:4, cex = 1)

The problem: grofit does not handle replicates and fits a separate curve for each one.

Is there a way to get a response curve that uses the replicates? We are interested in the LD50, the dose-response curve, and graphs.

Any suggestion is very welcome!

Thank you!

-- 
Marcelo
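As an alternative to one spline per replicate, a single binomial GLM can take every replicate dish as one observation and give one curve and one LD50. The following is only a sketch under assumptions: the counts in `bioassay` are invented, doses are taken to be on the log10 scale, and only the final-day mortality is modelled, so the cumulative daily counts are not used.

```r
library(MASS)  # provides dose.p(); MASS ships with standard R installations

# Hypothetical counts: 5 replicate dishes per dose, 20 insects per dish.
bioassay <- data.frame(
  log_dose = rep(c(4, 5, 6, 7), each = 5),
  dead     = c(2, 3, 1, 2, 4,
               6, 7, 5, 8, 6,
               12, 11, 13, 10, 12,
               17, 18, 16, 19, 18),
  total    = 20
)

# One curve for all replicates: each dish contributes a binomial observation.
fit <- glm(cbind(dead, total - dead) ~ log_dose,
           family = binomial(link = "logit"), data = bioassay)

# Dose (on the log10 scale) at which 50% of the insects are expected to die.
ld50 <- dose.p(fit, cf = 1:2, p = 0.5)
print(ld50)
```

Replacing `link = "logit"` with `link = "probit"` gives the probit version. A mixed model with a dish-level random effect would be the next step if the replicates are more variable than the binomial model allows.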
[R] Dose response glmer
I am trying to fit a dose-response model to my dataset, but I am not getting anywhere.

I am adapting a script shared on the web, but I am unable to make it work for my dataset.

I would like to get the LC50 for each Isolado and to know whether there are differences between them.

My data is at https://dl.dropboxusercontent.com/u/34009642/R/dead_alive.csv

Here is what I copied and am trying to modify:

library(plyr)
library(lattice)
library(lme4)
library(arm)
library(lmerTest)
library(faraway)
library(car)

## Conc is the concentration. I entered only the exponent, but all
## values except 0, which is my control (without Isolado), are
## powers of 10, i.e. 10^4, 10^6 and 10^8.

data <- read.table("dead_alive.csv", sep = "\t", dec = ",", header = TRUE)
data$Rep <- factor(data$Rep)

mean_data <- ddply(data, c("Isolado", "Conc", "Day"), numcolwise(mean))

xyplot(Dead/(Dead + Live) ~ Conc | Isolado, groups = Day, type = "l",
       ylab = 'Probability', xlab = 'Dose', data = mean_data)
xyplot(Dead/(Dead + Live) ~ Day | Isolado, groups = Conc, type = "l",
       ylab = 'Probability', xlab = 'Dose', data = mean_data)

model.logit <- glmer(cbind(Dead, Live) ~ -1 + Isolado + Isolado:Conc + (0 + Conc | Day),
                     family = binomial, data = data)
Anova(model.logit)
summary(model.logit)

model.probit <- glmer(cbind(Dead, Live) ~ Isolado + Isolado:Conc + (0 + Conc | Day),
                      family = binomial(link = "probit"), data = data)
model.cloglog <- glm(cbind(Dead, Live) ~ Isolado + Isolado:Conc + (1 + Conc | Day),
                     family = binomial(link = "cloglog"), data = data)

x <- seq(0, 8, by = 0.2)
prob.logit <- ilogit(model.logit$coef[1] + model.logit$coef[2]*x)
prob.probit <- pnorm(model.probit$coef[1] + model.probit$coef[2]*x)
prob.cloglog <- 1 - exp(-exp((model.cloglog$coef[1] + model.cloglog$coef[2]*x)))

with(subdata, plot(Dead/(Dead + Live) ~ Conc, group = Day))
lines(x, prob.logit)             # solid curve = logit
lines(x, prob.probit, lty = 2)   # dashed = probit
lines(x, prob.cloglog, lty = 5)  # longdash = c-log-log

plot(x, prob.logit, type = 'l', ylab = 'Probability', xlab = 'Dose')  # solid curve = logit
lines(x, prob.probit, lty = 2)   # dashed = probit
lines(x, prob.cloglog, lty = 5)  # longdash = c-log-log

matplot(x, cbind(prob.probit/prob.logit, (1 - prob.probit)/(1 - prob.logit)),
        type = 'l', xlab = 'Dose', ylab = 'Ratio')
matplot(x, cbind(prob.cloglog/prob.logit, (1 - prob.cloglog)/(1 - prob.logit)),
        type = 'l', xlab = 'Dose', ylab = 'Ratio')

model.logit.data <- glm(cbind(Dead, Live) ~ Conc, family = binomial, data = data)
pred2.5 <- predict(model.logit.data, newdata = data.frame(Conc = 2.5), se = T)
ilogit(pred2.5$fit)
ilogit(c(pred2.5$fit - 1.96*pred2.5$se.fit, pred2.5$fit + 1.96*pred2.5$se.fit))
## What is this 1.96? Where does it come from?

### If there are several predictors, just put in the code
### above something like:
### newdata=data.frame(conc=2.5,x2=4.6,x3=5.8)
### or whatever is the desired set of predictor values...

### Effective Dose calculation:
# What is the concentration that yields a probability of 0.5 of an
# insect dying?
library(MASS)
dose.p(model.logit.data, p = 0.5)

# A 95% CI for the ED50:
c(2 - 1.96*0.1466921, 2 + 1.96*0.1466921)

# What is the concentration that yields a probability of 0.8 of an
# insect dying?
dose.p(model.logit.data, p = 0.8)

-- 
Laia, ML
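On the question in the script about the 1.96: it is the 97.5% quantile of the standard normal distribution, so estimate ± 1.96 × SE is the usual normal-approximation 95% confidence interval. A short check in R, reusing the estimate (2) and standard error (0.1466921) that appear in the copied script (those two numbers belong to the script's original example, not to this dataset):

```r
# 97.5% quantile of the standard normal distribution.
z <- qnorm(0.975)
print(round(z, 2))

# The interval quoted in the script, rebuilt from its estimate and standard error.
estimate <- 2
se <- 0.1466921
ci <- estimate + c(-1, 1) * z * se
print(ci)
```

For a fitted model, the estimate and standard error would come from the object returned by `dose.p()`, which stores the standard error in its "SE" attribute.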
[R] Book recomendation: Repeated Measurements
Dear all,

I need a book about repeated measurements analysis with R.

On Amazon, I found this one: Models for Repeated Measurements (Oxford Statistical Science Series), J. K. Lindsey, 1999, 2nd ed.

I would like a book with examples, data, and R code.

I work with trees (forest breeding).

Could you recommend a book to me?

Thank you very much!

-- 
Laia, M. L.
Re: [R] Book recomendation: Repeated Measurements
Yes! I had a look there before I posted!

Thank you very much!

2013/7/14 Bert Gunter gunter.ber...@gene.com:
> Look before you post -- specifically at the Books subpage of CRAN's R
> homepage: http://www.r-project.org/
>
> -- Bert

-- 
Laia, M. L.
[R] setup multcompBoxplot
I made a multcompBoxplot like this:

xzx <- multcompBoxplot(DESC_COMP ~ CLONETRAT, data = data.cerato,
                       sortFn = "median", decreasing = TRUE,
                       horizontal = FALSE, compFn = "TukeyHSD",
                       plotList = list(
                         boxplot = list(fig = c(0, 0.75, 0, 1),
                                        las = 3, cex.axis = 1.0),
                         multcompLetters = list(
                           fig = c(0.87, 0.97, 0.03, 0.98),
                           type = 'Letters')))

and got this graphic:

https://dl.dropbox.com/u/34009642/boxplot_DESC_S_interacao.jpg

I would like all the letters (a, b, c, ...) to line up with the x labels. As you can see, the "a" falls outside the first x label.

Could you please help me?

-- 
Marcelo Luiz de Laia
Diamantina
Minas Gerais
Brazil
Linux user number 487797
[R] Improve lattice XYPLOT graphic
Hi,

How could I improve this graphic?

http://www.divshare.com/download/10754700-f81

I would like to write group labels in each panel and override the labels taken from the object.

I am trying this code:

xyplot(percentagem.mortos ~ tempo | trat, data = bio.ens, type = "a",
       auto.key = list(points = FALSE, lines = TRUE, columns = 3),
       ylim = c(0, 100),
       scales = list(x = list(at = c(48, 72, 96), labels = c(48, 72, 96)), cex = 2.0),
       ylab = list(expression("Percentagem de mortos"), cex = 2.5),  # "Percentage dead"
       xlab = list(expression("Horas após aplicação"), cex = 2.5),   # "Hours after application"
       between = list(x = c(0.25, 0.25), y = 0.25),
       par.settings = simpleTheme(col = "blue", pch = 20, cex = 2.3, lwd = 6),
       par.strip.text = list(lines = 1, cex = 1.5),
       #strip = strip.custom(strip.names = c(TRUE, FALSE),
       #                     strip.levels = c(FALSE, FALSE),
       #                     var.name = expression(c(113 - 3%*%10^8, H_2O))
       #                     factor.levels = expression(c(113 - 3%*%10^8, H_2O))
       )

I have already tried many options, but without success. Google showed me many suggestions, like this one:

http://tolstoy.newcastle.edu.au/R/e2/help/06/09/1409.html

I used this suggestion and the following code to pass the group labels to xyplot:

levs <- c(expression(133 - 3%*%10^8), expression(H_2O),
          expression(H_2O+Tween), expression(113 - 1%*%10^8),
          expression(113 - 3%*%10^8), expression(133 - 1%*%10^8))

without success.

I tried changing the labels in the data file and/or in the bio.ens object, but with no success.

Do you have any suggestion for me? Different colors with a legend? What would you suggest?

113 and 133 are different isolates. 1%*%10^8 and 3%*%10^8 are different treatments. H_2O and H_2O+Tween are my controls.

Any suggestion is very welcome!

Thank you very much

-- 
Marcelo Luiz de Laia
Universidade do Estado de Santa Catarina
UDESC - www.cav.udesc.br
Lages - SC - Brazil
Linux user number 487797
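One way to put formatted labels in the panel strips is to pass an expression vector (one element per factor level, not wrapped in c()) as `factor.levels` to `strip.custom()`. The sketch below uses a made-up two-panel data frame `d` and two of the labels from the message; `H[2] * O` is the plotmath form of a subscripted H2O, which is an assumption about the intended label.

```r
library(lattice)

# Hypothetical data with two panels standing in for the treatments.
d <- data.frame(
  tempo  = rep(c(48, 72, 96), times = 2),
  mortos = c(10, 40, 80, 5, 8, 12),
  trat   = factor(rep(c("iso", "agua"), each = 3), levels = c("iso", "agua"))
)

# One plotmath expression per factor level, in the order of levels(d$trat).
strip_labels <- expression(113 - 3 %*% 10^8, H[2] * O)

p <- xyplot(mortos ~ tempo | trat, data = d, type = "b",
            strip = strip.custom(factor.levels = strip_labels))
print(p)
```

The order of the expressions has to match `levels()` of the conditioning factor, otherwise the labels end up on the wrong panels.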
[R] R died on large data set
Hi,

I am trying to run a script in R, and R dies before it finishes. I have already read the list archives and the memory help pages (http://tinyurl.com/yaxco6w), but I am unable to solve the issue.

My Debian shows:

marc...@laia:~$ ulimit
unlimited
marc...@laia:~$

In the system monitor (GNOME) I see that R reaches 1.9 GB before it dies.

The R code is:

ls()  ## only the todos.norm object is listed
[1] "todos.norm"
dim(todos.norm)
[1] 9600   15
library(cluster)
pearson.dist <- as.dist(1 - cor(t(todos.norm), method = "pearson"))
Died

What could I do to solve my problem?

sessionInfo()  ## after restarting R
R version 2.10.1 (2009-12-14)
i486-pc-linux-gnu

locale:
 [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=pt_BR.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=pt_BR.UTF-8
 [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

My system:

Linux laia 2.6.32-trunk-686 #1 SMP Sun Jan 10 06:32:16 UTC 2010 i686 GNU/Linux

Thank you very much!

-- 
Marcelo Luiz de Laia
Brazil
Linux user number 487797
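A rough memory estimate for the failing line may help explain the crash. With 9600 rows, cor(t(todos.norm)) is a 9600 x 9600 matrix of doubles, `1 - cor(...)` creates a second matrix of the same size, and as.dist() then stores the lower triangle. The arithmetic below is only a back-of-the-envelope sketch; actual peak usage depends on how many temporary copies R makes.

```r
n <- 9600                          # rows of todos.norm, from dim() above

bytes_full <- n^2 * 8              # one n x n matrix of 8-byte doubles
bytes_dist <- n * (n - 1) / 2 * 8  # lower triangle kept by as.dist()

mib <- function(bytes) bytes / 2^20

print(mib(bytes_full))                   # one full correlation matrix, in MiB
print(mib(bytes_dist))                   # the dist object, in MiB
print(mib(2 * bytes_full + bytes_dist))  # two full matrices plus the dist object
```

Two full matrices plus the dist object come to roughly 1.7 GiB, which is close to the 1.9 GB seen in the system monitor and near what a 32-bit (i486) R process can typically address. Running the same code on a 64-bit R build with enough RAM, or clustering fewer rows, would be the usual ways around this.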
[R] Categorical data repeated on time analysis
Hi,

I am trying to analyze a data set in which nematodes were killed after drug administration. We counted the number of dead nematodes and the number of surviving nematodes at three time points. Some plots have 100% dead and others zero percent, so the data set has a lot of zeros. I have googled and found a lot of information. Moreover, my data do not follow a normal distribution. I applied a square-root transformation but, because of the zeros, it still does not fit a normal distribution.

The design is a split-plot: I divided a Petri dish into four parts, and each day we measured one of them.

Here is a sample of the data:

trat  rep  time  killed  living  percent.killed  percent.living
1     1    48    8       6       57.14           42.86
1     2    48    17      15      53.13           46.88
1     3    48    6       4       60.00           40.00
1     1    72    17      15      53.13           46.88
1     2    72    24      33      42.11           57.89
1     3    72    11      0       100.00          0.00
1     1    96    18      28      39.13           60.87
1     2    96    19      6       76.00           24.00
1     3    96    9       10      47.37           52.63
2     1    48    7       2       77.78           22.22
2     2    48    10      4       71.43           28.57
2     3    48    8       2       80.00           20.00
2     1    72    5       2       71.43           28.57
2     2    72    14      13      51.85           48.15
2     3    72    30      1       96.77           3.23
2     1    96    2       6       25.00           75.00
2     2    96    11      15      42.31           57.69
2     3    96    3       2       60.00           40.00
3     1    48    8       8       50.00           50.00
3     2    48    6       7       46.15           53.85
3     3    48    0       2       0.00            100.00
3     1    72    3       3       50.00           50.00
3     2    72    5       1       83.33           16.67
3     3    72    18      10      64.29           35.71
3     1    96    4       0       100.00          0.00
3     2    96    0       0       0.00            0.00
3     3    96    18      19      48.65           51.35

We counted both killed and living nematodes because free-living nematodes reproduce very fast, so I need to know the number living in the medium.

What do you suggest for analyzing this in R? What transformation could I use? Is there a specific package for that? Have you done something like this?

Thank you very much

-- 
Marcelo Luiz de Laia
Lages - SC - Brazil
Linux user number 487797
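Rather than transforming the percentages, counts like these are often modelled directly as binomial data, with killed and living as the two outcome columns. The sketch below fits such a model to the trat 1 and trat 2 rows of the table above. It is a simplified illustration: it treats time as a factor, uses quasibinomial to allow for extra variation, and ignores the split-plot (dish) structure, which would need a mixed model.

```r
# Rows for trat 1 and trat 2 from the table above.
nema <- data.frame(
  trat   = factor(rep(1:2, each = 9)),
  time   = factor(rep(rep(c(48, 72, 96), each = 3), times = 2)),
  killed = c(8, 17, 6, 17, 24, 11, 18, 19, 9,
             7, 10, 8, 5, 14, 30, 2, 11, 3),
  living = c(6, 15, 4, 15, 33, 0, 28, 6, 10,
             2, 4, 2, 2, 13, 1, 6, 15, 2)
)

# Binomial-type GLM on the counts themselves; no transformation of percentages.
fit <- glm(cbind(killed, living) ~ trat + time,
           family = quasibinomial, data = nema)

print(summary(fit)$coefficients)
```

Rows where both counts are zero (such as trat 3, rep 2, time 96 in the full table) carry no information about the proportion and would contribute nothing to this fit.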
[R] Repeated measures unbalanced in a split-split design
Hi,

I have an experiment with blocks, plots, sub-plots, and sub-sub-plots with repeated measures and 3 factors (factorial design), in which we observed diameter (mm), height (cm), and number of leaves (count). However, one treatment is missing in one factor, so my design is unbalanced.

In reply to a previous message here, a friend told me:

"It appears to me that your design is a split-split plot with repeated measures at the split-split plot level. Because you have multiple sizes of experimental unit (blocks, plots and sub-plots), you have a different random error term at each size of unit, so you have to analyze it as a mixed-effects model. For the diameter and height measurements, you can probably get away with using normal errors, but for the counts, you may well have to use a generalized linear mixed model."

So I am trying to analyze my data with the car package.

I have:

time (days after germination) - 4 levels (38, 53, 73, 85)
Hormone - 2 levels (SH, CH) on sub-plots
Block - 4 blocks
Treatment - 6 levels (1, 2, 3, 4, 5, and 6) on sub-sub-plots
Plant - subjects

I measured Diameter (mm), Height (cm), HD (height/diameter), and Number of Leaves (count) at each time point. However, plants can die, and then I get NAs.

Also, Treatment 6 (control) is only present on SH sub-plots. It is not present on CH sub-plots.

I tried this model:

idata.Cana <- data.frame(Time = factor(c(38, 53, 73, 85)))
idata.Cana

mod.Cana <- lm(cbind(Diameter.38, Diameter.53, Diameter.73, Diameter.85) ~
               Treatment*Hormone, data = marcelo.subset)
mod.Cana

Call:
lm(formula = cbind(Diameter.38, Diameter.53, Diameter.73, Diameter.85) ~
    Treatment * Hormone, data = marcelo.subset)

Coefficients:
                      Diameter.38  Diameter.53  Diameter.73  Diameter.85
(Intercept)               1.24000      1.35750      1.99375      2.31000
Treatment2               -0.31625     -0.14250      0.07500     -0.13875
Treatment3               -0.19250     -0.01500     -0.20875     -0.36875
Treatment4               -0.35375     -0.08500     -0.22750     -0.27125
Treatment5               -0.29125      0.04875     -0.14375     -0.26375
Treatment6               -0.00125     -0.25750     -0.81125     -0.77750
HormoneSH                -0.30875     -0.08875      0.31500      0.07000
Treatment2:HormoneSH      0.19875      0.11250     -0.44500     -0.24875
Treatment3:HormoneSH      0.15375      0.01875     -0.12125      0.07000
Treatment4:HormoneSH      0.28000     -0.04250     -0.41750     -0.38750
Treatment5:HormoneSH      0.40875     -0.11125     -0.17750     -0.05125
Treatment6:HormoneSH           NA           NA           NA           NA

av.Cana <- Anova(mod.Cana, idata = idata.Cana, idesign = ~ as.factor(Idade))
Error in solve.default(crossprod(model.matrix(mod))) :
  Lapack routine dgesv: system is exactly singular

How should I model my data to analyze it with this unbalanced design? How could I use the block factor in the model, or is it not necessary? And the sub-plots?

Here you can find my design:

http://www.divshare.com/download/9431636-e0c

and here you can find a subset of my data:

http://www.divshare.com/download/9456640-fd7

Thank you very much!

-- 
Marcelo Luiz de Laia
Universidade do Estado de Santa Catarina
UDESC - www.cav.udesc.br
Lages - SC - Brazil
Linux user number 487797
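The "exactly singular" error and the NA row for Treatment6:HormoneSH are consistent with the missing cell: because Treatment 6 only occurs with SH, the Treatment6 column and the Treatment6:HormoneSH column of the model matrix are identical. A small base R illustration with a made-up layout (`design` is hypothetical and only mirrors the missing cell, not the real data):

```r
# Hypothetical layout mirroring the design: Treatment 6 occurs only with SH.
design <- expand.grid(Treatment = factor(1:6),
                      Hormone   = factor(c("CH", "SH")),
                      Plant     = 1:2)
design <- subset(design, !(Treatment == "6" & Hormone == "CH"))

X <- model.matrix(~ Treatment * Hormone, data = design)

print(ncol(X))     # 12 columns requested by the formula
print(qr(X)$rank)  # only 11 are estimable, so crossprod(X) cannot be inverted
```

Possible ways around this, offered as suggestions rather than a definitive answer, are to combine Treatment and Hormone into a single 11-level factor, or to analyze the control separately from the 5 x 2 factorial part.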
[R] Input file format to Anova from car package
Dear list members,

My question is about the input file format for Anova from the car package.

Here is an example of what I did.

My file format is like this (and I dislike the idea that I will need to recode it):

Hormone  day  Block  Treatment  Plant  Diameter  High  N.Leaves
SH       23   1      1          1      3.19      25.3  2
SH       23   1      1          2      3.42      5.5   1
SH       23   1      2          1      2.19      5.2   2
SH       23   1      2          2      2.17      7.6   2
CH       23   1      1          1      3.64      6.5   2
CH       23   1      1          2      2.8       3.7   2
CH       23   1      2          1      3.28      4     2
CH       23   1      2          2      2.82      5.2   2
SH       23   2      1          1      2.87      6.4   2
SH       23   2      1          2      2.8       6     2
SH       23   2      2          1      2.02      4.5   2
SH       23   2      2          2      3.15      5.5   2
CH       23   2      1          1      3.22      2.3   2
CH       23   2      1          2      2.45      3.8   2
CH       23   2      2          1      1.85      3.5   2
CH       23   2      2          2      3.13      4.4   2
CH       39   1      1          1      2.64      6     2
CH       39   1      1          2      4.33      10    2
CH       39   1      2          1      3.74      9     2
CH       39   1      2          2      3.23      8     2
SH       39   1      1          1      3.8       8     2
SH       39   1      1          2      2.35      9     2
SH       39   1      2          1      3.66      6     2
SH       39   1      2          2      3.92      7     2
CH       39   2      1          1      3.28      7     2
CH       39   2      1          2      4.99      7     2
CH       39   2      2          1      2.49      6     2
CH       39   2      2          2      4.75      7     2
SH       39   2      1          1      3.35      5     2
SH       39   2      1          2      4.38      7     2
SH       39   2      2          1      5.11      9     2
SH       39   2      2          2      2.71      5     2

idata <- data.frame(Idade = factor(c(23, 39)))
a = read.table("clipboard", sep = " ", head = T)
mod.ok <- lm(Diameter ~ Treatment*Hormone, data = a)
av.ok <- Anova(mod.ok, idata = idata, idesign = ~as.factor(day))
summary(av.ok)

     Sum Sq               Df            F value            Pr(>F)
 Min.   : 0.02153   Min.   : 1.00   Min.   :0.02828   Min.   :0.5105
 1st Qu.: 0.06169   1st Qu.: 1.00   1st Qu.:0.06346   1st Qu.:0.6331
 Median : 0.20667   Median : 1.00   Median :0.09863   Median :0.7558
 Mean   : 5.43711   Mean   : 7.75   Mean   :0.19043   Mean   :0.7113
 3rd Qu.: 5.58208   3rd Qu.: 7.75   3rd Qu.:0.27150   3rd Qu.:0.8117
 Max.   :21.31356   Max.   :28.00   Max.   :0.44437   Max.   :0.8677
                                    NA's   :1.0       NA's   :1.

This result is wrong, I believe.
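The long layout above (one row per plant per day) does not have to be recoded by hand: base R's reshape() can turn it into one row per plant with the repeated measures side by side. A minimal sketch using four Diameter values from the table above; the column names `Diameter.23` and `Diameter.39` are generated by reshape() from `v.names` and the day values.

```r
# A few rows in the long layout, with values taken from the table above.
long <- data.frame(
  Hormone   = c("SH", "SH", "CH", "CH"),
  day       = c(23, 39, 23, 39),
  Block     = 1,
  Treatment = 1,
  Plant     = 1,
  Diameter  = c(3.19, 3.80, 3.64, 2.64)
)

# One row per plant, with the repeated measures side by side.
wide <- reshape(long, direction = "wide",
                idvar   = c("Hormone", "Block", "Treatment", "Plant"),
                timevar = "day",
                v.names = "Diameter")
print(wide)
```

Several measured variables can be spread at once by listing them all in `v.names`, for example `v.names = c("Diameter", "High", "N.Leaves")`.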
Here is a file format with the repeated measures side by side:

    Hormone Block Treatment Plant Diameter.23 Diameter.39 High.23 High.39 N.Leaves.23 N.Leaves.39
    SH      1     1         1     3.19        2.64        25.3    6       2           2
    SH      1     1         2     3.42        4.33        5.5     10      1           2
    SH      1     2         1     2.19        3.74        5.2     9       2           2
    SH      1     2         2     2.17        3.23        7.6     8       2           2
    CH      1     1         1     3.64        3.8         6.5     8       2           2
    CH      1     1         2     2.8         2.35        3.7     9       2           2
    CH      1     2         1     3.28        3.66        4       6       2           2
    CH      1     2         2     2.82        3.92        5.2     7       2           2
    SH      2     1         1     2.87        3.28        6.4     7       2           2
    SH      2     1         2     2.8         4.99        6       7       2           2
    SH      2     2         1     2.02        2.49        4.5     6       2           2
    SH      2     2         2     3.15        4.75        5.5     7       2           2
    CH      2     1         1     3.22        3.35        2.3     5       2           2
    CH      2     1         2     2.45        4.38        3.8     7       2           2
    CH      2     2         1     1.85        5.11        3.5     9       2           2
    CH      2     2         2     3.13        2.71        4.4     5       2           2

    idata <- data.frame(day=factor(c(23,39)))
    a = read.table("clipboard", sep=" ", head=T)
    mod.ok <- lm(cbind(Diameter.23,Diameter.39) ~ Treatment*Hormone, data=a)
    av.ok <- Anova(mod.ok, idata=idata, idesign= ~ as.factor(day))
    summary(av.ok)

    Type II Repeated Measures MANOVA Tests:

    ------------------------------------------

    Term: Treatment

     Response transformation matrix:
                (Intercept)
    Diameter.23           1
    Diameter.39           1

    Sum of squares and products for the hypothesis:
                (Intercept)
    (Intercept)   0.6765062

    Sum of squares and products for error:
                (Intercept)
    (Intercept)    13.05917

    Multivariate Tests: Treatment
                     Df test stat  approx F num Df den Df  Pr(>F)
    Pillai            1 0.0492517 0.6216377      1     12 0.44574
    Wilks             1 0.9507483 0.6216377      1     12 0.44574
    Hotelling-Lawley  1 0.0518031 0.6216377      1     12 0.44574
    Roy               1 0.0518031 0.6216377      1     12 0.44574

    ------------------------------------------

    Term: Hormone

     Response transformation matrix:
                (Intercept)
    Diameter.23           1
    Diameter.39           1

    Sum of squares and products for the hypothesis:
                (Intercept)
    (Intercept)  0.09150625

    Sum of squares and products for error:
                (Intercept)
    (Intercept)    13.05917

    Multivariate Tests: Hormone
                     Df test stat   approx F num Df den Df  Pr(>F)
    Pillai            1 0.0069583 0.08408456      1     12 0.77679
    Wilks             1 0.9930417 0.08408456      1     12 0.77679
    Hotelling-Lawley  1 0.0070070 0.08408456      1     12 0.77679
    Roy               1 0.0070070 0.08408456      1     12 0.77679

    ------------------------------------------

    Term: Treatment:Hormone

     Response transformation matrix:
                (Intercept)
    Diameter.23           1
    Diameter.39           1

    Sum of squares and products for the hypothesis:
                (Intercept)
    (Intercept)    1.139556

    Sum of squares and products for error:
                (Intercept)
    (Intercept)    13.05917

    Multivariate Tests: Treatment:Hormone
                     Df test stat approx F num Df den Df  Pr(>F)
    Pillai            1 0.0802576 1.047132      1     12 0.32636
    Wilks             1 0.9197424 1.047132      1     12 0.32636
    Hotelling-Lawley  1 0.0872610 1.047132      1     12 0.32636
    Roy               1 0.0872610 1.047132      1     12 0.32636

    ------------------------------------------

    Term: as.factor(day)

     Response transformation matrix:
                as.factor(day)1
    Diameter.23               1
    Diameter.39              -1

    Sum of squares and products for the hypothesis:
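A note on the two attempts above. In the first one, Anova() on a univariate lm appears to return an ordinary ANOVA table, so summary() just reports quartiles of its columns, which would explain the Min./Median/Max. output. The repeated-measures interface (idata, idesign) expects one column per occasion, as in the second attempt. The recoding does not have to be done by hand: the sketch below uses base R's reshape() on a few rows copied from the long table in the post (SH, Block 1 only). The object names `long` and `wide` are illustrative assumptions.

```r
# A few rows copied from the long-format table in the post (SH, Block 1).
long <- data.frame(
  Hormone   = "SH",
  day       = rep(c(23, 39), each = 4),
  Block     = 1,
  Treatment = rep(c(1, 1, 2, 2), times = 2),
  Plant     = rep(c(1, 2), times = 4),
  Diameter  = c(3.19, 3.42, 2.19, 2.17,   # day 23
                3.80, 2.35, 3.66, 3.92)   # day 39
)

# One row per plant, one Diameter column per measurement day.
# Rows are matched on the id columns, not on their position in the file.
wide <- reshape(long, direction = "wide",
                idvar   = c("Hormone", "Block", "Treatment", "Plant"),
                timevar = "day",
                v.names = "Diameter",
                sep     = ".")
print(wide)
```

The resulting Diameter.23 and Diameter.39 columns can then go on the left-hand side of the multivariate lm(), for example cbind(Diameter.23, Diameter.39). To carry several responses across at once (Diameter, High, N.Leaves), v.names can take a character vector.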
[R] Repeated measures on a factorial unbalanced in a blocks with split-plot design
Dear all,

I am trying to analyze data from the following experiment.

Factors:
  Hormone - Levels: SH, CH (S = without; C = with; H = Hormone)
  Time - Levels: 19/08/09, 04/09/09, 18/09/09, 08/10/09, 20/10/09 (DD/MM/YY)
  Nutrition - Levels: Completa (complete), Sem (without)
  Macronutrition - Levels: Ca, K, Mg, P, Sem (without)

Time is the measurement day; it reflects the days after germination.

Blocks: 4
Plants per sub-plot: 16

Each plot was divided into two equal parts. Each part had 6 sub-plots with 16 plants each (2x8 plants). One part of the plot was treated with CH and the other with SH. There was no randomization here.

The factors Nutrition and Macronutrition were combined into treatments:
  Treatment 1 - Completa, Sem
  Treatment 2 - Completa, Ca
  Treatment 3 - Completa, Mg
  Treatment 4 - Completa, P
  Treatment 5 - Completa, K
  Treatment 6 - Sem, Sem (control: without Hormone, without Nutrition, and without Macronutrition)

These six treatments were randomized over the sub-plots in CH and SH, with a different randomization in each Block. However, Treatment 6 is not present in CH; it is present only in the SH range.

The experimental design is here: http://www.divshare.com/download/9241232-392

At each Time we measured Diametro (diameter, in centimeters), Altura (height, in centimeters), and N.Folhas (number of leaves, a count). We are interested in the treatment effects on diameter, height, and number of leaves. Are there treatment effects? Are there time effects? Are there interactions? Which is the best time, and the best combination of nutrition and macronutrition? What is the influence of the hormone, and does it interact with the other factors?

I tried this approach, but I do not know how to handle the repeated measures, nor whether the approach is correct for my case.
Here is a subset of my data: http://www.divshare.com/download/9241231-428

    dados <- read.table("marcelo_laia.txt", sep="\t", dec=",", header=TRUE)
    summary(dados)
    dados.model <- aov(Diametro ~ Block +
                       Hormone + Error(Block/Hormone) +
                       Treatment + Treatment:Hormone +
                       Hormone/Block/Treatment,
                       data=dados)
    summary(dados.model)

Is this model correct? (T6 is present only in the SH range.) How can I include the repeated measures here?

Thank you very much!

--
Marcelo Luiz de Laia
Universidade do Estado de Santa Catarina
UDESC - www.cav.udesc.br
Lages - SC - Brazil
Linux user number 487797
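One classical way to bring the measurement dates into an aov() call is the "split-plot in time" layout, where Time enters as a further within-unit factor and the Error() term lists the nested units (block, hormone half, sub-plot). The sketch below is only that, a sketch: it runs on simulated, balanced data in which Treatment 6 is left out, one value stands for each sub-plot on each date, and the data frame `sim` and the three date labels are hypothetical. It is not a statement about the right model for the real, unbalanced experiment.

```r
set.seed(42)

# Hypothetical balanced layout: 4 blocks x 2 hormones x 5 treatments x 3 dates,
# one simulated value per sub-plot and date (Treatment 6 omitted for balance).
sim <- expand.grid(Block     = factor(1:4),
                   Hormone   = factor(c("CH", "SH")),
                   Treatment = factor(1:5),
                   Time      = factor(c("T1", "T2", "T3")))
sim$Diametro <- rnorm(nrow(sim), mean = 2, sd = 0.3)

# Hormone is tested against Block:Hormone, Treatment against the sub-plot
# stratum, and Time with its interactions in the Within stratum.
fit <- aov(Diametro ~ Hormone * Treatment * Time +
             Error(Block / Hormone / Treatment),
           data = sim)
print(summary(fit))
```

aov() with Error() strata is designed for balanced data, and treating dates as a split factor assumes a simple correlation structure between dates. With Treatment 6 missing under CH, a mixed model (for example with the nlme or lme4 packages) may be a better fit, but that is a suggestion to check, not a tested recipe.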
[R] Difficult to set a quiet formula in maanova
Hi,

I am trying to run an analysis with the maanova package without success. I suspect that I am setting up the formula incorrectly, so the issue may not be related to R itself.

I have two varieties of plants (V1 and V2). For each variety, one group of plants was treated and another was not. After treatment, RNA was collected at three different times from treated and untreated plants of both varieties. So I have:

    Var: 2 (varieties)
    Trat: 2 (treatment)
    Time: 3
    Sample: 3 (biological replicates)
    Probes: 3575
    Spots: 2 for each probe

I used the following code:

    library(vsn)
    todos <- read.matrix("todos_v1_e_v2_back_c.txt", sep="\t")
    todos.norm <- vsn2(todos)
    write.table(exprs(todos.norm), "todos.norm.txt", sep="\t")
    library(maanova)
    fabiana.raw <- read.madata("todos.norm.vsn2.maanova.txt", designfile="design.txt", header=TRUE, spotflag=FALSE, CloneID=1, metarow=2, metacol=3, pmt=4)
    fabiana <- createData(fabiana.raw, n.rep=2, avgreps=1, log.trans=FALSE)
    model.full.mix <- makeModel(data=fabiana, formula=~Var+Trat+Time+Sample+Var:Trat+Var:Time+Trat:Time+Var:Trat:Time, random=~Sample)
    summary(model.full.mix)

    Model Summary

    This is a mixed effect model

    Gene-specific ANOVA model: Var + Trat + Time + Var:Trat + Var:Time + Trat:Time + Var:Trat:Time + Sample
    Gene-specific Random terms: Sample
    Gene-specific covariate: None

    Class Level Information
              Class Levels Effect
    1           Var      2  fixed
    2          Trat      2  fixed
    3          Time      3  fixed
    4      Var:Trat      4  fixed
    5      Var:Time      6  fixed
    6     Trat:Time      6  fixed
    7 Var:Trat:Time      3  fixed
    8        Sample      2 random

    Dimensions
    Observations(per gene): 36
    Columns in X: 24
    Columns in Z: 3

    Warning message:
    number of rows of result is not a multiple of vector length (arg 2) in: cbind(1, Class, Levels, Effect)

(I do not know what this warning means; maybe the error starts here.)

    anova.full.mix <- fitmaanova(fabiana, model.full.mix)
    Calculating variance components for fixed model...
    Fitting mixed effect model...
    Finish gene number 100 ...
    (...)
    Finish gene number 3500 ...
    Error in next.fix:(next.fix + ncols - 1) : NA/NaN argument

I inspected the data files and found nothing wrong near probe 3500. I then calculated the average for each probe in Excel and ran the analysis again with n.rep = 1 and avgreps = 0. The same error occurred:

    Finish gene number 3500 ...
    Error in next.fix:(next.fix + ncols - 1) : NA/NaN argument

Then I permuted my data, and the error still occurred at the same place:

    Finish gene number 3500 ...
    Error in next.fix:(next.fix + ncols - 1) : NA/NaN argument

I have now exhausted my knowledge and need help. Could you suggest anything?

Thank you very much.

--
Marcelo Luiz de Laia
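Two checks that can be run outside maanova may help narrow this down. The printed level counts (3 for Var:Trat:Time, 2 for Sample) do not match the description (2 x 2 x 3 = 12 combinations and 3 replicates), although the cbind warning suggests the printed table may simply be misaligned. And the run stops somewhere after gene 3500 of 3575, so scanning the whole matrix for non-finite values is cheap insurance. The sketch below uses base R only, on hypothetical stand-ins: `design` imitates the design file and `expr` imitates the normalized data, with one NA planted on purpose. It does not call or reproduce any maanova function.

```r
# Hypothetical design table: 2 varieties x 2 treatments x 3 times x 3 replicates.
design <- expand.grid(Sample = 1:3, Time = 1:3, Trat = 1:2, Var = 1:2)

# Count the distinct levels each model term should have.
n_levels <- c(
  Var    = length(unique(design$Var)),
  Trat   = length(unique(design$Trat)),
  Time   = length(unique(design$Time)),
  Sample = length(unique(design$Sample)),
  "Var:Trat:Time" = nlevels(interaction(design$Var, design$Trat,
                                        design$Time, drop = TRUE))
)
print(n_levels)

# Hypothetical expression matrix (3575 probes x 36 arrays) with one planted NA.
set.seed(7)
expr <- matrix(rnorm(3575 * 36), nrow = 3575)
expr[3560, 5] <- NA

# Rows containing NA, NaN or Inf anywhere.
bad_rows <- which(rowSums(!is.finite(expr)) > 0)
print(bad_rows)
```

Running the same two checks on the real design file and on the matrix written by write.table() would show whether the design is read as intended and whether any probe carries missing or infinite values after normalization.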
[R] plot graph with error bars trouble
Hi,

I have a data set like this:

    Mutant Rep Time OD
    02H02  1   0    0.029
    02H02  2   0    0.029
    02H02  3   0    0.023
    02H02  1   8    0.655
    02H02  2   8    0.615
    02H02  3   8    0.557
    02H02  1   12   1.776
    02H02  2   12   1.859
    02H02  3   12   1.668
    02H02  1   16   3.379
    02H02  2   16   3.726
    02H02  3   16   3.367
    306    1   0    0.033
    306    2   0    0.035
    306    3   0    0.034
    306    1   8    0.377
    306    2   8    0.488
    306    3   8    0.409
    306    1   12   1.106
    306    2   12   1.348
    306    3   12   1.246
    306    1   16   2.706
    306    2   16   3.073
    306    3   16   3.038

I need to plot OD over time for each mutant, with error bars. I tried the sciplot package, but it is set up to handle factorial treatments, so the spacing on the x-axis is fixed to be equal. With it I get something like this:

    |
    |
    |
    |
    |
    +---------------
      0    8   12   16

But I would like the spacing between 0 and 8 to be twice the spacing between 8 and 12, like this:

    |
    |
    |
    |
    |
    +--------------------
      0   4   8   12   16

Could you point me to another way to do this without using sciplot? Any suggestion is much appreciated. I should say in advance that I do not have a good knowledge of the R language.

Thank you very much

--
Marcelo Luiz de Laia
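One way to get a proportional time axis is to keep Time numeric and draw the plot with base graphics, computing the means and standard errors first. The sketch below rebuilds the data from the table in the post and uses only base R (tapply, matplot, arrows). The object names (`growth`, `m`, `se`, `times`) and the choice of the standard error of the mean for the bars are assumptions made for illustration.

```r
# Data from the post: 2 mutants x 4 time points x 3 replicates.
growth <- data.frame(
  Mutant = rep(c("02H02", "306"), each = 12),
  Rep    = rep(1:3, times = 8),
  Time   = rep(rep(c(0, 8, 12, 16), each = 3), times = 2),
  OD     = c(0.029, 0.029, 0.023, 0.655, 0.615, 0.557,
             1.776, 1.859, 1.668, 3.379, 3.726, 3.367,
             0.033, 0.035, 0.034, 0.377, 0.488, 0.409,
             1.106, 1.348, 1.246, 2.706, 3.073, 3.038)
)

# Mean and standard error of OD per mutant (rows) and time point (columns).
m  <- tapply(growth$OD, list(growth$Mutant, growth$Time), mean)
se <- tapply(growth$OD, list(growth$Mutant, growth$Time),
             function(x) sd(x) / sqrt(length(x)))
times <- as.numeric(colnames(m))   # numeric x positions: 0, 8, 12, 16

# One line per mutant; a numeric x makes the axis spacing proportional.
matplot(times, t(m), type = "b", pch = 1:2, lty = 1, col = 1:2,
        xaxt = "n", xlab = "Time", ylab = "OD",
        ylim = c(0, max(m + se)))
axis(1, at = c(0, 4, 8, 12, 16))

# Error bars: flat-headed arrows from mean - SE to mean + SE.
# Very short bars can trigger a harmless zero-length warning, hence the wrapper.
for (i in seq_len(nrow(m))) {
  suppressWarnings(
    arrows(times, m[i, ] - se[i, ], times, m[i, ] + se[i, ],
           angle = 90, code = 3, length = 0.04, col = i)
  )
}
legend("topleft", legend = rownames(m), col = 1:2, pch = 1:2, lty = 1)
```

Replacing the standard error by the standard deviation or a confidence half-width only changes the function passed to the second tapply() call.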