[R] The newest version of Rstudio Desktop v0.98.1049 couldn't be installed
Dear expeRts,

I find that the newest RStudio Desktop v0.98.1049 for Windows is not actually the newest: after I installed it, it turned out to be an old version.

--
PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Convert time zone to difference from Coordinated Universal Time
Hello everyone,

I want to convert times provided by Sys.time() to use the difference from Coordinated Universal Time instead of the character abbreviation. For example, instead of:

    2014-09-03 21:12:35 EDT

I want the value as:

    2004-09-03 13:20:00-04:00

Is there a way to do this with strftime()?

Thanks in advance,
Tim

UCB BIOSCIENCES, Inc.
Mail: P.O. Box 110167 - Research Triangle Park - NC 27709 - USA
Via Courier: 8010 Arco Corporate Drive - Suite 100 - Raleigh - NC 27617 - USA
Phone +1 919 767 2555 - Fax +1 919 767 2570
Re: [R] GLM Help
I think you are looking for

    ~ Region + Region:Helpers - 1

a.k.a.

    ~ Region/Helpers - 1

Notice that these are actually the same model as your glm3 (and also as ~Region*Helpers), only the parametrization differs. The latter includes an overall Helpers term so that the interaction coefficients should be read as differences in slope. (With default treatment contrasts, the Helpers term would be the slope for the first region and the interactions are differences in slope compared to the first region).

-pd

On 03 Sep 2014, at 17:17 , Kathy Haapala ka...@haapi.mn.org wrote:

> Hi all,
>
> I have a large set of data that looks something like this, although this data frame is much smaller and includes made up numbers to make my question easier.
>
> x.df <- data.frame(Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C"),
>                    Group_ID = c(1:15),
>                    No_Offspring = c(3, 0, 4, 2, 1, 0, 3, 4, 3, 2, 2, 5, 4, 1, 3),
>                    M_Offspring = c(2, 0, 2, 1, 0, 0, 1, 1, 2, 0, 1, 3, 2, 1, 1),
>                    F_Offspring = c(1, 0, 2, 1, 1, 0, 2, 3, 1, 2, 1, 2, 2, 0, 2),
>                    No_Helpers = c(5, 0, 2, 1, 0, 1, 3, 4, 2, 3, 2, 3, 4, 0, 0))
> x.df
>    Region Group_ID No_Offspring M_Offspring F_Offspring No_Helpers
> 1       A        1            3           2           1          5
> 2       A        2            0           0           0          0
> 3       A        3            4           2           2          2
> 4       A        4            2           1           1          1
> 5       A        5            1           0           1          0
> 6       B        6            0           0           0          1
> 7       B        7            3           1           2          3
> 8       B        8            4           1           3          4
> 9       B        9            3           2           1          2
> 10      B       10            2           0           2          3
> 11      B       11            2           1           1          2
> 12      C       12            5           3           2          3
> 13      C       13            4           2           2          4
> 14      C       14            1           1           0          0
> 15      C       15            3           1           2          0
>
> I have been using GLMs to determine if the number of helpers (No_Helpers) has an effect on the sex ratio of the offspring. Here's the GLM I have been using:
>
> prop.male <- x.df$M_Offspring/x.df$No_Offspring
> glm = glm(prop.male~No_Helpers,binomial,data=x.df)
>
> However, now I'd like to fit a model with region-specific regressions and see if this has more support than the model without region-specificity. So, I'd like one model that generates a regression for each region (A, B, C).
>
> I've tried treating No_Helpers and Region as covariates:
>
> glm2 = glm(prop.male~No_Helpers+Region-1,binomial,data=x.df)
>
> which includes region-specificity in the intercepts, but not the entire regression, and as interaction terms:
>
> glm3 = glm(prop.male~No_Helpers*Region-1,binomial,data=x.df)
>
> which also does not give me an intercept and slope for each region.
>
> I'm not sure how else to adjust the formula, or if the adjustment should be somewhere else in the GLM call.
>
> Thanks in advance for your help.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
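The nested formula suggested in the reply can be tried on the posted data. The following is only a sketch: it assumes the actual column name No_Helpers in place of the shorthand Helpers, and it swaps in a two-column successes/failures response, cbind(M_Offspring, F_Offspring), for the prop.male proportion used in the thread (so that groups with no offspring carry zero weight rather than producing NaN). The object names fit_nested, fit_crossed and fit_common are illustrative.

```r
# Data frame as posted in the question.
x.df <- data.frame(
  Region       = c(rep("A", 5), rep("B", 6), rep("C", 4)),
  Group_ID     = 1:15,
  No_Offspring = c(3, 0, 4, 2, 1, 0, 3, 4, 3, 2, 2, 5, 4, 1, 3),
  M_Offspring  = c(2, 0, 2, 1, 0, 0, 1, 1, 2, 0, 1, 3, 2, 1, 1),
  F_Offspring  = c(1, 0, 2, 1, 1, 0, 2, 3, 1, 2, 1, 2, 2, 0, 2),
  No_Helpers   = c(5, 0, 2, 1, 0, 1, 3, 4, 2, 3, 2, 3, 4, 0, 0)
)

# Assumption: males/females as a two-column binomial response.
# Region/No_Helpers - 1 gives one intercept and one slope per region.
fit_nested <- glm(cbind(M_Offspring, F_Offspring) ~ Region/No_Helpers - 1,
                  family = binomial, data = x.df)

# Same model, different parametrization: slopes are expressed as
# differences from the first region.
fit_crossed <- glm(cbind(M_Offspring, F_Offspring) ~ Region * No_Helpers,
                   family = binomial, data = x.df)

# Model without region-specificity, for comparison.
fit_common <- glm(cbind(M_Offspring, F_Offspring) ~ No_Helpers,
                  family = binomial, data = x.df)

print(coef(fit_nested))   # three intercepts, then three slopes
print(all.equal(deviance(fit_nested), deviance(fit_crossed), tolerance = 1e-6))
print(anova(fit_common, fit_nested, test = "Chisq"))
```

The anova() call is one possible way to ask whether the region-specific regressions have more support than the common one; with data this small it is illustrative only.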
Re: [R] Convert time zone to difference from Coordinated Universal Time
On 04/09/2014 02:27, tim.willi...@ucb.com wrote:
> Hello everyone,
>
> I want to convert times provided by Sys.time() to use the difference from Coordinated Universal Time instead of the character abbreviation. For example, instead of:
>
> 2014-09-03 21:12:35 EDT
>
> I want the value as:
>
> 2004-09-03 13:20:00-04:00
>
> Is there a way to do this with strftime() ?

On some systems. Although the posting guide required it, you did not provide information on yours.

Please do read the help page for yourself -- %z is relevant.

> Thanks in advance,
> Tim

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
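A minimal sketch of the %z approach mentioned in the reply. It assumes the platform supports %z on output (the reply notes that this holds only "on some systems") and that the zone name "America/New_York" is known locally; the sub() step that inserts the colon is an addition for illustration, not something from the thread.

```r
# Fixed instant so the example is reproducible; Sys.time() works the same way.
# Assumption: the "America/New_York" zone name is available on this system.
now <- as.POSIXct("2014-09-03 21:12:35", tz = "America/New_York")

# %z appends the signed UTC offset as hours and minutes without a colon.
stamp <- format(now, "%Y-%m-%d %H:%M:%S%z")

# Insert a colon between the hours and minutes of the trailing offset.
iso_like <- sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", stamp)
print(iso_like)
```

The result has the form requested in the question: date, time, then an offset such as -04:00 in place of the EDT abbreviation.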
Re: [R] The newest version of Rstudio Desktop v0.98.1049 couldn't be installed
On 04-09-2014, at 04:27, PO SU rhelpmaill...@163.com wrote:
> Dear expeRts, I find the newest Rstudio Desktop v0.98.1049 for windows is not newest, after i installed, it was a old version.

Questions and information relating to RStudio do not belong on this list. Send mail to RStudio support.

Berend
Re: [R] The newest version of Rstudio Desktop v0.98.1049 couldn't be installed
Please ask your question to the dedicated forum: https://support.rstudio.com

Regards,
Pascal

On Thu, Sep 4, 2014 at 11:27 AM, PO SU rhelpmaill...@163.com wrote:
> Dear expeRts, I find the newest Rstudio Desktop v0.98.1049 for windows is not newest, after i installed, it was a old version.
>
> --
> PO SU
> mail: desolato...@163.com
> Majored in Statistics from SJTU

--
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan
[R] Error in ur.df function
Dear R users,

For a time series, say y:

    y <- cumsum(rnorm(100))

I used the ur.df function (urca package) to test for a unit root with/without a drift as follows:

    test <- ur.df(y, lags = 3, type = "drift")

This works for the artificial data here, but when I apply the same function to my very big data set, it fails with the following error:

    Error in coef(summary(result))[2, 3] : subscript out of bounds

Any suggestions, please?

Mamuash
Re: [R] snow/Rmpi without MPI.spawn?
On 09/03/2014 10:24 PM, Leek, Jim wrote:
> Thanks for the tips. I'll take a look around for for loops in the morning.
>
> I think the example you provided worked for OpenMPI. (The default on our machine is MPICH2, but it gave the same error about calling spawn.) Anyway, with OpenMPI I got this:
>
> # salloc -n 12 orterun -n 1 R -f spawn.R
> library(Rmpi)
> ## Recent Rmpi bug -- should be mpi.universe.size()
> nWorkers <- mpi.universe.size()

(the '## Recent Rmpi bug' comment should have been removed, it's a holdover from when the script was written several years ago)

> nslaves = 4
> mpi.spawn.Rslaves(nslaves)

The argument needs to be named

    mpi.spawn.Rslaves(nslaves=4)

otherwise R matches unnamed arguments by position, and '4' is associated with the 'Rscript' argument.

Martin

> Reported: 2 (out of 2) daemons - 4 (out of 4) procs
>
> Then it hung there. So things spawned anyway, which is progress. I'm just not sure whether that is expected behavior for parSapply or not.
>
> Jim
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmor...@fhcrc.org]
> Sent: Wednesday, September 03, 2014 5:08 PM
> To: Leek, Jim; r-help@r-project.org
> Subject: Re: [R] snow/Rmpi without MPI.spawn?
>
> On 09/03/2014 03:25 PM, Jim Leek wrote:
>> I'm a programmer at a high-performance computing center. I'm not very familiar with R, but I have used MPI from C, C++, and Python. I have to run an R code provided by a guy who knows R, but not MPI. So, this fellow used the R snow library to parallelize his R code (theoretically, I'm not actually sure what he did.) I need to get this code running on our machines.
>>
>> However, Rmpi and snow seem to require mpi spawn, which our computing center doesn't support. I even tried building Rmpi with MPICH1 instead of 2, because Rmpi has that option, but it still tries to use spawn.
>>
>> I can launch plenty of processes, but I have to launch them all at once at the beginning. Is there any way to convince Rmpi to just use those processes rather than trying to spawn its own? I haven't found any documentation on this issue, although I would've thought it would be quite common.
>
> This script
>
> spawn.R
> =======
> # salloc -n 12 orterun -n 1 R -f spawn.R
> library(Rmpi)
> ## Recent Rmpi bug -- should be mpi.universe.size()
> nWorkers <- mpi.universe.size()
> mpi.spawn.Rslaves(nslaves=nWorkers)
> mpiRank <- function(i)
>     c(i=i, rank=mpi.comm.rank())
> mpi.parSapply(seq_len(2*nWorkers), mpiRank)
> mpi.close.Rslaves()
> mpi.quit()
>
> can be run like the comment suggests
>
>     salloc -n 12 orterun -n 1 R -f spawn.R
>
> uses slurm (or whatever job manager) to allocate resources for 12 tasks and spawn within that allocation. Maybe that's 'good enough' -- spawning within the assigned allocation? Likely this requires minimal modification of the current code.
>
> More extensive is to revise the manager/worker-style code to something more like single instruction, multiple data
>
> simd.R
> ======
> ## salloc -n 4 orterun R --slave -f simd.R
> sink("/dev/null") # don't capture output -- more care needed here
> library(Rmpi)
>
> TAGS = list(FROM_WORKER=1L)
> .comm = 0L
>
> ## shared `work', here just determine rank and host
> work = c(rank=mpi.comm.rank(.comm),
>          host=system("hostname", intern=TRUE))
>
> if (mpi.comm.rank(.comm) == 0) {
>     ## manager
>     mpi.barrier(.comm)
>     nWorkers = mpi.comm.size(.comm)
>     res = list(nWorkers)
>     for (i in seq_len(nWorkers - 1L)) {
>         res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
>                                   comm=.comm)
>     }
>     res[[nWorkers]] = work
>     sink() # start capturing output
>     print(do.call(rbind, res))
> } else {
>     ## worker
>     mpi.barrier(.comm)
>     mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
> }
> mpi.quit()
>
> but this likely requires some serious code revision; if going this route then http://r-pbd.org/ might be helpful (and from a similar HPC environment).
>
> It's always worth asking whether the code is written to be efficient in R -- a typical 'mistake' is to write R-level explicit 'for' loops that copy-and-append results, along the lines of
>
>     len <- 10
>     result <- NULL
>     for (i in seq_len(len))
>         ## some complicated calculation, then...
>         result <- c(result, sqrt(i))
>
> whereas it's much better to pre-allocate and fill
>
>     result <- integer(len)
>     for (i in seq_len(len))
>         result[[i]] = sqrt(i)
>
> or
>
>     lapply(seq_len(len), sqrt)
>
> and very much better still to 'vectorize'
>
>     result <- sqrt(seq_len(len))
>
> (timing for me are about 1 minute for copy-and-append, .2 s for pre-allocate and fill, and .002s for vectorize).
>
> Pushing back on the guy providing the code (grep for for loops, and look for that copy-and-append pattern) might save you from having to use parallel evaluation at all.
>
> Martin
>
>> Thanks,
>> Jim
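The three loop patterns described above can be compared side by side. This is a sketch only: the length of 1e4 is an assumed value picked to keep the run short, numeric() is used for the pre-allocated vector because sqrt() returns doubles, and the timings printed will differ from machine to machine.

```r
len <- 1e4  # assumed size, chosen only so the comparison runs quickly

# 1. Copy-and-append: the whole vector is copied on every iteration.
t_append <- system.time({
  append_result <- NULL
  for (i in seq_len(len)) append_result <- c(append_result, sqrt(i))
})

# 2. Pre-allocate, then fill in place.
t_fill <- system.time({
  fill_result <- numeric(len)
  for (i in seq_len(len)) fill_result[[i]] <- sqrt(i)
})

# 3. Vectorize: a single call on the whole index vector.
t_vec <- system.time({
  vec_result <- sqrt(seq_len(len))
})

# All three approaches produce the same values.
print(isTRUE(all.equal(append_result, vec_result)))
print(isTRUE(all.equal(fill_result, vec_result)))

# Elapsed seconds for each pattern.
print(c(append = t_append[["elapsed"]],
        fill   = t_fill[["elapsed"]],
        vector = t_vec[["elapsed"]]))
```

The gap between the first pattern and the other two widens quickly as len grows, which is the reason for checking the code for this pattern before reaching for parallel evaluation.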
Re: [R] wilcox.test - difference between p-values of R and online calculators
I think that the issue, at least with the online calculator that I looked at, is that it does not adjust the standard deviation of the test statistic for ties, so the standard deviation is larger and hence the p-value is larger. I was able to reproduce the reported z-score using the equation for the standard deviation without ties.

Dave

Message: 14
Date: Wed, 3 Sep 2014 23:20:04 +0200
From: peter dalgaard pda...@gmail.com
To: David L Carlson dcarl...@tamu.edu
Cc: r-help@r-project.org, W Bradley Knox bradk...@mit.edu
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators
Message-ID: ffde9637-160e-4555-9c2a-e94494700...@gmail.com
Content-Type: text/plain; charset=us-ascii

Notice that correct=TRUE for wilcox.test refers to the continuity correction, not the correction for ties.

You can fairly easily simulate from the exact distribution of W:

    x <- c(359,359,359,359,359,359,335,359,359,359,359,
           359,359,359,359,359,359,359,359,359,359,303,359,359,359)
    y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
           355,250,105,359,359,298,359,359,359,28.6,359,359,128)
    R <- rank(c(x,y))
    sim <- replicate(1e6,sum(sample(R,25))) - 325

    # With no ties, the ranks would be a permutation of 1:51, and we could do
    sim2 <- replicate(1e6,sum(sample(1:51,25))) - 325

In either case, the p-value is the probability that W >= 485 or W <= 165, and

    > mean(sim >= 485 | sim <= 165)
    [1] 0.000151
    > mean(sim2 >= 485 | sim2 <= 165)
    [1] 0.002182

Also, try

    plot(density(sim))
    lines(density(sim2))

and notice that the distribution of sim is narrower than that of sim2 (hence the smaller p-value with tie correction), but also that the normal approximation is not nearly as good as for the untied case. The clumpiness is due to the fact that 35 of the ranks have the maximum value of 34 (corresponding to the original 359's).

-pd

On 03 Sep 2014, at 19:13 , David L Carlson dcarl...@tamu.edu wrote:

Since they all have the same W/U value, it seems likely that the difference is how the different versions adjust the standard error for ties. Here are a couple of posts addressing the issues of ties:

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and

David C

From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W Bradley Knox
Sent: Wednesday, September 3, 2014 9:20 AM
To: David L Carlson
Cc: Tal Galili; r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators

Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for the exact and correct parameters. Running with correct=FALSE makes only a tiny change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these online calculators might come from how ties are handled, but the fact that R and two of the online calculators reach the same U/W values seems to indicate that ties aren't the issue, since (I believe) the U or W values contain all of the information needed to calculate the p-value, assuming the number of samples is also known for each condition. (However, it's been a while since I looked into how MWU tests work, so maybe now's the time to refresh.) If that's correct, the discrepancy seems to be based in what R does with the W value that is identical to the U values of two of the online calculators. (I'm also assuming that U and W have the same meaning, which seems likely.)

- Brad

W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu

On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson dcarl...@tamu.edu wrote:

That does not change the results. The
Re: [R] wilcox.test - difference between p-values of R and online calculators
Yes, that is the point that David made and that I illustrated with the simulations: The null distribution of W is more narrow in the presence of ties, hence W=485 is a more extreme observation in the tied case. I.e. it will look less extreme if you ignore that there are ties.

-pd

On 04 Sep 2014, at 15:17 , Lorenz, David lor...@usgs.gov wrote:

> I think that the issue, at least with the online calculator that I looked at, is that it does not adjust the standard deviation of the test statistic for ties, so the standard deviation is larger and hence larger p-value. I was able to reproduce the reported z-score using the equation for the standard deviation with out ties.
>
> Dave
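The effect of the tie adjustment on the normal approximation can also be checked by hand. A minimal sketch, using the x and y posted in the thread and the standard formulas for the standard deviation of W with and without a tie correction; the variable names (sd_noties, sd_ties, and so on) are illustrative, and no continuity correction is applied, matching correct=FALSE.

```r
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)

n1 <- length(x)
n2 <- length(y)
N  <- n1 + n2
r  <- rank(c(x, y))

# W as reported by wilcox.test: rank sum of x minus its minimum possible value.
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Sizes of the groups of tied ranks (singletons contribute zero below).
ties <- table(r)

# Standard deviation of W ignoring ties, and with the tie correction.
sd_noties <- sqrt(n1 * n2 * (N + 1) / 12)
sd_ties   <- sqrt(n1 * n2 / 12 *
                  ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))

# Two-sided p-values from the normal approximation, no continuity correction.
z_noties <- (W - n1 * n2 / 2) / sd_noties
z_ties   <- (W - n1 * n2 / 2) / sd_ties
p_noties <- 2 * pnorm(-abs(z_noties))
p_ties   <- 2 * pnorm(-abs(z_ties))

print(c(W = W, sd_noties = sd_noties, sd_ties = sd_ties))
print(c(p_noties = p_noties, p_ties = p_ties))

# For comparison: R's own result with the normal approximation.
print(wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value)
```

The tie-corrected standard deviation is the smaller of the two, so the same W = 485 gives a larger z-score and a smaller p-value, in line with the explanation given in the thread.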
[R] Operator proposal: %between%
Not sure if this is the proper list to propose changes like this, but if it passes constructive criticism, I would like to have a %between% operator in the R language.

I currently have this in my local R startup script:

    `%between%` <- function(x, ...) {
      y <- range( unlist(c(...)) )
      return( x >= y[1] & x <= y[2] )
    }

It allows me to do things like:

    5 %between% c(1,10)

and also to act as an in_range operator:

    foo %between% a.long.list.with.many.values

This may seem unnecessary, since 5 >= foo[1] && 5 <= foo[2] is also quite short to type, but there is a mental cost to this, e.g. if you are deeply focused on a complicated program flow, the %between% construct is a lot easier to type out, and relate to, than the logically more complex construct with && and >=/<=, at least in my experience.

--
mvh
Torbjørn Lindahl
Re: [R] Convert time zone to difference from Coordinated
Thank you for your reply. %z fits the bill perfectly. Apologies for my breach of etiquette on my first post to the list.

- Tim

R 3.1.1 on Windows 7 OS

From: Prof Brian Ripley rip...@stats.ox.ac.uk
To: r-help@r-project.org
Subject: Re: [R] Convert time zone to difference from Coordinated Universal Time
Message-ID: 54080c8f.4030...@stats.ox.ac.uk
Content-Type: text/plain; charset=windows-1252; format=flowed

> On 04/09/2014 02:27, tim.willi...@ucb.com wrote:
>> Is there a way to do this with strftime() ?
>
> On some systems. Although the posting guide required it, you did not provide information on yours.
>
> Please do read the help page for yourself -- %z is relevant.
Re: [R] Operator proposal: %between%
On 04/09/2014 10:41 AM, Torbjørn Lindahl wrote:
> Not sure if this is the proper list to propose changes like this, but if it passes constructive criticism, I would like to have a %between% operator in the R language.

But it appears that you do:

> I currently have this in my local R startup script:
>
> `%between%` <- function(x, ...) {
>   y <- range( unlist(c(...)) )
>   return( x >= y[1] & x <= y[2] )
> }
>
> It allows me to do things like:
>
> 5 %between% c(1,10)
>
> and also to act as an in_range operator:
>
> foo %between% a.long.list.with.many.values

So what you are asking is that someone should make this available to others, as well. That seems like a reasonable thing to do, but why shouldn't that someone be you?

> This may seem unnecessary, since 5 >= foo[1] && 5 <= foo[2] is also quite short to type, but there is a mental cost to this, e.g. if you are deeply focused on a complicated program flow, the %between% construct is a lot easier to type out, and relate to, than the logically more complex construct with && and >=/<=, at least in my experience.

One problem with your definition is that it's not clear it does the right thing when x is a vector. I might have a vector of lower bounds, and a vector of upper bounds, and want to check each element of x against the corresponding bound, i.e. compute

    lower <= x & x <= upper

Your %between% operator could be rewritten so that

    x %between% cbind(lower, upper)

would give this result, but it doesn't do so now. (I'm not saying you *should* rewrite it like that, but it's something you should consider.)

Duncan Murdoch
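One way the rewrite described in the reply might look is sketched below. This is an illustration under assumptions, not the poster's or the reviewer's code: a two-column matrix on the right-hand side is treated as per-element lower and upper bounds, and anything else falls back to the range() behaviour of the original definition. The single bounds argument replaces the original `...`.

```r
# Hypothetical variant of %between%: the right-hand side is either a
# two-column matrix of (lower, upper) bounds or any collection of values
# whose range defines a single interval.
`%between%` <- function(x, bounds) {
  if (is.matrix(bounds) && ncol(bounds) == 2L) {
    lower <- bounds[, 1]
    upper <- bounds[, 2]
  } else {
    limits <- range(unlist(bounds))
    lower <- limits[1]
    upper <- limits[2]
  }
  # Elementwise comparison, so a vector x yields a logical vector.
  x >= lower & x <= upper
}

# Single interval, as in the original proposal.
print(5 %between% c(1, 10))

# One interval per element of x.
x     <- c(1, 5, 10)
lower <- c(0, 6, 9)
upper <- c(2, 8, 11)
print(x %between% cbind(lower, upper))
```

Note that some add-on packages define an operator with this name too, so loading one of them after this definition (or the reverse) would mask it.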
Re: [R] snow/Rmpi without MPI.spawn?
Ah, now it's working. Thanks. Now I just need to figure out how to get snow doing this...

Jim

On 09/04/2014 05:03 AM, Martin Morgan wrote:
> The argument needs to be named
>
>     mpi.spawn.Rslaves(nslaves=4)
>
> otherwise R matches unnamed arguments by position, and '4' is associated with the 'Rscript' argument.
>
> Martin
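The positional-matching behaviour behind this fix can be reproduced without MPI. In the sketch below, spawn_workers() is a hypothetical stand-in whose first formal argument is a script path and whose second is a worker count, mirroring the argument order described for mpi.spawn.Rslaves; it only returns its arguments so the matching can be inspected.

```r
# Hypothetical stand-in: first formal is a script, second is a worker count.
spawn_workers <- function(script = "worker.R", nworkers = 1L) {
  list(script = script, nworkers = nworkers)
}

# Unnamed: 4 is matched by position to the first formal, 'script'.
positional <- spawn_workers(4)
str(positional)

# Named: 4 goes to 'nworkers' and 'script' keeps its default.
named <- spawn_workers(nworkers = 4)
str(named)
```

In the positional call the worker count silently stays at its default, which is why the unnamed form did not do what was intended.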
Re: [R] Covariance between two dichotomous variables
If you have 2 dichotomous variables coded 0/1 (and stored as numerics) then the var and cov functions can be used to compute the covariance as if they were continuous variables. Some algebra shows that the continuous covariance and the binomial covariance differ only by the denominator (n for binomial, n-1 for continuous). For large sample sizes the difference is trivial; for small sample sizes (or even large if you want) you can just multiply by (n-1)/n to correct.

On Tue, Sep 2, 2014 at 10:29 PM, Heather Kettrey heather.h.kett...@vanderbilt.edu wrote:
> Hi,
>
> I am trying to test a mediation hypothesis using coefficients from logistic regression analyses (x, m, and y are all dichotomous). I am running a test of significance using MacKinnon and Dwyer's adaptation of Sobel's test (i.e., correcting for different scales of coefficients in cases of a dichotomous outcome). In order to make this correction I need to compute the covariance between x and m.
>
> I have searched various R packages and the R-help page archive and cannot find a way to do this in R.
>
> Does anyone know how to compute the covariance between two dichotomous variables in R? It seems like there should be a very simple answer to this question, but I cannot find it.
>
> Thanks in advance!
>
> Heather
>
> --
> Heather Hensman Kettrey
> PhD Candidate
> Department of Sociology
> Vanderbilt University

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
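The (n-1)/n correction can be verified numerically. A small sketch with simulated 0/1 data; the seed, sample size and probabilities are arbitrary assumptions, and "binomial covariance" is taken here to mean the n-denominator form, mean(x*m) - mean(x)*mean(m).

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 200     # arbitrary sample size

# Two dichotomous variables coded 0/1; m depends on x so they covary.
x <- rbinom(n, 1, 0.4)
m <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

# cov() treats them as continuous and divides by n - 1.
cov_sample <- cov(x, m)

# n-denominator version: P(x = 1, m = 1) - P(x = 1) * P(m = 1),
# with the probabilities estimated by sample proportions.
cov_binomial <- mean(x * m) - mean(x) * mean(m)

# Rescaling by (n - 1) / n reconciles the two.
print(c(sample = cov_sample,
        rescaled = cov_sample * (n - 1) / n,
        binomial = cov_binomial))
print(all.equal(cov_sample * (n - 1) / n, cov_binomial))
```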
[R] Defining vectors with per-determined correlations
I need to define three vectors x, y, z (each of length 100) such that the pair-wise correlations of the vectors have pre-defined values r1 and r2. More specifically, I need to define x, y, and z so that:

    corr(x,y) = r1
    corr(y,z) = r2

Is there any easy way to accomplish this with R?

Thank you, John

John David Sorkin M.D., Ph.D. Professor of Medicine Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
Re: [R] snow/Rmpi without MPI.spawn?
You could look into the RMPISNOW shell script that is included in snow for use with mpirun, e.g. as

    mpirun -np 3 RMPISNOW

The script might need adjusting for your setting.

Best, luke

On Thu, 4 Sep 2014, Jim Leek wrote:

Ah, now it's working. Thanks. Now I just need to figure out how to get snow doing this...

Jim

On 09/04/2014 05:03 AM, Martin Morgan wrote:

On 09/03/2014 10:24 PM, Leek, Jim wrote:

Thanks for the tips. I'll take a look around for for loops in the morning. I think the example you provided worked for OpenMPI. (The default on our machine is MPICH2, but it gave the same error about calling spawn.) Anyway, with OpenMPI I got this:

    # salloc -n 12 orterun -n 1 R -f spawn.R
    library(Rmpi)
    ## Recent Rmpi bug -- should be mpi.universe.size()
    nWorkers <- mpi.universe.size()

(the '## Recent Rmpi bug' comment should have been removed, it's a holdover from when the script was written several years ago)

    nslaves = 4
    mpi.spawn.Rslaves(nslaves)

The argument needs to be named

    mpi.spawn.Rslaves(nslaves=4)

otherwise R matches unnamed arguments by position, and '4' is associated with the 'Rscript' argument.

Martin

    Reported: 2 (out of 2) daemons - 4 (out of 4) procs

Then it hung there. So things spawned anyway, which is progress. I'm just not sure whether that is expected behavior for parSapply or not.

Jim

-Original Message- From: Martin Morgan [mailto:mtmor...@fhcrc.org] Sent: Wednesday, September 03, 2014 5:08 PM To: Leek, Jim; r-help@r-project.org Subject: Re: [R] snow/Rmpi without MPI.spawn?

On 09/03/2014 03:25 PM, Jim Leek wrote:

I'm a programmer at a high-performance computing center. I'm not very familiar with R, but I have used MPI from C, C++, and Python. I have to run an R code provided by a guy who knows R, but not MPI. So, this fellow used the R snow library to parallelize his R code (theoretically, I'm not actually sure what he did.) I need to get this code running on our machines.
However, Rmpi and snow seem to require mpi spawn, which our computing center doesn't support. I even tried building Rmpi with MPICH1 instead of 2, because Rmpi has that option, but it still tries to use spawn. I can launch plenty of processes, but I have to launch them all at once at the beginning. Is there any way to convince Rmpi to just use those processes rather than trying to spawn its own? I haven't found any documentation on this issue, although I would've thought it would be quite common.

This script

spawn.R
===
    # salloc -n 12 orterun -n 1 R -f spawn.R
    library(Rmpi)
    ## Recent Rmpi bug -- should be mpi.universe.size()
    nWorkers <- mpi.universe.size()
    mpi.spawn.Rslaves(nslaves=nWorkers)
    mpiRank <- function(i) c(i=i, rank=mpi.comm.rank())
    mpi.parSapply(seq_len(2*nWorkers), mpiRank)
    mpi.close.Rslaves()
    mpi.quit()

can be run like the comment suggests

    salloc -n 12 orterun -n 1 R -f spawn.R

This uses slurm (or whatever job manager) to allocate resources for 12 tasks and spawn within that allocation. Maybe that's 'good enough' -- spawning within the assigned allocation? Likely this requires minimal modification of the current code.
More extensive is to revise the manager/worker-style code to something more like single instruction, multiple data

simd.R
==
    ## salloc -n 4 orterun R --slave -f simd.R
    sink("/dev/null") # don't capture output -- more care needed here
    library(Rmpi)
    TAGS = list(FROM_WORKER=1L)
    .comm = 0L
    ## shared `work', here just determine rank and host
    work = c(rank=mpi.comm.rank(.comm), host=system("hostname", intern=TRUE))
    if (mpi.comm.rank(.comm) == 0) {
        ## manager
        mpi.barrier(.comm)
        nWorkers = mpi.comm.size(.comm)
        res = list(nWorkers)
        for (i in seq_len(nWorkers - 1L)) {
            res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER, comm=.comm)
        }
        res[[nWorkers]] = work
        sink() # start capturing output
        print(do.call(rbind, res))
    } else {
        ## worker
        mpi.barrier(.comm)
        mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
    }
    mpi.quit()

but this likely requires some serious code revision; if going this route then http://r-pbd.org/ might be helpful (and from a similar HPC environment).

It's always worth asking whether the code is written to be efficient in R -- a typical 'mistake' is to write R-level explicit 'for' loops that copy-and-append results, along the lines of

    len <- 10
    result <- NULL
    for (i in seq_len(len))
        ## some complicated calculation, then...
        result <- c(result, sqrt(i))

whereas it's much better to pre-allocate and fill

    result <- integer(len)
    for (i in seq_len(len))
        result[[i]] = sqrt(i)

or

    lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

    result <- sqrt(seq_len(len))

(timings for me are about 1 minute for copy-and-append, .2 s for pre-allocate and fill, and .002 s for vectorize). Pushing back on the
[R] citation of a task view
Hi all, What is the formal bibliographic citation for an R task view? For example, if I want to cite the MetaAnalysis task view. Thanks in advance! Pablo
[R] Revolutions blog roundup: August 2014
Revolution Analytics staff and guests write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. In case you missed them, here are some articles related to R from the month of August: R is the most popular software in the KDNuggets poll for the 4th year running: http://bit.ly/1AaTLsu The frequency of R user group meetings continues to rise, and there are now 147 R user groups worldwide: http://bit.ly/1AaTINi A video interview with me (David Smith) at the useR! 2014 conference: http://bit.ly/1AaTINj In a provocative op-ed, Norm Matloff worries that Statistics is losing ground to Computer Science: http://bit.ly/1AaTLss A new certification program for Revolution R Enterprise: http://bit.ly/1AaTJ3D An interactive map of R user groups around the world, created with R and Shiny: http://bit.ly/1AaTLst Using R to generate calendar entries (and create photo opportunities): http://bit.ly/1AaTINk Integrating R with production systems with Domino: http://bit.ly/1AaTLsw The New York Times compares data science to janitorial work: http://bit.ly/1AaTLsv Rdocumentation.org provides search for CRAN, GitHub and BioConductor packages and publishes a top-10 list of packages by downloads: http://bit.ly/1AaTLsz An update to the airlines data set (the iris of Big Data) with flights through the end of 2012: http://bit.ly/1AaTLsx A consultant compares the statistical capabilities of R, Matlab, SAS, Stata and SPSS: http://bit.ly/1AaTINo Using heatmaps to explore correlations in financial portfolios: http://bit.ly/1AaTJ3C Video of John Chambers' keynote at the useR! 
2014 conference on the interfaces, efficiency, big data and the history of R: http://bit.ly/1AaTLsy CIO magazine says the open source R language is becoming pervasive: http://bit.ly/1AaTJ3E Reviews of some presentations at the JSM 2014 conference that used R: http://bit.ly/1AaTJ3F GRAN is a new R package to manage package repositories to support reproducibility: http://bit.ly/1AaTLIM The ASA launches a PR campaign to promote the role of statisticians in society: http://bit.ly/1AaTLIN Video replay of the webinar Applications in R, featuring examples from several companies using R: http://bit.ly/1AaTJ3G General interest stories (not related to R) in the past month included: dance moves from Japan (http://bit.ly/1AaTLIP), an earthquake's signal in personal sensors (http://bit.ly/1AaTLIQ), a 3-minute movie in less than 4k (http://bit.ly/1AaTLIR), smooth time-lapse videos (http://bit.ly/1AaTLIS), representing mazes as trees (http://bit.ly/1AaTJ3K), and the view from inside a fireworks display (http://bit.ly/1AaTLIV). Meeting times for local R user groups (http://blog.revolutionanalytics.com/local-r-groups.html) can be found on the updated R Community Calendar at: http://blog.revolutionanalytics.com/calendar.html If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. You can receive daily blog posts via email using services like blogtrottr.com, or join the Revolution Analytics mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis. As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com or via Twitter (I'm @revodavid). Cheers, # David -- David M Smith da...@revolutionanalytics.com Chief Community Officer, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 646-9523 (Chicago IL, USA) Twitter: @revodavid -- Try Enterprise R Now! 
[R] log likelihood and optimize
Hello, I want to estimate the covariance matrix of the likelihood f(x1,x2,x3) = f(x2|x1) f(x3|x2) f(x1), where f(x2|x1) follows a Binomial distribution with parameters (2, 0.2), f(x3|x2) follows a Binomial distribution with parameters (2, 0.8), and f(x1) follows a Binomial distribution with parameters (2, 0.1). Could you please suggest a way of doing it using the log likelihood and optimize? Many thanks, Tonia Marks
[R] mvpart error in R 3.1.1 s_to_rp not available for .C()
Dear R list, I'm working with recursive trees using the packages mvpart and rpart in R on Linux Xubuntu (64-bit). The package performed with no problem under my previous R version (2.14). I recently updated R to version 3.1.1, and when I try to run an mvpart model I get the following error message:

    data(spider)
    mvpart(data.matrix(spider[,1:12])~herbs+reft+moss+sand+twigs+water, data=spider)
    Error in .C("s_to_rp", n = as.integer(nobs), nvarx = as.integer(nvar), :
      "s_to_rp" not available for .C() for package "mvpart"

I tried to find this problem on the web, but I was not able to find any response. If you can help me solve this problem, I would be grateful.

Best regards and thank you in advance, Angel Segura

PS: below you will find the R and session info.

Working on Ubuntu 12.04.2 LTS

    R.Version()
    $platform        [1] x86_64-pc-linux-gnu
    $arch            [1] x86_64
    $os              [1] linux-gnu
    $system          [1] x86_64, linux-gnu
    $status          [1]
    $major           [1] 3
    $minor           [1] 1.1
    $year            [1] 2014
    $month           [1] 07
    $day             [1] 10
    $`svn rev`       [1] 66115
    $language        [1] R
    $version.string  [1] R version 3.1.1 (2014-07-10)
    $nickname        [1] Sock it to Me

    sessionInfo()
    R version 3.1.1 (2014-07-10)
    Platform: x86_64-pc-linux-gnu (64-bit)
    locale:
     [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=es_ES.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=es_ES.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
    other attached packages:
    [1] mvpart_1.6-2 rpart_4.1-8
[R] gplot heatmaps: clustering according to rowsidecolors + key.xtickfun
Hi there, I have two questions about heatmap.2 in gplots. My input is a simple square matrix with numeric values between 75 and 100 (it is a similarity matrix based on bacterial DNA sequences).

1. I can sort my input matrix into categories with RowSideColors (in this case, very conveniently by bacterial taxa). I do a clustering and reordering of my matrix with Rowv=TRUE (and Colv=Rowv). The question is: can I combine the two features so that the clustering/reordering is done only within the submatrices defined by the vectors given in RowSideColors (so, in this case, so that the original ordering by bacterial taxa is preserved)? That would be amazing.

2. I have set my own coloring rules with:

    mypal <- c("grey", "blue", "green", "yellow", "orange", "red")
    col_breaks = c(seq(0,74.9), seq(75.0,78.4), seq(78.5,81.9), seq(82.0,86.4), seq(86.5, 94.5), seq(94.5,100.0))

Is it possible to pass this sequential ordering to key.xtickfun? May I ask for some example code?

Thank you very much!

-- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069
[R] Help with regression
I have this code:

    Vm <- c(6.2208, 4.9736, 4.1423, 3.1031, 2.4795, 1.6483, 1.2328, 0.98357, 0.81746, 0.60998); #Molar volume
    p <- c(0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 4)*1000; #Pressure
    Rydb <- 8.3144621; #Constant
    Tempi <- 300; #Temperature in Kelvin
    Vmi <- Vm^(-1); #To get Vm^(-1)
    Zi <- (p*Vm)/(Rydb*Tempi) #To get Z
    #Plot
    dframe <- data.frame(Vmi, Zi)
    plot(dframe, pch=19, col='red', main='Thermodynamic properties of Argon', xlab='1/Vm', ylab='Z')
    #Fit for B
    fitb <- lm(Zi ~ Vmi);
    fitb$coefficients[1];
    fitb$coefficients[2];
    summary(fitb)

I want to fit a regression to the data with this general formula: y = 1 + Bx + Cx^2. I need to figure out what B and C in this formula are. Please help me! I want to become better at R.
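One possible way to fit y = 1 + Bx + Cx^2 with the intercept fixed at 1 is to regress y - 1 on x and x^2 without an intercept. The sketch below reuses the data from the post; the variable names `x`, `Z`, `fit`, `B` and `C` are chosen here for illustration, and this is only one reading of what the poster wants, not an answer given in the thread.

```r
## Data as given in the post
Vm    <- c(6.2208, 4.9736, 4.1423, 3.1031, 2.4795, 1.6483, 1.2328,
           0.98357, 0.81746, 0.60998)                   # molar volume
p     <- c(0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 4) * 1000  # pressure
Rydb  <- 8.3144621                                      # gas constant
Tempi <- 300                                            # temperature in Kelvin

x <- 1 / Vm                    # predictor: 1/Vm
Z <- (p * Vm) / (Rydb * Tempi) # response: compressibility factor

## Move the fixed intercept to the left-hand side and drop the
## intercept from the model ("0 +"), leaving only B and C to estimate.
fit <- lm(I(Z - 1) ~ 0 + x + I(x^2))

B <- coef(fit)[["x"]]
C <- coef(fit)[["I(x^2)"]]
print(c(B = B, C = C))

## Fitted curve on the original scale, for comparison with the data
Z_hat <- 1 + B * x + C * x^2
print(max(abs(Z - Z_hat)))     # largest absolute residual
```

`summary(fit)` then reports standard errors for B and C in the usual way.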
[R] R for chemistry
Dear community, I am studying chemistry and physics. We don't get an introduction to mathematical programs or programming; we are expected to just find something and use it. So I have chosen R. But was that a good choice? Do you think I could get through my studies with R as my only programming language (combined with C++) and as my only mathematical calculator? Is it an alternative to MatLab? Or is R just for statistics? Hopefully someone can answer these questions. Kind regards!
Re: [R] citation of a task view
On 04/09/2014 4:52 AM, Dr. Pablo E. Verde wrote:

Hi all, What is the formal bibliographic citation for an R task view? For example, if I want to cite the MetaAnalysis task view. Thanks in advance!

I don't think there is a recognized standard one. I would use whatever format your journal requires for citing any web page, e.g. something like

Lewin-Koh, Nicholas (2013). CRAN Task View: Graphic Displays Dynamic Graphics Graphic Devices Visualization. Web page with URL http://cran.r-project.org/web/views/Bayesian.html, retrieved Sept. 4, 2014.

Duncan Murdoch
Re: [R] Defining vectors with per-determined correlations
See ?mvrnorm in the MASS package.

Best, Ista

On Thu, Sep 4, 2014 at 12:04 PM, John Sorkin jsor...@grecc.umaryland.edu wrote:

I need to define three vectors x, y, z (each of length 100) such that the pair-wise correlations of the vectors have pre-defined values r1 and r2. More specifically, I need to define x, y, and z so that:

    corr(x,y) = r1
    corr(y,z) = r2

Is there any easy way to accomplish this with R?

Thank you, John

John David Sorkin M.D., Ph.D. Professor of Medicine Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
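A minimal sketch of the mvrnorm suggestion follows. The values of `r1` and `r2` are placeholders, and the question leaves corr(x,z) unspecified, so the choice `r13 <- r1 * r2` is an assumption made here: it is one value that always keeps the correlation matrix positive definite when |r1| and |r2| are below 1. With `empirical = TRUE`, mvrnorm makes the sample correlations match the requested ones rather than just their expected values.

```r
library(MASS)   # provides mvrnorm()

r1 <- 0.6       # placeholder for the desired corr(x, y)
r2 <- 0.3       # placeholder for the desired corr(y, z)
r13 <- r1 * r2  # assumption: corr(x, z) is not given in the question

## Target correlation matrix for (x, y, z)
Sigma <- matrix(c(1,   r1,  r13,
                  r1,  1,   r2,
                  r13, r2,  1), nrow = 3, byrow = TRUE)

## A valid correlation matrix must be positive definite
stopifnot(all(eigen(Sigma, symmetric = TRUE)$values > 0))

set.seed(42)
xyz <- mvrnorm(n = 100, mu = c(0, 0, 0), Sigma = Sigma, empirical = TRUE)
x <- xyz[, 1]
y <- xyz[, 2]
z <- xyz[, 3]

## Sample correlations; these should reproduce Sigma up to rounding
print(round(cor(xyz), 6))
```

If the correlations only need to hold in expectation, `empirical = FALSE` (the default) draws an ordinary random sample instead.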
Re: [R] R for chemistry
On Thu, 4 Sep 2014, Basilius Sapientia wrote:

So I have chosen R. But was that a good choice?

Basilius, Yes. For data analyses. You could use R as a general programming language, but others are better suited.

Do you think I could get through my studies with R as my only programming language (combined with C++) and as my only mathematical calculator? Is it an alternative to MatLab? Or is R just for statistics?

Take a close look at Python. It has extensive scientific support (NumPy, SciPy, Pandas, etc.) and can do what MatLab does (so does the open-source Octave, by the way). It's also used for general programming; for example, the Mailman mailing list manager is written in Python. I moved from C to Python a number of years ago and have no regrets.

Rich
[R] Built R package with example
Dear All, How can I add a folder containing the example files needed to run an example? Simply adding the folder did not work: it was not built and reloaded with the package. Thanks! Karim
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
Hi David and list: This is working, except at this command mycast - dcast(mymelt, row~color, value.var=rank, fill=0) dcast is using length as the default aggregating function. This results in not accurate results. It tells me, for example how many choices were missing values and it tells me if a person selected any given option (value is reported as 1). When I try to run your reproducible research, it works great, but something with the aggregating function is not working properly with mine. Any other thoughts? Simon On Aug 18, 2014, at 10:44 AM, David L Carlson dcarl...@tamu.edu wrote: Another approach using reshape2: library(reshape2) # Construct data/ add column of row numbers set.seed(42) mydf - data.frame(t(replicate(100, sample(c(red, blue, + green, yellow, NA), 4 mydf - data.frame(rows=1:100, mydf) colnames(mydf) - c(row, rank1, rank2, rank3, rank4) head(mydf) row rank1 rank2 rank3 rank4 1 1 NA yellowred blue 2 2 yellow green NA red 3 3 yellow green blue NA 4 4 NA blue yellow green 5 5 NAred blue green 6 6 NAred green blue # Reshape mymelt - melt(mydf, id.vars=1, measure.vars=2:5, + variable.name=rank, value.name=color) # Convert rank to numeric mymelt$rank - as.numeric(mymelt$rank) mycast - dcast(mymelt, row~color, value.var=rank, fill=0) head(mycast) row blue green red yellow NA 1 14 0 3 2 1 2 20 2 4 1 3 3 33 2 0 1 4 4 42 4 0 3 1 5 53 4 2 0 1 6 64 3 2 0 1 David C -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson Sent: Sunday, August 17, 2014 6:32 PM To: Simon Kiss; r-help@r-project.org Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame There is probably an easier way to do this, but set.seed(42) mydf - data.frame(t(replicate(100, sample(c(red, blue, + green, yellow, NA), 4 colnames(mydf) - c(rank1, rank2, rank3, rank4) head(mydf) rank1 rank2 rank3 rank4 1 NA yellowred blue 2 yellow green NA red 3 yellow green blue NA 4 NA blue yellow green 5 NAred blue 
green 6 NAred green blue lvls - levels(mydf$rank1) # convert color factors to numeric for (i in seq_along(mydf)) mydf[,i] - as.numeric(mydf[,i]) # stack the columns mydf2 - stack(mydf) # convert rank factor to numeric mydf2$ind - as.numeric(mydf2$ind) # add row numbers mydf2 - data.frame(rows=1:100, mydf2) # Create table mytbl - xtabs(ind~rows+values, mydf2) # convert to data frame mydf3 - data.frame(unclass(mytbl)) colnames(mydf3) - lvls head(mydf3) blue green red yellow 14 0 3 2 20 2 4 1 33 2 0 1 42 4 0 3 53 4 2 0 64 3 2 0 David C -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Simon Kiss Sent: Friday, August 15, 2014 3:58 PM To: r-help@r-project.org Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame Both the suggestions I got work very well, but what I didn't realize is that NA values would cause serious problems. Where there is a missing value, using the argument na.last=NA to order just returns the the order of the factor levels, but excludes the missing values, but I have no idea where those occur in the or rather which of those variables were actually missing. Have I explained this problem sufficiently? I didn't think it would cause such a problem so I didn't include it in the original problem definition. Yours, Simon On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote: I think this gets what you want. But your data are not reproducible since they are randomly drawn without setting a seed and the two data sets have no relationship to one another. 
set.seed(42) mydf - data.frame(t(replicate(100, sample(c(red, blue, + green, yellow) colnames(mydf) - c(rank1, rank2, rank3, rank4) mydf2 - data.frame(t(apply(mydf, 1, order))) colnames(mydf2) - levels(mydf$rank1) head(mydf) rank1 rank2 rank3 rank4 1 yellow greenred blue 2 green blue yellow red 3 green yellowred blue 4 yellowred green blue 5 yellowred green blue 6 yellowred blue green head(mydf2) blue green red yellow 14 2 3 1 22 1 4 3 34 1 3 2 44 3 2 1 54 3 2 1 63 4 2 1 - David L Carlson Department of Anthropology Texas AM University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Simon Kiss Sent: Friday, July 25, 2014 2:34 PM To:
Re: [R] citation of a task view
On Thu, 4 Sep 2014, Duncan Murdoch wrote:

On 04/09/2014 4:52 AM, Dr. Pablo E. Verde wrote:

Hi all, What is the formal bibliographic citation for an R task view? For example, if I want to cite the MetaAnalysis task view. Thanks in advance!

I don't think there is a recognized standard one.

Not yet. But since this summer the web pages contain meta tags (both in Highwire Press and Dublin Core format) that state how the pages can be cited.

I would use whatever format your journal requires for citing any web page, e.g. something like Lewin-Koh, Nicholas (2013). CRAN Task View: Graphic Displays Dynamic Graphics Graphic Devices Visualization. Web page with URL http://cran.r-project.org/web/views/Bayesian.html, retrieved Sept. 4, 2014.

I would recommend two changes:

(1) Use the official stable URL http://CRAN.R-project.org/view=...
(2) Instead of the retrieved information, use the version date stated on the task view.

For example, for the current version of the MetaAnalysis view:

Michael Dewey (2014). CRAN Task View: Meta-Analysis. Version 2014-07-25. URL http://CRAN.R-project.org/view=MetaAnalysis.

or in BibTeX:

    @Misc{,
      author = {Michael Dewey},
      note = {Version~2014-07-25},
      title = {{CRAN} Task View: Meta-Analysis},
      year = {2014},
      url = {http://CRAN.R-project.org/view=MetaAnalysis}
    }

hth, Z

Duncan Murdoch
Re: [R] R for chemistry
R is useful for quite a range of applications, but not everything. I recommend planning on learning multiple programming languages eventually, because each type of problem has its own set of useful phrases. An example of this in R is comparing the base, lattice, and ggplot models of graph generation... each has its own perspective that is valuable in different contexts. Another example might be in iterative algorithms... these are often implemented in C or C++ or Fortran and called from R. It is common to build packages in R to create convenient groups of functions that are useful for specific problem types, but other languages sometimes have features that make these packages look clumsy. Knowing how other languages do things can make it easier to see better solutions in R, or even avoid struggling with poorly-suited functions.

In the vein of communicating appropriately, be sure to follow the instructions in the footer of this or any other post, which among other things ask you not to post in HTML on this list.

---
Jeff Newmiller DCN:jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On September 4, 2014 8:41:28 AM PDT, Basilius Sapientia basilius...@gmail.com wrote:

Dear community, I am studying chemistry and physics. We don't get an introduction to mathematical programs or programming; we are expected to just find something and use it. So I have chosen R. But was that a good choice? Do you think I could get through my studies with R as my only programming language (combined with C++) and as my only mathematical calculator? Is it an alternative to MatLab? Or is R just for statistics? Hopefully someone can answer these questions. Kind regards!
Re: [R] citation of a task view
On 04/09/2014 2:28 PM, Achim Zeileis wrote:

On Thu, 4 Sep 2014, Duncan Murdoch wrote:

On 04/09/2014 4:52 AM, Dr. Pablo E. Verde wrote:

Hi all, What is the formal bibliographic citation for an R task view? For example, if I want to cite the MetaAnalysis task view. Thanks in advance!

I don't think there is a recognized standard one.

Not yet. But since this summer the web pages contain meta tags (both in Highwire Press and Dublin Core format) that state how the pages can be cited.

I would use whatever format your journal requires for citing any web page, e.g. something like Lewin-Koh, Nicholas (2013). CRAN Task View: Graphic Displays Dynamic Graphics Graphic Devices Visualization. Web page with URL http://cran.r-project.org/web/views/Bayesian.html, retrieved Sept. 4, 2014.

I would recommend two changes:

(1) Use the official stable URL http://CRAN.R-project.org/view=...
(2) Instead of the retrieved information, use the version date stated on the task view.

For example, for the current version of the MetaAnalysis view:

Thanks. One more correction: I described the Graphics view, but put in the link to the Bayesian one :-). So the real link should have been http://CRAN.R-project.org/view=Graphics

Duncan Murdoch

Michael Dewey (2014). CRAN Task View: Meta-Analysis. Version 2014-07-25. URL http://CRAN.R-project.org/view=MetaAnalysis.

or in BibTeX:

    @Misc{,
      author = {Michael Dewey},
      note = {Version~2014-07-25},
      title = {{CRAN} Task View: Meta-Analysis},
      year = {2014},
      url = {http://CRAN.R-project.org/view=MetaAnalysis}
    }

hth, Z

Duncan Murdoch
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
I think we would need enough of the data you are using to figure out how to modify the process. Can you use dput() to send a small data set that fails to work?

David C

-----Original Message-----
From: Simon Kiss [mailto:sjk...@gmail.com]
Sent: Thursday, September 4, 2014 1:28 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

Hi David and list:

This is working, except at this command:

    mycast <- dcast(mymelt, row ~ color, value.var = "rank", fill = 0)

dcast is using length as the default aggregating function, which gives inaccurate results. It tells me, for example, how many choices were missing values, and it tells me whether a person selected any given option (the value is reported as 1). When I run your reproducible example, it works great, but something with the aggregating function is not working properly with my data. Any other thoughts?

Simon

On Aug 18, 2014, at 10:44 AM, David L Carlson dcarl...@tamu.edu wrote:

Another approach using reshape2:

    library(reshape2)
    # Construct data / add column of row numbers
    set.seed(42)
    mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
        "green", "yellow", NA), 4))))
    mydf <- data.frame(rows = 1:100, mydf)
    colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
    head(mydf)
      row  rank1  rank2  rank3 rank4
    1   1   <NA> yellow    red  blue
    2   2 yellow  green   <NA>   red
    3   3 yellow  green   blue  <NA>
    4   4   <NA>   blue yellow green
    5   5   <NA>    red   blue green
    6   6   <NA>    red  green  blue
    # Reshape
    mymelt <- melt(mydf, id.vars = 1, measure.vars = 2:5,
        variable.name = "rank", value.name = "color")
    # Convert rank to numeric
    mymelt$rank <- as.numeric(mymelt$rank)
    mycast <- dcast(mymelt, row ~ color, value.var = "rank", fill = 0)
    head(mycast)
      row blue green red yellow NA
    1   1    4     0   3      2  1
    2   2    0     2   4      1  3
    3   3    3     2   0      1  4
    4   4    2     4   0      3  1
    5   5    3     4   2      0  1
    6   6    4     3   2      0  1

David C

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Sunday, August 17, 2014 6:32 PM
To: Simon Kiss; r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

There is probably an easier way to do this, but

    set.seed(42)
    mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
        "green", "yellow", NA), 4))))
    colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
    head(mydf)
       rank1  rank2  rank3 rank4
    1   <NA> yellow    red  blue
    2 yellow  green   <NA>   red
    3 yellow  green   blue  <NA>
    4   <NA>   blue yellow green
    5   <NA>    red   blue green
    6   <NA>    red  green  blue
    lvls <- levels(mydf$rank1)
    # convert color factors to numeric
    for (i in seq_along(mydf)) mydf[, i] <- as.numeric(mydf[, i])
    # stack the columns
    mydf2 <- stack(mydf)
    # convert rank factor to numeric
    mydf2$ind <- as.numeric(mydf2$ind)
    # add row numbers
    mydf2 <- data.frame(rows = 1:100, mydf2)
    # Create table
    mytbl <- xtabs(ind ~ rows + values, mydf2)
    # convert to data frame
    mydf3 <- data.frame(unclass(mytbl))
    colnames(mydf3) <- lvls
    head(mydf3)
      blue green red yellow
    1    4     0   3      2
    2    0     2   4      1
    3    3     2   0      1
    4    2     4   0      3
    5    3     4   2      0
    6    4     3   2      0

David C

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Simon Kiss
Sent: Friday, August 15, 2014 3:58 PM
To: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

Both the suggestions I got work very well, but what I didn't realize is that NA values would cause serious problems. Where there is a missing value, using the argument na.last=NA to order just returns the order of the factor levels but excludes the missing values, and I have no idea where those occur, or rather which of those variables were actually missing. Have I explained this problem sufficiently? I didn't think it would cause such a problem, so I didn't include it in the original problem definition.

Yours, Simon

On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:

I think this gets what you want. But your data are not reproducible since they are randomly drawn without setting a seed and the two data sets have no relationship to one another.

    set.seed(42)
    mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
        "green", "yellow")))))
    colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
    mydf2 <- data.frame(t(apply(mydf, 1, order)))
    colnames(mydf2) <- levels(mydf$rank1)
    head(mydf)
       rank1  rank2  rank3 rank4
    1 yellow  green    red  blue
    2  green   blue yellow   red
    3  green yellow    red  blue
    4 yellow    red  green  blue
    5 yellow    red  green  blue
    6 yellow    red   blue green
    head(mydf2)
      blue green red yellow
    1    4     2   3      1
    2    2     1   4      3
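
One possible explanation for the fallback to length (a guess only, since Simon's data were not posted) is that some respondents left more than one rank blank, so the same (row, color) pair, here (row, NA), occurs more than once and can no longer be spread into a single cell. The base-R sketch below uses a small made-up data frame to show how such rows can be detected, and how dropping the missing choices before tabulating avoids the problem. All object names and values are hypothetical.

```r
# Hypothetical data: respondent 2 left two ranks blank
mydf <- data.frame(row   = 1:3,
                   rank1 = c("red",  NA,    "blue"),
                   rank2 = c("blue", "red", "green"),
                   rank3 = c("green", NA,   "red"),
                   stringsAsFactors = FALSE)

# Long format: one line per (row, rank) with the chosen color
rank_cols <- c("rank1", "rank2", "rank3")
long <- data.frame(row   = rep(mydf$row, times = length(rank_cols)),
                   rank  = rep(seq_along(rank_cols), each = nrow(mydf)),
                   color = unlist(mydf[, rank_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Count how often each (row, color) pair occurs, NA included
counts   <- table(long$row, long$color, useNA = "ifany")
dup_rows <- rownames(counts)[apply(counts > 1, 1, any)]
dup_rows   # rows where some color (or NA) appears more than once

# Drop the missing choices, then tabulate the rank of each color per row
long_ok <- long[!is.na(long$color), ]
scores  <- xtabs(rank ~ row + color, data = long_ok)
scores
```

Colors a respondent never ranked show up as 0 in scores, which matches the fill = 0 convention used in the thread.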
Re: [R] structural equation modeling in sem, error, The model has negative degrees of freedom = -3, and The model is almost surely misspecified...
Dear Prof. John,

I'm trying to fit the following model in R, but I am getting an error about the degrees of freedom. As I don't have much experience, could you please explain to me what the problem is? I'm studying the influence of several soil parameters (pH, NH4, OM, Moisture) on the abundance of some microbial groups (Nitrosotalea, Nitrosos_Cl1, Nitrosos_Cl3, Nitrosos_Cl4 and Nitrosos_Cl7), and on enzyme activity (PNA).

Thanks in advance,
Best regards
Michele

    mod.pnr1 <- specifyModel()
    PD_AOA -> PNA, B1, NA
    Nitrosotalea -> PNA, B2, NA
    Nitrosos_Cl1 -> PNA, B3, NA
    Nitrosos_Cl3 -> PNA, B4, NA
    Nitrosos_Cl4 -> PNA, B5, NA
    Nitrosos_Cl7 -> PNA, B6, NA
    pH -> PNA, B9, NA
    NH4 -> PNA, B10, NA
    OM -> PNA, B11, NA
    Moisture -> PNA, B12, NA
    pH -> PD_AOA, B18, NA
    NH4 -> PD_AOA, B19, NA
    OM -> PD_AOA, B20, NA
    Moisture -> PD_AOA, B21, NA
    pH -> Nitrosotalea, B22, NA
    NH4 -> Nitrosotalea, B23, NA
    OM -> Nitrosotalea, B24, NA
    Moisture -> Nitrosotalea, B25, NA
    pH -> Nitrosos_Cl1, B26, NA
    NH4 -> Nitrosos_Cl1, B27, NA
    OM -> Nitrosos_Cl1, B28, NA
    Moisture -> Nitrosos_Cl1, B29, NA
    pH -> Nitrosos_Cl3, B30, NA
    NH4 -> Nitrosos_Cl3, B31, NA
    OM -> Nitrosos_Cl3, B32, NA
    Moisture -> Nitrosos_Cl3, B33, NA
    pH -> Nitrosos_Cl4, B34, NA
    NH4 -> Nitrosos_Cl4, B35, NA
    OM -> Nitrosos_Cl4, B36, NA
    Moisture -> Nitrosos_Cl4, B37, NA
    pH -> Nitrosos_Cl7, B38, NA
    NH4 -> Nitrosos_Cl7, B39, NA
    OM -> Nitrosos_Cl7, B40, NA
    Moisture -> Nitrosos_Cl7, B41, NA
    Nitrosotalea -> Nitrosos_Cl1, B53, NA
    Nitrosotalea -> Nitrosos_Cl3, B54, NA
    Nitrosotalea -> Nitrosos_Cl4, B55, NA
    Nitrosotalea -> Nitrosos_Cl7, B56, NA
    Nitrosos_Cl1 -> Nitrosotalea, B57, NA
    Nitrosos_Cl1 -> Nitrosos_Cl3, B58, NA
    Nitrosos_Cl1 -> Nitrosos_Cl4, B59, NA
    Nitrosos_Cl1 -> Nitrosos_Cl7, B60, NA
    Nitrosos_Cl3 -> Nitrosotalea, B61, NA
    Nitrosos_Cl3 -> Nitrosos_Cl1, B62, NA
    Nitrosos_Cl3 -> Nitrosos_Cl4, B63, NA
    Nitrosos_Cl3 -> Nitrosos_Cl7, B64, NA
    Nitrosos_Cl4 -> Nitrosotalea, B65, NA
    Nitrosos_Cl4 -> Nitrosos_Cl1, B66, NA
    Nitrosos_Cl4 -> Nitrosos_Cl3, B67, NA
    Nitrosos_Cl4 -> Nitrosos_Cl7, B68, NA
    Nitrosos_Cl7 -> Nitrosotalea, B69, NA
    Nitrosos_Cl7 -> Nitrosos_Cl1, B70, NA
    Nitrosos_Cl7 -> Nitrosos_Cl4, B71, NA
    Nitrosos_Cl7 -> Nitrosos_Cl3, B72, NA
    pH -> NH4, B42, NA
    pH -> OM, B43, NA
    NH4 -> OM, B45, NA
    NH4 -> pH, B46, NA
    OM -> NH4, B47, NA
    OM -> Moisture, B48, NA
    OM -> pH, B52, NA
    Moisture -> OM, B70, NA
    PNA <-> PNA, e12, NA
    PD_AOA <-> PD_AOA, NA, 1
    Nitrosotalea <-> Nitrosotalea, e5, NA
    Nitrosos_Cl1 <-> Nitrosos_Cl1, e6, NA
    Nitrosos_Cl3 <-> Nitrosos_Cl3, e7, NA
    Nitrosos_Cl4 <-> Nitrosos_Cl4, e8, NA
    Nitrosos_Cl7 <-> Nitrosos_Cl7, e9, NA
    NH4 <-> NH4, e6, NA
    OM <-> OM, e5, NA
    Moisture <-> Moisture, e7, NA
    pH <-> pH, NA, 1

-- 
Dra. Michele de Cássia Pereira e Silva
Escola Superior de Agricultura Luiz de Queiroz (ESALQ)/USP
Departamento de Ciência do Solo e Nutrição de Plantas
Laboratório de Microbiologia do Solo
Av Pádua Dias, 11 CP 09
CEP-13400-970 Piracicaba - São Paulo
Re: [R] depth of labels of axis
On Sep 3, 2014, at 10:05 PM, Jinsong Zhao wrote:

> On 2014/9/3 21:33, Jinsong Zhao wrote:
>> On 2014/9/2 11:50, David L Carlson wrote:
>>> The bottom of the expression is set by the lowest character (which can even change for subscripted letters with descenders). The solution is to get axis() to align the tops of the axis labels and move the line up to reduce the space, e.g.
>>>
>>>     plot(1:5, xaxt = "n")
>>>     axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
>>>          "E", expression(E[t])), padj = 1, mgp = c(3, .1, 0))
>>>     # Check alignment
>>>     abline(h = .7, xpd = TRUE, lty = 3)
>>
>> Yes. In this situation, padj = 1 is the fast solution. However, if there are also superscripts, it is hard to align all the labels. If R provided a mechanism that aligned the label in axis() or text() with the baseline of the character, ignoring any superscript or subscript, that would be terrific.
>
> It seems that the above wish is on the Graphics TODO list:
> https://www.stat.auckland.ac.nz/~paul/R/graphicstodos.html
>
>     Allow text adjustment for mathematical annotations which is relative to a text baseline (in addition to the current situation where adjustment is relative to the bounding box).

In many cases adding a phantom argument will correct alignment problems:

    plot(1:5, xaxt = "n")
    axis(1, at = 1:5, labels = c(expression(E[g]), E~phantom(E[g]),
         expression(E[j]), E~phantom(E[g]), expression(E[t])))
    abline(h = .7, xpd = TRUE, lty = 3)

Notice that c(expression(.), ...) will coerce all items separated by commas to expressions, so you can just put in native expressions that are not surrounded by the `expression` function:

    c(expression(E[g]), E~phantom(E[g]), expression(E[j]))
    # returns
    # expression(E[g], E ~ phantom(E[g]), E[j])

The tilde is actually a function that turns unquoted, parse-able text into an R language object (a formula), so a one-sided formula works too:

    c(expression(E[g]), E~phantom(E[g]), ~E[j])

-- 
David.

>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jinsong Zhao
>>> Sent: Monday, September 1, 2014 6:41 PM
>>> To: r-help@r-project.org
>>> Subject: [R] depth of labels of axis
>>>
>>> Hi there,
>>>
>>> With the following code,
>>>
>>>     plot(1:5, xaxt = "n")
>>>     axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
>>>          "E", expression(E[t])))
>>>
>>> you may notice that the E's within the labels of axis(1) are not at the same depth, so the axis(1) labels look something like a wave. Is there a way to typeset the labels so that they have the same depth? Any suggestions will be really appreciated. Thanks in advance.
>>>
>>> Best regards,
>>> Jinsong

David Winsemius
Alameda, CA, USA
Re: [R] Operator proposal: %between%
The TeachingDemos package has %<% and %<=% operators for a between-style comparison. So for your example you could write:

    1 %<% 5 %<% 10

or

    1 %<=% 5 %<=% 10

And these operators already work with vectors:

    lb %<=% x %<% ub

and can even be further chained:

    0 %<% x %<% y %<% z %<% 1 # only points where x < y and y < z and all between 0 and 1.

It is a little bit different syntax from yours, but would that do what you want? If not, we could add a %between% function (expand it a bit following Duncan's suggestion) to the TeachingDemos package if you don't want to create your own package.

On Thu, Sep 4, 2014 at 8:41 AM, Torbjørn Lindahl torbjorn.lind...@gmail.com wrote:

> Not sure if this is the proper list to propose changes like this. If it passes constructive criticism, I would like to have a %between% operator in the R language. I currently have this in my local R startup script:
>
>     `%between%` <- function(x, ...) {
>         y <- range( unlist(c(...)) )
>         return( x >= y[1] & x <= y[2] )
>     }
>
> It allows me to do things like:
>
>     5 %between% c(1,10)
>
> and it also acts as an in_range operator:
>
>     foo %between% a.long.list.with.many.values
>
> This may seem unnecessary, since 5 >= foo[1] & 5 <= foo[2] is also quite short to type, but there is a mental cost to this: if you are deeply focused on a complicated program flow, the %between% construct is a lot easier to type out, and relate to, than the logically more complex construct with & and >=/<=, at least in my experience.
>
> -- 
> mvh
> Torbjørn Lindahl

-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
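
For readers who want to try the idea without installing anything, here is a minimal base-R sketch of the kind of operator Torbjørn describes. It is only an illustration, not the TeachingDemos implementation; the inclusive bounds, the argument name rng, and the example values are assumptions.

```r
# A sketch of an inclusive range test. The bounds come from range(), so the
# right-hand side may be c(lo, hi), c(hi, lo), or any longer numeric vector.
`%between%` <- function(x, rng) {
  lim <- range(rng, na.rm = TRUE)
  x >= lim[1] & x <= lim[2]   # vectorised over x
}

5 %between% c(1, 10)
c(0, 5, 10, 11) %between% c(10, 1)
```

Because the comparison uses & rather than &&, the operator returns one logical value per element of x, so it can be used directly for subsetting.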
Re: [R] structural equation modeling in sem, error, The model has negative degrees of freedom = -3, and The model is almost surely misspecified...
Dear Michele,

It's impossible to know without the data, since that's the only way to determine which variables in the model are observed and which are latent variables, but if there are negative df, then you're trying to estimate a model with more free parameters than there are moments (typically, covariances) among the observed variables. Clearly, such a model is necessarily underidentified.

Additionally, I suggest that you use specifyEquations() in preference to specifyModel() to describe the model. That should prove simpler (but of course won't allow you to estimate an underidentified model).

I hope this helps,
John

---
John Fox, Professor
Chair, Sociology Graduate Programme
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Michele Silva
Sent: Thursday, September 04, 2014 2:58 PM
To: r-help@r-project.org
Subject: Re: [R] structural equation modeling in sem, error, The model has negative degrees of freedom = -3, and The model is almost surely misspecified...

Dear Prof. John,

I'm trying to fit the following model in R, but I am getting an error about the degrees of freedom. As I don't have much experience, could you please explain to me what the problem is? I'm studying the influence of several soil parameters (pH, NH4, OM, Moisture) on the abundance of some microbial groups (Nitrosotalea, Nitrosos_Cl1, Nitrosos_Cl3, Nitrosos_Cl4 and Nitrosos_Cl7), and on enzyme activity (PNA).
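
The counting argument in John's reply can be made concrete with a little arithmetic. The sketch below is only illustrative: the helper name sem_df and the variable and parameter counts are made up for the example, and are not taken from Michele's model or from the sem package.

```r
# Degrees of freedom of a covariance-structure model:
#   df = (distinct variances and covariances among the observed variables)
#        - (number of free parameters)
sem_df <- function(n_observed, n_free) {
  n_moments <- n_observed * (n_observed + 1) / 2
  n_moments - n_free
}

sem_df(n_observed = 4, n_free = 8)    # 10 moments, 8 free parameters
sem_df(n_observed = 4, n_free = 12)   # 10 moments, 12 free parameters
```

A negative value means that some parameters would have to be fixed, constrained to be equal, or dropped before the model could be estimated.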
Re: [R] Help with regression
You will find lots of examples if you do an internet search for "R quadratic regression". Here's just one ...
http://www.theanalysisfactor.com/r-tutorial-4/

Jean

On Thu, Sep 4, 2014 at 10:40 AM, Basilius Sapientia basilius...@gmail.com wrote:

> I have this code:
>
>     Vm <- c(6.2208, 4.9736, 4.1423, 3.1031, 2.4795, 1.6483, 1.2328, 0.98357,
>             0.81746, 0.60998); #Molvolume
>     p <- c(0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 4)*1000; #Pressure
>     Rydb <- 8.3144621; #Constant
>     Tempi <- 300; #Temperature in Kelvin
>     Vmi <- Vm^(-1); #To get Vm^(-1)
>     Zi <- (p*Vm)/(Rydb*Tempi) #To get Z
>     #Plot
>     dframe <- data.frame(Vmi, Zi)
>     plot(dframe, pch=19, col='red', main='Thermodynamic properties of Argon',
>          xlab='1/Vm', ylab='Z')
>     #Fit for B
>     fitb <- lm(Zi ~ Vmi);
>     fitb$coefficients[1];
>     fitb$coefficients[2];
>     summary(fitb)
>
> I want to fit a regression to the data with this general formula: y=1+Bx+Cx^2. I need to figure out what B and C in this formula are. Please help me! I want to become better at R.
Re: [R] citation of a task view
Hi Achim and Murdoch,

Thanks a lot!

Cheers,
Pablo

Achim Zeileis achim.zeil...@uibk.ac.at wrote:

> On Thu, 4 Sep 2014, Duncan Murdoch wrote:
>> On 04/09/2014 4:52 AM, Dr. Pablo E. Verde wrote:
>>> Hi all,
>>> Which is a formal bibliography citation of an R's task view? For example if I want to make a citation of MetaAnalysis task view. Thanks in advance!
>>
>> I don't think there is a recognized standard one.
>
> Not yet. But since this summer the web pages contain meta tags (both in Highwire Press and Dublin Core format) that state how the pages can be cited.
>
>> I would use whatever format your journal requires for citing any web page, e.g. something like
>>
>>     Lewin-Koh, Nicholas (2013). CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization. Web page with URL http://cran.r-project.org/web/views/Bayesian.html, retrieved Sept. 4, 2014.
>
> I would recommend two changes:
> (1) Use the official stable URL http://CRAN.R-project.org/view=...
> (2) Instead of the retrieved information, use the version date stated on the task view.
>
> For example for the current version of the MetaAnalysis view:
>
>     Michael Dewey (2014). CRAN Task View: Meta-Analysis. Version 2014-07-25. URL http://CRAN.R-project.org/view=MetaAnalysis.
>
> or in BibTeX:
>
>     @Misc{,
>       author = {Michael Dewey},
>       note   = {Version~2014-07-25},
>       title  = {{CRAN} Task View: Meta-Analysis},
>       year   = {2014},
>       url    = {http://CRAN.R-project.org/view=MetaAnalysis}
>     }
>
> hth,
> Z
>
>> Duncan Murdoch
Re: [R] Help with regression
On Sep 4, 2014, at 8:40 AM, Basilius Sapientia wrote:

> I have this code:
>
>     Vm <- c(6.2208, 4.9736, 4.1423, 3.1031, 2.4795, 1.6483, 1.2328, 0.98357,
>             0.81746, 0.60998); #Molvolume
>     p <- c(0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 4)*1000; #Pressure
>     Rydb <- 8.3144621; #Constant
>     Tempi <- 300; #Temperature in Kelvin
>     Vmi <- Vm^(-1); #To get Vm^(-1)
>     Zi <- (p*Vm)/(Rydb*Tempi) #To get Z
>     #Plot
>     dframe <- data.frame(Vmi, Zi)
>     plot(dframe, pch=19, col='red', main='Thermodynamic properties of Argon',
>          xlab='1/Vm', ylab='Z')
>     #Fit for B
>     fitb <- lm(Zi ~ Vmi);
>     fitb$coefficients[1];
>     fitb$coefficients[2];
>     summary(fitb)

The appropriate approach to regression on polynomials is to use poly(.):

    fitb <- lm(Zi ~ poly(Vmi, 2)); fitb

    Call:
    lm(formula = Zi ~ poly(Vmi, 2))

    Coefficients:
      (Intercept)  poly(Vmi, 2)1  poly(Vmi, 2)2
        0.9907137     -0.0198321      0.0006682

    summary(fitb)

    Call:
    lm(formula = Zi ~ poly(Vmi, 2))

    Residuals:
           Min         1Q     Median         3Q        Max
    -2.622e-05 -7.785e-06  3.268e-06  1.047e-05  1.557e-05

    Coefficients:
                    Estimate Std. Error   t value Pr(>|t|)
    (Intercept)    9.907e-01  4.884e-06 202853.74  < 2e-16 ***
    poly(Vmi, 2)1 -1.983e-02  1.544e-05  -1284.12  < 2e-16 ***
    poly(Vmi, 2)2  6.682e-04  1.544e-05     43.27  9.2e-10 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.544e-05 on 7 degrees of freedom
    Multiple R-squared:      1, Adjusted R-squared:      1
    F-statistic: 8.254e+05 on 2 and 7 DF,  p-value: < 2.2e-16

The second-order term has been constructed to be uncorrelated with the linear term (poly() uses orthogonal polynomials by default).

    plot( Zi, predict(fitb) )

And now that you know that both terms are significant, construct that polynomial with:

    fitb <- lm(Zi ~ Vmi + I(Vmi^2)); fitb

    Call:
    lm(formula = Zi ~ Vmi + I(Vmi^2))

    Coefficients:
    (Intercept)  Vmi  I(Vmi^2)
    0.64-0.015025 0.001063

> I want to make a regression on the data with this general formula: y=1+Bx+Cx^2. I need to figure out what B and C in this formula are. Please help me! I want to become better at R.

Please read the Posting Guide. It's really very easy to post in plain text from gmail.

David Winsemius
Alameda, CA, USA
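
Since the stated target is y = 1 + Bx + Cx^2 with the constant fixed at 1, one option (a sketch only, not something proposed in the thread) is to move the 1 to the left-hand side and fit without an intercept. The data are copied from the original post; fit1 is just an illustrative name.

```r
Vm    <- c(6.2208, 4.9736, 4.1423, 3.1031, 2.4795, 1.6483,
           1.2328, 0.98357, 0.81746, 0.60998)                # molar volume
p     <- c(0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 4) * 1000  # pressure
Rydb  <- 8.3144621                                           # gas constant
Tempi <- 300                                                 # temperature in Kelvin
Vmi   <- 1 / Vm
Zi    <- (p * Vm) / (Rydb * Tempi)

# Fit Z - 1 = B*x + C*x^2 with no intercept, so the constant stays exactly 1
fit1 <- lm(I(Zi - 1) ~ 0 + Vmi + I(Vmi^2))
coef(fit1)   # first entry estimates B, second entry estimates C
```

Whether fixing the constant at 1 is appropriate is a modelling decision; the unconstrained fits shown in the thread estimate the intercept from the data instead.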
Re: [R] depth of labels of axis
The problem with this approach is that the horizontal positioning of the labels is based on the width of the label including the phantom part, so that the E's are pushed to the left of the tick mark (at least on my Windows machine). But it does provide a way of dealing with superscripts, as long as the phantom is added to each label and hadj= is used to position the label horizontally, e.g. (changing the last label to a superscript for illustration):

    lbl <- expression(E[g]~phantom(E[g]), E~phantom(E[g]), E[j]~phantom(E[g]),
                      E~phantom(E[g]), E^t~phantom(E[g]))
    plot(1:5, xaxt = "n")
    axis(1, at = 1:5, labels = lbl, hadj = .1)
    abline(h = .7, xpd = TRUE, lty = 3)

David C

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Thursday, September 4, 2014 2:25 PM
To: Jinsong Zhao
Cc: r-help@r-project.org
Subject: Re: [R] depth of labels of axis

In many cases adding a phantom argument will correct alignment problems:

    plot(1:5, xaxt = "n")
    axis(1, at = 1:5, labels = c(expression(E[g]), E~phantom(E[g]),
         expression(E[j]), E~phantom(E[g]), expression(E[t])))
    abline(h = .7, xpd = TRUE, lty = 3)
[R] Subset a column with specific characters
This post has NOT been accepted by the mailing list yet.

I would like to subset a data frame based on whether a column contains specific characters. In the sample data I wish to do the following: first, keep the rows where prog contains "ca"; second, drop the rows where race contains "ic".

Thanks

    library(foreign)
    hsb2 <- read.dta('http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta')
Re: [R] find the data frames in list of objects and make a list of them
Thank you very much, Bill!

It has taken me a while to figure out, but yes, what I need is a list (the R object, list) of data frames and not a character vector containing the names of the data frames. This works well and is getting me in the direction I want to go.

Matthew

On 8/13/2014 7:40 PM, William Dunlap wrote:

Previously you asked

    A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...) ?

If you use 'c' there you will not get a list of data.frames - you will get a list of all the columns in the data.frames you supplied. Use 'list' instead of 'c' if you are taking that route.

The *apply functions are helpful here. To make a list of all data.frames in an environment you can use the following function, which takes the environment to search as an argument.

    f <- function(envir = globalenv()) {
        tmp <- eapply(envir, all.names = TRUE,
                      FUN = function(obj) if (is.data.frame(obj)) obj else NULL)
        # remove NULL's now
        tmp[!vapply(tmp, is.null, TRUE)]
    }

Use it as

    allDataFrames <- f(globalenv()) # or just f()

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Aug 13, 2014 at 3:49 PM, Matthew mccorm...@molbio.mgh.harvard.edu wrote:

Hi Richard,

Thank you very much for your reply and your code. Your code is doing just what I asked for, but does not seem to be what I need. I will need to review some basic R before I can continue. I am trying to list data frames in order to bind them into one single data frame with something like dplyr::rbind_all(list of data frames), but when I try dplyr::rbind_all(lsDataFrame(ls())), I get the error: object at index 1 not a data.frame. So, I am going to have to learn some more about lists in R before proceeding. Thank you for your help and code.

Matthew

On 8/13/2014 3:12 PM, Richard M. Heiberger wrote:

I would do something like this

    lsDataFrame <- function(xx = ls()) xx[sapply(xx, function(x) is.data.frame(get(x)))]
    ls("package:datasets")
    lsDataFrame(ls("package:datasets"))

On Wed, Aug 13, 2014 at 2:56 PM, Matthew mccorm...@molbio.mgh.harvard.edu wrote:

Hi everyone,

I would like to find which objects are data frames among all the objects I have created (in other words, in what you get when you type ls()), and then I would like to make a list of these data frames.

Explained in other words: after typing ls(), you get the names of objects. Which objects are data frames? How do I then make a list of these data frames?

A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...) ?

Matthew
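
As a compact alternative to the functions quoted above, the same list can be built with mget() and Filter(), both in base R. This is a sketch on a throwaway environment; the object names (df_a, df_b, not_df) are invented for the example, and do.call(rbind, ...) assumes the data frames share the same column names.

```r
# Throwaway environment standing in for the workspace
e <- new.env()
assign("df_a",   data.frame(id = 1:2, x = c(0.1, 0.2)), envir = e)
assign("df_b",   data.frame(id = 3:4, x = c(0.3, 0.4)), envir = e)
assign("not_df", 1:10,                                  envir = e)

# mget() fetches every object by name; Filter() keeps only the data frames
df_list <- Filter(is.data.frame, mget(ls(envir = e), envir = e))
names(df_list)

# Bind the list into one data frame
combined <- do.call(rbind, df_list)
combined
```

To search the workspace instead, replace e with globalenv(). The point Matthew arrived at is visible here: the binding step needs the list of data frames itself, not a character vector of their names.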
[R] calculate Euclidean distances between populations in R with this data structure
I want to calculate Euclidean distance between 12 populations, in each population there are 20 samples and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples). The equation I found is:

    distance = sqrt{ [ sum over i = 1 to n of (average of xi - average of yi)^2 ] / n }

where xi and yi are the expression of gene i over two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), and n is the number of genes.

Part of the data is pasted below:

    row.names   pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
    7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
    A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
    A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
    A26C3      12.1969  12.4179  10.9786  11.7659   11.405  11.7594  11.1757  11.8128

How might one calculate these distances in R with this data structure?

Thanks,
Ding
Re: [R] Operator proposal: %between%
On Sep 4, 2014, at 12:54 PM, Greg Snow wrote: The TeachingDemos package has %% and %=% operators for a between style comparison. So for your example you could write: 1 %% 5 %% 10 or 1 %=% 5 %=% 10 And these operators already work with vectors: lb %=% x %% ub and can even be further chained: 0 %% x %% y %% z %% 1 # only points where x y and y z and all between 0 and 1. It is a little bit different syntax from yours, but would that do what you want? If not, we could add a %between% function (expand it a bit following Duncan's suggestion) to the TeachingDemos package if you don't want to create your own package. If you are accepting feature requests I would like to see a `%btwn%` function that would accept as its second argument either a two element numeric or alpha vector or a two column matrix of with the same number of rows as the first argument. Something along these lines: `%btwn%` - function(x,y) if(!is.null(dim(y))dim(y)[1] == length(x) ){x = y[,1] x y[,2]}else{x = y[1] x y[2]} 4 %btwn% c(2,6) [1] TRUE -- David On Thu, Sep 4, 2014 at 8:41 AM, Torbjørn Lindahl torbjorn.lind...@gmail.com wrote: Not sure if this is the proper list to propose changes like this, if it passes constructive criticism, it would like to have a %between% operator in the R language. I currently have this in my local R startup script: `%between%` - function(x,...) { y - range( unlist(c(...)) ) return( x = y[1] x = y[2] ) } It allows me to do things like: 5 %between c(1,10) and also as act as an in_range operator: foo %between% a.long.list.with.many.values This may seem unnecessary, since 5 = foo[1] foo= foo[2] is also quite short to type, but there is a mental cost to this, eg if you are deeply focused on a complicated program flow, the %between% construct is a lot easier to type out, and relate to, than the logically more complex construct with and =/=, at least in my experience. 
-- mvh Torbjørn Lindahl

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com

David Winsemius Alameda, CA, USA
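For readers of the archive, the inclusive range check proposed in this thread can be sketched as below. This is only an illustration under stated assumptions: the operator is a user-defined helper, not part of base R or of TeachingDemos, and the argument name rng is made up here.

```r
# Hypothetical user-defined operator (not in base R): TRUE where x lies
# inside the closed interval spanned by rng.
`%between%` <- function(x, rng) {
  bounds <- range(rng)              # accepts c(lo, hi), c(hi, lo), or a longer vector
  x >= bounds[1] & x <= bounds[2]   # vectorised over x
}

single  <- 5 %between% c(1, 10)
several <- c(0, 5, 11) %between% c(10, 1)
```

Because range() sorts out the bounds, the right-hand side can be given in either order, or as a whole vector of values whose minimum and maximum define the interval.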
Re: [R] Subset a column with specific characters
On Sep 4, 2014, at 2:58 PM, Kuma Raj wrote:

This post has NOT been accepted by the mailing list yet.

Well, it has now. Were you earlier posting from Nabble? (Not an efficient strategy.)

I would like to subset my data based on whether a column contains specific characters. In the sample data I wish to do the following: first, keep the rows where column prog contains "ca", and second, drop the rows where race contains "ic". Thanks

    library(foreign)
    hsb2 <- read.dta('http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta')

    NROW( hsb2[ grepl("ca", hsb2$prog) & !grepl("ic", hsb2$race) , ] )
    [1] 120

-- David Winsemius Alameda, CA, USA
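The same grepl() pattern can be tried without downloading hsb2. The sketch below uses a small made-up data frame whose column names mirror the ones in the question; the values are assumptions chosen for illustration only.

```r
# Made-up stand-in for hsb2; only the column names follow the question.
d <- data.frame(
  prog = c("academic", "general", "vocation", "academic"),
  race = c("white", "asian", "hispanic", "african-amer"),
  stringsAsFactors = FALSE
)

# Keep rows whose prog contains "ca" AND whose race does not contain "ic".
keep <- grepl("ca", d$prog) & !grepl("ic", d$race)
result <- d[keep, ]
```

Note that grepl() treats its first argument as a regular expression; for a literal substring, passing fixed = TRUE avoids surprises with special characters.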
Re: [R] calculate Euclidean distances between populations in R with this data structure
I'd probably start with ?dist

Sarah

On Thu, Sep 4, 2014 at 4:10 PM, Ding, Yuan Chun ycd...@coh.org wrote:

I want to calculate Euclidean distance between 12 populations, in each population there are 20 samples and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples). The equation I found is:

    distance = sqrt{ [sum (average of xi - average of yi)^2] / n }, i = 1 to n

where xi and yi are the expression of gene i over two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), n is the number of genes.

part of data are pasted below

    row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
    7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
    A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
    A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
    A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128

How might one calculate these distances in R with this data structure?

Thanks, Ding

-- Sarah Goslee http://www.functionaldiversity.org
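One way the quoted formula could be combined with dist() is sketched below, using only the four genes pasted in the question. It assumes the square in the formula applies to each per-gene difference before summing, and that a column's population is the part of its name before the final ".number"; both are readings of the post, not something the thread confirms.

```r
# The four example rows from the post, as a genes x samples matrix.
expr <- matrix(
  c( 5.38194,  4.06191,  4.88044,  5.60383,  6.23101,  6.53738,  4.80336,  5.86136,
     5.15155,  4.29441,  4.59131,  4.90026,  4.62908,  4.48712,  4.73039,  4.46208,
     4.22396,  4.14451,  4.41465,  3.93179,  4.89638,  4.66109,  4.20918,  4.48107,
    12.1969,  12.4179,  10.9786,  11.7659,  11.405,   11.7594,  11.1757,  11.8128),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7A5", "A1BG", "A1CF", "A26C3"),
                  c("pop1.1", "pop1.2", "pop1.3", "pop1.4",
                    "pop2.1", "pop2.2", "pop2.3", "pop2.4"))
)

# Assumption: population label = column name without the trailing ".<sample>".
pop <- sub("\\.[0-9]+$", "", colnames(expr))

# Per-gene mean within each population: a genes x populations matrix.
means <- sapply(split(seq_along(pop), pop),
                function(idx) rowMeans(expr[, idx, drop = FALSE]))

# dist() works on rows, so transpose; dividing by sqrt(n) turns the usual
# Euclidean distance into sqrt(sum(diff^2) / n) as in the quoted formula.
d <- dist(t(means)) / sqrt(nrow(means))
```

With all 12 populations in the matrix, d would hold every pairwise distance; as.matrix(d) gives the full symmetric table. Populations with different numbers of samples are not a problem here, since only the per-gene means enter the distance.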
Re: [R] calculate Euclidean distances between populations in R with this data structure
Hi,

Please keep your replies on the R-help list so others may participate in the conversation.

On Thu, Sep 4, 2014 at 8:12 PM, Ding, Yuan Chun ycd...@coh.org wrote:

Hi Sarah, Thank you very much for your quick response. I checked the dist() function. It calculates the distance between two samples with a number of variables.

       Variable1  variable 2  variable 3  variable4
    X  3          5           6           7
    Y  4          8           9           10

So it is easy to calculate the distance between x and y. But in my study, X is a group with 20 samples and y is another group with 30 samples, so I need to calculate the distance between the x group and the y group.

That doesn't make any sense to me. If the variables are different, how can you calculate a distance between them? You also potentially run into scaling issues. Also, your original question (below) stated that your populations have 20 samples.

I think I need to get the mean for each group, then use the dist() function. I tried to find an R package to do it.

I think you'd be better off reconsidering what you're trying to accomplish.

Sarah

Thanks, Ding

-----Original Message-----
From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
Sent: Thursday, September 04, 2014 4:49 PM
To: Ding, Yuan Chun
Cc: r-help@R-project.org
Subject: Re: [R] calculate Euclidean distances between populations in R with this data structure

I'd probably start with ?dist

Sarah

On Thu, Sep 4, 2014 at 4:10 PM, Ding, Yuan Chun ycd...@coh.org wrote:

I want to calculate Euclidean distance between 12 populations, in each population there are 20 samples and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples). The equation I found is:

    distance = sqrt{ [sum (average of xi - average of yi)^2] / n }, i = 1 to n

where xi and yi are the expression of gene i over two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), n is the number of genes.

part of data are pasted below

    row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
    7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
    A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
    A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
    A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128

How might one calculate these distances in R with this data structure?

Thanks, Ding

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] calculate Euclidean distances between populations in R with this data structure
Hi Sarah,

Thank you very much for your quick response. I checked the dist() function. It calculates the distance between two samples with a number of variables.

       Variable1  variable 2  variable 3  variable4
    X  3          5           6           7
    Y  4          8           9           10

So it is easy to calculate the distance between x and y. But in my study, X is a group with 20 samples and y is another group with 30 samples, so I need to calculate the distance between the x group and the y group. I think I need to calculate a mean for each group, then use the dist() function. I tried to find an R package to do it.

Thanks, Ding

-----Original Message-----
From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
Sent: Thursday, September 04, 2014 4:49 PM
To: Ding, Yuan Chun
Cc: r-help@R-project.org
Subject: Re: [R] calculate Euclidean distances between populations in R with this data structure

I'd probably start with ?dist

Sarah

On Thu, Sep 4, 2014 at 4:10 PM, Ding, Yuan Chun ycd...@coh.org wrote:

I want to calculate Euclidean distance between 12 populations, in each population there are 20 samples and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples). The equation I found is:

    distance = sqrt{ [sum (average of xi - average of yi)^2] / n }, i = 1 to n

where xi and yi are the expression of gene i over two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), n is the number of genes.

part of data are pasted below

    row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
    7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
    A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
    A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
    A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128

How might one calculate these distances in R with this data structure?

Thanks, Ding

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R-es] How do you write the "or" (OR) operator to select two or more levels of a vector?
Thank you both very much!!

Regards, eric.

On Thu 04 Sep 2014 03:36:52 CLT, Carlos Ortega wrote:

You can also use the subset parameter inside xyplot(). Or apply it first, outside xyplot()...

http://www.ats.ucla.edu/stat/r/modules/subsetting.htm

Regards,
Carlos Ortega
www.qualityexcellence.es

On 4 September 2014, 0:41, eric ericconchamu...@gmail.com wrote:

Dear all, I have a data.frame with a column that has three different levels (although the column is not actually a factor, just three different letters), for example c, t and s, and I need to use the rows that have c or t. How should I do it? I think I have sometimes used something like this:

    dataframe <- dataframe[dataframe$columna==c("c","t"),]

but for some reason, when I use that form inside the code that creates a plot, for example:

    xyplot(are ~ con | sol, data=datEnd[datEnd$iso==c("c","t"),])

the result is not correct. Any ideas?

Many thanks, Eric.

-- Forest Engineer
Master in Environmental and Natural Resource Economics
Ph.D. student in Sciences of Natural Resources at La Frontera University
Member in AguaDeTemu2030, citizen movement for Temuco with green city standards for living

Note: Accents have been omitted to ensure compatibility with some mail readers.

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
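For anyone finding this thread later, the reason the == form gives an incorrect subset is that R recycles the shorter vector element by element instead of testing membership. The sketch below uses made-up values; only the column name iso comes from the question.

```r
# Made-up values for illustration; "iso" mirrors the column name in the question.
iso <- c("t", "c", "s", "c", "t", "s")

# == recycles c("c", "t") along iso: element 1 is compared with "c",
# element 2 with "t", element 3 with "c", and so on.
by_eq <- iso == c("c", "t")

# Either of these asks "is each element equal to c or to t?"
by_or <- iso == "c" | iso == "t"
by_in <- iso %in% c("c", "t")
```

Applied to the plot in the question, the subset would presumably become datEnd[datEnd$iso %in% c("c", "t"), ], or the same condition passed through the subset argument of xyplot() as suggested in the reply.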
[R-es] Rhadoop packages.
Hello, first of all, thank you very much for accepting my request =). I have run into some difficulties and would like to know whether someone could help me with the following: I need to install the packages rhdfs, rmr and rhbase for Rhadoop, but I do not know where to download the packages or how to install them in R. I am starting a Big Data project with R and Hadoop, and I would be very grateful for any information you can give me about handling data in Rhadoop.

Sincerely,
Jessica G. Morocho G.