[R] Different output from lm() and lmPerm lmp() if categorical variables are included in the analysis
I've found a problem when using categorical variables in lmp() from the lmPerm package. According to help(lmp):

  "This function will behave identically to lm() if the following parameters are set: perm="", seq=TRUE, center=FALSE."

But this does not hold when categorical variables are included:

    require(lmPerm)
    set.seed(42)
    testx1 <- rnorm(100, 10, 5)
    testx2 <- c(rep("a", 50), rep("b", 50))
    testy <- 5*testx1 + 3 + runif(100, -20, 20)
    test <- data.frame(x1 = testx1, x2 = testx2, y = testy)
    atest <- lm(y ~ x1*x2, data = test)
    aptest <- lmp(y ~ x1*x2, data = test, perm = "", seqs = TRUE, center = FALSE)

    summary(atest)

    Call:
    lm(formula = y ~ x1 * x2, data = test)

    Residuals:
         Min       1Q   Median       3Q      Max
    -17.1777  -9.5306  -0.9733   7.6840  22.2728

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -2.0036     3.2488  -0.617    0.539
    x1            5.3346     0.2861  18.646   <2e-16 ***
    x2b           2.4952     5.2160   0.478    0.633
    x1:x2b       -0.3833     0.4568  -0.839    0.404

    summary(aptest)

    Call:
    lmp(formula = y ~ x1 * x2, data = test, perm = "", seqs = TRUE, center = FALSE)

    Residuals:
         Min       1Q   Median       3Q      Max
    -17.1777  -9.5306  -0.9733   7.6840  22.2728

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    x1            5.1429     0.2284  22.516   <2e-16 ***
    x21          -1.2476     2.6080  -0.478    0.633
    x1:x21        0.1917     0.2284   0.839    0.404

It looks like lmp() is internally coding the dummy variables in a different way, so that the lmp() results are for "a" (named 1 by lmp) while the lm() results are for "b"?

Agus

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
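The coefficients in the two summaries above are consistent with lmp() using sum-to-zero contrasts (as in contr.sum) for the factor, whereas lm() defaults to treatment contrasts: for example 5.3346 + (-0.3833)/2 = 5.1429 and -2.4952/2 = -1.2476. The following is a minimal sketch of that relationship using only lm(). It assumes that the contrast coding is the explanation; lmPerm itself is not called, and the object names (d, fit_trt, fit_sum) are illustrative.

```r
set.seed(42)
x1 <- rnorm(100, 10, 5)
x2 <- factor(rep(c("a", "b"), each = 50))
y  <- 5 * x1 + 3 + runif(100, -20, 20)
d  <- data.frame(x1, x2, y)

# Default coding: treatment contrasts, level "a" is the reference
fit_trt <- lm(y ~ x1 * x2, data = d)

# Sum-to-zero coding: level "a" is coded +1, level "b" is coded -1
fit_sum <- lm(y ~ x1 * x2, data = d, contrasts = list(x2 = "contr.sum"))

b_trt <- coef(fit_trt)
b_sum <- coef(fit_sum)

# Same fitted model, different parameterisation
same_fit <- all.equal(fitted(fit_trt), fitted(fit_sum))

# How the two sets of coefficients map onto each other
slope_ok <- all.equal(unname(b_sum["x1"]),
                      unname(b_trt["x1"] + b_trt["x1:x2b"] / 2))
main_ok  <- all.equal(unname(b_sum["x21"]),    -unname(b_trt["x2b"]) / 2)
inter_ok <- all.equal(unname(b_sum["x1:x21"]), -unname(b_trt["x1:x2b"]) / 2)

print(c(same_fit = isTRUE(same_fit), slope_ok = isTRUE(slope_ok),
        main_ok = isTRUE(main_ok), inter_ok = isTRUE(inter_ok)))
```

Under sum coding the x1 row is the slope averaged over both levels, and the x21 row is the deviation of level "a" from that average, which is why the signs flip and the magnitudes halve.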
[R] [R-pkgs] ndl 0.2.13 released today.
Dear R-package Folk,

I am pleased to announce the release of a new version of the ndl package (http://cran.r-project.org/web/packages/ndl/).

What is NDL? It is a simple learning model based on the Rescorla-Wagner model of discrimination learning. I have become the new maintainer, replacing Dr. Antti Arppe (thank you for all your hard work, Antti!).

This release is a major update. In particular, the following items are new since the last CRAN version (v 0.1.6 from Dec. 2012):

* improved speed in counting co-occurrences through the use of Rcpp and C++ functions.
* improved scalability: it can process many millions of events, with much larger numbers of cues and outcomes.
* support for Unicode text
* new ability to count background rates (optional)
* new method for converting counts to probabilities

-- and many other small improvements.

For access to the development branch, to join development, or to submit issues, please go to: https://bitbucket.org/cyrusshaoul/ndl/

Best regards,
Cyrus

--
Cyrus Shaoul, PhD
http://www.sfs.uni-tuebingen.de/~cshaoul/

_______________________________________________
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] problem with interaction in lmer even after creating an interaction variable
Thank you very much,

First, sorry for posting on the wrong mailing list; I did not know that there is a special one for lmer. Yes, there are collinearities in the data. Still, I would like to have the variables in one model to compare explained variability. Is there some option, or is it simply impossible?

Thank you,
Anna

On 07.11.2013 16:03, bbolker [via R] wrote:
> a_lampei anna.lampei-bucharova at uni-tuebingen.de writes:
>
> > Dear all,
> > I have a problem with interactions in lmer. I have 2 factors (garden and gebiet) which interact, plus one other variable (home), in the data frame arr. When I run:
> >
> >     lmer(biomass ~ home + garden:gebiet + (1|Block), data = arr)
> >
> > it writes:
> >
> >     Error in lme4::lFormula(formula = biomass ~ home + garden:gebiet + (1 | :
> >       rank of X = 28 < ncol(X) = 30
> >
> > In the lmer help I found that if not all combinations of the interaction are realized, lmer has problems and one should create a new variable using interaction(), which I did:
> >
> >     arr$agg <- interaction(arr$gebiet, arr$garden, drop = TRUE)
> >
> > When I fit the interaction term now:
> >
> >     lmer(biomass ~ home + agg + (1|Block), data = arr)
> >
> > the error does not change:
> >
> >     Error in lme4::lFormula(formula = biomass ~ home + agg + (1 | Block), :
> >       rank of X = 28 < ncol(X) = 29
> >
> > There are no NAs in the given variables in the dataset.
> > Interestingly, it works when I include only the interaction variable:
> >
> >     lmer(biomass ~ agg + (1|Block), data = arr)
> >
> > Even the following models work:
> >
> >     lmer(biomass ~ gebiet*garden + (1|Block), data = arr)
> >     lmer(biomass ~ garden + garden:gebiet + (1|Block), data = arr)
> >
> > But if I add the interaction term in the new format of the new variable, it reports the same error again:
> >
> >     lmer(biomass ~ garden + agg + (1|Block), data = arr)
> >
> > If I add any other variable from the very same data frame (not only the variable home), the error is reported again. I do not understand it: the new variable is just another factor now, isn't it? And it is in the same data frame and has the same length.
> > Does anyone have any idea?
> > Thanks a lot,
> > Anna
>
> This probably belongs on r-sig-mixed-models.
>
> Presumably 'home' is still correlated with one of the columns of 'garden:gebiet'. Here's an example of how you can use svd() to find out which of your columns are collinear:
>
>     set.seed(101)
>     d <- data.frame(x=runif(100),y=1:100,z=2:101)
>     m <- model.matrix(~x+y+z,data=d)
>     s <- svd(m)
>     zapsmall(s$d)
>     ## [1] 828.8452   6.6989   2.6735   0.0000
>     ## this tells us there is one collinear component
>     zapsmall(s$v)
>     ##            [,1]       [,2]       [,3]       [,4]
>     ## [1,] -0.0105005 -0.7187260  0.3872847  0.5773503
>     ## [2,] -0.0054954 -0.4742947 -0.8803490  0.0000000
>     ## [3,] -0.7017874  0.3692117 -0.1945349  0.5773503
>     ## [4,] -0.7122879 -0.3495142  0.1927498 -0.5773503
>     ## this tells us that the first (intercept), third (y),
>     ## and fourth (z) column of the model matrix are
>     ## involved in the collinear term, i.e.
>     ## 1+y-z is zero
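As a complement to the svd() approach quoted above, the pivoted QR decomposition that lm() relies on can also name the redundant columns directly. This is only a sketch on the same toy data; the object names (q, aliased) are illustrative, and for a real model you would pass your own fixed-effects model matrix, which is not available here.

```r
set.seed(101)
d <- data.frame(x = runif(100), y = 1:100, z = 2:101)
m <- model.matrix(~ x + y + z, data = d)

# qr() moves columns that are (numerically) linear combinations of
# earlier columns to the end of the pivot order
q <- qr(m)
q$rank      # numerical rank of the model matrix
ncol(m)     # number of columns requested by the formula

# Columns beyond the rank in pivot order are the redundant ones
aliased <- colnames(m)[q$pivot[seq_len(ncol(m)) > q$rank]]
aliased
```

Here z equals y + 1, so it is a combination of the intercept and y and is reported as redundant. Which column of a collinear set gets flagged depends on column order, so the result says "one of these has to go", not which one is scientifically dispensable.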
[R] Graph dashboard
Hi,

I have been exploring graph dashboards like http://sematext.com/img/products/spm/spm-solr-overview.png. I use R but haven't attempted to create a dashboard like this.

I am thinking of parsing logs and showing dynamic logs (logs that fit into a small window but move left or right with new data) in a browser or just an R graphics window.

R shiny could be useful, but that is a browser view, isn't it? So the dynamic view works differently. What are the recommendations? The logs that will be parsed are all static.

Thanks,
Mohan
Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days
Thanks to Thomas, Martin, Jim and William,

Your input was very informative, and thanks for the reference to Sedgewick.

In the end, it does seem to me that all these algorithms require fast lookup by ID of nodes to access data, and that conditional on such fast lookup, algorithms are possible with efficiency O(n) or O(n*log(n)) (depending on whether lookup time is constant or logarithmic). I believe my original algorithm achieves that.

We come back to the fact that I assumed that R environments, implemented as hash tables, would give me that fast lookup. But on my systems, their efficiency (for insert and lookup) seems to degrade fast at several million entries; the cost certainly grows much faster than either O(1) or O(log(n)) would imply.

I believe this does not have to do with disk access time. For example, I tested this on my desktop computer, running a pure hash insert loop. I observe 100% processor use but no disk access as the size of the hash table approaches millions of entries.

I have tested this on two systems, but have not gone into the implementation of the hashed environments to look at this in detail. If others have the same (or different) experiences with using hashed environments with millions of entries, it would be very useful to know.

Barring a solution to the hashed environment speed, it seems the way to speed this algorithm up (within the confines of R) would be to move away from hash tables and towards a numerically indexed array.

Thanks again for all of the help,
Magnus

On 11/4/2013 8:20 PM, Thomas Lumley wrote:
> On Sat, Nov 2, 2013 at 11:12 AM, Martin Morgan mtmor...@fhcrc.org wrote:
> > On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote:
> > > Sure,
> > >
> > > I was attempting to be concise and boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes.
> > >
> > > I have a set of around 20M string pairs. A given string (say, A) can either be equivalent to another string (B) or not.
> > > If A and B occur together in the same pair, they are equivalent. But equivalence is transitive, so if A and B occur together in one pair, and A and C occur together in another pair, then A and C are also equivalent. I need a way to quickly determine if any two strings from my data set are equivalent or not.
> >
> > Do you mean that if A,B occur together and B,C occur together, then A,B and A,C are equivalent?
> >
> > Here's a function that returns a unique identifier (not well tested!), allowing for transitive relations but not circularity.
> >
> >     uid <- function(x, y)
> >     {
> >         i <- seq_along(x)                   # global index
> >         xy <- paste0(x, y)                  # make unique identifiers
> >         idx <- match(xy, xy)
> >
> >         repeat {
> >             ## transitive look-up
> >             y_idx <- match(y[idx], x)       # look up 'y' in 'x'
> >             keep <- !is.na(y_idx)
> >             if (!any(keep))                 # no transitive relations, done!
> >                 break
> >             x[idx[keep]] <- x[y_idx[keep]]
> >             y[idx[keep]] <- y[y_idx[keep]]
> >
> >             ## create new index of values
> >             xy <- paste0(x, y)
> >             idx <- match(xy, xy)
> >         }
> >         idx
> >     }
> >
> > Values with the same index are identical. Some tests
> >
> >     x <- c(1, 2, 3, 4)
> >     y <- c(2, 3, 5, 6)
> >     uid(x, y)
> >     [1] 1 1 1 4
> >     i <- sample(x); uid(x[i], y[i])
> >     [1] 1 1 3 1
> >     uid(as.character(x), as.character(y))  ## character() ok
> >     [1] 1 1 1 4
> >     uid(1:10, 1 + 1:10)
> >      [1] 1 1 1 1 1 1 1 1 1 1
> >     uid(integer(), integer())
> >     integer(0)
> >     x <- c(1, 2, 3)
> >     y <- c(2, 3, 1)
> >     uid(x, y)                              ## circular!
> >       C-c C-c
> >
> > I think this will scale well enough, but the worst-case scenario can be made to be log(longest chain) and copying can be reduced by using an index i and subsetting the original vector on each iteration. I think you could test for circularity by checking that the updated x are not a permutation of the kept x, all(x[y_idx[keep]] %in% x[keep])
> >
> > Martin
>
> This problem (union-find) is discussed in Chapter 1 of Sedgewick's Algorithms. There's an algorithm given that takes linear time to build the structure, worst-case logarithmic time to query, and effectively constant average time to query (inverse-Ackermann amortized complexity).
>
> -thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
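The union-find idea mentioned in this thread can be combined with the "numerically indexed array" suggestion: map each string to an integer once with match(), then keep the structure in a plain integer vector instead of an environment. The following is only a sketch under those assumptions. The function name equiv_classes and its arguments are hypothetical, it uses path compression but not union by rank, and an R-level loop over tens of millions of pairs may still be slow.

```r
# a, b: character vectors of equal length; a[k] and b[k] form one pair.
# Returns a named integer vector: strings sharing a value are equivalent.
equiv_classes <- function(a, b) {
  stopifnot(length(a) == length(b))
  keys <- unique(c(a, b))
  ia <- match(a, keys)          # integer id of the first string of each pair
  ib <- match(b, keys)          # integer id of the second string of each pair
  parent <- seq_along(keys)     # every node starts as its own root

  find <- function(i) {
    root <- i
    while (parent[root] != root) root <- parent[root]
    # path compression: point every node on the path at the root
    while (parent[i] != root) {
      nxt <- parent[i]
      parent[i] <<- root
      i <- nxt
    }
    root
  }

  for (k in seq_along(ia)) {
    ra <- find(ia[k])
    rb <- find(ib[k])
    if (ra != rb) parent[rb] <- ra   # union: merge the two classes
  }

  roots <- vapply(seq_along(keys), find, integer(1))
  setNames(roots, keys)
}

cls <- equiv_classes(c("A", "B", "D"), c("B", "C", "E"))
cls[["A"]] == cls[["C"]]   # A-B and B-C are paired, so A and C are equivalent
cls[["A"]] == cls[["D"]]   # D is only paired with E

# Circular input (1-2, 2-3, 3-1) terminates and yields a single class
cyc <- equiv_classes(c("1", "2", "3"), c("2", "3", "1"))
```

Once the classes are computed, testing whether two strings are equivalent is a pair of name lookups in the returned vector.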
[R] Remove a column of a matrix with unnamed column header
I have a matrix named test which I want to convert to a data frame. When I use the command

    test2 <- as.data.frame(test)

it is executed without a problem. But when I want to browse the content I receive the error message

    Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", :
      duplicate row.names: Estimate

The problem is clearly due to the duplicated row name. But I am unable to remove this column. I need help on how to remove this specific column, which essentially has no column header name.

dput of the matrix is here:

    dput(test)
    structure(c("cardva", "respir", "cereb", "neoplasm", "ami", "ischem",
    "heartf", "pneumo", "copd", "asthma", "dysrhy", "diabet",
    "0.00259492159959046", "0.00979775441709427", "0.00103414632535868",
    "0.00486468139227382", "0.0164825543879707", "0.0116647168053943",
    "-0.0012137908515233", "0.00730433232907741", "0.00355583994565985",
    "0.000712387285735019", "-0.00103763671307935", "0.00981500221106926",
    "0.00325476724733837", "0.0049232113728293", "0.00520118026087645",
    "0.00386848394426742", "0.00688121694253705", "0.00585772614064902",
    "0.00564983058883797", "0.0061328202328586", "0.0108212194251692",
    "0.0173804438930357", "0.00867931407250442", "0.0106638104533486",
    "0.425323120845664", "0.0466180768654915", "0.842402292743715",
    "0.208609687427072", "0.0166336682608816", "0.0464833846710956",
    "0.8299010611324", "0.233685747699204", "0.742469001175026",
    "0.967306766450795", "0.904840885401235", "0.357394700741248"),
    .Dim = c(12L, 4L), .Dimnames = list(
    c("Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate",
    "Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate"),
    c("outcome", "beta", "se", "pval")))

    test2 <- as.data.frame(test)
    test2
    Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", :
      duplicate row.names: Estimate
[R] Crime hotspot maps (kernel density)
Hi everybody,

does any of you know how to create a (crime) hotspot map using R? Are there any packages, or do you know of any resources?

It should be something like this: http://www.caliper.com/Maptitude/Crime/MotorVehicleTheft2.png (but it doesn't necessarily have to be a map).

Many thanks,
David
Re: [R] prod and F90 product
sounds like FAQ 7.31

Sent from my iPad

On Nov 7, 2013, at 18:39, Filippo ingf...@gmail.com wrote:

> Hi,
> I'm having strange differences between the R function prod and the F90 function product. Processing the same vector C (see attachment), I get 2 different results:
>
>     prod(C) = 1.069678e-307
>     testProduct(C) = 0
>
> where testProd is the following wrapping function:
>
>     testProd <- function(x) {
>       return(.Fortran('testProd', as.double(x), as.double(0),
>                       as.double(0), as.integer(length(x))))
>     }
>
>     subroutine testProd(x, p, q, n)
>       implicit none
>       integer, intent (in) :: n
>       double precision, intent (in) :: x(n)
>       double precision, intent (out) :: p
>       double precision, intent (out) :: q
>       integer :: i
>       p = product(x)
>       q = 1
>       do i = 1, n
>         q = q*x(i)
>       end do
>     end subroutine testProd
>
> I checked the lowest possible number and it seems to be the same for both R and F90. Can anyone help me understand this behaviour?
>
> Thank you in advance
> Regards,
> Filippo C
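One way such a discrepancy can arise is underflow of an intermediate product: if every multiplication is rounded to double precision, a partial product can fall below the smallest representable double and become 0, even though the final result would be representable. Whether prod() avoids this depends on the platform (my understanding is that R accumulates in extended precision where available, so that result is deliberately not asserted below). The sketch uses a made-up vector, not the attachment from the post, and the log-sum variant only applies to strictly positive values.

```r
# Illustrative values: the true product is 1, but the first partial
# product (1e-400) is far below the smallest positive double
x <- c(1e-200, 1e-200, 1e200, 1e200)

# Multiply left to right, rounding to double after every step
naive <- Reduce(`*`, x)
naive          # 0: the partial product underflowed and stays 0

# Summing logarithms avoids the tiny intermediate values
via_logs <- exp(sum(log(x)))
via_logs       # approximately 1

# Platform dependent, shown for comparison only
prod(x)
```

Reordering the factors so that large and small values alternate is another way to keep the partial products in range.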
Re: [R] Remove a column of a matrix with unnamed column header
On 08-11-2013, at 10:40, Kuma Raj pollar...@gmail.com wrote:

> I have a matrix named test which I want to convert to a data frame. When I use the command test2 <- as.data.frame(test) it is executed without a problem. But when I want to browse the content I receive the error message
>
>     Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", :
>       duplicate row.names: Estimate
>
> The problem is clearly due to the duplicated row name. But I am unable to remove this column. I need help on how to remove this specific column, which essentially has no column header name.
>
> [...]

    rownames(test) <- NULL

Berend
Re: [R] Remove a column of a matrix with unnamed column header
Berend,

Thanks for that. Conversion of test to a data frame resulted in factor columns. Is there a possibility to selectively convert to numeric? I have tried this code and it has not produced the intended result:

    test[, c(2:4)] <- sapply(test[, c(2:4)], as.numeric)

On 8 November 2013 11:31, Berend Hasselman b...@xs4all.nl wrote:

> On 08-11-2013, at 10:40, Kuma Raj pollar...@gmail.com wrote:
>
> > I have a matrix named test which I want to convert to a data frame.
> > [...]
> >
> >     Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", :
> >       duplicate row.names: Estimate
>
>     rownames(test) <- NULL
>
> Berend
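A likely reason the conversion above did not give the intended numbers: calling as.numeric() on a factor returns the internal integer codes of its levels, not the values the labels spell out. A small sketch with made-up values (the vector f is illustrative, not taken from the posted matrix):

```r
f <- factor(c("0.5", "0.01", "0.25"))

# Levels are sorted as strings: "0.01", "0.25", "0.5"
codes <- as.numeric(f)
codes                      # 3 1 2: positions in levels(f), not the values

# Go through the labels to recover the numbers
vals <- as.numeric(as.character(f))
vals                       # 0.50 0.01 0.25

# Equivalent, converting each level only once
vals2 <- as.numeric(levels(f))[f]
```

Note also that the posted code assigns back into test, which is still the character matrix, so any numeric result would be coerced back to character there.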
Re: [R] Error running MuMIn dredge function using glmer models
There is indeed a glitch in 'dredge' that prevents you from seeing the actual error message. The cause of the error is explained in ?dredge, in the section "Missing values".

(The glitch has now been corrected in 1.9.14, on R-Forge.)

kamil

On 2013-11-08 11:00, r-help-requ...@r-project.org wrote:

> Message: 26
> Date: Thu, 7 Nov 2013 11:55:50 -0500
> From: Martin Turcotte mart.turco...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Error running MuMIn dredge function using glmer models
>
> Dear list,
>
> I am trying to use MuMIn to compare all possible mixed models using the dredge function on binomial data, but I am getting an error message that I cannot decode. This error only occurs when I use glmer. When I use an lmer analysis on a different response variable everything works great.
>
> Example using a simplified glmer global model:
>
>     mod <- glmer(cbind(st$X2.REP.LIVE, st$X2.REP.DEAD) ~ DOMESTICATION*GLUC + (1|PAIR),
>                  data=st, na.action=na.omit, family=binomial)
>
> The response variables are the numbers of surviving and dead insects (successes and failures). DOMESTICATION is a 2-level factor. GLUC is a continuous variable. PAIR is coded as a factor or character (both ways fail).
>
> This model functions correctly, but when I try it with dredge() I get an error:
>
>     g <- dredge(mod, beta=F, evaluate=F, rank='AIC')
>     Error in sprintf(gettext(fmt, domain = domain), ...) :
>       invalid type of argument[1]: 'symbol'
>
> When I try with another rank the same thing happens:
>
>     chat <- deviance(mod)/58
>     g <- dredge(mod, beta=F, evaluate=F, rank='QAIC', chat=chat)
>     Error in sprintf(gettext(fmt, domain = domain), ...) :
>       invalid type of argument[1]: 'symbol'
>
> Any suggestions would be greatly appreciated
>
> thanks
>
> Martin Turcotte, Ph. D.
> mart.turco...@gmail.com
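The "Missing values" point referred to above is, as far as I understand it, that the global model should not be fitted with na.action = na.omit, because sub-models could then silently be fitted to different subsets of the data, which makes information criteria incomparable. The sketch below illustrates the problem with plain lm() on simulated data; it does not call MuMIn or lme4, and the data frame d and its columns are made up.

```r
set.seed(1)
d <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))
d$b[1:5] <- NA                      # predictor b has missing values

# With na.omit, each model quietly drops its own incomplete rows
full <- lm(y ~ a + b, data = d, na.action = na.omit)
sub  <- lm(y ~ a,     data = d, na.action = na.omit)
n_full <- nobs(full)                # 15 rows used
n_sub  <- nobs(sub)                 # 20 rows used: AICs are not comparable

# With na.fail, the mismatch surfaces as an error instead
res <- try(lm(y ~ a + b, data = d, na.action = na.fail), silent = TRUE)
failed <- inherits(res, "try-error")

# One remedy: restrict to complete cases once, then fit with na.fail
d_cc <- d[complete.cases(d), ]
full_cc <- lm(y ~ a + b, data = d_cc, na.action = na.fail)
sub_cc  <- lm(y ~ a,     data = d_cc, na.action = na.fail)
```

For the model in the post this would correspond to removing incomplete rows from st beforehand and refitting with na.action = na.fail (or setting options(na.action = "na.fail")) before calling dredge().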
[R] how to derive true surface area from `computeContour3d' (misc3d package)
I want to compute the total surface area of an isosurface in 3D space as approximated with `computeContour3d' by adding up the areas of the triangles making up the contour surface.

problem: the vertex matrix returned by `computeContour3d' in general seems to provide the vertices not in the frame of reference in which the original data are given but apparently rather after some linear transformation (scaling + translation (+ rotation?)) -- or I am having some fundamental misconception of what is going on.

I'm interested in the simplest case where the input data are provided as a 3D array on an equidistant grid (i.e. leaving the x,y,z arguments at their defaults).

e.g. (slight modification of `example(computeContour3d)'):

    library(misc3d)
    x <- seq(-1,1,len=11)
    g <- expand.grid(x = x, y = x, z = x)
    v <- array(g$x^4 + g$y^4 + g$z^4, rep(length(x),3))
    con <- computeContour3d(v, max(v), 1)
    drawScene(makeTriangles(con))

this is (approximately) a cube with edge length 10 (taking the grid spacing as the unit of length). so the expected (approximate) surface area is 600. indeed,

    apply(con, 2, range)

yields

         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]   11   11   11

which might be interpreted as providing the vertices in coordinates where the grid spacing is used as unit of length. however I get an area of only about 430 instead of approx. 600, which is already a much larger deviation from the ideal cube surface than I would have expected given the small amount of smoothing at the box edges and corners (but I have to double-check whether my triangle area computation is right, although I believe it is).

choosing instead

    x <- seq(-2,2,len=50)

however, the corresponding range of `con' is

           [,1]   [,2]   [,3]
    [1,] 13.274 13.274 13.274
    [2,] 37.726 37.726 37.726

which cannot be the grid coordinates (which should be in the range [1,50]). adopting this interpretation nevertheless (vertices are given in grid coordinates), the sum of the triangle areas only amounts to about 2600 instead of the expected approx. 49^2*6 = 14406.

question 1: am I making a stupid error (if so, which one...)?

if not so:

question 2: is there a linear transformation from the original grid coordinates (with range from 1 to dim(v)[n], n=1:3) involved which yields the reported vertex coordinates?

question 3: could someone please explain where to find this information (even if hidden in the source code of the package) on how to convert the vertex coordinates as delivered by `computeContour3d' to 'grid coordinates' (or true world coordinates in general, if the x,y,z arguments are specified, too)?

for the wishlist: it would of course be nice if `computeContour3d' would indeed return the total surface area itself, e.g. as an attribute of the triangles matrix.

for the devs: there is a typo in the manpage of this function:

    Value:
    A matrix of three columns representing the triangles making up the contour surface. Each row represents a vertex and goups of three rows represent a triangle.

(should be `groups' instead of `goups')
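The triangle-area summation described in this post can be sketched as follows. The helper name triangle_areas is hypothetical (it is not part of misc3d); it only assumes the layout stated in the quoted manpage, i.e. a three-column matrix in which each consecutive group of three rows holds the vertices of one triangle. Each area is half the length of the cross product of two edge vectors.

```r
triangle_areas <- function(vertices) {
  stopifnot(is.matrix(vertices), ncol(vertices) == 3,
            nrow(vertices) %% 3 == 0)
  i <- 3 * seq_len(nrow(vertices) / 3) - 2      # first row of each triangle
  a <- vertices[i,     , drop = FALSE]
  b <- vertices[i + 1, , drop = FALSE]
  c <- vertices[i + 2, , drop = FALSE]
  u <- b - a                                    # first edge vector
  w <- c - a                                    # second edge vector
  # components of the cross product u x w, one row per triangle
  cx <- u[, 2] * w[, 3] - u[, 3] * w[, 2]
  cy <- u[, 3] * w[, 1] - u[, 1] * w[, 3]
  cz <- u[, 1] * w[, 2] - u[, 2] * w[, 1]
  0.5 * sqrt(cx^2 + cy^2 + cz^2)
}

# Two right triangles that together tile the unit square in the z = 0 plane
sq <- rbind(c(0, 0, 0), c(1, 0, 0), c(0, 1, 0),
            c(1, 0, 0), c(1, 1, 0), c(0, 1, 0))
areas <- triangle_areas(sq)
total <- sum(areas)
```

With misc3d loaded, sum(triangle_areas(con)) would give the total area for a contour matrix con such as the one computed above, in whatever units the x, y, z arguments define.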
[R] how to derive true surface area from `computeContour3d' (misc3d package) -- follow up
regarding my previous mail on this topic, I have in the meantime identified my misconception. actually, `computeContour3d' returns the vertices just fine in the correct coordinate frame.

the misconception was caused basically by assuming that the `level' argument was a fractional threshold relative to the maximum of the array. so I believed that the rendered cube actually is the outer surface of the defined object in the example provided in the manpage. I now understand it's an absolute level, and `example(computeContour3d)' consequently displays some interior isocontour. this explains all my apparent errors.

I believe the manpage would benefit from a slight clarification that `level' actually is an absolute, not a relative/fractional threshold.

apologies for the noise.

j.

ps: it of course would still be nice if the surface area (or a vector containing the individual triangle areas) were returned to the caller as well ...
Re: [R] how to derive true surface area from `computeContour3d' (misc3d package) -- follow up
On Fri, Nov 8, 2013 at 1:01 PM, j. van den hoff veedeeh...@googlemail.com wrote:

> ps: it of course would still be nice, if the surface area (or a vector containing the individual triangle areas) were returned to the caller as well ...

Does the 'surfaceArea' function in the sp package do what you want? It's Edzer's integration of an R function that I wrote that calls some C code that someone else wrote that implements an algorithm from 2004. You just need to coerce your grid data into the right form.

Barry
Re: [R] how to derive true surface area from `computeContour3d' (misc3d package) -- follow up
On Fri, 08 Nov 2013 15:32:00 +0100, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:

> On Fri, Nov 8, 2013 at 1:01 PM, j. van den hoff veedeeh...@googlemail.com wrote:
>
> > ps: it of course would still be nice, if the surface area (or a vector containing the individual triangle areas) were returned to the caller as well ...
>
> Does the 'surfaceArea' function in the sp package do what you want? It's Edzer's integration of an R function that I wrote that calls some C code that someone else wrote that implements an algorithm from 2004. You just need to coerce your grid data into the right form.

not quite, I believe: I need to compute the area of a (closed) iso-surface of a 3D object defined by samples on a discrete 3D grid. if I understand correctly from a quick look at `sp', `surfaceArea' instead computes the surface integral of some function z = f(x,y).

but thanks for the pointer anyway, this might still be useful in a different context.

joerg

> Barry
Re: [R] problem with interaction in lmer even after creating an interaction variable
a_lampei anna.lampei-bucharova at uni-tuebingen.de writes:

> Thank you very much,
> First, sorry for posting on wrong mailing list, I did not know that there exists a special one for lmer. Yes, there are collinearities in the data. Still, I would like to have the variables in one model to compare explained variability. Is there some option, or it is simply impossible?
> Thank you, Anna

The development version of lme4 has an (experimental) feature that automatically removes collinear columns of the model matrix; you could try that. Further discussion on r-sig-mixed-models ...

Ben Bolker
Re: [R] Remove a column of a matrix with unnamed column header
Hi,

Try:

    test2 <- as.data.frame(test, stringsAsFactors = FALSE)
    test2[, c(2:4)] <- lapply(test2[, c(2:4)], as.numeric)

A.K.

On Friday, November 8, 2013 6:24 AM, Kuma Raj pollar...@gmail.com wrote:

Berend, Thanks for that. Conversion of test to a data frame resulted in a factor. Is there a possibility to selectively convert to numeric? I have tried this code and it has not produced the intended result.

    test[, c(2:4)] <- sapply(test[, c(2:4)], as.numeric)

On 8 November 2013 11:31, Berend Hasselman b...@xs4all.nl wrote:

On 08-11-2013, at 10:40, Kuma Raj pollar...@gmail.com wrote:

I have a matrix named test which I want to convert to a data frame. When I use the command test2 <- as.data.frame(test) it is executed without a problem. But when I want to browse the content I receive the error message "Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", : duplicate row.names: Estimate". The problem is clearly due to the duplicated row name, but I am unable to remove this column. I need help on how to remove this specific column that has essentially no column header name.

dput of the matrix is here:

    dput(test)
    structure(c("cardva", "respir", "cereb", "neoplasm", "ami", "ischem",
    "heartf", "pneumo", "copd", "asthma", "dysrhy", "diabet",
    "0.00259492159959046", "0.00979775441709427", "0.00103414632535868",
    "0.00486468139227382", "0.0164825543879707", "0.0116647168053943",
    "-0.0012137908515233", "0.00730433232907741", "0.00355583994565985",
    "0.000712387285735019", "-0.00103763671307935", "0.00981500221106926",
    "0.00325476724733837", "0.0049232113728293", "0.00520118026087645",
    "0.00386848394426742", "0.00688121694253705", "0.00585772614064902",
    "0.00564983058883797", "0.0061328202328586", "0.0108212194251692",
    "0.0173804438930357", "0.00867931407250442", "0.0106638104533486",
    "0.425323120845664", "0.0466180768654915", "0.842402292743715",
    "0.208609687427072", "0.0166336682608816", "0.0464833846710956",
    "0.8299010611324", "0.233685747699204", "0.742469001175026",
    "0.967306766450795", "0.904840885401235", "0.357394700741248"),
    .Dim = c(12L, 4L), .Dimnames = list(
    c("Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate",
    "Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate"),
    c("outcome", "beta", "se", "pval")))

    test2 <- as.data.frame(test)
    test2
    Error in data.frame(outcome = c("cardva", "respir", "cereb", "neoplasm", : duplicate row.names: Estimate

Remove the row names before converting:

    rownames(test) <- NULL

Berend
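The two suggestions in this thread (dropping the duplicated row names, then converting the numeric columns) can be combined. The following is only a sketch on a shortened, made-up stand-in for the posted matrix; the rounded values are illustrative and not the original data.

```r
# Shortened stand-in for the matrix in the thread: a character matrix
# whose row names are all "Estimate" (values are illustrative only).
test <- matrix(c("cardva", "respir", "cereb",
                 "0.0026", "0.0098", "0.0010",
                 "0.0033", "0.0049", "0.0052",
                 "0.4253", "0.0466", "0.8424"),
               nrow = 3,
               dimnames = list(rep("Estimate", 3),
                               c("outcome", "beta", "se", "pval")))

# Drop the duplicated row names before converting (Berend's suggestion).
rownames(test) <- NULL

# Convert without creating factors, then turn columns 2 to 4 into numbers
# (A.K.'s suggestion).
test2 <- as.data.frame(test, stringsAsFactors = FALSE)
test2[2:4] <- lapply(test2[2:4], as.numeric)

str(test2)
```

Using lapply() rather than sapply() on the data frame columns keeps each column as a separate vector, which is what the assignment into test2[2:4] expects.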
[R] SNPRelate: Plink conversion
Hi,

Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to gds conversion. It seems as though the SNPs are imported as samples and, in turn, the samples are recognised as SNPs:

    snpgdsSummary("chr2L")
    Some values of snp.position are invalid (should be > 0)!
    Some values of snp.chromosome are invalid (should be finite and >= 1)!
    Some of snp.allele are not standard! E.g, 2/-9
    The file name: chr2L
    The total number of samples: 2638506
    The total number of SNPs: 67
    SNP genotypes are stored in SNP-major mode.
    The number of valid samples: 2638506
    The number of valid SNPs: 0

Does anyone have any ideas on how to fix this?

Thanks, Danica
Re: [R] SNPRelate: Plink conversion
Doesn't this belong on Bioconductor rather than here?

-- Bert

Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374
Re: [R] Error running MuMIn dredge function using glmer models
Removing na.action=na.omit solved the problem. Thanks for the help, and thanks to Dr. Bartoń for making such a useful package.

Mart

Martin Turcotte, Ph. D. mart.turco...@gmail.com
Re: [R] Uploading Google Spreadsheet data into R
Stripping down to the bare essentials seems to get it. In particular, making the query just "select *" instead of "select * where B!=''" works. You don't need the processing that the more complicated Guardian web page requires. After loading the RCurl package and creating the gsqAPI function:

    tmp=gsqAPI("0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE", "select *", 0)
    str(tmp)
    'data.frame': 9 obs. of 3 variables:
     $ COL1: chr "25/10/2013" "25/10/2013" "31/10/2013" "31/10/2013" ...
     $ COL2: int 50 10 16 18 25 34 56 47 50
     $ COL3: chr "TEXT" "TEXT TEXT" "TEXT" "TEXT" ...
    tmp
            COL1 COL2      COL3
    1 25/10/2013   50      TEXT
    2 25/10/2013   10 TEXT TEXT
    3 31/10/2013   16      TEXT
    4 31/10/2013   18      TEXT
    5 31/10/2013   25 TEXT TEXT
    6 31/10/2013   34      TEXT
    7 31/10/2013   56      TEXT
    8 31/10/2013   47      TEXT
    9 31/10/2013   50      TEXT

David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Luca Meyer Sent: Friday, November 8, 2013 1:33 AM To: r-help@r-project.org Subject: [R] Uploading Google Spreadsheet data into R

Hello,

I am trying to upload data I have on a Google Spreadsheet into R to perform some analysis. I regularly update the data and need to perform the analysis in the quickest possible way, i.e. without needing to publish the data, so I was wondering how to make this piece of code (source http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/) work with my dataset (see https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0 ):

    library(RCurl)
    gsqAPI = function(key,query,gid=0){
      tmp=getURL( paste( sep="",'https://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid), ssl.verifypeer = FALSE )
      return( read.csv( textConnection( tmp ), stringsAsFactors=F ) )
    }
    handler=function(key,i){
      tmp=gsqAPI(key,"select * where B!=''", i)
      subject=sub(".Rank",'',colnames(tmp)[1])
      colnames(tmp)[1]="Subject.Rank"
      tmp$subject=subject
      tmp
    }
    key='0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
    gdata=handler(key,0)

The code is currently returning the following:

    Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") : replacement has 1 row, data has 0

Thank you in advance, Luca
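The error quoted above is what R reports when a length-one value is assigned as a new column of a data frame that has no rows, which is consistent with the filtered query returning nothing. A small sketch that reproduces this without any network access; the zero-row data frame and the add_subject() helper are made up for illustration and are not part of the original code:

```r
# Stand-in for a query result that came back with a header but no rows.
tmp <- data.frame(COL1 = character(0), stringsAsFactors = FALSE)

# Adding a length-one column to a zero-row data frame fails; capture the
# message instead of stopping.
msg <- tryCatch({
  tmp$subject <- "COL1"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Hypothetical guard: only add the column when there is data to label.
add_subject <- function(df, subject) {
  if (nrow(df) == 0) {
    warning("query returned no rows; returning data unchanged")
    return(df)
  }
  df$subject <- subject
  df
}
```

Checking nrow() right after the query makes an empty result visible at the point where it happens rather than in a later assignment.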
Re: [R] Uploading Google Spreadsheet data into R
It does indeed. Thank you David,

Luca
[R] select .txt from .txt in a directory
Hi,

I have 300 .txt files in a directory. Out of these 300, I need just 100 of the files. I have the names of the 100 .txt files, which are also found among the 300 .txt files. How can I extract only the 100 .txt files from the 300 .txt files? E.g. given d1.txt, ds.txt, dx.txt, df.txt ... d300.txt, how can I select only d1.txt and df.txt? Remember, I have 300 such files and want to extract 100 of them with names known.

Thanks for your great help. Atem.
Re: [R] Nonnormal Residuals and GAMs
Hi Collin,

The GAMLSS package allows modelling of the response variable distribution using either Exponential family or non-Exponential family distributions. It also allows modelling of the scale parameter (and hence the dispersion parameter for Exponential family distributions) using explanatory variables. This can be important for selecting mean model terms and is particularly important when interest lies in the variance and/or quantiles of the response variable.

Robert Rigby

On 06/11/13 21:46, Collin Lynch wrote:

Greetings,

My question is more algorithmic than practical. What I am trying to determine is: are the GAM algorithms used in the mgcv package affected by nonnormally-distributed residuals? As I understand the theory of linear models, the Gauss-Markov theorem guarantees that least-squares regression is optimal over all unbiased estimators iff the data meet the conditions of linearity, homoscedasticity, independence, and normally-distributed residuals. Absent the last requirement it is optimal, but only over unbiased linear estimators. What I am trying to determine is whether or not it is necessary to check for normally-distributed errors in a GAM from mgcv. I know that the unsmoothed terms, if any, will be fitted by ordinary least-squares, but I am unsure whether the default Penalized Iteratively Reweighted Least Squares method used in the package is also based upon this assumption or falls under any analogue to the Gauss-Markov Theorem.

Thank you in advance for any help.

Sincerely, Collin Lynch
Re: [R] select .txt from .txt in a directory
How do you decide which ones you need? Is there some pattern that lets you distinguish needing df.txt from not needing ds.txt? You say you have the names - how do you have them? In a text file? What are you trying to do with the text files?

Sarah

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] select .txt from .txt in a directory
I do not understand the question. If you already know the names what is the problem to select the files by names? If you have the names but not inside of R you have to find a name pattern to avoid typing them in. Is there a pattern, e.g. da.txt, db.txt, dc.txt?
Re: [R] select .txt from .txt in a directory
1. Please don't post in HTML (see posting guide).

2. What do you mean by extract?

3. Your question sounds very basic. Have you read An Introduction to R or another online R tutorial? If not, please do so before posting further. All of R's file input functions allow you to specify the directory path and/or filename, so if I understand you correctly, it's just a matter of giving them to the appropriate function in some sort of loop, e.g. something like

    alldat <- lapply(filenameList, function(x) InputFunction(x, ...))

4. If you need something fancier than is described in the tutorials, consult the R Data Import/Export manual, please.

-- Bert

Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374
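A minimal sketch of the lapply() approach described above, assuming the wanted file names are already available as a character vector. The temporary directory, the file contents and the choice of read.table() are placeholders for illustration, not details from the thread:

```r
# Hypothetical setup: a temporary directory standing in for the real folder,
# holding a few small files named like the ones in the question.
data_dir <- file.path(tempdir(), "txt_demo")
dir.create(data_dir, showWarnings = FALSE)
for (f in c("d1.txt", "ds.txt", "dx.txt", "df.txt")) {
  write.table(data.frame(value = 1:3), file.path(data_dir, f),
              row.names = FALSE)
}

# Names of the files that are actually wanted (known in advance).
wanted <- c("d1.txt", "df.txt")

# Keep only the wanted names that really exist in the directory.
available <- list.files(data_dir, pattern = "\\.txt$")
selected <- intersect(wanted, available)

# Read each selected file; the result is a named list of data frames.
alldat <- lapply(file.path(data_dir, selected), read.table, header = TRUE)
names(alldat) <- selected
```

Intersecting with list.files() first means a misspelled name is dropped quietly; comparing length(selected) with length(wanted) shows whether anything is missing.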
Re: [R] select .txt from .txt in a directory
All files are text files. They are found in a folder on my computer. Assume that I know the names of some of the files I want to select from the 300 txt files. How can I do this in R.

Atem.
Re: [R] select .txt from .txt in a directory
If you want to type in the names by hand, you can simply use read.table to load them into R … I still don’t get the aim of your text file handling
[R] Adding Proxy information in 'R' application
R-Help Mailing List,

I'm currently working with a user who is trying to download and install libraries for 'R' on her office PC. While using install.packages(packageName, dependencies = TRUE) works without a problem on our home PCs, we use a proxy at the firm and therefore it doesn't let the application go directly out of the network on port 80. Is there a way to manually set proxy information within the application so that it can reach the internet when we're trying to download and install libraries (and necessary dependencies) from within the application? I've gone through some of the options but there's nothing there for it.

Regards, José Emmanuel Batista
Re: [R] How to show the second abline ?
Hi,

I have the following script in R:

    x=c(8.0,17.5,23.5,32.0,38.5,48.5,58.5,68.5)
    y=c(267,246,290,294,302,301,301,298)
    gap.plot(x,y,ylim=c(8,310),pch=8,cex=0.5,
             xlab=c('Time'),ylab=c('uS'),
             gap=c(30,240),gap.axis='y',
             ytics=c(10,20,30,270,280,290,300))
    abline(h=31,col='white',lwd=20)
    axis.break(axis=2,31)
    axis.break(axis=4,31)
    abline(coef(lm(x~y)),col=1) # Why doesn't this show?

Thanks for your help, Péter

-- Domokos Péter BSc student Babes-Bolyai University Biology and Geology Faculty Hungarian Department of Biology and Ecology Cluj Napoca Romania
Re: [R] SNPRelate: Plink conversion
Hi Bert,

I thought it was suitable to post the question on the R mailing list first seeing as the problem/question is related to an R package.

Danica
Re: [R] How to show the second abline ?
I have no idea where gap.plot() came from, so I can't reproduce this, but you almost certainly need y ~ x in your formula.

    abline(coef(lm(y ~ x)),col=1)

Sarah

-- Sarah Goslee http://www.functionaldiversity.org
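The effect of the formula direction can be checked with base R alone, using the x and y values from the question. This sketch only compares the two fitted lines against the requested axis range; it does not call gap.plot(), so how either line is drawn across the axis gap is not verified here.

```r
# Data from the original post.
x <- c(8.0, 17.5, 23.5, 32.0, 38.5, 48.5, 58.5, 68.5)
y <- c(267, 246, 290, 294, 302, 301, 301, 298)

# Intercept and slope as used in the post (x regressed on y) and as
# suggested in the reply (y regressed on x).
coef_xy <- coef(lm(x ~ y))
coef_yx <- coef(lm(y ~ x))

# abline(a, b) draws a + b * x, so evaluate both lines over the observed x.
line_xy <- unname(coef_xy[1] + coef_xy[2] * x)
line_yx <- unname(coef_yx[1] + coef_yx[2] * x)

# The x ~ y coefficients give a line that lies entirely below the lower
# axis limit of 8, so it cannot appear in the plotting region; the y ~ x
# line stays within the range of the observations.
range(line_xy)
range(line_yx)
```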
Re: [R] Fw: select .txt from .txt in a directory
Hi, I'm not particularly interested in opening unsolicited binary attachments. Why don't you use dput() to provide part of your data to the R-help list (copied on this email; emailing just me not being that useful). You still haven't told us what you want to do with the named text files - read them into R? In general, you would read the file with the list of names into R, then use a loop or a *apply construct to import each of those named files. Based on what you've said, the fact that your desired list has only 100 of the 300 total files is a red herring. Sarah

On Fri, Nov 8, 2013 at 1:30 PM, Zilefac Elvis zilefacel...@yahoo.com wrote: Hi Sarah Attached are my data files. Btemperature_Stations is my main file. Temperature inventory is my 'wanted' file and is a subset of Btemperature_Stations. Using column 3 in both files, select the files in Temperature inventory from Btemperature_Stations. The .zip file contains the .txt files which you will extract to a folder and do the selection in R. Thanks, Atem.

-- Sarah Goslee http://www.functionaldiversity.org
[R] Revolutions blog: October roundup
Revolution Analytics staff write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help.

In case you missed them, here are some articles related to R from the month of October:

Joe Rickert recounts the R presence at the Strata + Hadoop World conference, including slides from the R and Hadoop tutorial: http://bit.ly/1acPavl

Hadley Wickham's favorite tools, gadgets and software (including of course R): http://bit.ly/1acPavm

Revolution R Enterprise 7 is announced, including R 3.0.2: http://bit.ly/1acP8DM

I was interviewed on camera by theCUBE about R, data science, and Revolution R Enterprise 7: http://bit.ly/1acP8DO

Patrick Burns shares some good reasons for switching from spreadsheets to R for data analysis: http://bit.ly/1acP8DN

R is used for several sports-related analyses at the New England Symposium of Statistics in Sport: http://bit.ly/1acPavk

Some tips for using the .Rprofile file to customize your R session at startup: http://bit.ly/1acP8DP

Quandl’s introduction to econometrics using R: http://bit.ly/1acPavr

Video replay of a recent webinar by DataSong on implementing time-to-event models with Revolution R Enterprise: http://bit.ly/1acP8DQ

Revolution R Enterprise is now integrated with Alteryx to provide a drag-and-drop GUI workflow for R: http://bit.ly/1acPavt

An article in Forbes discusses using R from the Alteryx drag-and-drop workflow interface: http://bit.ly/1acPavs

Joe Rickert reviews the sessions at the ACM 2013 Big Data Camp: http://bit.ly/1acP8DR

The New York Times published an article on fantasy football analysis with R: http://bit.ly/1acPaLG

The latest Rexer poll shows the use of R continues to skyrocket. It’s the most popular data mining tool by a wide margin: http://bit.ly/1acPaLH

Replays of two recent webinar presentations on using R on Hadoop, presented by Cloudera http://bit.ly/1acPaLI and Hortonworks http://bit.ly/1acP8U8 in conjunction with Revolution Analytics.

Tips and resources for using R for signal processing and time series analysis: http://bit.ly/1acP8U7

The popular data-visualization software Tableau adds integration with R: http://bit.ly/1acP8U5

An interactive web tool explains Simpson’s paradox: http://bit.ly/1acPaLJ

R-related presentations from the DataWeek 2013 conference, including how an IBM division replaced SAS with R: http://bit.ly/1acP8U9

Some non-R stories in the past month included: remembering video stores (http://bit.ly/1acP8Ub), some optical illusion trickery (http://bit.ly/1acPaLK), better voting systems (http://bit.ly/1acPaLL), a funny interpretation of air safety videos (http://bit.ly/1acP8Ua) and a discussion on how to get ROT from analytics (http://bit.ly/1acPaLM).

Meeting times for local R user groups (http://bit.ly/eC5YQe) can be found on the updated R Community Calendar at: http://bit.ly/bb3naW

If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. You can receive daily blog posts via email using services like blogtrottr.com, or join the Revolution Analytics mailing list at http://bit.ly/MH2I2Q to be alerted to new articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com . Don't forget you can also follow the blog using an RSS reader, or by following me on Twitter (I'm @revodavid).

Cheers,
# David

-- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 646-9523 (Seattle WA, USA) Twitter: @revodavid We're hiring! www.revolutionanalytics.com/careers
Re: [R] select .txt from .txt in a directory
Elvis,

first, keep things on the list, so others can learn and comment. Second, as Sarah already commented: we do not like to open unsolicited binary attachments on the list. Sarah gives a good hint on how to post data to the list.

What I would do so far is use the matching columns to get the names you need from BTemperature:

    temp_inv <- read.table("Temperature Inventory", … )

(here I would change the .xlsx to a .csv and use read.csv instead of read.table)

    btemp <- read.table("BTemperature_Stations.txt", … )

(again, think about converting via Excel to .csv; it makes things much easier)

Check ?read.table for options; you are going to need them. Then match, keeping the rows of btemp whose column 3 also appears in column 3 of temp_inv:

    mynames <- btemp[btemp[, 3] %in% temp_inv[, 3], 2]

Now you have the names of the stations, and if your .txt files are named by station you can do something like:

    for (name in mynames) {
        tmp.table <- read.table(paste("path/to/your/Homog_daily_min_temp/", name, ".txt", sep = ""), … )
        # … do things with the data
    }

Best
Simon

On 08 Nov 2013, at 19:26, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi Simon,
Attached are my data files. Btemperature_Stations is my main file. Temperature inventory is my 'wanted' file and is a subset of Btemperature_Stations. Using column 3 in both files, select the files in Temperature inventory from Btemperature_Stations. The .zip file contains the .txt files, which you will extract to a folder and do the selection in R.
Thanks, Atem.

On Friday, November 8, 2013 11:54 AM, Simon Zehnder szehn...@uni-bonn.de wrote:

If you want to type in the names by hand, you can simply use read.table to load them into R … I still don't get the aim of your text file handling.

On 08 Nov 2013, at 18:51, Zilefac Elvis zilefacel...@yahoo.com wrote:

All files are text files. They are found in a folder on my computer. Assume that I know the names of some of the files I want to select from the 300 .txt files. How can I do this in R?
Atem.

On Friday, November 8, 2013 11:44 AM, Simon Zehnder szehn...@uni-bonn.de wrote:

I do not understand the question. If you already know the names, what is the problem with selecting the files by name? If you have the names but not inside of R, you have to find a name pattern to avoid typing them in. Is there a pattern, e.g. da.txt, db.txt, dc.txt?

On 08 Nov 2013, at 18:33, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi,
I have 300 .txt files in a directory. Out of these 300, I need just 100 of the files. I have the names of the 100 .txt files, which are also found among the 300 .txt files. How can I extract only the 100 .txt files from the 300 .txt files? E.g. given d1.txt, ds.txt, dx.txt, df.txt ... d300.txt, how can I select only d1.txt and df.txt? Remember, I have 300 of these and want to extract 100 of them with names known.
Thanks for your great help.
Atem.

Attachments: BTemperature_Stations.txt, Tempearture inventory.xlsx, Homog_daily_min_temp.zip
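As a rough illustration of the "read only the files whose names you already know" idea discussed in this thread, here is a minimal base R sketch. The directory, the file names, and the table contents are all made up for the demo (they are assumptions, not the poster's data); the only point is the pattern of building paths from a name vector and skipping names that have no file.

```r
# Hypothetical demo directory holding a few small .txt files
demo_dir <- tempfile("station_demo_")
dir.create(demo_dir)
for (nm in c("d1", "ds", "dx", "df")) {
  write.table(data.frame(day = 1:3, tmin = c(-1.5, 0.2, 3.1)),
              file.path(demo_dir, paste0(nm, ".txt")), row.names = FALSE)
}

# Names we want; "dz" deliberately has no matching file
wanted <- c("d1", "df", "dz")
paths  <- file.path(demo_dir, paste0(wanted, ".txt"))
found  <- file.exists(paths)

# Read only the files that actually exist, keeping the station name as list name
tables <- lapply(paths[found], read.table, header = TRUE)
names(tables) <- wanted[found]
```

Checking file.exists() first avoids an error from read.table() when a wanted name is missing from the folder; wanted[!found] then lists the names that could not be matched.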
[R] Date handling in R is hard to understand
Dear All,

I usually work with time series data. The data may come in AM/PM date format or on a 24-hour basis. R cannot tell the two apart automatically, at least for me: I have to tell R explicitly which time format the data is in. It seems that Pandas knows how to handle dates without being told the format.

The problem arises when I try to shift time by a certain amount. Say I add 3600 seconds to shift it forward; in that case I have to use something like

    Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz = "", format = "%m/%d/%Y %I:%M %p") + 3600

or

    Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz = "", format = "%m/%d/%Y %H:%M") + 3600

depending on the format. The date also gets MDT or MST attached, and so on. When merging two data frames with dates in different formats, that may create a problem (I think). When I get data from Excel it could be in any format, and I have to convert the date to one of the above formats to use it in R. Any tips for automatic processing, with no need to specify the date format explicitly?

Another problem I saw is with rbind on data frames: if a column in one of the data frames is character data (for example "none", coming from MySQL), R doesn't know how to concatenate the numeric column from the other data frame to it. I had to change the numeric column to character and, after binding, convert it back to numeric. But this causes problems in an automated environment. Any suggestions?

Thanks
Mihretu
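For the rbind part of the question, one possible approach is to clean the character column before binding rather than converting back afterwards. The sketch below assumes the only non-numeric marker is the string "none" (as in the message); the data frames `a` and `b` and the helper `to_numeric` are hypothetical names invented for the illustration.

```r
# Hypothetical data: 'a' arrived with a character column containing "none"
a <- data.frame(id = 1:2, value = c("3.5", "none"), stringsAsFactors = FALSE)
b <- data.frame(id = 3:4, value = c(1.25, 2))

# Turn the known missing-value marker into NA, then convert to numeric
to_numeric <- function(v, missing = "none") {
  v <- as.character(v)
  v[v %in% missing] <- NA
  as.numeric(v)
}

a$value  <- to_numeric(a$value)
combined <- rbind(a, b)   # both 'value' columns are numeric now
```

If the data are read with read.table() or read.csv(), passing na.strings = "none" at import time achieves the same thing without a separate conversion step.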
[R] C50 Node Assignment
In my role as a moderator I am attempting to bypass the automatic mail filters that are blocking this posting. Please reply to the list and to:

=
Kevin Shaney kevin.sha...@rosetta.com

C50 Node Assignment

I am using C50 to classify individuals into 5 groups / categories (factor variable). The tree / set of rules has 10 rules for classification. I am trying to extract the RULE for which each individual qualifies (a number between 1 and 10), and cannot figure out how to do so. I can extract the predicted group and predicted group probability, but not the RULE for which an individual qualifies.

Please let me know if you can help!
Kevin
=

--
David Winsemius
Alameda, CA, USA
Re: [R] Adding Proxy information in 'R' application
Hi Jose,

I faced the same problem at my workplace too. The solution (at least for us) was to insert the following function into the Rprofile.site file in the etc folder of the R install folder. If the .First function already exists, you could just insert the line beginning Sys.setenv(...) into that function. The value in quotes needs to be a suitable web proxy address and port, and the http_proxy_user = "ask" part tells R to ask for a username and password.

    .First <- function() {
        Sys.setenv(http_proxy = "http://webproxy:8080", http_proxy_user = "ask")
    }

Hope this works for you!

Cheers,
Carina

On 8 November 2013 17:10, Batista, Jose jose.bati...@nb.com wrote:

R-Help Mailing List,

I'm currently working with a user who is actively trying to download and install libraries for 'R' on her office PC. While using install.packages(packageName, dependencies = TRUE) works without a problem on our home PCs, we use a proxy at the firm, and therefore it doesn't let the application go directly out of the network on port 80.

Is there a way to manually set proxy information within the application so that it can, indeed, reach the internet when we're trying to download and install libraries (and necessary dependencies) from within the application? I've gone through some of the options but there's nothing there for it.

Regards,
José Emmanuel Batista
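To try the proxy setting in a running session before editing Rprofile.site, the same environment variables can be set interactively. This is only a sketch: "webproxy:8080" is the placeholder host from the message above, not a real proxy, and whether the "ask" value is honoured depends on the download method R uses on the machine in question.

```r
# Placeholder proxy taken from the message; replace with the site's real host:port
Sys.setenv(http_proxy = "http://webproxy:8080",
           http_proxy_user = "ask")

# Confirm what R will see for this session
proxy_now <- Sys.getenv("http_proxy")

# install.packages("somePackage", dependencies = TRUE)  # would now go via the proxy
```

Settings made this way last only for the current session, which makes them convenient for testing; the .First approach makes them permanent for every session on that installation.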
Re: [R] Date handling in R is hard to understand
Have a look at the lubridate package. It claims to try to make dealing with dates easier.

-- Bert

On Fri, Nov 8, 2013 at 11:41 AM, Alemu Tadesse alemu.tade...@gmail.com wrote:
[original message quoted in full; see "[R] Date handling in R is hard to understand" above]

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
Re: [R] Crime hotspot maps (kernel density)
It is not clear to me what you want/need to do, but it is possible that the spatstat package (in particular the function density.ppp()) might help you.

cheers,
Rolf Turner

On 11/08/13 23:10, David Studer wrote:

Hi everybody,
does anyone of you know how to create a (crime) hotspot map using R? Are there any packages, or do you know any resources? It should be something like this:
http://www.caliper.com/Maptitude/Crime/MotorVehicleTheft2.png
(but it doesn't necessarily have to be a map)
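To give a feel for what a kernel-density "hotspot" computation involves, here is a small sketch on simulated points. Note that it swaps in MASS::kde2d() (MASS ships with standard R installations) rather than the spatstat function suggested above, purely to keep the example dependency-free; the incident coordinates are simulated and entirely hypothetical.

```r
library(MASS)  # for kde2d()

# Simulated incident locations: one cluster near (2, 3) plus uniform background noise
set.seed(1)
x <- c(rnorm(200, mean = 2, sd = 0.3), runif(100, 0, 5))
y <- c(rnorm(200, mean = 3, sd = 0.3), runif(100, 0, 5))

# Two-dimensional kernel density estimate on a 50 x 50 grid over the study area
dens <- kde2d(x, y, n = 50, lims = c(0, 5, 0, 5))

# Grid cell with the highest estimated density, i.e. the "hotspot"
peak    <- which(dens$z == max(dens$z), arr.ind = TRUE)
hotspot <- c(x = dens$x[peak[1, "row"]], y = dens$y[peak[1, "col"]])

# To draw the surface: image(dens); points(x, y, pch = ".")
```

With real data, x and y would be the projected coordinates of the incidents, and the density surface could be overlaid on a base map.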
Re: [R] How to show the second abline ?
On 11/09/2013 03:04 AM, Domokos Péter wrote:

Hi,
I have the following script in R:

    library(plotrix)  # provides gap.plot() and axis.break()
    x=c(8.0,17.5,23.5,32.0,38.5,48.5,58.5,68.5)
    y=c(267,246,290,294,302,301,301,298)
    gap.plot(x,y,ylim=c(8,310),pch=8,cex=0.5,
             xlab=c('Time'),ylab=c('uS'),
             gap=c(30,240),gap.axis='y',
             ytics=c(10,20,30,270,280,290,300))
    abline(h=31,col='white',lwd=20)
    axis.break(axis=2,31)
    axis.break(axis=4,31)
    abline(coef(lm(x~y)),col=1)  # Why doesn't this show???

Hi Peter,
Perhaps because both of these numbers:

    coef(lm(x~y))
     (Intercept)            y
    -176.5047160    0.7425131

are off the scale of your plot. Do you really want:

    lmcoef <- coef(lm(y~x))
    abline(lmcoef[1]-210, lmcoef[2], col=1)

Jim
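The difference between the two regressions can be checked numerically without drawing anything. The sketch below reuses the x and y vectors from the question and compares the line from lm(x ~ y) with the one from lm(y ~ x); the variable names `wrong`, `right`, and `fitted_y` are just labels chosen for this illustration.

```r
x <- c(8.0, 17.5, 23.5, 32.0, 38.5, 48.5, 58.5, 68.5)
y <- c(267, 246, 290, 294, 302, 301, 301, 298)

# Regressing x on y gives a line in the wrong coordinate system for a y-vs-x plot
wrong <- coef(lm(x ~ y))

# Regressing y on x gives intercept and slope in the plot's own units
right <- coef(lm(y ~ x))

# Height of each line over the plotted x range
fitted_y <- right[1] + right[2] * range(x)
wrong_y  <- wrong[1] + wrong[2] * range(x)

# plot(x, y); abline(right)   # on an ordinary (no gap) plot this line is visible
```

The values in wrong_y are far below zero, which is why abline() draws nothing visible, whereas fitted_y stays within the range of the observed y values. The extra "- 210" in the reply above compensates for the 30 to 240 gap removed by gap.plot().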
Re: [R] Date handling in R is hard to understand
Hi Mihretu,
Can you grep for AM or PM? If so, build your format string depending upon whether one of these exists in the date string.

Jim

On 11/09/2013 06:41 AM, Alemu Tadesse wrote:
[original message quoted in full; see "[R] Date handling in R is hard to understand" above]
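The grep idea can be sketched in base R roughly as follows. This assumes the two layouts from the original question ("%m/%d/%Y %I:%M %p" and "%m/%d/%Y %H:%M"); the function name `parse_mixed_times` and the choice of UTC as time zone are assumptions made for the example. Parsing of %p also depends on the locale, so this is written with an English or C locale in mind.

```r
# Parse a character vector that mixes 12-hour (AM/PM) and 24-hour timestamps
parse_mixed_times <- function(x, tz = "UTC") {
  x <- as.character(x)
  has_ampm <- grepl("[AP]M", x, ignore.case = TRUE)

  # Start with an all-NA POSIXct vector and fill in each group with its own format
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  out[has_ampm]  <- as.POSIXct(x[has_ampm],  format = "%m/%d/%Y %I:%M %p", tz = tz)
  out[!has_ampm] <- as.POSIXct(x[!has_ampm], format = "%m/%d/%Y %H:%M",    tz = tz)
  out
}

stamps  <- c("11/08/2013 01:30 PM", "11/08/2013 13:30")
parsed  <- parse_mixed_times(stamps)
shifted <- parsed + 3600   # shift forward by one hour
```

Fixing the time zone explicitly (here UTC) also sidesteps the MDT/MST labels mentioned in the question, since both inputs end up on the same clock.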
[R] Geweke Diagnostic in CODA package
Hi all:

The CODA package provides the Geweke diagnostic for convergence checking. geweke.diag in CODA returns Z-scores but does not give a conclusion as to whether the chain has converged. So I'd like to know how small or large the magnitude of the Z-score must be to indicate convergence of a chain. That is, does a Z-score smaller or larger than some threshold determine convergence? If so, how big is the threshold value? See the following:

    library(coda)
    data(line)
    geweke.diag(line)

    $line1
    Fraction in 1st window = 0.1
    Fraction in 2nd window = 0.5

      alpha    beta   sigma
     1.1726 -0.7537  1.0182

    $line2
    Fraction in 1st window = 0.1
    Fraction in 2nd window = 0.5

      alpha    beta   sigma
    -0.1307 -1.7929 -0.6381

Can anyone tell which chain (line1 or line2) has better convergence properties based on the returned Z-scores?

Thank you!
David
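One common way to read these numbers, offered here as a convention rather than an answer from the thread: the Geweke statistic compares the means of an early and a late segment of the chain, and if the chain is stationary it is approximately standard normal. A Z-score can therefore be turned into a two-sided p-value, with |Z| > 1.96 as the usual 5% flag. The sketch below types in the Z-scores printed above by hand, so it does not need the coda package.

```r
# Z-scores copied from the geweke.diag(line) output in the message
z <- list(
  line1 = c(alpha =  1.1726, beta = -0.7537, sigma =  1.0182),
  line2 = c(alpha = -0.1307, beta = -1.7929, sigma = -0.6381)
)

# Two-sided p-values under the standard normal approximation
p <- lapply(z, function(v) 2 * pnorm(-abs(v)))

# Flag parameters whose |Z| exceeds the conventional 5% cutoff
flag <- lapply(z, function(v) abs(v) > qnorm(0.975))
```

None of the six scores exceeds the cutoff, so this check alone does not separate the two chains. A small |Z| only means no evidence of non-stationarity was found between the two windows; it does not prove convergence.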
[R] making chains from pairs
Hello,

having a data frame like test with pairs of characters, I would like to create chains. For instance, from the pairs A/B and B/I you get the vector A B I. It is like jumping from one pair to the next related pair. So for my example test you should get:

    A B F G H I
    C F I K
    D L M N O P

    test
       V1 V2
    1   A  B
    2   A  F
    3   A  G
    4   A  H
    5   B  F
    6   B  I
    7   C  F
    8   C  I
    9   C  K
    10  D  L
    11  D  M
    12  D  N
    13  L  O
    14  L  P

Thanks
Hermann

    dput(test)
    structure(list(V1 = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "D", "D", "L", "L"),
                   V2 = c("B", "F", "G", "H", "F", "I", "F", "I", "K", "L", "M", "N", "O", "P")),
              .Names = c("V1", "V2"), row.names = c(NA, -14L), class = "data.frame")
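One reading of the requested output is: start from every value that appears in V1 but never in V2 (here A, C and D) and collect everything reachable by following pairs from left to right. That interpretation is an assumption inferred from the example, and the helper name `chain_from` is made up for this sketch, which uses only base R.

```r
test <- data.frame(
  V1 = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "D", "D", "L", "L"),
  V2 = c("B", "F", "G", "H", "F", "I", "F", "I", "K", "L", "M", "N", "O", "P"),
  stringsAsFactors = FALSE
)

# Collect every element reachable from 'start' by following V1 -> V2 links
chain_from <- function(start, pairs) {
  seen     <- start
  frontier <- start
  while (length(frontier) > 0) {
    nxt      <- unique(pairs$V2[pairs$V1 %in% frontier])
    frontier <- setdiff(nxt, seen)      # only elements not visited yet
    seen     <- c(seen, frontier)
  }
  c(start, sort(setdiff(seen, start)))  # start first, the rest alphabetically
}

# Chain starting points: values that never occur on the right-hand side
roots  <- setdiff(unique(test$V1), test$V2)
chains <- lapply(roots, chain_from, pairs = test)
names(chains) <- roots
```

Tracking the visited set in `seen` keeps the loop from running forever if the pairs happen to contain a cycle. If chains should instead merge whenever they share any element, a connected-components approach would be needed, and A and C would then end up in the same group.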
Re: [R] SNPRelate: Plink conversion
You might try to import your data in GenABEL, use as.numeric(gtdata(data)) to get a matrix that gives you 0, 1 or 2 for each SNP and id (observation), and then try prcomp. Also check these:
http://gettinggeneticsdone.blogspot.de/2011/10/new-dimension-to-principal-components_27.html
http://www.hsph.harvard.edu/alkes-price/software/

Hope this helps.
Hermann

2013/11/8 Danica Fabrigar danica_...@hotmail.com

Hi Bert,
I thought it was suitable to post the question on the R mailing list first, seeing as the problem/question is related to an R package.
Danica

Date: Fri, 8 Nov 2013 08:14:03 -0800
Subject: Re: [R] SNPRelate: Plink conversion
From: gunter.ber...@gene.com
To: danica_...@hotmail.com
CC: r-help@r-project.org

Doesn't this belong on Bioconductor rather than here?
-- Bert

On Fri, Nov 8, 2013 at 6:04 AM, Danica Fabrigar danica_...@hotmail.com wrote:

Hi,
Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to gds conversion. It seems as though the SNPs are imported as samples and, in turn, the samples are recognised as SNPs:

    snpgdsSummary(chr2L)
    Some values of snp.position are invalid (should be > 0)!
    Some values of snp.chromosome are invalid (should be finite and >= 1)!
    Some of snp.allele are not standard! E.g, 2/-9
    The file name: chr2L
    The total number of samples: 2638506
    The total number of SNPs: 67
    SNP genotypes are stored in SNP-major mode.
    The number of valid samples: 2638506
    The number of valid SNPs: 0

Anyone have any ideas on how to fix this?
Thanks,
Danica

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
Re: [R] select .txt from .txt in a directory
Hi Atem,

It is not clear what you wanted to do. If you want to transfer the subset of files from the main folder to a new location, then you may try the following (make sure you create a copy of the original .txt folder before doing this).

I created three subfolders and two files (BTemperature_Stations.txt and Tempearture inventory.csv) in my working directory.

    list.files()
    #[1] "BTemperature_Stations.txt" "Files1"        ## Files1 folder contains all the .txt files
    #[3] "FilesCopy"                 "SubsetFiles1"  ## FilesCopy: a copy of the Files1 folder;
    #                                                ## SubsetFiles1: created to hold the files that match the condition
    #[5] "Tempearture inventory.csv"

    list.files(pattern="\\.")
    #[1] "BTemperature_Stations.txt" "Tempearture inventory.csv"

    fl1 <- list.files(pattern="\\.")
    dat1 <- read.table(fl1[1],header=TRUE,sep="",stringsAsFactors=FALSE,fill=TRUE,check.names=FALSE)
    dat2 <- read.csv(fl1[2],header=TRUE,sep=",",stringsAsFactors=FALSE,check.names=FALSE)
    vec1 <- dat1[,3][dat1[,3] %in% dat2[,3]]
    vec2 <- list.files(path="/home/arunksa111/Zl/Files1",recursive=TRUE)
    sum(gsub(".txt","",vec2) %in% vec1)
    #[1] 98
    vec3 <- vec2[gsub(".txt","",vec2) %in% vec1]

    # Move the matching files; change the paths accordingly.
    lapply(vec3, function(x) file.rename(paste("/home/arunksa111/Zl/Files1",x,sep="/"),
                                         paste("/home/arunksa111/Zl/SubsetFiles1",x,sep="/")))
    length(list.files(path="/home/arunksa111/Zl/SubsetFiles1"))
    #[1] 98

    fileDim <- sapply(vec3, function(x) {
        x1 <- read.delim(paste("/home/arunksa111/Zl/SubsetFiles1",x,sep="/"),
                         header=TRUE,stringsAsFactors=FALSE,sep=",",check.names=FALSE)
        dim(x1)})
    fileDim[,1:3]
    #     dn3011120.txt dn3011240.txt dn3011887.txt
    #[1,]          1151           791          1054
    #[2,]             7             7             7

A.K.

On Friday, November 8, 2013 1:41 PM, Zilefac Elvis zilefacel...@yahoo.com wrote:

Hi AK,
I want to select some files from a list of files. All are text files. The index for selection is found in column 3 of both files. Attached are my data files. Btemperature_Stations is my main file. Temperature inventory is my 'wanted' file and is a subset of Btemperature_Stations. Using column 3 in both files, select the files in Temperature inventory from Btemperature_Stations. The .zip file contains the .txt files, which you will extract to a folder and do the selection in R.
Thanks, Atem.
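A variant of the approach above that leaves the original folder untouched is to copy, rather than move, the matching files. The sketch below is self-contained: the folders are temporary directories and the third file name (dn9999999) is invented so that one file does not match; only the first two station IDs echo names that appear in the output above.

```r
# Hypothetical source and destination folders
src <- tempfile("Files1_");       dir.create(src)
dst <- tempfile("SubsetFiles1_"); dir.create(dst)

# Create three empty station files; only two of them are "wanted"
file.create(file.path(src, paste0(c("dn3011120", "dn3011240", "dn9999999"), ".txt")))
wanted_ids <- c("dn3011120", "dn3011240")

# Match on the file name with its .txt extension stripped
all_files <- list.files(src, pattern = "\\.txt$")
keep      <- all_files[sub("\\.txt$", "", all_files) %in% wanted_ids]

# Copy (not move) the matching files; the source folder stays complete
ok <- file.copy(file.path(src, keep), dst)
```

Anchoring the pattern as "\\.txt$" strips only a trailing extension, whereas an unescaped ".txt" would also match any character followed by "txt" elsewhere in a name.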
Re: [R] Please help me to short my code
Hi,
Try either:

    Ceramic <- read.table("ceramic.dat",header=TRUE)
    Ceramic1 <- Ceramic
    Ceramic$indx <- ((seq_len(nrow(Ceramic))-1)%/%60)+1
    library(plyr)
    DF1 <- data.frame(M=as.vector(t(ddply(Ceramic,.(indx), colwise(mean))[,-1])),
                      S=as.vector(t(ddply(Ceramic,.(indx),colwise(sd))[,-1])),Rep = 60)
    # DF is the data frame built by the original code quoted below
    colnames(DF)[3] <- colnames(DF1)[3]
    identical(DF,DF1)
    #[1] TRUE

    #or
    indx <- ((seq_len(nrow(Ceramic))-1)%/%60)+1
    Ceramic2 <- do.call(data.frame, c(aggregate(.~indx,data=Ceramic1,function(x) c(mean(x),sd(x))),
                                      check.names=FALSE))[,-1]
    DF2 <- data.frame(M= as.vector(t(Ceramic2[,seq(1,ncol(Ceramic2),by=2)])),
                      S= as.vector(t(Ceramic2[,seq(2,ncol(Ceramic2),by=2)])),Rep =60)
    identical(DF,DF2)
    #[1] TRUE

A.K.

please help me to shorten the code

    #To import data into R
    Ceramic<-read.table("D:/ceramic.dat",header=T)
    #to obtain mean, standard deviation and number of observations
    LAB1<-Ceramic[1:60,]
    LAB2<-Ceramic[61:120,]
    LAB3<-Ceramic[121:180,]
    LAB4<-Ceramic[181:240,]
    LAB5<-Ceramic[241:300,]
    LAB6<-Ceramic[301:360,]
    LAB7<-Ceramic[361:420,]
    LAB8<-Ceramic[421:480,]
    M1<-sapply(LAB1,mean)
    M2<-sapply(LAB2,mean)
    M3<-sapply(LAB3,mean)
    M4<-sapply(LAB4,mean)
    M5<-sapply(LAB5,mean)
    M6<-sapply(LAB6,mean)
    M7<-sapply(LAB7,mean)
    M8<-sapply(LAB8,mean)
    S1<-sapply(LAB1,sd)
    S2<-sapply(LAB2,sd)
    S3<-sapply(LAB3,sd)
    S4<-sapply(LAB4,sd)
    S5<-sapply(LAB5,sd)
    S6<-sapply(LAB6,sd)
    S7<-sapply(LAB7,sd)
    S8<-sapply(LAB8,sd)
    #tabulating results
    M<-c(M1,M2,M3,M4,M5,M6,M7,M8)
    S<-c(S1,S2,S3,S4,S5,S6,S7,S8)
    DF<-data.frame(M,S,c(rep(60)))
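For comparison, the same block-wise summary can be written in base R without plyr by splitting the rows into groups of 60. Since ceramic.dat is not available here, the sketch uses a simulated stand-in with two made-up numeric columns (Y and X); the helper name `stat_by_lab` is likewise an assumption for the illustration.

```r
# Simulated stand-in for the 480-row ceramic data (hypothetical columns)
set.seed(1)
Ceramic <- data.frame(Y = rnorm(480, mean = 650, sd = 70), X = runif(480))

# Lab index: rows 1-60 belong to lab 1, rows 61-120 to lab 2, and so on
lab <- rep(1:8, each = 60)

# Apply a statistic to every column within every lab;
# result order is lab 1's columns, then lab 2's columns, ...
stat_by_lab <- function(f) {
  as.vector(sapply(split(Ceramic, lab), function(d) sapply(d, f)))
}

DF <- data.frame(M = stat_by_lab(mean), S = stat_by_lab(sd), Rep = 60)
```

The ordering matches c(M1, M2, ..., M8) in the original code, because sapply() returns one column per lab and as.vector() reads that matrix column by column.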