Re: [R] Subsetting a list of lists using lapply
Thanks Chuck and Rolf.

While Rolf’s code also works on the dput that I actually gave you (a smaller subset of the full dataset), it failed to work on the larger dataset, because there are further exceptions: input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and sometimes input[[i]]$content simply returns list(). Chuck’s solution, however, bypasses this and works on the full dataset (which was 8 MB, which is why I didn’t upload it as a gist).

Best,
Aron

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

On Fri, Feb 20, 2015 at 12:44 AM, Charles Berry ccbe...@ucsd.edu wrote:

> Aron Lindberg aron.lindberg at case.edu writes:
>
>> Hi Everyone,
>>
>> I'm working on a thorny subsetting problem involving list of lists. I've put a dput of the data here:
>>
>> https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput
>
> IIUC, you want the value of every list element that is named "sha" and that name will only apply to atomic objects.
>
> If so, this should do it.
>
>   input <- dget("/tmp/dpt")
>   shas <- unlist( input, use.names=FALSE )[ grepl( "sha", names(unlist(input)))]
>
>   input[[67]]$content[[1]]$sha
>   [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
>   which(input[[67]]$content[[1]]$sha == shas )
>   [1] 194
>
> HTH,
> Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] colours in ggplot2
Dear Antonello,

You can specify the colours manually with scale_colour_manual(). See http://docs.ggplot2.org/0.9.3.1/scale_manual.html for some examples. The last example uses greys.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-02-20 13:54 GMT+01:00 Antonello Preti antovi...@gmail.com:

> Hi,
>
> I'm using ggplot2 to make a plot of the regression of a variable x (let's say, levels of depression) on a variable y (let's say, degree of social impairment), taking into account a binary factor (having had or not a past admission to a psychiatric service) and the age of participants.
>
> After some searching on the Internet I produced code that is satisfying to me. This site was very helpful: http://editerna.free.fr/wp/?p=266
>
> However, I have a problem: no matter what I try, the figures always include bluette and pink flamingo colours. The figure is for an academic article, and I cannot afford the price of having the plot printed in colours.
>
> I've extracted the structure of the figure, and I understand that the problem is in the scale_name "hue", but I cannot figure out how to deal with it. Any way to override the ggplot2 system of dealing with factors?
>
> Here are the code and the sessionInfo. The code is a bit baroque, but this is the best I was able to do.
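A minimal sketch of the scale_colour_manual() suggestion above, for a greyscale figure. The data frame `toy`, its column names, and the particular grey values are illustrative assumptions and are not taken from the thread; only the ggplot2 functions themselves are real.

```r
library(ggplot2)

# Small simulated stand-in for the poster's data (hypothetical values)
set.seed(42)
toy <- data.frame(depression = runif(40, 0, 3),
                  admitted   = rep(c(0, 1), 20))
toy$impairment <- 2 + 0.3 * toy$depression + 0.2 * toy$admitted +
  rnorm(40, sd = 0.3)

# Replace the default hue scale with two manually chosen greys
p <- ggplot(toy, aes(x = depression, y = impairment,
                     colour = factor(admitted))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_colour_manual(name   = "History of\npast admissions",
                      values = c("0" = "grey60", "1" = "black"),
                      labels = c("No", "Yes")) +
  theme_bw()

print(p)
```

Naming the elements of `values` after the factor levels ties each grey to a specific group, so the mapping does not depend on level order.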
Re: [R] How to analyse nonlinear response to categorical and quantitative explanatory variables?
You want to use a generalized linear model of some sort.

    glm(count ~ flow + gravity + group, data=mydata, family=poisson)

would be a start. However, the effects of flow rate are nonlinear, so you might use a natural spline term like ns(flow,5) to allow nonlinearity, and there also seem to be interactions in your plot.

    library(splines)
    glm(count ~ ns(flow,5) * gravity + group, data=mydata, family=poisson)

That might get you started while you look for a statistician to consult with.

-Michael

On 2/19/2015 9:47 AM, Jan-Ulrich Kreft wrote:

> Dear list
>
> I have data from a collaborator who has used DesignExpert to design the experiment and analyse the data but no longer has access to this software and does not know exactly what the software did and why. So I’m now trying to analyse the data in R but can't quite decide what to do.
>
> Cell count is the response variable (number of cells attached to a surface per unit area and time interval, so could be Poisson distributed). This cell count depends on whether the surface was oriented upwards or downwards (categorical - with or against gravity). Some more categorical variables were also studied such as surface material (glass or polycarbonate, symbols g and p in the figure) and position in flow cell (inlet or outlet), but they seem to have no significant effect. Cell count also depends on a quantitative variable in a nonlinear manner: the flow rate with which the cell suspension was pumped along the surface.
>
> I was wondering which kind of statistical model would be appropriate. I was first thinking ANCOVA but this seems to be a linear model and treating the quantitative explanatory variable as covariate when this is actually of interest. What else could I use?
>
> Attached a figure showing the means of 4 replicates.
>
> Many thanks. Best wishes, Jan.
>
> ---
> Dr Jan-Ulrich Kreft
> +44 (0)121 41-48851
> School of Biosciences
> University of Birmingham, Birmingham, B15 2TT, UK
> http://www.tinyurl.com/kreftlab
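The two models suggested above can be tried out on simulated data. This is only a sketch: the data frame `mydata`, the factor levels, and the coefficients used to generate the counts are invented for illustration, since the real data were only attached as a figure.

```r
library(splines)

# Hypothetical design: 8 flow rates x 2 orientations x 2 materials x 4 replicates
set.seed(1)
mydata <- expand.grid(flow    = seq(1, 10, length.out = 8),
                      gravity = c("up", "down"),
                      group   = c("g", "p"),
                      rep     = 1:4)

# Simulated Poisson counts with a nonlinear (hump-shaped) flow effect
mu <- exp(2 + 0.6 * mydata$flow - 0.05 * mydata$flow^2 +
          0.5 * (mydata$gravity == "down"))
mydata$count <- rpois(nrow(mydata), mu)

# Linear-in-flow model versus natural-spline model with interaction
fit_lin <- glm(count ~ flow + gravity + group,
               data = mydata, family = poisson)
fit_ns  <- glm(count ~ ns(flow, 5) * gravity + group,
               data = mydata, family = poisson)

# Likelihood-ratio comparison of the nested models
print(anova(fit_lin, fit_ns, test = "Chisq"))
```

The anova() call is one way to check whether the spline and interaction terms are worth their extra parameters; with real data, overdispersion would also need checking (for example via family = quasipoisson).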
[R] colours in ggplot2
Hi,

I'm using ggplot2 to make a plot of the regression of a variable x (let's say, levels of depression) on a variable y (let's say, degree of social impairment), taking into account a binary factor (having had or not a past admission to a psychiatric service) and the age of participants.

After some searching on the Internet I produced code that is satisfying to me. This site was very helpful: http://editerna.free.fr/wp/?p=266

However, I have a problem: no matter what I try, the figures always include bluette and pink flamingo colours. The figure is for an academic article, and I cannot afford the price of having the plot printed in colours.

I've extracted the structure of the figure, and I understand that the problem is in the scale_name "hue", but I cannot figure out how to deal with it. Any way to override the ggplot2 system of dealing with factors?

Here are the code and the sessionInfo. The code is a bit baroque, but this is the best I was able to do.

Thank you in advance,
Antonello Preti

code for exemplification

### the dataset

df <- structure(list(
Social_impairment = c(2.83, 3.08, 2.75, 2.08, 2.92, 1.75, 3.5, 2.33, 2.91, 2.5, 3.25, 2.64, 3.25, 2.83, 2.08, 2.25, 2.17, 2.42, 2.58, 2.42, 2.58, 2.42, 3, 3, 2.83, 2.67, 3.58, 1.58, 2.83, 2.83, 2.67, 3.17, 2.42, 1.92, 2.92, 2.5, 2.42, 2.42, 2.58, 2.42, 3.33, 3, 3.17, 2.17, 2.58, 2.67, 2.58, 3.75, 2.5, 2.08, 2.25, 3.25, 3.17, 2.91, 2.08, 2.25, 3.08, 2.91, 3.08, 2.92, 1.83, 2.5, 2.5, 2.83, 2.67, 3.33, 2.83, 3.33, 2.92, 3),
Levels_Depression = c(1.3, 1.71, 3.08, 0.48, 0.51, 0.71, 1.37, 0.2, 1.21, 1.07, 2.8, 1.24, 0.46, 0.97, 0.81, 1.13, 1.58, 3.12, 1.8, 1.54, 1.02, 0.32, 2.63, 1.39, 1.34, 2.37, 2.6, 1.11, 1.59, 2.17, 1.99, 0.59, 0.76, 0.23, 2.22, 1.98, 0.41, 0.32, 0.37, 1.11, 2.29, 0.97, 1.61, 1.27, 1.22, 2.38, 1.28, 1.21, 0.93, 2.3, 0.8, 2.1, 2.86, 2.47, 2.34, 2.67, 0.31, 0.88, 1.84, 0.23, 2.41, 0.56, 2.03, 1.11, 0.12, 2.39, 0.34, 2.08, 1.01, 1.51),
Age = c(66, 59, 49, 70, 42, 55, 28, 41, 69, 65, 40, 21, 18, 77, 28, 40, 47, 37, 47, 39, 32, 33, 42, 28, 59, 49, 29, 41, 22, 29, 53, 39, 55, 61, 30, 49, 43, 46, 18, 36, 34, 17, 42, 37, 37, 54, 48, 23, 71, 42, 52, 83, 19, 47, 23, 80, 43, 38, 47, 80, 36, 73, 74, 51, 76, 14, 65, 39, 17, 73),
Past_Admissions = c(1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1)),
.Names = c("Social_impairment", "Levels_Depression", "Age", "Past_Admissions"),
row.names = c(NA, 70L), class = "data.frame")

dim(df)
head(df)
str(df)
summary(df)

### call the library

library(ggplot2)

### the plot
### Levels_Depression on Social_impairment by Past_Admissions (yes/no)
### linear model
### radius of the bubbles proportional to age
### background elimination

p1 <- ggplot(data = df, aes(x = Levels_Depression, y = Social_impairment, group = as.factor(Past_Admissions), col = as.factor(Past_Admissions))) +
  geom_point(aes(size = Age)) +
  geom_smooth(method = "lm") +
  xlab("Levels of depression") +
  ylab("Social impairment") +
  scale_colour_discrete("History of \npast admissions\nto a psychiatric service", labels = c("No", "Yes"))

p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"))

### change of the axes' ticks

p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"), axis.text = element_text(color = "black", size = 12, face = "italic"))

### after saving, dev.off()

### Age on Social_impairment by Past_Admissions (yes/no)
### linear model
### radius of the bubbles proportional to Levels_Depression
### background elimination

p2 <- ggplot(data = df, aes(x = Age, y = Social_impairment, group = as.factor(Past_Admissions), col = as.factor(Past_Admissions))) +
  geom_point(aes(size = Levels_Depression)) +
  geom_smooth(method = "lm") +
  xlab("Age of participants") +
  ylab("Social impairment") +
  scale_colour_discrete("History of \npast admissions\nto a psychiatric service", labels = c("No", "Yes"))

p2 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"))

### change of the axes' ticks

p2 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"), axis.text = element_text(color = "black", size = 12, face = "italic"))

### after saving, dev.off()

### paired plots

library(gridExtra)
grid.arrange(p1, p2, ncol = 2)

### sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Italian_Italy.1252
Re: [R] multiple parameter optimization with optim()
John et al

Thank you for your advice below. It was sloppy of me not to verify my reproducible code below. I have tried a few of your suggestions and wrapped the working code into the function below called pl2. The function properly lands on the right model parameters when I use optim or nlminb (for nlminb I had to increase max iterations).

The function is enormously slow. At first, I created the object rr1 with two calls to sapply(). This works, but creates an extremely large matrix at each iteration.

library(statmod)
dat <- replicate(20, sample(c(0,1), 2000, replace = T))
a <- b <- rep(1, 20)
Q <- 10
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
nds <- qq$nodes
wts <- qq$weights
rr1 <- sapply(1:nrow(dat), function(j) sapply(1:Q, function(i)
    exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = TRUE))) * qq$weights[i]))

So, to reduce some memory, I thought I would do it this way, which is equivalent and doesn't create such a large matrix, but instead uses an explicit loop. Both approaches are still equally slow.

rr1 <- numeric(nrow(dat))
for(j in 1:length(rr1)){
    rr1[j] <- sum(sapply(1:Q, function(i)
        exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
}

As you noted, my likelihood is not complex; in fact I have another program that uses Newton-Raphson with the analytic first and second derivatives because they are so easy to find. In that program, the model converges very (very) quickly. My purpose in using numeric differentiation is experiential in some respects, and I am hoping to apply this to problems for which the analytic derivatives might not be so easy to come by.

I think the basic idea here to improve speed is to make a call to the gradient, which I understand to be the vector of first derivatives of my likelihood function, is that right? If that is right, in a multi-parameter problem, I'm not sure how to think about the gradient function. Since I am maximizing w.r.t. a and b (these are the parameters of the model), I would have a vector of first partials for a and another for b. So I conceptually do not understand what the gradient would be in this instance; perhaps some clarification would be helpful.

Below is the working function, which as I noted is enormously slow. Any advice on speed improvements here would be helpful.

Thank you

pl2 <- function(dat, Q, startVal = NULL, ...){
    if(!is.null(startVal) && length(startVal) != ncol(dat) ){
        stop("Length of argument startVal not equal to the number of parameters estimated")
    }
    if(!is.null(startVal)){
        startVal <- startVal
    } else {
        p <- colMeans(dat)
        startValA <- rep(1, ncol(dat))
        startValB <- as.vector(log((1 - p)/p))
        startVal <- c(startValA,startValB)
    }
    rr1 <- numeric(nrow(dat))
    qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
    nds <- qq$nodes
    wts <- qq$weights
    dat <- as.matrix(dat)
    fn <- function(params){
        a <- params[1:20]
        b <- params[21:40]
        for(j in 1:length(rr1)){
            rr1[j] <- sum(sapply(1:Q, function(i)
                exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
        }
        -sum(log(rr1))
    }
    #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
    opt <- nlminb(startVal, fn)
    #opt <- Rcgmin(startVal, fn)
    opt
    #list(coefficients = opt$par, LogLik = -opt$value, Std.Error = sqrt(diag(solve(opt$hessian))))
}

dat <- replicate(20, sample(c(0,1), 2000, replace = T))
r2 <- pl2(dat, Q = 10)

-----Original Message-----
From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca]
Sent: Wednesday, February 18, 2015 9:07 AM
To: r-help@r-project.org; Doran, Harold
Subject: Re: [R] multiple parameter optimization with optim()

Some observations -- no solution here though:

1) the code is not executable. I tried. Maybe that makes it reproducible! Typos such as "stat mod", undefined Q etc.

2) My experience is that any setup with a ?apply approach that doesn't then check to see that the structure of the data is correct has a high probability of failure due to mismatch with the optimizer requirements. It's worth being VERY pedestrian in setting up optimization functions and checking obsessively that you get what you expect and that there are no regions you
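On the question of what "the gradient" is when there are two sets of parameters: optim() and nlminb() expect one function that returns a single numeric vector, with one partial derivative per element of the parameter vector, in the same order. For a parameter vector c(a, b) of length 40 that would be the 20 partials with respect to a followed by the 20 partials with respect to b. The sketch below shows the idea on a deliberately simpler, hypothetical problem (a two-parameter logistic regression in base R, not the quadrature likelihood from this thread); the names negll and grad are illustrative.

```r
# Simulated data for a two-parameter logistic model (illustrative only)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))

# Negative log-likelihood as a function of par = c(intercept, slope)
negll <- function(par) {
  eta <- par[1] + par[2] * x
  -sum(dbinom(y, 1, plogis(eta), log = TRUE))
}

# Gradient: ONE vector, same length and order as par,
# holding d(negll)/d(intercept) and d(negll)/d(slope)
grad <- function(par) {
  p <- plogis(par[1] + par[2] * x)
  -c(sum(y - p), sum((y - p) * x))
}

# Supplying gr avoids finite-difference approximations of the gradient
opt <- optim(c(0, 0), negll, gr = grad, method = "BFGS")
print(opt$par)

# Cross-check against glm(), which fits the same model
print(coef(glm(y ~ x, family = binomial)))
```

With an analytic gradient the optimizer needs one gradient call per iteration instead of one extra function evaluation per parameter, which is where most of the saving would come from in a 40-parameter problem.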
[R] irregular sequence of events
Dear all

I know I am missing something obvious, but after a few hours of trials I ask for some help.

I have some sequence of values (days)

x <- 1:30

and an indication of event start and end day

mimo <- c(5,10, 13,16, 21,27)

or

events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start", "end"), row.names = c(NA, -3L), class = "data.frame")

I need to get a factor indicating event

event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), rep("A3", 7), rep(NA,3))
factor(event)

In such a small example I can do it manually, but I have a long vector of dates and would like to use the start and end days of events either from the mimo vector or from the events data frame.

Is there any function which does it automagically? I know I have seen it before but I cannot find it now.

Best regards
Petr
[R] Design patterns for data munging?
Hi All,

The most difficult challenge that I face in “learning R” is to do data munging. I have reviewed Hadley’s advanced R programming guide, familiarized myself with data structures, subsetting, plyr, dplyr, tidy, the lapply() family of functions, basic string manipulation and grepping, SQL etc. I’ve also written a few dozen functions that do basic data munging tasks. Further, I’ve already reviewed things like the Coursera course “Computing for Data Analysis” - https://www.coursera.org/course/compdata and Data Camp's data.table course.

However, many of the tasks that are commonly solved by the tools mentioned above seem to be mainly applied to datasets with fairly well-structured variables that need to be transformed and subsetted in various ways - these tasks are often not so difficult. Much of my work involves querying APIs, SQL databases or scraping websites, and then assembling lists of various things that can then be transformed into social networks or timestamped sequences of various events etc. Solutions to many tricky problems in this area still seem to imply creative leaps of imagination that I can understand after I see them, but I have trouble seeing how I could ever come up with them independently.

Therefore I ask - what do I need to learn to become better at solving tricky data munging problems? I realize a common answer may be: solve many data munging problems. I understand that this is a clear factor; however, I’m trying to figure out if there is some more tangible guidance.

* Is there something like “design patterns” for data munging?
* Would doing a course in algorithms help? (I’ve reviewed parts of Guide to Programming and Algorithms Using R - http://www.springer.com/computer/swe/book/978-1-4471-5327-6 - many of the problems are mathematical and seem far-removed from the kinds of problems that I’m trying to solve)
* Is there something like SelectorGadget (http://selectorgadget.com/) for R objects?
* Could something like OpenRefine (http://openrefine.org/) make these tasks easier?

Best,
Aron

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
Re: [R] Subsetting a list of lists using lapply
Hmm… Chuck’s solution may actually be problematic, because there are several entries which at the deepest level are called “sha” but should not be included, such as:

input[[67]]$content[[1]]$commit$tree$sha

and

input[[67]]$content[[1]]$parents[[1]]$sha

It’s only the “sha” entries that fit the following subsetting pattern that should be included:

input[[i]]$content[[1]]$sha[1]

It’s getting thornier! To be fair to Rolf’s solution (which probably can be updated to solve the problem), I’ve posted the complete dput here:

https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

On Fri, Feb 20, 2015 at 8:25 AM, Aron Lindberg aron.lindb...@case.edu wrote:

> Thanks Chuck and Rolf.
>
> While Rolf’s code also works on the dput that I actually gave you (a smaller subset of the full dataset), it failed to work on the larger dataset, because there are further exceptions: input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and sometimes input[[i]]$content simply returns list(). Chuck’s solution, however, bypasses this and works on the full dataset (which was 8 MB, which is why I didn’t upload it as a gist).
>
> Best,
> Aron
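One way to pick up only input[[i]]$content[[1]]$sha while tolerating the irregular cases described in this thread (content being empty, or its first element being a character vector rather than a list) is a small helper applied with vapply(). This is a sketch on a made-up three-element list that mimics those cases; the helper name get_sha and the toy values are assumptions, and it has not been run against the posted dput.

```r
# Toy stand-in for the real data (hypothetical values)
input <- list(
  list(content = list(list(sha     = "aaa",
                           commit  = list(tree = list(sha = "t1")),
                           parents = list(list(sha = "p1"))))),
  list(content = list()),                            # empty content
  list(content = list("just a character vector"))    # first element not a list
)

# Return the top-level sha of the first content entry, or NA if absent
get_sha <- function(el) {
  content <- el$content
  if (length(content) == 0) return(NA_character_)
  first <- content[[1]]
  if (!is.list(first) || is.null(first$sha)) return(NA_character_)
  first$sha[1]
}

shas <- vapply(input, get_sha, character(1))
print(shas)

# For comparison: matching on names after unlist() also picks up the
# nested commit$tree$sha and parents[[1]]$sha entries
flat <- unlist(input)
print(flat[grepl("sha", names(flat))])
```

Returning NA for the irregular elements keeps the result aligned with the positions in input, so shas[i] always refers to input[[i]].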
Re: [R] irregular sequence of events
Hi,

On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
> Dear all
>
> I know I am missing something obvious but after few hours of trials I ask for some help.
>
> I have some sequence of values (days)
>
> x <- 1:30
>
> and an indication of event start and end day
>
> mimo <- c(5,10, 13,16, 21,27)

cut(x, mimo)
 [1] <NA>    <NA>    <NA>    <NA>    <NA>    (5,10]  (5,10]  (5,10]  (5,10]
[10] (5,10]  (10,13] (10,13] (10,13] (13,16] (13,16] (13,16] (16,21] (16,21]
[19] (16,21] (16,21] (16,21] (21,27] (21,27] (21,27] (21,27] (21,27] (21,27]
[28] <NA>    <NA>    <NA>
Levels: (5,10] (10,13] (13,16] (16,21] (21,27]

should get you started. You'll need to tweak the arguments to get exactly what you want.

> or
>
> events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start", "end"), row.names = c(NA, -3L), class = "data.frame")
>
> I need to get a factor indicating event
>
> event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), rep("A3", 7), rep(NA,3))
> factor(event)
>
> In such small example I can do it manually but I have a long vector of dates and would like to use start and end day of events either from mimo vector or from events data frame.
>
> Is there any function which does it automagically? I know I have seen it before but I cannot find it now.
>
> Best regards
> Petr

--
Sarah Goslee
http://www.functionaldiversity.org
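As an alternative to tweaking the cut() arguments, the events data frame from the question can be used directly. The following is a sketch in base R that labels each day falling inside a closed [start, end] interval and leaves the gaps as NA; the "A" prefix for the labels follows the desired output in the question, while the loop itself is just one possible approach.

```r
x <- 1:30
events <- data.frame(start = c(5, 13, 21), end = c(10, 16, 27))

# Start with all-NA labels, then fill in each event's days
event <- rep(NA_character_, length(x))
for (k in seq_len(nrow(events))) {
  inside <- x >= events$start[k] & x <= events$end[k]
  event[inside] <- paste0("A", k)
}
event <- factor(event)

print(event)
print(table(event, useNA = "ifany"))
```

Because the comparison uses >= and <=, both the start and the end day belong to the event, which is the part that the default half-open intervals of cut() do not give directly.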
[R] problem in R
Dear all members

I have the following matrix in R

thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
head(thd)
      [,1]   [,2]   [,3]   [,4]  [,5] [,6]
 [1,] -200 -2.517 -1.245 -0.444 0.848  200
 [2,] -200 -1.447 -0.420  0.119 1.245  200
 [3,] -200 -1.671 -0.869 -0.194 0.679  200
 [4,] -200 -1.642 -0.869 -0.293 0.332  200
 [5,] -200 -1.671 -0.827  0.052 0.756  200
 [6,] -200 -1.769 -1.098 -0.469 0.255  200
 [7,] -200 -1.490 -0.670 -0.082 0.880  200
 [8,] -200 -1.933 -0.880 -0.317 1.008  200
 [9,] -200 -1.587 -0.624  0.000 1.008  200
[10,] -200 -1.983 -1.348 -0.348 1.045  200
[11,] -200 -1.983 -1.229 -0.247 0.869  200
[12,] -200 -2.262 -1.426  0.037 1.330  200
[13,] -200 -2.371 -1.295 -0.224 0.651  200
[14,] -200 -2.039 -1.112 -0.149 1.169  200
[15,] -200 -2.262 -1.198 -0.309 1.198  200
[16,] -200 -2.176 -1.537 -0.717 0.597  200
[17,] -200 -1.447 -0.786  0.119 1.008  200
[18,] -200 -2.039 -1.769 -0.661 0.642  200

and when I implemented this matrix I found this error

+ head(thd)
+      [,1]   [,2]   [,3]   [,4]  [,5] [,6]
+ [1,] -200 -2.517 -1.245 -0.444 0.848  200
Error: unexpected numeric constant in:
"     [,1]   [,2]   [,3]   [,4]  [,5] [,6]
[1,] -200 -2.517 -1.245 -0.444 0.848"
[2,] -200 -1.447 -0.420 0.119 1.245 200
Error: unexpected '[' in "["
[3,] -200 -1.671 -0.869 -0.194 0.679 200
Error: unexpected '[' in "["
[4,] -200 -1.642 -0.869 -0.293 0.332 200
Error: unexpected '[' in "["
[5,] -200 -1.671 -0.827 0.052 0.756 200
Error: unexpected '[' in "["
[6,] -200 -1.769 -1.098 -0.469 0.255 200
Error: unexpected '[' in "["
[7,] -200 -1.490 -0.670 -0.082 0.880 200
Error: unexpected '[' in "["
[8,] -200 -1.933 -0.880 -0.317 1.008 200
Error: unexpected '[' in "["
[9,] -200 -1.587 -0.624 0.000 1.008 200
Error: unexpected '[' in "["
[10,] -200 -1.983 -1.348 -0.348 1.045 200
Error: unexpected '[' in "["
[11,] -200 -1.983 -1.229 -0.247 0.869 200
Error: unexpected '[' in "["
[12,] -200 -2.262 -1.426 0.037 1.330 200
Error: unexpected '[' in "["
[13,] -200 -2.371 -1.295 -0.224 0.651 200
Error: unexpected '[' in "["
[14,] -200 -2.039 -1.112 -0.149 1.169 200
Error: unexpected '[' in "["
[15,] -200 -2.262 -1.198 -0.309 1.198 200
Error: unexpected '[' in "["
[16,] -200 -2.176 -1.537 -0.717 0.597 200
Error: unexpected '[' in "["
[17,] -200 -1.447 -0.786 0.119 1.008 200
Error: unexpected '[' in "["
[18,] -200 -2.039 -1.769 -0.661 0.642 200
Error: unexpected '[' in "["

Any help would be very appreciated.

Thanks in advance

--
Thanoon Y. Thanoon
PhD Candidate
Department of Mathematical Sciences
Faculty of Science
University Technology Malaysia, UTM
E.Mail: thanoon.youni...@gmail.com
E.Mail: dawn_praye...@yahoo.com
Facebook: Thanoon Younis AL-Shakerchy
Twitter: Thanoon Alshakerchy
H.P: 00601127550205
Re: [R] problem in R
Hi,

On Fri, Feb 20, 2015 at 9:48 AM, thanoon younis thanoon.youni...@gmail.com wrote:
> Dear all members
>
> I have the following matrix in R
>
>     thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
>     head(thd)
>           [,1]   [,2]   [,3]   [,4]  [,5] [,6]
>      [1,] -200 -2.517 -1.245 -0.444 0.848  200
>      [2,] -200 -1.447 -0.420  0.119 1.245  200
>      [3,] -200 -1.671 -0.869 -0.194 0.679  200
>      [4,] -200 -1.642 -0.869 -0.293 0.332  200
>      [5,] -200 -1.671 -0.827  0.052 0.756  200
>      [6,] -200 -1.769 -1.098 -0.469 0.255  200
>      [7,] -200 -1.490 -0.670 -0.082 0.880  200
>      [8,] -200 -1.933 -0.880 -0.317 1.008  200
>      [9,] -200 -1.587 -0.624  0.000 1.008  200
>     [10,] -200 -1.983 -1.348 -0.348 1.045  200
>     [11,] -200 -1.983 -1.229 -0.247 0.869  200
>     [12,] -200 -2.262 -1.426  0.037 1.330  200
>     [13,] -200 -2.371 -1.295 -0.224 0.651  200
>     [14,] -200 -2.039 -1.112 -0.149 1.169  200
>     [15,] -200 -2.262 -1.198 -0.309 1.198  200
>     [16,] -200 -2.176 -1.537 -0.717 0.597  200
>     [17,] -200 -1.447 -0.786  0.119 1.008  200
>     [18,] -200 -2.039 -1.769 -0.661 0.642  200

This is not reproducible, so the below are guesses.

> and when i implemented this matrix i found this error

I have no idea what "implemented this matrix" might mean.

>     + head(the)

First problem: the + means R expects continuation of a previous line, because the command is incomplete. So whatever you did BEFORE this line is wrong.

>     + [,1] [,2] [,3] [,4] [,5] [,6]
>     + [1,] -200 -2.517 -1.245 -0.444 0.848 200
>     Error: unexpected numeric constant in:
>     "[,1] [,2] [,3] [,4] [,5] [,6]
>     [1,] -200 -2.517 -1.245 -0.444 0.848"

These and subsequent errors are what you'd get if you pasted the above R output back into the R console. Why would you do that?

So: check your previous commands. If you can't find your mistake, respond to the list with a clear reproducible example.

>     [2,] -200 -1.447 -0.420 0.119 1.245 200
>     Error: unexpected '[' in "["
>     [3,] -200 -1.671 -0.869 -0.194 0.679 200
>     Error: unexpected '[' in "["
>     [4,] -200 -1.642 -0.869 -0.293 0.332 200
>     Error: unexpected '[' in "["
>     [5,] -200 -1.671 -0.827 0.052 0.756 200
>     Error: unexpected '[' in "["
>     [6,] -200 -1.769 -1.098 -0.469 0.255 200
>     Error: unexpected '[' in "["
>     [7,] -200 -1.490 -0.670 -0.082 0.880 200
>     Error: unexpected '[' in "["
>     [8,] -200 -1.933 -0.880 -0.317 1.008 200
>     Error: unexpected '[' in "["
>     [9,] -200 -1.587 -0.624 0.000 1.008 200
>     Error: unexpected '[' in "["
>     [10,] -200 -1.983 -1.348 -0.348 1.045 200
>     Error: unexpected '[' in "["
>     [11,] -200 -1.983 -1.229 -0.247 0.869 200
>     Error: unexpected '[' in "["
>     [12,] -200 -2.262 -1.426 0.037 1.330 200
>     Error: unexpected '[' in "["
>     [13,] -200 -2.371 -1.295 -0.224 0.651 200
>     Error: unexpected '[' in "["
>     [14,] -200 -2.039 -1.112 -0.149 1.169 200
>     Error: unexpected '[' in "["
>     [15,] -200 -2.262 -1.198 -0.309 1.198 200
>     Error: unexpected '[' in "["
>     [16,] -200 -2.176 -1.537 -0.717 0.597 200
>     Error: unexpected '[' in "["
>     [17,] -200 -1.447 -0.786 0.119 1.008 200
>     Error: unexpected '[' in "["
>     [18,] -200 -2.039 -1.769 -0.661 0.642 200
>     Error: unexpected '[' in "["
>
> Any help would be very appreciated
>
> thanks in advance
>
> --
> Thanoon Y. Thanoon
> PhD Candidate
> Department of Mathematical Sciences
> Faculty of Science
> University Technology Malaysia, UTM
> E.Mail: thanoon.youni...@gmail.com
> E.Mail: dawn_praye...@yahoo.com
> Facebook: Thanoon Younis AL-Shakerchy
> Twitter: Thanoon Alshakerchy
> H.P: 00601127550205

--
Sarah Goslee
http://www.functionaldiversity.org
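To make the point about pasted output concrete, here is a minimal sketch. It assumes nothing about the poster's real data: `vals` is a made-up stand-in for `testJAGSdata$thd`, and only base R is used. It shows that a printed matrix row is not valid R code, and what building and inspecting the matrix looks like when only commands are entered.

```r
# Console output is not R code: parsing a pasted matrix row fails.
pasted <- "[2,] -200 -1.447 -0.420 0.119 1.245 200"
msg <- tryCatch(
  { parse(text = pasted); "parsed without error" },
  error = function(e) conditionMessage(e)
)
msg

# What was presumably intended: build the matrix, then inspect it.
# vals is a hypothetical stand-in for testJAGSdata$thd.
vals <- c(-200, -2.517, -1.245, -0.444, 0.848, 200,
          -200, -1.447, -0.420,  0.119, 1.245, 200)
thd <- matrix(vals, ncol = 6, byrow = TRUE)
head(thd)
```

Only the lines that are commands go into the console; the printed rows that R returns are for reading, not for re-entering.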
Re: [R] multiple parameter optimization with optim()
This is not the proper venue for a discussion of the mathematics of optimization, no matter that it is interesting. Please take it off list.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Fri, Feb 20, 2015 at 6:03 AM, Doran, Harold hdo...@air.org wrote:
> John et al
>
> Thank you for your advice below. It was sloppy of me not to verify my reproducible code below. I have tried a few of your suggestions and wrapped the working code into the function below called pl2. The function properly lands on the right model parameters when I use the optim or nlminb (for nlminb I had to increase max iterations).
>
> The function is enormously slow. At first, I created the object rr1 with two calls to sapply(). This works, but creates an extremely large matrix at each iteration.
>
>     library(statmod)
>     dat <- replicate(20, sample(c(0,1), 2000, replace = T))
>     a <- b <- rep(1, 20)
>     Q <- 10
>     qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
>     nds <- qq$nodes
>     wts <- qq$weights
>     rr1 <- sapply(1:nrow(dat), function(j)
>       sapply(1:Q, function(i)
>         exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = TRUE))) * qq$weights[i]))
>
> So, I thought to reduce some memory, I would do it this way which is equivalent, doesn't create such a large matrix, but instead uses an explicit loop. Both approaches are still equally as slow.
>
>     rr1 <- numeric(nrow(dat))
>     for(j in 1:length(rr1)){
>       rr1[j] <- sum(sapply(1:Q, function(i)
>         exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
>     }
>
> As you noted, my likelihood is not complex; in fact I have another program that uses newton-raphson with the analytic first and second derivatives because they are so easy to find. In that program, the model converges very (very) quickly.
>
> My purpose in using numeric differentiation is experiential in some respects and hoping to apply this to problems for which the analytic derivatives might not be so easy to come by.
>
> I think the basic idea here to improve speed is to make a call to the gradient, which I understand to be the vector of first derivatives of my likelihood function, is that right? If that is right, in a multi-parameter problem, I'm not sure how to think about the gradient function. Since I am maximizing w.r.t. a and b (these are the parameters of the model), I would have a vector of first partials for a and another for b. So I conceptually do not understand what the gradient would be in this instance, perhaps some clarification would be helpful.
>
> Below is the working function, which as I noted is enormously slow. Any advice on speed improvements here would be helpful.
>
> Thank you
>
>     pl2 <- function(dat, Q, startVal = NULL, ...){
>       if(!is.null(startVal) && length(startVal) != ncol(dat) ){
>         stop("Length of argument startVal not equal to the number of parameters estimated")
>       }
>       if(!is.null(startVal)){
>         startVal <- startVal
>       } else {
>         p <- colMeans(dat)
>         startValA <- rep(1, ncol(dat))
>         startValB <- as.vector(log((1 - p)/p))
>         startVal <- c(startValA,startValB)
>       }
>       rr1 <- numeric(nrow(dat))
>       qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
>       nds <- qq$nodes
>       wts <- qq$weights
>       dat <- as.matrix(dat)
>       fn <- function(params){
>         a <- params[1:20]
>         b <- params[21:40]
>         for(j in 1:length(rr1)){
>           rr1[j] <- sum(sapply(1:Q, function(i)
>             exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
>         }
>         -sum(log(rr1))
>       }
>       #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
>       opt <- nlminb(startVal, fn)
>       #opt <- Rcgmin(startVal, fn)
>       opt
>       #list(coefficients = opt$par, LogLik = -opt$value, Std.Error = sqrt(diag(solve(opt$hessian))))
>     }
>
>     dat <- replicate(20, sample(c(0,1), 2000, replace = T))
>     r2 <- pl2(dat, Q = 10)
>
> -----Original Message-----
> From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca]
> Sent: Wednesday, February 18, 2015 9:07 AM
> To: r-help@r-project.org; Doran, Harold
> Subject: Re: [R] multiple parameter optimization with optim()
>
> Some observations -- no solution here though:
>
> 1) the code is not executable. I tried. Maybe that makes
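On the gradient question quoted above: with several parameters the gradient is a single vector holding one first partial derivative per parameter, in the same order as the parameter vector handed to optim(). A minimal sketch, assuming a toy two-parameter least-squares objective rather than the likelihood from the thread; `fn`, `gr` and the simulated data are made up for illustration.

```r
# Toy objective: sum of squared residuals for y = a + b * x.
# The data and the names fn/gr are illustrative, not from the thread.
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

fn <- function(par) {
  res <- y - (par[1] + par[2] * x)
  sum(res^2)
}

# Gradient: one partial derivative per parameter, same order as par.
gr <- function(par) {
  res <- y - (par[1] + par[2] * x)
  c(-2 * sum(res),       # d fn / d a
    -2 * sum(res * x))   # d fn / d b
}

opt <- optim(c(0, 0), fn, gr, method = "BFGS")
opt$par  # should be close to coef(lm(y ~ x))
```

For the function quoted above, which reads a from params[1:20] and b from params[21:40], the same layout would give a length-40 gradient: the 20 partials with respect to a followed by the 20 with respect to b.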
[R] creating a distinct zip file
Hello yet again.

I am trying to create a zip file for a friend who has a Windows machine. He needs to access this via the local zip file packages option.

When I use R CMD INSTALL --compile-both, it produces an item in the library tree (as promised). However, I would like to have an actual .zip file. I do know at one time that was possible, not sure if I can still do it.

I did try R CMD INSTALL --force-biarch as well, same result as compile both.

Thank you for any suggestions.

Sincerely,
Erin

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
Re: [R] irregular sequence of events
A little shorter version of the SQL solution after consulting with my SQL expert:

    > require(sqldf)
    > timeline <- data.frame(time = 1:30)
    > events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start",
    + "end"), row.names = c(NA, -3L), class = "data.frame")
    > # add event number
    > events$num <- paste0("A", seq(nrow(events)))
    > events
      start end num
    1     5  10  A1
    2    13  16  A2
    3    21  27  A3
    > sqldf("
    +     select t.*, e.num
    +     from timeline t
    +     left join events as e
    +     on t.time between e.start and e.end
    + ")
       time  num
    1     1 <NA>
    2     2 <NA>
    3     3 <NA>
    4     4 <NA>
    5     5   A1
    6     6   A1
    7     7   A1
    8     8   A1
    9     9   A1
    10   10   A1
    11   11 <NA>
    12   12 <NA>
    13   13   A2
    14   14   A2
    15   15   A2
    16   16   A2
    17   17 <NA>
    18   18 <NA>
    19   19 <NA>
    20   20 <NA>
    21   21   A3
    22   22   A3
    23   23   A3
    24   24   A3
    25   25   A3
    26   26   A3
    27   27   A3
    28   28 <NA>
    29   29 <NA>
    30   30 <NA>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
> Dear all
>
> I know I am missing something obvious but after few hours of trials I ask for some help.
>
> I have some sequence of values (days)
>
>     x <- 1:30
>
> and an indication of event start and end day
>
>     mimo <- c(5,10, 13,16, 21,27)
>
> or
>
>     events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start", "end"), row.names = c(NA, -3L), class = "data.frame")
>
> I need to get a factor indicating event
>
>     event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), rep("A3", 7), rep(NA,3))
>     factor(event)
>
> In such small example I can do it manually but I have a long vector of dates and would like to use start and end day of events either from mimo vector or from events data frame.
>
> Is there any function which does it automagically? I know I have seen it before but I cannot find it now.
>
> Best regards
> Petr
Re: [R] Simple Histogram
Hi,

Your data may look like the following, saved as speed.txt in the working directory. The code follows the data. The graph is attached as Speed.pdf.

    Speed
    50
    52
    55
    57
    58
    59
    60
    61
    62
    63
    64
    65
    65
    65
    67
    68
    68
    68
    68
    69
    69
    70
    71
    72
    72
    72
    73
    73
    73
    73
    75
    76
    77
    78
    79

    > speed <- read.table("speed.txt", header=TRUE)
    > speed
       Speed
    1     50
    2     52
    3     55
    4     57
    5     58
    6     59
    7     60
    8     61
    9     62
    10    63
    11    64
    12    65
    13    65
    14    65
    15    67
    16    68
    17    68
    18    68
    19    68
    20    69
    21    69
    22    70
    23    71
    24    72
    25    72
    26    72
    27    73
    28    73
    29    73
    30    73
    31    75
    32    76
    33    77
    34    78
    35    79
    > hist(speed$Speed)

Speed.pdf http://r.789695.n4.nabble.com/file/n4703616/Speed.pdf

--
View this message in context: http://r.789695.n4.nabble.com/Simple-Histogram-tp4703615p4703616.html
Sent from the R help mailing list archive at Nabble.com.
[R] running rcorr (Hmisc) on multiple cores
Hello,

I am having trouble generating a correlation matrix on multiple cores.

I have a matrix myMat for which I would like to do a Pearson correlation. Let's say dim(myMat) is 100 x 200. I want a 200x200 correlation matrix and the corresponding p-value matrix.

I like to use rcorr(myMat) in the Hmisc package, but for larger matrices this command is too time consuming.

I have spent a day playing with mclapply(myMat, rcorr, ...) from the parallel package, trying to distribute the job over multiple cores, but I can't figure it out. I also tried mclapply(myMat, cor.test, ...), but it runs even more slowly.

Does anyone have any suggestions?

Thanks very much for your help,
Beck
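One possible way around the speed problem, sketched under the assumption that myMat has no missing values: base cor() returns the whole Pearson matrix in one call, and the two-sided p-values follow from the usual t statistic t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom, so neither a per-pair loop nor extra cores are needed. The helper name `cor_with_p` is made up for illustration; it is not part of Hmisc.

```r
# Sketch: Pearson correlations and p-values for all column pairs at once.
# Assumes a numeric matrix with no NAs; cor_with_p is a hypothetical helper.
cor_with_p <- function(m) {
  n <- nrow(m)
  r <- cor(m)                              # p x p correlation matrix
  tstat <- r * sqrt((n - 2) / (1 - r^2))   # Inf on the diagonal
  p <- 2 * pt(-abs(tstat), df = n - 2)     # two-sided p-values
  diag(p) <- NA                            # self-correlations are not tested
  list(r = r, P = p)
}

set.seed(42)
myMat <- matrix(rnorm(100 * 200), nrow = 100, ncol = 200)
res <- cor_with_p(myMat)
dim(res$r)

# Spot check one pair against cor.test()
ct <- cor.test(myMat[, 1], myMat[, 2])
all.equal(res$P[1, 2], ct$p.value)
```

If the matrix contains NAs, the number of complete pairs differs from column pair to column pair and this shortcut no longer applies as written.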
Re: [R] irregular sequence of events
Hi,

Here is one implementation, with a function named eventList.

    > start
    [1]  5 13 21
    > end
    [1] 10 16 27
    > x
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    > eventList
    function(start, end, x)
    {
      result <- character(0)
      for (i in 1:length(start))
      {
        if (i == 1)
        {
          if (start[i] > 1)
          {
            result <- c(result, rep(NA, start[i] - 1))
          }
          result <- c(result, rep(paste0("A", i), end[i] - start[i] + 1))
        }
        else
        {
          if (start[i] > end[i - 1] + 1)
          {
            result <- c(result, rep(NA, start[i] - end[i - 1] - 1))
          }
          result <- c(result, rep(paste0("A", i), end[i] - start[i] + 1))
        }
      }
      if (end[length(start)] < length(x))
      {
        result <- c(result, rep(NA, length(x) - end[length(start)]))
      }
      return(result)
    }
    > eventList(start, end, x)
     [1] NA   NA   NA   NA   "A1" "A1" "A1" "A1" "A1" "A1" NA   NA   "A2" "A2" "A2" "A2" NA   NA   NA
    [20] NA   "A3" "A3" "A3" "A3" "A3" "A3" "A3" NA   NA   NA

--
View this message in context: http://r.789695.n4.nabble.com/irregular-sequence-of-events-tp4703579p4703624.html
Sent from the R help mailing list archive at Nabble.com.
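The same labelling can probably be written more compactly by looping over the events themselves instead of tracking the gaps between them. This is only a sketch: `label_events` is a made-up name, and it assumes the events do not overlap and that x holds the day numbers to be labelled.

```r
# Sketch: label each day in x with the event covering it, or NA.
# label_events is a hypothetical helper; events are assumed not to overlap.
label_events <- function(start, end, x) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(start)) {
    out[x >= start[i] & x <= end[i]] <- paste0("A", i)
  }
  out
}

x <- 1:30
start <- c(5, 13, 21)
end <- c(10, 16, 27)
event <- factor(label_events(start, end, x))
table(event, useNA = "ifany")
```

Because it compares against the values of x rather than their positions, this version would also cope with an x that does not start at 1.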
[R] Example of Calling a DLL
All,

I'm a newbie to R and am interested in seeing a simple example of calling a 3rd party Visual Studio generated DLL from RStudio. Does anyone have a simple example which also walks through the preliminary steps of setting up the INCLUDE path and the library path to either a DLL or LIB file?

I have tried to find an easy example, but thus far have had no luck finding one that uses Rcpp to communicate with a 3rd party Visual Studio DLL.

Many Thanks in Advance,
Alex
Re: [R] How do I access a specific element of a multi-dimensional list
Using lapply() where Jim used sapply() would keep the types right and be a fair bit faster than a solution based on repeatedly appending to a list (like your getFirst).

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Feb 20, 2015 at 1:52 PM, JS Huang js.hu...@protective.com wrote:
> Hi,
>
> Jim's answer is neat. There is an issue on the result. All are characters even though some are numeric or logical. The following implementation retains the variable type.
>
>     > x
>     [[1]]
>     [1] 2 3 5
>
>     [[2]]
>     [1] "aa" "bb" "cc"
>
>     [[3]]
>     [1]  TRUE FALSE  TRUE
>
>     > getFirst
>     function(aList)
>     {
>       result <- list()
>       for (i in 1:length(aList))
>       {
>         result <- c(result, aList[[i]][1])
>       }
>       return(result)
>     }
>     > getFirst(x)
>     [[1]]
>     [1] 2
>
>     [[2]]
>     [1] "aa"
>
>     [[3]]
>     [1] TRUE
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-do-I-access-a-specific-element-of-a-multi-dimensional-list-tp4703596p4703622.html
> Sent from the R help mailing list archive at Nabble.com.
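A short sketch of the difference described above. Jim's original code is not shown in this part of the thread, so the sapply() call below is only an assumption about its general shape; the list x is the one from the quoted message.

```r
x <- list(c(2, 3, 5), c("aa", "bb", "cc"), c(TRUE, FALSE, TRUE))

# sapply() simplifies to one atomic vector, so everything becomes character
sapply(x, function(e) e[1])

# lapply() always returns a list, so each first element keeps its own type
first <- lapply(x, function(e) e[1])
str(first)
```

lapply() also builds the result list in one pass, whereas appending with c() inside a loop copies the growing list on every iteration.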
Re: [R] creating a distinct zip file
On 21/02/15 15:02, Jeff Newmiller wrote:
> R CMD INSTALL --build packagename

That will create a *.tar.gz file, not a *.zip file. The latter being what Erin wanted, if I understand correctly.

I have worked around the problem in the past with a shell script like unto:

    #! /bin/csh
    set vnum = `grep Version $pkge/DESCRIPTION | sed -e 's/Version: //'`
    R CMD INSTALL -l Lib $pkge > /dev/null
    cd Lib
    zip -r -l $pkge.zip $pkge > /dev/null
    mv $pkge.zip ../${pkge}_$vnum.zip

In the foregoing "pkge" is the name of the package you are trying to build. You will have to have created the "holding library" Lib a priori.

There are doubtless (much) better ways of accomplishing this task, but I don't know them.

cheers,

Rolf

> ---
> Jeff Newmiller
> DCN: jdnew...@dcn.davis.ca.us
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On February 20, 2015 1:07:10 PM PST, Erin Hodgess erinm.hodg...@gmail.com wrote:
> > Hello yet again.
> >
> > I am trying to create a zip file for a friend who has a Windows machine. He needs to access this via the local zip file packages option.
> >
> > When I use R CMD INSTALL --compile-both, it produces an item in the library tree (as promised). However, I would like to have an actual .zip file. I do know at one time that was possible, not sure if I can still do it.
> >
> > I did try R CMD INSTALL --force-biarch as well, same result as compile both.
> >
> > thank you for any suggestions.
> >
> > Sincerely,
> > Erin

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619
Re: [R] How do I access a specific element of a multi-dimensional list
Hi,

Jim's answer is neat. There is an issue on the result. All are characters even though some are numeric or logical. The following implementation retains the variable type.

    > x
    [[1]]
    [1] 2 3 5

    [[2]]
    [1] "aa" "bb" "cc"

    [[3]]
    [1]  TRUE FALSE  TRUE

    > getFirst
    function(aList)
    {
      result <- list()
      for (i in 1:length(aList))
      {
        result <- c(result, aList[[i]][1])
      }
      return(result)
    }
    > getFirst(x)
    [[1]]
    [1] 2

    [[2]]
    [1] "aa"

    [[3]]
    [1] TRUE

--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-access-a-specific-element-of-a-multi-dimensional-list-tp4703596p4703622.html
Sent from the R help mailing list archive at Nabble.com.
[R] Replacing 9999 and 999 values with NA
Hello All,

I have a data frame of two columns for wind. The first column is for wind speed and the second wind direction. I'm trying to replace the 9999 values in the first column and the 999 values in the second column with NA. I tried to use the function ltdl.fix.df but it doesn't seem to do anything.

    > ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)
      n = 9432 by p = 4 matrix checked, 0 NA(s) present
      0 factor variable(s) present
      5675 value(s) coded 999 set to NA
      0 -ve value(s) set to +ve half the negative value

I have R version 3.1.1

Thanks,
Alexandra
Re: [R] Replacing 9999 and 999 values with NA
You did not say how you imported the data, but if you used one of the read.table variants (including read.csv) then you can use the na.strings argument as documented in the help file for read.table.

Next time please read the posting guide, as there are some useful tips in there, such as posting using plain text (a setting in your email program) so we don't get garbled info from you, and providing a reproducible example.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 20, 2015 10:55:30 AM PST, Alexandra Catena amc5...@gmail.com wrote:
> Hello All,
>
> I have a data frame of two columns for wind. The first column is for wind speed and the second wind direction. I'm trying to replace the 9999 values in the first column and the 999 values in the second column with NA. I tried to use the function ltdl.fix.df but it doesn't seem to do anything.
>
>     > ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)
>       n = 9432 by p = 4 matrix checked, 0 NA(s) present
>       0 factor variable(s) present
>       5675 value(s) coded 999 set to NA
>       0 -ve value(s) set to +ve half the negative value
>
> I have R version 3.1.1
>
> Thanks,
> Alexandra
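A minimal sketch of the na.strings suggestion, next to a recode-after-import alternative. The column names speed and dir and the four data rows are assumptions made up for the example; the poster's actual file layout is not shown in the thread.

```r
# Hypothetical wind data; the column names speed and dir are assumptions.
csv <- "speed,dir
3.2,180
9999,270
5.1,999
9999,999"

# Option 1: recode while reading. na.strings applies to every column,
# so a legitimate 999 in any column would also become NA.
wind1 <- read.csv(text = csv, na.strings = c("9999", "999"))

# Option 2: recode after import, one column at a time.
wind2 <- read.csv(text = csv)
wind2$speed[wind2$speed == 9999] <- NA
wind2$dir[wind2$dir == 999] <- NA

wind2
```

Option 2 is the safer choice when a code such as 999 could be a genuine value in one of the other columns.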
Re: [R] creating a distinct zip file
R CMD INSTALL --build packagename

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 20, 2015 1:07:10 PM PST, Erin Hodgess erinm.hodg...@gmail.com wrote:
> Hello yet again.
>
> I am trying to create a zip file for a friend who has a Windows machine. He needs to access this via the local zip file packages option.
>
> When I use R CMD INSTALL --compile-both, it produces an item in the library tree (as promised). However, I would like to have an actual .zip file. I do know at one time that was possible, not sure if I can still do it.
>
> I did try R CMD INSTALL --force-biarch as well, same result as compile both.
>
> thank you for any suggestions.
>
> Sincerely,
> Erin
Re: [R] Subsetting a list of lists using lapply
How can you expect a solution if you cannot specify the problem?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Fri, Feb 20, 2015 at 6:13 AM, Aron Lindberg aron.lindb...@case.edu wrote:
> Hmm… Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha”, but that should not be included, such as:
>
>     input[[67]]$content[[1]]$commit$tree$sha
>
> and
>
>     input[[67]]$content[[1]]$parents[[1]]$sha
>
> it’s only the “sha” that fit the following subsetting pattern that should be included:
>
>     input[[i]]$content[[1]]$sha[1]
>
> It’s getting thornier!
>
> To be fair to Rolf’s solution (which probably can be updated to solve the problem), I’ve posted the complete dput here:
>
> https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R
>
> --
> Aron Lindberg
> Doctoral Candidate, Information Systems
> Weatherhead School of Management
> Case Western Reserve University
> aronlindberg.github.io
>
> On Fri, Feb 20, 2015 at 8:25 AM, Aron Lindberg aron.lindb...@case.edu wrote:
> > Thanks Chuck and Rolf.
> >
> > While Rolf’s code also works on the dput that I actually gave you (a smaller subset of the full dataset), it failed to work on the larger dataset, because there are further exceptions: input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and sometimes input[[i]]$content simply returns list().
> >
> > Chuck’s solution however bypasses this and works on the full dataset (which was 8mb, which is why I didn’t upload it as a gist).
> >
> > Best,
> > Aron
> >
> > --
> > Aron Lindberg
> > Doctoral Candidate, Information Systems
> > Weatherhead School of Management
> > Case Western Reserve University
> > aronlindberg.github.io
> >
> > On Fri, Feb 20, 2015 at 12:44 AM, Charles Berry ccbe...@ucsd.edu wrote:
> > > Aron Lindberg aron.lindberg at case.edu writes:
> > > > Hi Everyone,
> > > >
> > > > I'm working on a thorny subsetting problem involving list of lists. I've put a dput of the data here:
> > > >
> > > > https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/
> > > > raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput
> > >
> > > IIUC, you want the value of every list element that is named "sha" and that name will only apply to atomic objects.
> > >
> > > If so, this should do it.
> > >
> > >     input <- dget("/tmp/dpt")
> > >     shas <- unlist( input, use.names=FALSE )[ grepl( "sha", names(unlist(input)))]
> > >     > input[[67]]$content[[1]]$sha
> > >     [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
> > >     > which(input[[67]]$content[[1]]$sha == shas )
> > >     [1] 194
> > >
> > > HTH,
> > > Chuck
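One way the pattern input[[i]]$content[[1]]$sha[1] might be extracted while tolerating the exceptions listed in the thread (content[[1]] being a character vector, or content being an empty list). This is only a sketch: `get_sha` and the three-element toy `input` are hypothetical, since the real structure is known here only from the description above.

```r
# Sketch: pull input[[i]]$content[[1]]$sha and nothing deeper.
# get_sha and the toy input are hypothetical stand-ins for the real data.
get_sha <- function(item) {
  content <- item[["content"]]
  if (!is.list(content) || length(content) == 0) return(NA_character_)
  first <- content[[1]]
  if (!is.list(first) || is.null(first[["sha"]])) return(NA_character_)
  as.character(first[["sha"]][1])
}

input <- list(
  list(content = list(list(sha = "abc",
                           commit = list(tree = list(sha = "nested"))))),
  list(content = list()),                 # content is an empty list
  list(content = list("just a string"))   # content[[1]] is a character vector
)

shas <- vapply(input, get_sha, character(1))
shas
```

Using [["sha"]] rather than $sha avoids partial name matching, and vapply() keeps one result per element of input, so positions in shas line up with positions in input and the nested sha values are never touched.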
Re: [R] Averaging column scores when participants vary in number of observations
And just to muddy the waters more, here's another way to do it using the handy plyr package, where the data.frame is dat1:

    library(plyr)
    ddply(dat1, .(Participant.ID), summarize, mean = mean(Score))

John Kane
Kingston ON Canada

> -----Original Message-----
> From: js.hu...@protective.com
> Sent: Thu, 19 Feb 2015 17:36:19 -0800 (PST)
> To: r-help@r-project.org
> Subject: Re: [R] Averaging column scores when participants vary in number of observations
>
> Hi,
>
> Another implementation:
>
>     > data1
>       Observation Participant.ID Video.Coder Score
>     1           A              1      Donald     4
>     2           B              1       Tracy     5
>     3           C              2      Donald     6
>     4           D              3         Sam     2
>     5           E              3       Tracy     3
>     6           F              4      Donald     2
>     7           G              4       Tracy     1
>     8           H              5         Sam     8
>     > tapply(data1$Score,data1$Participant.ID,mean)
>       1   2   3   4   5
>     4.5 6.0 2.5 1.5 8.0
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Re-Averaging-column-scores-when-participants-vary-in-number-of-observations-tp4703549p4703561.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] Raster Help
Simon,

You missed a key request from Sven. He asked for data in dput() format. This is essential for dealing with many problems. Do ?dput for info on the function, but essentially it lets the reader see your data exactly as you see it, unfazed by any special settings the reader may have for reading data into R.

Here is a little example of dput() output. Just copy and paste it into R to get the new data.frame dat1.

dat1 <- structure(list(Observation = c("A", "B", "C", "D", "E", "F", "G", "H"),
    Participant.ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L, 5L),
    Video.Coder = c("Donald", "Tracy", "Donald", "Sam", "Tracy", "Donald", "Tracy", "Sam"),
    Score = c(4L, 5L, 6L, 2L, 3L, 2L, 1L, 8L)),
    .Names = c("Observation", "Participant.ID", "Video.Coder", "Score"),
    class = "data.frame", row.names = c(NA, -8L))

See these for some hints on asking questions.

https://github.com/hadley/devtools/wiki/Reproducibility
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

John Kane
Kingston ON Canada
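To make the point about dput() concrete, here is a minimal sketch of the round trip a reader performs with posted dput() output. The data frame is a cut-down, made-up stand-in, and the names `dat`, `txt` and `dat_copy` are introduced only for this illustration.

```r
# A small made-up data frame standing in for a poster's real data.
dat <- data.frame(Observation = c("A", "B", "C"),
                  Score = c(4L, 5L, 6L),
                  stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; this printed text is
# what you would paste into a mailing list post.
dput(dat)

# Round trip: capture that text and evaluate it again, as a reader would
# after copying it from the post.
txt <- paste(capture.output(dput(dat)), collapse = "\n")
dat_copy <- eval(parse(text = txt))

# Compare the rebuilt object with the original.
all.equal(dat, dat_copy)
```

For large data sets, posting dput(head(dat, 20)) or a similar subset is usually enough for readers to reproduce the problem.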
Re: [R] return named list from foreach
You cannot do that in one step. Do it right after:

names(out) <- df$nm

Please don't post using HTML format... it scrambles code, and since we cannot see what you saw it doesn't help in any way.

Also note that df is a function in the base stats package... not a good name to use.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 20, 2015 7:44:41 AM PST, Alexander Shenkin ashen...@ufl.edu wrote:
> Hello all,
>
> I've been trying to figure out how to return a named list from foreach. Given that the order of the returned list is guaranteed to be in the order in which the object is passed to foreach, list members can be named afterwards. However, I'm wondering if there's a better way to do it, perhaps with some sort of combine function?
>
> library(doParallel)
> library(foreach)
> cl <- makeCluster(4)
> registerDoParallel(cl)
> df = data.frame(nm = letters[11:20], a = 1:10, b=11:20)
> out = foreach(i=1:nrow(df)) %dopar% {
>   a = list(j = sqrt(df[i,]$a), k = sqrt(df[i,]$b))
>   a
> }
>
> How do I name the elements of out using the corresponding values df$nm?
>
> thanks,
> allie
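The "name it right after" advice can be sketched as follows. This is only an illustration under stated assumptions: lapply() stands in for the foreach() %dopar% loop so that the snippet runs without a cluster, and the data frame is called `dat` rather than `df` to avoid masking stats::df, as suggested in the reply.

```r
# Same toy data as in the question, under a name that does not clash
# with the df() function.
dat <- data.frame(nm = letters[11:20], a = 1:10, b = 11:20,
                  stringsAsFactors = FALSE)

# lapply() is a stand-in for foreach() here: like the loop in the
# question, it returns one list element per row, in input order.
out <- lapply(seq_len(nrow(dat)), function(i) {
  list(j = sqrt(dat$a[i]), k = sqrt(dat$b[i]))
})

# Name the elements once the loop has finished.
names(out) <- dat$nm

# Elements can now be looked up by name; "m" is the third row (a = 3).
out[["m"]]$j
```

Because the naming happens after the loop, the same single assignment applies unchanged to the list returned by the parallel version.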
Re: [R] problem in R
First, please reply to the list, not just me.

On Fri, Feb 20, 2015 at 10:22 AM, thanoon younis thanoon.youni...@gmail.com wrote:
> thank you very much for your help
>
> actually, my data set like this
>
> #Data Set
> testJAGSdata = list(N1=200, N2=200, P=18,
> R=structure(
> .Data=c(8.0, 1.0,1.0, 8.0),
> .Dim=c(2,2)),
>
> thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
> head(thd)
>       [,1]   [,2]   [,3]   [,4]  [,5] [,6]
>  [1,] -200 -2.517 -1.245 -0.444 0.848  200
>  [2,] -200 -1.447 -0.420  0.119 1.245  200
>  [3,] -200 -1.671 -0.869 -0.194 0.679  200
>  [4,] -200 -1.642 -0.869 -0.293 0.332  200
>  [5,] -200 -1.671 -0.827  0.052 0.756  200
>  [6,] -200 -1.769 -1.098 -0.469 0.255  200
>  [7,] -200 -1.490 -0.670 -0.082 0.880  200
>  [8,] -200 -1.933 -0.880 -0.317 1.008  200
>  [9,] -200 -1.587 -0.624  0.000 1.008  200
> [10,] -200 -1.983 -1.348 -0.348 1.045  200
> [11,] -200 -1.983 -1.229 -0.247 0.869  200
> [12,] -200 -2.262 -1.426  0.037 1.330  200
> [13,] -200 -2.371 -1.295 -0.224 0.651  200
> [14,] -200 -2.039 -1.112 -0.149 1.169  200
> [15,] -200 -2.262 -1.198 -0.309 1.198  200
> [16,] -200 -2.176 -1.537 -0.717 0.597  200
> [17,] -200 -1.447 -0.786  0.119 1.008  200
> [18,] -200 -2.039 -1.769 -0.661 0.642  200
>
> #Data Set
> testJAGSdata = list(N1=200, N2=200, P=18,
> +
> + R=structure(
> + .Data=c(8.0, 1.0,1.0, 8.0),
> + .Dim=c(2,2)),
> +
> + thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
> + head(the)

Did you do what I suggested and look at your commands? R is expecting the rest of whatever your first command is supposed to do, and it isn't complete. That's what the + prompt is trying to tell you.

The first command ends with a "," and doesn't have enough parentheses - something is missing there.

After that, you continue trying to paste in R output to the console.

It looks to me like you are working by copying and pasting from notes that you don't understand. Maybe going back and rereading an introduction to R would help you.

> + [,1] [,2] [,3] [,4] [,5] [,6]
> + [1,] -200 -2.517 -1.245 -0.444 0.848 200
> Error: unexpected numeric constant in:
> " [,1] [,2] [,3] [,4] [,5] [,6]
> [1,] -200 -2.517 -1.245 -0.444 0.848"
> [2,] -200 -1.447 -0.420 0.119 1.245 200
> Error: unexpected '[' in "["
> [3,] -200 -1.671 -0.869 -0.194 0.679 200
> Error: unexpected '[' in "["
> [4,] -200 -1.642 -0.869 -0.293 0.332 200
> Error: unexpected '[' in "["
> [5,] -200 -1.671 -0.827 0.052 0.756 200
> Error: unexpected '[' in "["
> [6,] -200 -1.769 -1.098 -0.469 0.255 200
> Error: unexpected '[' in "["
> [7,] -200 -1.490 -0.670 -0.082 0.880 200
> Error: unexpected '[' in "["
> [8,] -200 -1.933 -0.880 -0.317 1.008 200
> Error: unexpected '[' in "["
> [9,] -200 -1.587 -0.624 0.000 1.008 200
> Error: unexpected '[' in "["
> [10,] -200 -1.983 -1.348 -0.348 1.045 200
> Error: unexpected '[' in "["
> [11,] -200 -1.983 -1.229 -0.247 0.869 200
> Error: unexpected '[' in "["
> [12,] -200 -2.262 -1.426 0.037 1.330 200
> Error: unexpected '[' in "["
> [13,] -200 -2.371 -1.295 -0.224 0.651 200
> Error: unexpected '[' in "["
> [14,] -200 -2.039 -1.112 -0.149 1.169 200
> Error: unexpected '[' in "["
> [15,] -200 -2.262 -1.198 -0.309 1.198 200
> Error: unexpected '[' in "["
> [16,] -200 -2.176 -1.537 -0.717 0.597 200
> Error: unexpected '[' in "["
> [17,] -200 -1.447 -0.786 0.119 1.008 200
> Error: unexpected '[' in "["
> [18,] -200 -2.039 -1.769 -0.661 0.642 200
> Error: unexpected '[' in "["
>
> Many thanks in advance
>
> On 20 February 2015 at 18:04, Sarah Goslee sarah.gos...@gmail.com wrote:
>> Hi,
>>
>> On Fri, Feb 20, 2015 at 9:48 AM, thanoon younis thanoon.youni...@gmail.com wrote:
>>> Dear all members
>>>
>>> I have the following matrix in R
>>>
>>> thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
>>> head(thd)
>>>       [,1]   [,2]   [,3]   [,4]  [,5] [,6]
>>>  [1,] -200 -2.517 -1.245 -0.444 0.848  200
>>>  [2,] -200 -1.447 -0.420  0.119 1.245  200
>>>  [3,] -200 -1.671 -0.869 -0.194 0.679  200
>>>  [4,] -200 -1.642 -0.869 -0.293 0.332  200
>>>  [5,] -200 -1.671 -0.827  0.052 0.756  200
>>>  [6,] -200 -1.769 -1.098 -0.469 0.255  200
>>>  [7,] -200 -1.490 -0.670 -0.082 0.880  200
>>>  [8,] -200 -1.933 -0.880 -0.317 1.008  200
>>>  [9,] -200 -1.587 -0.624  0.000 1.008  200
>>> [10,] -200 -1.983 -1.348 -0.348 1.045  200
>>> [11,] -200 -1.983 -1.229 -0.247 0.869  200
>>> [12,] -200 -2.262 -1.426  0.037 1.330  200
>>> [13,] -200 -2.371 -1.295 -0.224 0.651  200
>>> [14,] -200 -2.039 -1.112 -0.149 1.169  200
>>> [15,] -200 -2.262 -1.198 -0.309 1.198  200
>>> [16,] -200 -2.176 -1.537 -0.717 0.597  200
>>> [17,] -200 -1.447 -0.786  0.119 1.008  200
>>> [18,] -200 -2.039 -1.769 -0.661 0.642  200
>>
>> This is not reproducible, so the below are guesses.
>>
>>> and when i implemented this matrix i found this error
>>
>> I have no idea what "implemented this matrix" might mean.
>>
>>> + head(the)
>>
>> First problem: the + means R expects continuation of a previous line, because the command is incomplete. So whatever you did BEFORE this line is wrong.
>>
>>> + [,1] [,2] [,3] [,4] [,5] [,6]
>>> + [1,] -200
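For comparison, here is a sketch of what a complete version of the command might look like. It is only an assumption about the intent: the `thd` element and its values (the first two rows of the printed matrix) are filled in here for illustration, since the posted list never defines them.

```r
# A complete list() call: every opening parenthesis is closed and there
# is no trailing comma, so R returns to the normal prompt afterwards.
testJAGSdata <- list(
  N1 = 200, N2 = 200, P = 18,
  R = structure(.Data = c(8.0, 1.0, 1.0, 8.0), .Dim = c(2, 2)),
  # Hypothetical 'thd' element: the first two rows of the printed matrix.
  thd = c(-200, -2.517, -1.245, -0.444, 0.848, 200,
          -200, -1.447, -0.420,  0.119, 1.245, 200)
)

# Only now, as a separate command, build and inspect the matrix.
thd <- matrix(testJAGSdata$thd, ncol = 6, byrow = TRUE)
head(thd)
```

Printed output such as the rows starting with [1,] is what R shows in response to head(thd); it is not something to type or paste back into the console.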
[R] return named list from foreach
Hello all,

I've been trying to figure out how to return a named list from foreach. Given that the order of the returned list is guaranteed to be in the order in which the object is passed to foreach, list members can be named afterwards. However, I'm wondering if there's a better way to do it, perhaps with some sort of combine function?

library(doParallel)
library(foreach)
cl <- makeCluster(4)
registerDoParallel(cl)
df = data.frame(nm = letters[11:20], a = 1:10, b=11:20)
out = foreach(i=1:nrow(df)) %dopar% {
  a = list(j = sqrt(df[i,]$a), k = sqrt(df[i,]$b))
  a
}

How do I name the elements of out using the corresponding values df$nm?

thanks,
allie
Re: [R] Raster Help
Hi John and everyone else,

My apologies for not providing you all with a good reproducible example. I will get to work on this and reply in due course. Thanks, John, for pointing me in the right direction with this.

Regards,
Re: [R] irregular sequence of events
Here is a solution using the sqldf package:

require(sqldf)
timeline <- data.frame(time = 1:30)
events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)),
    .Names = c("start", "end"), row.names = c(NA, -3L), class = "data.frame")
# add event number
events$num <- paste0("A", seq(nrow(events)))
events

  start end num
1     5  10  A1
2    13  16  A2
3    21  27  A3

sqldf("
  select t.*, e.num
  from timeline t
  left join (
    select t.*, e.num
    from timeline t, events e
    where t.time between e.start and e.end) as e
  on t.time = e.time
")

   time  num
1     1 <NA>
2     2 <NA>
3     3 <NA>
4     4 <NA>
5     5   A1
6     6   A1
7     7   A1
8     8   A1
9     9   A1
10   10   A1
11   11 <NA>
12   12 <NA>
13   13   A2
14   14   A2
15   15   A2
16   16   A2
17   17 <NA>
18   18 <NA>
19   19 <NA>
20   20 <NA>
21   21   A3
22   22   A3
23   23   A3
24   24   A3
25   25   A3
26   26   A3
27   27   A3
28   28 <NA>
29   29 <NA>
30   30 <NA>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
> Dear all
>
> I know I am missing something obvious but after a few hours of trials I ask for some help.
>
> I have some sequence of values (days)
>
> x <- 1:30
>
> and an indication of event start and end day
>
> mimo <- c(5,10, 13,16, 21,27)
>
> or
>
> events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start", "end"), row.names = c(NA, -3L), class = "data.frame")
>
> I need to get a factor indicating the event
>
> event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), rep("A3", 7), rep(NA,3))
> factor(event)
>
> In such a small example I can do it manually, but I have a long vector of dates and would like to use the start and end day of events either from the mimo vector or from the events data frame.
>
> Is there any function which does it automagically? I know I have seen it before but I cannot find it now.
>
> Best regards
> Petr
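As an alternative to the sqldf approach, the same labelling can be sketched in base R. This is a minimal illustration, assuming the events do not overlap; the loop and the intermediate name `inside` are introduced here and are not part of the thread.

```r
# Data as given in the question.
x <- 1:30
events <- data.frame(start = c(5, 13, 21), end = c(10, 16, 27))
events$num <- paste0("A", seq_len(nrow(events)))

# Start with "no event" everywhere, then fill in each event's range.
event <- rep(NA_character_, length(x))
for (i in seq_len(nrow(events))) {
  inside <- x >= events$start[i] & x <= events$end[i]
  event[inside] <- events$num[i]
}
event <- factor(event)

# Count of days per event; days outside any event are reported as NA.
table(event, useNA = "ifany")
```

With overlapping events, later rows of `events` would overwrite earlier ones, so the result then depends on row order.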
Re: [R] Raster Help
Hi Sven,

Many thanks for the reply and my apologies for not posting any code. So far, I have been able to write this (but it's very basic and just gets me to the 'complicated' stage).

setwd("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data")
require(raster)
require(rgdal)
revenue <- read.table("revenue.csv", header=T, row.names=1, sep=",")
postcodes <- raster("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data\\rasters\\postcodes\\postcodes.img")
trim(postcodes)
plot(postcodes)

I have attached a .csv file that contains my revenue data (this is actually just made-up data; I wanted to make sure I could get the mapping to work before I start handling large quantities of real data).

As I mentioned, the raster contains the same list of postcode names that appear in the CSV. So I need to somehow 'attach' the revenue figures to each postcode in the raster and then plot this.

I hope this makes sense and apologies for the loose language... it's the only way I can think of to describe it. I'm trying hard to learn R and its syntax but sometimes I get stuck. I often know what needs to be done but struggle to write the necessary code to make it happen.

All the best,
Simon

On 19 February 2015 at 20:37, Sven E. Templer sven.temp...@gmail.com wrote:
> Without (example) code it is hard to follow... use ?dput to present some data (subset). But if it is data.frames you are dealing with (for sure with read.csv, but not so sure at all with raster maps), give this a try:
>
> ?merge
>
> On 19 February 2015 at 17:44, Simon Tarr simon.t...@adtrak.co.uk wrote:
>> Hello everyone,
>>
>> I need a little help with some R syntax to complete what (I think) is a fairly straightforward task; hopefully someone can assist!
>>
>> I have a raster map of the UK which is split into postcode areas (e.g. DE, NG, NR etc.; 127 postcodes in total). I have installed the package 'raster' and have successfully plotted the .img in R. All working and looks correct with the raster.
>>
>> I also have a comma-delimited CSV file containing the same postcodes as the raster, with another column next to it containing revenue for each postcode.
>>
>> *I was wondering if someone could help me merge/bind the revenue figures into the correct postcode in the raster so that I can plot revenue per postcode.*
>>
>> I feel I should be using cbind and reclassify to do this but I can't be sure. Any help would be appreciated.
>>
>> Thanks in advance!
>>
>> Simon
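The "attach revenue to each postcode" step amounts to a lookup from zone id to revenue. The sketch below shows that idea in base R only, with a small matrix standing in for the raster's cell values; the grid, the ids, the postcodes chosen and the revenue figures are all made up for illustration. With the raster package itself, functions such as reclassify() (mentioned in the question) would play the role of the match() line, but that is not shown here because the real data are not available.

```r
# Hypothetical 3 x 4 grid of postcode zone ids, standing in for the
# cell values of the postcode raster.
zones <- matrix(c(1, 1, 2, 2,
                  1, 3, 3, 2,
                  3, 3, 3, 2), nrow = 3, byrow = TRUE)

# Hypothetical lookup table, as it might be read from the revenue CSV.
lookup <- data.frame(id = 1:3,
                     postcode = c("DE", "NG", "NR"),
                     revenue = c(1200, 800, 450),
                     stringsAsFactors = FALSE)

# Replace every zone id by the revenue of that zone. Assigning into
# rev_grid[] keeps the grid's dimensions.
rev_grid <- zones
rev_grid[] <- lookup$revenue[match(zones, lookup$id)]

rev_grid
```

The resulting grid has one revenue value per cell, which is the form needed for plotting revenue per postcode.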
[R] Issue:CCC Doesn't Match in R and SAS
Hi,

I was trying to calculate the CCC metric in R with the help of the NbClust source code and the available SAS manual for CCC. The Pseudo-F and R-square values match the SAS output exactly, but E_R2, and hence CCC, does not. I have tried and tested this in multiple ways but couldn't find any explanation.

I have attached sample data, the initial seed and also the SAS cluster output in Rdata format, which I used for the E_R2 and CCC calculation. FYI, the values of E_R2 in R and SAS respectively are:

E_R2 = 0.4630339 (R); ERSQ = 0.3732597284 (SAS)

Could you please help me find out what's going wrong in the background?

Kindly find below the code I have used for this:

-- SAS --

proc fastclus data=sample_data maxiter=100 seed=initial_seed maxc=5 outstat=metrics out=output;
  Var v1 v2 v3 v4 v5 v6;
run;

-- R --

load("C:\\Users\\sagnik\\Desktop\\SAS_cluster.Rdata")

clust.perf.metrics <- function(data, cl) {
  data1 <- as.matrix(data)
  numberObsBefore <- dim(data1)[1]
  data <- na.omit(data1)
  nn <- numberObsAfter <- dim(data)[1]
  pp <- dim(data)[2]
  qq <- max(cl)
  TT <- t(data) %*% data
  sizeEigenTT <- length(eigen(TT)$value)
  eigenValues <- eigen(TT/(nn - 1))$value
  for (i in 1:sizeEigenTT) {
    if (eigenValues[i] < 0) {
      cat(paste("There are only", numberObsAfter, "non-missing observations out of a possible", numberObsBefore, "observations."))
      stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
    }
  }
  s1 <- sqrt(eigenValues)
  ss <- rep(1, sizeEigenTT)
  for (i in 1:sizeEigenTT) {
    if (s1[i] != 0) ss[i] = s1[i]
  }
  vv <- prod(ss)
  z <- matrix(0, ncol = qq, nrow = nn)
  clX <- as.matrix(cl)
  for (i in 1:nn) for (j in 1:qq) {
    z[i, j] == 0
    if (clX[i, 1] == j) z[i, j] = 1
  }
  xbar <- solve(t(z) %*% z) %*% t(z) %*% data
  B <- t(xbar) %*% t(z) %*% z %*% xbar
  W <- TT - B
  R2 <- 1 - (sum(diag(W))/sum(diag(TT)))
  PseudoF <- (sum(diag(B))/(qq-1))/(sum(diag(W))/(nn-qq))
  v1 <- 1
  u1 <- rep(0, pp)
  c1 <- (vv/qq)^(1/pp)
  u1 <- ss/c1
  k1 <- sum((u1 >= 1) == TRUE)
  p1 <- min(k1, qq - 1)
  if (all(p1 > 0, p1 < pp)) {
    for (i in 1:p1) { v1 <- v1 * ss[i] }
    c <- (v1/qq)^(1/p1)
    u <- ss/c
    b1 <- sum(1/(nn + u[1:p1]))
    b2 <- sum(u[(p1 + 1):pp]^2/(nn + u[(p1 + 1):pp]), na.rm = TRUE)
    E_R2 <- 1 - ((b1 + b2)/sum(u^2)) * ((nn - qq)^2/nn) * (1 + (4/nn))
    ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * p1/2)/((0.001 + E_R2)^1.2))
  } else {
    b1 <- sum(1/(nn + u))
    E_R2 <- 1 - (b1/sum(u^2)) * ((nn - qq)^2/nn) * (1 + 4/nn)
    ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * pp/2)/((0.001 + E_R2)^1.2))
  }
  results <- list(R_2=R2, PseudoF=PseudoF, CCC = ccc, E_R2=E_R2)
  return(results)
}

clust.perf.metrics(output[,1:6], output[,7])

Thanks in advance,
Regards,
Sagnik
[R] How do I access a specific element of a multi-dimensional list
Dear list,

Let's say I have set up the following list:

a = c(2, 3, 5)
b = c("aa", "bb", "cc")
c = c(TRUE, FALSE, TRUE)
x = list(a, b, c)

I want to access the first second-dimension element of each first-dimension element so that the result is something like:

(2, "aa", TRUE)

In my real-life problem the list has about 350 elements in the first dimension, so the solution must handle that.

Sincerely
Knut Hansen
[R] Split a dataframe by rownames and/or colnames
Dear List,

Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names after the underscore "_".

list2env(
  setNames(
    lapply(split(colnames(df), indx), function(x) df[x]),
    paste('df', sort(unique(indx)), sep="_")),
  envir=.GlobalEnv)

However, I changed my mind and now want to do it by rownames. Exchanging colnames with rownames does not work; it gives the exact same output (9 rows x 3 columns). I could do as.data.frame(t(df_x)), but maybe that is not elegant. What would be the solution for splitting the dataframe by rows?

Thank you very much!

--
Tim Richter-Heitmann
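One possible way to split by rows is sketched below. The likely reason the simple swap changes nothing is that df[x] with a character vector always selects columns, and here the row names and column names are identical; selecting rows needs df[x, ]. The names `dat` and `by_rows` are introduced for this illustration, and the result is kept in a list rather than pushed into the global environment.

```r
set.seed(1)  # only so that the example is repeatable
dat <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(dat) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p",
                "1_o1", "2_o1", "3_o1")
row.names(dat) <- names(dat)

# Group label: whatever follows the underscore in each row name.
indx <- gsub(".*_", "", row.names(dat))

# Note the comma in dat[r, , drop = FALSE]: it selects rows, not columns.
by_rows <- lapply(split(row.names(dat), indx),
                  function(r) dat[r, , drop = FALSE])
names(by_rows) <- paste("df", names(by_rows), sep = "_")

# Each piece should now have 3 rows and all 9 columns.
sapply(by_rows, dim)
```

If separate objects in the workspace are still wanted, list2env(by_rows, envir = .GlobalEnv) can be applied to the finished list, as in the original column-wise code.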
Re: [R] How do I access a specific element of a multi-dimensional list
try this:

a = c(2, 3, 5)
b = c("aa", "bb", "cc")
c = c(TRUE, FALSE, TRUE)
x = list(a, b, c)
x

[[1]]
[1] 2 3 5

[[2]]
[1] "aa" "bb" "cc"

[[3]]
[1]  TRUE FALSE  TRUE

sapply(x, '[[', 1)

[1] "2"    "aa"   "TRUE"

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
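Note that sapply() simplifies its result to a single atomic vector, so the mixed values come back as character strings. If the original types matter, lapply() with the same extractor keeps them. A short sketch, with the names `first_chr` and `first_list` made up for the comparison:

```r
x <- list(c(2, 3, 5), c("aa", "bb", "cc"), c(TRUE, FALSE, TRUE))

# sapply() simplifies to one vector, so everything is coerced to character.
first_chr <- sapply(x, `[[`, 1)

# lapply() returns a list, so each first element keeps its own type.
first_list <- lapply(x, `[[`, 1)

str(first_chr)
str(first_list)
```

Either form works unchanged for a list with 350 elements, since the extractor is applied to each element in turn.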
[R] Windows7, latest R-Studio, newb, how to display 1 column name from a data frame.
For class, I'm supposed to return data with the ID and the value. I'm returning just the correct value. Here's the code and the output.

nobs <- data.frame()
files_list <- list.files(directory, full.names=TRUE)
dat <- data.frame()
for (i in id){
  dat <- (read.csv(files_list[i]))
  nobs <- sum(complete.cases(dat))
  print(nobs)
}
}

The values below are correct, but I don't have the id in front of each sum. Any help?

complete("specdata", c(2,4,8,10,12))
[1] 1041
[1] 474
[1] 192
[1] 148
[1] 96
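One way to get the id next to each count is to collect the counts in a vector and return a data frame instead of printing inside the loop. The sketch below is an assumption about what the full function looks like (the posted code omits its first line); the demonstration directory and its two CSV files are made up so that the example is self-contained.

```r
# Assumed shape of the function: 'directory' holds the CSV files and
# 'id' indexes into the sorted file list, as in the posted loop.
complete <- function(directory, id) {
  files_list <- list.files(directory, full.names = TRUE)
  nobs <- vapply(id, function(i) {
    dat <- read.csv(files_list[i])
    sum(complete.cases(dat))   # rows without any missing value
  }, integer(1))
  data.frame(id = id, nobs = nobs)
}

# Tiny demonstration with two made-up files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = c(1, NA, 3), y = c(1, 2, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(1, 2, 3)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res <- complete(demo_dir, 1:2)
print(res)
```

Called as complete("specdata", c(2, 4, 8, 10, 12)), a function of this shape would return one row per id, with the id in the first column and the count in the second.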
Re: [R] Example of Calling a DLL
This is off-topic here (read the posting guide). You would probably proceed most effectively by studying how GCC interacts with VS object code, e.g. [1], and studying the "Writing R Extensions" manual.

[1] http://stackoverflow.com/questions/8683046/compatibility-of-dll-a-lib-def-between-visualstudio-and-gcc

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 20, 2015 9:11:45 PM PST, Alex Restrepo alex_restr...@hotmail.com wrote:
> All,
>
> I'm a newbie to R and am interested in seeing a simple example of calling a 3rd-party Visual Studio generated DLL from RStudio.
>
> Does anyone have a simple example which also walks through the preliminary steps of setting up the INCLUDE path and the library path to either a DLL or LIB file?
>
> I have tried to find an easy example, but thus far have had no luck finding one that uses Rcpp to communicate with a 3rd-party Visual Studio DLL.
>
> Many Thanks in Advance,
>
> Alex
Re: [R] creating a distinct zip file
On Windows it builds a zip file. If you are on Linux, you might [1] need [2].

[1] https://stat.ethz.ch/pipermail/r-help/2005-January/063596.html
[2] http://cran.r-project.org/doc/contributed/cross-build.pdf

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On February 20, 2015 6:56:34 PM PST, Rolf Turner r.tur...@auckland.ac.nz wrote:
> On 21/02/15 15:02, Jeff Newmiller wrote:
>> R CMD INSTALL --build packagename
>
> That will create a *.tar.gz file, not a *.zip file. The latter being what Erin wanted, if I understand correctly.
>
> I have worked around the problem in the past with a shell script like unto:
>
> #! /bin/csh
> set vnum = `grep Version $pkge/DESCRIPTION | sed -e 's/Version: //'`
> R CMD INSTALL -l Lib $pkge > /dev/null
> cd Lib
> zip -r -l $pkge.zip $pkge > /dev/null
> mv $pkge.zip ../${pkge}_${vnum}.zip
>
> In the foregoing "pkge" is the name of the package you are trying to build. You will have to have created the holding library "Lib" a priori.
>
> There are doubtless (much) better ways of accomplishing this task, but I don't know them.
>
> cheers,
>
> Rolf
>
>> On February 20, 2015 1:07:10 PM PST, Erin Hodgess erinm.hodg...@gmail.com wrote:
>>> Hello yet again.
>>>
>>> I am trying to create a zip file for a friend who has a Windows machine. He needs to access this via the "local zip file" packages option.
>>>
>>> When I use R CMD INSTALL --compile-both, it produces an item in the library tree (as promised). However, I would like to have an actual .zip file. I do know at one time that was possible, not sure if I can still do it. I did try R CMD INSTALL --force-biarch as well, same result as compile both.
>>>
>>> thank you for any suggestions.
>>>
>>> Sincerely,
>>> Erin
Re: [R] creating a distinct zip file
On 21/02/2015 07:31, Jeff Newmiller wrote:
> On Windows it builds a zip file. If you are on Linux, you might [1] need [2].
>
> [1] https://stat.ethz.ch/pipermail/r-help/2005-January/063596.html
> [2] http://cran.r-project.org/doc/contributed/cross-build.pdf

But the first is from 2005 and the second is invalid (it should be http://cran.r-project.org/doc/contrib/cross-build.pdf, and it describes R 1.7.x: what it describes is no longer supported).

For some time now it has been possible to install a package without compilable sources on any R platform, so the tarball would be all that is needed. If compilation is needed, submit to winbuilder to make the .zip file. See also http://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-can-I-get-a-binary-version-of-a-package_003f

> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> Sent from my phone. Please excuse my brevity.
>
> On February 20, 2015 6:56:34 PM PST, Rolf Turner r.tur...@auckland.ac.nz wrote:
>> On 21/02/15 15:02, Jeff Newmiller wrote:
>>> R CMD INSTALL --build packagename
>>
>> That will create a *.tar.gz file, not a *.zip file. The latter being what Erin wanted, if I understand correctly.
>>
>> I have worked around the problem in the past with a shell script like unto:
>>
>>     #! /bin/csh
>>     set vnum = `grep Version $pkge/DESCRIPTION | sed -e 's/Version: //'`
>>     R CMD INSTALL -l Lib $pkge > /dev/null
>>     cd Lib
>>     zip -r -l $pkge.zip $pkge > /dev/null
>>     mv $pkge.zip ../${pkge}_$vnum.zip
>>
>> In the foregoing pkge is the name of the package you are trying to build. You will have to have created the holding library Lib a priori.
>>
>> There are doubtless (much) better ways of accomplishing this task, but I don't know them.
>>
>> cheers, Rolf
>>
>>> On February 20, 2015 1:07:10 PM PST, Erin Hodgess erinm.hodg...@gmail.com wrote:
>>>> Hello yet again. I am trying to create a zip file for a friend who has a Windows machine. He needs to access this via the local zip file packages option.
>>>>
>>>> When I use R CMD INSTALL --compile-both, it produces an item in the library tree (as promised). However, I would like to have an actual .zip file. I do know at one time that was possible, not sure if I can still do it.
>>>>
>>>> I did try R CMD INSTALL --force-biarch as well, same result as compile both.
>>>>
>>>> Thank you for any suggestions.
>>>>
>>>> Sincerely, Erin

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
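The version lookup in Rolf's script (grep/sed on DESCRIPTION) can also be done from within R. The following is only a minimal sketch, assuming a package source directory with a standard DESCRIPTION file; the helper name zip_name_from_description and the demo package mypkg are made up for illustration and are not part of any posted solution.

```r
# Hypothetical helper: build "<package>_<version>.zip" from a source directory.
zip_name_from_description <- function(pkg_dir) {
  # read.dcf() parses the "Field: value" format used by DESCRIPTION files.
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))
  paste0(desc[1, "Package"], "_", desc[1, "Version"], ".zip")
}

# Demo on a throwaway directory (package name and version are invented).
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c("Package: mypkg", "Version: 0.1-2", "Title: Demo"),
           file.path(pkg_dir, "DESCRIPTION"))

zip_name <- zip_name_from_description(pkg_dir)
print(zip_name)
```

This only reproduces the naming step; installing into a holding library and zipping it would still follow the script above or, as suggested in the reply, be left to winbuilder.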
[R] trouble with .Rd file
Hello everyone!

I've been messing with this .Rd file and am having forest/trees problem by now. Here is the section of the .Rd file that is the troublemaker:

\usage{
plot.fore4nodate(y, sim, dates, date.fmt = "%Y-%m-%d", gof.leg = FALSE,
  gof.digits = 2, legend = , leg.cex = 1, bands.col = "lightblue",
  border = NA, tick.tstep = "auto", lab.tstep = "auto", lab.fmt = NULL,
  main = NULL, cal.ini = NA, val.ini = NA, xlab = "Time", ylab = , ylim,
  col = c("black", "blue"), type = c("lines", "lines"), cex = c(1.8, 1.8),
  cex.axis = 1.8, ex.lab = 1.8, lwd = c(2.5, 2.5), y = 1:2, pch = c(1, 9),
  cex.main = 2.1, lasa = 1, mt1 = 1.27, ...)
}

Now the error part:

c:\Progra~1\R\R-3.0.2\bin\x64\Rcmd build ts1
* checking for file 'ts1/DESCRIPTION' ... OK
* preparing 'ts1':
* checking DESCRIPTION meta-information ... OK
Warning: newline within quoted string at plot.fore4nodate.Rd:10
Error in parse_Rd (C:/Users/hodgesse/AppData/Local/Temp/Rt../ts1/man/plot.fore4nodate.RD, :
  Unexpected end of input in (in quoted string opened at plot.fore4nodate.Rd:14.26)
Execution halted

The 14.26 would be at the word dates in the first line of the plot.fore4nodate line.

This is making me a little nuts. Actually a lot nuts. If anyone can see anything, I would really appreciate any suggestions.

Sincerely, Erin

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
Re: [R] trouble with .Rd file
On 20/02/2015 12:58 PM, Erin Hodgess wrote:
> Hello everyone!
>
> I've been messing with this .Rd file and am having forest/trees problem by now. Here is the section of the .Rd file that is the troublemaker:
>
> \usage{
> plot.fore4nodate(y, sim, dates, date.fmt = "%Y-%m-%d", gof.leg = FALSE,
>   gof.digits = 2, legend = , leg.cex = 1, bands.col = "lightblue",
>   border = NA, tick.tstep = "auto", lab.tstep = "auto", lab.fmt = NULL,
>   main = NULL, cal.ini = NA, val.ini = NA, xlab = "Time", ylab = , ylim,
>   col = c("black", "blue"), type = c("lines", "lines"), cex = c(1.8, 1.8),
>   cex.axis = 1.8, ex.lab = 1.8, lwd = c(2.5, 2.5), y = 1:2, pch = c(1, 9),
>   cex.main = 2.1, lasa = 1, mt1 = 1.27, ...)
> }
>
> Now the error part:
>
> c:\Progra~1\R\R-3.0.2\bin\x64\Rcmd build ts1
> * checking for file 'ts1/DESCRIPTION' ... OK
> * preparing 'ts1':
> * checking DESCRIPTION meta-information ... OK
> Warning: newline within quoted string at plot.fore4nodate.Rd:10
> Error in parse_Rd (C:/Users/hodgesse/AppData/Local/Temp/Rt../ts1/man/plot.fore4nodate.RD, :
>   Unexpected end of input in (in quoted string opened at plot.fore4nodate.Rd:14.26)
> Execution halted
>
> The 14.26 would be at the word dates in the first line of the plot.fore4nodate line.
>
> This is making me a little nuts. Actually a lot nuts. If anyone can see anything, I would really appreciate any suggestions.

Generally percent symbols (%) need to be escaped in Rd files. So the default value for date.fmt should be entered as "\%Y-\%m-\%d".

Duncan Murdoch
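To illustrate the escaping rule, here is a small sketch that escapes the percent signs programmatically and checks that a toy Rd file using the escaped default parses. The helper name escape_rd_percent and the one-argument demo() usage entry are assumptions made for the example; only gsub() and tools::parse_Rd() are real R functions.

```r
# Hypothetical helper: put a backslash in front of every percent sign,
# because an unescaped % starts a comment in Rd files.
escape_rd_percent <- function(x) gsub("%", "\\%", x, fixed = TRUE)

default_fmt <- escape_rd_percent("%Y-%m-%d")
cat(default_fmt, "\n")   # shows the text to paste into the Rd file

# Write a tiny Rd file that uses the escaped default value.
rd_lines <- c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{Demo}",
  "\\usage{",
  sprintf("demo(date.fmt = \"%s\")", default_fmt),
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# parse_Rd() is the same parser that R CMD build reported the error from.
rd <- tools::parse_Rd(rd_file)
```

With fixed = TRUE the replacement string is taken literally, so a single escaped backslash is enough in the R source.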
[R] Chi-square test
Hello,

If the vector of observed frequencies is:

f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)

and the vector of probabilities is:

p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

The sum of the probabilities is equal to one. But when I want to do the Chi-square test, I get this error: probabilities must sum to one. Does anybody know the reason?

Best Regards,
pari
Re: [R] Split a dataframe by rownames and/or colnames
I think ?tapply and friends (?by, ?aggregate, ?ave) are what you want.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Fri, Feb 20, 2015 at 9:33 AM, Tim Richter-Heitmann trich...@uni-bremen.de wrote:
> Dear List,
>
> Consider this example
>
> df <- data.frame(matrix(rnorm(9*9), ncol=9))
> names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
> row.names(df) <- names(df)
> indx <- gsub(".*_", "", names(df))
>
> I can split the dataframe by the index that is given in the column.names after the underscore "_".
>
> list2env(
>   setNames(
>     lapply(split(colnames(df), indx), function(x) df[x]),
>     paste('df', sort(unique(indx)), sep="_")),
>   envir=.GlobalEnv)
>
> However, I changed my mind and want to do it now by rownames. Exchanging colnames with rownames does not work, it gives the exact same output (9 rows x 3 columns). I could do as.data.frame(t(df_x)), but maybe that is not elegant. What would be the solution for splitting the dataframe by rows?
>
> Thank you very much!
>
> --
> Tim Richter-Heitmann
Re: [R] Subsetting a list of lists using lapply
On Fri, 20 Feb 2015, Aron Lindberg wrote:
> Hmm…Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha”, but that should not be included, such as:
>
> input[[67]]$content[[1]]$commit$tree$sha
>
> and
>
> input[[67]]$content[[1]]$parents[[1]]$sha
>
> it’s only the “sha” that fit the following subsetting pattern that should be included:
>
> input[[i]]$content[[1]]$sha[1]

This should be straightforward. Look at what grepl() is doing. And look at what names(unlist(input)) yields.

You can either write a regular expression to handle this (perhaps "content.sha$") or write other grepl() expressions to select (or get rid of) the desired (or unwanted) pattern.

See ?grepl and the page on regular expressions referenced there.

HTH,
Chuck
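A small sketch of that suggestion on a toy list may help. The structure below only imitates the shape described in the thread (one unnamed element under content, with sha, commit$tree$sha and parents[[1]]$sha); the values are invented, and the anchored pattern is one possible tightening of the "content.sha$" idea rather than the posted solution.

```r
# Toy stand-in for the real data; all hash values are made up.
input <- list(
  list(content = list(list(
    sha     = "aaa111",
    commit  = list(tree = list(sha = "bbb222")),
    parents = list(list(sha = "ccc333"))
  ))),
  list(content = list(list(sha = "ddd444")))
)

# unlist() builds leaf names by joining the names along the path with dots.
flat <- unlist(input)
print(names(flat))

# Keep only leaves whose whole flattened name is exactly "content.sha";
# "content.commit.tree.sha" and "content.parents.sha" are rejected.
wanted <- grepl("^content\\.sha$", names(flat))
shas <- unname(flat[wanted])
print(shas)
```

Anchoring both ends and escaping the dot avoids accidental matches on longer paths that merely end in sha.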
Re: [R] Chi-square test
On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:
> Hello,
>
> If the vector of observed frequencies is:
>
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
>
> and the vector of probabilities is:
>
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
>
> The sum of the probabilities is equal to one. But when I want to do the Chi-square test, I get this error: probabilities must sum to one.

print(sum(p11) - 1)

> Does anybody know the reason?

R FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
Re: [R] trouble with .Rd file
Ah! Perfect. Thanks so much!

Sincerely, Erin

On Fri, Feb 20, 2015 at 1:03 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
> Generally percent symbols (%) need to be escaped in Rd files. So the default value for date.fmt should be entered as "\%Y-\%m-\%d".
>
> Duncan Murdoch

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
Re: [R] Chi-square test
On Feb 20, 2015, at 10:05 AM, pari hesabi wrote:
> Hello,
>
> If the vector of observed frequencies is:
>
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
>
> and the vector of probabilities is:
>
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
>
> The sum of the probabilities is equal to one.

Well, the sum is close to 1.0 but not exact. There's a simple fix:

sum(p11)==1
[1] FALSE
sum( p11/sum(p11) )==1
[1] TRUE

> But when I want to do the Chi-square test, I get this error: probabilities must sum to one. Does anybody know the reason?

Numerical accuracy. See R-FAQ 7.31

--
David Winsemius
Alameda, CA, USA
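A hedged sketch of how one might inspect the discrepancy, using the probabilities from the question. The variable names gap, p_norm and so on are illustrative. Note that the gap is roughly 4.3e-08, which is far larger than ordinary double-precision rounding and presumably reflects the probabilities having been typed in with only seven significant digits.

```r
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03,
         2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01,
         1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
         5.009534e-02, 2.645857e-02, 0.0205403)

# How far is the sum from 1?
gap <- 1 - sum(p11)
print(gap)

# Exact comparison versus comparison with all.equal()'s default tolerance.
exact_one <- sum(p11) == 1
near_one  <- isTRUE(all.equal(sum(p11), 1))

# Renormalising so the probabilities sum to 1 up to rounding error.
p_norm <- p11 / sum(p11)
near_one_after <- isTRUE(all.equal(sum(p_norm), 1))

print(c(exact = exact_one, near = near_one, near_after = near_one_after))
```

Comparing with all.equal() rather than == is the usual advice of R FAQ 7.31; here even the tolerant comparison fails before renormalising, which is why the renormalisation step is needed.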
Re: [R] Chi-square test
And probably why chisq.test has the rescale.p= argument. Your second problem with small expected values can be handled with simulate.p.value=.

chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.

1-sum(p11)
[1] 4.3036e-08

chisq.test(f, p=p11, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect

chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

        Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Berend Hasselman
Sent: Friday, February 20, 2015 12:13 PM
To: pari hesabi
Cc: r-help@r-project.org
Subject: Re: [R] Chi-square test

On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:
> Hello,
>
> If the vector of observed frequencies is:
>
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
>
> and the vector of probabilities is:
>
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
>
> The sum of the probabilities is equal to one. But when I want to do the Chi-square test, I get this error: probabilities must sum to one.

print(sum(p11) - 1)

> Does anybody know the reason?

R FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
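The calls above can be collected into a script that does not stop at the first error. This is a sketch using the data from the question; the names strict and rescaled are illustrative, and the warning about small expected counts is deliberately suppressed here only to keep the output short.

```r
f <- c(0, 0, 0, 2, 3, 6, 17, 15, 21, 21, 14, 10, 5, 1, 5)
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03,
         2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01,
         1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
         5.009534e-02, 2.645857e-02, 0.0205403)

# Without rescaling the call fails; capture the error message instead of stopping.
strict <- tryCatch(chisq.test(f, p = p11),
                   error = function(e) conditionMessage(e))
print(strict)

# rescale.p = TRUE divides p by its sum before testing.
rescaled <- suppressWarnings(chisq.test(f, p = p11, rescale.p = TRUE))
print(rescaled$statistic)
print(rescaled$parameter)
```

For a p-value that does not rely on the chi-squared approximation, simulate.p.value = TRUE can be added as in the message above; its result varies from run to run unless a seed is set.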
Re: [R] Subsetting a list of lists using lapply
On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:
> Hmm…Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha”, but that should not be included, such as:
>
> input[[67]]$content[[1]]$commit$tree$sha
>
> and
>
> input[[67]]$content[[1]]$parents[[1]]$sha
>
> it’s only the “sha” that fit the following subsetting pattern that should be included:
>
> input[[i]]$content[[1]]$sha[1]
>
> It’s getting thornier! To be fair to Rolf’s solution (which probably can be updated to solve the problem), I’ve posted the complete dput here:
>
> https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

I didn't try on the larger example, but this works on the smaller one:

get_shas <- function(input){
  x <- lapply(input, "[[", "content")
  y <- lapply(x, "[[", 1)
  z <- lapply(y, function(yy) if( length(names(yy)) && any(names(yy) == "sha") ){ yy[["sha"]] })
}
sha_lists <- get_shas(input)

It does deliver an entry for every leaf of the input-object which is either the value of sha or NA. I think that is not a bad thing because it lets you figure out where the values are coming from.

> --
> Aron Lindberg
> Doctoral Candidate, Information Systems
> Weatherhead School of Management
> Case Western Reserve University
> aronlindberg.github.io

David Winsemius
Alameda, CA, USA
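For the irregular cases Aron describes (content[[1]] sometimes a list, sometimes a character vector, and content sometimes an empty list()), a guarded variant could look like the sketch below. It is not the posted get_shas(); the helper name get_sha_safe and the toy input are assumptions made for illustration.

```r
# Hypothetical helper: return entry$content[[1]]$sha[1], or NA when the
# expected structure is missing.
get_sha_safe <- function(entry) {
  content <- entry[["content"]]
  if (length(content) == 0L) return(NA_character_)   # content is list() or absent
  first <- content[[1]]
  if (!is.list(first)) return(NA_character_)         # e.g. a character vector
  sha <- first[["sha"]]                              # NULL if there is no such name
  if (is.null(sha)) return(NA_character_)
  as.character(sha[1])
}

# Toy input covering the three shapes mentioned in the thread (values invented).
input <- list(
  list(content = list(list(sha = "aaa111",
                           commit = list(tree = list(sha = "bbb222"))))),
  list(content = list()),
  list(content = list("just a string"))
)

shas <- vapply(input, get_sha_safe, character(1))
print(shas)
```

Returning NA for entries without a top-level sha keeps the result aligned with the input, so the position of each value still identifies where it came from.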
Re: [R] Split a dataframe by rownames and/or colnames
On Feb 20, 2015, at 9:33 AM, Tim Richter-Heitmann wrote:
> Dear List,
>
> Consider this example
>
> df <- data.frame(matrix(rnorm(9*9), ncol=9))
> names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
> row.names(df) <- names(df)
> indx <- gsub(".*_", "", names(df))
>
> I can split the dataframe by the index that is given in the column.names after the underscore "_".
>
> list2env(
>   setNames(
>     lapply(split(colnames(df), indx), function(x) df[x]),
>     paste('df', sort(unique(indx)), sep="_")),
>   envir=.GlobalEnv)
>
> However, I changed my mind and want to do it now by rownames. Exchanging colnames with rownames does not work, it gives the exact same output (9 rows x 3 columns). I could do as.data.frame(t(df_x)), but maybe that is not elegant. What would be the solution for splitting the dataframe by rows?

The split.data.frame method seems to work perfectly well with a rownames-derived index argument:

split(df, sub(".+_", "", rownames(df)))
$`1`
       c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
c_1  -0.11 -0.04  1.33 -0.87 -0.16 -0.25 -0.75  0.34  0.14
d_1  -0.62 -0.94  0.80 -0.78 -0.70  0.74  0.11  1.44 -0.33
e_1   0.98 -0.83  0.48  0.19 -0.32 -1.01  1.28  1.04 -2.16

$o1
       c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
1_o1 -0.93 -0.02  0.69 -0.67  1.04  1.04 -1.50 -0.36  0.50
2_o1  0.02 -0.16 -0.09 -1.50 -0.02 -1.04  1.07 -0.45  1.56
3_o1 -1.42  0.88 -0.05  0.85 -1.35  0.21  1.35  0.92 -0.76

$p
       c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
a_p  -1.35  0.91 -0.58 -0.63  0.94 -1.13  0.71  0.25  0.82
b_p  -0.25 -0.73 -0.41 -1.71  1.28  0.19 -0.35  1.74 -0.93
c_p  -0.01 -1.11 -0.12  0.58  1.51  0.03 -0.99 -0.23 -0.03

> Thank you very much!
>
> --
> Tim Richter-Heitmann

--
David Winsemius
Alameda, CA, USA
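The difference between the two approaches can be seen side by side in a reproducible sketch. The seed and the object names by_cols, by_rows and by_rows2 are arbitrary choices for this example. Swapping colnames for rownames in the original code changed nothing because df[x] with a single index always selects columns; selecting rows needs df[x, ].

```r
set.seed(1)  # arbitrary seed so the example is reproducible
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
nm <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
names(df) <- nm
row.names(df) <- nm
indx <- sub(".*_", "", nm)   # "1", "p" or "o1" for each name

# Column-wise split: df[x] picks columns, giving 9 x 3 pieces.
by_cols <- lapply(split(colnames(df), indx), function(x) df[x])

# Row-wise split, written out by hand: df[x, ] picks rows, giving 3 x 9 pieces.
by_rows <- lapply(split(rownames(df), indx),
                  function(x) df[x, , drop = FALSE])

# The same row-wise split via the split() method for data frames.
by_rows2 <- split(df, indx)

print(sapply(by_cols, dim))
print(sapply(by_rows2, dim))
```

If separate objects are still wanted, the resulting list can be passed to list2env() as in the original code, although keeping the pieces in a list is usually easier to work with.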
Re: [R] Subsetting a list of lists using lapply
The elNamed(x, name) function can simplify this code a bit. The following gives the same result as David W's get_shas() for the sample dataset provided:

get_shas2 <- function(input) {
  lapply(input, function(el) elNamed(elNamed(el, "content")[[1]], "sha")[1])
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Feb 20, 2015 at 10:56 AM, David Winsemius dwinsem...@comcast.net wrote:
> On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:
>> Hmm…Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha”, but that should not be included, such as:
>>
>> input[[67]]$content[[1]]$commit$tree$sha
>>
>> and
>>
>> input[[67]]$content[[1]]$parents[[1]]$sha
>>
>> it’s only the “sha” that fit the following subsetting pattern that should be included:
>>
>> input[[i]]$content[[1]]$sha[1]
>>
>> It’s getting thornier! To be fair to Rolf’s solution (which probably can be updated to solve the problem), I’ve posted the complete dput here:
>>
>> https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R
>
> I didn't try on the larger example, but this works on the smaller one:
>
> get_shas <- function(input){
>   x <- lapply(input, "[[", "content")
>   y <- lapply(x, "[[", 1)
>   z <- lapply(y, function(yy) if( length(names(yy)) && any(names(yy) == "sha") ){ yy[["sha"]] })
> }
> sha_lists <- get_shas(input)
>
> It does deliver an entry for every leaf of the input-object which is either the value of sha or NA. I think that is not a bad thing because it lets you figure out where the values are coming from.
>
> David Winsemius
> Alameda, CA, USA