Re: [R] Rstudio Error in View : 'wildcard' is missing
a) RStudio has its own support forum on its website. If your problem only happens in RStudio, then your question belongs there. If not, demonstrate the sequence of steps it takes to obtain your error using plain R and re-post.

b) This kind of thing can happen when you corrupt your workspace. Beware of auto-saving your workspace; instead, build scripts that analyze your data from raw input to in-memory analysis results.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Ellen Sebastian elle...@stanford.edu wrote:

> Hello,
> Whenever I try to view anything (matrix, data frame, etc.) using View() in RStudio, I get the error:
>     Error in View : 'wildcard' is missing
> Google hasn't returned any relevant help. Does anyone have an idea as to how I can fix this?
> Thanks!

______
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] quesion about model g1
On Mon, 29 Apr 2013, meng wrote:

> Hello Achim:
> Sorry for another question about the model g1 in the last mail.
> As to model g2 and g3:
>     g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
>     g3 <- glm(Freq ~ age * drug * case, data = df, family = poisson)
>     anova(g2, g3, test = "Chisq")
> I know clearly that the only difference between g2 and g3 is that g2 has no three-way interaction while g3 has, and anova tests whether this only difference (i.e., the three-way interaction) is significant or not.
> But as to g1 and g3:
>     g1 <- glm(Freq ~ age/(drug + case), data = df, family = poisson)
> I can't find out the only difference between g1 and g3, so I don't know what anova(g1, g3, test = "Chisq") tests for.
> Also, what does the / sign following age in g1 refer to?

The / could be replaced by a * here and the fitted values and corresponding log-likelihood would not change. Only the coefficients change: / induces a nested coding while * employs the interaction coding.

Breaking everything down to main and interaction effects (and ignoring the particular coding of the coefficients), the three models are

    g1: a + d + c + a:d + a:c
    g2: a + d + c + a:d + a:c + d:c
    g3: a + d + c + a:d + a:c + d:c + a:d:c

with interpretations:

    g1: conditional independence of drug and case given age
    g2: no three-way interaction (case depends on drug but in the same way for different levels of age)
    g3: saturated model

> Many thanks and sorry for many questions.
> Best

At 2013-04-24 22:22:55, Achim Zeileis achim.zeil...@uibk.ac.at wrote:

On Wed, 24 Apr 2013, meng wrote:

> Hi, Achim:
> Can all the analysis you mentioned via loglm be performed via glm(..., family = poisson)?

Yes.

    ## transform table back to data.frame
    df <- as.data.frame(tab)

    ## fit models: conditional independence, no three-way interaction,
    ## and saturated
    g1 <- glm(Freq ~ age/(drug + case), data = df, family = poisson)
    g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
    g3 <- glm(Freq ~ age * drug * case, data = df, family = poisson)

    ## likelihood-ratio tests (against saturated)
    anova(g1, g3, test = "Chisq")
    anova(g2, g3, test = "Chisq")

    ## compare fitted frequencies (which are essentially equal)
    all.equal(as.numeric(fitted(g1)), as.data.frame(as.table(fitted(m1)))$Freq)
    all.equal(as.numeric(fitted(g2)), as.data.frame(as.table(fitted(m2)))$Freq)

The difference is mainly that loglm() has a specialized user interface and that it uses a different optimizer (iterative proportional fitting rather than iterative reweighted least squares).

Best,
Z

> Many thanks.

At 2013-04-24 19:37:10, Achim Zeileis achim.zeil...@uibk.ac.at wrote:

On Wed, 24 Apr 2013, meng wrote:

> Hi all:
> For stratified count data, how to perform regression analysis?
> My data:
>     age case oc count
>       1    1  1    21
>       1    1  2    26
>       1    2  1    17
>       1    2  2    59
>       2    1  1    18
>       2    1  2    88
>       2    2  1     7
>       2    2  2    95
> age: 1: <40y, 2: >40y
> case: 1: patient, 2: healthy
> oc: 1: use drug, 2: not use drug
> My purpose: analyze whether case and oc are correlated, with age as a stratification variable.
> My solution:
> 1. Mantel-Haenszel test by using function mantelhaen.test
> 2. Loglinear regression by using function glm(count ~ case*oc, family = poisson). But I don't know how to handle variable age, which is the stratification variable.

Instead of using glm(family = poisson) it is also convenient to use loglm() from package MASS for the associated contingency table. The code below shows how to set up the contingency table, visualize it using package vcd, and then fit two models using loglm. The models considered are conditional independence of case and drug given age and the no three-way interaction already suggested by Peter. Both models are also accompanied by visualizations of the residuals.

    ## contingency table with nice labels
    tab <- expand.grid(drug = 1:2, case = 1:2, age = 1:2)
    tab$count <- c(21, 26, 17, 59, 18, 88, 7, 95)
    tab$age <- factor(tab$age, levels = 1:2, labels = c("<40", ">40"))
    tab$case <- factor(tab$case, levels = 1:2, labels = c("patient", "healthy"))
    tab$drug <- factor(tab$drug, levels = 1:2, labels = c("yes", "no"))
    tab <- xtabs(count ~ age + drug + case, data = tab)

    ## visualize case explained by drug given age
    library(vcd)
    mosaic(case ~ drug | age, data = tab,
      split_vertical = c(TRUE, TRUE, FALSE))

    ## test whether drug and case are independent given age
    library(MASS)  # provides loglm()
    m1 <- loglm(~ age/(drug + case), data = tab)
    m1

    ## visualize corresponding residuals from independence model
    mosaic(case ~ drug | age, data = tab,
      split_vertical = c(TRUE, TRUE, FALSE),
      residuals_type = "deviance",
      gp = shading_hcl, gp_args = list(interpolate = 1.2))
    mosaic(case ~ drug | age, data = tab,
      split_vertical = c(TRUE, TRUE, FALSE),
      residuals_type = "pearson",
      gp = shading_hcl, gp_args = list(interpolate = 1.2))

    ## test whether there is no three-way interaction
    ## (i.e., dependence of case on drug is the same for both age groups)
    m2 <- loglm(~ (age + drug + case)^2, data = tab)
    m2
    ##
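The statement that / and * lead to the same fit can be checked directly. The following is only a minimal sketch, assuming the eight counts from the original post; the object name g1_star is introduced here for illustration and does not appear in the thread.

```r
# Counts from the original post; drug varies fastest, then case, then age.
df <- expand.grid(drug = factor(1:2), case = factor(1:2), age = factor(1:2))
df$Freq <- c(21, 26, 17, 59, 18, 88, 7, 95)

# Nested coding (as in g1) versus interaction coding of the same model.
g1      <- glm(Freq ~ age / (drug + case), data = df, family = poisson)
g1_star <- glm(Freq ~ age * (drug + case), data = df, family = poisson)

# Fitted values and log-likelihood agree; only the coefficients differ.
same_fit <- isTRUE(all.equal(fitted(g1), fitted(g1_star), tolerance = 1e-6))
same_loglik <- isTRUE(all.equal(as.numeric(logLik(g1)),
                                as.numeric(logLik(g1_star)),
                                tolerance = 1e-6))
print(c(same_fit = same_fit, same_loglik = same_loglik))
print(names(coef(g1)))       # nested coding
print(names(coef(g1_star)))  # interaction coding
```

Both parameterizations use six coefficients, so the choice between them is a matter of how the coefficients are to be read, not of model fit.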
[R] Comparing two different 'survival' events for the same subject using survdiff?
I have a dataset which, for the sake of simplicity, has two endpoints. We would like to test whether two different endpoints have the same eventual meaning. To try and take an example that people might understand better:

Let's assume we had a group of subjects who all received a treatment. They could stop treatment for any reason (side effects, treatment stops working, etc.). Getting that data is very easy. Measuring whether treatment stops working is very hard to capture, so we would like to test whether duration on treatment (easy) is the same as time to treatment failure (hard).

My data might look like this:

    A = c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77)
    B = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) # 1 = yes (censored)
    C = c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77)
    D = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) # 1 = yes (censored)
    myData = data.frame(TimeOnTx = A, StillOnTx = B, TimeToFailure = C, NotFailed = D)

We can do a survival analysis on those individually:

    library(survival)
    OnTxFit = survfit(Surv(TimeOnTx, StillOnTx == 0) ~ 1, data = myData)
    FailedFit = survfit(Surv(TimeToFailure, NotFailed == 0) ~ 1, data = myData)
    plot(OnTxFit)
    lines(FailedFit)

But how can I do a survdiff type of comparison between the two? Do I have to restructure the data so that the times are all in one column, the event in another, and then a group column to indicate what type of event it is? That seems a complex way to do it (especially as the dataset is of course more complex than I've just shown), so I thought maybe I'm missing something.
[R] Counting number of consecutive occurrences per rows
Hi,

I would appreciate it if somebody could help me with the following calculation. I have a data frame with one record every 10 minutes, covering most of a year. This is a small example:

    dput(test)
    structure(list(jul = structure(c(14655, 14655, 14655, 14655,
    14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
    14655, 14655, 14655), origin = structure(0, class = "Date")),
    time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
    1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
    1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
    1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
    act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
    34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
    ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
    518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
    540L))

It looks like this:

    test
          jul                time act day
    510 14655 2010-02-15 18:25:54 130   1
    512 14655 2010-02-15 18:35:54  23   1
    514 14655 2010-02-15 18:45:54  45   1
    516 14655 2010-02-15 18:55:54 200   1
    518 14655 2010-02-15 19:05:54 200   1
    520 14655 2010-02-15 19:15:54 200   1
    522 14655 2010-02-15 19:25:54 199   1
    524 14655 2010-02-15 19:35:54 150   1
    526 14655 2010-02-15 19:45:54   0   1
    528 14655 2010-02-15 19:55:54   0   1
    530 14655 2010-02-15 20:05:54   0   0
    532 14655 2010-02-15 20:15:54   0   0
    534 14655 2010-02-15 20:25:54  34   0
    536 14655 2010-02-15 20:35:54 200   0
    538 14655 2010-02-15 20:45:54 200   0
    540 14655 2010-02-15 20:55:54 145   0

What I would like to calculate is the number of consecutive occurrences in column act of the value 200, of the value 0, and of the values from 1 to 199 taken together (that is, the values that differ from 200 and 0).

I would like to get something like this (result$res):

    result
          jul                time act day res res2
    510 14655 2010-02-15 18:25:54 130   1   3    3
    512 14655 2010-02-15 18:35:54  23   1   3    3
    514 14655 2010-02-15 18:45:54  45   1   3    3
    516 14655 2010-02-15 18:55:54 200   1   3    3
    518 14655 2010-02-15 19:05:54 200   1   3    3
    520 14655 2010-02-15 19:15:54 200   1   3    3
    522 14655 2010-02-15 19:25:54 199   1   2    2
    524 14655 2010-02-15 19:35:54 150   1   2    2
    526 14655 2010-02-15 19:45:54   0   1   4    2
    528 14655 2010-02-15 19:55:54   0   1   4    2
    530 14655 2010-02-15 20:05:54   0   0   4    2
    532 14655 2010-02-15 20:15:54   0   0   4    2
    534 14655 2010-02-15 20:25:54  34   0   1    1
    536 14655 2010-02-15 20:35:54 200   0   2    2
    538 14655 2010-02-15 20:45:54 200   0   2    2
    540 14655 2010-02-15 20:55:54 145   0   1    1

And, if possible, I would like to distinguish between day==1 and day==0 (see the act values of 0, for example), with results as in result$res2.

After that I would like to make a summary table per day (jul), where

    maxres is max(result$res) for the act value
    minres is min(result$res) for the act value
    sumres is sum(result$res) for the act value

(for example, if the value 200 occurs at different times per day (jul) consecutively 3, 5, 1, 6 and 7 times, the sumres would be 3+5+1+6+7 = 22)

Something like this (these are made-up numbers):

      jul   act maxres minres sumres
    14655     0      4      1     25
    14655   200      3      2     48
    14655 1-199      3      1     71
    14656     0      8      2     38
    14656   200     15      3     60
    14656 1-199     11      4     46
    ...

(theoretically the sum of sumres per day (jul) should be 144)

    sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

I hope my explanation is sufficient. I appreciate any hint.

Thank you,
Zuzana
[R] How to call an object given a string?
Hello,

This is very basic and very frustrating. Suppose this:

    A = 5
    B = 5
    C = 10
    ls()
    [1] "A" "B" "C"

I would like this:

    xpto()
    5 5 10

How can I do xpto()?

Thanks
Rui
Re: [R] Convert continuous variable into discrete variable
It is important to check for lack of fit of the categorized variable. One way to do this is to test for the additional predictive ability of the original continuous variable after adjusting for its categorized version. It is very uncommon for a categorized continuous variable to fit well, because its assumed discontinuities seldom exist in nature and most relationships are not piecewise flat.

Frank

levanovd wrote
> Or even simpler (no need to specify labels):
>     x <- runif(100, 0, 100)
>     u <- cut(x, breaks = c(0, 3, 4.5, 6, 8, Inf), labels = FALSE)

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Convert-continuous-variable-into-discrete-variable-tp3671032p4665699.html
Sent from the R help mailing list archive at Nabble.com.
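The lack-of-fit check described above can be sketched as a comparison of two nested models. This is only an illustration on simulated data: the outcome y, the break points, and the object names are assumptions made for the example and are not taken from the thread.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 100)
y <- 0.1 * x + rnorm(n)   # simulated outcome that is truly linear in x

# Categorized version of x (break points chosen arbitrarily for this sketch).
x_cat <- cut(x, breaks = c(0, 25, 50, 75, 100), include.lowest = TRUE)

fit_cat  <- lm(y ~ x_cat)       # categorized variable only
fit_both <- lm(y ~ x_cat + x)   # add the original continuous variable

# A small p-value means x carries information that the categories miss,
# i.e. the categorized variable does not fit well.
lof <- anova(fit_cat, fit_both)
p_value <- lof[["Pr(>F)"]][2]
print(lof)
```

In this simulation the relationship is linear within each interval, so the continuous term is expected to add predictive ability on top of the piecewise-flat fit.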
Re: [R] How to call an object given a string?
Hi Rui,

how about this:

    sapply(ls(), get)

cheers

On 29.04.2013 13:07, Rui Esteves wrote:

> Hello,
> This is very basic and very frustrating. Suppose this:
>     A = 5
>     B = 5
>     C = 10
>     ls()
>     [1] "A" "B" "C"
> I would like this:
>     xpto()
>     5 5 10
> How can I do xpto()?
> Thanks
> Rui

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
Re: [R] Comparing two different 'survival' events for the same subject using survdiff?
It isn't that complex:

    library(survival)
    myDataLong <- data.frame(Time = c(A, C), Censored = c(B, D),
                             group = rep(0:1, times = c(length(A), length(C))))
    Fit = survfit(Surv(Time, Censored == 0) ~ group, data = myDataLong)
    plot(Fit, col = 1:2)
    survdiff(Surv(Time, Censored == 0) ~ group, data = myDataLong)

However, your approach (a 'wide' data frame) suggests that there are equal numbers in the two survival studies. Are they even the same people? Is it even the same study? If so, this is a competing risks question and would have to be approached differently.

And, of course, absence of evidence is not evidence of absence. Failing to reject the null hypothesis that the distributions are equal is not proof that the distributions are equal.

Chris

-----Original Message-----
From: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) [mailto:calum.polw...@nhs.net]
Sent: Monday, April 29, 2013 4:48 AM
To: r-help@r-project.org
Subject: [R] Comparing two different 'survival' events for the same subject using survdiff?

> I have a dataset which for the sake of simplicity has two endpoints. We would like to test if two different end-points have the same eventual meaning.
> [...]
Re: [R] Comparing two different 'survival' events for the same subject using survdiff?
> It isn't that complex:
>     myDataLong <- data.frame(Time = c(A, C), Censored = c(B, D),
>                              group = rep(0:1, times = c(length(A), length(C))))
>     Fit = survfit(Surv(Time, Censored == 0) ~ group, data = myDataLong)
>     plot(Fit, col = 1:2)
>     survdiff(Surv(Time, Censored == 0) ~ group, data = myDataLong)

Yes, for the example it's not complex, but once we get down to having more data columns I think it may be. Maybe I ignore those and just build 'myDataLong' for this specific test.

> However, your approach (a 'wide' data frame) suggests that there are equal numbers in the two survival studies. Are they even the same people? Is it even the same study? If so, this is a competing risks question and would have to be approached differently.

Yes, it's the same patients. The two events are technically independent of each other, but the hope is that the easier outcome measure would predict the other. I'm not familiar with competing risks and so will have to read up on it, but it isn't a scenario where A or B happens: A happens and B happens, and you might expect A happened because B happened.

> And, of course, absence of evidence is not evidence of absence. Failing to reject the null hypothesis that the distributions are equal is not proof that the distributions are equal.

Yes, absolutely. However, I'm half expecting to detect a difference and so then dismiss using A as a surrogate of B.

Thanks

> -----Original Message-----
> From: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) [mailto:calum.polw...@nhs.net]
> Sent: Monday, April 29, 2013 4:48 AM
> To: r-help@r-project.org
> Subject: [R] Comparing two different 'survival' events for the same subject using survdiff?
>
> I have a dataset which for the sake of simplicity has two endpoints. We would like to test if two different end-points have the same eventual meaning.
> [...]
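On the concern about additional data columns: one possible way to build the long data frame while carrying other columns along is to stack the two endpoints with rbind(). This is a sketch only. The columns id and Sex are hypothetical and were added purely for illustration, and the log-rank test in survdiff() treats the two groups as independent samples, which paired measurements on the same patients are not.

```r
library(survival)

# Example data from the thread, plus two HYPOTHETICAL extra columns (id, Sex).
myData <- data.frame(
  id            = 1:12,
  Sex           = rep(c("F", "M"), 6),
  TimeOnTx      = c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77),
  StillOnTx     = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  TimeToFailure = c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77),
  NotFailed     = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Columns shared by both endpoints are repeated once per endpoint.
common <- myData[, c("id", "Sex")]
long <- rbind(
  cbind(common, Time = myData$TimeOnTx,      Censored = myData$StillOnTx, Endpoint = "OnTx"),
  cbind(common, Time = myData$TimeToFailure, Censored = myData$NotFailed, Endpoint = "Failure")
)
long$Endpoint <- factor(long$Endpoint)

# Same comparison as in the reply, now grouped by endpoint type.
cmp <- survdiff(Surv(Time, Censored == 0) ~ Endpoint, data = long)
print(cmp)
```

Keeping id in the long data makes it possible to move to a method that accounts for the pairing later without reshaping again.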
Re: [R] Counting number of consecutive occurrences per rows
Forgot the last part of the question:

    test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
        14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
        14655, 14655, 14655), origin = structure(0, class = "Date")),
        time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
        1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
        1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
        1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
        act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
        34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
        ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
        518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
        540L))

    # add key to separate data
    test$key <- ifelse(test$act == 0
                    , 1L  # 0
                    , ifelse(test$act == 200
                        , 3L  # 200
                        , 2L  # 1-199
                        )
                    )

    # mark changes in sequence
    test$resChange <- cumsum(c(1L, abs(diff(test$key))))
    test$res <- ave(test$resChange, test$resChange, FUN = length)
    test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)

    require(data.table)  # use this for aggregation
    test <- data.table(test)
    testResume <- test[
                , list(maxres = max(res)
                    , minres = min(res)
                    , sumres = length(unique(resChange))
                    )
                , keyby = c('day', 'key')
                ]

    # change 'key'
    testResume$key <- c('0', '1-199', '200')[testResume$key]
    testResume
       day   key maxres minres sumres
    1:   0     0      4      4      1
    2:   0 1-199      1      1      2
    3:   0   200      2      2      1
    4:   1     0      4      4      1
    5:   1 1-199      3      2      2
    6:   1   200      3      3      1

On Mon, Apr 29, 2013 at 6:44 AM, zuzana zajkova zuzu...@gmail.com wrote:

> Hi,
> I would appreciate if somebody could help me with following calculation. I have a dataframe, by 10 minutes time, for mostly one year data.
> [...]
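For comparison, the run lengths can also be obtained with rle() from base R. This sketch is a swapped-in alternative, not the approach used in the reply above, and it reads sumres the way the question defines it: the sum of the run lengths per category. It uses only the act column of the example and ignores the jul and day grouping.

```r
# act values from the example data in the thread.
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)

# Three categories: exactly 0, exactly 200, and everything else (1-199).
key <- ifelse(act == 0, "0", ifelse(act == 200, "200", "1-199"))

# rle() returns each run's value and length; repeating each length over
# its own run gives the per-row count asked for as result$res.
runs <- rle(key)
res <- rep(runs$lengths, times = runs$lengths)

# Summary per category over the runs.
resume <- data.frame(
  act    = sort(unique(runs$values)),
  maxres = as.vector(tapply(runs$lengths, runs$values, max)),
  minres = as.vector(tapply(runs$lengths, runs$values, min)),
  sumres = as.vector(tapply(runs$lengths, runs$values, sum))
)
print(res)
print(resume)
```

With this definition the sumres column adds up to the number of rows, which matches the check in the question that the daily total should be 144 for complete days.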
[R] Hi
What is the R code for computing the autocovariance and autocorrelation of these data?

    a <- c(2, 3.5, 3.5, 2.2, 2.2, 3.3, 2.5, 2.5, 3.2, 2.5, 2.5, 2.7, 1.7, 2.7, 2.9, 2.3, 2.7,
           3, 1.8, 2.5, 3.1, 2.5, 2.5, 3.2, 2.7, 1.9, 2.6, 2.3, 2.7, 3.2, 2.2, 1.5, 2.3, 2.6,
           2.5, 2.9, 2, 2.5, 2.6, 2.4, 2.6, 2.8, 2.5, 2.6, 3.2, 1.8, 2.7, 3.4, 2.2, 2.9, 3.2)
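One way to obtain both quantities is acf() from the stats package, which ships with R. This is a sketch, assuming the values are an equally spaced series in the order given; the object names acov and acor are chosen here for illustration.

```r
a <- c(2, 3.5, 3.5, 2.2, 2.2, 3.3, 2.5, 2.5, 3.2, 2.5, 2.5, 2.7, 1.7, 2.7, 2.9, 2.3, 2.7,
       3, 1.8, 2.5, 3.1, 2.5, 2.5, 3.2, 2.7, 1.9, 2.6, 2.3, 2.7, 3.2, 2.2, 1.5, 2.3, 2.6,
       2.5, 2.9, 2, 2.5, 2.6, 2.4, 2.6, 2.8, 2.5, 2.6, 3.2, 1.8, 2.7, 3.4, 2.2, 2.9, 3.2)

# plot = FALSE returns the estimates instead of drawing the correlogram.
acov <- acf(a, type = "covariance",  plot = FALSE)  # autocovariance
acor <- acf(a, type = "correlation", plot = FALSE)  # autocorrelation

# The first element of $acf is lag 0, the second is lag 1, and so on.
print(acov$acf[1:4])
print(acor$acf[1:4])
```

At lag 0 the autocovariance equals the variance computed with divisor n, and the autocorrelation equals 1.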
[R] BCP utility
Hello,

Currently we load data with the bulk-load facility in SAS, which uses the BCP utility instead of the T-SQL command BULK INSERT to copy data from a file to a SQL table.

As far as I can see, the RODBC package uses only the T-SQL command BULK INSERT. It would be interesting to know whether R can use the BCP utility instead. Using BCP should avoid the need for the high-privilege bulkadmin role required by the T-SQL command BULK INSERT.

Does anyone know whether the BCP utility is usable with R?

Stef.
Re: [R] quesion about model g1
Thanks for your reply.

As to g2 and g3:

    g2: a + d + c + a:d + a:c + d:c
    g3: a + d + c + a:d + a:c + d:c + a:d:c

The only difference between g2 and g3 is a:d:c, which relates to whether case depends on drug in the same way for different levels of age. And anova tests whether this only difference is significant.

But as to g1 and g3:

    g1: a + d + c + a:d + a:c
    g3: a + d + c + a:d + a:c + d:c + a:d:c

The only difference between g1 and g3 is d:c + a:d:c. What does d:c + a:d:c refer to?

Thanks.

At 2013-04-29 14:33:25, Achim Zeileis achim.zeil...@uibk.ac.at wrote:

> On Mon, 29 Apr 2013, meng wrote:
> > Hello Achim:
> > Sorry for another question about the model g1 in the last mail.
> [...]
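The question about d:c + a:d:c can be explored numerically. In the sketch below, which assumes the counts from the original post, anova(g1, g3) is a test on 2 degrees of freedom: it tests the two terms jointly, which corresponds to the interpretation given earlier for g1 (conditional independence of drug and case given age). The object names lr13, lr12 and lr23 are introduced here for illustration only.

```r
# Counts from the original post; drug varies fastest, then case, then age.
df <- expand.grid(drug = factor(1:2), case = factor(1:2), age = factor(1:2))
df$Freq <- c(21, 26, 17, 59, 18, 88, 7, 95)

g1 <- glm(Freq ~ age / (drug + case),   data = df, family = poisson)
g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
g3 <- glm(Freq ~ age * drug * case,     data = df, family = poisson)

lr13 <- anova(g1, g3, test = "Chisq")  # d:c and a:d:c jointly
lr12 <- anova(g1, g2, test = "Chisq")  # d:c only
lr23 <- anova(g2, g3, test = "Chisq")  # a:d:c only

# Degrees of freedom of each comparison.
print(c(g1_vs_g3 = lr13$Df[2], g1_vs_g2 = lr12$Df[2], g2_vs_g3 = lr23$Df[2]))

# The joint likelihood-ratio statistic is the sum of the two one-term statistics.
print(c(joint = lr13$Deviance[2], sum_of_parts = lr12$Deviance[2] + lr23$Deviance[2]))
```

Because the models are nested in the order g1, g2, g3, the deviance difference between g1 and g3 splits exactly into the g1-to-g2 and g2-to-g3 differences.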
[R] all.vars for nested expressions
Dear R fellows,

Assume I define

    a <- expression(fn + tp)
    sen <- expression(tp / a)

Now I'd like to know which variables are necessary for calculating sen:

    all.vars(sen)

This results in a vector c("tp", "a"). But I'd like all.vars to evaluate the sen object down to the ground level, which would result in a vector c("tp", "fn") (because a was defined as fn + tp). In other words, I'd like all.vars to expand the a object (and all other downstream objects).

I am looking for a solution that works with many more levels. This is just a very simple example.

I'd very much appreciate any suggestions on how to do that!

Thanks in advance,
Felix
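One possible approach is to call all.vars() recursively and to replace every name that is itself bound to an expression by the variables of that expression. The helper expand_vars() below is hypothetical (it is not part of base R), and the sketch assumes that all intermediate definitions are stored as expression or call objects in a single environment.

```r
# HYPOTHETICAL helper: expand variable names until only names remain that
# are not themselves defined as expressions in `env`.
expand_vars <- function(expr, env = parent.frame(), seen = character(0)) {
  out <- character(0)
  for (v in all.vars(expr)) {
    # Expand v only if it is defined in env as an expression or call and has
    # not been expanded on this path already (guards against cycles).
    is_def <- !(v %in% seen) &&
      exists(v, envir = env, inherits = FALSE) &&
      (is.expression(get(v, envir = env)) || is.call(get(v, envir = env)))
    out <- c(out, if (is_def) expand_vars(get(v, envir = env), env, c(seen, v)) else v)
  }
  unique(out)
}

# Definitions from the question, kept in their own environment for the demo.
defs <- new.env()
assign("a", expression(fn + tp), envir = defs)
sen <- expression(tp / a)

needed <- expand_vars(sen, defs)
print(needed)
```

Because the recursion follows every name that resolves to an expression, the same function handles deeper nesting without changes.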
Re: [R] How to call an object given a string?
Hi,

res <- unlist(mget(ls()))
names(res) <- NULL
res
#[1] 5 5 10

A.K.

- Original Message -
From: Rui Esteves ruimax...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Monday, April 29, 2013 7:07 AM
Subject: [R] How to call an object given a string?

Hello,

This is very basic and very frustrating. Suppose this:

A=5
B=5
C=10
ls()
"A" "B" "C"

I would like this:

xpto()
5 5 10

How can I do xpto()?

Thanks
Rui
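The reply above can be wrapped into the xpto() function that was asked for. The following is a minimal sketch; the env argument is an addition for illustration, and a fresh environment is used in the example so that the result does not depend on whatever else is in the workspace.

```r
# Return the values of all objects in `env`, in the order ls() lists them.
xpto <- function(env = parent.frame()) {
  force(env)
  unname(unlist(mget(ls(env), envir = env)))
}

e <- new.env()
assign("A", 5,  envir = e)
assign("B", 5,  envir = e)
assign("C", 10, envir = e)
xpto(e)
# [1]  5  5 10
```

Note that unlist() flattens everything into one vector, so this only gives a tidy result when the objects are scalars of the same type, as in the question.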
Re: [R] Counting number of consecutive occurrences per rows
try this:

test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
    14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
    14655, 14655, 14655), origin = structure(0, class = "Date")),
    time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
    1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
    1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
    1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
    act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
    34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
    ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
    518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
    540L))

# add key to separate data
test$key <- ifelse(test$act == 0
                 , 1L  # 0
                 , ifelse(test$act == 200
                     , 3L  # 200
                     , 2L  # 1-199
                     )
                 )
# mark changes in sequence
test$resChange <- cumsum(c(1L, abs(diff(test$key)) > 0))
test$res <- ave(test$resChange, test$resChange, FUN = length)
test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
test
      jul                time act day key resChange res res2
510 14655 2010-02-15 18:25:54 130   1   2         1   3    3
512 14655 2010-02-15 18:35:54  23   1   2         1   3    3
514 14655 2010-02-15 18:45:54  45   1   2         1   3    3
516 14655 2010-02-15 18:55:54 200   1   3         2   3    3
518 14655 2010-02-15 19:05:54 200   1   3         2   3    3
520 14655 2010-02-15 19:15:54 200   1   3         2   3    3
522 14655 2010-02-15 19:25:54 199   1   2         3   2    2
524 14655 2010-02-15 19:35:54 150   1   2         3   2    2
526 14655 2010-02-15 19:45:54   0   1   1         4   4    2
528 14655 2010-02-15 19:55:54   0   1   1         4   4    2
530 14655 2010-02-15 20:05:54   0   0   1         4   4    2
532 14655 2010-02-15 20:15:54   0   0   1         4   4    2
534 14655 2010-02-15 20:25:54  34   0   2         5   1    1
536 14655 2010-02-15 20:35:54 200   0   3         6   2    2
538 14655 2010-02-15 20:45:54 200   0   3         6   2    2
540 14655 2010-02-15 20:55:54 145   0   2         7   1    1
[R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame
Dear R forum

I have a data.frame as

cashflow_df = data.frame(instrument = c("ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC",
"PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR",
"UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ"),
id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,1,1,2,2,3,3,4,4,5,5),
cashflow = c(5000,5000,505000,5000,5000,505000,5000,5000,505000,5000,5000,505000,5000,5000,505000,
500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,
8000,808000,8000,808000,8000,808000,8000,808000,8000,808000),
cashflows_pv = c(4931.054,4479.1116,431160.8529,4931.9604,4485.6393,432064.0228,4932.5438,4489.8451,432646.2398,
4932.1548,4487.0404,432257.9551,4932.6087,4490.3129,432711.0084,
493.6326,474.0524,455.2489,82252.0304,493.8083,474.7543,456.4356,82744.9157,493.6003,473.9235,455.031,82161.7368,
493.8175,474.7913,456.4982,82770.9849,493.8592,474.9581,456.7804,82888.4556,
7451.3118,681810.5522,7462.0148,684153.4992,7441.1294,679585.9186,7426.6407,676427.7274,7427.1225,676532.6262))

# __

cashflow_df
   instrument id cashflow cashflows_pv
1         ABC  1     5000    4931.0540
2         ABC  1     5000    4479.1116
3         ABC  1   505000  431160.8529
4         ABC  2     5000    4931.9604
5         ABC  2     5000    4485.6393
6         ABC  2   505000  432064.0228
7         ABC  3     5000    4932.5438
8         ABC  3     5000    4489.8451
9         ABC  3   505000  432646.2398
10        ABC  4     5000    4932.1548
11        ABC  4     5000    4487.0404
12        ABC  4   505000  432257.9551
13        ABC  5     5000    4932.6087
14        ABC  5     5000    4490.3129
15        ABC  5   505000  432711.0084
16        PQR  1      500     493.6326
17        PQR  1      500     474.0524
18        PQR  1      500     455.2489
19        PQR  1   102000   82252.0304
20        PQR  2      500     493.8083
21        PQR  2      500     474.7543
22        PQR  2      500     456.4356
23        PQR  2   102000   82744.9157
24        PQR  3      500     493.6003
25        PQR  3      500     473.9235
26        PQR  3      500     455.0310
27        PQR  3   102000   82161.7368
28        PQR  4      500     493.8175
29        PQR  4      500     474.7913
30        PQR  4      500     456.4982
31        PQR  4   102000   82770.9849
32        PQR  5      500     493.8592
33        PQR  5      500     474.9581
34        PQR  5      500     456.7804
35        PQR  5   102000   82888.4556
36     UVWXYZ  1     8000    7451.3118
37     UVWXYZ  1   808000  681810.5522
38     UVWXYZ  2     8000    7462.0148
39     UVWXYZ  2   808000  684153.4992
40     UVWXYZ  3     8000    7441.1294
41     UVWXYZ  3   808000  679585.9186
42     UVWXYZ  4     8000    7426.6407
43     UVWXYZ  4   808000  676427.7274
44     UVWXYZ  5     8000    7427.1225
45     UVWXYZ  5   808000  676532.6262

# ===

# My PROBLEM

For a given instrument and id, I need the totals of cashflow and cashflows_pv, and also the change in total_cashflow_pv relative to the first id of the same instrument (total_cashflow_pv for the given id minus total_cashflow_pv for the first id), as shown in the cashflow_change column of the following output.

output
   instrument id total_cashflow total_cashflow_pv
1         ABC  1         515000         440571.02
2         ABC  2         515000         441481.62
3         ABC  3         515000         442068.63
4         ABC  4         515000         441677.15
5         ABC  5         515000         442133.93
6         PQR  1         103500          83674.96
7         PQR  2         103500          84169.91
8         PQR  3         103500          83584.29
9         PQR  4         103500          84196.09
10        PQR  5         103500          84314.05
11     UVWXYZ  1         816000         689261.86
12     UVWXYZ  2         816000         691615.51
13     UVWXYZ  3         816000         687027.05
14     UVWXYZ  4         816000         683854.37
15     UVWXYZ  5         816000         683959.75

   cashflow_change
1           0.0000   # This is (440571.02 - 440571.02) 1st ID value - 1st ID value for ABC
2         910.6040   # This is (441481.62 - 440571.02) 2nd ID value - 1st ID value for ABC
3        1497.6102   # This is (442068.63 - 440571.02) 3rd ID value - 1st ID value for ABC
4        1106.1318
5        1562.9115
6           0.0000   # This is (83674.96 - 83674.96) 1st ID value - 1st ID value for PQR
7         494.9496
8         -90.6727
9         521.1276
10        639.0890
11          0.0000
12       2353.6500
13      -2234.8160
14      -5407.4959
15      -5302.1153   # This is (683959.75 - 689261.86) 5th ID value - 1st ID value for UVWXYZ

Kindly guide

Regards
Katherine
Re: [R] Counting number of consecutive occurrences per rows
Hi

rrr <- rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
test$res <- rep(rrr$lengths, rrr$lengths)

If you put it in a function

fff <- function(x, limits=c(0,1,199,200)) {
  rrr <- rle(as.numeric(cut(x, limits, include.lowest=T)))
  res <- rep(rrr$lengths, rrr$lengths)
  res
}

you can use the split/lapply approach

test$res2 <- unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))

Beware of the correct ordering of days in the output. Without correct leveling of the factor, 0 precedes 1.

And for the last part aggregate can probably be the way.

aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), max)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 3
3   14655 (199,200] 3

aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), min)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 1
3   14655 (199,200] 2

Regards
Petr

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of zuzana zajkova
Sent: Monday, April 29, 2013 12:45 PM
To: r-help@r-project.org
Subject: [R] Counting number of consecutive occurrences per rows

Hi,

I would appreciate it if somebody could help me with the following calculation. I have a data frame with one row per 10 minutes, covering most of one year. This is a small example:

dput(test)
structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

Looks like this:

test
      jul                time act day
510 14655 2010-02-15 18:25:54 130   1
512 14655 2010-02-15 18:35:54  23   1
514 14655 2010-02-15 18:45:54  45   1
516 14655 2010-02-15 18:55:54 200   1
518 14655 2010-02-15 19:05:54 200   1
520 14655 2010-02-15 19:15:54 200   1
522 14655 2010-02-15 19:25:54 199   1
524 14655 2010-02-15 19:35:54 150   1
526 14655 2010-02-15 19:45:54   0   1
528 14655 2010-02-15 19:55:54   0   1
530 14655 2010-02-15 20:05:54   0   0
532 14655 2010-02-15 20:15:54   0   0
534 14655 2010-02-15 20:25:54  34   0
536 14655 2010-02-15 20:35:54 200   0
538 14655 2010-02-15 20:45:54 200   0
540 14655 2010-02-15 20:55:54 145   0

What I would like to calculate is the number of consecutive occurrences of the values 200, 0, and (taken together) the values from 1 to 199 (in fact, the values that differ from 200 and 0) in column act.

I would like to get something like this (result$res):

result
      jul                time act day res res2
510 14655 2010-02-15 18:25:54 130   1   3    3
512 14655 2010-02-15 18:35:54  23   1   3    3
514 14655 2010-02-15 18:45:54  45   1   3    3
516 14655 2010-02-15 18:55:54 200   1   3    3
518 14655 2010-02-15 19:05:54 200   1   3    3
520 14655 2010-02-15 19:15:54 200   1   3    3
522 14655 2010-02-15 19:25:54 199   1   2    2
524 14655 2010-02-15 19:35:54 150   1   2    2
526 14655 2010-02-15 19:45:54   0   1   4    2
528 14655 2010-02-15 19:55:54   0   1   4    2
530 14655 2010-02-15 20:05:54   0   0   4    2
532 14655 2010-02-15 20:15:54   0   0   4    2
534 14655 2010-02-15 20:25:54  34   0   1    1
536 14655 2010-02-15 20:35:54 200   0   2    2
538 14655 2010-02-15 20:45:54 200   0   2    2
540 14655 2010-02-15 20:55:54 145   0   1    1

And if possible, distinguish between day==1 and day==0 (see the act values of 0 for example), with results as in result$res2.

After that I would like to make a summary table per day (jul):
where maxres is max(result$res) for the act value
where minres is min(result$res) for the act value
where sumres is sum(result$res) for the act value (for example, if the 200 value occurs at different times per day (jul) consecutively 3, 5, 1, 6 and 7 times, the sumres would be 3+5+1+6+7 = 22)

something like this (these are made-up numbers):

  jul   act maxres minres sumres
14655     0      4      1     25
14655   200      3      2     48
14655 1-199      3      1     71
14656     0      8      2     38
14656   200     15      3     60
14656 1-199     11      4     46
...

(theoretically the sum of sumres per
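The run-length idea from the replies can be sketched in a self-contained form as follows. This is only an illustration on the act and day columns of the sample data: the names key, run_len and summary_tab are made up here, and the classification uses exact comparisons with 0 and 200 instead of cut(), so that a value of 1 would be grouped with 1-199.

```r
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)

# Three categories: exactly 0, exactly 200, everything else.
key <- ifelse(act == 0, "0", ifelse(act == 200, "200", "1-199"))

# Length of the run each element belongs to.
run_len <- function(x) {
  r <- rle(x)
  rep(r$lengths, r$lengths)
}

res  <- run_len(key)              # runs over the whole series
res2 <- run_len(paste(key, day))  # a run also ends when day changes

# Per-category summary, counting every run once.
runs <- rle(key)
summary_tab <- data.frame(
  act    = sort(unique(runs$values)),
  maxres = as.vector(tapply(runs$lengths, runs$values, max)),
  minres = as.vector(tapply(runs$lengths, runs$values, min)),
  sumres = as.vector(tapply(runs$lengths, runs$values, sum))
)
```

On this sample res and res2 reproduce the two columns requested in the question. The summary is computed from the runs themselves, so sumres adds each run length once, as in the 3+5+1+6+7 example; for a full year the same steps would have to be applied separately for each value of jul.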
[R] Arma - estimate of variance of white noise variables
Hi all,

Suppose I am fitting an ARMA(p,q) model to a time series y_t. So, my model should contain (q+1) white noise variables. As far as I know, each of them should have the same variance. How do I get the estimate of this variance by running the arma(y) function (or is there any other way)?

I appreciate your help.

Thanks,
Preetam

--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.
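One possible way to read off this variance is sketched below with arima() from the stats package, which is an assumption: it is not the arma() function mentioned in the question, and the series is simulated purely for illustration. An ARMA(p,q) model is driven by a single white noise process whose current and q lagged values enter the equation, so there is one innovation variance to estimate, and arima() stores its estimate in the sigma2 component of the fit.

```r
set.seed(1)

# Simulated ARMA(1,1) series with unit innovation variance (illustration only).
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)

# Fit ARMA(1,1): order = c(p, d, q) with d = 0.
fit <- arima(y, order = c(1, 0, 1))

# Estimated variance of the white noise (innovation) process.
sigma2_hat <- fit$sigma2
sigma2_hat
```

With 500 observations the estimate should land reasonably close to the true value of 1 used in the simulation.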
Re: [R] Hi
Fatos Baruti fatosbaruti at gmail.com writes:

What is the R code for computing the autocovariance and autocorrelation of these data?

a <- c(2,3.5,3.5,2.2,2.2,3.3,2.5,2.5,3.2,2.5,2.5,2.7,1.7,2.7,2.9,2.3,2.7,3,
1.8,2.5,3.1,2.5,2.5,3.2,2.7,1.9,2.6,2.3,2.7,3.2,2.2,1.5,2.3,2.6,2.5,2.9,
2,2.5,2.6,2.4,2.6,2.8,2.5,2.6,3.2,1.8,2.7,3.4,2.2,2.9,3.2)

How hard did you look for an answer before posting to the list ... ?

?acf
acf(a)
acf(a)$acf
acf(a)$acf*var(a)
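As a small complement, acf() can also return the autocovariances directly through its type argument. The sketch below uses only the first nine values of the series for brevity; the names rho and gamma are chosen here for illustration.

```r
a <- c(2, 3.5, 3.5, 2.2, 2.2, 3.3, 2.5, 2.5, 3.2)
n <- length(a)

# Autocorrelations and autocovariances without drawing the plot.
rho   <- drop(acf(a, plot = FALSE)$acf)
gamma <- drop(acf(a, type = "covariance", plot = FALSE)$acf)

# acf() divides by n, var() by n - 1, so the lag-0 autocovariance
# equals var(a) * (n - 1) / n rather than var(a) itself.
gamma[1]
var(a) * (n - 1) / n
```

For that reason acf(a)$acf * var(a) is a close approximation to the autocovariances, differing from type = "covariance" by the factor n / (n - 1).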
[R] cannot compile R on Cray XE6 HLRS HERMIT
Dear All,

I am trying to compile R-3.0 on Cray XE6 (HLRS) HERMIT, no success so far. Here is my experience:

I use this to configure and make R:

CC=cc \
CXX=CC \
F77=ftn \
FC=ftn \
CPPFLAGS=-I$PREFIX/include \
LDFLAGS=-L$PREFIX/lib${LIBDIRSUFFIX} \
./configure --prefix=$PREFIX \
  --exec-prefix=$PREFIX \
  --bindir=$PREFIX/bin \
  --sbindir=$PREFIX/sbin \
  --sysconfdir=$PKG/etc \
  --localstatedir=$PKG/var \
  --libdir=$PREFIX/lib${LIBDIRSUFFIX} \
  --datarootdir=$PREFIX/share \
  --datadir=$PREFIX/share \
  --infodir=$PREFIX/info \
  --mandir=$PREFIX/man \
  --docdir=$PREFIX/doc/$PRGNAM-$VERSION \
  rdocdir=$PREFIX/doc/$PRGNAM-$VERSION \
  rincludedir=$PREFIX/include \
  rsharedir=$PREFIX/share \
  --disable-BLAS-shlib \
  --with-blas \
  --with-lapack \
  --without-x \
  || exit 1
make || exit 1

My environment is as follows:

 1) modules/3.2.6.7
 2) xtpe-network-gemini
 3) xtpe-interlagos
 4) cray-mpich2/5.6.4
 5) eswrap/1.0.9
 6) torque/2.5.9
 7) moab/6.1.5.s1992
 8) system/ws_tools
 9) system/hlrs-defaults
10) xt-asyncpe/5.19
11) gcc/4.7.2
12) xt-libsci/12.0.01
13) udreg/2.3.2-1.0401.5929.3.3.gem
14) ugni/4.0-1.0401.5928.9.5.gem
15) pmi/4.0.1-1..9421.73.3.gem
16) dmapp/3.2.1-1.0401.5983.4.5.gem
17) gni-headers/2.1-1.0401.5675.4.4.gem
18) xpmem/0.1-2.0401.36790.4.3.gem
19) job/1.5.5-0.1_2.0401.35380.1.10.gem
20) csa/3.0.0-1_2.0401.37452.4.50.gem
21) dvs/1.8.6_0.9.0-1.0401.1401.1.120
22) rca/1.0.0-2.0401.38656.2.2.gem
23) audit/1.0.0-1.0401.37969.2.32.gem
24) ccm/2.2.0-1.0401.37254.2.142
25) configuration/1.0-1.0401.35391.1.2.gem
26) hosts/1.0-1.0401.35364.1.115.gem
27) lbcd/2.1-1.0401.35360.1.2.gem
28) nodehealth/5.0-1.0401.38460.12.18.gem
29) pdsh/2.26-1.0401.37449.1.1.gem
30) shared-root/1.0-1.0401.37253.3.50.gem
31) switch/1.0-1.0401.36779.2.72.gem
32) xe-sysroot/4.1.40
33) atp/1.6.2

1. PrgEnv-gnu/4.1.40

checking for C libraries of cc -std=gnu99...
-L/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/system/usr/lib64 -L/opt/cray/udreg/2.3.2-1.0401.5929.3.3.gem/lib64 -L/opt/cray/ugni/4.0-1.0401.5928.9.5.gem/lib64 -L/opt/cray/pmi/4.0.1-1..9421.73.3.gem/lib64 -L/opt/cray/dmapp/3.2.1-1.0401.5983.4.5.gem/lib64 -L/opt/cray/xpmem/0.1-2.0401.36790.4.3.gem/lib64 -L/opt/cray/rca/1.0.0-2.0401.38656.2.2.gem/lib64 -L/opt/cray/mpt/5.6.4/gni/mpich2-gnu/47/lib -L/opt/cray/libsci/12.0.01/gnu/47/interlagos/lib -L/opt/cray/xe-sysroot/4.1.40/usr/lib64 -L/opt/cray/xe-sysroot/4.1.40/lib64 -L/opt/cray/xe-sysroot/4.1.40/usr/lib/alps -L/usr/lib/alps -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2 -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../.. -lrca -L/opt/cray/atp/1.6.2/lib/ -lAtpSigHCommData -lAtpSigHandler -lgfortran -lscicpp_gnu -lsci_gnu_mp -lstdc++ -lmpich_gnu_47 -lmpl -lrt -lxpmem -ldmapp -lugni -lpmi -lalpslli -lalpsutil -ludreg -lpthread -lm -lgomp -lgcc_eh
checking for dummy main to link with Fortran 77 libraries... none
checking for Fortran 77 name-mangling scheme... lower case, underscore, no extra underscore
checking whether ftn appends underscores to external names... yes
checking whether ftn appends extra underscores to external names... no
checking whether mixed C/Fortran code can be run... yes
checking whether ftn and cc -std=gnu99 agree on int and double... configure: WARNING: ftn and cc -std=gnu99 disagree on int and double
configure: error: Maybe change CFLAGS or FFLAGS?

2. PrgEnv-gnu/4.1.40 + craype-target-native

R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/system/usr

  C compiler:                cc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2

  C++ compiler:              CC -g -O2
  Fortran 90/95 compiler:    ftn -g -O2
  Obj-C compiler:

  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   JPEG
  Options enabled:           shared R library, shared BLAS, R profiling, memory profiling, strict barrier, static HTML

  Recommended packages:      yes

nmath/*.o` ../extra/zlib/libz.a ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a ../extra/tre/libtre.a ../extra/xz/liblzma.a -L../../lib -lRblas -lgfortran -lm -lquadmath -lreadline -lncurses -lrt -ldl -lm
make[4]: Entering directory `/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/src/main'
mkdir -p -- /univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/bin/exec
make[4]: Leaving directory `/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/src/main'
make[3]: Leaving
Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame
If this is a homework problem, there is a no homework policy on this list.

-- Bert
[R] Need help on matrix calculation
Hello again,

Let's say I have a matrix:

Mat <- matrix(1:12, 4, 3)
rownames(Mat) <- letters[1:4]

Now I want to subset Mat in the following way:

Subscript_Vec <- c("a", "e", "b", "c")

However, when I use this vector, I get the following error:

Mat[Subscript_Vec, ]
Error: subscript out of bounds

Basically I want to get my final matrix in the following way:

  V1 V2 V3
a  1  5  9
e NA NA NA
b  2  6 10
c  3  7 11

i.e. if some element(s) of 'Subscript_Vec' are not in 'Mat', then that row should be filled with NA, WITHOUT altering the sequence of 'Subscript_Vec'.

Is there any direct way to achieve that?

Thanks and regards,
Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame
Hi Katherine,

res1 <- aggregate(cbind(cashflow,cashflows_pv)~instrument+id,data=cashflow_df,sum)
res2 <- res1[order(res1$instrument),]
res2$cashflow_change <- with(res2,ave(cashflows_pv,instrument,FUN=function(x) x-head(x,1)))
names(res2)[3:4] <- paste0("total_",names(res2)[3:4])
res2
#   instrument id total_cashflow total_cashflows_pv cashflow_change
#1         ABC  1         515000          440571.02          0.0000
#4         ABC  2         515000          441481.62        910.6040
#7         ABC  3         515000          442068.63       1497.6102
#10        ABC  4         515000          441677.15       1106.1318
#13        ABC  5         515000          442133.93       1562.9115
#2         PQR  1         103500           83674.96          0.0000
#5         PQR  2         103500           84169.91        494.9496
#8         PQR  3         103500           83584.29        -90.6727
#11        PQR  4         103500           84196.09        521.1276
#14        PQR  5         103500           84314.05        639.0890
#3      UVWXYZ  1         816000          689261.86          0.0000
#6      UVWXYZ  2         816000          691615.51       2353.6500
#9      UVWXYZ  3         816000          687027.05      -2234.8160
#12     UVWXYZ  4         816000          683854.37      -5407.4959
#15     UVWXYZ  5         816000          683959.75      -5302.1153

A.K.
Re: [R] Need help on matrix calculation
Christofer,

The following should get you started:

r <- Mat[match(rownames(Mat), Subscript_Vec),]
rownames(r) <- Subscript_Vec
r

HTH,
Jorge.-
Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame
You can also use:

library(plyr)
res <- mutate(ddply(cashflow_df,.(instrument,id),numcolwise(sum)),cashflow_change=ave(cashflows_pv,instrument,FUN=function(x) x-head(x,1)))
names(res)[3:4] <- paste0("total_",names(res)[3:4])
res
#   instrument id total_cashflow total_cashflows_pv cashflow_change
#1         ABC  1         515000          440571.02          0.0000
#2         ABC  2         515000          441481.62        910.6040
#3         ABC  3         515000          442068.63       1497.6102
#4         ABC  4         515000          441677.15       1106.1318
#5         ABC  5         515000          442133.93       1562.9115
#6         PQR  1         103500           83674.96          0.0000
#7         PQR  2         103500           84169.91        494.9496
#8         PQR  3         103500           83584.29        -90.6727
#9         PQR  4         103500           84196.09        521.1276
#10        PQR  5         103500           84314.05        639.0890
#11     UVWXYZ  1         816000          689261.86          0.0000
#12     UVWXYZ  2         816000          691615.51       2353.6500
#13     UVWXYZ  3         816000          687027.05      -2234.8160
#14     UVWXYZ  4         816000          683854.37      -5407.4959
#15     UVWXYZ  5         816000          683959.75      -5302.1153

A.K.
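The two steps used in the replies (sum per instrument and id, then subtract the first id's total within each instrument) can be tried on a toy data frame. The data and the names toy, totals and change below are invented for illustration and are not the figures from the thread.

```r
# Toy data: two instruments, two ids each, two cash flows per id.
toy <- data.frame(
  instrument = c("A", "A", "A", "A", "B", "B", "B", "B"),
  id         = c(1, 1, 2, 2, 1, 1, 2, 2),
  pv         = c(10, 20, 15, 25, 1, 2, 3, 5)
)

# Step 1: totals per instrument and id.
totals <- aggregate(pv ~ instrument + id, data = toy, FUN = sum)

# Step 2: sort so that the first row of each instrument is its first id,
# then subtract that baseline within each instrument.
totals <- totals[order(totals$instrument, totals$id), ]
totals$change <- ave(totals$pv, totals$instrument,
                     FUN = function(x) x - x[1])
totals
```

The explicit sort on id is a design choice: x[1] is only the "first ID" baseline if the rows of each instrument are ordered by id before ave() is applied.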
Re: [R] Need help on matrix calculation
    r
    #  [,1] [,2] [,3]
    #a    1    5    9
    #e    3    7   11
    #b    4    8   12
    #c   NA   NA   NA

I guess you meant:

    r1 <- Mat[match(Subscript_Vec, rownames(Mat)), ]
    rownames(r1) <- Subscript_Vec
    r1
    #  [,1] [,2] [,3]
    #a    1    5    9
    #e   NA   NA   NA
    #b    2    6   10
    #c    3    7   11

A.K.

----- Original Message -----
From: Jorge I Velez jorgeivanve...@gmail.com
To: Christofer Bogaso bogaso.christo...@gmail.com
Cc: r-help r-help@r-project.org
Sent: Monday, April 29, 2013 9:45 AM
Subject: Re: [R] Need help on matrix calculation

Christofer,

The following should get you started:

    r <- Mat[match(rownames(Mat), Subscript_Vec), ]
    rownames(r) <- Subscript_Vec
    r

HTH,
Jorge.-

On Mon, Apr 29, 2013 at 11:38 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote:

Hello again,

Let's say I have 1 matrix:

    Mat <- matrix(1:12, 4, 3)
    rownames(Mat) <- letters[1:4]

Now I want to subscript Mat in the following way:

    Subscript_Vec <- c("a", "e", "b", "c")

However, when I use this vector, I get the following error:

    Mat[Subscript_Vec, ]
    Error: subscript out of bounds

Basically I want to get my final matrix in the following way:

       V1 V2 V3
    a   1  5  9
    e  NA NA NA
    b   2  6 10
    c   3  7 11

i.e. if some of the element(s) in 'Subscript_Vec' is not in 'Mat' then that row would be filled by NA, WITHOUT altering the sequence of 'Subscript_Vec'.

Is there any direct way to achieve that?

Thanks and regards,
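The difference between the two versions in this thread is only the argument order of match(). A short sketch, using the poster's own objects, of why the order matters: match(x, table) returns, for each element of x, its position in table, so the wanted row names go first and the existing row names second. The intermediate name `idx` and the use of `drop = FALSE` are additions for illustration.

```r
Mat <- matrix(1:12, 4, 3)
rownames(Mat) <- letters[1:4]
Subscript_Vec <- c("a", "e", "b", "c")

# Position of each wanted name among the existing row names; "e" is absent, so NA
idx <- match(Subscript_Vec, rownames(Mat))
idx

# Indexing with NA yields a row of NA; drop = FALSE keeps a matrix even for one row
r1 <- Mat[idx, , drop = FALSE]
rownames(r1) <- Subscript_Vec
r1
```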
Re: [R] all.vars for nested expressions
Hi Felix,

I thought this could be an easy task for substitute, and the following works as expected:

    all.vars(substitute(expression(tp/a), list(a = expression(fn+tp))))
    # [1] "tp" "fn"

But (of course)

    all.vars(substitute(sen, list(a = a)))

does not yield the desired result, and I can't figure out how to set up as.name, bquote, eval, deparse etc. to do the task properly. Instead, my approach is a recursive call to all.vars:

    xall.help <- function(x){
      # check if there is an object with name x
      if (exists(x)) lapply(all.vars(get(x)), xall.help) else x
    }
    xall.vars <- function(x){
      if (!is.character(x)) x <- paste(substitute(x))
      # for convenience put in a single vector
      # xall.help returns a 'parsed tree'
      unique(unlist(xall.help(x)))
    }

    # example
    fn <- expression(n1+n2)
    a <- expression(fn+tp)
    sen <- expression(tp/a)
    xall.vars(sen)
    # [1] "tp" "n1" "n2"

cheers.

On 29.04.2013 13:33, flxms wrote:

Dear R fellows,

Assume I define

    a <- expression(fn+tp)
    sen <- expression(tp/a)

Now I'd like to know which variables are necessary for calculating sen:

    all.vars(sen)

This results in a vector c("tp","a"). But I'd like all.vars to evaluate the sen-object down to the ground level, which would result in a vector c("tp","fn") (because a was defined as fn+tp). In other words, I'd like all.vars to expand the a-object (and all other downstream objects).

I am looking for a solution that works with many more levels. This is just a very simple example. I'd appreciate any suggestions how to do that very much!

Thanks in advance,
Felix

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
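A variation on the recursive idea, sketched under the assumption that the definitions can be collected in a named list instead of being looked up in the workspace. This avoids exists() matching unrelated objects that happen to share a name, and the `seen` argument stops the recursion if two definitions refer to each other. The function name `expand_vars` and the list `defs` are hypothetical.

```r
# Expand a name into the ground-level variables it depends on.
# `defs` is a named list of expressions; names not in `defs` are ground level.
expand_vars <- function(name, defs, seen = character()) {
  if (!(name %in% names(defs)) || name %in% seen) return(name)
  parts <- lapply(all.vars(defs[[name]]), expand_vars,
                  defs = defs, seen = c(seen, name))
  unique(unlist(parts))
}

defs <- list(
  fn  = expression(n1 + n2),
  a   = expression(fn + tp),
  sen = expression(tp / a)
)

expand_vars("sen", defs)
```

With these definitions the call returns tp, n1 and n2, the same set the workspace-based version above reports.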
[R] rbinding some elements from a list and obtain another list
Hi everybody,

I have a list, where every element of this list is a data frame. An example:

    Mylist <- list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I want to rbind some elements of this list. As an example:

    Output <- list(AB=data.frame, CD=data.frame)

where

    AB = rbind(A,B)
    CD = rbind(C,D)

I've tried:

    f <- function(x){
      for (i in seq(1,length(names(x)),2)){
        aa <- do.call(rbind,x[i:i+1])
        aa
      }}
    bb <- f(mylist)

or

    f <- function(x){
      for (i in seq(1,length(names(x)),2)){
        aa[i] <- do.call(rbind,x[i:i+1])
        list(aa[i])
      }}
    bb <- f(mylist)

but it doesn't work:

    > f <- function(x){
    + for (i in seq(1,length(names(x)),2)){
    + aa <- do.call(rbind,x[i:i+1])
    + aa
    + }}
    > bb <- f(mylist)
    > bb
    NULL
    > f <- function(x){
    + for (i in seq(1,length(names(x)),2)){
    + aa[i] <- do.call(rbind,x[i:i+1])
    + list(aa[i])
    + }}
    > bb <- f(mylist)
    Lost warning messages
    1: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length
    2: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length
    3: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length
    4: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length
    5: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length
    6: In aa[i] <- do.call(rbind, x[i:i + 1]) : number of items to replace is not a multiple of replacement length

Thanks!
Montserrat
Re: [R] prcomp( and cmdscale( not equivalent?
I may not understand completely, but it seems you have a 45x45 distance matrix of stimuli and you want to use it to determine which stimuli are similar. Wouldn't hierarchical clustering be a more straightforward approach?

    ?hclust

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bob Wiley
Sent: Friday, April 26, 2013 4:33 PM
To: r-help@r-project.org
Subject: [R] prcomp( and cmdscale( not equivalent?

Hello,

I have a dilemma that I'm hoping the R gurus will be able to help resolve.

For background: My data is in the form of a (dis)similarity matrix created from taking the inverse of normalized reaction times. That is, each cell of the matrix represents how long it took to distinguish two stimuli from one another: a square matrix of 45x45 where the diagonal values are all zero (since this represents two identical stimuli).

I have been using cmdscale with this matrix as the input. So:

    X = cmdscale(mydata, k=44, add=FALSE, eig=TRUE)$points

returns a 45x34 matrix because only 34 of the eigenvalues are > 0.

I then run prcomp on the (transposition of) this matrix:

    prcomp(t(X), scale.=TRUE)

The goal is to take the original matrix of inverse reaction times and transform that data such that we have PCs that show how stimuli are grouping together: high absolute value loadings/coordinates on a given dimension should reflect how similar the stimuli are to one another.

My concern is that I'm not fully understanding the mathematics behind cmdscale( and prcomp(, and that I may just be losing a lot of information or introducing noise? Or is my approach theoretically sound... I've read a TON on this now but I can't see exactly what R is doing with these two functions.

thank you!
-bob

JHU
Robert (Bob) Wiley
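On the question of what the two functions do: for Euclidean distances computed from a coordinate matrix, classical MDS (cmdscale) and PCA (prcomp) return the same configuration up to the sign of each axis. The sketch below checks this on simulated data. The matrix `X` is made up, and the equivalence is not guaranteed for a dissimilarity matrix built from reaction times, which need not be Euclidean (a non-Euclidean matrix is what produces non-positive eigenvalues in cmdscale).

```r
set.seed(1)
X <- matrix(rnorm(10 * 3), nrow = 10)   # 10 hypothetical stimuli in 3 dimensions

pc  <- prcomp(X)$x                      # PCA scores (data centered by default)
mds <- cmdscale(dist(X), k = 3)         # classical MDS on Euclidean distances

# Largest difference after ignoring the arbitrary sign of each axis
max(abs(abs(unname(pc)) - abs(unname(mds))))
```

If that holds for a given data set, the cmdscale coordinates are already principal axes, so a second prcomp step on them would mostly rescale (with scale.=TRUE) rather than reveal new structure.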
[R] Function for Data Frame
Dear R Helpers,

I have about 20 data frames that I need to do a series of data scrubbing steps to. I have the list of data frames in a list so that I can use lapply. I am trying to build a function that will do the data scrubbing that I need. However, I am new to functions and there is something fundamental that I am not understanding. I use the return function at the end of the function and this completes the data processing specified in the function, but leaves the data frame that I want changed unaffected. How do I get my function to apply its results to the data frame in question instead of simply displaying the results to the screen?

Any helpful guidance would be most appreciated.

--John Sparks

    x = as.data.frame(matrix(c(1,2,3,
                               1,2,3,
                               1,2,2,
                               1,2,2,
                               1,1,1), ncol=3, byrow=T))

    myfunc <- function(DF){
      DF <- subset(DF, select=-c(V1))
      return(DF)
    }
    myfunc(x)

    #How to get this change to data frame x?
    #And preferably not send the results to the screen?
    x
Re: [R] Need help on matrix calculation
Sorry, the first line should have been

    Mat[match(Subscript_Vec, rownames(Mat)), ]

and the rest remains the same.

Best,
Jorge.-
Re: [R] all.vars for nested expressions
Try poking around in the codetools package. E.g., you can do things like the following:

    expr1 <- quote(a <- fn + tp)   # put 'a' in the expression
    expr2 <- quote(tp / a + fn)
    expr12 <- call("{", expr1, expr2)
    expr12
    # {
    #     a <- fn + tp
    #     tp/a + fn
    # }
    library(codetools)
    findLocals(expr12)   # from codetools
    # [1] "a"
    setdiff(all.vars(expr12), findLocals(expr12))
    # [1] "fn" "tp"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
[R] expanding a presence only dataset into presence/absence
Hello,

I'm working with a very large dataset (250,000+ lines in its current form) that includes presence only data on various species (which is nested within different sites and sampling dates). I need to convert this into a dataset with presence/absence for each species. For example, I would like to expand "My current data" to "Desired data":

My current data

    Species Site Date
    a       1    1
    b       1    1
    b       1    2
    c       1    3

Desired data

    Species Present Site Date
    a       1       1    1
    b       1       1    1
    c       0       1    1
    a       0       2    2
    b       1       2    2
    C       0       2    2
    a       0       3    3
    b       0       3    3
    c       1       3    3

I've scoured the web, including Rseek, and haven't found a resolution (and note that a similar question was asked sometime in 2011 without an answer).

Does anyone have any thoughts? Thank you in advance.

--
Matthew D. Venesky, Ph.D.
Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620
Website: http://mvenesky.myweb.usf.edu/
Re: [R] Function for Data Frame
Hi,

If it is for the list:

    lst1 <- list(x, x, x)
    lst1 <- lapply(lst1, myfunc)
Re: [R] how to add new rows in a dataframe?
Hi,

    dat1 <- read.table(text = "
    id t scores
    2 0 1.2
    2 2 2.3
    2 3 3.6
    2 4 5.6
    2 6 7.8
    3 0 1.6
    3 1 1.2
    3 4 1.5
    ", sep = "", header = TRUE)

    library(zoo)
    res1 <- do.call(rbind, lapply(split(dat1, dat1$id), function(x) {
      t1 <- seq(min(x$t), max(x$t))
      scores1 <- na.locf(x$scores[match(t1, x$t)])
      data.frame(id = rep(unique(x$id), length(t1)), t1, scores1)
    }))
    row.names(res1) <- 1:nrow(res1)
    res1
    #   id t1 scores1
    #1   2  0     1.2
    #2   2  1     1.2
    #3   2  2     2.3
    #4   2  3     3.6
    #5   2  4     5.6
    #6   2  5     5.6
    #7   2  6     7.8
    #8   3  0     1.6
    #9   3  1     1.2
    #10  3  2     1.2
    #11  3  3     1.2
    #12  3  4     1.5

    library(plyr)
    dat2 <- ddply(dat1, .(id), summarize, t = seq(min(t), max(t)))
    res2 <- mutate(join(dat2, dat1, type = "full"), scores = na.locf(scores))
    identical(res1, res2)
    #[1] TRUE
    res2
    #   id t scores
    #1   2 0    1.2
    #2   2 1    1.2
    #3   2 2    2.3
    #4   2 3    3.6
    #5   2 4    5.6
    #6   2 5    5.6
    #7   2 6    7.8
    #8   3 0    1.6
    #9   3 1    1.2
    #10  3 2    1.2
    #11  3 3    1.2
    #12  3 4    1.5

A.K.

Hello, dear experts,

I have my data like this:

    id t scores
    2  0 1.2
    2  2 2.3
    2  3 3.6
    2  4 5.6
    2  6 7.8
    3  0 1.6
    3  1 1.2
    3  4 1.5

I want to fill in the missing values of t, so I want to add rows that carry the data of (t-1), to get another data frame like this:

    id t scores
    2  0 1.2
    2  1 1.2
    2  2 2.3
    2  3 3.6
    2  4 5.6
    2  5 5.6
    2  6 7.8
    3  0 1.6
    3  1 1.2
    3  2 1.2
    3  4 1.5

How can I get a result like this? In reality, I have 4000 observations, so it's difficult to add the lines manually.

Thank you so much.
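If installing zoo or plyr is not an option, the same gap filling can be sketched in base R. This is only an illustration under the assumption that the first t of every id has a score (true for the posted data); the helper name `fill_gaps` is hypothetical. The carry-forward step uses cummax() over the positions of the non-missing values, so each NA picks up the index of the most recent observed score.

```r
dat1 <- data.frame(
  id     = c(2, 2, 2, 2, 2, 3, 3, 3),
  t      = c(0, 2, 3, 4, 6, 0, 1, 4),
  scores = c(1.2, 2.3, 3.6, 5.6, 7.8, 1.6, 1.2, 1.5)
)

fill_gaps <- function(d) {
  # every t from the first to the last observed value for this id
  full <- data.frame(id = d$id[1], t = seq(min(d$t), max(d$t)))
  full$scores <- d$scores[match(full$t, d$t)]   # NA where t was not observed
  # index of the most recent non-missing score (last observation carried forward)
  last_seen <- cummax(ifelse(is.na(full$scores), 0L, seq_along(full$scores)))
  full$scores <- full$scores[last_seen]
  full
}

res <- do.call(rbind, lapply(split(dat1, dat1$id), fill_gaps))
rownames(res) <- NULL
res
```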
Re: [R] rbinding some elements from a list and obtain another list
On Apr 29, 2013, at 6:54 AM, De Castro Pascual, Montserrat wrote:

Hi everybody, I have a list, where every element of this list is a data frame. An example:

    Mylist <- list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I'm looking at this apparently malformed command and wondering if that is the root of all your problems. Do you know how to make a simple successful example of a list of dataframes?

--
David.

David Winsemius
Alameda, CA, USA
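Two details of the attempted loop are worth spelling out for anyone reading along. First, `:` binds more tightly than `+`, so `x[i:i+1]` selects the single element `i+1` rather than the pair `i, i+1`. Second, a `for` loop evaluates to NULL, so a function whose last expression is a loop returns NULL. A minimal sketch with a made-up list of four small data frames (the list contents and the names `starts` and `out` are hypothetical):

```r
i <- 1
i:i + 1      # parsed as (i:i) + 1, which is 2
i:(i + 1)    # the intended pair: 1 2

# Hypothetical list of four small data frames
mylist <- list(
  A = data.frame(v = 1:2), B = data.frame(v = 3:4),
  C = data.frame(v = 5:6), D = data.frame(v = 7:8)
)

# lapply collects one result per starting index, so nothing is lost
starts <- seq(1, length(mylist), by = 2)
out <- lapply(starts, function(i) do.call(rbind, mylist[i:(i + 1)]))
names(out) <- sapply(starts, function(i) paste(names(mylist)[i:(i + 1)], collapse = ""))
str(out)
```

This assumes the list has an even number of elements; with an odd count the last index would run past the end of the list.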
Re: [R] Stratified Random Sampling Proportional to Size
This problem in sampling::strata() comes from calling cbind on a zero-row data.frame with a scalar number.

    library(sampling)
    strata(mtcars[,c("mpg","hp","gear")], strat="gear", size=c(5,5,0))
    Error in data.frame(..., check.names = FALSE) :
      arguments imply differing number of rows: 0, 1
    In addition: Warning message:
    In strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,  :
      the method is not specified; by default, the method is srswor
    traceback()
    5: stop("arguments imply differing number of rows: ", paste(unique(nrows), collapse = ", "))
    4: data.frame(..., check.names = FALSE)
    3: cbind(deparse.level, ...)
    2: cbind(r, i)
    1: strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5, 5, 0))

Changing that cbind call from

    cbind(r, i)

to

    cbind(r, rep(i, length.out=nrow(r)))

would fix it up.

cbind is not entirely consistent with what it does with a 0-row rectangular input and a scalar. With a matrix you get a 0-row result and a warning:

    m <- matrix(numeric(), nrow=0, ncol=3, dimnames=list(NULL,paste0("Col",1:3)))
    str(cbind(m, 666))
     num[0 , 1:4]
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:4] "Col1" "Col2" "Col3" ""
    Warning message:
    In cbind(m, 666) :
      number of rows of result is not a multiple of vector length (arg 2)

With a data.frame you get an error:

    str(cbind(data.frame(m), 666))
    Error in data.frame(..., check.names = FALSE) :
      arguments imply differing number of rows: 0, 1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas Lumley
Sent: Sunday, April 28, 2013 1:31 PM
To: Jeff Newmiller
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

It looks as though you can't sample zero observations from a stratum. If you take the example on the help page and change one of the sample sizes to zero you get exactly the same error.

From the fact that there isn't a more explicit error message, I would guess that the author just never considered the possibility that someone would have a population stratum and not sample from it.

-thomas

On Sun, Apr 28, 2013 at 7:14 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]

c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:

    stratum_cp <- stratum_cp[ 0 < stratum_cp$stratp, ]

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

On Fri, 26 Apr 2013, Lopez, Dan wrote:

Hello R Experts,

I kindly request your assistance on figuring out how to get a stratified random sampling proportional to 100. Below is my R code showing what I did and the error I'm getting with sampling::strata

    # FIRST I summarized count of records by the two variables I want to use as strata
    library(RODBC)
    library(sqldf)
    library(sampling)
    #After establishing connection I query the data and sort it by strata APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
    CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
    #ROWID is a dummy ID I added and repositioned after the strat columns for later use
    CURRPOP$ROWID <- seq(nrow(CURRPOP))
    CURRPOP <- CURRPOP[,c(1:2,11,3:10)]
    # My strata. Stratp is how many I want to sample from each strata. NOTE THERE ARE SOME 0's which just means I won't sample from that group.
    stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
    stratum_cp$stratp <- round(stratum_cp$HC/nrow(CURRPOP)*100)
    stratum_cp
       APPT_TYP_CD_LL EMPL_TYPE   HC stratp
    1              FA         S    1      0
    2              FC         S    5      0
    3              FP         S  173      3
    4              FR         H  170      3
    5              FX         H   49      1
    6              FX         S   57      1
    7              IN         H 1589     25
    8              IN         S 3987     63
    9              IP         H    7      0
    10             IP         S   53      1
    11             SA         H    8      0
    12             SE         S   43      1
    13             SF         H   14      0
    14             SF         S    1      0
    15             SG         S   10      0
    16             ST         H  107      2
    17             ST         S    6      0

    #THEN I attempted to use
[R] plspm error: singular matrix 'a' in 'solve'
Hello,

I am running a simple plspm for a class project due later today and I am receiving the following error despite following along exactly with Gaston Sanchez's directions in "PLS Path Modeling with R":

    Error in solve.qr(qr(X.blok), Z[, j]) : singular matrix 'a' in 'solve'

I would greatly appreciate any help resolving this matter. I got the same error after changing the inner model matrix to be the same as the model Sanchez uses in his first example. The package seems to be working because innerplot() has worked. My code is below.

Thanks very much!

-Mitch Hunter

    Early = c(0, 0, 0)
    Late = c(0, 0, 0)
    Weediness = c(1, 1, 0)

    wd.inner = rbind(Early, Late, Weediness)
    colnames(wd.inner) = rownames(wd.inner)

    innerplot(wd.inner, box.size = 0.1)

    wd.outer = list(2:5, 6:9, 10)
    wd.modes = c("B", "B", "A")

    wd.pls1 = plspm(md, wd.inner, wd.outer, wd.modes)
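The error text shows solve.qr() being applied to a block of indicator columns (X.blok), so one thing that can be checked without the package is whether any block is rank deficient, for example because one column is an exact linear combination of others. Whether that is what happens with the poster's data `md` is not known; the sketch below uses simulated stand-in matrices, and the helper name `is_rank_deficient` is hypothetical.

```r
set.seed(42)
good <- matrix(rnorm(40 * 3), ncol = 3)        # three independent columns
bad  <- cbind(good, good[, 1] + good[, 2])     # 4th column is an exact linear combination

# A block is rank deficient when its QR rank is below its number of columns
is_rank_deficient <- function(block) qr(as.matrix(block))$rank < ncol(block)

is_rank_deficient(good)
is_rank_deficient(bad)
```

Applied to the real data, the same check would be run on each set of columns listed in the outer model, e.g. columns 2:5 and 6:9 for the two mode "B" blocks above.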
Re: [R] Function for Data Frame
Hi,

If I understand it correctly,

    x <- myfunc(x)
    x
    #  V2 V3
    #1  2  3
    #2  2  3
    #3  2  2
    #4  2  2
    #5  1  1

A.K.
Re: [R] expanding a presence only dataset into presence/absence
Matthew,

You need to clarify your requirements before anyone can help you. Your presence-only data only contains one site, but your desired data has three. How are we to know how many sites there are? Also, your presence-only data has species c present at site 1 on date 3, but it is not present in your desired data. It is not at all clear (nor is it deducible) how you get from your example data to your desired data.

If you clarify your requirements, maybe someone will be able to help.

Dan

Daniel Nordlund
Bothell, WA USA
Re: [R] Function for Data Frame
Hi,

Good question! In your example, x is passed into myfunc by value (a copy of the value of x) rather than by reference (like passing in the social security number of x). So your scrubbing within the function is done on a copy of x, which you call DF. To update the value of x outside of your function, you have to assign the returned value of myfunc to x:

    x <- myfunc(x)

See more at ...

http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Writing-your-own-functions

Cheers,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
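The copy semantics described above can be seen directly in a short sketch. It reuses the poster's data frame; the scrubbing function is rewritten with plain column indexing, and the names `unchanged`, `raw` and `clean` are made up for illustration. Calling the function without assigning its result leaves x as it was, assigning the result updates it (and prints nothing), and lapply does the same for a whole list of data frames.

```r
x <- as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1), ncol = 3, byrow = TRUE))

# Drop column V1; the function works on its own copy, DF
myfunc <- function(DF) {
  DF[, setdiff(names(DF), "V1"), drop = FALSE]
}

invisible(myfunc(x))        # result discarded, x itself is untouched
unchanged <- names(x)       # still V1, V2, V3

raw   <- list(one = x, two = x)   # stand-in for the list of 20 data frames
clean <- lapply(raw, myfunc)      # scrubbed copies, stored under a new name

x <- myfunc(x)              # assignment is what changes x; nothing is printed
names(x)
```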
Re: [R] rbinding some elements from a list and obtain another list
Hi,

Try this:

set.seed(24)
lst1 <- lapply(1:4, function(x) as.data.frame(matrix(sample(1:20, 20, replace=TRUE), ncol=5)))
names(lst1) <- LETTERS[1:4]
res <- lapply(list(c("A","B"), c("C","D")), function(x) do.call(rbind, lst1[x]))
res
#[[1]]
#    V1 V2 V3 V4 V5
#A.1  6 14 17 14  4
#A.2  5 19  6 14  1
#A.3 15  6 13  7 11
#A.4 11 16  8 19  3
#B.1  2  5 13  8 15
#B.2 12 14  1  3 13
#B.3 15  2  7 19 14
#B.4  3 12  5  5 20
#
#[[2]]
#    V1 V2 V3 V4 V5
#C.1 10  1  6 10 10
#C.2  8  2  7 15  6
#C.3  6  8 10 11  4
#C.4  5  8 18 20  3
#D.1 10 15 15  1 12
#D.2  5  7 10 20 17
#D.3  6 19  3 13  1
#D.4  3 20  5  7 15

A.K.

----- Original Message -----
From: De Castro Pascual, Montserrat mdecas...@creal.cat
To: r-help@r-project.org
Cc:
Sent: Monday, April 29, 2013 9:54 AM
Subject: [R] rbinding some elements from a list and obtain another list

Hi everybody,

I have a list, where every element of this list is a data frame. An example:

Mylist <- list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I want to rbind some elements of this list. As an example:

Output <- list(AB=data.frame, CD=data.frame)

where AB=rbind(A,B) and CD=rbind(C,D).

I've tried:

f <- function(x){
  for (i in seq(1,length(names(x)),2)){
    aa <- do.call(rbind,x[i:i+1])
    aa
  }}
bb <- f(mylist)

or

f <- function(x){
  for (i in seq(1,length(names(x)),2)){
    aa[i] <- do.call(rbind,x[i:i+1])
    list(aa[i])
  }}
bb <- f(mylist)

but it doesn't work:

> f <- function(x){
+ for (i in seq(1,length(names(x)),2)){
+ aa <- do.call(rbind,x[i:i+1])
+ aa
+ }}
> bb <- f(mylist)
> bb
NULL
> f <- function(x){
+ for (i in seq(1,length(names(x)),2)){
+ aa <- do.call(rbind,x[i:i+1])
+ aa
+ }}
> bb <- f(mylist)
> f <- function(x){
+ for (i in seq(1,length(names(x)),2)){
+ aa[i] <- do.call(rbind,x[i:i+1])
+ list(aa[i])
+ }}
> bb <- f(mylist)
Lost warning messages
1: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
2: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
3: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
4: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
5: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
6: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length

Thanks!
Montserrat

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
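A note on why the loop in the question comes back as NULL, offered as a sketch rather than as part of the original answer: a `for` loop returns NULL, so a function whose last expression is the loop returns NULL too, and `i:i+1` is parsed as `(i:i) + 1`, which selects a single element instead of a pair. The list `my_list` and the helper names `pair_id` and `groups` below are made up for illustration.

```r
# Toy stand-in for the poster's list of data frames (hypothetical data).
my_list <- list(
  A = data.frame(v = 1:2),
  B = data.frame(v = 3:4),
  C = data.frame(v = 5:6),
  D = data.frame(v = 7:8)
)

# ":" binds tighter than "+", so i:i+1 is (i:i) + 1, a single index.
i <- 1
wrong <- i:i + 1      # one index: 2
right <- i:(i + 1)    # two indices: 1 2

# Assign consecutive elements to pairs, then rbind within each pair.
pair_id <- ceiling(seq_along(my_list) / 2)
groups  <- split(seq_along(my_list), pair_id)
out <- lapply(groups, function(idx) do.call(rbind, my_list[idx]))

# Name each result after the elements it combines ("AB", "CD").
names(out) <- vapply(
  groups,
  function(idx) paste(names(my_list)[idx], collapse = ""),
  character(1)
)
```

Building the result with lapply avoids having to collect values inside a loop; with an odd number of elements the last group would simply contain one data frame.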
Re: [R] expanding a presence only dataset into presence/absence
Hi,

Your output dataset is a bit confusing, as it contains Sites that were not in the input. Using your input dataset, I am getting this:

dat1 <- read.table(text="
Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present <- 1
dat2 <- expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
colnames(dat2) <- colnames(dat1)
res <- merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)] <- 0
res <- res[order(res$Date),]
res
#  Species Site Date Present
#1       a    1    1       1
#4       b    1    1       1
#7       c    1    1       0
#2       a    1    2       0
#5       b    1    2       1
#8       c    1    2       0
#3       a    1    3       0
#6       b    1    3       0
#9       c    1    3       1

A.K.

----- Original Message -----
From: Matthew Venesky mvene...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Monday, April 29, 2013 11:12 AM
Subject: [R] expanding a presence only dataset into presence/absence

Hello,

I'm working with a very large dataset (250,000+ lines in its current form) that includes presence-only data on various species (which is nested within different sites and sampling dates). I need to convert this into a dataset with presence/absence for each species. For example, I would like to expand "My current data" to "Desired data":

My current data

Species Site Date
a       1    1
b       1    1
b       1    2
c       1    3

Desired data

Species Present Site Date
a       1       1    1
b       1       1    1
c       0       1    1
a       0       2    2
b       1       2    2
c       0       2    2
a       0       3    3
b       0       3    3
c       1       3    3

I've scoured the web, including Rseek, and haven't found a resolution (and note that a similar question was asked sometime in 2011 without an answer). Does anyone have any thoughts? Thank you in advance.

--
Matthew D. Venesky, Ph.D.
Postdoctoral Research Associate, Department of Integrative Biology, The University of South Florida, Tampa, FL 33620
Website: http://mvenesky.myweb.usf.edu/

[[alternative HTML version deleted]]
[R] lavaan and semTools warning message
Hello all,

I am running a simple path analysis with the function sem.mi (of semTools) after doing multiple imputation in my (missing) data. However, depending on the option to combine the chi-square, I get the following warning messages:

Warning messages:
1: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats, ... :
  lavaan WARNING: could not compute standard errors!
2: In pchisq(chisq, df) : NaNs produced
3: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats, ... :
  lavaan WARNING: could not compute standard errors!
4: In pchisq(chisq, df) : NaNs produced
5: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats, ... :
  lavaan WARNING: could not compute standard errors!

and so forth.

The options chi="mr" and "mplus" result in these warning messages, but options chi="lmrr" and "none" run fine (no warning messages). Even when I get these warning messages, all estimates (including se and chi) are printed out in the results (using summary, for example). Also, using the function sem (of the lavaan package) directly (with one of the replicated datasets) runs fine.

Here is the code I'm using:

# start code
# model syntax
model1 <- '
  # regressions
  r1 ~ p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11
  r2 ~ r1+p2+p4+p5+p6+p8+p9+p10+p11+p12+p13+p14
'
# run sem (N=124); data are already imputed
out1 <- sem.mi(model1,imputedData,m=20,chi="mr",fixed.x=T,std.ov=T)
summary(out1)
inspect(out1, "imputed") # the combined chi is presented
# end code

Can someone tell me why I am getting these warning messages?

Thanks,
Duarte

[[alternative HTML version deleted]]
Re: [R] Stratified Random Sampling Proportional to Size
Hi Jeff,

a) & b) points taken. Thanks for the reference too.
c) taking the zeros out did the trick.

Dan

-----Original Message-----
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
Sent: Sunday, April 28, 2013 12:15 AM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

a) Please post plain text
b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]
c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:

stratum_cp <- stratum_cp[ 0 < stratum_cp$stratp, ]

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

On Fri, 26 Apr 2013, Lopez, Dan wrote:

Hello R Experts,

I kindly request your assistance on figuring out how to get a stratified random sampling proportional to 100. Below is my R code showing what I did and the error I'm getting with sampling::strata.

# FIRST I summarized count of records by the two variables I want to use as strata
library(RODBC)
library(sqldf)
library(sampling)

# After establishing connection I query the data and sort it by strata APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN, RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")

# ROWID is a dummy ID I added and repositioned after the strata columns for later use
CURRPOP$ROWID <- seq(nrow(CURRPOP))
CURRPOP <- CURRPOP[,c(1:2,11,3:10)]

# My strata. stratp is how many I want to sample from each stratum.
# NOTE THERE ARE SOME 0's, which just means I won't sample from that group.
stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
stratum_cp$stratp <- round(stratum_cp$HC/nrow(CURRPOP)*100)
stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0

# THEN I attempted to use sampling::strata using the instructions in that package and got an error
# I use stratum_cp$stratp for my sizes.
s <- strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1
traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows), collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp, method = "srswor")

# In lieu of a reproducible sample here is some info regarding most of my data
dim(CURRPOP)
[1] 6280   11

# Cols w/ personal info have been removed in this output
str(CURRPOP[,c(1:3,7:11)])
'data.frame': 6280 obs. of 8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int 1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int 9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814

[[alternative HTML version deleted]]

---
Jeff Newmiller
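The idea behind suggestion c) can be sketched without a database. The example below is only an illustration in base R: it does not call sampling::strata, and the data frame `pop`, the target `n_total` and the vector `alloc` are invented names. It computes a rounded proportional allocation, drops the strata whose allocation is 0, and then samples without replacement inside each remaining stratum.

```r
set.seed(1)

# Hypothetical population: three strata of very different sizes.
pop <- data.frame(
  id      = 1:1000,
  stratum = rep(c("big", "mid", "tiny"), times = c(900, 97, 3))
)

n_total <- 100

# Head count per stratum, then a rounded proportional allocation.
counts <- vapply(split(pop$id, pop$stratum), length, integer(1))
alloc  <- round(counts / nrow(pop) * n_total)

# Very small strata round to 0; remove them before sampling.
alloc <- alloc[alloc > 0]

# Simple random sampling without replacement within each stratum.
rows <- unlist(lapply(names(alloc), function(s) {
  idx <- which(pop$stratum == s)
  idx[sample.int(length(idx), alloc[[s]])]
}))
smp <- pop[rows, ]
```

Because of the rounding, the allocations need not add up to exactly `n_total` in general, so the realised sample size is worth checking.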
Re: [R] cannot compile R on Cray XE6 HLRS HERMIT
Martin Ivanov tramni at abv.bg writes:

> Dear All,
>
> I am trying to compile R-3.0 on Cray xe6 (HLRS) HERMIT, no success so far. Here is my experience:

You might be better off posting this to the r-de...@r-project.org mailing list (the list is for developer queries: technically this isn't development, but queries about compilation on exotic systems appear more often there, and often require input from R-core members ...)
Re: [R] expanding a presence only dataset into presence/absence
I am sorry. I forgot to update the code:

dat1 <- read.table(text="
Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present <- 1
dat2 <- expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
colnames(dat2) <- colnames(dat1)[-4] #changed here
res <- merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)] <- 0
res <- res[order(res$Date),]
row.names(res) <- 1:nrow(res)
res
#  Species Site Date Present
#1       a    1    1       1
#2       b    1    1       1
#3       c    1    1       0
#4       a    1    2       0
#5       b    1    2       1
#6       c    1    2       0
#7       a    1    3       0
#8       b    1    3       0
#9       c    1    3       1

A.K.

From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Monday, April 29, 2013 1:58 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence

The output that you prepared (for Site 1) looks good... however, I can't get that code to work. I get the following error:

dat2 <- expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames(dat2) <- colnames(dat1)
Error: unexpected symbol in "dat2 <- expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames"

--
Matthew D. Venesky, Ph.D.
Postdoctoral Research Associate, Department of Integrative Biology, The University of South Florida, Tampa, FL 33620
Website: http://mvenesky.myweb.usf.edu/

On Mon, Apr 29, 2013 at 1:44 PM, arun smartpink...@yahoo.com wrote:

> Hi Matthew,
> So, do you think the output I gave is different from what you expected?
> Thanks, Arun
>
> From: Matthew Venesky mvene...@gmail.com
> To: arun smartpink...@yahoo.com
> Sent: Monday, April 29, 2013 1:15 PM
> Subject: Re: [R] expanding a presence only dataset into presence/absence
>
> I see what you are confused about. I'm sorry. I gave extra sites as examples in my table called "Desired data", such that there are 3 sites in the "Desired data" and only 1 site in "My current data". Ignore sites 2 and 3; you should see what I am trying to do using only site 1.
>
> On Mon, Apr 29, 2013 at 1:11 PM, Matthew Venesky mvene...@gmail.com wrote:
>
>> That is part of the difficulty. If Species C was present only on Date 3, we need to have the code manually add Species C as absent (i.e., assign it a value of 0) at that site on the previous sampling dates. Or, is there something else that is confusing you that I am not explaining?
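For comparison with the expand.grid/merge approach in this thread, here is a minimal base R sketch that gets the same presence/absence layout from a contingency table. It is an alternative, not what was posted; the intermediate column name `Count` is an arbitrary choice.

```r
# Presence-only records, as in the example data from the thread.
dat1 <- data.frame(
  Species = c("a", "b", "b", "c"),
  Site    = c(1, 1, 1, 1),
  Date    = c(1, 1, 2, 3)
)

# Count every Species x Site x Date combination, zeros included.
res <- as.data.frame(table(dat1), responseName = "Count")

# Collapse the counts to a 0/1 presence flag.
res$Present <- as.integer(res$Count > 0)
res$Count <- NULL
```

Note that this crosses every species with every site and every date, so with several sites it also creates site/date combinations that may never have been sampled, and the identifier columns come back as factors. On a 250,000+ row dataset the full cross could be large.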
Re: [R] parSapply can't find function
Hi, Uwe. I still don't get how this can be done correctly. Here is what I tried.

In the file funcs.R, define these functions:

library('modeest')
x = vector(length=500)
x = sapply(x, function(i) i=sample(c(1,0), 1))
pastK = function(n, x, k) {
    if (n>k) { return(x[(n-k):(n-1)]) }
    else {return(NA)}
}
predR = function(x, k) {
    pastList = lapply(1:length(x), function(n) pastK(n, x, k))
    pred = sapply(pastList, function(v) mfv(v)[1])
    ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
}

Then do the following:

library('snow')
cl = makeCluster(rep('localhost', 12), 'SOCK')
clusterSetupRNG(cl)
clusterEvalQ(cl, 'source("funcs.R")')
testK = function() {
    k = seq(3, 25, 2)
    r = parSapply(cl, k, function(i) predR(x, i))
    print(r)
}
testK()
stopCluster(cl)

The error still pops up:

Error in checkForRemoteErrors(val) :
  12 nodes produced errors; first error: could not find function "predR"

Best regards,
Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com

On Tue, Apr 23, 2013 at 3:44 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

> On 23.04.2013 15:00, Kaiyin Zhong (Victor Chung) wrote:
>
>> Thanks for the reply. How can i make the functions known to all nodes?
>
> See ?clusterEvalQ
> you may also want to try the parallel packages.
>
> Best, Uwe Ligges
>
>> On Tue, Apr 23, 2013 at 2:43 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
>>
>>> On 18.04.2013 11:11, Kaiyin Zhong (Victor Chung) wrote:
>>>
>>>> Here is the code, assuming 8 cores in the cpu.
>>>>
>>>> library('modeest')
>>>> library('snow')
>>>> cl = makeCluster(rep('localhost', 8), 'SOCK')
>>>> x = vector(length=50)
>>>> x = sapply(x, function(i) i=sample(c(1,0), 1))
>>>> pastK = function(n, x, k) {
>>>>     if (n>k) { return(x[(n-k):(n-1)]) }
>>>>     else {return(NA)}
>>>> }
>>>> predR = function(x, k) {
>>>>     pastList = lapply(1:length(x), function(n) pastK(n, x, k))
>>>>     pred = sapply(pastList, function(v) mfv(v)[1])
>>>>     ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
>>>> }
>>>> testK = function() {
>>>>     k = seq(3, 25, 2)
>>>>     r = parSapply(cl, k, function(i) predR(x, i))
>>>>     #r = sapply(k, function(i) predR(x, i))
>>>> }
>>>> r = testK()
>>>> stopCluster(cl)
>>>>
>>>> Here is the error:
>>>>
>>>> Error in checkForRemoteErrors(val) :
>>>>   8 nodes produced errors; first error: could not find function "predR"
>>>
>>> predR is not yet known on all nodes, just on the master. You have to tell the nodes about the definition first.
>>>
>>> Best, Uwe Ligges

[[alternative HTML version deleted]]
Re: [R] parSapply can't find function
On Apr 29, 2013, at 11:16 AM, Kaiyin Zhong (Victor Chung) wrote:

> Hi, Uwe. I still don't get how this can be done correctly. Here is what I tried.
> In the file funcs.R, define these functions:
> [...]
> Then do the following:
>
> library('snow')
> cl = makeCluster(rep('localhost', 12), 'SOCK')
> clusterSetupRNG(cl)
> clusterEvalQ(cl, 'source("funcs.R")')

Are you sure those outer single quote marks are not the problem?

--
David.

> The error still pops up:
>
> Error in checkForRemoteErrors(val) :
>   12 nodes produced errors; first error: could not find function "predR"

David Winsemius
Alameda, CA, USA
Re: [R] parSapply can't find function
Oh, indeed, that IS the problem. Thank you!!!

Best regards,
Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com

On Mon, Apr 29, 2013 at 8:22 PM, David Winsemius dwinsem...@comcast.net wrote:

> On Apr 29, 2013, at 11:16 AM, Kaiyin Zhong (Victor Chung) wrote:
>
>> clusterEvalQ(cl, 'source("funcs.R")')
>
> Are you sure those outer single quote marks are not the problem?
>
> --
> David.
>
> David Winsemius
> Alameda, CA, USA

[[alternative HTML
Re: [R] parSapply can't find function
On 29/04/2013 2:16 PM, Kaiyin Zhong (Victor Chung) wrote:

> Hi, Uwe. I still don't get how this can be done correctly. Here is what I tried.
> In the file funcs.R, define these functions:
> [...]
> Then do the following:
>
> library('snow')
> cl = makeCluster(rep('localhost', 12), 'SOCK')
> clusterSetupRNG(cl)
> clusterEvalQ(cl, 'source("funcs.R")')

The expression being evaluated there is a string, 'source("funcs.R")'. You want an expression, e.g.

clusterEvalQ(cl, source("funcs.R"))

Duncan Murdoch

> The error still pops up:
>
> Error in checkForRemoteErrors(val) :
>   12 nodes produced errors; first error: could not find function "predR"
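The difference between a quoted string and an expression can be checked on a small local cluster. This is a sketch only: it uses the parallel package that ships with R rather than snow, two workers instead of twelve, and a made-up one-line `predR` standing in for the definitions in funcs.R.

```r
library(parallel)

cl <- makeCluster(2)

# A quoted string is just a constant: every worker evaluates it to the
# same text and defines nothing.
clusterEvalQ(cl, "predR <- function(x, k) sum(x) + k")
defined_after_string <- unlist(clusterEvalQ(cl, exists("predR")))

# An unquoted expression is evaluated on every worker.
# (Hypothetical stand-in for what source("funcs.R") would define.)
clusterEvalQ(cl, predR <- function(x, k) sum(x) + k)
defined_after_expr <- unlist(clusterEvalQ(cl, exists("predR")))

# Data living on the master must be shipped to the workers as well.
x <- c(1, 0, 1)
clusterExport(cl, "x", envir = environment())

r <- parSapply(cl, c(3, 5), function(i) predR(x, i))

stopCluster(cl)
```

For functions kept in a file, the unquoted form `clusterEvalQ(cl, source("funcs.R"))` suggested above does the same job, provided each worker can see the file from its working directory.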
Re: [R] parSapply can't find function
Sorry, I got a new error:

Error in cut.default(i, breaks) : 'breaks' are not unique
> traceback()
20: stop("'breaks' are not unique")
19: cut.default(i, breaks)
18: cut(i, breaks)
17: split.default(i, cut(i, breaks))
16: split(i, cut(i, breaks))
15: structure(split(i, cut(i, breaks)), names = NULL)
14: splitIndices(length(x), ncl)
13: lapply(splitIndices(length(x), ncl), function(i) x[i])
12: splitList(x, length(cl))
11: staticClusterApply(cl, fun, length(x), argfun)
10: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
9: lapply(args, enquote)
8: do.call(fun, lapply(args, enquote))
7: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...))
6: parLapply(cl, as.list(X), FUN, ...)
5: parSapply(cl, eff, pow_error) at testing.R#5
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("testing.R")

The sequential run was OK.

Best regards,
Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com

On Mon, Apr 29, 2013 at 8:26 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

> On 29/04/2013 2:16 PM, Kaiyin Zhong (Victor Chung) wrote:
>> Hi, Uwe. I still don't get how this can be done correctly. Here is what I tried.
>> In the file funcs.R, define these functions:
>>
>> library('modeest')
>> x = vector(length=500)
>> x = sapply(x, function(i) i=sample(c(1,0), 1))
>> pastK = function(n, x, k) {
>>     if (n > k) { return(x[(n-k):(n-1)]) } else {return(NA)}
>> }
>> predR = function(x, k) {
>>     pastList = lapply(1:length(x), function(n) pastK(n, x, k))
>>     pred = sapply(pastList, function(v) mfv(v)[1])
>>     ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
>> }
>>
>> Then do the following:
>>
>> library('snow')
>> cl = makeCluster(rep('localhost', 12), 'SOCK')
>> clusterSetupRNG(cl)
>> clusterEvalQ(cl, 'source("funcs.R")')
>
> The expression being evaluated there is a string, 'source("funcs.R")'.
> You want an expression, e.g.
>
> clusterEvalQ(cl, source("funcs.R"))
>
> Duncan Murdoch
>
>> testK = function() {
>>     k = seq(3, 25, 2)
>>     r = parSapply(cl, k, function(i) predR(x, i))
>>     print(r)
>> }
>> testK()
>> stopCluster(cl)
>>
>> The error still pops up:
>>
>> Error in checkForRemoteErrors(val) :
>>   12 nodes produced errors; first error: could not find function "predR"
>>
>> Best regards,
>> Kaiyin ZHONG
>> --
>> FMB, Erasmus MC
>> k.zh...@erasmusmc.nl
>> kindlych...@gmail.com
>>
>> On Tue, Apr 23, 2013 at 3:44 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
>>> On 23.04.2013 15:00, Kaiyin Zhong (Victor Chung) wrote:
>>>> Thanks for the reply. How can I make the functions known to all nodes?
>>>
>>> See ?clusterEvalQ
>>> You may also want to try the parallel package.
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>> Best regards,
>>>> Kaiyin ZHONG
>>>> --
>>>> FMB, Erasmus MC
>>>> k.zh...@erasmusmc.nl
>>>> kindlych...@gmail.com
>>>>
>>>> On Tue, Apr 23, 2013 at 2:43 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
>>>>> On 18.04.2013 11:11, Kaiyin Zhong (Victor Chung) wrote:
>>>>>> Here is the code, assuming 8 cores in the CPU.
>>>>>>
>>>>>> library('modeest')
>>>>>> library('snow')
>>>>>> cl = makeCluster(rep('localhost', 8), 'SOCK')
>>>>>> x = vector(length=50)
>>>>>> x = sapply(x, function(i) i=sample(c(1,0), 1))
>>>>>> pastK = function(n, x, k) {
>>>>>>     if (n > k) { return(x[(n-k):(n-1)]) } else {return(NA)}
>>>>>> }
>>>>>> predR = function(x, k) {
>>>>>>     pastList = lapply(1:length(x), function(n) pastK(n, x, k))
>>>>>>     pred = sapply(pastList, function(v) mfv(v)[1])
>>>>>>     ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
>>>>>> }
>>>>>> testK = function() {
>>>>>>     k = seq(3, 25, 2)
>>>>>>     r = parSapply(cl, k, function(i) predR(x, i))
>>>>>>     #r = sapply(k, function(i) predR(x, i))
>>>>>> }
>>>>>> r = testK()
>>>>>> stopCluster(cl)
>>>>>>
>>>>>> Here is the error:
>>>>>>
>>>>>> Error in checkForRemoteErrors(val) :
>>>>>>   8 nodes produced errors; first error: could not find function "predR"
>>>>>
>>>>> predR is not yet known on all nodes, just on the master. You have to
>>>>> tell the nodes about the definition first.
>>>>>
>>>>> Best,
>>>>> Uwe Ligges
>>>>>
>>>>>> Best regards,
>>>>>> Kaiyin ZHONG
>>>>>> --
>>>>>> FMB, Erasmus MC
>>>>>> k.zh...@erasmusmc.nl
>>>>>> kindlych...@gmail.com
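For archive readers: the advice above (make the definitions known on every node before calling parSapply) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's final code: it uses the base parallel package that Uwe mentions rather than snow, it starts 2 local workers instead of 8 or 12, and most_frequent() is a hypothetical stand-in for modeest's mfv() so that the sketch has no extra dependencies.

```r
library(parallel)  # assumption: base 'parallel' is used here in place of 'snow'

# Hypothetical stand-in for modeest::mfv(): most frequent value of a 0/1 vector.
most_frequent <- function(v) as.numeric(names(which.max(table(v))))

# Last k values before position n, or NA when there is not enough history.
pastK <- function(n, x, k) if (n > k) x[(n - k):(n - 1)] else NA

# Share of positions where the most frequent recent value equals the actual one.
predR <- function(x, k) {
  pred <- sapply(seq_along(x), function(n) {
    past <- pastK(n, x, k)
    if (all(is.na(past))) NA else most_frequent(past)
  })
  sum(pred == x, na.rm = TRUE) / sum(!is.na(pred))
}

set.seed(1)
x <- sample(c(1, 0), 200, replace = TRUE)
k <- seq(3, 25, 2)

cl <- makeCluster(2)  # assumption: 2 local workers
# Copy the data and the function definitions to every worker.
clusterExport(cl, c("x", "pastK", "predR", "most_frequent"),
              envir = environment())
r_par <- parSapply(cl, k, function(i) predR(x, i))
stopCluster(cl)

# Sequential reference run for comparison.
r_seq <- sapply(k, function(i) predR(x, i))
```

clusterExport() copies objects by name, while clusterEvalQ(cl, source("funcs.R")) re-runs the file on each worker; either should make predR visible there, provided the workers can actually find funcs.R.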
Re: [R] speed of a vector operation question
Thank you all very much for your time and suggestions. The link to Stack Overflow was very helpful. Here are some timings in case someone wants to know. (I noticed that microbenchmark results vary depending on how many functions one benchmarks at a time. However, the min stays about the same.)

# just to refresh, most of the code is from the Stack Overflow link provided by Martin Morgan:
# http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match

library(compiler)        # cmpfun
library(inline)          # cfunction
library(microbenchmark)

f0 <- function(v) length(which(v < 0))
f1 <- function(v) sum(v < 0)
f2 <- function(v) which.min(v < 0) - 1L
f3 <- function(x) { # binary search implemented in R
    imin <- 1L
    imax <- length(x)
    while (imax >= imin) {
        imid <- as.integer(imin + (imax - imin) / 2)
        if (x[imid] >= 0)
            imax <- imid - 1L
        else
            imin <- imid + 1L
    }
    imax
}
f3.c <- cmpfun(f3) # pre-compiled

# binary search in C
f4 <- cfunction(c(x = "numeric"), "
    int imin = 0, imax = Rf_length(x) - 1, imid;
    while (imax >= imin) {
        imid = imin + (imax - imin) / 2;
        if (REAL(x)[imid] >= 0)
            imax = imid - 1;
        else
            imin = imid + 1;
    }
    return ScalarInteger(imax + 1);
")

# this one is a separate suggestion by William Dunlap:
f5 <- function(v) {
    tabulate(findInterval(v, c(-Inf, 0, 1, Inf)))[1]
}

vec <- c(seq(-100,-1,length.out=1e6), rep(0,20), seq(1,100,length.out=1e6))

# the identity of results was verified
microbenchmark(f1(vec), f2(vec), f3(vec), f3.c(vec), f4(vec), f5(vec))

Unit: microseconds
      expr       min         lq    median         uq       max neval
   f1(vec) 17054.233 17831.1385 18514.305 19512.4705 54603.435   100
   f2(vec) 23624.353 25026.4265 26034.785 29322.1150 60014.458   100
   f3(vec)    76.902    93.2340   111.834   116.8370   129.888   100
 f3.c(vec)    21.883    30.7530    37.757    54.1250    62.939   100
   f4(vec)     6.575    10.5885    30.389    31.9385    37.610   100
   f5(vec) 35365.088 36767.6175 38317.103 40671.2000 69209.425   100

So, I'll try to go with the inline binary search and see if I can precompile complex conditions.

Thank you, again, for your help!

Mikhail.
On Friday, April 26, 2013 20:52:27 Suzen, Mehmet wrote:
> Hello Mikhail,
>
> I would suggest using the ff package for fast access to large data structures:
>
> http://cran.r-project.org/web/packages/ff/index.html
> http://wsopuppenkiste.wiso.uni-goettingen.de/ff/ff_1.0/inst/doc/ff.pdf
>
> Best
> Mehmet
>
> On 26 April 2013 18:12, Mikhail Umorin mike...@gmail.com wrote:
>> Hello,
>>
>> I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are
>> sorted (with duplicates) in the vector (v). I am obtaining the length of
>> vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar
>> variables. What is the most efficient way to do this?
>>
>> I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems to
>> me more efficient than length(which(v < c)), but, please, correct me if I'm
>> wrong.
>>
>> So, is there anything faster than what I already use?
>>
>> I'm running R 2.14.2 on Linux kernel 3.4.34.
>>
>> I appreciate your time,
>>
>> Mikhail

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
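A small base-R sketch of the binary-search idea from this thread, in case it helps later readers. It is not the code that was benchmarked above: the function names and the generalisation to an arbitrary threshold thr are assumptions made for illustration, and findInterval()'s left.open argument needs a reasonably recent R (it did not exist in the 2.14/3.0 versions discussed here).

```r
# Count elements of a SORTED vector v that are strictly below thr,
# using a hand-written binary search (O(log n) comparisons).
count_below_bs <- function(v, thr) {
  lo <- 1L
  hi <- length(v)
  while (hi >= lo) {
    mid <- lo + (hi - lo) %/% 2L
    if (v[mid] >= thr) hi <- mid - 1L else lo <- mid + 1L
  }
  hi  # index of the last element < thr, i.e. the count
}

# Same count via findInterval(), which does the search in compiled code.
# left.open = TRUE makes it count elements < thr rather than <= thr.
count_below_fi <- function(v, thr) findInterval(thr, v, left.open = TRUE)

# Count elements with c1 < v < c2 (assumes c1 < c2 and v sorted).
count_between <- function(v, c1, c2) {
  findInterval(c2, v, left.open = TRUE) - findInterval(c1, v)
}

v <- c(seq(-100, -1, length.out = 1000), rep(0, 20),
       seq(1, 100, length.out = 1000))

n_neg  <- count_below_bs(v, 0)
n_neg2 <- count_below_fi(v, 0)
n_mid  <- count_between(v, -50, 25)
```

Both versions rely on v being sorted; on unsorted input they return a number without complaint, so a check such as !is.unsorted(v) may be worth adding while testing.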
Re: [R] Function for Data Frame
Just to add a little, don't get distracted by the return() function. Functions return the value of their final expression, provided it isn't an assignment. For your example, this will do the job:

myfunc <- function(DF) subset(DF, select=-V1)

If you want to modify the data frames in place, one way is to use a loop instead of lapply.

mydfs <- list(DF1, DF2, DF3)
for (il in 1:3) mydfs[[il]] <- myfunc(mydfs[[il]])

But so should

mydfs <- lapply(mydfs, myfunc)

I doubt very much you'll see any performance difference between using lapply() and using an explicit loop.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 4/29/13 7:23 AM, Sparks, John James jspa...@uic.edu wrote:

> Dear R Helpers,
>
> I have about 20 data frames that I need to do a series of data scrubbing
> steps to. I have the list of data frames in a list so that I can use
> lapply. I am trying to build a function that will do the data scrubbing
> that I need. However, I am new to functions and there is something
> fundamental that I am not understanding. I use the return function at the
> end of the function and this completes the data processing specified in
> the function, but leaves the data frame that I want changed unaffected.
> How do I get my function to apply its results to the data frame in
> question instead of simply displaying the results to the screen?
>
> Any helpful guidance would be most appreciated.
>
> --John Sparks
>
> x=as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),ncol=3,byrow=T))
> myfunc<-function(DF){
>   DF<-subset(DF,select=-c(V1))
>   return(DF)
> }
> myfunc(x)
> #How to get this change to data frame x?
> #And preferably not send the results to the screen?
> x
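To make the point above concrete, here is a minimal sketch using the example data from the question. The names drop_v1, still_has_v1 and the list element names are made up for illustration; the key step is simply assigning the function's result back.

```r
x <- as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),
                          ncol = 3, byrow = TRUE))

# Returns a modified copy; it cannot change the caller's data frame.
drop_v1 <- function(DF) subset(DF, select = -V1)

original <- x
invisible(drop_v1(x))               # result discarded, nothing printed
still_has_v1 <- "V1" %in% names(x)  # TRUE: x itself is untouched

x <- drop_v1(x)                     # assigning the result is what updates x

# The same idea for a whole list of data frames.
mydfs <- list(first = original, second = original)
mydfs <- lapply(mydfs, drop_v1)
```

Calling a function at top level without assigning its value auto-prints the result, which is why the original myfunc(x) call showed the scrubbed data on screen while leaving x as it was.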
Re: [R] rbinding some elements from a list and obtain another list
In addition to the other responses, consider this:

> i <- 3
> i:i+1
[1] 4
> i:(i+1)
[1] 3 4

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 4/29/13 6:54 AM, De Castro Pascual, Montserrat mdecas...@creal.cat wrote:

> Hi everybody,
>
> I have a list, where every element of this list is a data frame.
>
> An example:
>
> Mylist <- list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)
>
> I want to rbind some elements of this list. As an example:
>
> Output <- list(AB=data.frame, CD=data.frame)
>
> where
>
> AB=rbind(A,B)
> CD=rbind(C,D)
>
> I've tried:
>
> f <- function(x){
>   for (i in seq(1,length(names(x)),2)){
>     aa <- do.call(rbind,x[i:i+1])
>     aa
>   }}
> bb <- f(mylist)
>
> or
>
> f <- function(x){
>   for (i in seq(1,length(names(x)),2)){
>     aa[i] <- do.call(rbind,x[i:i+1])
>     list(aa[i])
>   }}
> bb <- f(mylist)
>
> but it doesn't work:
>
> > f <- function(x){
> +   for (i in seq(1,length(names(x)),2)){
> +     aa <- do.call(rbind,x[i:i+1])
> +     aa
> +   }}
> > bb <- f(mylist)
> > bb
> NULL
>
> > f <- function(x){
> +   for (i in seq(1,length(names(x)),2)){
> +     aa[i] <- do.call(rbind,x[i:i+1])
> +     list(aa[i])
> +   }}
> > bb <- f(mylist)
> Warning messages:
> 1: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
> 2: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
> 3: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
> 4: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
> 5: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
> 6: In aa[i] <- do.call(rbind, x[i:i + 1]) :
>   number of items to replace is not a multiple of replacement length
>
> Thanks!
>
> Montserrat
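Putting the precedence hint together with the original goal, one possible sketch is below. The toy data frames and the names starts and out are invented for illustration; the original data are not shown in the thread. Two things differ from the attempts in the question: the index is written i:(i + 1), and the pieces are collected with lapply(), because a for loop by itself evaluates to NULL.

```r
# Toy stand-ins for the real data frames.
mylist <- list(
  A = data.frame(id = 1:2, grp = "A"),
  B = data.frame(id = 3:4, grp = "B"),
  C = data.frame(id = 5:6, grp = "C"),
  D = data.frame(id = 7:8, grp = "D")
)

# Start index of each consecutive pair: 1, 3, ...
starts <- seq(1, length(mylist), by = 2)

# rbind elements i and i + 1; note the parentheses around i + 1.
out <- lapply(starts, function(i) do.call(rbind, mylist[i:(i + 1)]))

# Name each result after the pair it came from: "AB", "CD".
names(out) <- sapply(starts, function(i)
  paste(names(mylist)[i:(i + 1)], collapse = ""))
```

This assumes the list has an even number of elements; with an odd count the last index i + 1 would run past the end of the list and would need separate handling.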
Re: [R] expanding a presence only dataset into presence/absence
Hi,

Check if this is what you wanted. I am not sure the "Hopeful outcome" includes all the possible combinations.

dat1 <- read.csv("Matthewdat.csv", sep=",", header=TRUE, stringsAsFactors=FALSE)
dat1
#                  Species CallingIndex Site      Date
#1     Pseudacris crucifer            2 3608 3/31/2001
#2        Anaxyrus fowleri            2 3638 4/13/2001
#3     Pseudacris crucifer            3 3641 3/23/2001
#4        Pseudacris kalmi            1 3641 3/23/2001
#5 Lithobates catesbeianus            1 3641 4/27/2001
#6     Pseudacris crucifer            2 3641 4/27/2001
#7     Pseudacris crucifer            3 3663  4/5/2001
#8     Pseudacris crucifer            2 3663  5/2/2001
#9    Lithobates clamitans            1 3663  6/6/2001

dat1New <- do.call(rbind, lapply(split(dat1, dat1$Site), function(x) {x$Present <- 1; x}))
row.names(dat1New) <- 1:nrow(dat1New)
dat2 <- do.call(rbind, lapply(split(dat1, dat1$Site), function(x) expand.grid(unique(x$Species), unique(x$Site), unique(x$Date))))
row.names(dat2) <- 1:nrow(dat2)
colnames(dat2) <- colnames(dat1)[c(1,3,4)]
res <- merge(dat1New, dat2, by=c("Species","Site","Date"), all=TRUE)
res[is.na(res)] <- 0
res
#                   Species Site      Date CallingIndex Present
#1         Anaxyrus fowleri 3638 4/13/2001            2       1
#2  Lithobates catesbeianus 3641 3/23/2001            0       0
#3  Lithobates catesbeianus 3641 4/27/2001            1       1
#4     Lithobates clamitans 3663  4/5/2001            0       0
#5     Lithobates clamitans 3663  5/2/2001            0       0
#6     Lithobates clamitans 3663  6/6/2001            1       1
#7      Pseudacris crucifer 3608 3/31/2001            2       1
#8      Pseudacris crucifer 3641 3/23/2001            3       1
#9      Pseudacris crucifer 3641 4/27/2001            2       1
#10     Pseudacris crucifer 3663  4/5/2001            3       1
#11     Pseudacris crucifer 3663  5/2/2001            2       1
#12     Pseudacris crucifer 3663  6/6/2001            0       0
#13        Pseudacris kalmi 3641 3/23/2001            1       1
#14        Pseudacris kalmi 3641 4/27/2001            0       0

A.K.

From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Monday, April 29, 2013 3:54 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence

Arun,

Thanks again for your time on this. We are getting very close but not quite there. The problem is that I only gave you a very simple example because I didn't want to bog any of the readers of the blog down.
If you have any interest or time, I was wondering if you could consider the full example and some actual data (attached CSV). As you'll see, there is an additional column titled CallingIndex, which is an estimate of the species abundance (range of 1-3). If they were present, they were given a value that ranged from 1-3; if they were absent, they were not given any value. Editing your code to reflect this wasn't a problem. However, what I didn't explain in enough detail to you is the specific contexts when we want to add zeros to the data. Essentially, we want to nest species within site and date and add zeros accordingly. If a species is never found at a site, we do not want to make any adjustments to the data. For example, Anaxyrus fowleri was not found at site 3608, so we do not want the code to add a row with Anaxyrus fowleri to site 3608 (in your code, it would add this). What we do want, however, is to add a zero for a species that was found on one date at a site but never found again on other dates. For example, Lithobates clamitans was found at site 3663 on 6/6/2001 but not observed on the other 2 sampling dates, so we want to assign a calling index of 0 for Lithobates clamitans on sampling date 4/5/2001 and 5/2/2001 for site 3663 (and also make the appropriate addition for Pseudacris crucifer on the appropriate sampling dates). You should be able to visualize what I am looking to do in the CSV file attached to this email. Does this make sense? Do you know of any code to do this task? -- Matthew D. Venesky, Ph.D. Postdoctoral Research Associate, Department of Integrative Biology, The University of South Florida, Tampa, FL 33620 Website: http://mvenesky.myweb.usf.edu/ On Mon, Apr 29, 2013 at 2:11 PM, arun smartpink...@yahoo.com wrote: Hi Matthew, No problem. 
Regards, Arun From: Matthew Venesky mvene...@gmail.com To: arun smartpink...@yahoo.com Sent: Monday, April 29, 2013 2:09 PM Subject: Re: [R] expanding a presence only dataset into presence/absence This, my friend, is a stroke of genius. I'll give it a try on the real data and I will keep you posted. Many, many, thanks. -- Matthew D. Venesky, Ph.D. Postdoctoral Research Associate, Department of Integrative Biology, The University of South Florida, Tampa, FL 33620 Website: http://mvenesky.myweb.usf.edu/ On Mon, Apr 29, 2013 at 2:05 PM, arun
Re: [R] Arma - estimate of variance of white noise variables
Hello,

On 29-04-2013 13:49, Preetam Pal wrote:
> Hi all,
>
> Suppose I am fitting an arma(p,q) model to a time series y_t. So, my
> model should contain (q+1) white noise variables.

Why? How on earth can you say this?

> As far as I know, each of them should have the same variance. How do I
> get the estimate of this variance by running the arma(y) function (or is
> there any other way)?

I'm not certain that the following is what you're looking for.

library(tseries)
fit <- arma(y, ...etc...)
var(resid(fit), na.rm = TRUE)

Hope this helps,

Rui Barradas

> Appreciate your help.
>
> Thanks,
> Preetam
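A possible alternative, sketched with base R only: an ARMA(p,q) model has a single white-noise series whose current value and q lagged values enter the equation, so there is one innovation variance to estimate. stats::arima() reports that estimate directly as sigma2. The simulated series, the ARMA(1,1) order and the true variance of 4 below are assumptions chosen purely for illustration; this is not the tseries::arma() call from the reply above.

```r
set.seed(42)

# Simulated ARMA(1,1) series with innovation standard deviation 2 (variance 4).
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 2000, sd = 2)

# Fit ARMA(1,1); order = c(p, d, q) with d = 0 (no differencing).
fit <- arima(y, order = c(1, 0, 1))

sigma2_hat <- fit$sigma2                         # estimated innovation variance
resid_var  <- var(residuals(fit), na.rm = TRUE)  # residual variance, for comparison
```

The two numbers should be close but not identical, since var() divides by n - 1 and centres the residuals while sigma2 comes from the likelihood fit.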
[R] biplot for principal componens analysis
I did a PCA for my data, which has dimensions 19000 x 4, using princomp:

pca2=princomp((data), cor=F)

and obtained a biplot with 19000 labels, which was very cluttered. How can I show just the 19000 points without labels?

biplot(pca2)

Thanks a lot :))

> data
          A1     A2     L1     L2
E_6     0.23   4.05  13.35  11.86
E_00011 118.74 177.87 144.20 136.05
E_00062   8.50   0.60  73.11  45.81
E_00070   1.31   4.92   0.98   1.23
E_00071  97.41  39.90  31.15 150.77
E_00104   0.00   0.43  18.93  31.28
. . .
E_18586   0.00   0.0    0.00   0.95
Re: [R] getting started in parallel computing on a windows OS
Martin,

This worked, thanks again!

Ben Caldwell
Graduate Fellow
University of California, Berkeley
130 Mulford Hall #3114
Berkeley, CA 94720
Office 223 Mulford Hall
(510)859-3358

On Thu, Apr 25, 2013 at 10:04 PM, Benjamin Caldwell btcaldw...@berkeley.edu wrote:

> Thanks for this, Martin. I'll start retooling and let you know how it goes.
>
> Ben Caldwell
> Graduate fellow
>
> On Apr 24, 2013 4:34 PM, Martin Morgan mtmor...@fhcrc.org wrote:
>
>> On 04/24/2013 02:50 PM, Benjamin Caldwell wrote:
>>> Dear R help,
>>>
>>> I've what I think is a fairly simple parallel problem, and am getting
>>> bogged down in documentation and packages for much more complex situations.
>>>
>>> I have a big matrix [30^5, 5]. I have a function that will act on each row
>>> of that matrix sequentially and output the 'best' result from the whole
>>> matrix (it compares the result from each row to the last and keeps the
>>> 'better' result). I would like to divide that first large matrix into
>>> chunks equal to the number of cores I have available to me, and work
>>> through each chunk, then output the results from each chunk.
>>>
>>> I'm really having trouble making head or tail of how to do this on a
>>> Windows machine - lots of different false starts on several different
>>> packages now.
>>>
>>> Basically, I have the function, and I can of course easily divide the
>>> matrix into chunks. I just need a way to process each chunk in parallel
>>> (other than opening new R sessions for each core manually).
>>>
>>> Any help much appreciated - after two days of trying to get this to work
>>> I'm pretty burnt out.
>>
>> Hi Ben -- in your code from this morning you had a function
>>
>> fitting <- function(ndx.grd=two, dt.grd=one, ind.vr='ind', rsp.vr='res') {
>>     ## ... setup
>>     for(i in 1:length(ndx.grd[,1])){
>>         ## ... do work
>>     }
>>     ## ... collate results
>> }
>>
>> that you're trying to run in parallel. Obviously the ## ... represent lines
>> I've removed. When you say something like
>>
>> y <- foreach(icount(length(two))) %dopar% fitting()
>>
>> it's saying that you want to run fitting() length(two) times. So you're
>> actually doing the same thing length(two) times, whereas you really want to
>> divide the work that's inside fitting() into chunks, and do those on
>> separate cores! Conceptually what you'd like to do is
>>
>> fit_one <- function(idx, ndx.grd, dt.grd, ind.vr, rsp.vr) {
>>     ## ... do work on row idx _ONLY_
>> }
>>
>> and then evaluate with
>>
>> ## ... setup
>> y <- foreach (idx = icount(nrow(two))) %dopar% fit_one(idx, two, one, "ind", "res")
>> ## ... collate
>>
>> so that fit_one fits just one of your combinations. foreach will worry about
>> distributing the work.
>>
>> Make sure that fit_one works first, before trying to run this in parallel;
>> your use of try(), trying to fit different data types (character, integer,
>> numeric) into a matrix rather than data.frame, and the type coercions all
>> indicate that you're fighting with R rather than working with it.
>>
>> Hope that helps,
>>
>> Martin
>>
>>> Thanks
>>>
>>> Ben Caldwell
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
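As a rough illustration of the restructuring Martin describes (one function that handles a single row, checked sequentially first, then handed to a parallel apply), here is a toy sketch. It deliberately swaps in the base parallel package instead of foreach/%dopar%, and the scoring rule, the grid and all names below are hypothetical; the real fitting code is not shown in the thread.

```r
library(parallel)  # assumption: base 'parallel' used in place of foreach

# Hypothetical per-row work: score ONE row of the grid (smaller is better).
fit_one <- function(idx, grid) {
  row <- grid[idx, ]
  sum((row - c(1, 2, 3))^2)
}

# Toy grid of parameter combinations, one combination per row.
grid <- as.matrix(expand.grid(a = 0:3, b = 0:3, c = 0:3))
rows <- seq_len(nrow(grid))

# Step 1: make sure the per-row function works sequentially.
scores_seq <- vapply(rows, fit_one, numeric(1), grid = grid)

# Step 2: run the same function over the rows on 2 local workers.
# A socket cluster like this also works on Windows.
cl <- makeCluster(2)
scores_par <- parSapply(cl, rows, fit_one, grid = grid)
stopCluster(cl)

# Step 3: collate, keeping the best row overall.
best <- grid[which.min(scores_par), ]
```

The "keep the better result" comparison from the original loop becomes a single which.min() over the collected scores, so the workers never need to know about each other's results.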
[R] interesting behavior from aaply
Dear helpers, I'm using plyr to process a large matrix for the first time. My code is set up to work with matrixes, since I learned the hard way that dataframes are considerably slower to process. I started using aaply(), but the data was rearranged from a flat matrix to a [, , 4] array for larger input matrixes. I'm sure something clever is happening that I'm just not seeing - anyone have any insight? I can provide the code for index.frames() if you like, but it's pretty turgid stuff. For now I'm just using adply(), since it gives an output that I'd expect. Best toy.mat - all.combinations[10:30,] aaply(toy.mat,1, index.frames) Var1 1234 46 3599.848 3665.454 12946.41 12946.41 51 3600.020 3666.424 12946.41 12946.41 56 3600.167 3667.301 12946.41 12946.41 61 3600.291 3668.058 12946.41 12946.41 66 3600.404 3668.766 12946.41 12946.41 71 3600.563 3669.779 12946.41 12946.41 76 3600.563 3669.779 12946.41 12946.41 81 3600.563 3669.779 12946.41 12946.41 86 3600.563 3669.779 12946.41 12946.41 91 3600.563 3669.779 12946.41 12946.41 96 3600.563 3669.779 12946.41 12946.41 101 3600.563 3669.779 12946.41 12946.41 106 3600.563 3669.779 12946.41 12946.41 111 3600.563 3669.779 12946.41 12946.41 116 3600.563 3669.779 12946.41 12946.41 121 3600.563 3669.779 12946.41 12946.41 126 3600.563 3669.779 12946.41 12946.41 131 3600.563 3669.779 12946.41 12946.41 136 3600.563 3669.779 12946.41 12946.41 141 3600.563 3669.779 12946.41 12946.41 146 3600.563 3669.779 12946.41 12946.41 toy.mat - all.combinations[10:100,] aaply(toy.mat,1, index.frames, .progress=win) , , = 1 Var2 Var1 27 12 17 1 NA 3632.275 3630.730 3627.652 6 NA 3638.913 3638.271 3635.214 11NA 3592.933 3593.322 3595.973 16NA 3588.024 3588.232 3589.256 21NA 3593.917 3594.088 3594.834 26NA 3596.888 3597.051 3597.752 31NA 3597.896 3598.056 3598.741 36NA 3598.994 3599.153 3599.837 41NA 3599.571 3599.729 3600.413 46 3599.848 3599.848 3600.006 3600.689 51 3600.020 3600.020 3600.178 NA 56 3600.167 3600.167 3600.325 NA 61 3600.291 3600.291 
3600.448 NA 66 3600.404 3600.404 3600.561 NA 71 3600.563 3600.563 3600.721 NA 76 3600.563 3600.563 3600.721 NA 81 3600.563 3600.563 3600.721 NA 86 3600.563 3600.563 3600.721 NA 91 3600.563 3600.563 3600.721 NA 96 3600.563 3600.563 3600.721 NA 101 3600.563 3600.563 3600.721 NA 106 3600.563 3600.563 3600.721 NA 111 3600.563 3600.563 3600.721 NA 116 3600.563 3600.563 3600.721 NA 121 3600.563 3600.563 3600.721 NA 126 3600.563 3600.563 3600.721 NA 131 3600.563 3600.563 3600.721 NA 136 3600.563 3600.563 3600.721 NA 141 3600.563 3600.563 3600.721 NA 146 3600.563 3600.563 3600.721 NA , , = 2 Var2 Var1 27 12 17 1 NA 3681.001 3698.490 3688.247 6 NA 3664.453 3676.527 3666.970 11NA 3662.162 3662.919 3668.211 16NA 3661.484 3661.476 3661.975 21NA 3648.731 3647.986 3650.290 26NA 3649.497 3648.367 3653.130 31NA 3652.778 3651.586 3656.638 36NA 3660.082 3659.050 3662.755 41NA 3663.944 3663.025 3665.933 46 3665.454 3665.454 3664.572 3667.226 51 3666.424 3666.424 3665.571 NA 56 3667.301 3667.301 3666.477 NA 61 3668.058 3668.058 3667.261 NA 66 3668.766 3668.766 3667.998 NA 71 3669.779 3669.779 3669.056 NA 76 3669.779 3669.779 3669.056 NA 81 3669.779 3669.779 3669.056 NA 86 3669.779 3669.779 3669.056 NA 91 3669.779 3669.779 3669.056 NA 96 3669.779 3669.779 3669.056 NA 101 3669.779 3669.779 3669.056 NA 106 3669.779 3669.779 3669.056 NA 111 3669.779 3669.779 3669.056 NA 116 3669.779 3669.779 3669.056 NA 121 3669.779 3669.779 3669.056 NA 126 3669.779 3669.779 3669.056 NA 131 3669.779 3669.779 3669.056 NA 136 3669.779 3669.779 3669.056 NA 141 3669.779 3669.779 3669.056 NA 146 3669.779 3669.779 3669.056 NA , , = 3 Var2 Var1 27 12 17 1 NA 12946.41 12946.41 12946.41 6 NA 12946.41 12946.41 12946.41 11NA 12946.41 12946.41 12946.41 16NA 12946.41 12946.41 12946.41 21NA 12946.41 12946.41 12946.41 26NA 12946.41 12946.41 12946.41 31NA 12946.41 12946.41 12946.41 36NA 12946.41 12946.41 12946.41 41NA 12946.41 12946.41 12946.41 46 12946.41 12946.41 12946.41 12946.41 51 12946.41 12946.41 12946.41 NA 56 
12946.41 12946.41 12946.41 NA 61 12946.41 12946.41 12946.41 NA 66 12946.41 12946.41 12946.41 NA 71 12946.41 12946.41 12946.41 NA 76
[R] bigmemory and R 3.0
Dear helpers,

Does anyone have information on the status of bigmemory and R 3.0? Will it just take time for the devs to re-code for the new environment? Or is there an alternative for this new version?

Thanks

Ben Caldwell
[R] R help - bootstrap with survival analysis
Hi,

I'm not sure if this is the proper way to ask questions, sorry if not. But here's my problem:

I'm trying to do a bootstrap estimate of the mean for some survival data. Is there a way to specifically call upon the rmean value, in order to store it in an object? I've used print(...,print.rmean=T) to print the summary of survfit, but I'm not sure how to access only rmean because it does not show up under attributes for survfit.

Thanks for any help in advance!

Fayaaz Khatri
Re: [R] bigmemory and R 3.0
On 29 April 2013 at 15:46, Benjamin Caldwell wrote:
| Dear helpers,
|
| Does anyone have information on the status of bigmemory and R3.0? Will it
| just take time for the devs to re-code for the new environment? Or is there
| an alternative for this new version?

It just works, with R 3.0.0 and other versions (see below). Did you maybe forget to reinstall any of the relevant packages?

Dirk

edd@max:~$ R

R version 3.0.0 (2013-04-03) -- "Masked Marvel"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

R> library(bigmemory)
Loading required package: bigmemory.sri
Loading required package: BH

bigmemory >= 4.0 is a major revision since 3.1.2; please see packages
biganalytics and and bigtabulate and http://www.bigmemory.org for more information.

R>

--
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
[R] Is there a function that print a string vertically (by adding \n)?
Hi,

I'd like to print a string vertically. For example, I would like to print "abcd" as "a\nb\nc\nd".

Is there a function in R such that

Input: "abcd"
Output: "a\nb\nc\nd"?

Thanks,

Miao
Re: [R] Is there a function that print a string vertically (by adding \n)?
Hi,

Maybe this helps:

cat(paste(strsplit("abcd","")[[1]],collapse="\n"))
#a
#b
#c
#d

A.K.

----- Original Message -----
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Cc:
Sent: Monday, April 29, 2013 9:41 PM
Subject: [R] Is there a function that print a string vertically (by adding \n)?

Hi,

I'd like to print a string vertically. For example, I would like to print "abcd" as "a\nb\nc\nd".

Is there a function in R such that

Input: "abcd"
Output: "a\nb\nc\nd"?

Thanks,

Miao
Re: [R] Looping Over Data Frames
On Apr 29, 2013, at 6:03 PM, Sparks, John James wrote:

> Dear R Helpers,
>
> I am re-phrasing a question that I put forth earlier today due to some
> particulars in the solution that I am searching for. Many thanks to those
> who answered the previous post and to any who would be willing to answer
> this one.
>
> I have a set of data frames. I need to perform some data scrubbing on
> each of them. I am trying to figure out how to perform the same steps on
> each data frame using some sort of loop or something along those lines.
>
> Because my actual data frames are quite large and the steps I am taking
> moderately complicated, I would very much prefer not to put them all
> together in a list because when I get an error, I can't determine which
> part of the list is the source of the error.
>
> So, I would really appreciate it if someone could post a way to perform
> the following subsetting function on the three simple data frames in the
> example below with some sort of loop or something along those lines, which
> would work directly on the data frames in question.
>
> Many thanks in advance. Please let me know if there is anything I can do
> to make the question more clear.
>
> --John Sparks
>
> x=as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),ncol=3,byrow=T))
> y=as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),ncol=3,byrow=T))
> z=as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),ncol=3,byrow=T))
>
> #Want to build some sort of loop for this.
> x<-subset(x,select=-c(V1))
> y<-subset(y,select=-c(V1))
> z<-subset(z,select=-c(V1))

for(i in letters[24:26] ) assign( i, subset(get(i), select=-c(V1)) )

> x
  V2 V3
1  2  3
2  2  3
3  2  2
4  2  2
5  1  1

--
David Winsemius
Alameda, CA, USA
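One way to extend the get()/assign() loop above so that a failure names the data frame it came from is sketched here. The tryCatch() wrapper, the scrub() helper and the deliberately broken z are additions made for illustration and are not part of the original answer.

```r
make_df <- function() as.data.frame(
  matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1), ncol = 3, byrow = TRUE))
x <- make_df()
y <- make_df()
z <- make_df()
z$V1 <- NULL   # deliberately broken: the scrubbing step will fail on z

# Hypothetical scrubbing step (the real one would be more involved).
scrub <- function(DF) subset(DF, select = -V1)

failed <- character(0)
for (nm in c("x", "y", "z")) {
  result <- tryCatch(scrub(get(nm)), error = function(e) e)
  if (inherits(result, "error")) {
    # Report which data frame caused the problem and leave it untouched.
    failed <- c(failed, nm)
    message("scrubbing failed for '", nm, "': ", conditionMessage(result))
  } else {
    assign(nm, result)   # overwrite the data frame with its scrubbed version
  }
}
```

The same error-reporting idea works with a named list and lapply() over names(), which may be easier to manage once there are twenty data frames rather than three.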
[R] Question regarding error x and y lengths differ
Hello,

I'm a first-semester statistics student and I am using R for roughly the third time ever. I am following a tutorial and yet I still get the error "x and y lengths differ". I am very new to this program, and I have searched for solutions, but because I do not understand the program too well, I am not sure which solution may apply to me. Any help is much appreciated! The question is as follows:

Do this given your population standard deviation. If we pick a significance level, say α = 0.1 (90% confidence), we can compute a confidence interval for our measure of the population mean for each one of our samples. Now let's compute and plot the confidence intervals for the 50 samples:

m = 50; n = 40; mu = mean(pop); sigma = sd(pop);
SE = sigma/sqrt(n) # Standard error in mean
alpha = 0.10 ; zstar = qnorm(1-alpha/2); # Find z for 90% confidence
matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),rbind(1:m,1:m), type="l", lty=1);
abline(v=mu)

I am receiving the error

Error in xy.coords(x, y, xlabel, ylabel, log = log) : 'x' and 'y' lengths differ

when inputting

matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),rbind(1:m,1:m), type="l", lty=1);

Thank you very much!
Re: [R] Is there a function that print a string vertically (by adding \n)?
On Apr 29, 2013, at 6:41 PM, jpm miao wrote:

> Hi,
>
> I'd like to print a string vertically. For example, I would like to
> print "abcd" as "a\nb\nc\nd".
>
> Is there a function in R such that
>
> Input: "abcd"
> Output: "a\nb\nc\nd"?

> do.call( paste, list( strsplit("abcd", "")[[1]] , collapse="\\n"))
[1] "a\\nb\\nc\\nd"

Notice that I am refusing to acquiesce to your request because I do not think you understand how escaped characters are represented in R. (In programming the customer is not always right.)

--
David Winsemius
Alameda, CA, USA
Re: [R] Is there a function that print a string vertically (by adding \n)?
On Apr 29, 2013, at 6:50 PM, David Winsemius wrote:

> On Apr 29, 2013, at 6:41 PM, jpm miao wrote:
>
>> Hi,
>>
>> I'd like to print a string vertically. For example, I would like to
>> print "abcd" as "a\nb\nc\nd".
>>
>> Is there a function in R such that
>>
>> Input: "abcd"
>> Output: "a\nb\nc\nd"?
>
> > do.call( paste, list( strsplit("abcd", "")[[1]] , collapse="\\n"))
> [1] "a\\nb\\nc\\nd"
>
> Notice that I am refusing to acquiesce to your request because I do not
> think you understand how escaped characters are represented in R. (In
> programming the customer is not always right.)

Nor is the programmer. I see that:

cat( "a\nb\nc\nd")

... is probably what you wanted and my answer was not. Apologies for the noise.

--
David Winsemius
Alameda, CA, USA
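To summarise the thread in code: the sketch below wraps the strsplit()/paste() approach in a small helper (the name vertical is made up for illustration) and shows the difference between the string's contents and how print() displays it.

```r
# Split a string into single characters and join them with real newlines.
vertical <- function(s) paste(strsplit(s, "")[[1]], collapse = "\n")

out <- vertical("abcd")

n_chars <- nchar(out)  # 7: four letters plus three newline characters

print(out)      # shows the escaped form, with \n written out
cat(out, "\n")  # writes the characters themselves, one letter per line
```

"\n" inside an R string is a single newline character; print() only displays it in escaped form, whereas "\\n" (as in the earlier do.call() answer) is a literal backslash followed by the letter n.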
[R] R/3.0.0 serialize limits
Hello, I am wondering if I am misinterpreting something from the R/3.0.0 NEWS:

    LONG VECTORS:
    This section applies only to 64-bit platforms.
    ...
    o serialize() to a raw vector is unlimited in size (except by resources).

However, when I try the following it fails:

    foo <- raw(2.5e9)
    print(object.size(foo), units="auto")
    2.3 Gb
    bar <- serialize(foo, NULL)
    Error: serialization is too large to store in a raw vector

However, this works:

    foo <- raw(2e9)
    print(object.size(foo), units="auto")
    1.9 Gb
    bar <- serialize(foo, NULL)

So it appears there may be a 2GB limit, which I've read should only be the case for 32-bit or pre-R/3.0.0 installations.

    sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
     [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

I've tested this on a few different hosts with no success: Ubuntu Raring Ringtail (apt-get), SLES 11.2 (building R from source), and Windows binaries.

Regards,
Sam
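For readers without several gigabytes of spare memory, the shape of the report can be sketched at small scale. The snippet below is an illustration under stated assumptions, not a diagnosis: it uses a 1000-byte stand-in for `foo`, and the name `limit` is introduced here. It shows that serialize(x, NULL) returns a raw vector slightly longer than its input, and that the two sizes tried above fall on either side of 2^31 - 1, the largest length of a non-long vector, which is consistent with the observed behaviour without establishing its cause.

```r
foo <- raw(1000)                    # small stand-in for the multi-gigabyte vector
bar <- serialize(foo, NULL)         # connection = NULL returns a raw vector

length(bar) > length(foo)           # TRUE: the stream holds a header plus the data
identical(unserialize(bar), foo)    # TRUE: the round trip is lossless

limit <- .Machine$integer.max       # 2^31 - 1 = 2147483647

2e9   < limit                       # TRUE: the size that serialized successfully
2.5e9 > limit                       # TRUE: the size that raised the error
```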
Re: [R] Question regarding error x and y lengths differ
On 04/30/2013 11:38 AM, Sean Doyle wrote:

Hello, I'm a first semester statistics student and I am using R for roughly the third time ever. I am following a tutorial and yet I still get the error x and y lengths differ. I am very new to this program, and I have searched for solutions, but because I do not understand the program too well, I am not sure which solution may apply to me. Any help is much appreciated!

The question is as follows: Do this given your population standard deviation. If we pick a confidence interval, say α = 0.1 (90% confidence), we can compute a confidence interval for our measure of the population mean for each one of our samples. Now let's compute and plot the confidence intervals for the 50 samples:

    m = 50; n = 40; mu = mean(pop); sigma = sd(pop);
    SE = sigma/sqrt(n) # Standard error in mean
    alpha = 0.10 ; zstar = qnorm(1-alpha/2); # Find z for 90% confidence
    matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE), rbind(1:m,1:m), type="l", lty=1);
    abline(v=mu)

I am receiving the error

    Error in xy.coords(x, y, xlabel, ylabel, log = log) : 'x' and 'y' lengths differ

when inputting

    matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE), rbind(1:m,1:m), type="l", lty=1);

Hi Sean,
Without doing your homework for you, I would suggest trying this:

    length(samp_mean)
    length(zstar*SE)
    length(1:m)

I would be very surprised if you got the same answer for all three. Also you might want to read the help page for matplot carefully to decide which is the x and which is the y you want to plot.

Jim
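The length check Jim suggests can be sketched end to end as follows. Neither `pop` nor `samp_mean` appears in the original post, so both are assumptions here: `pop` is a simulated population and `samp_mean` is built with replicate() so that it holds exactly one mean per sample. The names `x`, `y` and `covered` are also introduced for the example. With `samp_mean` of length `m`, the two matrices passed to matplot() have the same dimensions, which is the condition the plotting call needs.

```r
set.seed(1)                                  # reproducible illustration only
pop <- rnorm(10000, mean = 100, sd = 15)     # hypothetical population

m <- 50; n <- 40
mu <- mean(pop); sigma <- sd(pop)
SE <- sigma / sqrt(n)                        # standard error of the mean
alpha <- 0.10
zstar <- qnorm(1 - alpha / 2)                # z for 90% confidence

# One mean per sample, so samp_mean has length m (assumed construction)
samp_mean <- replicate(m, mean(sample(pop, n)))

length(samp_mean)    # 50
length(zstar * SE)   # 1, recycled across samp_mean
length(1:m)          # 50

x <- rbind(samp_mean - zstar * SE, samp_mean + zstar * SE)  # interval end points
y <- rbind(1:m, 1:m)                                        # sample index, twice

stopifnot(identical(dim(x), dim(y)))         # both 2 x m, so they can be plotted

covered <- sum(x[1, ] <= mu & mu <= x[2, ])  # intervals that contain mu

if (interactive()) {
  matplot(x, y, type = "l", lty = 1)         # one horizontal segment per sample
  abline(v = mu)
}
```

If `length(samp_mean)` comes back as anything other than `m` (or 1), the first matrix has a different number of columns from the second, which would explain the complaint about differing lengths.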
Re: [R] bigmemory and R 3.0
On 29/04/2013 23:46, Benjamin Caldwell wrote:

Dear helpers, Does anyone have information on the status of bigmemory and R3.0? Will it just take time for the devs to re-code for the new environment? Or is there an alternative for this new version?

What are you asking about? 'bigmemory' has been available for R 3.0.0 (sic) for a long time for all OSes bar Solaris and Windows, where the maintainers excluded it a long time ago (not just for R 3.0.0).

Thanks
Ben Caldwell

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Please see what it has to say about mis-reading R version numbers and HTML mail, and asking maintainers about their packages.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595