Re: [R] Find the prediction or the fitted values for an lm model
On Thu, 28 Nov 2013, jpm miao wrote:

Hi, I would like to fit my data with a 4th-order polynomial. Since I have only 5 data points, I should get a polynomial that passes exactly through the five points. Then I would like to compute the fitted or predicted values for a relatively large x dataset. How can I do it? BTW, I thought the model prodfn should pass through (0,0), but I wonder why the constant is not equal to zero.

    x1 <- c(0,3,4,5,8)
    y1 <- c(0,1,4,7,8)
    prodfn <- lm(y1 ~ poly(x1, 4))
    x <- seq(0,8,0.01)
    temp <- predict(prodfn, data.frame(x=x)) # This line does not work..

You need to call the variable x1, because that is the name you used in the original data:

    plot(x, predict(prodfn, data.frame(x1=x)), type = "l")
    points(x1, y1)

    prodfn

    Call:
    lm(formula = y1 ~ poly(x1, 4))

    Coefficients:
     (Intercept)  poly(x1, 4)1  poly(x1, 4)2  poly(x1, 4)3  poly(x1, 4)4
       4.000e+00     6.517e+00    -4.918e-16    -2.744e+00    -8.882e-16

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
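The point of the reply, that the column in newdata has to carry the name used in the formula, can be sketched as follows. This is only a minimal illustration using the data from the post; the names grid, fit_grid and fit_at_data are introduced here and are not part of the thread.

```r
# Data and model from the original post
x1 <- c(0, 3, 4, 5, 8)
y1 <- c(0, 1, 4, 7, 8)
prodfn <- lm(y1 ~ poly(x1, 4))

# newdata needs a column called x1, because the formula refers to x1
grid <- data.frame(x1 = seq(0, 8, by = 0.01))   # 'grid' is a name chosen for this sketch
fit_grid <- predict(prodfn, newdata = grid)

# Five points and a degree-4 polynomial: the fit should reproduce the data
fit_at_data <- predict(prodfn, newdata = data.frame(x1 = x1))
```

When the column is called x instead, predict() cannot find x1 in newdata and usually falls back to the x1 in the workspace, which is why the original line did not give predictions on the fine grid.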
Re: [R] Automatic saving of many regression's output
Hi,

    lst1[[1]][,2] <- NA
    lst2 <- lapply(lst1, function(x) summary(lm(rate~., data=x)))
    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      0 (non-NA) cases

    lst2 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) summary(lm(rate~., data=x)))

A.K.

Hi, thank you for the help. :-) I applied your script to the data but I got the error:

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      0 (non-NA) cases

I forgot to write that some of the data are NA. I executed this code:

    lst1 <- split(data[,-16], data[,16])
    any(sapply(lst1,nrow)!=123)
    #[1] FALSE
    lst2 <- lapply(lst1, function(x) summary(lm(rate~cap.log+liqamih.log+pbv, data=x)))
    # here I can set the dependent variables if I want to test different versions of the model (e.g. with only 3 dependent variables), right?
    length(lst2)
    #[1] 334

On Wednesday, November 27, 2013 5:27 PM, arun smartpink...@yahoo.com wrote:

Hi, Try:

    set.seed(49)
    dat1 <- as.data.frame(matrix(sample(c(NA,1:50), 41082*15, replace=TRUE), ncol=15))
    dat1$indx <- as.numeric(gl(334*123, 123, 334*123))
    names(dat1)[1] <- "rate"
    lst1 <- split(dat1[,-16], dat1[,16])
    any(sapply(lst1,nrow)!=123)
    #[1] FALSE
    lst2 <- lapply(lst1, function(x) summary(lm(rate~., data=x)))
    length(lst2)
    #[1] 334

A.K.

Hi all! I am a beginner in R, so please excuse some naive questions. I am learning. Here is a description of my problem. I have a database (in a single csv file):

                   characteristic_1   characteristic_2   ...   characteristic_49
    subject_1    | c1_1_t=1         | c2_1_t=1         ... | c49_1_t=1
    subject_2    | c1_2_t=1         | c2_2_t=1         ... | c49_2_t=1
    subject_3    | c1_3_t=1         | c2_3_t=1         ... | c49_3_t=1
    ...
    subject_334  | c1_334_t=1       | c2_334_t=1       ... | c49_334_t=1
    subject_1    | c1_1_t=2         | c2_1_t=2         ... | c49_1_t=2
    subject_2    | c1_2_t=2         | c2_2_t=2         ... | c49_2_t=2
    subject_3    | c1_3_t=2         | c2_3_t=2         ... | c49_3_t=2
    ...
    subject_334  | c1_334_t=2       | c2_334_t=2       ... | c49_334_t=2

and so on, up to t (time) = 123. So I have 334 subjects with 49 characteristics measured at 123 points in time.
I would like to run 123 regressions (of three kinds: lm, rlm and lmrob, for comparison), each one for 334 subjects and 49 dependent variables. After each regression (actually after running each of the three: lm, rlm and lmrob) I would like to save a txt (or csv) file with the results (summary) and some tests* for those regressions (each regression can be named reg_1, reg_2 ... reg_123).

To make things clearer, the regressions would look like this:

    summary(lm(rate ~ cap.log + liqamih.log + liqwol.log + pbv.log + mom.log +
               beta.wig + beta.wig.eq + beta.sp + beta.wig.macro + beta.sp.macro +
               beta.sentim.pl + beta.sentim.pl.ort + beta.sentim.usa + beta.sentim.usa.ort,
               data=data))

The problem is how to run this lm() over a rolling window, i.e. for the first 334 observations (total observations: 123*334), and so on. I need to run regression_1 for the first 334 observations, regression_2 for the next 334 observations (from 335 to 668), and so on up to regression_123 (from 40749 to 41082). Each time I run such a regression I would like to save the results (summary and the mentioned tests). Then I would like to repeat the same procedure for rlm() and lmrob(), if possible.

I think I can write the tests part of the script alone (could you add some comments showing where exactly I should put it in the script so that the tests are repeated automatically and the results saved?), but the 'saving' and 'repeating 123 times' procedures are quite complicated for me, at least for now. So here I am asking for help with them. In the end I would like to have three txt (or csv) files: one containing the 123 summaries and test results of lm, one containing those of rlm, and one containing those of lmrob. Could someone help me with this task? I am grateful for your help and support.
*like: jarque.bera.test(), vif(), ncvTest(), durbinWatsonTest(). Some of them are not applicable to rlm and lmrob, so in that case I would like the test to be reported as NA in the three output txt (or csv) files. Some of them are also not applicable to cross-sectional regressions, but I would still like to keep them in the script for later modifications.
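The filter in the reply at the top of this message skips every block in which each row has at least one NA, because lm() has no complete cases to work with there. A small sketch of the same idea, written with complete.cases(); the list blocks and its contents are made up for illustration and are not the poster's data.

```r
set.seed(1)
# Three small hypothetical blocks; in the second one a whole column is NA
blocks <- lapply(1:3, function(i) data.frame(rate = rnorm(10), z = rnorm(10)))
blocks[[2]]$z <- NA

# A block is usable if at least one row has no missing value;
# this is the same condition as !all(rowSums(is.na(d)) > 0)
usable <- sapply(blocks, function(d) any(complete.cases(d)))

# Fit only the usable blocks
fits <- lapply(blocks[usable], function(d) summary(lm(rate ~ ., data = d)))
```

A block that passes this check can still have very few complete rows, so it may be worth checking the residual degrees of freedom of each fit as well.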
Re: [R] Automatic saving of many regression's output
Hi, You may try something like:

    set.seed(49)
    dat1 <- as.data.frame(matrix(sample(1:300, 41082*15, replace=TRUE), ncol=15)) # created only 15 columns, as shown in your model
    dat1$indx <- as.numeric(gl(334*123, 123, 334*123))
    names(dat1)[1] <- "rate"
    lst1 <- split(dat1[,-16], dat1[,16])
    any(sapply(lst1,nrow)!=123)
    #[1] FALSE
    lst2 <- lapply(lst1, function(x) summary(lm(rate~., data=x)))
    length(lst2)
    #[1] 334

A.K.

On Wednesday, November 27, 2013 3:41 PM, nooldor nool...@gmail.com wrote:

Thank you for the reply. OK, you are right, let me make it clearer. The regressions would look like this:

    summary(lm(rate ~ cap.log + liqamih.log + liqwol.log + pbv.log + mom.log +
               beta.wig + beta.wig.eq + beta.sp + beta.wig.macro + beta.sp.macro +
               beta.sentim.pl + beta.sentim.pl.ort + beta.sentim.usa + beta.sentim.usa.ort,
               data=data))

The problem is how to run this lm() over a rolling window, i.e. for the first 334 observations (total observations: 123*334). I need to run regression_1 for the first 334 observations, regression_2 for the next 334 observations (from 335 to 668), and so on up to regression_123 (from 40749 to 41082). Each time I run such a regression I would like to save the results (summary and the mentioned tests). Then I would like to repeat the same procedure for rlm() and lmrob(), if possible. I hope it is better described now.

On 27 November 2013 21:24, arun smartpink...@yahoo.com wrote:

So, if you have 49 dependent variables, what would the model be for one of the 123 regressions? You haven't provided a reproducible example, so it's a lot of guesswork.

On Wednesday, November 27, 2013 3:18 PM, nooldor nool...@gmail.com wrote:

Hi, Yes, I need to run the regression 123 times, each time for 334 subjects with 49 dependent variables. Now I am trying the rollapply function, but as I mentioned I am a beginner, so it takes time ...

On 27 November 2013 21:11, smartpink...@yahoo.com wrote:

Hi, You said you wanted 123 test results of 'lm'. You have 49 dependent variables. So, there is something missing in your description.
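The request in this exchange, one regression per consecutive block of 334 rows, can be sketched with rep() and split(). The block sizes are shrunk and the data are simulated so the example runs quickly; n_subj, n_time, period and coef_table are names introduced here, and only the column names rate, cap.log and pbv are taken from the thread.

```r
set.seed(49)
n_subj <- 20   # 334 in the thread
n_time <- 4    # 123 in the thread
dat <- data.frame(rate    = rnorm(n_subj * n_time),
                  cap.log = rnorm(n_subj * n_time),
                  pbv     = rnorm(n_subj * n_time))

# Rows 1..n_subj form period 1, the next n_subj rows period 2, and so on
period <- rep(seq_len(n_time), each = n_subj)
by_period <- split(dat, period)

# One lm per period, named reg_1, reg_2, ...
models <- lapply(by_period, function(d) lm(rate ~ cap.log + pbv, data = d))
names(models) <- paste0("reg_", names(models))

# One row of coefficients per regression
coef_table <- do.call(rbind, lapply(models, coef))
```

The same lapply() call could take MASS::rlm or robustbase::lmrob in place of lm, assuming those packages are installed; that variant is not run here.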
Re: [R] Automatic saving of many regression's output
Hi,

2. You need to tell which package you are using.

3. Does this work for you?

    capture.output(lst2, file="nooldor.txt")

4.

    lst2 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) print(summary(lm(rate~., data=x)))) ### prints the output on the R console

A.K.

Hi, Thank you for your patience and help :-) Now the code looks like this:

    data <- read.table("reg3-dane.csv", head=T, sep=";", dec=",")
    data$indx <- as.numeric(gl(334*123, 123, 334*123))
    lst1 <- split(data[,-16], data[,16])
    # 1. By changing the 16 parameter I can add or remove variables (also by modifying the reg3-dane.csv file), right?
    any(sapply(lst1,nrow)!=123)
    #[1] FALSE
    lst2 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) summary(lm(rate~cap.log+liqamih.log+pbv, data=x)))
    length(lst2)
    #[1] 334
    # 2. Where can I place the tests for each (of the 123) regressions, like jarque.bera.test(), vif(), ncvTest(), durbinWatsonTest(), to have them saved with the regression summary?
    # 3. How can I get the list of results in a more user-friendly form? I would like to get a report.

Is it ok? Could you help me with the questions in the remarks above? And could you modify the script to also print the summary (and tests) of each regression (each of the 123) in the console?

Best wishes!
T.S.
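Saving the list of summaries with capture.output(), as suggested in this message, might look like the sketch below. The list lst2 here is a simulated stand-in for the one in the thread, and the temporary file paths are placeholders; the CSV table of R-squared values is an addition for illustration, not something the posters wrote.

```r
set.seed(2)
# Hypothetical stand-in for the thread's lst2: a named list of lm summaries
lst2 <- lapply(1:3, function(i) {
  d <- data.frame(rate = rnorm(15), cap.log = rnorm(15))
  summary(lm(rate ~ cap.log, data = d))
})
names(lst2) <- paste0("reg_", seq_along(lst2))

# Placeholder path; replace with your own file name, e.g. "nooldor.txt"
out_file <- tempfile(fileext = ".txt")

# capture.output() writes what print(lst2) would show on the console to the file
capture.output(print(lst2), file = out_file)

# A compact alternative: one CSV row per regression
r2 <- data.frame(model = names(lst2),
                 r.squared = sapply(lst2, function(s) s$r.squared))
csv_file <- tempfile(fileext = ".csv")
write.csv(r2, csv_file, row.names = FALSE)
```

The text file keeps the full printed summaries, while the CSV is easier to open in a spreadsheet; which one suits the report depends on how the results will be read.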
Re: [R] Automatic saving of many regression's output
Hi, Try:

    set.seed(49)
    dat1 <- as.data.frame(matrix(sample(c(NA,1:50), 41082*15, replace=TRUE), ncol=15))
    dat1$indx <- as.numeric(gl(334*123, 123, 334*123))
    names(dat1)[1] <- "rate"
    lst1 <- split(dat1[,-16], dat1[,16])
    any(sapply(lst1,nrow)!=123)
    #[1] FALSE
    lst2 <- lapply(lst1, function(x) summary(lm(rate~., data=x)))
    length(lst2)
    #[1] 334

A.K.
Re: [R] Automatic saving of many regression's output
Hi, No problem. You could try:

    library(tseries)
    res6 <- do.call(rbind, lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                                  function(x) {
                                    resid <- residuals(lm(rate~., data=x))
                                    unlist(jarque.bera.test(resid)[1:3])
                                  }))

A.K.

On Wednesday, November 27, 2013 7:47 PM, Tomasz Schabek schabek.tom...@gmail.com wrote:

Great! Thank you for your help one more time! Yes, you are right: jarque.bera.test() should be applied to a vector, so the deal is that the residuals from each of those 123 regressions, captured by e.g.

    resid <- residuals(model)
    jarque.bera.test(resid)

are tested in jarque.bera.test(). Could you manage it? You are a really helpful and kind person!

Kind regards, Atenciosamente, Pozdrawiam,
T. S.

On 28 November 2013 01:33, arun smartpink...@yahoo.com wrote:

Hi, In that case:

    lst5 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) vif(lm(rate~., x)))
    res5 <- do.call(rbind, lst5)

As I mentioned earlier, it is not clear how you wanted to apply jarque.bera.test(). Also, the results from lst3, lst4, lst5 etc. could be saved using capture.output() (not tested though). Or, if you wanted to modify it and keep only specific components, for example:

    res4 <- do.call(rbind, lapply(lst4, function(x) unlist(x[-4])))

On Wednesday, November 27, 2013 7:21 PM, nooldor nool...@gmail.com wrote:

Thank you for the fast answer, and a big THANK YOU for the help! I found an error in the previous script (it was running 334 regressions on vectors of length 123, and it should be the opposite: 123 regressions on vectors of length 334). Anyway, I modified it:

    data <- read.table("reg3-dane.csv", head=T, sep=";", dec=",")
    data$indx <- as.numeric(gl(123*334, 334, 123*334))
    lst1 <- split(data[,-16], data[,16])
    any(sapply(lst1,nrow)!=334)
    #[1] FALSE
    lst2 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) summary(lm(rate~cap.log, data=x)))
    capture.output(lst2, file="nooldor.txt")

It's ok now (at least when I compared the regression summary from Excel and R, it was the same :-) ). capture.output(lst2, file="nooldor.txt") works fine!

Packages: vif {car}, jarque.bera.test {tseries}, ncvTest {car}, durbinWatsonTest {car}
R version 3.0.2 (2013-09-25)

T.S.
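The pattern in this message (fit each block, take the residuals, run a normality test, bind the results into one table) can be sketched without extra packages. The helper jb_stat below is a hand-written Jarque-Bera statistic assumed to follow the textbook formula n/6 * (S^2 + (K - 3)^2 / 4); it is a stand-in for tseries::jarque.bera.test, not that function, and the blocks are simulated.

```r
# Hand-rolled Jarque-Bera statistic from a residual vector (hypothetical helper)
jb_stat <- function(e) {
  n <- length(e)
  m <- e - mean(e)
  s2 <- mean(m^2)                 # second central moment
  skew <- mean(m^3) / s2^1.5      # sample skewness
  kurt <- mean(m^4) / s2^2        # sample kurtosis
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = stat, df = 2,
    p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(3)
# Simulated blocks standing in for the thread's lst1
blocks <- lapply(1:4, function(i) data.frame(rate = rnorm(50), cap.log = rnorm(50)))
names(blocks) <- paste0("reg_", 1:4)

# One row per regression: statistic, degrees of freedom, p-value
jb_table <- do.call(rbind, lapply(blocks, function(d) {
  jb_stat(residuals(lm(rate ~ cap.log, data = d)))
}))
```

The resulting matrix can be passed to write.csv() in the same way as the coefficient tables discussed earlier in the thread.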
Re: [R] Automatic saving of many regression's output
Hi, I just tried ncvTest() and durbinWatsonTest() from library(car):

    f4 <- function(meanmod, dta, varmod) {
      assign(".dta", dta, envir=.GlobalEnv)
      assign(".meanmod", meanmod, envir=.GlobalEnv)
      m1 <- lm(.meanmod, .dta)
      ans <- ncvTest(m1, varmod)
      remove(".dta", envir=.GlobalEnv)
      remove(".meanmod", envir=.GlobalEnv)
      ans
    }
    library(car)
    lst3 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) f4(rate~., x))
    lst4 <- lapply(lst1[sapply(lst1, function(x) !(all(rowSums(is.na(x)) > 0)))],
                   function(x) durbinWatsonTest(lm(rate~., x)))

jarque.bera.test() from library(tseries) is applied to a numeric vector or time series (see ?jarque.bera.test).

A.K.
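The original question asked for NA to be reported whenever a test cannot be applied to a given model. One way to get that behaviour, sketched here under the assumption that a failing test signals an error, is to wrap each call in tryCatch(). The helpers safe_test and dw_stat are hypothetical names; dw_stat computes the plain Durbin-Watson ratio in base R and is not car::durbinWatsonTest.

```r
# Evaluate a diagnostic; return NA instead of stopping if it raises an error
safe_test <- function(expr) tryCatch(expr, error = function(e) NA_real_)

# Durbin-Watson statistic computed directly from the residuals (base R stand-in)
dw_stat <- function(model) {
  e <- residuals(model)
  sum(diff(e)^2) / sum(e^2)
}

set.seed(4)
d <- data.frame(rate = rnorm(30), cap.log = rnorm(30))
m <- lm(rate ~ cap.log, data = d)

report <- c(
  durbin_watson  = safe_test(dw_stat(m)),
  # stop() imitates a test that is not defined for the model class
  not_applicable = safe_test(stop("test not defined for this model class"))
)
```

Inside the lapply() calls from the thread, each test could be wrapped the same way, so that one failing block or model type does not abort the whole run.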
Re: [R] Find the prediction or the fitted values for an lm model
See in-line below.

On 11/28/13 20:50, jpm miao wrote:

Hi, I would like to fit my data with a 4th-order polynomial. Since I have only 5 data points, I should get a polynomial that passes exactly through the five points. Then I would like to compute the fitted or predicted values for a relatively large x dataset. How can I do it? BTW, I thought the model prodfn should pass through (0,0), but I wonder why the constant is not equal to zero.

Because poly() produces orthonormalized polynomials. Look at poly(x1,4). It is not much like cbind(x1,x1^2,x1^3,x1^4), is it?

cheers,

Rolf Turner

    x1 <- c(0,3,4,5,8)
    y1 <- c(0,1,4,7,8)
    prodfn <- lm(y1 ~ poly(x1, 4))
    x <- seq(0,8,0.01)
    temp <- predict(prodfn, data.frame(x=x)) # This line does not work..

    prodfn

    Call:
    lm(formula = y1 ~ poly(x1, 4))

    Coefficients:
     (Intercept)  poly(x1, 4)1  poly(x1, 4)2  poly(x1, 4)3  poly(x1, 4)4
       4.000e+00     6.517e+00    -4.918e-16    -2.744e+00    -8.882e-16
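Rolf's remark can be checked by fitting the same data in both bases. The sketch below uses the raw = TRUE argument of poly(); the object names fit_orth, fit_raw, grid and gap are introduced here for illustration.

```r
x1 <- c(0, 3, 4, 5, 8)
y1 <- c(0, 1, 4, 7, 8)

fit_orth <- lm(y1 ~ poly(x1, 4))              # orthogonal basis (the default)
fit_raw  <- lm(y1 ~ poly(x1, 4, raw = TRUE))  # plain powers x1, x1^2, x1^3, x1^4

# The orthogonal columns are orthogonal to the constant, so the intercept is mean(y1)
coef(fit_orth)[1]
# In the raw basis the intercept is the fitted value at x1 = 0
coef(fit_raw)[1]

# Both parameterisations describe the same curve
grid <- data.frame(x1 = seq(0, 8, by = 0.5))
gap <- max(abs(predict(fit_orth, grid) - predict(fit_raw, grid)))
```

So the curve does pass through (0,0); the intercept of 4 in the printed output is mean(y1), which is what the constant term means in the orthogonal basis.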
Re: [R] Problems dealing with matrices
Hi,

Sorry for bothering you again. This is a continuation of the previous problem. I have the following matrices and vectors:

    dcmat <- matrix(c(0.13,0.61,0.25,0.00,0.00,
                      0.00,0.52,0.37,0.09,0.00,
                      0.00,0.00,0.58,0.30,0.11,
                      0.00,0.00,0.00,0.46,0.22,
                      0.00,0.00,0.00,0.00,0.09), nrow=5, ncol=5)
    volini <- matrix(c(0,0,0,0,0), nrow=5, ncol=1)
    volinp1 <- c(0, 0.0004669094, 0.0027610861, 0.0086204692, 0.0200137754,
                 0.0389069106, 0.0670942588, 0.1060941424, 0.1570990708, 0.2209672605,
                 0.2982420945, 0.3891882830, 0.4938361307, 0.6120278338, 0.7434618363,
                 0.8877329008, 1.0443667375, 1.2128488387, 1.3926476912, 1.5832328410,
                 1.7840884399, 1.9947229566, 2.2146757191, 2.4435209092, 2.6808695568,
                 2.9263700050, 3.1797072430, 3.4406014299, 3.7088058696, 3.9841046430,
                 4.2663100561, 4.5552600226, 4.8508154713, 5.1528578389, 5.4612866929,
                 5.7760175114, 6.0969796345, 6.4241143947, 6.7573734248,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                 7, 7)

I've calculated the following matrices vol and volyrdc1 (obviously with the help of Jeff and Arun):

    # Blank matrices for storing the final values
    vol <- matrix(NA, nrow=5, ncol=length(volinp1))
    volyrdc1 <- matrix(NA, nrow=5, ncol=length(volinp1),
                       dimnames=list(c("DC1","DC2","DC3","DC4","DC5"), c(seq(0,500,5))))
    wt <- matrix(c(1,0,0,0,0), nrow=5)   # defined before its first use
    vol[ , 1 ] <- dcmat %*% (volini + (volinp1[1]*wt))
    for ( idx in seq_along(volinp1)[ -1 ] ) {
      vol[ , idx ] <- dcmat %*% ( vol[ , idx-1 ] + volinp1[idx] * wt )
    }
    vol
    volyrdc1[,1] <- vol[,1]
    for ( idx in seq_along(volinp1)[ -1 ] ) {
      volyrdc1[ , idx ] <- vol[ , idx-1 ] + volinp1[idx] * wt
    }
    volyrdc1

My final matrix is 'volyrdc1' (a kind of transition matrix model). Now, what I want to do is to find when colsum <- colSums(volyrdc1) reaches a certain value, and to get the index of the element of the 'colsum' vector at that point. For example, colsum[colsum>=18] gives the whole series of cases where the condition is true.
But I want the index of the element at which the condition is first met. In this case, the answer I want is 140 (colsum[29] returns both the value (18.63) and the name (140) attached to that index). Actually, in my case 140 is the year (age) when 'colsum' becomes >= 18.

At this point it would also be great if I could calculate when 'colsum' levels off (up to two decimal places). The answer is 305, and at that point colsum == 45.37.

I also want to calculate what the value in volini[1,1] should be to get a certain value of 'colsum' at a certain year (age), i.e. at the vector element index explained earlier. For example, what should the value in volini[1,1] be if I want colsum == 18 at 100 (the name attached to colsum[21])? The answer is 15910, and the 'volini' matrix will look like:

    volini <- matrix(c(15910,0,0,0,0), nrow=5, ncol=1)

Any pointers or suggestions will be gratefully acknowledged.

P.S. Can you please suggest an effective R programming book that describes the core elements of R programming?

Thanks in advance.

Regards,
Halim
---
Md. Abdul Halim
Assistant Professor
Department of Forestry and Environmental Science
Shahjalal University of Science and Technology, Sylhet-3114, Bangladesh.
Cell: +8801714078386. alt. e-mail: xo...@yahoo.com

On Tue, 26 Nov 2013 20:21:14 -0800 (PST), arun wrote:

Hi Halim, No problem. Regards, Arun

On Tuesday, November 26, 2013 11:18 PM, halim10-fes halim10- f...@sust.edu wrote:

Hi Arun, Thanks for your help. Sorry for my late response. Take care and stay fine. Regards, Halim

On Sun, 24 Nov 2013 07:45:24 -0800 (PST), arun wrote:

Hi Halim, I guess this works for you.
Modifying Jeff's solution:

volinp <- c(0,0.000467,0.002762,0.008621,0.020014,0.038907,0.067094)
vol1 <- dcmat %*% (volmat + wt)
for (idx in seq_along(volinp)[-1]) {
  vol1 <- cbind(vol1, dcmat %*% (vol1[,idx-1] + volinp[idx] * wt))
}
#or
vol <- matrix( NA, nrow=5, ncol=length( volinp ) )
vol[ , 1 ] <- dcmat %*% ( volmat + wt )
for ( idx in seq_along(volinp)[ -1 ] ) {
  vol[ , idx ] <- dcmat %*% ( vol[ , idx-1 ] + volinp[idx] * wt )
}
identical(vol,vol1)
#[1] TRUE

A.K.

On Sunday, November 24, 2013 7:16 AM, halim10-fes halim10- f...@sust.edu wrote:
Hi Arun, OK, no problem. Thank you very much for your attention. I've posted an annex to my previous problem. I will appreciate your comments/suggestions on it. Off-topic: You're a very helpful man. I like your attitude to helping others. Take care. Halim

On Sun, 24 Nov 2013 01:18:18 -0800 (PST), arun wrote
Hi, Please disregard my earlier message. Looks like Jeff understood it better and answered it. Regards, Arun

On Sunday, November 24, 2013 3:23 AM, arun smartpink...@yahoo.com wrote:
Hi, I am finding some inconsistency with your description. For example:
Re: [R] if, apply, ifelse
On 11/28/2013 04:33 AM, Andrea Lamont wrote:
> Hello: This seems like an obvious question, but I am having trouble answering it. I am new to R, so I apologize if it's too simple to be posting. I have searched for solutions to no avail. I have data that I am trying to set up for further analysis (training data). What I need is 12 groups based on patterns of 4 variables. The complication comes in when missing data is present. Let me describe with an example - focusing on just 3 of the 12 groups: ... Any ideas on how to approach this efficiently?

Hi Andrea,

I would first convert the matrix a to a data frame:

a1<-as.data.frame(a)

Then I would start adding columns:

# group 1 is a 1 (logical TRUE) in col1 and at least one other 1
# here NAs are converted to zeros
a1$group1<-a1$col1 &
 (ifelse(is.na(a1$col2),0,a1$col2) |
 ifelse(is.na(a1$col3),0,a1$col3) |
 ifelse(is.na(a1$col4),0,a1$col4))
# group 2 is a 1 in col1 and no other 1s
# here NAs are converted to 1s
a1$group2<-a1$col1 &
 !(ifelse(is.na(a1$col2),1,a1$col2) |
 ifelse(is.na(a1$col3),1,a1$col3) |
 ifelse(is.na(a1$col4),1,a1$col4))
# here NAs are converted to 1s
a1$group3<-!ifelse(is.na(a1$col1),1,a1$col1)

and so on. It is clunky, but then you've got a clunky problem.

Jim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if, apply, ifelse
Hi Andrea,

A cleaner alternative to Jim's suggestion is something like

a.df <- as.data.frame(a)
# apply() over rows (MARGIN = 1) so that each result lines up with one row of a.df
group1 <- (a.df$col1 == 1) & apply(a.df[,c("col2","col3","col4")], 1, function(x) any(x == 1 | is.na(x)))
group2 <- (a.df$col1 == 1) & apply(a.df[,c("col2","col3","col4")], 1, function(x) all(x == 0 | is.na(x)))
group3 <- (a.df$col1 != 1)

- Jon
[R] help ANN
Hi everybody,

First, I'm not highly skilled in R, so please keep your answers easy to follow! I would like to create an artificial neural network with R, but I don't know its parameters yet (number of layers, number of neurons, ...). I downloaded the package ANN and I use the function ANNGA, but I'm afraid I haven't really created a neural network. In fact, at the end of the process I get just this output:

Call:
ANNGA.default(x = input, y = output, design = c(1, 3, 1), population = 100, mutation = 0.2, crossover = 0.6, maxW = 10, minW = -10, maxGen = 1000, error = 0.001)

Mean Squared Error-- 0.01148523
R2-- 0.6918387
Number of generation 1001
Weight range at initialization-- [ 10 , -10 ]
Weight range resulted from the optimisation- [ 13.58698 , -12.93606 ]

I would like to know whether ANN has a function to *create* a neural network and, if not, which package I should download. nnet?

Thanks in advance!
Giulia Di Lauro.
[R] ODE does not reach steady state and increase exponentially
Dear all,

Please follow the link to the question that I posted on StackOverflow about my R code with ODE:
http://stackoverflow.com/questions/20218065/ode-does-not-reach-steady-state-and-increase-exponentially

I am trying to write code for a differential equation that should give me the biomass of different size classes, depending on the amount of available food. However, the biomass does not reach a steady state and increases exponentially, even with the parameters set to 0 (which in theory should result in a biomass value of 0).

Thank you very much for your help! I appreciate it.

Matteo
[R] Multivariate dispersion distances
Dear All,

I'm using betadisper {vegan} and I'm interested not only in the dispersion within groups but also in the distances between groups. With betadisper I get distances to group centroids, but is it possible to get distances to the centroids of other groups? It might be possible to do it by hand with the formula given in the description of betadisper (below), but I'm a bit confused about how to treat the imaginary part there...

z[ij]^c = sqrt(Delta^2(u[ij]^+, c[i]^+) - Delta^2(u[ij]^-, c[i]^-))

I would highly appreciate all the help I can get!

-Merja
[R] date format
Dear Users of R,

I have a data frame with three columns: the first column contains years, the second months and the third days (cbind(yyyy mm dd)). I want to combine them so that I have one column with the date formatted as dd.mm.yyyy. Is there a way of doing that?

Thanks in advance,
Eliza
Re: [R] Multivariate dispersion distances
M Elo merja.t.elo at luukku.com writes:
> Dear All, I'm using betadisper {vegan} and I'm interested not only in the dispersion within the group but also the distances between the groups. With betadisper I get distances to group centroids but is it possible to get distances to other groups centroids? It might be possible to do it by hand by the formula given in the description of the betadisper (below) but I'm a bit confused how to treat the imaginary part there...
>
> z[ij]^c = sqrt(Delta^2(u[ij]^+, c[i]^+) - Delta^2(u[ij]^-, c[i]^-))
>
> I would highly appreciate all the help I can get!

Merja,

You should do it exactly the way you wrote above: subtract the squared Euclidean distances in the imaginary part from the squared Euclidean distances in the real part, and take the square root. I think doing this by hand is the only way to do it directly. The scope of the method is to compare dispersions within groups. There are other tools to compare the locations of group centroids (adonis in vegan), but they won't give you distances.

Cheers,
Jari Oksanen
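The recipe above (squared distance on the real axes minus squared distance on the imaginary axes, then the square root) can be sketched in base R. This is only an illustration on made-up coordinates: `dist_to_centroid`, `u_pos`, `u_neg` and the grouping vector `g` are hypothetical names, and pulling the real and imaginary principal coordinates out of an actual betadisper fit is not shown here.

```r
# Hypothetical helper: distance from every sample to ONE chosen centroid,
# given coordinates on the real (positive-eigenvalue) axes and on the
# imaginary (negative-eigenvalue) axes.
dist_to_centroid <- function(u_pos, u_neg, c_pos, c_neg) {
  d2_pos <- rowSums(sweep(u_pos, 2, c_pos)^2)  # squared Euclidean distance, real part
  d2_neg <- rowSums(sweep(u_neg, 2, c_neg)^2)  # squared Euclidean distance, imaginary part
  sqrt(d2_pos - d2_neg)                        # NaN if the difference is negative
}

# Toy data (assumed, not from a real ordination): 3 samples, 2 real axes, 1 imaginary axis
u_pos <- matrix(c(0, 0,
                  3, 4,
                  6, 8), ncol = 2, byrow = TRUE)
u_neg <- matrix(c(0, 0, 1), ncol = 1)
g <- c("A", "A", "B")

# Centroid of group B on each set of axes
c_pos_B <- colMeans(u_pos[g == "B", , drop = FALSE])
c_neg_B <- colMeans(u_neg[g == "B", , drop = FALSE])

# Distances of ALL samples (including those of group A) to the centroid of group B
d_to_B <- dist_to_centroid(u_pos, u_neg, c_pos_B, c_neg_B)
d_to_B
```

Repeating the call once per group centroid would give a samples-by-groups table of distances.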
Re: [R] if, apply, ifelse
Jim, et al.:

rowSums(a, na.rm=TRUE) ## Fast!

tells you whether you have 0, 1, or >= 1 TRUE in each row. This can then be combined with the ifelse() conditions to get what the OP seems to want. As you said, it's clunky, and is just a minor simplification. But, then again, her logic seemed somewhat confusing.

Cheers,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
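To make the rowSums() idea in this thread concrete, here is a minimal sketch on a toy matrix. The three group definitions follow the comments in Jim's code (NA counted as 0 for group 1, as a possible 1 for group 2, and a known 0 in col1 for group 3); the matrix contents and the helper names `others`, `n_ones`, `n_na` and `col1_known` are made up for illustration, and the remaining nine groups of the original question are not covered.

```r
# Toy 0/1 matrix with missing values; column names follow the thread
a <- matrix(c(1,  1,  0, NA,
              1,  0,  0,  0,
              0,  1, NA,  1,
              1, NA, NA, NA),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("col1", "col2", "col3", "col4")))

others <- a[, c("col2", "col3", "col4"), drop = FALSE]
n_ones <- rowSums(others == 1, na.rm = TRUE)  # known 1s among col2..col4
n_na   <- rowSums(is.na(others))              # missing values among col2..col4
col1_known <- !is.na(a[, "col1"])

# Group 1: col1 is 1 and at least one other known 1 (NA treated as 0)
group1 <- col1_known & a[, "col1"] == 1 & n_ones > 0
# Group 2: col1 is 1 and no other 1s, where an NA counts as a possible 1
group2 <- col1_known & a[, "col1"] == 1 & n_ones == 0 & n_na == 0
# Group 3: col1 is a known 0
group3 <- col1_known & a[, "col1"] == 0

cbind(a, group1, group2, group3)
```

One difference from the ifelse() version: a row with NA in col1 comes out as FALSE here rather than NA, which may or may not be what is wanted.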
Re: [R] date format
eliza botto eliza_botto at hotmail.com writes:
> Dear Users of R, I have a data frame with three column, the first column contains years, the second one months and third one, the days (cbind(yyyy mm dd)). I want to combine them so that i have one column with the date format as (dd.mm.yyyy). Is there a way of doing that. Thanks in advance, Eliza

I think just

paste(dd, mm, yyyy, sep=".")

should work fine (where 'dd', 'mm', 'yyyy' are references to your columns).
Re: [R] date format
Dear Bert, Arun and Philipp,

Thanks for your help. It worked perfectly fine for me. :D

Eliza

Date: Thu, 28 Nov 2013 16:09:58 +0100
From: wev...@web.de
To: eliza_bo...@hotmail.com; r-help@r-project.org
Subject: Re: [R] date format

Hi Eliza,

# you can use paste to create a new vector:
date1 <- paste(dataframe[,3], dataframe[,2], dataframe[,1], sep=".")
# you could then turn that into a date-time class with which you could do calculations
strptime(date1, format="%d.%m.%Y")

--
Philipp Wevers
wev...@web.de
Mobile: 015253710061 Landline: 03080921097
Koloniestraße 126 A
13359 Berlin
Re: [R] date format
Hello,

Maybe something like the following.

dat <- data.frame(yyyy = 2011:2013, mm = 1:3, dd = 4:6)
apply(dat, 1, function(x) paste(rev(x), collapse = "."))

Hope this helps,

Rui Barradas
Re: [R] date format
Thanks, Rui.

Eliza
[R] Relative Cumulative Frequency of Event Occurence
Hi,

My objective is to calculate the Relative (Cumulative) Frequency of Event Occurrence, something as follows:

Sample.Number  1st.Fly  2nd.Fly  Did.E.occur?  Relative.Cum.Frequency.of.E
 1             G        B        No            0.000
 2             B        B        Yes           0.500
 3             B        G        No            0.333
 4             G        B        No            0.250
 5             G        G        Yes           0.400
 6             G        B        No            0.333
 7             B        B        Yes           0.429
 8             G        G        Yes           0.500
 9             G        B        No            0.444
10             B        B        Yes           0.500

Please refer to the code below:

##
# 1.
v.fly = c("G","B") # Outcome is Green or Blue fly
# 2.
n = 10 # No of Events / Trials
# 3.
v.smp = seq(1:n) # Event Id
# 4.
v.fst = sample(v.fly, n, replace=TRUE) # Simulating First Draw
# 5.
v.sec = sample(v.fly, n, replace=TRUE) # Simulating Second Draw
# 6.
df.1 = data.frame(sample = v.smp, fst = v.fst, sec = v.sec) # Clumping in a DF
# 7.
df.1$E.Occur = with(df.1, ifelse(fst==sec,TRUE,FALSE)) # Event occurs if the color is the same in both draws
# 8.
df.1$Rel.Freq = with(df.1, cumsum(E.occur)/(E.Occur)) # Relative Frequency. This line does NOT work; the denominator part needs fixing
##

The problem is with #8, specifically the part cumsum(E.occur)/(E.Occur). The denominator E.Occur is a fixed value instead of a moving count. I have tried nrow() and length(), but neither provides a moving version of the row count, the way cumsum does for the TRUE values occurring so far.

dput(df.1)
structure(list(Sample.Number = 1:10,
    X1st.Fly = c("G", "B", "B", "G", "G", "G", "B", "G", "G", "B"),
    X2nd.Fly = c("B", "B", "G", "B", "G", "B", "B", "G", "B", "B"),
    Did.E.occur. = c("No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"),
    Relative.Cum.Frequency.of.E = c(0, 0.5, 0.333, 0.25, 0.4, 0.333, 0.429, 0.5, 0.444, 0.5)),
    .Names = c("Sample.Number", "X1st.Fly", "X2nd.Fly", "Did.E.occur.", "Relative.Cum.Frequency.of.E"),
    class = "data.frame", row.names = c(NA, -10L))

Cheers !
Re: [R] date format
Hi,

Try:

dat1 <- data.frame(years=rep(1991:1992,12), months=rep(1:12,2), days=rep(1,24))
dat1$day <- format(as.Date(paste(dat1[,1],sprintf("%02d",dat1[,2]),sprintf("%02d",dat1[,3]),sep="."),"%Y.%m.%d"),"%d.%m.%Y")

A.K.
Re: [R] date format
#Or
paste(dat[,3],dat[,2],dat[,1],sep=".")
#[1] "4.1.2011" "5.2.2012" "6.3.2013"
# as.character(interaction(dat[,3:1]))
paste(sprintf("%02d",dat[,3]),sprintf("%02d",dat[,2]),dat[,1],sep=".")
#[1] "04.01.2011" "05.02.2012" "06.03.2013"

A.K.

On Thursday, November 28, 2013 10:18 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
Hello, Maybe something like the following.

dat <- data.frame(yyyy = 2011:2013, mm = 1:3, dd = 4:6)
apply(dat, 1, function(x) paste(rev(x), collapse = "."))

Hope this helps, Rui Barradas
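The answers in this thread build the string with paste(). A slightly different sketch, assuming the columns are named `yyyy`, `mm` and `dd` as in Rui's example, goes through a real date first, so that impossible dates show up as NA instead of being pasted silently. The column name `date` and the `bad` example are made up for illustration.

```r
dat <- data.frame(yyyy = 2011:2013, mm = 1:3, dd = 4:6)

# ISOdate() builds a date-time from numeric parts and returns NA for impossible dates
d <- as.Date(ISOdate(dat$yyyy, dat$mm, dat$dd))

# Format as dd.mm.yyyy with zero padding
dat$date <- format(d, "%d.%m.%Y")
dat

# An impossible date (30 February) is flagged rather than pasted together
bad <- ISOdate(2013, 2, 30)
is.na(bad)
```

Keeping `d` as a Date column as well can be handy if the dates are later used in calculations or sorting.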
Re: [R] Problems dealing with matrices
Hi Halim,

For the first two questions, you may try:

colsum1 <- colSums(volyrdc1)
min(which(colsum1>=18))
#[1] 29
#or
head(which(colsum1>=18),1)
#140
# 29
colsum1[substr(colsum1,6,7)=="00"] ## this is not very clear
#     305
#45.37004
#or
colsum1[colsum1>=18][substr(colsum1[colsum1>=18],6,7)=="00"]
#     305
#45.37004
#because
sprintf("%.4f",colsum1[colsum1>=18])
colsum1[colsum1>=18][gsub(".*\\.\\d{2}","",sprintf("%.4f",colsum1[colsum1>=18]))=="00"]
#     180      305
#32.88996 45.37004

A.K.
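As a small self-contained illustration of the first two questions, here is a sketch on a made-up named vector (the real `colsum` comes from colSums(volyrdc1) and is not reproduced here). "Levels off to two decimal places" is read below as "changes by less than 0.005 from the previous element", which is only one possible interpretation and need not agree with the substr()-based approach above.

```r
# Toy stand-in for colSums(volyrdc1); names play the role of years (ages)
colsum <- c("0" = 2.1, "5" = 9.7, "10" = 17.9, "15" = 18.6,
            "20" = 25.301, "25" = 25.304, "30" = 25.304)

# First element at which the threshold is reached
first_hit <- which(colsum >= 18)[1]
unname(first_hit)   # position in the vector
names(first_hit)    # the year (age) attached to that position

# First element that differs from its predecessor by less than 0.005
# (assumed reading of "levels off up to two decimal places")
level_idx  <- which(abs(diff(colsum)) < 0.005)[1] + 1
level_year <- names(colsum)[level_idx]
level_year
```

If the threshold is never reached, which(...)[1] is NA, so a real script would want to check for that before using the result.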
Re: [R] Relative Cumulative Frequency of Event Occurence
Hi,

From the dput() version of df.1, it looks like you want:

cumsum(df.1[,4]=="Yes")/seq_len(nrow(df.1))
# [1] 0.0000000 0.5000000 0.3333333 0.2500000 0.4000000 0.3333333 0.4285714
# [8] 0.5000000 0.4444444 0.5000000

A.K.
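The moving denominator is simply the running trial count, so the whole calculation fits in one line. A minimal sketch using the Yes/No pattern from the table in the question (the vector name `occurred` is made up):

```r
# Did the event occur on each trial? (pattern taken from the posted table)
occurred <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

# Running count of occurrences divided by the running number of trials
rel_freq <- cumsum(occurred) / seq_along(occurred)
round(rel_freq, 3)
```

With the simulated data frame from the question, the same expression would be applied to the logical column, for example with(df.1, cumsum(E.Occur) / seq_along(E.Occur)).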
[R] polychoric correlation with multiple imputations, a Strate and a Weight
Hi there,

I'm generally more of a Stata user than an R user, but I need to compute something that I am not able to do with Stata 13. So, here I am!

I have a database that has multiple imputations (the imputations are already done) with a complex sample design (Strate and Weight). Is it possible, in R, to run a polychoric correlation with multiple imputations, a Strate and a Weight?

So far I have been able to do some statistical analysis with Strate and Weight using the survey package. I also know that there is a package named Amelia II that can handle multiple imputations, and polycor, which can compute polychoric correlations. The problem is that I don't know how to combine all those packages, if that is possible at all.

So:
1- Is it possible to do so?
2- If yes, how do I combine all of this? Any ideas?

I don't need any syntax (at least for the moment). I just need a hint of where to start (is there a package that can handle it, has somebody written paper(s) about it, etc.); I am completely stuck.

Thank you!
[R] Counting variables repeted in dataframe columns to create a presence-absence table
Hi!

I'm new to R and I'm writing to ask for some guidance. I have analyzed comparative genomic microarray data of 56 Salmonella strains to identify absent genes in each of the serovars, and I finally got a matrix that looks like this:

data[1:5,1:5]
  Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR      S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR      S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR      S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR      S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR      S5299A_IGR

The entries are the genes identified as absent in each of the serovars. I would like to create a presence-absence matrix of those genes, comparing all the serovars at the same time. I assume that should not be complicated, but I don't know how to do it. I would like a matrix similar to this one:

data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,
Oihane

--
Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology
Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain
Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com
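One possible way to get such a table, sketched on a two-column excerpt of the data shown above: collect every gene name that appears anywhere, then test column by column whether it is listed. This assumes the columns are character vectors (not factors); `genes` is a made-up helper name, and a 1 here means "listed for that serovar", following the layout requested in the question.

```r
# Two-column excerpt of the posted data (first three rows only)
data <- data.frame(
  Abortusovis07918 = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR"),
  Agona08561       = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR"),
  stringsAsFactors = FALSE
)

# Every gene name that occurs in any column, in order of first appearance
genes <- unique(unlist(data, use.names = FALSE))

# One column per serovar: 1 if the gene is listed there, 0 otherwise
data_m <- sapply(data, function(col) as.integer(genes %in% col))
rownames(data_m) <- genes
data_m
```

If columns have different lengths and were padded with NA or empty strings, those padding values would need to be dropped from `genes` first.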
Re: [R] Assiging name to ip address range
Hi,

If it is like:

vec1 <- c("10.20.30.01","10.20.30.02","10.20.30.40","10.20.30.41","10.20.30.45","10.20.30.254","10.20.30.255","10.20.30.256","10.20.30.313")
vec2 <- as.numeric(paste0(gsub("^\\d{2}\\.\\d{2}\\.(\\d{2}\\.).*","\\1",vec1),
                          sprintf("%03d",as.numeric(gsub("^\\d{2}\\.\\d{2}\\.\\d{2}\\.","",vec1)))))
as.character(cut(vec2,breaks=c(30,30.040,30.255,30.313),labels=paste0("SKH",1:3)))
#[1] "SKH1" "SKH1" "SKH1" "SKH2" "SKH2" "SKH2" "SKH2" "SKH3" "SKH3"

#or if the column is:
dat1 <- data.frame(iprange=c("10.20.30.01 - 10.20.30.40", "10.20.30.40 - 10.20.30.255"))
dat1[,1] <- factor(dat1[,1],labels=paste0("SKH",1:2))

A.K.

I have an IP address column in my dataset, which R reads as a factor. I want to create a new variable for a range, like:
if 10.20.30.01 - 10.20.30.40 then SKH1
if 10.20.30.40 - 10.20.30.255 then SKH2
and so on. 10.20 will always remain the same; the other values will change. I have around 500 values which I want to assign according to IP address. I am not able to use the greater-than or less-than functions. Please advise how to do that. Thanks!!
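A more general sketch, not taken from the thread: convert each dotted address to a single number, after which ordinary comparisons and findInterval() work for any prefix, not only 10.20.30.x. The helper `ip_to_num`, the range starts in `lower` and the choice that .40 belongs to SKH1 (as in the cut() example above) are all assumptions made for illustration.

```r
# Hypothetical helper: dotted-quad string (or factor) -> one number
ip_to_num <- function(ip) {
  parts <- strsplit(as.character(ip), ".", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * 256^(3:0)), numeric(1))
}

ips <- factor(c("10.20.30.01", "10.20.30.40", "10.20.30.41",
                "10.20.30.255", "10.20.31.7"))

# Assumed first address of each range, in increasing order
lower  <- ip_to_num(c("10.20.30.1", "10.20.30.41", "10.20.31.0"))
labels <- paste0("SKH", 1:3)

# findInterval() returns the index of the range each address falls into
grp <- labels[findInterval(ip_to_num(ips), lower)]
grp
```

Addresses below the first range start get index 0 from findInterval(), which silently drops them from `grp`, so real data should be checked for that case.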
Re: [R] Adding NA values in random positions in a dataframe
Hi,

One way would be:

set.seed(42)
dat1 <- as.data.frame(matrix(sample(c(1:5,NA),50,replace=TRUE,prob=c(10,15,15,20,30,10)),ncol=5))
set.seed(49)
dat1[!is.na(dat1)][ match( sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20)),seq(dat1[!is.na(dat1)]))] <- NA
length(dat1[is.na(dat1)])/length(unlist(dat1))
#[1] 0.28

A.K.

Hello,

I'm quite new to R, so I don't know the most efficient way to write a function that I could easily write in other languages. This is my problem: I have a data frame with a certain number of NA values (approximately 10%). I want to add further NA values in random positions of the data frame until the overall proportion of NA values reaches 30% (clearly, the positions that already hold NA values must not change). I tried looking at iterative functions in R such as apply or sapply, but I can't figure out how to use them in this case.

Thank you.
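The approach above blanks a fixed 20% of the non-missing cells, so the final share depends on how many NA values were present to begin with. A minimal sketch that instead computes how many cells are still needed to reach a target share is given below; the helper name `add_na` and its `target` argument are assumptions made for illustration.

```r
# Hypothetical helper: raise the share of NA cells in a data frame to `target`
# without touching cells that are already NA.
add_na <- function(df, target = 0.30) {
  na_mask <- is.na(df)                                   # logical matrix, one cell per value
  n_extra <- round(target * length(na_mask)) - sum(na_mask)
  if (n_extra > 0) {
    candidates <- which(!na_mask)                        # positions that are not NA yet
    pick <- candidates[sample.int(length(candidates), n_extra)]
    na_mask[pick] <- TRUE
    df[na_mask] <- NA                                    # blank old and newly chosen positions
  }
  df
}

# Deterministic toy data: 50 cells, 5 of them (10%) already NA.
dat <- as.data.frame(matrix(c(1:45, rep(NA, 5)), ncol = 5))
set.seed(1)
out <- add_na(dat, target = 0.30)
mean(is.na(out))
```

No loop or `apply` call is needed: the positions are drawn in one `sample.int()` call and assigned through a logical matrix index.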
Re: [R] importing many csv files into separate matrices
On Nov 27, 2013, at 2:39 PM, yetik serbest wrote:

Hi Everyone,

I am trying to import many CSV files into their own matrices, for example alaska_93.csv to alaska. When I execute the following for each CSV file separately, it is successful.

singleCSVFile2Matrix <- function(x,path) {
  assign(gsub(pattern=".csv",x,replacement=""),read.csv(paste(path,x,sep="")))
}

When I try to include it in a loop in another function (I have so many CSV files to import), it doesn't work. I mean the following function doesn't do it.

loadCSVFiles_old <- function(path) {
  x <- list.files(path)
  for (i in 1:length(x)) {
    assign(gsub(pattern=".csv",x[i],replacement=""),read.csv(paste(path,x[i],sep="")))
  }
}

It appears you are not returning the values that you created inside that function to the global environment. I would have expected that you would either have given `assign` an environment argument or have created a list of items to return from the function.

?environment
?assign

Perhaps:

loadCSVFiles_old <- function(path) {
  x <- list.files(path)
  for (i in 1:length(x)) {
    assign(gsub(pattern=".csv",x[i],replacement=""),
           read.csv(paste(path,x[i],sep="")),
           envir=.GlobalEnv)
  }
}

Instead, if I execute the for loop in the command line, it works. I am puzzled. Appreciate any help.

thanks
yetik

David Winsemius
Alameda, CA, USA
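The other option mentioned in the reply, returning a list from the function instead of assigning into the global environment, could look roughly like this. It is a sketch only: the name `load_csv_files` and the temporary demo directory are assumptions, and the list holds data frames as returned by `read.csv`.

```r
# Hypothetical helper: read every .csv file in `path` into a named list.
load_csv_files <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, read.csv)
  names(out) <- sub("\\.csv$", "", basename(files))  # "alaska_93.csv" -> "alaska_93"
  out
}

# Usage with a throwaway directory so the example is self-contained.
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(tmp, "alaska_93.csv"), row.names = FALSE)

tables <- load_csv_files(tmp)
names(tables)
# list2env(tables, envir = .GlobalEnv)  # only if separate variables are really wanted
```

Keeping the tables in one list makes it easy to loop over them later with `lapply`; the pattern `"\\.csv$"` also escapes the dot, which the unescaped `".csv"` in the original `gsub` call does not.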
Re: [R] Counting variables repeated in dataframe columns to create a presence-absence table
Hi,

Try:

data_m <- read.table(text="Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1 S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR
2 S5305A_IGR S5300A_IGR S5305A_IGR S5300A_IGR S5300A_IGR
3 S5300A_IGR S5300B_IGR S5300A_IGR S5300B_IGR S5300B_IGR
4 S5300B_IGR S5299B_IGR S5300B_IGR S5299B_IGR S5299B_IGR
5 S5299B_IGR S5299A_IGR S5299B_IGR S5829B_IGR S5299A_IGR",sep="",header=TRUE,stringsAsFactors=FALSE)
data_m$new <- 1
library(reshape2)
dM <- melt(data_m,id.vars="new")
xtabs(new~value+variable,dM)
#or
dcast(dM,value~variable,value.var="new",fill=0)

A.K.

On Thursday, November 28, 2013 12:18 PM, Gmail o.iraz...@gmail.com wrote:

Hi!

I'm new to R and I'm writing to ask for some guidance. I have analyzed comparative genomic microarray data from 56 Salmonella strains to identify the genes absent in each of the serovars, and I ended up with a matrix that looks like this:

data[1:5,1:5]
  Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR      S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR      S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR      S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR      S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR      S5299A_IGR

The entries correspond to the genes identified as absent in each of the serovars. I would like to create a presence-absence matrix of those genes that compares all the serovars at the same time. I assume that should not be complicated, but I don't know how to do it. I would like a matrix similar to the following one:

data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane

--
Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology
Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain
Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com
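For readers without reshape2, the same cross-tabulation can be sketched in base R. The two-column `genes` data frame below is a small stand-in for the real data, and the object names `long` and `pa` are illustrative; the columns are assumed to be character vectors, not factors.

```r
# Small stand-in for the real data: one column per serovar, one absent gene per cell.
genes <- data.frame(
  Abortusovis07918 = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR"),
  Agona08561       = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR"),
  stringsAsFactors = FALSE
)

# stack() gives long format: `values` holds the gene, `ind` the serovar.
long <- stack(genes)

# Cross-tabulate and cap counts at 1, so a gene listed twice in a column still gives 1.
pa <- (table(long$values, long$ind) > 0) * 1L
pa
```

Each row of `pa` is a gene and each column a serovar, with 1 where the gene appears in that serovar's column and 0 otherwise.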