Re: [R] Find the prediction or the fitted values for an lm model

2013-11-28 Thread Achim Zeileis

On Thu, 28 Nov 2013, jpm miao wrote:


Hi,

  I would like to fit my data with a 4th-order polynomial. Since I have only
5 data points, I should get a polynomial that passes exactly through all five.

  Then I would like to compute the fitted or predicted values on a
relatively large x dataset. How can I do it?

  BTW, I thought the model prodfn should pass through (0,0), so I
wonder why the constant is not equal to zero.

x1 <- c(0,3,4,5,8)
y1 <- c(0,1,4,7,8)
prodfn <- lm(y1 ~ poly(x1, 4))

x <- seq(0,8,0.01)

temp <- predict(prodfn,data.frame(x=x))   # This line does not work..


You need to call the variable x1 because that is the name you used in the 
original data:


plot(x, predict(prodfn, data.frame(x1 = x)), type = "l")
points(x1, y1)
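
A minimal, self-contained sketch of the point above: predict() looks up newdata columns by the variable names used in the model formula, so the new data frame needs a column called x1. The names x_new, fit_new and fit_at_data are only illustrative.

```r
# Data and model from the question
x1 <- c(0, 3, 4, 5, 8)
y1 <- c(0, 1, 4, 7, 8)
prodfn <- lm(y1 ~ poly(x1, 4))

# Fine grid for prediction; the column must be named x1, like the
# predictor in the formula (x_new and fit_new are illustrative names)
x_new <- seq(0, 8, by = 0.01)
fit_new <- predict(prodfn, newdata = data.frame(x1 = x_new))

# With 5 points and a 4th-degree polynomial the fit interpolates the data,
# so predictions at the original x1 should reproduce y1 up to rounding error
fit_at_data <- predict(prodfn, newdata = data.frame(x1 = x1))
print(all.equal(unname(fit_at_data), y1))
```

fit_new can then be passed to plot() or lines() against x_new, as in the plot() call above.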




prodfn


Call:
lm(formula = y1 ~ poly(x1, 4))

Coefficients:
 (Intercept)  poly(x1, 4)1  poly(x1, 4)2  poly(x1, 4)3  poly(x1, 4)4
   4.000e+00     6.517e+00    -4.918e-16    -2.744e+00    -8.882e-16


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
Hi,

lst1[[1]][,2] <- NA
lst2 <- lapply(lst1,function(x) summary(lm(rate~.,data=x)))
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases



lst2 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) summary(lm(rate~.,data=x)) )
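
A small sketch of why the filter above helps, on made-up data. The names toy, toy_lst, keep and toy_fits are illustrative and stand in for the poster's lst1 and lst2. The condition any(complete.cases(d)) is used here as an equivalent way of writing !(all(rowSums(is.na(d)) > 0)).

```r
set.seed(1)
# Toy data: 3 groups of 10 rows (a stand-in for lst1; all names are illustrative)
toy <- data.frame(rate = rnorm(30), a = rnorm(30), b = rnorm(30),
                  grp = rep(1:3, each = 10))
toy_lst <- split(toy[, c("rate", "a", "b")], toy$grp)

# Make one predictor entirely NA in the first group, as in lst1[[1]][,2] <- NA
toy_lst[[1]][, 2] <- NA

# lm() drops incomplete rows, so this group has nothing left to fit
failed <- inherits(try(lm(rate ~ ., data = toy_lst[[1]]), silent = TRUE),
                   "try-error")

# Keep only groups with at least one complete row, then fit the rest
keep <- sapply(toy_lst, function(d) any(complete.cases(d)))
toy_fits <- lapply(toy_lst[keep], function(d) summary(lm(rate ~ ., data = d)))
names(toy_fits)
```

Groups that are dropped this way simply do not appear in the result, so the names of the list are the safest way to match summaries back to groups.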
A.K.



Hi,

Thank you for the help. :-)

I applied your script to the data but I got this error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases

I forgot to mention that some of the data are NA.

I executed this code:

lst1 <- split(data[,-16],data[,16])
any(sapply(lst1,nrow)!=123)
#[1] FALSE
lst2 <- lapply(lst1,function(x) 
summary(lm(rate~cap.log+liqamih.log+pbv,data=x))) # here I can set the 
# dependent variables if I want to test different versions of the model 
# (e.g. with only e dependent variables), right?
length(lst2)
#[1] 334





On Wednesday, November 27, 2013 5:27 PM, arun smartpink...@yahoo.com wrote:
Hi,
Try:
set.seed(49)
dat1 <- as.data.frame(matrix(sample(c(NA,1:50),41082*15,replace=TRUE),ncol=15))
dat1$indx <- as.numeric(gl(334*123,123,334*123))
names(dat1)[1] <- "rate"
lst1 <- split(dat1[,-16],dat1[,16])
any(sapply(lst1,nrow)!=123)
#[1] FALSE
lst2 <- lapply(lst1,function(x) summary(lm(rate~.,data=x)))
length(lst2)
#[1] 334

A.K.

Hi all! 

I am a complete beginner in R, so please excuse some naive questions. I am 
still learning. 
Here is a description of my problem: 

I have a database (in a single csv file): 

             | characteristic_1 | characteristic_2 | ... | characteristic_49
subject_1    | c1_1_t=1         | c2_1_t=1         | ... | c49_1_t=1
subject_2    | c1_2_t=1         | c2_2_t=1         | ... | c49_2_t=1
subject_3    | c1_3_t=1         | c2_3_t=1         | ... | c49_3_t=1
...
subject_334  | c1_334_t=1       | c2_334_t=1       | ... | c49_334_t=1
subject_1    | c1_1_t=2         | c2_1_t=2         | ... | c49_1_t=2
subject_2    | c1_2_t=2         | c2_2_t=2         | ... | c49_2_t=2
subject_3    | c1_3_t=2         | c2_3_t=2         | ... | c49_3_t=2
...
subject_334  | c1_334_t=2       | c2_334_t=2       | ... | c49_334_t=2

and so on, up to t (time) = 123, 

so I have 334 subjects with 49 characteristics measured at 123 points in time. 

I would like to run 123 regressions (three kinds: lm, rlm and 
lmrob, for comparison), each one for 334 subjects and 49 
dependent variables, and after each regression (actually after running 
each of the three: lm, rlm and lmrob) I would like to save a 
txt (or csv) file with the results (summary) and some tests* for it 
(the regressions can be named reg_1, reg_2 ... reg_123). 

To make things clearer, the regressions would look like this: 

summary(lm(rate ~ cap.log + liqamih.log + liqwol.log + pbv.log + mom.log
           + beta.wig + beta.wig.eq
           + beta.sp
           + beta.wig.macro
           + beta.sp.macro
           + beta.sentim.pl + beta.sentim.pl.ort
           + beta.sentim.usa + beta.sentim.usa.ort, data=data))

The problem is how to run the lm() above over a rolling window, 
i.e. for the first 334 observations, then the next 334, and so on 
(total observations: 123*334). 
I need to run regression_1 on the first 334 observations, regression_2 
on the next 334 (335 to 668), and so on up to regression_123 
(40749 to 41082). 
Each time I run such a regression I would like to save the results (summary and 
the mentioned tests). 

Then I would like to repeat the same procedure for rlm() and lmrob(), if 
possible. 

I think I can write the tests part of the script myself (could you 
add some comments showing where exactly I should put it in the script so 
that the tests are repeated automatically and the results saved?), but the 
'saving' and 'repeating 123 times' parts are quite complicated for me, at 
least for now. So I am asking for help with them. 

In the end I would like to have three txt (or csv) files: 
one containing 123 summaries and test results of lm, 
one containing 123 summaries and test results of rlm 
and one containing 123 summaries and test results of lmrob. 

Could someone help me with this task? 
I am grateful for your help and support. 

 
*like: 
jarque.bera.test() 
vif() 
ncvTest() 
durbinWatsonTest() 

Some of them are not applicable to rlm and lmrob; in that case 
I would like the test to show NA in the three output txt (or csv) files. 
Some of them are also not applicable to cross-sectional regressions, 
but I would still like to keep them in the script for later 
modifications.



Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
Hi,
You may try something like:
set.seed(49)
dat1 <- as.data.frame(matrix(sample(1:300,41082*15,replace=TRUE),ncol=15)) 
#created only 15 columns as shown in your model
dat1$indx <- as.numeric(gl(334*123,123,334*123))
names(dat1)[1] <- "rate"
lst1 <- split(dat1[,-16],dat1[,16])
any(sapply(lst1,nrow)!=123)
#[1] FALSE
lst2 <- lapply(lst1,function(x) summary(lm(rate~.,data=x)))
length(lst2)
#[1] 334


A.K.




On Wednesday, November 27, 2013 3:41 PM, nooldor nool...@gmail.com wrote:


Thank you for reply.


OK.


You are right, let's make it clearer:

The regressions would look like this:

summary(lm(rate ~ cap.log + liqamih.log + liqwol.log + pbv.log + mom.log
           + beta.wig + beta.wig.eq
           + beta.sp
           + beta.wig.macro
           + beta.sp.macro
           + beta.sentim.pl + beta.sentim.pl.ort
           + beta.sentim.usa + beta.sentim.usa.ort, data=data))


The problem is how to run the lm() above over a rolling window, i.e. for the 
first 334 observations, then the next 334, and so on (total observations: 123*334). 
I need to run regression_1 on the first 334 observations, regression_2 on the 
next 334 (335 to 668), and so on up to regression_123 (40749 to 41082).

And each time I run such regression I would like to save results (summary and 
mentioned tests).


Then I would like to repeat the same procedure but for rlm() and lmrob() if 
possible.


Hope it's better described now.








On 27 November 2013 21:24, arun smartpink...@yahoo.com wrote:

So, if you have 49 dependent variables, what would be the model for one of the 
123 regressions?
You haven't provided a reproducible example, so it's a lot of guesswork.










On Wednesday, November 27, 2013 3:18 PM, nooldor nool...@gmail.com wrote:

HI,

Yes, I need to run the regression 123 times, each time for 334 subjects with 49 
dependent variables.
Now I am trying the rollapply function, but as I mentioned I am a beginner, so it 
takes time ...




On 27 November 2013 21:11, smartpink...@yahoo.com wrote:

Hi,
You said you wanted 123 test results of 'lm'.  You have 49 dependent 
variables.  So, there is something missing in your description.

Quoted from:
http://r.789695.n4.nabble.com/Automatic-saving-of-many-regression-s-output-tp4681284.html



Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
Hi,

2. You need to tell which package you are using.

3. Does this work for you?
capture.output(lst2,file="nooldor.txt")

4. 


lst2 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) 
print(summary(lm(rate~.,data=x))))  ###prints the output on R console
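
As a rough sketch of points 3 and 4, on made-up data: capture.output() writes to a file the same text that printing the list would show on the console. The objects toy, toy_fits and out_file are illustrative, and tempfile() is used only so the sketch does not write into the working directory.

```r
set.seed(2)
# Toy stand-in for lst2: a named list of regression summaries
toy <- data.frame(rate = rnorm(40), cap.log = rnorm(40),
                  grp = rep(1:4, each = 10))
toy_fits <- lapply(split(toy, toy$grp),
                   function(d) summary(lm(rate ~ cap.log, data = d)))
names(toy_fits) <- paste0("reg_", seq_along(toy_fits))

# Write every printed summary to one text file
out_file <- tempfile(fileext = ".txt")
capture.output(toy_fits, file = out_file)

# The file holds the text that print(toy_fits) would show on the console
out_lines <- readLines(out_file)
head(out_lines)
```

Naming the list elements (reg_1, reg_2, ...) before capturing makes each block in the file easy to find.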

A.K.


Hi,

Thank you for your patience and help :-)

Now the code looks like this:


data <- read.table("reg3-dane.csv", head=T, sep=";", dec=",")
data$indx <- as.numeric(gl(334*123,123,334*123))
lst1 <- split(data[,-16],data[,16]) # 1. by changing the 16 parameter I can
# add or remove variables (also by modifying the reg3-dane.csv file), right?
any(sapply(lst1,nrow)!=123)
#[1] FALSE
lst2 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) 
summary(lm(rate~cap.log+liqamih.log+pbv,data=x)) )
length(lst2)
#[1] 334
# 2. Where can I place the tests for each (of the 123) regressions, like 
# jarque.bera.test(), vif(), ncvTest() and durbinWatsonTest(), 
# to have them saved with the regression summary?
# 3. How can I get the list of results in a more user-friendly form? 
# I would like to get the report


Is it OK?

Could you help me with the questions in the remarks above?

And could you modify the script to also print the summary (and tests) of each 
regression (each of the 123) in the console?


Best wishes!
T.S.




Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
Hi,
Try:
set.seed(49)
dat1 <- as.data.frame(matrix(sample(c(NA,1:50),41082*15,replace=TRUE),ncol=15))
dat1$indx <- as.numeric(gl(334*123,123,334*123))
names(dat1)[1] <- "rate"
lst1 <- split(dat1[,-16],dat1[,16])
any(sapply(lst1,nrow)!=123)
#[1] FALSE
lst2 <- lapply(lst1,function(x) summary(lm(rate~.,data=x)))
length(lst2)
#[1] 334

A.K.



Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
Hi,
No problem,

You could try:
library(tseries)

res6 <- do.call(rbind,lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) {resid <- 
residuals(lm(rate~.,data=x)); unlist(jarque.bera.test(resid)[1:3])}) )
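
The same pattern (one row of test results per regression, stacked with rbind) can be sketched without extra packages. The helper jb_stat below is a hand-rolled version of the usual Jarque-Bera formula, included only so the example runs in base R; it is not the tseries implementation, and toy, toy_lst and jb_tab are illustrative names.

```r
# Hand-rolled Jarque-Bera statistic (illustrative helper, not the tseries code)
jb_stat <- function(r) {
  n    <- length(r)
  d    <- r - mean(r)
  m2   <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurt <- mean(d^4) / m2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = stat, df = 2,
    p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(3)
toy <- data.frame(rate = rnorm(60), cap.log = rnorm(60),
                  grp = rep(1:3, each = 20))
toy_lst <- split(toy[, c("rate", "cap.log")], toy$grp)

# One row per regression: fit, take residuals, test, then stack the rows
jb_tab <- do.call(rbind, lapply(toy_lst, function(d) {
  jb_stat(residuals(lm(rate ~ ., data = d)))
}))
jb_tab
# write.csv(jb_tab, "jb_results.csv")   # hypothetical file name
```

Because the result is a plain numeric matrix with one row per regression, it can be written out with write.csv() rather than capture.output().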


 A.K.




On Wednesday, November 27, 2013 7:47 PM, Tomasz Schabek 
schabek.tom...@gmail.com wrote:

Great!

Thank you for help one more time!
Yes, you are right: jarque.bera.test() should be applied to a vector, so the 
idea is to capture the residuals from each of those 123 regressions, e.g. with
resid <- residuals(model), and then test them with 
jarque.bera.test(resid). Could you manage it?

You are a really helpful and kind person!




Kind regards,
Atenciosamente,
Pozdrawiam,

T. S.


On 28 November 2013 01:33, arun smartpink...@yahoo.com wrote:



Hi,
In that case:

lst5 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) vif(lm(rate~., x)))
res5 <- do.call(rbind,lst5)


As I mentioned earlier, it is not clear how you wanted to apply 
jarque.bera.test().  Also, the results from lst3, lst4, lst5 etc. could be saved 
using capture.output() (not tested though).  Or, if you wanted to modify it and 
keep only specific components, for example:
res4 <- do.call(rbind,lapply(lst4,function(x) unlist(x[-4])))



 

On Wednesday, November 27, 2013 7:21 PM, nooldor nool...@gmail.com wrote:

Thank you for the fast answer!

And a big THANK YOU for the help!

I found an error in the previous script (it was running 334 regressions on 
vectors of length 123, and it should be the opposite: 123 regressions on vectors 
of length 334), so I modified it:

data <- read.table("reg3-dane.csv", head=T, sep=";", dec=",")
data$indx <- as.numeric(gl(123*334,334,123*334))
lst1 <- split(data[,-16],data[,16])
any(sapply(lst1,nrow)!=334)
#[1] FALSE
lst2 <- lapply(lst1[sapply(lst1,function(x) !(all(rowSums(is.na(x))>0)))],function(x) 
summary(lm(rate~cap.log,data=x)) )
capture.output(lst2,file="nooldor.txt")
It's OK now (at least when I compared the regression summary from Excel and R, it 
was the same :-) )
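
A compact way to check the grouping described above, on made-up data: 123 consecutive blocks of 334 rows, one regression per block. The data frame toy and the names n_subj, n_time, by_period and fits are illustrative; rep(seq_len(n_time), each = n_subj) is used here as a plainer alternative to the gl() call and gives the same block index.

```r
n_subj <- 334   # subjects per time point
n_time <- 123   # time points
set.seed(4)
toy <- data.frame(rate = rnorm(n_subj * n_time),
                  cap.log = rnorm(n_subj * n_time))

# Rows 1..334 get period 1, rows 335..668 period 2, and so on
toy$period <- rep(seq_len(n_time), each = n_subj)

# One data frame per period, then one regression per data frame
by_period <- split(toy[, c("rate", "cap.log")], toy$period)
fits <- lapply(by_period, function(d) lm(rate ~ cap.log, data = d))

length(fits)                    # number of regressions
table(sapply(by_period, nrow))  # rows used in each regression
```

Checking both the number of list elements and the rows per element catches the swapped 334/123 arguments mentioned above.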


capture.output(lst2,file="nooldor.txt") works fine!

packages:
vif {car}
jarque.bera.test {tseries}

ncvTest {car}
durbinWatsonTest {car}


R version 3.0.2 (2013-09-25)


T.S.



Re: [R] Automatic saving of many regression's output

2013-11-28 Thread arun
HI,

I just tried ncvTest() and durbinWatsonTest() from library(car):


f4 <- function(meanmod, dta, varmod) {
assign(".dta", dta, envir=.GlobalEnv)
assign(".meanmod", meanmod, envir=.GlobalEnv)
m1 <- lm(.meanmod, .dta)
ans <- ncvTest(m1, varmod)
remove(".dta", envir=.GlobalEnv)
remove(".meanmod", envir=.GlobalEnv)
ans
}
library(car)
lst3 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) f4(rate~., x))
lst4 <- lapply(lst1[sapply(lst1,function(x) 
!(all(rowSums(is.na(x))>0)))],function(x) durbinWatsonTest(lm(rate~., x)))
jarque.bera.test() from library(tseries) is applied to a numeric vector or 
time series (see ?jarque.bera.test). 
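
The original question also asks for NA where a test cannot be applied to a given model type (rlm, lmrob). One possible sketch, using only base R: wrap each test call in tryCatch() so that a failing test yields NA instead of stopping the whole loop. The helper safe_test is hypothetical, and shapiro.test() is used only as a stand-in for tests such as ncvTest() or durbinWatsonTest(), which need the car package.

```r
# Illustrative helper: run a test function on a model, return NA on error
safe_test <- function(test_fun, model) {
  tryCatch(test_fun(model), error = function(e) NA)
}

set.seed(5)
toy <- data.frame(rate = rnorm(30), cap.log = rnorm(30))
m <- lm(rate ~ cap.log, data = toy)

# A test that works (base R) and a stand-in for one that is not applicable
ok  <- safe_test(function(mod) shapiro.test(residuals(mod))$p.value, m)
bad <- safe_test(function(mod) stop("not applicable to this model class"), m)
c(ok = ok, not_applicable = bad)
```

Inside lapply(), each regression would then contribute either a number or NA, so the rows of the final table keep the same length for lm, rlm and lmrob.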

A.K.







Re: [R] Find the prediction or the fitted values for an lm model

2013-11-28 Thread Rolf Turner


See in-line below.

On 11/28/13 20:50, jpm miao wrote:

Hi,

I would like to fit my data with a 4th-order polynomial. Since I have only
5 data points, I should get a polynomial that passes exactly through all five.

Then I would like to compute the fitted or predicted values on a
relatively large x dataset. How can I do it?

BTW, I thought the model prodfn should pass through (0,0), so I
wonder why the constant is not equal to zero.


Because poly() produces orthonormalized polynomials. Look at poly(x1,4).
It is not much like cbind(x1,x1^2,x1^3,x1^4), is it?
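
A short sketch of the difference, using the data from the question. With the default orthogonal basis the intercept is the mean of y1, while with raw = TRUE (plain powers of x1) the intercept is the fitted value at x1 = 0. The names fit_orth and fit_raw are illustrative.

```r
x1 <- c(0, 3, 4, 5, 8)
y1 <- c(0, 1, 4, 7, 8)

fit_orth <- lm(y1 ~ poly(x1, 4))              # orthogonal basis (default)
fit_raw  <- lm(y1 ~ poly(x1, 4, raw = TRUE))  # plain powers x1, x1^2, ...

# Orthogonal columns are centred, so the intercept is the mean of y1
coef(fit_orth)[1]
mean(y1)

# With raw powers the intercept is the fitted value at x1 = 0
coef(fit_raw)[1]

# Both parameterisations describe the same curve
all.equal(fitted(fit_orth), fitted(fit_raw))
```

So the non-zero constant in the printed model does not mean the curve misses (0,0); only the basis in which the coefficients are expressed differs.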

cheers,

Rolf Turner




Re: [R] Problems dealing with matrices

2013-11-28 Thread halim10-fes
Hi,

Sorry to keep bothering you. This continues the previous problem...

I have the following matrices and vectors,

dcmat <- matrix(c(0.13,0.61,0.25,0.00,0.00,0.00,0.52,0.37,0.09,0.00,0.00,0.00, 
0.58,0.30,0.11,0.00,0.00,0.00,0.46,0.22,0.00,0.00,0.00,0.00, 
0.09),nrow=5,ncol=5) 

volini <- matrix(c(0,0,0,0,0),nrow=5,ncol=1)

volinp1 <- c(0, 0.0004669094, 0.0027610861, 0.0086204692, 0.0200137754, 
0.0389069106 ,0.0670942588, 0.1060941424, 0.1570990708, 0.2209672605, 
0.2982420945, 0.3891882830, 0.4938361307, 0.6120278338, 0.7434618363, 
0.8877329008, 1.0443667375, 1.2128488387, 1.3926476912, 1.5832328410, 
1.7840884399, 1.9947229566, 2.2146757191, 2.4435209092, 2.6808695568, 
2.9263700050, 3.1797072430, 3.4406014299, 3.7088058696, 3.9841046430, 
4.2663100561, 4.5552600226, 4.8508154713, 5.1528578389, 5.4612866929,
5.7760175114, 6.0969796345, 6.4241143947, 6.7573734248, 7, 7 ,7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,  7,  
7, 7, 7)


I've calculated the following matrices vol and volyrdc1 (obviously with the 
help of Jeff and Arun):

#Blank matrices for dumping final values

vol <- matrix( NA, nrow=5, ncol=length(volinp1))

volyrdc1 <- matrix(NA, nrow=5,ncol=length(volinp1),dimnames= 
list(c("DC1","DC2","DC3","DC4","DC5"),c(seq(0,500,5))))

# wt has to be defined before it is first used
wt <- matrix(c(1,0,0,0,0),nrow=5)

vol[ , 1 ] <- dcmat %*% (volini+(volinp1[1]*wt))

for ( idx in seq_along(volinp1)[ -1 ] ) { 
  vol[ , idx ] <- dcmat %*% ( vol[ , idx-1 ] + volinp1[idx] * wt ) 
}  

vol

volyrdc1[,1] <- vol[,1]

for ( idx in seq_along(volinp1)[ -1 ] ) { 
  volyrdc1[ , idx ] <- vol[ , idx-1 ] + volinp1[idx] * wt
  }  
volyrdc1

My final matrix is 'volyrdc1' (a kind of transition matrix model). 

Now, what I want to do is to calculate when colsum <- colSums(volyrdc1) 
reaches a certain value, and I want to get the index of the element in the 
'colsum' vector at that point. For example, colsum[colsum>=18] gives 
a series of cases where the condition is true, but I want the index of the 
first element at which the condition is met. In this case, the answer I want 
is 140 (colsum[29] returns both the value (18.63) and the name (140) labelling 
that element). In my case 140 is the year (age) at which 'colsum' becomes 
>=18. It would also be great if I could calculate when 'colsum' levels 
off (to two decimal places). The answer is: 305, and at that point 
colsum==45.37.  

I also want to calculate what the value in volini[1,1] should be to get a 
certain value of 'colsum' at a certain year (age), i.e. at the element name 
explained earlier. For example, what should the value in volini[1,1] be 
if I want colsum==18 at 100 (the name of colsum[21])? 
The answer is: 15910, and the 'volini' matrix will look like:

volini <- matrix(c(15910,0,0,0,0),nrow=5,ncol=1)

Any pointer, suggestions,... will be gratefully acknowledged.

P.S. Can you please suggest an effective R programming book that describes 
the core elements of R programming?

Thanks in advance.

Regards,

Halim                
---
Md. Abdul Halim
Assistant Professor
Department of Forestry and Environmental Science
Shahjalal University of Science and Technology,Sylhet-3114,
Bangladesh.
Cell: +8801714078386.
alt. e-mail: xo...@yahoo.com







On Tue, 26 Nov 2013 20:21:14 -0800 (PST), arun wrote
 HI Halim,
 
 No problem.
 Regards,
 Arun
 
 On Tuesday, November 26, 2013 11:18 PM, halim10-fes halim10-
 f...@sust.edu wrote: Hi Arun,
 
 Thanks for your help. Sorry for my late response. Take care and stay 
 fine.
 
 Regards,
 
 Halim
 
 On Sun, 24 Nov 2013 07:45:24 -0800 (PST), arun wrote
  Hi Halim,
  I guess this works for you.  Modifying Jeff's solution:
  
  volinp <- c(0,0.000467,0.002762,0.008621,0.020014,0.038907,0.067094)
  vol1 <- dcmat %*% (volmat +wt)
  for(idx in seq_along(volinp)[-1]){
   vol1 <- cbind(vol1,dcmat %*% (vol1[,idx-1] + volinp[idx] *wt))
   }
  
  #or
  
  vol <- matrix( NA, nrow=5, ncol=length( volinp ) )
  vol[ , 1 ] <- dcmat %*% ( volmat + wt )
  
  for ( idx in seq_along(volinp)[ -1 ] ) {
    vol[ , idx ] <- dcmat %*% ( vol[ , idx-1 ] + volinp[idx] * wt )
  }
  identical(vol,vol1)
  #[1] TRUE
  
  A.K.
  
  On Sunday, November 24, 2013 7:16 AM, halim10-fes halim10-
  f...@sust.edu wrote: Hi Arun,
  
  OK, no problem. Thank you very much for your attention. I've posted 
  an annex to my previous problem. I will appreciate your 
  comments/suggestions on it.
  
  Off-topic: You're a very helpful man. I like your attitude to 
  helping others.
  
  Take care.
  
  Halim
  
  On Sun, 24 Nov 2013 01:18:18 -0800 (PST), arun wrote
   Hi,
   Please disregard my earlier message. Looks like Jeff understand it 
   better and answered it. Regards, Arun
   
   On Sunday, November 24, 2013 3:23 AM, arun smartpink...@yahoo.com 
wrote:
   Hi,
   I am finding some inconsistency with your description.
   For example:
   

Re: [R] if, apply, ifelse

2013-11-28 Thread Jim Lemon

On 11/28/2013 04:33 AM, Andrea Lamont wrote:

Hello:

This seems like an obvious question, but I am having trouble answering it.
I am new to R, so I apologize if it's too simple to be posting. I have
searched for solutions to no avail.

I have data that I am trying to set up for further analysis (training
data). What I need is 12 groups based on patterns of 4 variables. The
complication comes in when missing data is present. Let  me describe with
an example - focusing on just 3 of the 12 groups:
...
Any ideas on how to approach this efficiently?


Hi Andrea,
I would first convert the matrix a to a data frame:

a1 <- as.data.frame(a)

Then I would start adding columns:

# group 1 is a 1 (logical TRUE) in col1 and at least one other 1
# here NAs are converted to zeros
a1$group1 <- a1$col1 & (ifelse(is.na(a1$col2),0,a1$col2) |
 ifelse(is.na(a1$col3),0,a1$col3) |
 ifelse(is.na(a1$col4),0,a1$col4))
# group 2 is a 1 in col1 and no other 1s
# here NAs are converted to 1s
a1$group2 <- a1$col1 & !(ifelse(is.na(a1$col2),1,a1$col2) |
 ifelse(is.na(a1$col3),1,a1$col3) |
 ifelse(is.na(a1$col4),1,a1$col4))
# here NAs are converted to 1s
a1$group3 <- !ifelse(is.na(a1$col1),1,a1$col1)

and so on. It is clunky, but then you've got a clunky problem.

Jim



Re: [R] if, apply, ifelse

2013-11-28 Thread Miguel Manese
Hi Andrea,

A cleaner alternative to Jim's suggestion is something like

a.df <- as.data.frame(a)

# apply over rows (MARGIN = 1) so the result has one value per row
group1 <- (a.df$col1 == 1) & apply(a.df[,c("col2","col3","col4")], 1,
function(x) any(x == 1 | is.na(x)))

group2 <- (a.df$col1 == 1) & apply(a.df[,c("col2","col3","col4")], 1,
function(x) all(x == 0 | is.na(x)))

group3 <- (a.df$col1 != 1)

- Jon





[R] help ANN

2013-11-28 Thread Giulia Di Lauro
Hi everybody,
first, I'm not very skilled with R, so please keep your answers easy to follow!

I would like to create an artificial neural network with R, but I don't know
its parameters yet (number of layers, number of neurons, ...).
I downloaded the package ANN and I use the function ANNGA, but I'm afraid
I haven't really created a neural network. In fact, at the end of the
process I have just this output:

Call:
ANNGA.default(x = input, y = output, design = c(1, 3, 1), population = 100,
mutation = 0.2, crossover = 0.6, maxW = 10, minW = -10, maxGen = 1000,
error = 0.001)


Mean Squared Error-- 0.01148523
R2-- 0.6918387
Number of generation 1001
Weight range at initialization-- [ 10 , -10 ]
Weight range resulted from the optimisation- [ 13.58698 , -12.93606 ]


Well, I would like to know whether ANN has a function to *create* a
neural network and, if not, which package I should download instead: nnet?

Thanks in advance!
Giulia Di Lauro.



[R] ODE does not reach steady state and increase exponentially

2013-11-28 Thread Matteo Charlie Ichino
Dear all,
please follow the link to the question that I posted on StackOverflow about
my R code with ODE
http://stackoverflow.com/questions/20218065/ode-does-not-reach-steady-state-and-increase-exponentially

I am trying to write code for a differential equation that should give me
the biomass of different size classes, depending on the amount of available
food. However, the biomass does not reach a steady state and increases
exponentially even with parameters set to 0 (which in theory should result
in a biomass value of 0).

Thank you very much for your help!
I appreciate it

Matteo
-- 
   . . .
'.-:-.`
'  :  `
 .-:
   .'   `.
 ,/   (o) \
 \`._/  ,__)
 



[R] Multivariate dispersion distances

2013-11-28 Thread M Elo
Dear All,

I'm using betadisper {vegan} and I'm interested not only in the dispersion
within the group but also the distances between the groups. With betadisper
I get distances to group centroids but is it possible to get distances to
other groups centroids? 

It might be possible to do it by hand by the formula given in the
description of the betadisper (below) but I'm a bit confused how to treat
the imaginary part there...

z[ij]^c = sqrt(Delta^2(u[ij]^+, c[i]^+) - Delta^2(u[ij]^-, c[i]^-))


I would highly appreciate all the help I can get!

-Merja







[R] date format

2013-11-28 Thread eliza botto
Dear Users of R,
I have a data frame with three columns: the first column contains years, the 
second one months and the third one the days (cbind(yyyy, mm, dd)). I want to 
combine them so that I have one column with the date format (dd.mm.yyyy).
Is there a way of doing that?
Thanks in advance,
Eliza 


Re: [R] Multivariate dispersion distances

2013-11-28 Thread Jari Oksanen

Merja,

You should do it exactly in the same way as you wrote above: subtract the
squared Euclidean distances in the imaginary part from the squared
Euclidean distances in the real part and take the square root. I think
doing this by hand is the only way to do this directly. The scope of 
the method is to compare dispersions within groups. There are other
tools to compare the locations of group centroids (adonis in vegan), but
they won't give you distances.

Cheers, Jari Oksanen
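
The calculation described above can be sketched in base R as follows. This is only an illustration under stated assumptions: dist_to_centroids() is a hypothetical helper, not a vegan function, and its arguments (coordinates of the points and of the group centroids on the real axes, and optionally on the imaginary axes) are assumed to have been obtained already. Clamping negative differences to zero before taking the square root is also an assumption made here, not something stated in the thread.

```r
# u_pos, c_pos: coordinates of points and centroids on the real axes
# u_neg, c_neg: the same on the imaginary axes (optional)
# Rows are points (u_*) or group centroids (c_*); columns are axes.
dist_to_centroids <- function(u_pos, c_pos, u_neg = NULL, c_neg = NULL) {
  # Matrix of squared Euclidean distances: one row per point,
  # one column per centroid.
  sq_dist <- function(u, cen) {
    outer(rowSums(u^2), rowSums(cen^2), "+") - 2 * u %*% t(cen)
  }
  d2 <- sq_dist(u_pos, c_pos)
  if (!is.null(u_neg)) {
    # Subtract the squared distances in the imaginary part.
    d2 <- d2 - sq_dist(u_neg, c_neg)
  }
  # Assumption: negative differences are set to 0 before the square root.
  sqrt(pmax(d2, 0))
}

# Toy example: two points, two centroids, two real axes, one imaginary axis.
u_pos <- matrix(c(0, 0,
                  3, 4), nrow = 2, byrow = TRUE)
c_pos <- matrix(c(0, 0,
                  3, 0), nrow = 2, byrow = TRUE)
u_neg <- matrix(c(0, 3), ncol = 1)
c_neg <- matrix(c(0, 0), ncol = 1)

d_real <- dist_to_centroids(u_pos, c_pos)
d_full <- dist_to_centroids(u_pos, c_pos, u_neg, c_neg)
```

Each row of the result holds one point's distance to every group centroid, so the distances to the other groups' centroids are the off-group columns.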



Re: [R] if, apply, ifelse

2013-11-28 Thread Bert Gunter
Jim, et. al:

rowSums(a, na.rm=TRUE) ## Fast!

tells you whether you have 0, 1, or >= 1 TRUE in each row.
This can then be combined with the ifelse() conditions to get what the
OP seems to want. As you said, it's clunky, and is just a minor
simplification. But, then again, her logic seemed somewhat confusing.

Cheers,
Bert
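
A small sketch of how the rowSums() idea might be combined with the group conditions. The matrix a below is made up for illustration, since the original data are not shown, and only the first two groups are covered. Note the assumption that NAs in col2 to col4 are counted as 0 for both groups, which is not the same NA handling as in Jim's code.

```r
# Hypothetical 0/1 data with missing values; not the poster's data.
a <- matrix(c( 1, 1, NA,  0,
               1, 0,  0, NA,
               0, 1,  1,  1,
              NA, 0,  1,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("col1", "col2", "col3", "col4")))

# Number of 1s among col2..col4 in each row; NA is counted as 0 here.
others <- rowSums(a[, c("col2", "col3", "col4")], na.rm = TRUE)

# TRUE only where col1 is observed and equal to 1.
col1_is_one <- !is.na(a[, "col1"]) & a[, "col1"] == 1

group1 <- col1_is_one & others >= 1   # a 1 in col1 and at least one other 1
group2 <- col1_is_one & others == 0   # a 1 in col1 and no other 1s
```

Rows where col1 is missing end up in neither group; whether that is wanted depends on the rules for the remaining groups.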






-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374



Re: [R] date format

2013-11-28 Thread Ben Bolker

  I think just paste(dd,mm,yyyy,sep=".") should work fine (where 'dd','mm',
'yyyy' are references to your columns)



Re: [R] date format

2013-11-28 Thread eliza botto
Dear Bert, Arun and Philipp,
Thanks for your help. It worked perfectly fine for me. :D
Eliza

 Date: Thu, 28 Nov 2013 16:09:58 +0100
 From: wev...@web.de
 To: eliza_bo...@hotmail.com; r-help@r-project.org
 Subject: Re: [R] date format
 
 Hi Eliza,
 
 # you can use paste to create a new vector:
 date1 <- paste( dataframe[,3], dataframe[,2],dataframe[,1], sep="." )
 
 # you could then turn that into a Date-Time-Class with which you could 
 # do calculations
 strptime(date1, format="%d.%m.%Y")
 
 
 
 
 -- 
 
 Philipp Wevers
 wev...@web.de
 Mobil: 015253710061
 fest: 03080921097
 Koloniestraße 126 A
 13359 Berlin
 wev...@web.de
 
 


Re: [R] date format

2013-11-28 Thread Rui Barradas

Hello,

Maybe something like the following.

dat <- data.frame(yyyy = 2011:2013, mm = 1:3, dd = 4:6)

apply(dat, 1, function(x) paste(rev(x), collapse = "."))


Hope this helps,

Rui Barradas



Re: [R] date format

2013-11-28 Thread eliza botto
Thanks, Rui,
Eliza



[R] Relative Cumulative Frequency of Event Occurence

2013-11-28 Thread Burhan ul haq
Hi,

My objective is to calculate Relative (Cumulative) Frequency of Event
Occurrence - something as follows:

Sample.Number  1st.Fly  2nd.Fly  Did.E.occur?  Relative.Cum.Frequency.of.E
            1  G        B        No            0.000
            2  B        B        Yes           0.500
            3  B        G        No            0.333
            4  G        B        No            0.250
            5  G        G        Yes           0.400
            6  G        B        No            0.333
            7  B        B        Yes           0.429
            8  G        G        Yes           0.500
            9  G        B        No            0.444
           10  B        B        Yes           0.500

Please refer to the code below:
##
# 1.
v.fly = c("G","B") # Outcome is Green or Blue fly

# 2.
n = 10 # No of Events / Trials

# 3.
v.smp = seq_len(n) # Event Id

# 4.
v.fst = sample(v.fly, n, replace=TRUE) # Simulating First Draw

# 5.
v.sec = sample(v.fly, n, replace=TRUE)  # Simulating Second Draw

# 6.
df.1 = data.frame(sample = v.smp, fst=v.fst, sec = v.sec) # Clumping in a DF

# 7.
# Event occurs if the color is the same in both draws
df.1$E.Occur = with(df.1, ifelse(fst==sec,TRUE,FALSE))

# 8.
df.1$Rel.Freq = with(df.1, cumsum(E.Occur)/(E.Occur)) # Relative Frequency
# This line does NOT work; the denominator part needs to be fixed
##

Problem is with #8, specifically the part:
cumsum(E.Occur)/(E.Occur)

The denominator E.Occur is a fixed value, instead of a moving count. I have
tried nrow(), length() but none provides a moving version of row count, as
cumsum does for the True values, occurring so far.

> dput(df.1)
structure(list(Sample.Number = 1:10, X1st.Fly = c("G", "B", "B",
"G", "G", "G", "B", "G", "G", "B"), X2nd.Fly = c("B", "B", "G",
"B", "G", "B", "B", "G", "B", "B"), Did.E.occur. = c("No", "Yes",
"No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"),
Relative.Cum.Frequency.of.E = c(0,
0.5, 0.333, 0.25, 0.4, 0.333, 0.429, 0.5, 0.444, 0.5)), .Names =
c("Sample.Number",
"X1st.Fly", "X2nd.Fly", "Did.E.occur.", "Relative.Cum.Frequency.of.E"
), class = "data.frame", row.names = c(NA, -10L))


Cheers !



Re: [R] date format

2013-11-28 Thread arun
Hi,
Try:
dat1 <- data.frame(years=rep(1991:1992,12), months=rep(1:12,2),days= rep(1,24))
dat1$day <- format(as.Date(paste(dat1[,1],sprintf("%02d",dat1[,2]),sprintf("%02d",dat1[,3]),sep="."),"%Y.%m.%d"),"%d.%m.%Y")
A.K.






Re: [R] date format

2013-11-28 Thread arun
#Or
 paste(dat[,3],dat[,2],dat[,1],sep=".")
#[1] "4.1.2011" "5.2.2012" "6.3.2013"
#
 as.character(interaction(dat[,3:1]))


 paste(sprintf("%02d",dat[,3]),sprintf("%02d",dat[,2]),dat[,1],sep=".")
#[1] "04.01.2011" "05.02.2012" "06.03.2013"


A.K.
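
For completeness, a minimal sketch of a variant that goes through a real Date object first, so the result can also be used for date arithmetic before it is formatted. The column names yyyy, mm and dd follow the example data used earlier in this thread; the names iso, dates and date are illustrative.

```r
dat <- data.frame(yyyy = 2011:2013, mm = 1:3, dd = 4:6)

# Build ISO strings (yyyy-mm-dd) with zero padding, then convert to Date.
iso <- sprintf("%04d-%02d-%02d", dat$yyyy, dat$mm, dat$dd)
dates <- as.Date(iso)

# Format as dd.mm.yyyy for display; this column is plain character.
dat$date <- format(dates, "%d.%m.%Y")
```

Keeping the Date vector around (here dates) is useful for sorting or for computing differences; the formatted column is only text.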






Re: [R] Problems dealing with matrices

2013-11-28 Thread arun
Hi Halim,

For the first two questions, you may try:
colsum1 <- colSums(volyrdc1)
min(which(colsum1>=18))
#[1] 29
#or
 head(which(colsum1>=18),1)
#140 
# 29 


colsum1[substr(colsum1,6,7)=="00"]  ## this is not very clear
#     305 
#45.37004 
#or
colsum1[colsum1>=18][substr(colsum1[colsum1>=18],6,7)=="00"]
#     305 
#45.37004 

#because
sprintf("%.4f",colsum1[colsum1>=18])
colsum1[colsum1>=18][gsub(".*\\.\\d{2}","",sprintf("%.4f",colsum1[colsum1>=18]))=="00"]
#     180      305 
#32.88996 45.37004 



A.K.
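
An alternative sketch of the same two lookups on a small made-up vector, in case the string-based test above is hard to follow. The vector colsum here is hypothetical and only stands in for colSums(volyrdc1); the tolerance tol = 0.005 is an assumption about what "levels off to two decimal places" should mean.

```r
# Hypothetical named vector; the names play the role of the years (ages).
colsum <- c("0" = 0, "5" = 4.2, "10" = 11.7, "15" = 18.63,
            "20" = 25.1, "25" = 25.104, "30" = 25.104)

# First position at which the threshold is reached, and its name.
first_idx <- unname(which(colsum >= 18)[1])
first_year <- names(colsum)[first_idx]

# First position after which the next value changes by less than tol.
step <- abs(diff(unname(colsum)))
tol <- 0.005
level_idx <- which(step < tol)[1]
level_year <- names(colsum)[level_idx]
```

Comparing successive differences with a tolerance avoids depending on how the numbers happen to be printed.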





Re: [R] Relative Cumulative Frequency of Event Occurence

2013-11-28 Thread arun
HI,
From the dput() version of df.1, it looks like you want:
cumsum(df.1[,4]=="Yes")/seq_len(nrow(df.1))
 [1] 0.0000000 0.5000000 0.3333333 0.2500000 0.4000000 0.3333333 0.4285714
 [8] 0.5000000 0.4444444 0.5000000


A.K.
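
The same idea as a self-contained sketch that does not need df.1: the moving denominator asked for in the question is simply the running trial count 1, 2, ..., n. The vector occurred below reproduces the Did.E.occur? column of the table in the question; the names occurred and rel_freq are illustrative.

```r
# TRUE where the event occurred, in trial order (from the question's table).
occurred <- c("No", "Yes", "No", "No", "Yes",
              "No", "Yes", "Yes", "No", "Yes") == "Yes"

# Running count of occurrences divided by the running number of trials.
rel_freq <- cumsum(occurred) / seq_along(occurred)

round(rel_freq, 3)
```

Rounded to three decimals, this matches the Relative.Cum.Frequency.of.E column given in the question.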


On Thursday, November 28, 2013 11:26 AM, Burhan ul haq ulh...@gmail.com wrote:
Hi,

My objective is to calculate Relative (Cumulative) Frequency of Event
Occurrence - something as follows:

Sample.Number 1st.Fly 2nd.Fly  Did.E.occur? Relative.Cum.Frequency.of.E
1 G B No 0.000
2 B B Yes 0.500
3 B G No 0.333
4 G B No 0.250
5 G G Yes 0.400
6 G B No 0.333
7 B B Yes 0.429
8 G G Yes 0.500
9 G B No 0.444
10 B B Yes 0.500

Please refer to the code below:
##
# 1.
v.fly=c(G,B) # Outcome is Green or Blue fly

# 2.
n=10 # No of Events / Trials

# 3.
v.smp = seq(1:n) # Event Id

# 4.
v.fst = sample(v.fly,n,rep=T) # Simulating First Draw

# 5.
v.sec = sample(v.fly,n,rep=T)  # Simulating Second Draw

# 6.
df.1 = data.frame(sample = v.smp, fst=v.fst, sec = v.sec) # Clumping in a DF

# 7.
df.1$E.Occur = with(df.1, ifelse(fst==sec,TRUE,FALSE)) # Event occurs if the color is the same in both draws

# 8.
df.1$Rel.Freq = with(df.1, cumsum(E.occur)/(E.Occur)) # Relative Frequency
# This line does NOT work; the denominator part needs fixing
##

Problem is with #8, specifically the part:
cumsum(E.occur)/(E.Occur)

The denominator E.Occur is a fixed value, instead of a moving count. I have
tried nrow(), length() but none provides a moving version of row count, as
cumsum does for the True values, occurring so far.

 dput(df.1)
structure(list(Sample.Number = 1:10, X1st.Fly = c("G", "B", "B",
"G", "G", "G", "B", "G", "G", "B"), X2nd.Fly = c("B", "B", "G",
"B", "G", "B", "B", "G", "B", "B"), Did.E.occur. = c("No", "Yes",
"No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"),
Relative.Cum.Frequency.of.E = c(0,
0.5, 0.333, 0.25, 0.4, 0.333, 0.429, 0.5, 0.444, 0.5)), .Names =
c("Sample.Number",
"X1st.Fly", "X2nd.Fly", "Did.E.occur.", "Relative.Cum.Frequency.of.E"
), class = "data.frame", row.names = c(NA, -10L))


Cheers !

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] polychoric correlation with multiple imputations, a Strate and a Weight

2013-11-28 Thread Jesse Gervais
Hi there,

I'm generally more a Stata user than an R user, but I need to compute
something that I am not able to do with Stata 13. So, here I am!

I have a database with multiple imputations (the imputations are already
done) and a complex sample design (strata and weights).

Is it possible, in R, to run polychoric correlations with multiple
imputations, strata, and weights?

For the moment, I have been able to do some statistical analysis with
strata and weights using the survey package. I also know that there is a
package named Amelia II that can handle multiple imputations, and polycor
that can compute polychoric correlations. The problem is that I don't know
how to combine all those packages, if that is possible. So:

1- Is it possible to do so?
2- If yes, I'm not sure how to merge all this together. Any ideas?

I don't need any syntax (at least for the moment). I just need a hint of
where to start (is there a package that can handle it, is there somebody
who wrote paper(s) about it, etc.); I am completely stuck.

Thank you!



[R] Counting variables repeted in dataframe columns to create a presence-absence table

2013-11-28 Thread Gmail
Hi!

I'm new to R and I'm writing to ask for some guidance. I analyzed 
comparative genomic microarray data of 56 Salmonella strains to identify 
absent genes in each of the serovars, and finally I got a matrix that 
looks like this:

  data[1:5,1:5]
   Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1   S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2   S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3   S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4   S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5   S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR

The entries correspond to the genes identified as absent in each of 
the serovars. I would like to create a presence-absence matrix of those 
genes comparing all the serovars at the same time. I assume that should 
not be complicated, but I don't know how to do it.

I would like a matrix similar to the next one:

  data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane


-- 

Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology

Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain

Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com




Re: [R] Assiging name to ip address range

2013-11-28 Thread arun
Hi,

If it is like:

vec1 <- c("10.20.30.01","10.20.30.02","10.20.30.40","10.20.30.41","10.20.30.45",
          "10.20.30.254","10.20.30.255","10.20.30.256","10.20.30.313")
vec2 <- as.numeric(paste0(gsub("^\\d{2}\\.\\d{2}\\.(\\d{2}\\.).*","\\1",vec1),
          sprintf("%03d",as.numeric(gsub("^\\d{2}\\.\\d{2}\\.\\d{2}\\.","",vec1)))))

as.character(cut(vec2,breaks=c(30,30.040,30.255,30.313),labels=paste0("SKH",1:3)))
#[1] "SKH1" "SKH1" "SKH1" "SKH2" "SKH2" "SKH2" "SKH2" "SKH3" "SKH3"


#or if the column is:
dat1 <- data.frame(iprange = c("10.20.30.01 - 10.20.30.40", "10.20.30.40 - 10.20.30.255"))
dat1[,1] <- factor(dat1[,1],labels=paste0("SKH",1:2))
A.K.
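
An alternative to the regular-expression route above is to split each address on the dots and compare the octets as integers. The sketch below is only an illustration under stated assumptions: the example addresses, the names `parts`, `key`, and `grp`, and the choice to combine the third and fourth octets into one number are not from the original post.

```r
# Example addresses (assumed for illustration)
ip <- c("10.20.30.01", "10.20.30.40", "10.20.30.41", "10.20.30.255")

# One row per address, one integer column per octet
parts <- do.call(rbind, lapply(strsplit(ip, ".", fixed = TRUE), as.integer))

# The first two octets are constant, so order by the last two only
key <- parts[, 3] * 256L + parts[, 4]

# Range boundaries expressed on the same scale; cut() intervals are
# right-closed, so .40 falls in SKH1 and .255 in SKH2
breaks <- c(30 * 256 + 0, 30 * 256 + 40, 30 * 256 + 255)
grp <- cut(key, breaks = breaks, labels = c("SKH1", "SKH2"))

data.frame(ip, grp)
```

With around 500 addresses, only `breaks` and `labels` need to be extended; the comparison itself stays vectorized.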




I have an IP address column in my dataset which R reads as a factor. I want to 
create a new variable for each range, 
like: if 10.20.30.01 - 10.20.30.40 then SKH1 
      if 10.20.30.40 - 10.20.30.255 then SKH2, and so on. 

10.20 will always remain the same; the other values will change. 
I have around 500 values which I want to assign according to IP address. I 
am not able to use the greater-than or less-than operators. Please advise how
to do that. 

Thanks!! 




Re: [R] Adding NA values in random positions in a dataframe

2013-11-28 Thread arun
Hi,
One way would be:
set.seed(42)
dat1 <- as.data.frame(matrix(sample(c(1:5,NA),50,replace=TRUE,prob=c(10,15,15,20,30,10)),ncol=5))
set.seed(49)
dat1[!is.na(dat1)][ match( 
  sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20)),seq(dat1[!is.na(dat1)]))] <- NA
length(dat1[is.na(dat1)])/length(unlist(dat1))
#[1] 0.28

A.K.
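
The reply above blanks 20% of the non-NA cells, which gives 28% here rather than exactly 30%. If an exact overall proportion is wanted, one option is to compute how many extra NAs are still needed and sample that many positions from the cells that are not yet NA. This is a hedged sketch: the function name `add_na` and the example data are hypothetical, and it assumes all columns have the same type so the data frame can pass through a matrix.

```r
# Hypothetical helper: raise the share of NA cells to `target`
# without touching cells that are already NA.
add_na <- function(dat, target = 0.30) {
  m <- as.matrix(dat)                       # linear indexing over all cells
  n.extra <- round(target * length(m)) - sum(is.na(m))  # NAs still to add
  if (n.extra > 0) {
    free <- which(!is.na(m))                # positions that are not yet NA
    m[free[sample.int(length(free), n.extra)]] <- NA
  }
  as.data.frame(m)
}

# Example data with exactly 10% NA to start (one NA per column)
dat <- as.data.frame(matrix(c(NA, 1:9), nrow = 10, ncol = 5))
out <- add_na(dat, target = 0.30)
mean(is.na(out))                            # overall share of NA cells
```

Sampling from `free` rather than from all cells is what keeps the existing NA positions unchanged.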


Hello, I'm quite new to R, so I don't know the most efficient 
way to do something that I could write easily in other languages. 

This is my problem: I have a dataframe with a certain number of
NA values (approximately 10%). I want to add more NA values at random 
positions in the dataframe until the overall proportion of NA 
values reaches 30% (clearly, the positions that already hold NA values 
must not change). I tried looking at iterating functions in R such as apply 
or sapply, but I can't figure out how to use them in this case. Thank you.



Re: [R] importing many csv files into separate matrices

2013-11-28 Thread David Winsemius

On Nov 27, 2013, at 2:39 PM, yetik serbest wrote:

 Hi Everyone,
  
 I am trying to import many CSV files to their own matrices. Example, 
 alaska_93.csv to alaska. When I execute the following, for each csv.file 
 separately it is successful.
  
 singleCSVFile2Matrix <- function(x,path) {
  assign(gsub(pattern=".csv",x,replacement=""),read.csv(paste(path,x,sep="")))
 }
  
 when I try to include it in a loop in another function (I have so many csv 
 files to import), it doesn't work. I mean the following function doesn't do 
 it.
  
 loadCSVFiles_old <- function(path) {
  x <- list.files(path)
  for (i in 1:length(x)) {
   assign(gsub(pattern=".csv",x[i],replacement=""),read.csv(paste(path,x[i],sep="")))
   }
 }

It appears you are not returning the values that you created inside that 
function to the global environment. I would have expected that you would either 
have given `assign` an environment argument or have created a list of 
items to return from the function.

?environment
?assign

Perhaps:

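loadCSVFiles_old <- function(path) {
 x <- list.files(path)
 for (i in seq_along(x)) {
  # envir must be an argument of assign(); otherwise the object is
  # created in the function's own environment and lost on return
  assign(gsub(pattern=".csv",x[i],replacement=""),
         read.csv(paste(path,x[i],sep="")),
         envir=.GlobalEnv)
  }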
}
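
The other option mentioned above, returning a list from the function, could look roughly like the following. This is a sketch under assumptions: the function name `load_csv_list`, the temporary directory, and the two small demo files are made up for illustration and are not from the thread.

```r
# Hypothetical list-returning loader: one data frame per CSV file,
# named after the file without its extension.
load_csv_list <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, read.csv)
  names(out) <- sub("\\.csv$", "", basename(files))
  out
}

# Usage with two small files written to a temporary directory
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(tmp, "alaska_93.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:5), file.path(tmp, "texas_93.csv"), row.names = FALSE)

mats <- load_csv_list(tmp)
names(mats)
```

A named list avoids writing into the global environment at all; individual tables are then reached as `mats$alaska_93`, and `as.matrix()` can be applied to each element if matrices are required.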



  
 Instead, if I execute the for loop in the command line, it works. I am 
 puzzled. Appreciate any help.
  
 thanks
 yetik
 

David Winsemius
Alameda, CA, USA



Re: [R] Counting variables repeted in dataframe columns to create a presence-absence table

2013-11-28 Thread arun
Hi,
Try:
data_m <- read.table(text="Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1  S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2  S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3  S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4  S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5  S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR",sep="",header=TRUE,stringsAsFactors=FALSE)
data_m$new <- 1
library(reshape2)
dM <- melt(data_m,id.vars="new")
xtabs(new~value+variable,dM)
#or
dcast(dM,value~variable,value.var="new",fill=0)


A.K.
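
For readers without reshape2, roughly the same table can be built in base R with stack() and table(). This is a minimal sketch, not part of the original reply; the two-column data frame `absent` and the names `long` and `pa` are assumptions used only to keep the example short.

```r
# Two serovar columns taken from the posted data (first three rows only)
absent <- data.frame(
  Abortusovis07918 = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR"),
  Agona08561       = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR"),
  stringsAsFactors = FALSE)   # stack() skips factor columns

# Long format: `values` holds the gene, `ind` the serovar it came from
long <- stack(absent)

# Cross-tabulate gene by serovar; cap counts at 1 in case a gene is
# listed twice for the same serovar
pa <- table(long$values, long$ind)
pa[pa > 1] <- 1L
pa
```

A 1 in this table means the gene appears in that serovar's list (that is, it was flagged as absent there), and a 0 means it does not appear.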


On Thursday, November 28, 2013 12:18 PM, Gmail o.iraz...@gmail.com wrote:
Hi!

I'm new to R and I'm writing to ask for some guidance. I analyzed 
comparative genomic microarray data of 56 Salmonella strains to identify 
absent genes in each of the serovars, and finally I got a matrix that 
looks like this:

 data[1:5,1:5]
   Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR

The entries correspond to the genes identified as absent in each of 
the serovars. I would like to create a presence-absence matrix of those 
genes comparing all the serovars at the same time. I assume that should 
not be complicated, but I don't know how to do it.

I would like a matrix similar to the next one:

 data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane


-- 

Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology

Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain

Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com

