Re: [R] Help understanding loop behaviour

2021-04-30, Rui Barradas

Hello,

Right, thanks. It should be:


xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
  c(rep(1, length(x) - 1), length(x))
})


Hope this helps,

Rui Barradas

On 30/04/21 at 19:46, Bert Gunter wrote:

There is something wrong here I believe -- see inline below:

On Fri, Apr 30, 2021 at 10:37 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:


Hello,

For column J, ave/seq_along seems to be the simplest. For column I, ave
is also a good option, it avoids split/lapply.


xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
    c(rep(1, length(x) - 1), max(length(x)))  ### ???
})

**
length() returns a single integer, so max(length(x)) makes no sense



Re: [R] Help understanding loop behaviour

2021-04-30, Bert Gunter
There is something wrong here I believe -- see inline below:

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Apr 30, 2021 at 10:37 AM Rui Barradas  wrote:

> Hello,
>
> For column J, ave/seq_along seems to be the simplest. For column I, ave
> is also a good option, it avoids split/lapply.
>
>
> xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
>     c(rep(1, length(x) - 1), max(length(x)))  ### ???
> })
>
> **
length() returns a single integer, so max(length(x)) makes no sense




Re: [R] Help understanding loop behaviour

2021-04-30, Rui Barradas

Hello,

For column J, ave/seq_along seems to be the simplest. For column I, ave 
is also a good option, it avoids split/lapply.



xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
  c(rep(1, length(x) - 1), max(length(x)))
})

xx$J <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = seq_along)
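
A self-contained sketch of the ave() approach, on a small made-up subset of the thread's data (two of the company numbers and their row counts come from the question; the rest is illustrative). Because ave() writes each group's result back into that group's original row positions, the output stays aligned with the rows even when the company numbers are not in sorted order.

```r
# Made-up subset of the thread's data; the company numbers are deliberately
# not in increasing numeric order
xx <- data.frame(
  COMPANY_NUMBER  = c(rep(10029943, 3), rep(1009550, 4)),
  NUMBER_OF_YEARS = c(rep(3, 3), rep(4, 4))
)

# J: position of each row within its company (1, 2, ..., n)
xx$J <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = seq_along)

# I: 1 on every row except the company's last one, which gets the group size
xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER,
            FUN = function(x) c(rep(1, length(x) - 1), length(x)))

print(xx)
```

Note that ave() groups by value rather than by consecutive runs, so rows of the same company do not have to be adjacent.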


Hope this helps,


Re: [R] Help understanding loop behaviour

2021-04-30, PIKAL Petr
Hello,

Sorry, my suggestion did not work correctly in your case, because split() used
the natural (sorted) factor ordering.

So, using Jim's data, this gives the desired output:

# prepare a factor whose levels keep the original (first-appearance) ordering
ff <- factor(xx[,1], levels=unique(xx[,1]))
lll <- split(xx$COMPANY_NUMBER, ff)
xx$I <- unlist(lapply(lll, function(x) c(rep(1, length(x)-1),
  max(length(x)))), use.names=FALSE)
xx$J <- unlist(lapply(lll, function(x) 1:length(x)), use.names=FALSE)
> xx
   COMPANY_NUMBER NUMBER_OF_YEARS I J
1           70837               3 1 1
2           70837               3 1 2
3           70837               3 3 3
4         1000403               4 1 1
5         1000403               4 1 2
6         1000403               4 1 3
7         1000403               4 4 4
8        10029943               3 1 1
9        10029943               3 1 2
10       10029943               3 3 3
11       10037980               4 1 1
12       10037980               4 1 2
13       10037980               4 1 3
14       10037980               4 4 4
15       10057418               3 1 1
16       10057418               3 1 2
17       10057418               3 3 3
18        1009550               4 1 1
19        1009550               4 1 2
20        1009550               4 1 3
21        1009550               4 4 4

Cheers.
Petr
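
Petr's point about split() using the natural factor ordering can be checked on a handful of made-up IDs taken from the thread. factor() sorts numeric levels, so the per-company results come back in sorted order rather than row order, and unlist() then misaligns them with the rows. Building the factor with levels = unique(...) keeps first-appearance order.

```r
ids <- c(70837, 70837, 10057418, 1009550, 1009550)

# Default factor levels are sorted numerically: 70837 < 1009550 < 10057418
by_sorted <- split(ids, ids)
print(names(by_sorted))

# Levels in order of first appearance, as in Petr's corrected code
ff <- factor(ids, levels = unique(ids))
by_appearance <- split(ids, ff)
print(names(by_appearance))

# Within-company counters, flattened back to one value per row
j_sorted     <- unlist(lapply(by_sorted, seq_along), use.names = FALSE)
j_appearance <- unlist(lapply(by_appearance, seq_along), use.names = FALSE)
print(j_sorted)      # misaligned: groups come back in sorted order
print(j_appearance)  # follows the row order of ids
```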


Re: [R] Help understanding loop behaviour

2021-04-30, Jim Lemon
Hi email,
If you want what you described, try this:

xx<-read.table(text="COMPANY_NUMBER NUMBER_OF_YEARS
0070837  3
0070837  3
0070837  3
1000403  4
1000403  4
1000403  4
1000403  4
10029943  3
10029943  3
10029943  3
10037980  4
10037980  4
10037980  4
10037980  4
10057418  3
10057418  3
10057418  3
1009550  4
1009550  4
1009550  4
1009550  4",
header=TRUE,stringsAsFactors=FALSE)
xx$I<-NA
xx$J<-NA
row_count<-1
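for(row in 1:nrow(xx)) {
 # not the last row of its company: I is 1 and J counts up
 if(row < nrow(xx) && xx$COMPANY_NUMBER[row]==xx$COMPANY_NUMBER[row+1]) {
  xx$I[row]<-1
  xx$J[row]<-row_count
  row_count<-row_count+1
 } else {
  # last row of a company (always the case for the final row):
  # I and J both take NUMBER_OF_YEARS and the counter restarts
  xx$I[row]<-xx$J[row]<-xx$NUMBER_OF_YEARS[row]
  row_count<-1
 }
}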
xx

Like Petr, I am assuming that you want company 10057418 treated the
same as the others. If not, let us know why. I am also assuming that
the first three rows should _not_ start with "#": read.table() treats
"#" as the start of a comment, so those rows would be discarded.
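
Jim's loop compares each row with the next one, so it effectively works on runs of consecutive identical company numbers. As an alternative sketch (not from the thread), the same run-based logic can be written without a loop using rle() and sequence(); like the loop, it assumes each company's rows are contiguous. The data frame below is a made-up subset of the thread's data.

```r
xx <- data.frame(
  COMPANY_NUMBER  = c(rep(70837, 3), rep(1000403, 4), rep(1009550, 4)),
  NUMBER_OF_YEARS = c(rep(3, 3), rep(4, 4), rep(4, 4))
)

# Lengths of consecutive runs of the same company number
runs <- rle(xx$COMPANY_NUMBER)

# J: 1, 2, ..., n within each run
xx$J <- sequence(runs$lengths)

# I: 1 everywhere except the last row of a run, which gets the run length
run_size <- rep(runs$lengths, runs$lengths)
xx$I <- ifelse(xx$J == run_size, run_size, 1)

print(xx)
```

Unlike ave(), which pools all rows sharing an ID, rle() starts a new run whenever the ID changes, so an ID that reappears further down is counted afresh.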

Jim


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help understanding loop behaviour

2021-04-30, PIKAL Petr
Hi

Your code is hardly readable because you used HTML formatting (not recommended), so
I used another (split) approach.

Column J seems to be simple:

#make list
lll <- split(as.factor(xx$COMPANY_NUMBER), xx$COMPANY_NUMBER)

#calculate sequences
as.numeric(unlist(lapply(lll, function(x) 1:length(x))))

should give you column J.

Column I seems to be calculated this way:
lapply(lll, function(x) c(rep(1, length(x)-1), max(length(x))))

I believe others could come up with simpler solutions.

BTW, why should the result for 10057418 be different?

Cheers
Petr



[R] Help understanding loop behaviour

2021-04-29, e-mail ma015k3113 via R-help
I am trying to understand how loops in R operate. I have a simple dataframe xx
which is as follows

COMPANY_NUMBER   NUMBER_OF_YEARS
 
#0070837 3
#0070837 3
#0070837 3
1000403   4
1000403   4
1000403   4
1000403   4
10029943 3
10029943 3
10029943 3
10037980 4
10037980 4
10037980 4
10037980 4
10057418 3
10057418 3

10057418 3
1009550   4
1009550   4
1009550   4
1009550   4
The code I have written is

while (i <= nrow(xx1) )

{

for (j in 1:xx1$NUMBER_OF_YEARS[i])
{
xx1$I[i] <- i
xx1$J[j] <- j
xx1$NUMBER_OF_YEARS_j[j] <- xx1$NUMBER_OF_YEARS[j]
}
i=i + (xx1$NUMBER_OF_YEARS[i] )
}
After running the code I want my dataframe to look like

COMPANY_NUMBER  NUMBER_OF_YEARS  I  J
#0070837        3                1  1
#0070837        3                1  2
#0070837        3                3  3
1000403         4                1  1
1000403         4                1  2
1000403         4                1  3
1000403         4                4  4
10029943        3                1  1
10029943        3                1  2
10029943        3                3  3
10037980        4                1  1
10037980        4                1  2
10037980        4                1  3
10037980        4                4  4
10057418        3                1  1
10057418        3                1  1
10057418        3                1  1
1009550         4                1  1
1009550         4                1  2
1009550         4                1  3
1009550         4                4  4


I get the correct value of I, but in the wrong row, and the value of J is correct
in the first iteration but then it goes to 1.

Any help will be greatly appreciated
[[alternative HTML version deleted]]
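
For comparison with the loop in the question, here is a minimal corrected sketch of the same while/for idea on a made-up three-company subset of the data. It assumes, as the question's code does, that each company occupies exactly NUMBER_OF_YEARS consecutive rows, and it initialises i, which the posted code does not show. The key change is writing to the absolute row i + j - 1: the posted loop writes I to row i and J to row j, so J only ever reaches the first few rows. (If the J column did not exist beforehand, the first assignment creates it and recycles the value 1 down the whole column, which would explain the 1s.)

```r
# Made-up subset of the question's data: three companies, each occupying
# exactly NUMBER_OF_YEARS consecutive rows (the loop relies on this)
xx1 <- data.frame(
  COMPANY_NUMBER  = rep(c(70837, 1000403, 10029943), times = c(3, 4, 3)),
  NUMBER_OF_YEARS = rep(c(3, 4, 3), times = c(3, 4, 3))
)
xx1$I <- NA
xx1$J <- NA

i <- 1                                  # first row of the current company
while (i <= nrow(xx1)) {
  n_years <- xx1$NUMBER_OF_YEARS[i]     # number of rows for this company
  for (j in 1:n_years) {
    row <- i + j - 1                    # absolute row index, not j alone
    xx1$J[row] <- j
    xx1$I[row] <- if (j == n_years) n_years else 1
  }
  i <- i + n_years                      # jump to the next company's first row
}
print(xx1)
```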



Re: [R] making code (loop) more efficient

2020-12-16, Ana Marija
Indeed it was the issue with data.table. I converted it to data.frame
and it worked like a charm.
Thank you so much for your insight!

This is the code that worked:

library(parallel)
library(data.table)
library(foreach)   # provides foreach() and %dopar%
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
# list.files() returns bare file names, so load() below resolves them
# relative to the working directory
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=TRUE)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   # load() returns the name of the restored object; get() fetches it
   tmp <- get(load(files[i]))
   # the loaded object appears to be a matrix; as.data.frame() lets
   # a["blup"] select a column
   a <- as.data.frame(tmp)
   rm(tmp)
   gc()

   names <- rownames(a)
   nm1 <- c("rsid", "ref_allele", "eff_allele")
   if("blup" %in% colnames(a)) {
     data <- data.table(names, a["blup"])
     # row names are rsid:position:ref:eff; [-2] drops the position
     data[,  (nm1) := tstrsplit(names, ":")[-2]]
     out <- data[, .(rsid, weight = blup, ref_allele, eff_allele)][,
       WGT := files[i]][]
   } else {
     data <- data.table(names)
     data[,  (nm1) := tstrsplit(names, ":")[-2]]
     out <- data[, .(rsid,  ref_allele, eff_allele)][,
       WGT := files[i]][]
   }

   out
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out, fill = TRUE)
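
Jim's suspicion and the fix that worked (converting to a data frame) are consistent with the loaded object being a numeric matrix rather than a data frame. A minimal base-R sketch, using a hypothetical two-row matrix shaped like the head(a) output in the thread, shows the difference: character indexing without a comma looks up names(), which a matrix does not have, and returns NA; $ fails on a matrix; m[, "blup"] and as.data.frame() work. The row-name splitting follows Jim's strsplit() idea, dropping the position field.

```r
# Hypothetical stand-in for one loaded .RDat object: a numeric matrix whose
# row names encode rsid:position:ref_allele:eff_allele
m <- matrix(c(0.07692622, 0.62411425, -1.881795e-04, 9.934994e-04), nrow = 2,
            dimnames = list(c("rs4980905:184404:C:A", "rs7978751:187541:G:C"),
                            c("top1", "blup")))

print(m["blup"])   # no comma: looks up names(m), which a matrix lacks -> NA
err <- tryCatch(m$blup, error = function(e) conditionMessage(e))
print(err)         # $ is not defined for atomic vectors such as matrices
print(m[, "blup"]) # column selection on a matrix works

df <- as.data.frame(m)
print(df$blup)     # after conversion, $ and df["blup"] behave as expected

# Split "rsid:pos:ref:eff" and keep fields 1, 3 and 4
parts <- do.call(rbind, strsplit(rownames(m), ":"))
out <- data.frame(rsid = parts[, 1],
                  weight = unname(m[, "blup"]),
                  ref_allele = parts[, 3],
                  eff_allele = parts[, 4],
                  WGT = "retina.ENSG0135776.wgt.RDat",  # file name from the thread
                  stringsAsFactors = FALSE)
print(out)
```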


Re: [R] making code (loop) more efficient

2020-12-16, Ana Marija
Hi Jim,

this is what I was running:

library(parallel)
library(data.table)
library(foreach)
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   a <- get(load(files[i]))
namesplit<-strsplit(rownames(a),":")
rsid<-unlist(lapply(namesplit,"[",1))
ref_allele<-unlist(lapply(namesplit,"[",3))
eff_allele<-unlist(lapply(namesplit,"[",4))
WGT<-rep(files[i],length(rsid))
data<-data.frame(rsid=rsid,weight=a$blup, #weight is "blup" column
 ref_allele=ref_allele,eff_allele,WGT=WGT)

   return(data)
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out)

and I got:
Error in { : task 4 failed - "$ operator is invalid for atomic vectors"
> parallel::stopCluster(cl)

I uploaded 3 RDat files here if you want to try them:
https://github.com/montenegrina/sample

Thank you for looking into this
Ana

On Tue, Dec 15, 2020 at 11:45 PM Jim Lemon  wrote:
>
> Hi Ana,
> Back on the job. I'm not sure how this will work in your setup, but
> here is a try:
>
> a<-read.table(text="top1 blup lasso enet
> rs4980905:184404:C:A  0.07692622 -1.881795e-04     0    0
> rs7978751:187541:G:C  0.62411425  9.934994e-04     0    0
> rs2368831:188285:C:T  0.69529158  1.211028e-03     0    0
> rs12830904:188335:T:A 0.92793158 -9.143555e-05     0    0
> rs1500098:189093:G:C  0.42032471  9.001814e-04     0    0
> rs79410690:190097:G:A 0.26194244  5.019037e-04     0    0",
> header=TRUE,stringsAsFactors=FALSE)
> namesplit<-strsplit(rownames(a),":")
> rsid<-unlist(lapply(namesplit,"[",1))
> ref_allele<-unlist(lapply(namesplit,"[",3))
> eff_allele<-unlist(lapply(namesplit,"[",4))
> # here I'm assuming that the filename
> # is stored in files[i]
> files<-"retina.ENSG0135776.wgt.RDat"
> i<-1
> WGT<-rep(files[i],length(rsid))
> data<-data.frame(rsid=rsid,weight=a$top1,
>  ref_allele=ref_allele,eff_allele,WGT=WGT)
> data
>
> Note that the output is a data frame, not a data table. I hope that
> the function for creating a data table is close enough to that for a
> data frame for you to work it out. If not I can probably have a look
> at it a bit later.
>
> Jim
>
> On Wed, Dec 16, 2020 at 1:45 PM Ana Marija  
> wrote:
> >
> > Hi Jim,
> >
> > as always you're completely right, this is what is happening:
> >
> > > head(a)
> > top1  blup lasso enet
> > rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> > rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> > rs2368831:188285:C:T  0.69529158  1.211028e-03 00
> > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00
> > rs1500098:189093:G:C  0.42032471  9.001814e-04 00
> > rs79410690:190097:G:A 0.26194244  5.019037e-04 00
> > >names <- rownames(a)
> > >data <- data.table(names, a["blup"])
> > > head(data)
> >names V2
> > 1:  rs4980905:184404:C:A NA
> > 2:  rs7978751:187541:G:C NA
> > 3:  rs2368831:188285:C:T NA
> > 4: rs12830904:188335:T:A NA
> > 5:  rs1500098:189093:G:C NA
> > 6: rs79410690:190097:G:A NA
> >
> > So my goal is to transform what is in "a" to this for every RDat file:
> >
> >   rsidweight ref_allele eff_allele
> > 1:  rs72763981  9.376766e-09  C  G
> > 2: rs144383755 -2.093346e-09  A  G
> > 3:   rs1925717  1.511376e-08  T  C
> > 4:  rs61827307 -1.625302e-08  C  A
> > 5:  rs61827308 -1.625302e-08  G  C
> > 6: rs199623136 -9.128354e-10 GC  G
> >WGT
> > 1: retina.ENSG0135776.wgt.RDat
> > 2: retina.ENSG0135776.wgt.RDat
> > 3: retina.ENSG0135776.wgt.RDat
> > 4: retina.ENSG0135776.wgt.RDat
> > 5: retina.ENSG0135776.wgt.RDat
> > 6: retina.ENSG0135776.wgt.RDat
> >
> > so from rs4980905:184404:C:A I would take rs4980905 to be in column
> > "rsid", C in column "ref_allele" and A to be in column "eff_allele",
> > WGT column would just be filled with a name of the particular RDat
> > file.
> >
> > So the issue is in these lines:
> >
> > a <- get(load(files[i]))
> > names <- rownames(a)
> > data <- data.table(names, a["blup"])
> > nm1 <- c("rsid", "ref_allele", "eff_allele")
> >
> > any idea how I can rewrite this?
> >
> >
> >
> > On Tue, Dec 15, 2020 at 8:30 PM Jim Lemon  wrote:
> > >
> > > Hi Ana,
> > > I would look at "data" in your second example and see if it contains a
> > > column named "blup" or just the values that were extracted from
> > > a$blup. Also, I assume that weight=blup looks for an object named
> > > "blup", which may not be there.
> > >
> > > Jim
> > >
> > > On Wed, Dec 16, 2020 at 1:20 PM Ana Marija  
> > > wrote:
> > > >
> > > > Hi Jim,
> > > >
> > > > Maybe my post is confusing.
> > > >
> > > > so "dd" came from my slow code and I don't use it again in parallelized 

Re: [R] making code (loop) more efficient

2020-12-15 Thread Jim Lemon
Hi Ana,
Back on the job. I'm not sure how this will work in your setup, but
here is a try:

a<-read.table(text="top1 blup lasso enet
rs4980905:184404:C:A  0.07692622 -1.881795e-04 0 0
rs7978751:187541:G:C  0.62411425  9.934994e-04 0 0
rs2368831:188285:C:T  0.69529158  1.211028e-03 0 0
rs12830904:188335:T:A 0.92793158 -9.143555e-05 0 0
rs1500098:189093:G:C  0.42032471  9.001814e-04 0 0
rs79410690:190097:G:A 0.26194244  5.019037e-04 0 0",
header=TRUE,stringsAsFactors=FALSE)
namesplit<-strsplit(rownames(a),":")
rsid<-unlist(lapply(namesplit,"[",1))
ref_allele<-unlist(lapply(namesplit,"[",3))
eff_allele<-unlist(lapply(namesplit,"[",4))
# here I'm assuming that the filename
# is stored in files[i]
files<-"retina.ENSG0135776.wgt.RDat"
i<-1
WGT<-rep(files[i],length(rsid))
# weight comes from the blup column, as in the original slow loop
data<-data.frame(rsid=rsid,weight=a$blup,
 ref_allele=ref_allele,eff_allele=eff_allele,WGT=WGT)
data

Note that the output is a data frame, not a data table. I hope that
the function for creating a data table is close enough to that for a
data frame for you to work it out. If not I can probably have a look
at it a bit later.
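Jim's example reads `a` from text, so it is a data frame and `$` works on it. If `get(load(...))` returns a matrix instead (an assumption; `class(a)` will tell), `$` fails, while `a[, "blup"]` works for both. A minimal base R sketch of the same steps; `snp_weights_frame()` is a hypothetical helper name, and row names are assumed to look like `rsid:position:ref:alt`:

```r
# Hypothetical helper: turn one weights matrix into the long format Ana wants.
snp_weights_frame <- function(a, file_name) {
  parts <- do.call(rbind, strsplit(rownames(a), ":", fixed = TRUE))  # n x 4 character matrix
  data.frame(rsid       = parts[, 1],
             weight     = a[, "blup"],   # column selection that works for matrices and data frames
             ref_allele = parts[, 3],
             eff_allele = parts[, 4],
             WGT        = file_name,     # recycled to every row
             row.names  = NULL,
             stringsAsFactors = FALSE)
}

# Toy matrix built from two rows of the output shown in this thread
a <- matrix(c(0.07692622, 0.62411425, -1.881795e-04, 9.934994e-04, 0, 0, 0, 0),
            nrow = 2,
            dimnames = list(c("rs4980905:184404:C:A", "rs7978751:187541:G:C"),
                            c("top1", "blup", "lasso", "enet")))
res <- snp_weights_frame(a, "retina.ENSG0135776.wgt.RDat")
res
```

Splitting all IDs once and binding them into a matrix avoids the three separate `lapply()` passes, which may matter over about 17,000 files.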

Jim


Re: [R] making code (loop) more efficient

2020-12-15 Thread Ana Marija
Hi Jim,

As always, you're completely right. This is what is happening:

> head(a)
                            top1          blup lasso enet
rs4980905:184404:C:A  0.07692622 -1.881795e-04     0    0
rs7978751:187541:G:C  0.62411425  9.934994e-04     0    0
rs2368831:188285:C:T  0.69529158  1.211028e-03     0    0
rs12830904:188335:T:A 0.92793158 -9.143555e-05     0    0
rs1500098:189093:G:C  0.42032471  9.001814e-04     0    0
rs79410690:190097:G:A 0.26194244  5.019037e-04     0    0
> names <- rownames(a)
> data <- data.table(names, a["blup"])
> head(data)
                   names V2
1:  rs4980905:184404:C:A NA
2:  rs7978751:187541:G:C NA
3:  rs2368831:188285:C:T NA
4: rs12830904:188335:T:A NA
5:  rs1500098:189093:G:C NA
6: rs79410690:190097:G:A NA
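The all-`NA` column above is consistent with `a` being a numeric matrix rather than a data frame (an assumption here). On a matrix, `a["blup"]` is plain vector indexing by element name; the elements carry no names, so the result is a single `NA`, which `data.table()` then recycles down the column. Selecting the column needs `a[, "blup"]`. A small illustration with a toy matrix:

```r
# Toy stand-in for the loaded weights: a numeric matrix with SNP IDs as row names
a <- matrix(c(0.07692622, 0.62411425, -1.881795e-04, 9.934994e-04, 0, 0, 0, 0),
            nrow = 2,
            dimnames = list(c("rs4980905:184404:C:A", "rs7978751:187541:G:C"),
                            c("top1", "blup", "lasso", "enet")))

a["blup"]     # vector-style lookup of an element named "blup": a single NA
a[, "blup"]   # column selection: the two blup weights, named by SNP ID
```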

So my goal is to transform what is in "a" to this for every RDat file:

          rsid        weight ref_allele eff_allele
1:  rs72763981  9.376766e-09          C          G
2: rs144383755 -2.093346e-09          A          G
3:   rs1925717  1.511376e-08          T          C
4:  rs61827307 -1.625302e-08          C          A
5:  rs61827308 -1.625302e-08          G          C
6: rs199623136 -9.128354e-10         GC          G
                           WGT
1: retina.ENSG0135776.wgt.RDat
2: retina.ENSG0135776.wgt.RDat
3: retina.ENSG0135776.wgt.RDat
4: retina.ENSG0135776.wgt.RDat
5: retina.ENSG0135776.wgt.RDat
6: retina.ENSG0135776.wgt.RDat

So from rs4980905:184404:C:A I would put rs4980905 in column
"rsid", C in column "ref_allele" and A in column "eff_allele";
the WGT column would just be filled with the name of the particular
RDat file.
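One way to pull the fields out of an ID such as `rs4980905:184404:C:A` in a single step is `utils::strcapture()`, which matches a regular expression with capture groups and returns a data frame typed by a prototype. A minimal sketch; it assumes every ID has exactly four colon-separated fields, and the `pos` column is extra and can be dropped:

```r
ids <- c("rs4980905:184404:C:A", "rs12830904:188335:T:A")

# One capture group per colon-separated field; alleles may be longer than one base
fields <- utils::strcapture(
  "^([^:]+):([0-9]+):([^:]+):([^:]+)$", ids,
  proto = data.frame(rsid = character(), pos = integer(),
                     ref_allele = character(), eff_allele = character(),
                     stringsAsFactors = FALSE))
fields
```

IDs that do not match the pattern come back as rows of `NA` rather than raising an error, which makes malformed row names easy to spot.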

So the issue is in these lines:

a <- get(load(files[i]))
names <- rownames(a)
data <- data.table(names, a["blup"])
nm1 <- c("rsid", "ref_allele", "eff_allele")

Any idea how I can rewrite this?




Re: [R] making code (loop) more efficient

2020-12-15 Thread Jim Lemon
Hi Ana,
I would look at "data" in your second example and see if it contains a
column named "blup" or just the values that were extracted from
a$blup. Also, I assume that weight=blup looks for an object named
"blup", which may not be there.

Jim
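A quick way to check this point: when a data frame constructor receives an unnamed expression such as `a[, "blup"]`, it invents a column name from it (Ana's `head(data)` output in this thread shows `data.table()` calling it `V2`), so later code that says `weight = blup` has nothing to find. Naming the argument fixes that. A base R illustration with a toy one-column matrix:

```r
# Toy one-column stand-in for the weights matrix
a <- matrix(c(-1.881795e-04, 9.934994e-04), ncol = 1,
            dimnames = list(c("rs4980905:184404:C:A", "rs7978751:187541:G:C"), "blup"))
snp_ids <- rownames(a)

unnamed <- data.frame(snp_ids, a[, "blup"])         # second column name is built from the expression
named   <- data.frame(snp_ids, blup = a[, "blup"])  # explicit name, so later code can refer to blup

names(unnamed)  # no column called "blup"
names(named)    # "snp_ids" "blup"
```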

Re: [R] making code (loop) more efficient

2020-12-15 Thread Ana Marija
Hi Jim,

Maybe my post was confusing.

"dd" came from my slow code; I don't use it at all in the parallelized code.

So for example for one of my files:

if
i="retina.ENSG0120647.wgt.RDat"
> a <- get(load(i))
> head(a)
                            top1          blup lasso enet
rs4980905:184404:C:A  0.07692622 -1.881795e-04     0    0
rs7978751:187541:G:C  0.62411425  9.934994e-04     0    0
rs2368831:188285:C:T  0.69529158  1.211028e-03     0    0
...

I posted the slow code only to show what was running very slowly (it does
work). I really need help fixing the parallelized version.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] making code (loop) more efficient

2020-12-15 Thread Jim Lemon
Hi Ana,
My guess is that in your second code fragment you are assigning the
rownames of "a" and the _values_ contained in a$blup to the data.table
"data". As I don't have much experience with data tables I may be
wrong, but I suspect that the column name "blup" may not be visible or
even present in "data". I don't see it in "dd" above this code
fragment.

Jim


[R] making code (loop) more efficient

2020-12-15 Thread Ana Marija
Hello,

I wrote some terribly inefficient code; it takes forever, but it does run.

library(dplyr)
library(splitstackshape)

datalist = list()
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

for(i in files)
{
a<-get(load(i))
names <- rownames(a)
data <- as.data.frame(cbind(names,a))
rownames(data) <- NULL
dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"),
seps = ":"))
dd=select(dd,names_1,blup,names_3,names_4)
colnames(dd)=c("rsid","weight","ref_allele","eff_allele")
dd$WGT<-i
datalist[[i]] <- dd # add it to your list
}

big_data = do.call(rbind, datalist)

There are 17,345 RDat files this loop has to go through, and each file
has approximately 10,000 lines. All RDat files can be downloaded from
here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115828 and
they are compressed in this file: GSE115828_retina_TWAS_wgts.tar.gz.
A subset of 3 of those .RDat files is here:
https://github.com/montenegrina/sample

For one of those files, say i="retina.ENSG0135776.wgt.RDat",
dd looks like this:

> head(dd)
          rsid        weight ref_allele eff_allele
1:  rs72763981  9.376766e-09          C          G
2: rs144383755 -2.093346e-09          A          G
3:   rs1925717  1.511376e-08          T          C
4:  rs61827307 -1.625302e-08          C          A
5:  rs61827308 -1.625302e-08          G          C
6: rs199623136 -9.128354e-10         GC          G
                           WGT
1: retina.ENSG0135776.wgt.RDat
2: retina.ENSG0135776.wgt.RDat
3: retina.ENSG0135776.wgt.RDat
4: retina.ENSG0135776.wgt.RDat
5: retina.ENSG0135776.wgt.RDat
6: retina.ENSG0135776.wgt.RDat

In an attempt to parallelize this, I did the following:

library(parallel)
library(data.table)
library(foreach)
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   a <- get(load(files[i]))
   names <- rownames(a)
   data <- data.table(names, a["blup"])
   nm1 <- c("rsid", "ref_allele", "eff_allele")
   data[,  (nm1) := tstrsplit(names, ":")[-2]]
   return(data[, .(rsid, weight = blup, ref_allele, eff_allele)][,
   WGT := files[i]][])
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out)

I am getting this error:

Error in { : task 7 failed - "object 'blup' not found"
> parallel::stopCluster(cl)

Can you please advise,
Ana
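For the file loop itself, a sketch of a per-file worker that avoids both indexing problems discussed in this thread and does not depend on the working directory. Assumptions not taken from the original: each `.RDat` file holds a single object (the weights matrix, which `get(load(...))` already relies on), `full.names = TRUE` is used so `load()` finds the files from anywhere, and `basename()` keeps only the file name for `WGT`. `read_one_weights()` is a hypothetical name:

```r
# Hypothetical worker: load one .RDat file and return its weights in long format.
# Assumes a matrix with a "blup" column and row names like "rsid:position:ref:alt".
read_one_weights <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)   # load() returns the names of the restored objects
  a <- get(obj_names[1], envir = env)    # take the first (assumed only) object
  parts <- do.call(rbind, strsplit(rownames(a), ":", fixed = TRUE))
  out <- data.frame(rsid       = parts[, 1],
                    weight     = a[, "blup"],     # column selection, not a["blup"]
                    ref_allele = parts[, 3],
                    eff_allele = parts[, 4],
                    WGT        = basename(path),  # file name only, as in the slow loop
                    row.names  = NULL,
                    stringsAsFactors = FALSE)
  out[!is.na(out$weight), ]              # drop missing weights, roughly what na.omit() did
}

# "\\.RDat$" matches a literal dot at the end of the name
files <- list.files("/WEIGHTS1/Retina", pattern = "\\.RDat$",
                    ignore.case = TRUE, full.names = TRUE)

# Sequential run; the same function should also work with parallel::parLapply(cl, files, read_one_weights)
big_data <- do.call(rbind, lapply(files, read_one_weights))
```

Because the worker only uses base R functions, nothing extra has to be exported to cluster nodes.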



Re: [R] R for-loop to add layer to lattice plot

2020-10-28 Thread Luigi Marongiu
Awesome, thanks!


-- 
Best regards,
Luigi



Re: [R] R for-loop to add layer to lattice plot

2020-10-28 Thread Deepayan Sarkar
On Tue, Oct 27, 2020 at 6:04 PM Luigi Marongiu  wrote:
>
> Hello,
> I am using e1071 to run a support vector machine. I would like to plot
> the data with lattice and specifically show the hyperplanes created by
> the system.
> I can store the hyperplane as a contour in an object, and I can plot
> one object at a time. Since there will be thousands of elements to
> plot, I can't manually add them one by one to the plot, so I tried to
> loop over them, but only the last one is added.
> Here is a working example for clarity:
>
> ```
> library(e1071)
> library(lattice)
> library(latticeExtra)
>
> make.grid <- function(x, n = 1000) {
>   grange = apply(x, 2, range)
>   x1 = seq(from = grange[1,1], to = grange[2,1], length = n)
>   x2 = seq(from = grange[1,2], to = grange[2,2], length = n)
>   expand.grid(X1 = x1, X2 = x2)
> }
>
> plot_list <- list()
> for (i in 1:10) {
>   x1 = rnorm(100, mean = 0.2, sd = 0.15)
>   y1 = rnorm(100, mean = 0.7, sd = 0.15)
>   y2 = rnorm(100, mean = 0.2, sd = 0.15)
>   x2 = rnorm(100, mean = 0.75, sd = 0.15)
>   df = data.frame(x = c(x1,x2), y=c(y1,y2),
>   z=c(rep(0, length(x1)), rep(1, length(x2))))
>   df$z = factor(c(rep(0, length(x1)), rep(1, length(x2))))
>   df[, "train"] <- ifelse(runif(nrow(df)) < 0.8, 1, 0)
>   trainset <- df[df$train == 1, ]
>   testset <- df[df$train == 0, ]
>   trainColNum <- grep("train", names(df))
>   trainset <- trainset[, -trainColNum]
>   testset <- testset[, -trainColNum]
>   svm_model <- svm(z ~ .,
>   data = trainset,
>   type = "C-classification",
>   kernel = "linear",
>   scale = FALSE)
>   # generate contour
>   xmat = make.grid(matrix(c(testset$x, testset$y),
>   ncol = 2, byrow=FALSE))
>   xgrid = as.data.frame(xmat)
>   names(xgrid) = c("x", "y")
>   z = predict(svm_model, xgrid)
>   xyz_dat = as.data.frame(cbind(xgrid, z))
>   plot_list[[i]] = contourplot(z ~ y+x, data=xyz_dat, pretty = TRUE,
>xlim=c(-1,50), ylim=c(-0.001, 0.05),
>labels = FALSE, col = "blue", lwd = 0.5)
>
> }
> # the contour is stored in the object plot_list
> str(plot_list) # confirm that there is data here
>
> # I can add one element at a time to lattice's xyplot and store it
> # in an object P
> P = xyplot(y ~ x, group = z, data = df,
>pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[1]]) +
>   as.layer(plot_list[[2]])
> plot(P)  # this demonstrates that the lines are not the same
>
> # but if I add the elements via loop, it does not work
> for (i in 1:length(plot_list)) {
>   print(i)
>   P = xyplot(y ~ x, group = z, data = df,
>  pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[i]])
> }
> plot(P)
> ```
>
> Am I missing something?

Yes, as Mark says, you need to change the last part to something like

P = xyplot(y ~ x, group = z, data = df, pch = 16, cex = 1.5, alpha = 0.25)
for (i in seq_along(plot_list)) {
  print(i)
  P = P + as.layer(plot_list[[i]])
}
plot(P)

-Deepayan
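The same accumulation can be written with `Reduce()`, which feeds each intermediate plot into the next step, the thing the original loop lost by rebuilding `P` from `xyplot()` on every pass. A small sketch with made-up grid data in place of the SVM contours; it assumes lattice and latticeExtra are installed (`as.layer()` and `+` on trellis objects come from latticeExtra):

```r
library(lattice)
library(latticeExtra)   # provides as.layer() and the + method for trellis objects

# Made-up data standing in for the SVM grids: three tilted planes on one regular grid
g <- expand.grid(x = seq(0, 1, length.out = 20), y = seq(0, 1, length.out = 20))
layers <- lapply(1:3, function(k) {
  d <- transform(g, z = x + k * y)
  contourplot(z ~ x * y, data = d, labels = FALSE, col = "blue", lwd = 0.5)
})

p0 <- xyplot(y ~ x, data = g, pch = 16, cex = 0.3)

# Each step adds one layer to the plot built so far, so no layer is overwritten
P <- Reduce(function(acc, lyr) acc + as.layer(lyr), layers, p0)
plot(P)
```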

> Thank you


Re: [R] R for-loop to add layer to lattice plot

2020-10-27 Thread Mark Leeds
Hi: I think you're overwriting P on every pass through the loop, so only
the last one survives. Maybe try P = P + whatever, but
I'm not sure if that's allowed with plots.



On Tue, Oct 27, 2020 at 8:34 AM Luigi Marongiu 
wrote:

> Hello,
> I am using e1071 to run support vector machine. I would like to plot
> the data with lattice and specifically show the hyperplanes created by
> the system.
> I can store the hyperplane as a contour in an object, and I can plot
> one object at a time. Since there will be thousands of elements to
> plot, I can't manually add them one by one to the plot, so I tried to
> loop over them, but only the last one is added.
> Here is a working example for clarity:
>
> ```
> library(e1071)
> library(lattice)
> library(latticeExtra)
>
> make.grid <- function(x, n = 1000) {
>   grange = apply(x, 2, range)
>   x1 = seq(from = grange[1,1], to = grange[2,1], length = n)
>   x2 = seq(from = grange[1,2], to = grange[2,2], length = n)
>   expand.grid(X1 = x1, X2 = x2)
> }
>
> plot_list <- list()
> for (i in 1:10) {
>   x1 = rnorm(100, mean = 0.2, sd = 0.15)
>   y1 = rnorm(100, mean = 0.7, sd = 0.15)
>   y2 = rnorm(100, mean = 0.2, sd = 0.15)
>   x2 = rnorm(100, mean = 0.75, sd = 0.15)
>   df = data.frame(x = c(x1,x2), y=c(y1,y2),
>   z=c(rep(0, length(x1)), rep(1, length(x2))))
>   df$z = factor(c(rep(0, length(x1)), rep(1, length(x2))))
>   df[, "train"] <- ifelse(runif(nrow(df)) < 0.8, 1, 0)
>   trainset <- df[df$train == 1, ]
>   testset <- df[df$train == 0, ]
>   trainColNum <- grep("train", names(df))
>   trainset <- trainset[, -trainColNum]
>   testset <- testset[, -trainColNum]
>   svm_model <- svm(z ~ .,
>   data = trainset,
>   type = "C-classification",
>   kernel = "linear",
>   scale = FALSE)
>   # generate contour
>   xmat = make.grid(matrix(c(testset$x, testset$y),
>   ncol = 2, byrow=FALSE))
>   xgrid = as.data.frame(xmat)
>   names(xgrid) = c("x", "y")
>   z = predict(svm_model, xgrid)
>   xyz_dat = as.data.frame(cbind(xgrid, z))
>   plot_list[[i]] = contourplot(z ~ y+x, data=xyz_dat, pretty = TRUE,
>xlim=c(-1,50), ylim=c(-0.001, 0.05),
>labels = FALSE, col = "blue", lwd = 0.5)
>
> }
> # the contour is stored in the object plot_list
> str(plot_list) # confirm that there is data here
>
> # I can add one element at a time to lattice's xyplot and store it
> # in an object P
> P = xyplot(y ~ x, group = z, data = df,
>pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[1]]) +
>   as.layer(plot_list[[2]])
> plot(P)  # this demonstrates that the lines are not the same
>
> # but if I add the elements via loop, it does not work
> for (i in 1:length(plot_list)) {
>   print(i)
>   P = xyplot(y ~ x, group = z, data = df,
>  pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[i]])
> }
> plot(P)
> ```
>
> Am I missing something?
> Thank you


[R] R for-loop to add layer to lattice plot

2020-10-27 Thread Luigi Marongiu
Hello,
I am using e1071 to run support vector machine. I would like to plot
the data with lattice and specifically show the hyperplanes created by
the system.
I can store the hyperplane as a contour in an object, and I can plot
one object at a time. Since there will be thousands of elements to
plot, I can't manually add them one by one to the plot, so I tried to
loop over them, but only the last one is added.
Here is a working example for clarity:

```
library(e1071)
library(lattice)
library(latticeExtra)

make.grid <- function(x, n = 1000) {
  grange = apply(x, 2, range)
  x1 = seq(from = grange[1,1], to = grange[2,1], length = n)
  x2 = seq(from = grange[1,2], to = grange[2,2], length = n)
  expand.grid(X1 = x1, X2 = x2)
}

plot_list <- list()
for (i in 1:10) {
  x1 = rnorm(100, mean = 0.2, sd = 0.15)
  y1 = rnorm(100, mean = 0.7, sd = 0.15)
  y2 = rnorm(100, mean = 0.2, sd = 0.15)
  x2 = rnorm(100, mean = 0.75, sd = 0.15)
  df = data.frame(x = c(x1,x2), y=c(y1,y2),
  z=c(rep(0, length(x1)), rep(1, length(x2))))
  df$z = factor(c(rep(0, length(x1)), rep(1, length(x2))))
  df[, "train"] <- ifelse(runif(nrow(df)) < 0.8, 1, 0)
  trainset <- df[df$train == 1, ]
  testset <- df[df$train == 0, ]
  trainColNum <- grep("train", names(df))
  trainset <- trainset[, -trainColNum]
  testset <- testset[, -trainColNum]
  svm_model <- svm(z ~ .,
  data = trainset,
  type = "C-classification",
  kernel = "linear",
  scale = FALSE)
  # generate contour
  xmat = make.grid(matrix(c(testset$x, testset$y),
  ncol = 2, byrow=FALSE))
  xgrid = as.data.frame(xmat)
  names(xgrid) = c("x", "y")
  z = predict(svm_model, xgrid)
  xyz_dat = as.data.frame(cbind(xgrid, z))
  plot_list[[i]] = contourplot(z ~ y+x, data=xyz_dat, pretty = TRUE,
   xlim=c(-1,50), ylim=c(-0.001, 0.05),
   labels = FALSE, col = "blue", lwd = 0.5)

}
# the contour is stored in the object plot_list
str(plot_list) # confirm that there is data here

# I can add one element at a time to lattice's xyplot and store it
# in an object P
P = xyplot(y ~ x, group = z, data = df,
   pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[1]]) +
  as.layer(plot_list[[2]])
plot(P)  # this demonstrates that the lines are not the same

# but if I add the elements via loop, it does not work
for (i in 1:length(plot_list)) {
  print(i)
  P = xyplot(y ~ x, group = z, data = df,
 pch = 16, cex = 1.5, alpha = 0.25) + as.layer(plot_list[[i]])
}
plot(P)
```

Am I missing something?
Thank you



Re: [R] How to loop over two files ...

2020-06-22 Thread Ana Marija
Thank you so much; as.character(r) indeed resolved the issue!

On Sat, Jun 20, 2020 at 3:47 AM Ivan Krylov  wrote:
>
> On Fri, 19 Jun 2020 19:36:41 -0500
> Ana Marija  wrote:
>
> > Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1),
> > "\n"),  : argument 1 (type 'list') cannot be handled by 'cat'
>
> It might be a good idea to try to solve problems like this yourself
> instead of waiting for hours for someone to reply. All the required
> information is there in the error message: write() fails because r is a
> list. Why is r a list? It's returned from GET(), so let's read its
> documentation.
>
> httr::GET() returns a response object, not a string [1]. Try passing
> as.character(r) or content(r,'text') instead of just r to write(...) or
> use a different way of extracting the actual response from the response
> object.
>
> --
> Best regards,
> Ivan
>
> [1] https://httr.r-lib.org/reference/GET.html



Re: [R] How to loop over two files ...

2020-06-20 Thread Rasmus Liland
On 2020-06-20 07:29 -0700, Bert Gunter wrote:
> On Fri, Jun 19, 2020 at 11:17 PM Rasmus Liland wrote:
> > On 2020-06-19 18:33 -0700, Bert Gunter wrote:
> > > Why aren't you posting on the 
> > > Bioconductor Help forum instead
> >
> > Perhaps r-sig-genetics@ or r-sig-phylo@?  
> 
> genetics is not genomics. Nor are 
> phylogenies. I still believe Bioc is the 
> right resource.

Right, I had a hunch all these fields 
were kind of related somehow ¯\_(ツ)_/¯




Re: [R] How to loop over two files ...

2020-06-20 Thread Bert Gunter
genetics is not genomics. Nor are phylogenies. I still believe Bioc is the
right resource.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jun 19, 2020 at 11:17 PM Rasmus Liland  wrote:

> Dear Bert,
>
> On 2020-06-19 18:33 -0700, Bert Gunter wrote:
> > All of your torrent of requests for help to
> > have others do your work for you are about
> > genomics issues.
>
> I am a bioinformatician; I am supposed to
> know all of these things: GWAS, Le ggplot,
> etc. ...
>
> > Why aren't you posting on the Bioconductor
> > Help forum instead, where both the
> > expertise and tools for such matters exist?
> > I would characterize your posts here as
> > being largely inappropriate for that
> > reason.
>
> Perhaps r-sig-genetics@ or r-sig-phylo@?  It
> needs some more volume ...
>
> Best,
> Rasmus
>



Re: [R] How to loop over two files ...

2020-06-20 Thread Ivan Krylov
On Fri, 19 Jun 2020 19:36:41 -0500
Ana Marija  wrote:

> Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1),
> "\n"),  : argument 1 (type 'list') cannot be handled by 'cat'

It might be a good idea to try to solve problems like this yourself
instead of waiting for hours for someone to reply. All the required
information is there in the error message: write() fails because r is a
list. Why is r a list? It's returned from GET(), so let's read its
documentation.

httr::GET() returns a response object, not a string [1]. Try passing
as.character(r) or content(r,'text') instead of just r to write(...) or
use a different way of extracting the actual response from the response
object.

-- 
Best regards,
Ivan

[1] https://httr.r-lib.org/reference/GET.html
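Putting this suggestion together with the pairwise loop, here is a sketch under a few assumptions: the SNP IDs have already been read into character vectors (hypothetical `snps_g` and `snps_n` below, e.g. via `readLines()`), every 1g SNP is paired with every 1n SNP, and the raw JSON text is what should go into list.txt. `fetch_ld()` is a hypothetical helper. Building the URLs up front with `expand.grid()` and vectorised `paste0()` means each `GET()` call receives exactly one URL.

```r
# Hypothetical sample IDs; in the thread these would come from
# readLines("1g.txt") and readLines("1n.txt").
snps_g <- c("rs6792369", "rs1414517")
snps_n <- c("rs1042779", "rs2360630")

server <- "http://rest.ensembl.org"
pop <- "1000GENOMES:phase_3:KHV"

# One row per (1g, 1n) pair, kept as character rather than factor.
pairs <- expand.grid(g = snps_g, n = snps_n, stringsAsFactors = FALSE)

# paste0() is vectorised: one complete URL string per pair.
urls <- paste0(server, "/ld/human/pairwise/", pairs$g, "/", pairs$n,
               "?population_name=", pop)

# Fetch each URL and append the JSON body (a string, not the response
# object) to list.txt. Needs the httr package and network access.
fetch_ld <- function(urls, out = "list.txt") {
  for (u in urls) {
    r <- httr::GET(u, httr::content_type("application/json"))
    httr::stop_for_status(r)
    body <- httr::content(r, as = "text", encoding = "UTF-8")
    cat(body, "\n", file = out, append = TRUE, sep = "")
  }
  invisible(out)
}
```

Calling `fetch_ld(urls)` runs the requests. Two 300-line files give 300 × 300 = 90,000 combinations; if the files are instead meant to be paired line by line (300 pairs), skip `expand.grid()` and paste `snps_g` and `snps_n` together directly.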



Re: [R] How to loop over two files ...

2020-06-20 Thread Rasmus Liland
Dear Bert,

On 2020-06-19 18:33 -0700, Bert Gunter wrote:
> All of your torrent of requests for help to 
> have others do your work for you are about 
> genomics issues.

I am a bioinformatician; I am supposed to
know all of these things: GWAS, Le ggplot,
etc. ... 

> Why aren't you posting on the Bioconductor 
> Help forum instead, where both the 
> expertise and tools for such matters exist?  
> I would characterize your posts here as 
> being largely inappropriate for that 
> reason.

Perhaps r-sig-genetics@ or r-sig-phylo@?  It 
needs some more volume ... 

Best,
Rasmus




Re: [R] How to loop over two files ...

2020-06-19 Thread Bert Gunter
All of your torrent of requests for help to have others do your work for
you are about genomics issues. Why aren't you posting on the Bioconductor
Help forum instead, where both the expertise and tools for such matters
exist?  I would characterize your posts here as being largely inappropriate
for that reason.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jun 19, 2020 at 5:35 PM Ana Marija 
wrote:

> Hi Rasmus,
>
> I got those SNPs from two GWASes, which I ran with different
> phenotypes, and I would like to check whether the top SNPs in both of
> them are in LD.
> So 1n.txt and 1g.txt are just the top SNPs from those two GWASes.
> Unfortunately, https://ldlink.nci.nih.gov/?tab=ldpair works for only
> two SNPs at a time, and I need to do that for 300 pairs.
>
> On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland  wrote:
> >
> > On 2020-06-19 14:34 -0500, Ana Marija wrote:
> > >
> > > I have two files (each has 300 lines) like this:
> >
> > The example looks quite similar to the R example in
> > https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra
> > ...
> >
> > The question becomes: how did you query the
> > 600 variant names in 1g.txt and 1n.txt?
> >
> >   curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?'
> -H 'Content-type:application/json'
> >
> > shows the 26 population_names for the
> > rs6792369/rs1042779 combination ...
>


Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija
unfortunately it complains again:

> f1 <- read_tsv("1g", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> f2 <- read_tsv("1n", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> for ( a in rownames(f1) ) {
+
+for ( b in rownames(f2) ) {
+
+ ext <- paste0( "/ld/human/pairwise/",
+   f1[a,1],
+   "/",
+   f2[b,1],
+   "?population_name=1000GENOMES:phase_3:KHV")
+
+ r <- GET(paste(server, ext, sep = ""),
+ content_type("application/json"))
+
+ write(r,file="list.txt",append=TRUE)
+
+
+}
+
+ }
Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"),  :
  argument 1 (type 'list') cannot be handled by 'cat'

> traceback()
2: cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"),
   append = append)
1: write(r, file = "list.txt", append = TRUE)

On Fri, Jun 19, 2020 at 5:19 PM  wrote:
>
> Sorry, it's been a long week!
>
> there is a foreach package but I try to avoid extras
>
> make your for statements:
>
> for ( a in rownames(f1) ) {
>
> # a will now be a row number rather than the value, so replace ' a ' in
> # the paste0 with: f1[ a, 1]
>
> so
>
> ext <- paste0( "/ld/human/pairwise/",
>   f1[a,1],
>   "/",
>   f2[b,1],
>   "?population_name=1000GENOMES:phase_3:KHV")
>
> On 2020-06-19 22:54, Ana Marija wrote:
> > I tried it:
> >
> >  > library(httr)
> >> library(jsonlite)
> >> library(xml2)
> >> library(readr)
> >> server <- "http://rest.ensembl.org"
> >> f1 <- read_tsv("1g", col_names=F)
> > Parsed with column specification:
> > cols(
> >   X1 = col_character()
> > )
> >> f2 <- read_tsv("1n", col_names=F)
> > Parsed with column specification:
> > cols(
> >   X1 = col_character()
> > )
> >>
> >> for ( a in as.list(f1[,1]) ) {
> > +
> > +for ( b in as.list(f2[,1]) ) {
> > +
> > + ext <- paste0( "/ld/human/pairwise/",
> > + a,
> > + "/",
> > + b,
> > + "?population_name=1000GENOMES:phase_3:KHV")
> > +
> > + r <- GET(paste(server, ext, sep = ""),
> > + content_type("application/json"))
> > +
> > + write(r,file="list.txt",append=TRUE)
> > +
> > +
> > +}
> > +
> > + }
> > Error in parse_url(url) : length(url) == 1 is not TRUE
> >
> >> traceback()
> > 10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
> > 9: stopifnot(length(url) == 1)
> > 8: parse_url(url)
> > 7: is.url(url)
> > 6: stopifnot(is.url(url))
> > 5: build_url(parse_url(url)[c("scheme", "hostname", "port")])
> > 4: handle_name(url)
> > 3: handle_find(url)
> > 2: handle_url(handle, url, ...)
> > 1: GET(paste(server, ext, sep = ""), content_type("application/json"))
> >
> > On Fri, Jun 19, 2020 at 4:41 PM  wrote:
> >>
> >> Oh - read.text isn't in base!  Not sure where it came from (my head
> >> mostly!)  You may have something that adds it but better to use
> >> something that works.  So try using:
> >>
> >> library(readr)
> >> f1 <- read_tsv("1g.txt", col_names = FALSE)
> >>
> >> This will give you a tibble with f1$X1 with the file in it
> >>
> >> then loop it with (a in as.list(f1[,1])
> >>
> >> Others will have much slicker code than me!
> >>
> >> On 2020-06-19 22:02, Ana Marija wrote:
> >> > Hi,
> >> >
> >> > thanks for getting back to me, it is just for my job :)
> >> >
> >> > so I tried it:
> >> >
> >> > library(httr)
> >> > library(jsonlite)
> >> > library(xml2)
> >> > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R",
> >> > "lib")))
> >> > sparkR.session(master = "local[*]", sparkConfig =
> >> > list(spark.driver.memory = "2g"))
> >> >
> >> > server <- "http://rest.ensembl.org"
> >> >
> >> > f1 <- read.text("1g.txt")
> >> > f2 <- read.text("1n.txt")
> >> >
> >> > for ( a in as.list(f1) ) {
> >> >
> >> >for ( b in as.list(f2) ) {
> >> >
> >> > ext <- paste0( "/ld/human/pairwise/",
> >> > a,
> >> > "/",
> >> > b,
> >> > "?population_name=1000GENOMES:phase_3:KHV")
> >> >
> >> > r <- GET(paste(server, ext, sep = ""),
> >> > content_type("application/json"))
> >> >
> >> > write(r,file="list.txt",append=TRUE)
> >> >
> >> >
> >> >}
> >> >
> >> > }
> >> >
> >> > and I got this error:
> >> > Error in as.list.default(f1) :
> >> >   no method for coercing this S4 class to a vector
> >> >
> >> > Please advise
> >> >
> >> > On Fri, Jun 19, 2020 at 3:28 PM  wrote:
> >> >>
> >> >> so (untested) if you did something like
> >> >>
> >> >> f1 <- read.text("1g.txt")
> >> >> f2 <- read.text("1n.txt")
> >> >>
> >> >> for ( a in as.list(f1) ) {
> >> >>
> >> >>for ( b in as.list(f2) ) {
> >> >>
> >> >> ext <- paste0( "/ld/human/pairwise/",
> >> >> a,
> >> >> "/",
> >> >> b,
> >> >> 

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija
Hi Rasmus,

I got those SNPs from two GWASes, which I ran with different
phenotypes, and I would like to check whether the top SNPs in both of
them are in LD.
So 1n.txt and 1g.txt are just the top SNPs from those two GWASes.
Unfortunately, https://ldlink.nci.nih.gov/?tab=ldpair works for only
two SNPs at a time, and I need to do that for 300 pairs.

On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland  wrote:
>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > I have two files (each has 300 lines) like this:
>
> The example looks quite similar to the R example in
> https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra
> ...
>
> The question becomes: how did you query the
> 600 variant names in 1g.txt and 1n.txt?
>
>   curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 
> 'Content-type:application/json'
>
> shows the 26 population_names for the
> rs6792369/rs1042779 combination ...



Re: [R] How to loop over two files ...

2020-06-19 Thread Rasmus Liland
On 2020-06-19 14:34 -0500, Ana Marija wrote:
> 
> I have two files (each has 300 lines) like this:

The example looks quite similar to the R example in 
https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra 
...

The question becomes: how did you query the 
600 variant names in 1g.txt and 1n.txt?

  curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 
'Content-type:application/json'

shows the 26 population_names for the 
rs6792369/rs1042779 combination ... 




Re: [R] How to loop over two files ...

2020-06-19 Thread cpolwart

Sorry, it's been a long week!

there is a foreach package but I try to avoid extras

make your for statements:

for ( a in rownames(f1) ) {

# a will now be a row number rather than the value, so replace ' a ' in
# the paste0 with: f1[ a, 1]


so

ext <- paste0( "/ld/human/pairwise/",
 f1[a,1],
 "/",
 f2[b,1],
 "?population_name=1000GENOMES:phase_3:KHV")

On 2020-06-19 22:54, Ana Marija wrote:

I tried it:

 > library(httr)

library(jsonlite)
library(xml2)
library(readr)
server <- "http://rest.ensembl.org"
f1 <- read_tsv("1g", col_names=F)

Parsed with column specification:
cols(
  X1 = col_character()
)

f2 <- read_tsv("1n", col_names=F)

Parsed with column specification:
cols(
  X1 = col_character()
)


for ( a in as.list(f1[,1]) ) {

+
+for ( b in as.list(f2[,1]) ) {
+
+ ext <- paste0( "/ld/human/pairwise/",
+ a,
+ "/",
+ b,
+ "?population_name=1000GENOMES:phase_3:KHV")
+
+ r <- GET(paste(server, ext, sep = ""),
+ content_type("application/json"))
+
+ write(r,file="list.txt",append=TRUE)
+
+
+}
+
+ }
Error in parse_url(url) : length(url) == 1 is not TRUE


traceback()

10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
9: stopifnot(length(url) == 1)
8: parse_url(url)
7: is.url(url)
6: stopifnot(is.url(url))
5: build_url(parse_url(url)[c("scheme", "hostname", "port")])
4: handle_name(url)
3: handle_find(url)
2: handle_url(handle, url, ...)
1: GET(paste(server, ext, sep = ""), content_type("application/json"))

On Fri, Jun 19, 2020 at 4:41 PM  wrote:


Oh - read.text isn't in base!  Not sure where it came from (my head
mostly!)  You may have something that adds it but better to use
something that works.  So try using:

library(readr)
f1 <- read_tsv("1g.txt", col_names = FALSE)

This will give you a tibble with f1$X1 with the file in it

then loop it with (a in as.list(f1[,1])

Others will have much slicker code than me!

On 2020-06-19 22:02, Ana Marija wrote:
> Hi,
>
> thanks for getting back to me, it is just for my job :)
>
> so I tried it:
>
> library(httr)
> library(jsonlite)
> library(xml2)
> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R",
> "lib")))
> sparkR.session(master = "local[*]", sparkConfig =
> list(spark.driver.memory = "2g"))
>
> server <- "http://rest.ensembl.org"
>
> f1 <- read.text("1g.txt")
> f2 <- read.text("1n.txt")
>
> for ( a in as.list(f1) ) {
>
>for ( b in as.list(f2) ) {
>
> ext <- paste0( "/ld/human/pairwise/",
> a,
> "/",
> b,
> "?population_name=1000GENOMES:phase_3:KHV")
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> write(r,file="list.txt",append=TRUE)
>
>
>}
>
> }
>
> and I got this error:
> Error in as.list.default(f1) :
>   no method for coercing this S4 class to a vector
>
> Please advise
>
> On Fri, Jun 19, 2020 at 3:28 PM  wrote:
>>
>> so (untested) if you did something like
>>
>> f1 <- read.text("1g.txt")
>> f2 <- read.text("1n.txt")
>>
>> for ( a in as.list(f1) ) {
>>
>>for ( b in as.list(f2) ) {
>>
>> ext <- paste0( "/ld/human/pairwise/",
>> a,
>> "/",
>> b,
>> "?population_name=1000GENOMES:phase_3:KHV")
>>
>> r <- GET(paste(server, ext, sep = ""),
>> content_type("application/json"))
>>
>> # You presumably need to do something with 'r'; at the
>> # moment it's overwritten by the next loop.  Were
>> # you appending it to list.txt?  Possibly it's just a
>> # bit of the R output you want?
>>
>> write(r,file="list.txt",append=TRUE)
>>
>>
>>}
>>
>> }
>>
>>
>> Are we doing your PhD for you ;-)  Do we get to share ;-)
>>
>>
>> On 2020-06-19 20:34, Ana Marija wrote:
>> > Hello,
>> >
>> > I have two files (each has 300 lines) like this:
>> >
>> > head 1g.txt
>> > rs6792369
>> > rs1414517
>> > rs16857712
>> > rs16857703
>> > rs12239392
>> > ...
>> >
>> > head 1n.txt
>> > rs1042779
>> > rs2360630
>> > rs10753597
>> > rs7549096
>> > rs2343491
>> > ...
>> >
>> > For each pair of rs# from those two files I can run this command in R
>> >
>> > library(httr)
>> > library(jsonlite)
>> > library(xml2)
>> >
>> > server <- "http://rest.ensembl.org"
>> > ext <-
>> > 
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
>> >
>> > r <- GET(paste(server, ext, sep = ""),
>> > content_type("application/json"))
>> >
>> > stop_for_status(r)
>> > head(fromJSON(toJSON(content(r))))
>> >d_prime   r2 variation1 variation2 population_name
>> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
>> >
> >> > What I would like to do is run this command for every SNP
> >> > in one list (1g.txt) against each SNP in the other list (1n.txt), where SNP#
>> > is rs# and output 

Re: [R] How to loop over two files ...

2020-06-19 Thread Rasmus Liland
Dear other list readers, 

On 2020-06-19 23:31 +0200, Rasmus Liland wrote:
> I have attached my rds here.  

only Ana received this because of a Mailman
attachment policy, which also is why my 
signature was bad ...

Best,
Rasmus




Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija
I tried it:

 > library(httr)
> library(jsonlite)
> library(xml2)
> library(readr)
> server <- "http://rest.ensembl.org"
> f1 <- read_tsv("1g", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> f2 <- read_tsv("1n", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
>
> for ( a in as.list(f1[,1]) ) {
+
+for ( b in as.list(f2[,1]) ) {
+
+ ext <- paste0( "/ld/human/pairwise/",
+ a,
+ "/",
+ b,
+ "?population_name=1000GENOMES:phase_3:KHV")
+
+ r <- GET(paste(server, ext, sep = ""),
+ content_type("application/json"))
+
+ write(r,file="list.txt",append=TRUE)
+
+
+}
+
+ }
Error in parse_url(url) : length(url) == 1 is not TRUE

> traceback()
10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
9: stopifnot(length(url) == 1)
8: parse_url(url)
7: is.url(url)
6: stopifnot(is.url(url))
5: build_url(parse_url(url)[c("scheme", "hostname", "port")])
4: handle_name(url)
3: handle_find(url)
2: handle_url(handle, url, ...)
1: GET(paste(server, ext, sep = ""), content_type("application/json"))
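The `length(url) == 1 is not TRUE` error is consistent with `a` holding the whole column rather than one ID: indexing a tibble with `f1[,1]` returns a one-column tibble, not a vector, so `as.list(f1[,1])` has a single element. A base-R sketch of that behaviour, using a `data.frame` with `drop = FALSE` to mimic the tibble (the IDs are sample values):

```r
# Hypothetical stand-in for the single column read from 1g.txt.
f1 <- data.frame(X1 = c("rs6792369", "rs1414517", "rs16857712"),
                 stringsAsFactors = FALSE)

# A tibble's f1[, 1] stays a one-column table; drop = FALSE mimics that
# with a base data.frame.
one_col <- f1[, 1, drop = FALSE]
n_iter <- length(as.list(one_col))   # 1: the list holds the whole column

# The loop therefore runs once, with `a` bound to all three IDs,
# and paste0() returns three URLs where GET() expects exactly one.
for (a in as.list(one_col)) {
  urls <- paste0("/ld/human/pairwise/", a, "/rs1042779")
}

# Extract the column as a plain character vector instead.
ids <- f1[[1]]                       # equivalent to f1$X1
for (a in ids) {
  url <- paste0("/ld/human/pairwise/", a, "/rs1042779")
  stopifnot(length(url) == 1)        # one URL per iteration
}
```

With the tibble returned by `read_tsv()`, `f1[[1]]` or `f1$X1` should likewise give a plain character vector to loop over.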

On Fri, Jun 19, 2020 at 4:41 PM  wrote:
>
> Oh - read.text isn't in base!  Not sure where it came from (my head
> mostly!)  You may have something that adds it but better to use
> something that works.  So try using:
>
> library(readr)
> f1 <- read_tsv("1g.txt", col_names = FALSE)
>
> This will give you a tibble with f1$X1 with the file in it
>
> then loop it with (a in as.list(f1[,1])
>
> Others will have much slicker code than me!
>
> On 2020-06-19 22:02, Ana Marija wrote:
> > Hi,
> >
> > thanks for getting back to me, it is just for my job :)
> >
> > so I tried it:
> >
> > library(httr)
> > library(jsonlite)
> > library(xml2)
> > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R",
> > "lib")))
> > sparkR.session(master = "local[*]", sparkConfig =
> > list(spark.driver.memory = "2g"))
> >
> > server <- "http://rest.ensembl.org"
> >
> > f1 <- read.text("1g.txt")
> > f2 <- read.text("1n.txt")
> >
> > for ( a in as.list(f1) ) {
> >
> >for ( b in as.list(f2) ) {
> >
> > ext <- paste0( "/ld/human/pairwise/",
> > a,
> > "/",
> > b,
> > "?population_name=1000GENOMES:phase_3:KHV")
> >
> > r <- GET(paste(server, ext, sep = ""),
> > content_type("application/json"))
> >
> > write(r,file="list.txt",append=TRUE)
> >
> >
> >}
> >
> > }
> >
> > and I got this error:
> > Error in as.list.default(f1) :
> >   no method for coercing this S4 class to a vector
> >
> > Please advise
> >
> > On Fri, Jun 19, 2020 at 3:28 PM  wrote:
> >>
> >> so (untested) if you did something like
> >>
> >> f1 <- read.text("1g.txt")
> >> f2 <- read.text("1n.txt")
> >>
> >> for ( a in as.list(f1) ) {
> >>
> >>for ( b in as.list(f2) ) {
> >>
> >> ext <- paste0( "/ld/human/pairwise/",
> >> a,
> >> "/",
> >> b,
> >> "?population_name=1000GENOMES:phase_3:KHV")
> >>
> >> r <- GET(paste(server, ext, sep = ""),
> >> content_type("application/json"))
> >>
> >> # You presumably need to do something with 'r'; at the
> >> # moment it's overwritten by the next loop.  Were
> >> # you appending it to list.txt?  Possibly it's just a
> >> # bit of the R output you want?
> >>
> >> write(r,file="list.txt",append=TRUE)
> >>
> >>
> >>}
> >>
> >> }
> >>
> >>
> >> Are we doing your PhD for you ;-)  Do we get to share ;-)
> >>
> >>
> >> On 2020-06-19 20:34, Ana Marija wrote:
> >> > Hello,
> >> >
> >> > I have two files (each has 300 lines) like this:
> >> >
> >> > head 1g.txt
> >> > rs6792369
> >> > rs1414517
> >> > rs16857712
> >> > rs16857703
> >> > rs12239392
> >> > ...
> >> >
> >> > head 1n.txt
> >> > rs1042779
> >> > rs2360630
> >> > rs10753597
> >> > rs7549096
> >> > rs2343491
> >> > ...
> >> >
> >> > For each pair of rs# from those two files I can run this command in R
> >> >
> >> > library(httr)
> >> > library(jsonlite)
> >> > library(xml2)
> >> >
> >> > server <- "http://rest.ensembl.org"
> >> > ext <-
> >> > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >> >
> >> > r <- GET(paste(server, ext, sep = ""),
> >> > content_type("application/json"))
> >> >
> >> > stop_for_status(r)
> >> > head(fromJSON(toJSON(content(r))))
> >> >d_prime   r2 variation1 variation2 population_name
> >> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >> >
> >> > What I would like to do is run this command for every SNP
> >> > in one list (1g.txt) against each SNP in the other list (1n.txt), where SNP#
> >> > is rs#, and write every line of the result to list.txt.
> >> >
> >> > The process is illustrated in the attachment.
> >> >
> >> > Please help,
> >> > Ana
> >> >
> >> > 

Re: [R] How to loop over two files ...

2020-06-19 Thread cpolwart
Oh - read.text isn't in base!  Not sure where it came from (my head
mostly!)  You may have something that adds it but better to use 
something that works.  So try using:


library(readr)
f1 <- read_tsv("1g.txt", col_names = FALSE)

This will give you a tibble with f1$X1 with the file in it

then loop it with (a in as.list(f1[,1])

Others will have much slicker code than me!

On 2020-06-19 22:02, Ana Marija wrote:

Hi,

thanks for getting back to me, it is just for my job :)

so I tried it:

library(httr)
library(jsonlite)
library(xml2)
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", 
"lib")))

sparkR.session(master = "local[*]", sparkConfig =
list(spark.driver.memory = "2g"))

server <- "http://rest.ensembl.org"

f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")

for ( a in as.list(f1) ) {

   for ( b in as.list(f2) ) {

ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")

r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))

write(r,file="list.txt",append=TRUE)


   }

}

and I got this error:
Error in as.list.default(f1) :
  no method for coercing this S4 class to a vector

Please advise

On Fri, Jun 19, 2020 at 3:28 PM  wrote:


so (untested) if you did something like

f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")

for ( a in as.list(f1) ) {

   for ( b in as.list(f2) ) {

ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")

r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))

# You presumably need to do something with 'r'; at the
# moment it's overwritten by the next loop.  Were
# you appending it to list.txt?  Possibly it's just a
# bit of the R output you want?

write(r,file="list.txt",append=TRUE)


   }

}


Are we doing your PhD for you ;-)  Do we get to share ;-)


On 2020-06-19 20:34, Ana Marija wrote:
> Hello,
>
> I have two files (each has 300 lines) like this:
>
> head 1g.txt
> rs6792369
> rs1414517
> rs16857712
> rs16857703
> rs12239392
> ...
>
> head 1n.txt
> rs1042779
> rs2360630
> rs10753597
> rs7549096
> rs2343491
> ...
>
> For each pair of rs# from those two files I can run this command in R
>
> library(httr)
> library(jsonlite)
> library(xml2)
>
> server <- "http://rest.ensembl.org"
> ext <-
> "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> stop_for_status(r)
> head(fromJSON(toJSON(content(r))))
>d_prime   r2 variation1 variation2 population_name
> 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
>
> What I would like to do is run this command for every SNP
> in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
> is the rs#, and write every line of the result to list.txt.
>
> The process is illustrated in the attachment.
>
> Please help,
> Ana
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




Re: [R] How to loop over two files ...

2020-06-19 Thread Rasmus Liland
On 2020-06-19 16:07 -0500, Ana Marija wrote:
> HI Rasmus,
> 
> I tried it:
> 
> library(base)
> 
> > r <- readRDS(paste0(population.name, ".rds"))
> Error in gzfile(file, "rb") : cannot open the connection
> In addition: Warning message:
> In gzfile(file, "rb") :
>   cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable
> reason 'No such file or directory'

Because I rerun my script after every small 
change using the program entr[1], rather than 
Emacs Speaks Statistics or RStudio, I find it 
useful to save partial outputs in rds files; 
it also makes sense not to call ensembl.org 
again and again ...

Right, so you would first run the commented-out 
bit before that, then save the output list to 
the rds file, so as not to send too many 
requests to the server.  I have attached my rds here.  

files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
  return(paste0(server, "/ld/human/pairwise/",
x[1], "/", x[2],
"?population_name=", population.name))
})

r <- lapply(ext, function(x) {
  httr::GET(x, httr::content_type("application/json"))
})
names(r) <- ext
file <- paste0(population.name, ".rds")

saveRDS(object=r, compress="xz", file=file)  # <--- Then save the list here for another time!
# r <- readRDS(paste0(population.name, ".rds"))  # Read it back like this

r <-
sapply(r, function(x) {
  x <- jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
  length(x)
})
names(r) <- NULL
r
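Building on the list of responses above, a minimal sketch (assuming each element has been parsed with jsonlite as in the code above) of stacking the non-empty results into one data frame while recording which pairs came back empty. 'parsed' is a hand-made stand-in that mirrors the first three results shown later in this thread, so the code runs offline.

```r
# Stand-in for the parsed responses: one data frame, then two empty results
parsed <- list(
  data.frame(variation1 = "rs6792369", variation2 = "rs1042779",
             d_prime = 0.975513, r2 = 0.951626),
  list(),
  list()
)
names(parsed) <- c("rs6792369/rs1042779", "rs1414517/rs1042779",
                   "rs16857712/rs1042779")

# TRUE for elements that actually contain rows
has_rows <- vapply(parsed, function(x) is.data.frame(x) && nrow(x) > 0,
                   logical(1))

# Stack the non-empty results into a single data frame
ld <- do.call(rbind, parsed[has_rows])
rownames(ld) <- NULL
ld

# Pairs for which the server returned no rows
names(parsed)[!has_rows]
```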

[1] http://eradman.com/entrproject/




Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija
HI Rasmus,

I tried it:

library(base)

files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
  return(paste0(server, "/ld/human/pairwise/",
x[1], "/", x[2],
"?population_name=", population.name))
})

r <- readRDS(paste0(population.name, ".rds"))
lapply(r[1:4], function(x) {
  jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
})

and I got this error:
> r <- readRDS(paste0(population.name, ".rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable
reason 'No such file or directory'
> lapply(r[1:4], function(x) {
+   jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
+ })
Error in lapply(r[1:4], function(x) { : object 'r' not found

Am I doing something wrong here?
Do I need any other libraries loaded?

Thanks
Ana

On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland  wrote:
>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > server <- "http://rest.ensembl.org"
> > ext <- 
> > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""), content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r))))
> >d_prime   r2 variation1 variation2 population_name
> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is run this command for every SNP
> > in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
> > is the rs#, and write every line of the result to list.txt.
>
> Dear Ana,
>
> I tried, but for some reason I get only a
> response for the first URL you supplied.
>
> I wrote this:
>
> files <- c("1g.txt", "1n.txt")
> files <- lapply(files, readLines)
> server <- "http://rest.ensembl.org"
> population.name <- "1000GENOMES:phase_3:KHV"
> ext <- apply(expand.grid(files), 1, function(x) {
>   return(paste0(server, "/ld/human/pairwise/",
> x[1], "/", x[2],
> "?population_name=", population.name))
> })
>
> # r <- lapply(ext, function(x) {
> #   httr::GET(x, httr::content_type("application/json"))
> # })
> # names(r) <- ext
> # file <- paste0(population.name, ".rds")
> # saveRDS(object=r, compress="xz", file=file)
>
> r <- readRDS(paste0(population.name, ".rds"))
> lapply(r[1:4], function(x) {
>   jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
> })
>
>
> Which if you are able to run it (saving the
> output in that rds file), yields this:
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>   variation2 population_name  d_prime   r2 variation1
> 1  rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626  rs6792369
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> For some reason, only the first url works ...
>
> I am a bit unfamiliar with working with REST
> APIs, or web scraping in general.  Daniel
> Cegiełka knew something about this in a thread [1]
> some days ago, where it might be similar to the
> API of borsaitaliana.it, where you can supply
> headers with curl like he quickly did [2].
>
> You might be able to supply the list of SNPs
> in a header to Ensembl in httr::GET somehow
> if you read some docs on their API?
>
> Best,
> Rasmus
>
> [1] https://marc.info/?t=15924924612=1=2
> [2] https://marc.info/?l=r-sig-finance=159249894208684=2



Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija
Hi,

thanks for getting back to me, it is just for my job :)

so I tried it:

library(httr)
library(jsonlite)
library(xml2)
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", sparkConfig =
list(spark.driver.memory = "2g"))

server <- "http://rest.ensembl.org"

f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")

for ( a in as.list(f1) ) {

   for ( b in as.list(f2) ) {

ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")

r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))

write(r,file="list.txt",append=TRUE)


   }

}

and I got this error:
Error in as.list.default(f1) :
  no method for coercing this S4 class to a vector

Please advise

On Fri, Jun 19, 2020 at 3:28 PM  wrote:
>
> so (untested) if you did something like
>
> f1 <- read.text("1g.txt")
> f2 <- read.text("1n.txt")
>
> for ( a in as.list(f1) ) {
>
>for ( b in as.list(f2) ) {
>
> ext <- paste0( "/ld/human/pairwise/",
> a,
> "/",
> b,
> "?population_name=1000GENOMES:phase_3:KHV")
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> # You presumably need to do something with 'r'; at the
> # moment it's overwritten by the next loop iteration. Were
> # you appending it to list.txt? Possibly it's just a bit
> # of the R output you want?
>
> write(r,file="list.txt",append=TRUE)
>
>
>}
>
> }
>
>
> Are we doing your PhD for you ;-)  Do we get to share ;-)
>
>
> On 2020-06-19 20:34, Ana Marija wrote:
> > Hello,
> >
> > I have two files (each has 300 lines) like this:
> >
> > head 1g.txt
> > rs6792369
> > rs1414517
> > rs16857712
> > rs16857703
> > rs12239392
> > ...
> >
> > head 1n.txt
> > rs1042779
> > rs2360630
> > rs10753597
> > rs7549096
> > rs2343491
> > ...
> >
> > For each pair of rs# from those two files I can run this command in R
> >
> > library(httr)
> > library(jsonlite)
> > library(xml2)
> >
> > server <- "http://rest.ensembl.org"
> > ext <-
> > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""),
> > content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r))))
> >d_prime   r2 variation1 variation2 population_name
> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is run this command for every SNP
> > in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
> > is the rs#, and write every line of the result to list.txt.
> >
> > The process is illustrated in the attachment.
> >
> > Please help,
> > Ana
> >



Re: [R] How to loop over two files ...

2020-06-19 Thread Rasmus Liland
On 2020-06-19 14:34 -0500, Ana Marija wrote:
> 
> server <- "http://rest.ensembl.org"
> ext <- 
> "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> 
> r <- GET(paste(server, ext, sep = ""), content_type("application/json"))
> 
> stop_for_status(r)
> head(fromJSON(toJSON(content(r))))
>d_prime   r2 variation1 variation2 population_name
> 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> 
> What I would like to do is run this command for every SNP
> in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
> is the rs#, and write every line of the result to list.txt.

Dear Ana,

I tried, but for some reason I get only a 
response for the first URL you supplied.  

I wrote this:

files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
  return(paste0(server, "/ld/human/pairwise/",
x[1], "/", x[2],
"?population_name=", population.name))
})

# r <- lapply(ext, function(x) {
#   httr::GET(x, httr::content_type("application/json"))
# })
# names(r) <- ext
# file <- paste0(population.name, ".rds")
# saveRDS(object=r, compress="xz", file=file)

r <- readRDS(paste0(population.name, ".rds"))
lapply(r[1:4], function(x) {
  jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
})


Which if you are able to run it (saving the 
output in that rds file), yields this: 


$`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV`
  variation2 population_name  d_prime   r2 variation1
1  rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626  rs6792369


$`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV`
list()


$`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV`
list()


$`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV`
list()

For some reason, only the first url works ...
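An empty list() here most likely means the request succeeded but the server answered with an empty JSON array for that pair, rather than an HTTP error; checking the status explicitly makes the two cases easy to tell apart. With 300 x 300 = 90,000 pairs it is also worth pausing between calls, since Ensembl's REST service documents a per-client rate limit. A sketch under those assumptions; build_ld_url() and fetch_ld() are hypothetical helpers, and the 0.1 s pause is an assumed value, not a documented one.

```r
# Hypothetical helper: one pairwise-LD URL for a pair of rs IDs
build_ld_url <- function(snp1, snp2,
                         population = "1000GENOMES:phase_3:KHV",
                         server = "http://rest.ensembl.org") {
  paste0(server, "/ld/human/pairwise/", snp1, "/", snp2,
         "?population_name=", population)
}

# Hypothetical helper: fetch and parse one URL (needs httr and jsonlite installed)
fetch_ld <- function(url, pause = 0.1) {
  Sys.sleep(pause)                                  # space out requests (assumed value)
  r <- httr::GET(url, httr::content_type("application/json"))
  httr::stop_for_status(r)                          # HTTP errors become R errors
  txt <- httr::content(r, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt)                           # data frame, or list() for "[]"
}

build_ld_url("rs6792369", "rs1042779")
```

For example, fetch_ld(build_ld_url("rs6792369", "rs1042779")) should return the one-row result shown above, while a pair with no LD data should come back as an empty list instead of raising an error.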

I am a bit unfamiliar with working with REST 
APIs, or web scraping in general.  Daniel 
Cegiełka knew something about this in a thread [1] 
some days ago, where it might be similar to the 
API of borsaitaliana.it, where you can supply 
headers with curl like he quickly did [2].

You might be able to supply the list of SNPs 
in a header to Ensembl in httr::GET somehow 
if you read some docs on their API? 

Best,
Rasmus

[1] https://marc.info/?t=15924924612=1=2
[2] https://marc.info/?l=r-sig-finance=159249894208684=2




Re: [R] How to loop over two files ...

2020-06-19 Thread cpolwart

so (untested) if you did something like

f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")

for ( a in as.list(f1) ) {

  for ( b in as.list(f2) ) {

ext <- paste0( "/ld/human/pairwise/",
   a,
   "/",
   b,
   "?population_name=1000GENOMES:phase_3:KHV")

   r <- GET(paste(server, ext, sep = ""), 
content_type("application/json"))


   # You presumably need to do something with 'r'; at the
   # moment it's overwritten by the next loop iteration. Were
   # you appending it to list.txt? Possibly it's just a bit
   # of the R output you want?


   write(r,file="list.txt",append=TRUE)


  }

}


Are we doing your PhD for you ;-)  Do we get to share ;-)
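A minimal sketch of the "append it to list.txt" step, assuming each response has already been parsed into a small data frame (as fromJSON(toJSON(content(r))) does in the original post). write() hands its argument to cat(), which cannot handle the list-like httr response object, so writing the parsed rows with write.table() is one option. append_result() is a hypothetical helper, and the second row uses NA placeholders rather than real LD values.

```r
# Hypothetical helper: append one parsed result to a TAB-separated file,
# writing the column names only when the file does not exist yet
append_result <- function(res, file) {
  first <- !file.exists(file)
  write.table(res, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = first, append = !first)
}

out_file <- tempfile(fileext = ".txt")   # "list.txt" in the original question

# Row taken from the output shown in the original post
append_result(data.frame(variation1 = "rs6792369", variation2 = "rs1042779",
                         d_prime = 0.975513, r2 = 0.951626), out_file)
# Placeholder row (NA values) standing in for a second request
append_result(data.frame(variation1 = "rs1414517", variation2 = "rs1042779",
                         d_prime = NA, r2 = NA), out_file)

read.delim(out_file)
```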


On 2020-06-19 20:34, Ana Marija wrote:

Hello,

I have two files (each has 300 lines) like this:

head 1g.txt
rs6792369
rs1414517
rs16857712
rs16857703
rs12239392
...

head 1n.txt
rs1042779
rs2360630
rs10753597
rs7549096
rs2343491
...

For each pair of rs# from those two files I can run this command in R

library(httr)
library(jsonlite)
library(xml2)

server <- "http://rest.ensembl.org"
ext <-
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"

r <- GET(paste(server, ext, sep = ""), 
content_type("application/json"))


stop_for_status(r)
head(fromJSON(toJSON(content(r))))
   d_prime   r2 variation1 variation2 population_name
1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV

What I would like to do is run this command for every SNP
in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
is the rs#, and write every line of the result to list.txt.

The process is illustrated in the attachment.

Please help,
Ana





[R] How to loop over two files ...

2020-06-19 Thread Ana Marija
Hello,

I have two files (each has 300 lines) like this:

head 1g.txt
rs6792369
rs1414517
rs16857712
rs16857703
rs12239392
...

head 1n.txt
rs1042779
rs2360630
rs10753597
rs7549096
rs2343491
...

For each pair of rs# from those two files I can run this command in R

library(httr)
library(jsonlite)
library(xml2)

server <- "http://rest.ensembl.org"
ext <- 
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"

r <- GET(paste(server, ext, sep = ""), content_type("application/json"))

stop_for_status(r)
head(fromJSON(toJSON(content(r))))
   d_prime   r2 variation1 variation2 population_name
1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV

What I would like to do is run this command for every SNP
in one list (1g.txt) against each SNP in another list (1n.txt), where SNP#
is the rs#, and write every line of the result to list.txt.

The process is illustrated in the attachment.

Please help,
Ana


lists.pdf
Description: Adobe PDF document


Re: [R-es] for-loop with partial

2020-06-10 Thread Javier Marcuzzi
Dear Manuel Mendoza,

I would need to look at it, but at a quick glance, and with a possible error
on my part, the cause may be that the first window has the parameters and the
subsequent ones do not; or, put another way, some index is created in the
first window and is not created again in the following ones. It occurs to me
that the window has nothing to do with R but with the operating system that
receives the command from R, hence the possible conflict with the index. But
I am not sure, so take it with a grain of salt.

Javier Rubén Marcuzzi

On Wed, Jun 10, 2020 at 16:15, Manuel Mendoza (<
mmend...@fulbrightmail.org>) wrote:

> Hello, does anyone know why, in this loop, if I set, e.g., i = 1 and run
> the 2 lines inside, it opens a window and draws the PDP for frg, i.e., it
> works correctly, but if I run the whole loop it opens the windows but
> leaves them empty?
>
> predictores <- c("frg","omn","bc","co","pr","gg","fg","mf","br","hc")
> for(i in 1:length(predictores)){
> windows()
> partial(RFfit, pred.var = predictores[i], which.class = "Ard", plot =
> T,prob = T, chull=T, type="classification",plot.engine = "ggplot2", rug=T)
>   }
>
> Thanks,
> Manuel
>
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>



[R-es] for-loop with partial

2020-06-10 Thread Manuel Mendoza
Hello, does anyone know why, in this loop, if I set, e.g., i = 1 and run
the 2 lines inside, it opens a window and draws the PDP for frg, i.e., it
works correctly, but if I run the whole loop it opens the windows but
leaves them empty?

predictores <- c("frg","omn","bc","co","pr","gg","fg","mf","br","hc")
for(i in 1:length(predictores)){
windows()
partial(RFfit, pred.var = predictores[i], which.class = "Ard", plot =
T,prob = T, chull=T, type="classification",plot.engine = "ggplot2", rug=T)
  }

Thanks,
Manuel
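A common cause of this symptom (compare R FAQ 7.22, which describes it for lattice) is that ggplot2 and lattice plots are only drawn when the plot object is printed. At the top level R prints a result automatically, but inside a for loop nothing is auto-printed, so the windows stay empty. A minimal, self-contained sketch of the effect; 'toyplot' is a hypothetical class standing in for the ggplot object that partial() returns when plot = TRUE.

```r
# Hypothetical stand-in for a ggplot object: drawing happens in print()
make_plot <- function(label) structure(list(label = label), class = "toyplot")
print.toyplot <- function(x, ...) {
  cat("drawing", x$label, "\n")   # a real ggplot would draw on the device here
  invisible(x)
}

make_plot("frg")                                 # top level: auto-printed, so drawn

for (v in c("frg", "omn")) make_plot(v)          # inside a loop: nothing is drawn
for (v in c("frg", "omn")) print(make_plot(v))   # explicit print(): each one is drawn
```

Applied to the loop above, this would mean wrapping the call in print(), i.e. print(partial(RFfit, pred.var = predictores[i], ...)) with the same arguments as before; this has not been tested against the original model.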



Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-13 Thread Subhamitra Patra
Dear Sir,

I am sorry that, due to certain inconveniences, I was late in trying your
suggested code and replying to your email.

Thank you very much for your wonderful solution and suggestions. As before,
your suggested code worked very well. I also successfully imported the
output into Word, following the steps you suggested for the LibreOffice
editor.

However, I have some questions about your code, listed below, which I
would like to discuss with you for my further learning.

1. Is there any difference between reading a .tab file and a text file in R?
When I used

sp_8_5<-read.table("sp_8_5.tab",sep="\t",header=TRUE,stringsAsFactors=FALSE)

it threw an error, but when I changed sp_8_5.tab to sp_8_5.text, it worked.
So my question is: does R read tab and text files differently, even though
both files are similar?
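As far as I know, read.table() never looks at the file extension; only the file's contents and the arguments (sep, header, ...) matter. So an error with the .tab name more likely means the file on disk was not called exactly that, and file.exists("sp_8_5.tab") is a quick check. A minimal sketch that writes the same bytes to a .tab and a .txt file and reads both; the small table is made-up example data.

```r
# Made-up tab-delimited content, written with identical bytes to two files
txt <- "Date\tA1\tA2\n2020-01-01\t0.1\t0.2\n2020-01-02\t0.3\t0.4\n"
f_tab <- tempfile(fileext = ".tab")
f_txt <- tempfile(fileext = ".txt")
writeLines(txt, f_tab, sep = "")
writeLines(txt, f_txt, sep = "")

# Same call, different extension
d_tab <- read.table(f_tab, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
d_txt <- read.table(f_txt, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
identical(d_tab, d_txt)   # the extension plays no role

# Check that the file exists under exactly this name before reading it
file.exists(f_tab)
```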

2. In the code

return(sprintf("ChiSq = %.1f, p = %.3f",archout$statistic,archout$p.value))

sprintf stands for printing the particular results (i.e., statistic and
p-value), right? Further, "ChiSq = %.1f, p = %.3f" means showing the values
to 1 and 3 decimal places respectively, right? Kindly correct me if I am
wrong in my interpretation.
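For reference, sprintf() does not print anything itself; it returns a formatted string, which archStatP() then returns. In the template, %.1f means a fixed-point number with 1 digit after the decimal point and %.3f one with 3 digits, and the values are rounded rather than truncated. A small sketch with made-up numbers:

```r
# sprintf() fills the % placeholders in order and returns a character string.
# %.1f: 1 digit after the decimal point; %.3f: 3 digits (values are rounded).
stat <- 12.345   # made-up statistic
pval <- 0.00123  # made-up p-value
s <- sprintf("ChiSq = %.1f, p = %.3f", stat, pval)
print(s)                  # "ChiSq = 12.3, p = 0.001"
sprintf("%.3f", 0.0004)   # "0.000": very small p-values round to zero
```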

3. While opening a text file:

sink("sp_8_5.txt")
for(row in 0:2) {
 for(column in 1:4)
  cat(spout[[column+row*4]],ifelse(column < 4,"\t","\n"))
}
sink()

3.1. What does sink indicate? I think sink here arranges the statistics and
p-values in the required 3*4 layout in the generated text file, right?
Please educate me.
3.2. The results are arranged in 3 rows and 4 columns in the text file. I
understand the loop for columns [i.e., for(column in 1:4)], but I don't
understand the loop for rows [i.e., for(row in 0:2)]. In particular, what is
the logic behind using 2 rather than 3 as the upper limit for 3 rows in
"for(row in 0:2)"?
3.3. In the code "cat(spout[[column+row*4]],ifelse(column < 4,"\t","\n"))",
what does cat do? What is the logic behind [column+row*4] and
ifelse(column < 4,"\t","\n")? This is my main question about the entire
code. Please help me understand this line.
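The indexing can be checked directly. sink("file") diverts printed output to that file until sink() is called with no arguments, and cat() writes its arguments as plain text. With row running over 0:2, the term row * 4 is an offset of 0, 4 or 8, so column + row * 4 visits elements 1 to 12 of spout in order, four per line; ifelse(column < 4, "\t", "\n") puts a TAB after the first three entries of each line and a newline after the fourth. A small sketch that records the indices instead of printing them:

```r
# Record the index that cat() would print at each step instead of printing it
idx <- c()
for (row in 0:2) {          # row * 4 gives an offset of 0, 4 or 8
  for (column in 1:4) {     # position within the output line
    idx <- c(idx, column + row * 4)
  }
}
idx    # 1 2 3 4 5 6 7 8 9 10 11 12

# Separator written after each entry: TAB after columns 1-3, newline after column 4
seps <- ifelse(1:4 < 4, "\t", "\n")
seps
```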


Along with the above questions about your code, I have one more: is it
possible to name each row and column? I ask because I have data from 80
countries, and each country has 5 columns of data, so the total number of
columns in my study is 400. When doing the ARCH test for each column, it is
easy to make a mistake when arranging the results in the text file. Thus, I
want to arrange the resulting statistics for the 5 columns (for instance A1,
A2, A3, A4, A5) of each country as shown below, which I think would avoid
typing mistakes when arranging the output in the text file. In other words,
each row will hold one country's results, arranged in 5 columns for the 5
variables, which makes it easy to identify the result for a particular
column of a particular country.


Country     A1    A2    A3    A4    A5
India       0.65  0.33  0.32  0.12  0.34
Israel      0.35  0.05  0.10  0.15  0.23
Australia   0.43  0.25  0.45  0.55  0.56

and so on.
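One way to get row and column names, sketched under the assumption that the 400 data columns are ordered country by country (India A1 to A5, then Israel A1 to A5, and so on) and that the per-column results are collected in that same order. The country subset and the numbers are placeholders, not real results.

```r
# Placeholder inputs: three countries and one value per data column,
# in the order India A1..A5, Israel A1..A5, Australia A1..A5
countries <- c("India", "Israel", "Australia")
vars <- paste0("A", 1:5)
results <- seq_len(length(countries) * length(vars)) / 100   # stand-in values

# byrow = TRUE fills each country's row with its 5 consecutive results
res <- matrix(results, nrow = length(countries), byrow = TRUE,
              dimnames = list(countries, vars))
res["Israel", "A2"]   # look up a single result by name

# TAB-separated file with row and column names; col.names = NA
# leaves an empty header cell above the row names
out_file <- tempfile(fileext = ".txt")
write.table(res, out_file, sep = "\t", quote = FALSE, col.names = NA)
```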


Thank you very much, Sir, for educating an R learner; I shall always be
grateful to you.



On Sat, May 9, 2020 at 8:58 AM Jim Lemon  wrote:

> Hi Subhamitra,
> I have washed the dishes and had a night's sleep, so I can now deal with
> your text munging problem. First, I'll reiterate the solution I sent:
>
> sp_8_5<-read.table("sp_8_5.tab",sep="\t",
>  header=TRUE,stringsAsFactors=FALSE)
> library(tseries)
> library(FinTS)
> # create a function that returns only the
> # statistic and p.value as a string
> archStatP<-function(x) {
>  archout<-ArchTest(x)
>  # I have truncated the values here
>  return(sprintf("ChiSq = %.1f, p = %.3f",archout$statistic,archout$p.value))
> }
> # using "lapply", run the test on each column
> spout<-lapply(sp_8_5[,2:13],archStatP)
>
> If you look at "spout" you will see that it is a list of 12 character
> strings. I arranged this as you seem to want the contents of a 3x4 table in
> a document. This is one way to do it, there are others.
>
> First, create a text table of the desired dimensions. I'll do it with
> loops as you seem to be familiar with them:
>
> # open a text file
> sink("sp_8_5.txt")
> for(row in 0:2) {
>  for(column in 1:4)
>   cat(spout[[column+row*4]],ifelse(column < 4,"\t","\n"))
> }
> sink()
>
> If you 

Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Jim Lemon
Hi Subhamitra,
I have washed the dishes and had a night's sleep, so I can now deal with
your text munging problem. First, I'll reiterate the solution I sent:

sp_8_5<-read.table("sp_8_5.tab",sep="\t",
 header=TRUE,stringsAsFactors=FALSE)
library(tseries)
library(FinTS)
# create a function that returns only the
# statistic and p.value as a string
archStatP<-function(x) {
 archout<-ArchTest(x)
 # I have truncated the values here
 return(sprintf("ChiSq = %.1f, p = %.3f",archout$statistic,archout$p.value))
}
# using "lapply", run the test on each column
spout<-lapply(sp_8_5[,2:13],archStatP)

If you look at "spout" you will see that it is a list of 12 character
strings. I arranged this as you seem to want the contents of a 3x4 table in
a document. This is one way to do it, there are others.

First, create a text table of the desired dimensions. I'll do it with loops
as you seem to be familiar with them:

# open a text file
sink("sp_8_5.txt")
for(row in 0:2) {
 for(column in 1:4)
  cat(spout[[column+row*4]],ifelse(column < 4,"\t","\n"))
}
sink()

If you open this file in a text editor (e.g. Notepad) you will see that it
contains 3 lines (rows), each with four TAB separated strings. Now to
import this into a word processing document. I don't have MS Word, so I'll
do it with Libre Office Writer and hope that the procedure is similar.

Move to where you want the table in your document
Select Insert|Text from file from the top menu
Select (highlight) the text you have imported
Select Convert|Text to table from the top menu

The highlighted area should become a table. I had to reduce the font size
from 12 to 10 to get the strings to fit into the cells.

There are probably a few more changes that you will want, so let me know if
you strike trouble.
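For comparison, a loop-free sketch that writes the same 3 x 4 layout: unlist() turns the list of 12 strings into a character vector, matrix(..., byrow = TRUE) places elements 1 to 4 in the first row, 5 to 8 in the second and so on, and each row is pasted together with TABs. The spout below is a stand-in built from made-up numbers.

```r
# Stand-in for the 12 strings returned by lapply(sp_8_5[, 2:13], archStatP)
spout <- as.list(sprintf("ChiSq = %.1f, p = %.3f", as.numeric(1:12), (1:12) / 100))

# byrow = TRUE: elements 1-4 form row 1, 5-8 row 2, 9-12 row 3
m <- matrix(unlist(spout), nrow = 3, byrow = TRUE)

out_file <- tempfile(fileext = ".txt")   # "sp_8_5.txt" in the thread
writeLines(apply(m, 1, paste, collapse = "\t"), out_file)
lines <- readLines(out_file)
lines
```

One small difference: cat() separates its arguments with a space, so the loop version also leaves a space before each TAB and newline; the pasted version does not.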

Jim


On Fri, May 8, 2020 at 11:28 PM Subhamitra Patra 
wrote:

> Dear Sir,
>
> Thank you very much for your wonderful suggestion for my problem. Your
> suggested code has excellently worked and successfully extracted the
> statistics and p-value in another R object.
>
> Concerning your last suggestion, I attempted to separate the strings with
> TAB characters in the "spout" object using different packages
> like dplyr, tidyr and qdap, and also the split and strsplit functions, so
> that I can export the statistics and p-values for each column to Excel, and
> later to an MS Word file, but got the errors below.
>
> Using the split function, I wrote the code as
> *string[] split = s.Split(spout, '\t')*
> where I got the following errors.
> Error: unexpected symbol in "string[] split"
> Error: unexpected symbol in "string[[]]split"
> Error in strsplit(row, "\t") : non-character argument
>
> Then I tried the strsplit function with the code below
> *strsplit(spout, split)*
> But got the error below:
> Error in as.character(split) :
>   cannot coerce type 'closure' to vector of type 'character'.
>
> Then I used the dplyr and tidyr packages and wrote the code below
> library(dplyr)
> library(tidyr)
> *separate(spout,value,into=c(“ChiSq”,”p”),sep=”,”)*
> *separate(spout,List of length 12,into=c(“ChiSq”,”p”),sep="\t")*
> But got these errors:
> Error: unexpected input in "separate(spout,value,into=c(“"
> Error: unexpected symbol in "separate(spout,List of"
>
> Then I used the qdap package with the code below
>
> *colsplit2df(spout,, c("ChiSq", "p"), ",")*
> *colsplit2df(spout,, c("ChiSq", "p"), sep = "\t")*
> But got the following errors
> Error in dataframe[, splitcol] : incorrect number of dimensions
> In addition: Warning message:
> In colsplit2df_helper(dataframe = dataframe, splitcol = splitcols[i],  :
>   dataframe object is not of the class data.frame
> Error in dataframe[, splitcol] : incorrect number of dimensions
> In addition: Warning message:
> In colsplit2df_helper(dataframe = dataframe, splitcol = splitcols[i],  :
>   dataframe object is not of the class data.frame
>
> Sir, please tell me where I am going wrong above in trying to separate
> the strings in the "spout" object.
>
> Thank you very much for your help.
>
>
> On Fri, May 8, 2020 at 4:47 PM Jim Lemon  wrote:
>
>> 1) In general, *apply functions return a list with the number of elements
>> equal to the number of columns or other elements of the input data. You can
>> assign that list as I have to "spout" in the first example.
>>
>> 2) spout<-list() assigns the name "spout" to an empty list. As we are
>> processing columns 2 to 12 of the input data, spout[[i-1]] assigns the
>> results to elements 1 to 11 of the list "spout". Just a low trick.
>>
>> 1a) Yes, you can create a "wrapper" function that will return only the
>> statistic and p.value.
>>
>> # create a function that returns only the
>> # statistic and p.value as a string
>> 

Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Subhamitra Patra
Dear Sir,

Thank you very much for your wonderful suggestion for my problem. Your
suggested code has excellently worked and successfully extracted the
statistics and p-value in another R object.

Concerning your last suggestion, I attempted to separate the strings with
TAB characters in the "spout" object using different packages
like dplyr, tidyr and qdap, and also the split and strsplit functions, so
that I can export the statistics and p-values for each column to Excel, and
later to an MS Word file, but got the errors below.

Using the split function, I wrote the code as
*string[] split = s.Split(spout, '\t')*
where I got the following errors.
Error: unexpected symbol in "string[] split"
Error: unexpected symbol in "string[[]]split"
Error in strsplit(row, "\t") : non-character argument

Then I tried the strsplit function with the code below
*strsplit(spout, split)*
But got the error below:
Error in as.character(split) :
  cannot coerce type 'closure' to vector of type 'character'.

Then I used the dplyr and tidyr packages and wrote the code below
library(dplyr)
library(tidyr)
*separate(spout,value,into=c(“ChiSq”,”p”),sep=”,”)*
*separate(spout,List of length 12,into=c(“ChiSq”,”p”),sep="\t")*
But got these errors:
Error: unexpected input in "separate(spout,value,into=c(“"
Error: unexpected symbol in "separate(spout,List of"

Then I used the qdap package with the code below

*colsplit2df(spout,, c("ChiSq", "p"), ",")*
*colsplit2df(spout,, c("ChiSq", "p"), sep = "\t")*
But got the following errors
Error in dataframe[, splitcol] : incorrect number of dimensions
In addition: Warning message:
In colsplit2df_helper(dataframe = dataframe, splitcol = splitcols[i],  :
  dataframe object is not of the class data.frame
Error in dataframe[, splitcol] : incorrect number of dimensions
In addition: Warning message:
In colsplit2df_helper(dataframe = dataframe, splitcol = splitcols[i],  :
  dataframe object is not of the class data.frame

Sir, please tell me where I am going wrong above in trying to separate
the strings in the "spout" object.
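The errors above have a few separate causes: string[] split = s.Split(...) is C# rather than R syntax; strsplit(spout, split) passes the base function split() where a character pattern is expected, and spout is a list rather than a character vector; tidyr::separate() expects a data frame, and the curly quotes are not valid R string delimiters. A minimal base-R sketch, using stand-in strings in the same format that archStatP() produces:

```r
# Stand-in for the list returned by lapply(sp_8_5[, 2:13], archStatP)
spout <- list(col2 = "ChiSq = 12.3, p = 0.001",
              col3 = "ChiSq = 4.5, p = 0.210")

s <- unlist(spout)                              # list -> named character vector
parts <- strsplit(s, ", p = ", fixed = TRUE)    # e.g. c("ChiSq = 12.3", "0.001")

chisq <- as.numeric(sub("ChiSq = ", "",
                        vapply(parts, `[`, character(1), 1), fixed = TRUE))
p <- as.numeric(vapply(parts, `[`, character(1), 2))

arch <- data.frame(column = names(s), ChiSq = chisq, p = p,
                   stringsAsFactors = FALSE)
arch
```

A simpler route may be to have the wrapper return numbers instead of a string, for example sapply(sp_8_5[, 2:13], function(x) { a <- ArchTest(x); c(ChiSq = unname(a$statistic), p = a$p.value) }), which should give a numeric matrix with one column per data column (untested here).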

Thank you very much for your help.


On Fri, May 8, 2020 at 4:47 PM Jim Lemon  wrote:


Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Jim Lemon
1) In general, *apply functions return a list with the number of elements
equal to the number of columns or other elements of the input data. You can
assign that list as I have to "spout" in the first example.

2) spout<-list() assigns the name "spout" to an empty list. As we are
processing columns 2 to 12 of the input data, spout[[i-1]] assigns the
results to elements 1 to 11 of the list "spout". Just a low trick.
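A small base-R sketch of points 1) and 2), using a made-up data frame `df` (not the posted data) and `mean()` in place of ArchTest: lapply() over selected columns and the explicit loop with the `i - 1` offset should fill the same list.

```r
# Made-up data frame: a label column followed by three numeric columns
df <- data.frame(label = c("a", "b", "c", "d"),
                 x1 = c(1, 2, 3, 4),
                 x2 = c(2, 4, 6, 8),
                 x3 = c(5, 5, 5, 5))

# lapply() returns one list element per column, named after the columns
res_apply <- lapply(df[, 2:4], mean)

# The loop version: data columns 2 to 4 go into list elements 1 to 3
res_loop <- list()
for (i in 2:4) res_loop[[i - 1]] <- mean(df[, i])
names(res_loop) <- names(df)[2:4]

str(res_apply)
```

With the sample posted in this thread (Year_Month followed by A1 to A12), the twelve data columns appear to be columns 2 to 13, so a range of 2:12 would leave out A12; `sp_8_5[, -1]` selects every column except the first.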

1a) Yes, you can create a "wrapper" function that will return only the
statistic and p.value.

# create a function that returns only the
# statistic and p.value as a string
archStatP<-function(x) {
 archout<-ArchTest(x)
 return(sprintf("ChiSq = %f, p = %f",archout$statistic,archout$p.value))
}
# using "lapply", run the test on each column
spout<-lapply(sp_8_5[,2:12],archStatP)

Note that I should have used "lapply". I didn't check the output carefully
enough.
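A variation on archStatP, sketched under assumptions: returning a named numeric vector instead of a formatted string lets sapply() build a matrix directly, so no string splitting is needed afterwards. Because FinTS may not be installed, the demonstration uses `stats::Box.test()` as a stand-in htest function on random data; `statP`, `test_fun` and `dat` are hypothetical names.

```r
# Hypothetical helper: run any htest-returning function and keep two numbers
statP <- function(x, test_fun, ...) {
  out <- test_fun(x, ...)
  c(statistic = unname(out$statistic), p.value = unname(out$p.value))
}

# Made-up data; Box.test() stands in for ArchTest() so this runs without FinTS
set.seed(1)
dat <- data.frame(A1 = rnorm(50), A2 = rnorm(50), A3 = rnorm(50))

# sapply() simplifies the length-2 results into a 2 x 3 matrix (one column per variable)
res <- sapply(dat, statP, test_fun = Box.test, lag = 1)

# Transpose to one row per variable and keep the variable names as a column
res_df <- data.frame(column = colnames(res), t(res), row.names = NULL)
res_df
```

With FinTS loaded, the corresponding call would presumably be `sapply(sp_8_5[, -1], statP, test_fun = ArchTest, lags = 1, demean = FALSE)` (not run here).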

2a) Now you only have to separate the strings in "spout" with TAB
characters and import the result into Excel. I have to wash the dishes, so
you're on your own.

Jim

On Fri, May 8, 2020 at 8:26 PM Subhamitra Patra 
wrote:


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Subhamitra Patra
Dear Sir,

Thank you very much for such an excellent solution to my problem. I had been
trying the sapply function for the last few days but was unable to write it
properly. Now I understand my mistake in using sapply in the code. I have
two questions about this, which I would like to ask here for my own learning.

1. When using sapply to apply one method across the columns of a data frame,
one needs to define a list for the output after using sapply, so that the
test results for each column are stored consistently in an output object,
right?

2. In the spout<-list() loop, what does spout[[i-1]] indicate?

Sir, I would also like to ask about one more possibility related to the
problem above, to learn more about R programming.

After running your suggested code, the results for each column are stored
in the spout object. From these, I need only the statistic and p-value for
each column. So, my questions are:

1. Is there any way to extract only two values (i.e., the statistic and the
p-value) for each column stored in the spout object, and save these two
values in another R data frame?
 or
2. Is it possible to export the statistic and p-value calculated for each
column directly to a Word file as a table (with 4 columns and 3 rows)? In
particular, can both the statistic and p-value results be written to an MS
Word table with the results for columns A1, A2, A3, A4 in the 1st row, A5,
A6, A7, A8 in the 2nd row, and A9, A10, A11, A12 in the 3rd row?
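One way to get that 3-by-4 layout in base R, as a sketch with placeholder numbers (not real ARCH results): fill a character matrix with `byrow = TRUE` and write it as tab-separated text, which can be pasted into Excel or Word. Writing a native .docx table would need an add-on package, which is not shown here; the file name `arch_table.txt` is made up.

```r
# Placeholder results for columns A1..A12 (made-up numbers, not real output)
labels <- paste0("A", 1:12)
chisq  <- as.numeric(1:12)
pval   <- (1:12) / 100

# One text cell per column, e.g. "A1: ChiSq = 1.00, p = 0.01"
cells <- sprintf("%s: ChiSq = %.2f, p = %.2f", labels, chisq, pval)

# byrow = TRUE puts A1-A4 in row 1, A5-A8 in row 2 and A9-A12 in row 3
tab <- matrix(cells, nrow = 3, byrow = TRUE)

# Tab-separated text file without row or column names
outfile <- file.path(tempdir(), "arch_table.txt")
write.table(tab, outfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```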


As before, your suggestion will definitely help me learn advanced R.

Thank you very much for your help.


On Fri, May 8, 2020 at 2:37 PM Jim Lemon  wrote:


-- 
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*



Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Jim Lemon
Hi Subhamitra,
This isn't too hard:

# read in the sample data that was
# saved in the file "sp_8_5.tab"
sp_8_5<-read.table("sp_8_5.tab",sep="\t",
 header=TRUE,stringsAsFactors=FALSE)
library(tseries)
library(FinTS)
# using "sapply", run the test on each column
spout<-sapply(sp_8_5[,2:12],ArchTest)

The list "spout" contains the test results. If you really want to use a
loop:

spout<-list()
for(i in 2:12) spout[[i-1]]<-ArchTest(sp_8_5[,i])

Jim


On Fri, May 8, 2020 at 5:27 PM Subhamitra Patra 
wrote:



Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-08 Thread Subhamitra Patra
Dear Sir,

Below I am pasting part of my sample data, which has 12 columns, and I want
to calculate the ARCH test for these 12 columns using a loop.

Please help me in this regard. Thank you very much for your help.

Year_Month A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12
94-Jan 0.051197 7.05E-05 0.058806 -0.00818 0.538001 0.009766 0.025787 0.035478 0.056663 0.014665 0.23132 0.008644
94-Feb 0.06424 -0.01086 0.049823 -0.04989 0.557945 0.00974 0.027757 0.021494 0.016947 0.014584 0.229776 -0.02317
94-Mar 0.056168 -0.00626 0.061555 -0.03427 0.524705 0.009694 0.027632 -0.00656 0.008358 0.014499 0.190421 0.003026
94-Apr 0.129051 0.043813 0.060453 0.017469 0.545895 0.009615 0.01932 0.01171 0.016003 0.014412 0.140396 0.017556
94-May 0.142182 -0.03848 0.059938 0.015054 0.525178 0.009479 0.027741 0.000605 0.0185 0.014327 0.093228 -0.03989
94-Jun 0.152981 -0.03227 0.071485 -0.01025 0.363882 0.009323 0.030762 0.013005 0.03634 0.014239 0.035625 -0.01355
94-Jul 0.16216 0.046374 0.073669 0.020508 0.3405 0.00926 0.044822 -0.00954 0.042422 0.014154 0.037954 0.00097
94-Aug 0.124355 -0.06952 0.091429 0.015932 0.38519 0.009269 0.071701 0.000623 0.051954 0.01407 0.055852 0.007522
94-Sep 0.059405 0.057487 0.086265 -0.01169 0.401963 0.009171 0.086685 -0.01058 0.054404 0.013986 0.07285 0.002022
94-Oct 0.0594 0.021166 0.080765 0.006442 0.438915 0.009041 0.070351 0.006776 0.033622 0.013906 0.068344 -0.01532
94-Nov 0.072064 -0.03104 0.079567 -0.03295 0.521214 0.008986 0.066044 -0.01853 0.035202 0.013826 0.067093 -0.02278
94-Dec 0.068208 0.01024 0.069919 -0.01507 0.461059 0.008856 0.050985 0.009514 0.008638 0.013744 0.040348 0.00423
95-Jan 0.079074 -0.00153 0.070458 -0.04205 0.506227 0.00883 0.046561 -0.03907 0.015322 0.013662 0.034103 -0.00888
95-Feb 0.074231 -0.05728 0.062612 0.035992 0.487126 0.008815 0.052816 -0.01344 0.06728 0.013583 0.063281 -0.0054
95-Mar 0.065212 0.056084 0.095783 0.006825 0.476386 0.008774 0.047498 0.015178 0.040273 0.013499 0.060805 0.006099
95-Apr 0.081238 0.024283 0.098827 0.005791 0.432363 0.008748 0.06047 0.011613 0.013068 0.013417 0.058321 -0.01281
95-May 0.093726 0.008623 0.076698 0.027274 0.321103 0.008679 0.037962 0.00115 0.013647 0.013339 0.066724 -0.00271
95-Jun 0.113998 0.005484 0.073392 -0.00252 0.38195 0.008684 0.042794 -0.01133 0.054244 0.013261 0.055655 0.015941
95-Jul 0.097076 0.008842 0.090776 0.006378 0.622055 0.008728 0.036476 0.016159 0.055301 0.013188 0.057034 -0.0036
95-Aug 0.075751 0.002437 0.094687 -0.00398 0.637972 0.008839 0.052791 -0.00819 0.327487 0.013114 0.067734 0.00565
95-Sep 0.074714 0.001279 0.091216 0.013169 0.656225 0.008956 0.086582 -0.0013 0.690172 0.01304 0.059523 0.028675
95-Oct 0.048771 -0.01775 0.098525 0.003447 0.68386 0.009071 0.091073 -0.01597 0.640065 0.012967 0.030469 0.005139
95-Nov 0.069776 -0.00164 0.077763 0.00158 0.559675 0.008808 0.094129 0.01832 0.726821 0.012893 0.030908 -0.00955
95-Dec 0.135469 0.001886 0.074658 0.01263 0.563716 0.00 0.113828 0.011372 0.737532 0.012822 0.224459 -0.00186
96-Jan 0.175166 0.00068 0.071721 0.030701 0.534648 0.009114 0.086481 0.016228 0.687297 0.013112 0.349764 0.000727
96-Feb 0.167327 0.013771 0.055352 -0.03142 0.556339 0.009119 0.080475 -0.00691 0.696365 0.013077 0.342758 -4.90E-05
96-Mar 0.158759 -0.02094 0.042232 -0.00331 0.532126 0.009041 0.077231 0.009009 0.579396 0.012271 0.342196 -0.002
96-Apr 0.116956 0.02624 0.051037 -0.01496 0.575416 0.009123 0.079496 0.017197 0.557262 0.012094 0.299566 0.022657
96-May 0.109049 -0.02648 0.059972 0.00658 0.616302 0.009086 0.095365 -0.01682 0.521757 0.011933 0.074309 0.021621
96-Jun 0.102001 2.71E-05 0.060901 -0.00372 0.593491 0.009213 0.095232 0.001363 0.523983 0.011757 0.070504 -0.00507
96-Jul 0.079941 -0.02107 0.046018 -0.00708 0.562537 0.009136 0.094451 -0.01132 0.534417 0.011413 0.073706 -0.00615
96-Aug 0.109775 0.005178 0.051713 0.007174 0.54939 0.009008 0.088945 -0.01136 0.445843 0.010925 0.066559 0.009937
96-Sep 0.089581 -0.0005 0.049835 0.016873 0.54664 0.008887 0.082659 0.011384 0.435423 0.010697 0.091269 0.00687
96-Oct 0.07429 -0.01499 0.063584 0.008829 0.485504 0.008965 0.072986 -0.01695 0.54066 0.010649 0.325364 0.012261
96-Nov 0.060441 0.021057 0.100844 0.018152 0.415023 0.009033 0.072366 -0.00222 0.646444 0.010653 0.323194 0.01409
96-Dec 0.061482 0.0218 0.142038 -6.42E-06 0.492536 0.008947 0.081333 -0.02433 0.661019 0.010555 0.367988 -0.00023
97-Jan 0.053437 0.025314 0.137257 -0.00659 0.578904 0.008841 0.074613 -0.0068 0.628154 0.010609 0.355763 0.00581
97-Feb 0.080489 -0.01411 0.123644 0.009692 0.571364 0.008794 0.07673 0.005832 0.549697 0.010781 0.21588 0.070824
97-Mar 0.097621 0.00073 0.115192 -0.04503 0.639719 0.008686 0.065906 -0.01063 0.543819 0.010442 0.129773 0.004692
97-Apr 0.112502 -0.00052 0.064499 0.007382 0.648139 0.008674 0.038621 0.006408 0.591661 0.010283 0.079461 0.009395
97-May 0.109789 0.028968 0.079382 0.032543 0.530901 0.008884 0.029301 0.039566 0.492504 0.01004 0.042617 -0.00151
97-Jun 0.087521 -0.03031 0.037389 0.001738 0.500643 0.00886 

Re: [R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-07 Thread Jim Lemon
Hi Subhamitra,
For some reason, your data didn't make it through. Maybe you tried to
send an .xls or .xlsx file. If so, export it as CSV or if it's not too
big, just paste the text into your email.
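For example, a few rows pasted as plain text can be read back with `read.table(text = ...)`, and `dput()` prints R code that rebuilds an object exactly; both are base R. The rows below are copied from the sample Subhamitra posted in this thread (first two data columns only).

```r
# A few rows pasted as text (Year_Month, A1 and A2 from the posted sample)
txt <- "Year_Month A1 A2
94-Jan 0.051197 7.05E-05
94-Feb 0.06424 -0.01086"

sp <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Prints code such as structure(list(Year_Month = ...)) that recreates `sp`
dput(sp)
```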

Jim

On Thu, May 7, 2020 at 10:30 PM Subhamitra Patra
 wrote:


[R] [R ] Writing loop to estimate ARCH test for a multiple columns of a data frame?

2020-05-07 Thread Subhamitra Patra
Dear R-users,

I want to run the ARCH test on multiple columns (i.e., columns 2:21) of my
data. For this purpose, I want to write a loop that calculates the ARCH test
result for each column of the data frame. I tried a for loop and the lapply
function, but was unable to write a loop that computes the ARCH test for each
column (i.e., columns 2:21) of my data frame.

Below is my ARCH test code, which I want to run on multiple columns of the
data frame in a loop.

library(tseries)

library(FinTS)

ArchTest (A, lags=1, demean = FALSE)

Here, A is the vector for which the ARCH test is calculated. I want to write
a loop so that ArchTest is calculated for each column of my data frame. From
the ARCH test result, I need only the calculated chi-square value and its
p-value for each column, stored in another matrix or object that I can write
to an output file.
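ArchTest() performs Engle's LM test: regress the squared series on its own lagged squares and compare n * R^2 with a chi-square distribution with `lags` degrees of freedom. Below is a base-R sketch of that idea, applied with sapply() to every column except the first. The function name `arch_lm` and the random data are hypothetical, and this is not a copy of FinTS::ArchTest.

```r
# Hypothetical base-R sketch of Engle's ARCH LM test (not FinTS::ArchTest itself)
arch_lm <- function(x, lags = 1, demean = FALSE) {
  x <- as.numeric(x)
  if (demean) x <- x - mean(x)
  # Column 1 holds x_t^2; columns 2..(lags + 1) hold x_{t-1}^2, ..., x_{t-lags}^2
  m <- embed(x^2, lags + 1)
  fit <- lm(m[, 1] ~ m[, -1])
  stat <- nrow(m) * summary(fit)$r.squared   # LM statistic: n * R^2
  c(ChiSq = stat, p = pchisq(stat, df = lags, lower.tail = FALSE))
}

# Made-up data laid out like the question: a date column, then numeric columns
set.seed(42)
dat <- data.frame(Year_Month = paste0("m", 1:60),
                  A1 = rnorm(60), A2 = rnorm(60), A3 = rnorm(60))

# One row per data column, holding the chi-square statistic and its p-value
res <- t(sapply(dat[, -1], arch_lm, lags = 1))
res
```

The matrix `res` can then be written out, for example with `write.csv(res, "arch_results.csv")` (file name made up).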

For your convenience, I have attached my sample data below.

Please help me; I shall always be grateful to you.

Thank you.

-- 
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*



Re: [R] Problem with loop in folders

2020-04-25 Thread Fredrik Karlsson
Hi,

I am sorry if I am misunderstanding what you are trying to do here, but can
you simplify it this way?
(Unfortunately, this is untested, since I don't have a suitable set of files
and a directory structure to test against.)

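library(foreign)  # assumption: read.dbf() from the foreign package

# all .dbf files in the working directory and its subfolders
dbffiles <- list.files(pattern = "\\.dbf$", recursive = TRUE)

# same paths, with the extension changed to .csv
csvfiles <- sub("\\.dbf$", ".csv", dbffiles)

for(i in seq_along(csvfiles)){

df <- read.dbf(dbffiles[i])

write.csv( df, file =csvfiles[i])

}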

or something along these lines?

Fredrik

On Fri, Apr 24, 2020 at 4:08 PM Shubhasmita Sahani <
shubhasmita.sah...@gmail.com> wrote:



-- 
"Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it."



Re: [R] Problem with loop in folders

2020-04-24 Thread Sarah Goslee
I suspect much if not all of your trouble would be eliminated by using
file.path() instead of paste0().

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/file.path

(Also check your file name - you probably want a . between name and
csv, so using paste(name, "csv", sep = ".") would create a more usual
file name.)

It's always a good idea to work thru your loop by hand once, and look
at all the intermediate steps. Often that quickly shows you where you
went wrong.
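A quick check of both points, using the folder and file name from the original post. Note also that the original code quotes the variable names, so `"csvpath"` is pasted as literal text rather than the folder stored in `csvpath`.

```r
# Values from the original post; csvpath is shown without its trailing "/"
csvpath <- "C:/plan/Learning/dummydata/csv"
name <- "agra_ward"

# Quoting "csvpath" pastes the literal word, not the variable, and adds no dot
bad <- paste0("csvpath", name, "csv")

# file.path() inserts the separator; paste(..., sep = ".") adds the extension dot
good <- file.path(csvpath, paste(name, "csv", sep = "."))
c(bad, good)
```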

Sarah

On Fri, Apr 24, 2020 at 10:08 AM Shubhasmita Sahani
 wrote:
-- 
Sarah Goslee (she/her)
http://www.numberwright.com



Re: [R] Problem with loop in folders

2020-04-24 Thread Bert Gunter
What package is "read.dbf" from? What error message/behavior did you see?
Should it be:
 path<-setwd(paste0("inpath/",folder)) ## did you forget the "/" ?
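Two further things are worth checking in that line. setwd() returns the directory that was current *before* the change, so `path <- setwd(...)` stores the old folder and `list.files(path, ...)` then searches there. Also, the `pattern` argument of list.files() is a regular expression, not a wildcard; `utils::glob2rx()` converts a wildcard such as `"*ward.dbf"` into one. (One `read.dbf()` lives in the foreign package.) A small demonstration using a temporary directory:

```r
old <- getwd()

# setwd() changes directory but returns the PREVIOUS one (invisibly)
prev <- setwd(tempdir())
new  <- getwd()
setwd(old)   # restore the original working directory

# Convert the wildcard pattern from the post into a regular expression
rx <- glob2rx("*ward.dbf")
matches <- grepl(rx, c("agra_ward.dbf", "agra_ward.dbf.bak", "agra.csv"))
```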

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Apr 24, 2020 at 7:08 AM Shubhasmita Sahani
 wrote:


[R] Problem with loop in folders

2020-04-24 Thread Shubhasmita Sahani
Hi Everyone,
I am trying to loop through the folders in the main working directory,
read the dbf file in each folder into a data frame, and then save the data
frame as a CSV file in another folder.
For this, I have written the code below, but I am not able to figure out
where it is going wrong. Any ideas will be of great support.


 setwd(choose.dir())
 csvpath= "C:/plan/Learning/dummydata/csv/"
 a<-list.dirs()
 inpath<-"C:/workplan/Q2/Project1"

 for (folder in list.dirs()[-1]) {

   path<-setwd(paste0("inpath",folder))
   dbf<-list.files(path, pattern = "*ward.dbf")
   df <- read.dbf(dbf)
   dbfname<-basename(dbf)
   name<-file_path_sans_ext(dbfname)  # get the name of the file like agra_ward
   write.csv( df, file = paste0("csvpath",name,"csv"))
   print(path)

 }





-- 
Thanks & Regards,
Shubhasmita Sahani



Re: [R] [FORGED] Re: Loop With Dates

2019-09-23 Thread Rolf Turner

On 22/09/19 11:19 PM, Richard O'Keefe wrote:

> Whenever you want a vector that counts something,
> cumsum of a logical vector is a good thing to try.

Fortune nomination.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
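A small illustration of Richard O'Keefe's advice, with made-up dates (not the data from the original "Loop With Dates" question): a new group starts whenever the gap to the previous date exceeds one day, and cumsum() of that logical vector numbers the groups without a loop.

```r
# Made-up dates
dates <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-05",
                   "2019-01-06", "2019-01-10"))

# TRUE marks the first date and every date more than 1 day after its predecessor
new_group <- c(TRUE, as.numeric(diff(dates)) > 1)

# Running count of TRUEs = group number for each date
group <- cumsum(new_group)
group
# [1] 1 1 2 2 3
```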



Re: [R] Help with loop for column means into new column by a subset Factor w/131 levels

2019-04-30 Thread Bill Poling
I ran this routine but I was thinking there must be a more elegant way of doing 
this.


#https://community.rstudio.com/t/how-to-average-mean-variables-in-r-based-on-the-level-of-another-variable-and-save-this-as-a-new-variable/8764/8

hcd2tmp2_summmary <- hcd2tmp2 %>%
  select(.) %>%
  group_by(Procedure_Code1) %>%
  summarize(average = mean(Allowed_Amt))
# A tibble: 131 x 2
#    Procedure_Code1 average
#    <fct>             <dbl>
#  1 A9606            57785.
#  2 J0129             5420.
#  3 J0178             4700.
#  4 J0180            13392.
#  5 J0202            56328.
#  6 J0256            17366.
#  7 J0257             7563.
#  8 J0485             2450.
#  9 J0490             6398.
# 10 J0585             4492.
# ... with 121 more rows

hcd2tmp2 <- hcd2tmp %>%
  group_by(Procedure_Code1) %>%
  summarise(Avg_Allowed_Amt = mean(Allowed_Amt))

view(hcd2tmp2)


hcd2tmp3 <- hcd2tmp %>%
  group_by(Procedure_Code1) %>%
  summarise(Avg_AllowByLimit = mean(AllowByLimit))

view(hcd2tmp3)


hcd2tmp4 <- hcd2tmp %>%
  group_by(Procedure_Code1) %>%
  summarise(Avg_UnitsByDose = mean(UnitsByDose))

view(hcd2tmp4)

hcd2tmp5 <- hcd2tmp %>%
  group_by(Procedure_Code1) %>%
  summarise(Avg_LimitByUnits = mean(LimitByUnits))

view(hcd2tmp5)

#Joins


hcd2tmp <- left_join(hcd2tmp2, hcd2tmp, by = 
c("Procedure_Code1"="Procedure_Code1"))
hcd2tmp <- left_join(hcd2tmp3, hcd2tmp, by = 
c("Procedure_Code1"="Procedure_Code1"))
hcd2tmp <- left_join(hcd2tmp4, hcd2tmp, by = 
c("Procedure_Code1"="Procedure_Code1"))
hcd2tmp <- left_join(hcd2tmp5, hcd2tmp, by = 
c("Procedure_Code1"="Procedure_Code1"))

view(hcd2tmp)

hcd2tmp$Avg_LimitByUnits <- round(hcd2tmp$Avg_LimitByUnits, digits = 2)
hcd2tmp$Avg_Allowed_Amt <- round(hcd2tmp$Avg_Allowed_Amt, digits = 2)
hcd2tmp$Avg_AllowByLimit <- round(hcd2tmp$Avg_AllowByLimit, digits = 2)
hcd2tmp$Avg_UnitsByDose <- round(hcd2tmp$Avg_UnitsByDose, digits = 2)

view(hcd2tmp)

#Over under columns
hcd2tmp$AllowByLimitFlag <- hcd2tmp$AllowByLimit > hcd2tmp$Avg_AllowByLimit
hcd2tmp$LimitByUnitsFlag <- hcd2tmp$LimitByUnits > hcd2tmp$Avg_LimitByUnits
hcd2tmp$Allowed_AmtFlag  <- hcd2tmp$Allowed_Amt  > hcd2tmp$Avg_Allowed_Amt
hcd2tmp$UnitsByDoseFlag  <- hcd2tmp$UnitsByDose  > hcd2tmp$Avg_UnitsByDose

view(hcd2tmp)
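A possibly more compact route, sketched with a small made-up data frame (`hcd` is a hypothetical name): base R's ave() returns the group mean on every row (its default FUN is mean), so the averages can be added as columns without summarising and joining. In dplyr, using mutate() after group_by() instead of summarise() has the same row-preserving effect.

```r
# Made-up rows using the column names from the post
hcd <- data.frame(
  Procedure_Code1 = factor(c("J1745", "J9299", "J1745", "J9299", "J1745")),
  Allowed_Amt  = c(100, 50, 200, 70, 300),
  LimitByUnits = c(10, 20, 30, 40, 50),
  AllowByLimit = c(1, 2, 3, 4, 5),
  UnitsByDose  = c(10, 1, 20, 3, 30)
)

cols <- c("Allowed_Amt", "LimitByUnits", "AllowByLimit", "UnitsByDose")
for (col in cols) {
  avg_col <- paste0("Avg_", col)
  # Mean of `col` within each Procedure_Code1, repeated on every row of that group
  hcd[[avg_col]] <- round(ave(hcd[[col]], hcd$Procedure_Code1), digits = 2)
  # Flag rows above their group average, as in the original code
  hcd[[paste0(col, "Flag")]] <- hcd[[col]] > hcd[[avg_col]]
}
```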


-Original Message-
From: Bill Poling
Sent: Tuesday, April 30, 2019 12:51 PM
To: r-help (r-help@r-project.org) 
Cc: Bill Poling 
Subject: Help with loop for column means into new column by a subset Factor w/131 levels

Good afternoon.

#RStudio Version 1.1.456
sessionInfo()
#R version 3.5.3 (2019-03-11)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows >= 8 x64 (build 9200)



#I have a DF of 8 columns and 14025 rows

str(hcd2tmp2)

# 'data.frame':14025 obs. of  8 variables:
# $ Submitted_Charge: num  21021 15360 40561 29495 7904 ...
# $ Allowed_Amt : num  18393 6254 40561 29495 7904 ...
# $ Submitted_Units : num  60 240 420 45 120 215 215 15 57 2 ...
# $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117 125 24 85 85 90 86 25 ...
# $ AllowByLimit: num  4.268 0.949 7.913 6.124 3.524 ...
# $ UnitsByDose : num  600 240 420 450 120 215 215 750 570 500 ...
# $ LimitByUnits: num  4310 6591 5126 4816 2243 ...
# $ HCPCSCodeDose1  : num  10 1 1 10 1 1 1 50 10 250 ...

#I would like to create four additional columns that are the mean of four current columns in the DF.
#Current columns
#Allowed_Amt
#LimitByUnits
#AllowByLimit
#UnitsByDose

#The goal is to be able to identify rows where (for instance) Allowed_Amt is greater than the average (aka outliers).

#The trick Is I want the means of those columns based on a Factor value
#The Factor is:
#Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"

#So each of my four new columns will have 131 distinct values based on the mean for the specific Procedure_Code1 grouping

#In SQL it would look something like this:

#SELECT *,
# NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
# NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
# NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
# NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
#INTO NewTable
#FROM Oldtable

#Here are some sample data

head(hcd2tmp2, n=40)
#    Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1 AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
# 1          21020.70    18393.12              60           J1745    4.2679810         600      4309.56             10
# 2          15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
# 3          40561.32    40561.32             420           J9306    7.9133539         420      5125.68              1
# 4          29495.25    29495.25              45           J9355    6.1244417         450      4815.99             10
# 5           7904.30     7904.30             120           J0897    3.5243000         120      2242.80              1
# 6          15331.95    10614.31

[R] Help with loop for column means into new column by a subset Factor w/131 levels

2019-04-30 Thread Bill Poling
Good afternoon.

#RStudio Version 1.1.456
sessionInfo()
#R version 3.5.3 (2019-03-11)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows >= 8 x64 (build 9200)



#I have a DF of 8 columns and 14025 rows

str(hcd2tmp2)

# 'data.frame':14025 obs. of  8 variables:
# $ Submitted_Charge: num  21021 15360 40561 29495 7904 ...
# $ Allowed_Amt : num  18393 6254 40561 29495 7904 ...
# $ Submitted_Units : num  60 240 420 45 120 215 215 15 57 2 ...
# $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117 125 24 85 85 90 86 25 ...
# $ AllowByLimit: num  4.268 0.949 7.913 6.124 3.524 ...
# $ UnitsByDose : num  600 240 420 450 120 215 215 750 570 500 ...
# $ LimitByUnits: num  4310 6591 5126 4816 2243 ...
# $ HCPCSCodeDose1  : num  10 1 1 10 1 1 1 50 10 250 ...

#I would like to create four additional columns that are the means of four current columns in the DF.
#Current columns
#Allowed_Amt
#LimitByUnits
#AllowByLimit
#UnitsByDose

#The goal is to be able to identify rows where (for instance) Allowed_Amt is greater than the average for its group (aka outliers).

#The trick is that I want the means of those columns computed within each level of a factor.
#The Factor is:
#Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"

#So each of my four new columns will have 131 distinct values, one mean for each Procedure_Code1 group.

#In SQL it would look something like this:

#SELECT *,
# NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
# NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
# NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
# NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
#INTO NewTable
#FROM Oldtable

#Here are some sample data

head(hcd2tmp2, n=40)
#    Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1 AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
# 1          21020.70    18393.12              60           J1745    4.2679810         600      4309.56             10
# 2          15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
# 3          40561.32    40561.32             420           J9306    7.9133539         420      5125.68              1
# 4          29495.25    29495.25              45           J9355    6.1244417         450      4815.99             10
# 5           7904.30     7904.30             120           J0897    3.5243000         120      2242.80              1
# 6          15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
# 7          15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
# 8            461.90        0.00              15           J9045    0.0000000         750        46.38             50
# 9          27340.96    15092.21              57           J9035    3.2600227         570      4629.48             10
# 10           768.00      576.00               2           J1190    1.3617343         500       422.99            250
# 11           101.00       38.38               5           J2250   59.9687500           5         0.64              1
# 12         17458.40        0.00             200           J9033    0.0000000         200      5990.00              1
# 13          7885.10     7569.70               1           J1745  105.3835445          10        71.83             10
# 14          2015.00     1155.78               4           J2785    5.0051100           0       230.92              0
# 15           443.72      443.72              12           J9045   11.9601078         600        37.10             50
# 16        113750.00   113750.00             600           J2350    3.3025003         600     34443.60              1
# 17          3582.85     3582.85              10           J2469   30.5573561         250       117.25             25
# 18          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
# 19          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
# 20         39664.09        0.00              74           J9355    0.0000000         740      7919.63             10
# 21           166.71      102.53               9           J9045    3.6841538         450        27.83             50
# 22         13823.61     9676.53               1           J2505    2.0785247           6      4655.48              6
# 23         90954.00    26436.53             360           J1786    1.7443775        3600     15155.28             10
# 24          4800.00     3494.40             800           J3262    0.8861838         800      3943.20              1
# 25           216.00      105.84               4           J0696   42.3360000        1000         2.50            250
# 26          5300.00     4770.00               1           J0178    4.9677151           1
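The SQL window function in the question has a close base-R counterpart in ave(), which computes a statistic within each level of a grouping factor and returns it aligned with the original rows. A minimal sketch, assuming a hypothetical five-row subset of the sample data with only three of its columns; the Avg_* and Allowed_Above_Avg column names are illustrative:

```r
# Toy data: five rows taken from the sample above (hypothetical subset).
hcd <- data.frame(
  Procedure_Code1 = factor(c("J1745", "J9045", "J9045", "J1745", "J9045")),
  Allowed_Amt     = c(18393.12, 0.00, 443.72, 7569.70, 102.53),
  UnitsByDose     = c(600, 750, 600, 10, 450)
)

# ave() applies FUN within each level of the grouping factor and returns a
# vector as long as its input, much like AVG(...) OVER (PARTITION BY ...).
hcd$Avg_Allowed_Amt <- ave(hcd$Allowed_Amt, hcd$Procedure_Code1, FUN = mean)
hcd$Avg_UnitsByDose <- ave(hcd$UnitsByDose, hcd$Procedure_Code1, FUN = mean)

# Flag rows that sit above their own group's mean.
hcd$Allowed_Above_Avg <- hcd$Allowed_Amt > hcd$Avg_Allowed_Amt
print(hcd)
```

On the full data frame, one ave() call per column (Allowed_Amt, LimitByUnits, AllowByLimit, UnitsByDose), grouped by Procedure_Code1, should produce the four new columns without an explicit loop over the 131 levels.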

Re: [R] Recreate for loop without using for loop

2019-02-10 Thread Jeff Newmiller
There is a no-homework policy stated in the Posting Guide.

On February 10, 2019 8:59:41 AM PST, Rima El-zein  wrote:
>Hi.
>
>
>
>Can someone please help me recreate this code without using a for loop?
>Idk if I'm supposed to use a map function or something else.
>
>
>
>qprob <- function(pp) {
>  qq <- 1 - pp
>  stotal <- 0.0
>  for (i in 1:length(pp))
>    stotal <- stotal + pp[i] * prod(qq[-i])
>  return(stotal)
>}
>
>Best regards,
>Rima
>
>
>Sent from Mail for Windows 10
>
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.

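For the record, qprob() computes what appears to be the probability that exactly one of several independent events occurs, and the loop can be replaced by vapply(), which returns prod(qq[-i]) for every i, followed by one vectorised weighted sum. A sketch assuming qq was meant to be 1 - pp; the names qprob_loop, qprob_vec and the vector p are illustrative, and purrr::map_dbl() could stand in for vapply():

```r
# Loop version from the question, taking qq as 1 - pp (an assumption).
qprob_loop <- function(pp) {
  qq <- 1 - pp
  stotal <- 0
  for (i in seq_along(pp)) stotal <- stotal + pp[i] * prod(qq[-i])
  stotal
}

# Loop-free version: vapply() gives prod(qq[-i]) for each i as a numeric
# vector, so the weighted sum becomes a single vectorised expression.
qprob_vec <- function(pp) {
  qq <- 1 - pp
  others <- vapply(seq_along(pp), function(i) prod(qq[-i]), numeric(1))
  sum(pp * others)
}

p <- c(0.2, 0.5, 0.7)  # hypothetical probabilities
qprob_loop(p)
qprob_vec(p)
```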


Re: [R] From for loop to lapply?

2018-10-01 Thread Ek Esawi
Thank you Don. It works
EK
On Mon, Oct 1, 2018 at 6:39 PM MacQueen, Don  wrote:
>
> Try
>
> A <- lapply(file.names, function(fn) extract_tables(fn))
>
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> Lab cell 925-724-7509
>
>
>
> On 10/1/18, 3:32 PM, "R-help on behalf of Ek Esawi" 
>  wrote:
>
> Hi All—
>
> I am using Tabulizer to extract tables from PDF files. Tabulizer
> creates a list of matrices for each set of tables in each document.
> My code, below, works well. Then I thought I would use lapply instead
> of a for loop, since it is a little faster and more compact,
> but I kept getting the error message below.
>
> Any help is greatly appreciated
>
> EK
>
> install.packages("tabulizer")
> install.packages("stringr")
> library(stringi)
> library(tabulizer)
> path = "C:/Users/name/Documents/TextMining/"
> file.names <- dir(path, pattern =".PDF")
>
> for(i in 1:length(file.names)){
>   print(file.names[i])
>   A[[i]] <- extract_tables(file.names[i])
> }
>
>
> lapply(file.names, function(i) A[[i]] <- extract_tables(file.names[i]))
>
>  Error in normalizePath(path.expand(path), winslash, mustWork) :
>   path[1]="NA": The system cannot find the file specified
>
>
>



Re: [R] From for loop to lapply?

2018-10-01 Thread MacQueen, Don via R-help
Try

A <- lapply(file.names, function(fn) extract_tables(fn))



On 10/1/18, 3:32 PM, "R-help on behalf of Ek Esawi" 
 wrote:

Hi All—

I am using Tabulizer to extract tables from PDF files. Tabulizer
creates a list of matrices for each set of tables in each document.
My code, below, works well. Then I thought I would use lapply instead
of a for loop, since it is a little faster and more compact,
but I kept getting the error message below.

Any help is greatly appreciated

EK

install.packages("tabulizer")
install.packages("stringr")
library(stringi)
library(tabulizer)
path = "C:/Users/name/Documents/TextMining/"
file.names <- dir(path, pattern =".PDF")

for(i in 1:length(file.names)){
  print(file.names[i])
  A[[i]] <- extract_tables(file.names[i])
}


lapply(file.names, function(i) A[[i]] <- extract_tables(file.names[i]))

 Error in normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="NA": The system cannot find the file specified



[R] From for loop to lapply?

2018-10-01 Thread Ek Esawi
Hi All—

I am using Tabulizer to extract tables from PDF files. Tabulizer
creates a list of matrices for each set of tables in each document.
My code, below, works well. Then I thought I would use lapply instead
of a for loop, since it is a little faster and more compact,
but I kept getting the error message below.

Any help is greatly appreciated

EK

install.packages("tabulizer")
install.packages("stringr")
library(stringi)
library(tabulizer)
path = "C:/Users/name/Documents/TextMining/"
file.names <- dir(path, pattern =".PDF")

for(i in 1:length(file.names)){
  print(file.names[i])
  A[[i]] <- extract_tables(file.names[i])
}


lapply(file.names, function(i) A[[i]] <- extract_tables(file.names[i]))

 Error in normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="NA": The system cannot find the file specified

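Inside lapply(), the anonymous function receives each element of file.names, not its position, so file.names[i] becomes an index by name and evaluates to NA, which matches the path[1]="NA" in the error; the A[[i]] <- assignment would also only modify a local copy. A small sketch that runs without tabulizer or any PDF files, assuming hypothetical file names and a hypothetical extract_stub() standing in for extract_tables():

```r
# Hypothetical file names; the tabulizer package is not needed here.
file.names <- c("report1.PDF", "report2.PDF")

# What the failing lapply() call effectively does: i is "report1.PDF",
# and indexing an unnamed vector by that name returns NA.
bad <- file.names["report1.PDF"]
print(bad)

# Hypothetical stand-in for tabulizer::extract_tables().
extract_stub <- function(f) paste("tables from", f)

# Let lapply() build and return the list; no assignment inside the function.
A <- lapply(file.names, extract_stub)
names(A) <- file.names
str(A)
```

With the real package this is equivalent to Don's suggestion, A <- lapply(file.names, extract_tables). If the PDFs are not in the working directory, list.files(path, pattern = "\\.PDF$", full.names = TRUE) returns full paths rather than bare file names.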


Re: [R] Stop a loop if it takes long time

2018-07-24 Thread Jeff Newmiller
Depends somewhat on what you are doing in the loop and how much of a 
performance hit you are willing to accept. [1]

[1] 
https://stackoverflow.com/questions/7891073/time-out-an-r-command-via-something-like-try

On July 24, 2018 3:17:41 AM PDT, Christofer Bogaso 
 wrote:
>Hi,
>
>Let's say I am implementing a loop using for() / the apply() family, etc.
>
>The calculation time within a particular iteration is not fixed: some
>iterations take a long time to finish, and the next one may finish
>very quickly.
>
>I am exploring whether there is any way to check if the calculation
>within a particular iteration takes longer than a pre-fixed threshold
>and, if it does, kill that iteration and proceed to the next.
>
>Is it possible to implement this without adding much overhead to the
>existing calculation?
>
>Thanks for your feedback
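One base-R option is setTimeLimit(). A minimal sketch, assuming each iteration's work can be wrapped in a function and that R reaches an interrupt check while it runs (true for R code and Sys.sleep(), not necessarily inside compiled code); with_time_limit() and the durations vector are hypothetical:

```r
# Run fun(), but abandon it once `seconds` of elapsed time have passed.
# Note: this returns on_timeout for *any* error in fun(), not only time-outs.
with_time_limit <- function(fun, seconds, on_timeout = NA) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Clear the limit again however we leave this function.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE), add = TRUE)
  tryCatch(fun(), error = function(e) on_timeout)
}

# Hypothetical per-iteration workloads; the second one is deliberately slow.
durations <- c(0.1, 3, 0.1)
results <- vector("list", length(durations))
for (i in seq_along(durations)) {
  results[[i]] <- with_time_limit(function() {
    Sys.sleep(durations[i])
    paste("task", i, "finished")
  }, seconds = 1)
}
str(results)
```

The overhead is one setTimeLimit() call per iteration; the cost is that long-running compiled code which never checks for interrupts will not be stopped.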


[R] Stop a loop if it takes long time

2018-07-24 Thread Christofer Bogaso
Hi,

Let's say I am implementing a loop using for() / the apply() family, etc.

The calculation time within a particular iteration is not fixed: some
iterations take a long time to finish, and the next one may finish
very quickly.

I am exploring whether there is any way to check if the calculation
within a particular iteration takes longer than a pre-fixed threshold
and, if it does, kill that iteration and proceed to the next.

Is it possible to implement this without adding much overhead to the
existing calculation?

Thanks for your feedback



Re: [R] Simplify the loop over the 3rd dimension of a 3D array

2018-07-12 Thread David Winsemius


> On Jul 12, 2018, at 9:40 AM, Duncan Murdoch  wrote:
> 
> On 12/07/2018 11:34 AM, Marine Regis wrote:
>> Hello all,
>> Is there an efficient way to simplify the loop over the 3rd dimension of a 
>> 3D array ? I want to keep the loop over the "time". Here is the code:
>> set.seed(12345)
>> ind <- 10
>> time_seq <- seq(0, 8, 1)
>> col_array <- c(paste("time_", time_seq, sep=""))
>> tab <- array(0, dim=c(length(time_seq) , length(col_array), ind), 
>> dimnames=list(NULL, col_array, as.character(seq(1, ind, 1))))
>> print(tab)
>> tab[1,c("time_0"),] <- round(runif(ind, 0, 100))
>> print(tab)
>> for(time in 1:(length(time_seq) - 1)){
>>   for(i in 1:ind){
>> tab[time + 1,c("time_0"),i] <- round(runif(1, 0, 100))
>> tab[time + 1,c("time_1"),i] <- tab[time,c("time_0"),i]
>> tab[time + 1,c("time_2"),i] <- tab[time,c("time_1"),i]
>> tab[time + 1,c("time_3"),i] <- tab[time,c("time_2"),i]
>> tab[time + 1,c("time_4"),i] <- tab[time,c("time_3"),i]
>> tab[time + 1,c("time_5"),i] <- tab[time,c("time_4"),i]
>> tab[time + 1,c("time_6"),i] <- tab[time,c("time_5"),i]
>> tab[time + 1,c("time_7"),i] <- tab[time,c("time_6"),i]
>> tab[time + 1,c("time_8"),i] <- tab[time,c("time_7"),i]
>>   }
>> }
> 
> It looks as though you are setting all entries to the same value.

I agree that it looked like that to me as well, but in testing with a slightly
smaller version of the array I found that it was not so simple. I shortened
the array to dim = c(5,5,3) so that I could see it on one page, and then ran
the code:

> for(time in 1:(length(time_seq) - 1)){
+   for(i in 1:ind){
+     tab[time + 1,c("time_0"),i] <- round(runif(1, 0, 100))
+     tab[time + 1,c("time_1"),i] <- tab[time,c("time_0"),i]
+     tab[time + 1,c("time_2"),i] <- tab[time,c("time_1"),i]
+     tab[time + 1,c("time_3"),i] <- tab[time,c("time_2"),i]
+     tab[time + 1,c("time_4"),i] <- tab[time,c("time_3"),i]
+ 
+   }
+ }
> 
> print(tab)
, , 1

     time_0 time_1 time_2 time_3 time_4
[1,]     72      0      0      0      0
[2,]     89     72      0      0      0
[3,]     33     89     72      0      0
[4,]     99     33     89     72      0
[5,]     74     99     33     89     72

, , 2

     time_0 time_1 time_2 time_3 time_4
[1,]     88      0      0      0      0
[2,]     46     88      0      0      0
[3,]     51     46     88      0      0
[4,]      3     51     46     88      0
[5,]      0      3     51     46     88

, , 3

     time_0 time_1 time_2 time_3 time_4
[1,]     76      0      0      0      0
[2,]     17     76      0      0      0
[3,]     73     17     76      0      0
[4,]     15     73     17     76      0
[5,]     39     15     73     17     76


So the code was filling in the diagonal and shifted versions of the diagonal
values. Whether that was the intent of the OP was not clear from the original
email. The practice of throwing code as the only description of the problem is
a common source of confusion.
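Because each row is the previous row shifted one column to the right with a fresh draw in time_0, the loop over the third dimension can be dropped while keeping the loop over time, as the question asked. A sketch using the question's sizes (make_tab(), tab_loop and tab_vec are illustrative names). Drawing runif(n_ind) at once consumes the same random numbers, in the same order, as n_ind separate runif(1) calls, so with the same seed the two arrays should be identical:

```r
# Build an empty array shaped like the one in the question.
make_tab <- function(n_ind, time_seq = 0:8) {
  cols <- paste0("time_", time_seq)
  array(0, dim = c(length(time_seq), length(cols), n_ind),
        dimnames = list(NULL, cols, as.character(seq_len(n_ind))))
}

n_ind <- 10   # 80 in the real data
n_t   <- 9    # number of time steps (and of time_* columns)

# Original double loop, with time_1..time_8 copied in one step per individual.
set.seed(12345)
tab_loop <- make_tab(n_ind)
tab_loop[1, "time_0", ] <- round(runif(n_ind, 0, 100))
for (time in 1:(n_t - 1)) {
  for (i in 1:n_ind) {
    tab_loop[time + 1, "time_0", i] <- round(runif(1, 0, 100))
    tab_loop[time + 1, 2:n_t, i] <- tab_loop[time, 1:(n_t - 1), i]
  }
}

# Loop over individuals removed: draw all new time_0 values at once, then
# shift the previous row one column right for every individual together.
set.seed(12345)
tab_vec <- make_tab(n_ind)
tab_vec[1, "time_0", ] <- round(runif(n_ind, 0, 100))
for (time in 1:(n_t - 1)) {
  tab_vec[time + 1, "time_0", ] <- round(runif(n_ind, 0, 100))
  tab_vec[time + 1, -1, ] <- tab_vec[time, -n_t, ]
}

identical(tab_loop, tab_vec)
```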





> A simpler way to do that would be this loop:
> 
> for(time in 1:(length(time_seq) - 1)){
>  for(i in 1:ind){
>tab[time + 1,,i] <- round(runif(1, 0, 100))
>  }
> }
> 
> You could also do away with the inner loop by generating ind random values 
> all at once.  You have to be a little careful with the ordering; I think this 
> gets it right:
> 
> for(time in 1:(length(time_seq) - 1)){
>  tab[time + 1,,] <- t(matrix(round(runif(ind, 0, 100)), ind, 9))
> }
> 
> And then you can do away with the loop entirely, since none of the values 
> depend on earlier calculations.  Just generate ind*length(time_seq) uniforms, 
> and put them in the array in the right order.  You could use aperm() to do 
> this instead of t(), but be careful, it's easy to get the permutation wrong.  
> (I'm not even going to try now. :-).
> 
> Duncan Murdoch
> 
>> print(tab)
>> In fact, the array has 80 observations for the 3rd dimension.
>> Many thanks for your time
>> Have a great day
>> Marine

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law


Re: [R] Simplify the loop over the 3rd dimension of a 3D array

2018-07-12 Thread Duncan Murdoch

On 12/07/2018 11:34 AM, Marine Regis wrote:

Hello all,


Is there an efficient way to simplify the loop over the 3rd dimension of a 3D array ? I 
want to keep the loop over the "time". Here is the code:


set.seed(12345)
ind <- 10
time_seq <- seq(0, 8, 1)
col_array <- c(paste("time_", time_seq, sep=""))
tab <- array(0, dim=c(length(time_seq) , length(col_array), ind), 
dimnames=list(NULL, col_array, as.character(seq(1, ind, 1))))
print(tab)

tab[1,c("time_0"),] <- round(runif(ind, 0, 100))
print(tab)


for(time in 1:(length(time_seq) - 1)){
   for(i in 1:ind){
 tab[time + 1,c("time_0"),i] <- round(runif(1, 0, 100))
 tab[time + 1,c("time_1"),i] <- tab[time,c("time_0"),i]
 tab[time + 1,c("time_2"),i] <- tab[time,c("time_1"),i]
 tab[time + 1,c("time_3"),i] <- tab[time,c("time_2"),i]
 tab[time + 1,c("time_4"),i] <- tab[time,c("time_3"),i]
 tab[time + 1,c("time_5"),i] <- tab[time,c("time_4"),i]
 tab[time + 1,c("time_6"),i] <- tab[time,c("time_5"),i]
 tab[time + 1,c("time_7"),i] <- tab[time,c("time_6"),i]
 tab[time + 1,c("time_8"),i] <- tab[time,c("time_7"),i]
   }
}


It looks as though you are setting all entries to the same value.  A 
simpler way to do that would be this loop:


for(time in 1:(length(time_seq) - 1)){
  for(i in 1:ind){
tab[time + 1,,i] <- round(runif(1, 0, 100))
  }
}

You could also do away with the inner loop by generating ind random 
values all at once.  You have to be a little careful with the ordering; 
I think this gets it right:


for(time in 1:(length(time_seq) - 1)){
  tab[time + 1,,] <- t(matrix(round(runif(ind, 0, 100)), ind, 9))
}

And then you can do away with the loop entirely, since none of the 
values depend on earlier calculations.  Just generate 
ind*length(time_seq) uniforms, and put them in the array in the right 
order.  You could use aperm() to do this instead of t(), but be careful, 
it's easy to get the permutation wrong.  (I'm not even going to try now. 
:-).


Duncan Murdoch



print(tab)



In fact, the array has 80 observations for the 3rd dimension.


Many thanks for your time

Have a great day

Marine



[R] Simplify the loop over the 3rd dimension of a 3D array

2018-07-12 Thread Marine Regis
Hello all,


Is there an efficient way to simplify the loop over the 3rd dimension of a 3D 
array ? I want to keep the loop over the "time". Here is the code:


set.seed(12345)
ind <- 10
time_seq <- seq(0, 8, 1)
col_array <- c(paste("time_", time_seq, sep=""))
tab <- array(0, dim=c(length(time_seq) , length(col_array), ind), 
dimnames=list(NULL, col_array, as.character(seq(1, ind, 1))))
print(tab)

tab[1,c("time_0"),] <- round(runif(ind, 0, 100))
print(tab)


for(time in 1:(length(time_seq) - 1)){
  for(i in 1:ind){
tab[time + 1,c("time_0"),i] <- round(runif(1, 0, 100))
tab[time + 1,c("time_1"),i] <- tab[time,c("time_0"),i]
tab[time + 1,c("time_2"),i] <- tab[time,c("time_1"),i]
tab[time + 1,c("time_3"),i] <- tab[time,c("time_2"),i]
tab[time + 1,c("time_4"),i] <- tab[time,c("time_3"),i]
tab[time + 1,c("time_5"),i] <- tab[time,c("time_4"),i]
tab[time + 1,c("time_6"),i] <- tab[time,c("time_5"),i]
tab[time + 1,c("time_7"),i] <- tab[time,c("time_6"),i]
tab[time + 1,c("time_8"),i] <- tab[time,c("time_7"),i]
  }
}

print(tab)



In fact, the array has 80 observations for the 3rd dimension.


Many thanks for your time

Have a great day

Marine



Re: [R] using for loop with data frames.

2018-05-10 Thread MacQueen, Don
Evidently, you want your loop to create new data frames, named (in this example)
  df_selected1
  df_selected2
  df_selected3

Yes, it can be done. But to do it you will have to use the get() and assign() 
functions, and construct the data frame names as character strings. Syntax like
df_bs_id[3]
does not give you df_bs_id3.

R experts typically discourage this kind of approach. A method more consistent 
with how R is designed to work would be to store the data frames as elements of 
a list.

dflst <- list(df_bs_id1, df_bs_id2, df_bs_id3)
nframes <- length(dflst)
newdf <- dflst

for (id in seq(nframes)) {
   newdf[[ id ]] <- dflst[[ id ]][ , c("column1", "column2")]
}

Optionally, you could name the list elements:
 
names(newdf) <- paste0('df_selected', seq(nframes))

After which you would have the original data frames as elements of dflst, and 
the processed data frames as elements of newdf. The loop can be simplified a 
bit if you don't need to keep copies of the original data frames.

With this approach, it would be better to create dflst using a loop over the
incoming file names, running read.csv() inside the loop, in which case you
would never create separate data frames df_bs_id1, df_bs_id2, etc.

I have used both approaches at various times over the years, and each has pros 
and cons. In general, I would recommend the list approach, however, especially 
if you have a large number of files to process.
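The list approach can be sketched with lapply() doing the looping. The data frames below are hypothetical stand-ins for the read.csv() results (make_df() and its column contents are made up); with real files the list would be built directly from the file names, as the comment notes:

```r
# Hypothetical stand-ins for the three read.csv() results. With real files:
#   dflst <- lapply(c("test1.csv", "test2.csv", "test3.csv"), read.csv)
make_df <- function(k) {
  data.frame(column1 = (1:3) * k, column2 = letters[1:3], column3 = k)
}
dflst <- list(make_df(1), make_df(2), make_df(3))
names(dflst) <- paste0("df_bs_id", seq_along(dflst))

# Apply the same column selection to every data frame; names are kept.
df_selected <- lapply(dflst, function(d) d[, c("column1", "column2")])

str(df_selected$df_bs_id2)
```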

-Don

On 5/10/18, 7:33 AM, "R-help on behalf of Marcelo Mariano Silva" 
 wrote:

Hi,

Is it possible to use a loop to process many data frames in the same way?

For example, if I have three data frames, all with the same variables


df_bs_id1 <- read.csv("test1.csv", header = TRUE)
df_bs_id2 <- read.csv("test2.csv", header = TRUE)
df_bs_id3 <- read.csv("test3.csv", header = TRUE)


How could I implement a loop that, for instance, would select
two columns of interest, in the fashion of the code below?


# selecting only 2 columns of interest

for (i in 1:3) {
df_selected [i] <- df_bs_id[i]  [ , c("column1", "column2")]  }


Tks

MMS



Re: [R] using for loop with data frames.

2018-05-10 Thread John Kane via R-help
 
Why not just use an rbind() and create one data.frame?
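John's suggestion, as a minimal sketch: stack the inputs once, then select columns once. The three data frames and the source column are hypothetical; with real files the source labels would typically be the file names:

```r
# Hypothetical stand-ins for df_bs_id1..3 (same columns, as in the question).
df_bs_id1 <- data.frame(column1 = 1:2, column2 = c("a", "b"), column3 = 0)
df_bs_id2 <- data.frame(column1 = 3:4, column2 = c("c", "d"), column3 = 0)
df_bs_id3 <- data.frame(column1 = 5:6, column2 = c("e", "f"), column3 = 0)

# Stack them into one data frame, recording which input each row came from.
all_df <- rbind(
  cbind(source = "id1", df_bs_id1),
  cbind(source = "id2", df_bs_id2),
  cbind(source = "id3", df_bs_id3)
)

# A single column selection now covers all three inputs.
selected <- all_df[, c("source", "column1", "column2")]
print(selected)
```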
On Thursday, May 10, 2018, 10:34:19 a.m. EDT, Marcelo Mariano Silva 
 wrote:  
 
 Hi,

Is it possible to use a loop to process many data frames in the same way?

For example, if I have three data frames, all with the same variables


df_bs_id1 <- read.csv("test1.csv", header = TRUE)
df_bs_id2 <- read.csv("test2.csv", header = TRUE)
df_bs_id3 <- read.csv("test3.csv", header = TRUE)


How could I implement a loop that, for instance, would select
two columns of interest, in the fashion of the code below?


# selecting only 2 columns of interest

for (i in 1:3) {
df_selected [i] <- df_bs_id[i]  [ , c("column1", "column2")]  }


Tks

MMS



[R] using for loop with data frames.

2018-05-10 Thread Marcelo Mariano Silva
Hi,

Is it possible to use a loop to process many data frames in the same way?

For example, if I have three data frames, all with the same variables


df_bs_id1 <- read.csv("test1.csv", header = TRUE)
df_bs_id2 <- read.csv("test2.csv", header = TRUE)
df_bs_id3 <- read.csv("test3.csv", header = TRUE)


How could I implement a loop that, for instance, would select
two columns of interest, in the fashion of the code below?


# selecting only 2 columns of interest

for (i in 1:3) {
df_selected [i] <- df_bs_id[i]  [ , c("column1", "column2")]  }


Tks

MMS



Re: [R] inefficient for loop, is there a better way?

2017-12-12 Thread Bert Gunter
I believe ?filter will do what you want.

I used  n = 100 instead of 1000:

ts <- 1:100
examp <- data.frame(ts=ts, stage=sin(ts))
examp <- within(examp, {
  abv_1 <- filter(stage > 0.6, rep(1,7),sides =1)
  abv_2 <- filter(stage > .85, rep(1,7), sides =1)
   })
examp

I think this should be fairly fast, but let us know if not. There may be
other alternatives that might be faster.
Assuming it does what you wanted, of course.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Dec 12, 2017 at 5:36 PM, Morway, Eric  wrote:

> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns.  I'm hoping to get the same result in examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis.  The purpose of the script below is to append columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
> faster method that leverages existing R functionality?  I feel like the
> hack below is pretty clunky and can be sped up on the true dataset.  I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
>
> ts <- 1:1000
> examp <- data.frame(ts=ts, stage=sin(ts))
>
> hi1 <- list()
> hi2 <- list()
> per <- 7
> elev1 <- 0.6
> elev2 <- 0.85
> for(i in per:nrow(examp)){
> examp_per <- examp[seq(i - (per - 1), i, by=1),]
> stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
> stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)
>
> hi1 <- c(hi1, nrow(stg_hi_cond1))
> hi2 <- c(hi2, nrow(stg_hi_cond2))
> }
> examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
> examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
>
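The stats::filter() suggestion can be checked directly against the original sliding-window loop. A sketch for one threshold (elev1); loop_count, filt_count and win are illustrative names. Calling stats::filter() with the package prefix avoids any clash with dplyr::filter() if dplyr is attached:

```r
ts_idx <- 1:1000
examp  <- data.frame(ts = ts_idx, stage = sin(ts_idx))
per    <- 7
elev1  <- 0.6

# The question's sliding-window count, kept as a loop for comparison.
loop_count <- rep(NA_real_, nrow(examp))
for (i in per:nrow(examp)) {
  win <- examp$stage[(i - per + 1):i]
  loop_count[i] <- sum(win > elev1)
}

# With sides = 1, filter() sums the current value and the per - 1 values
# before it; the logical vector is treated as 0/1, and the first per - 1
# positions are NA.
filt_count <- as.numeric(stats::filter(examp$stage > elev1,
                                       filter = rep(1, per), sides = 1))

all.equal(loop_count, filt_count)
```

Changing per only changes the length of the filter weights, which suits re-running the calculation for several window lengths.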


Re: [R] inefficient for loop, is there a better way?

2017-12-12 Thread Yvan Richard
One way of doing it with data.table. It seems to scale up pretty well.
It takes 4 seconds on my computer with ts <- 1:1e6.

library(data.table)
per <- 7
elev1 <- 0.6
elev2 <- 0.85

ts <- 1:1000

examp <- data.table(ts=ts, stage=sin(ts))
examp[, `:=`(days_abv_0.6_in_last_7  = apply(do.call('cbind',
               shift(stage, 0:(per - 1))), 1, function(x) sum(x > elev1)),
             days_abv_0.85_in_last_7 = apply(do.call('cbind',
               shift(stage, 0:(per - 1))), 1, function(x) sum(x > elev2)))]



On 13 December 2017 at 14:36, Morway, Eric  wrote:
> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns.  I'm hoping to get the same result in examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis.  The purpose of the script below is to append columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
> faster method that leverages existing R functionality?  I feel like the
> hack below is pretty clunky and can be sped up on the true dataset.  I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
>
> ts <- 1:1000
> examp <- data.frame(ts=ts, stage=sin(ts))
>
> hi1 <- list()
> hi2 <- list()
> per <- 7
> elev1 <- 0.6
> elev2 <- 0.85
> for(i in per:nrow(examp)){
> examp_per <- examp[seq(i - (per - 1), i, by=1),]
> stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
> stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)
>
> hi1 <- c(hi1, nrow(stg_hi_cond1))
> hi2 <- c(hi2, nrow(stg_hi_cond2))
> }
> examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
> examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
>



-- 
Yvan Richard, PhD
Environmental data scientist



Physical address: Level 4, 158 Victoria St, Te Aro, Wellington, New Zealand
Postal address: PO Box 27535, Wellington 6141, New Zealand
Phone: 022 643 7881



Re: [R] inefficient for loop, is there a better way?

2017-12-12 Thread William Dunlap via R-help
Try using stats::filter (not the unfortunately named dplyr::filter, which
is entirely different). stage>elev is a logical vector, but filter(), like
most numerical functions, treats TRUEs as 1s and FALSEs as 0s.

E.g.,

> str( stats::filter( x=examp$stage>elev1, filter=rep(1,7), method="convolution", sides=1) )
 Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 3 3 2 2 ...
> str( stats::filter( x=examp$stage>elev2, filter=rep(1,7), method="convolution", sides=1) )
 Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 1 2 1 1 ...
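The sketch below applies the same call to the whole data frame from the question, producing both columns without a loop. It is an illustration, not code from the thread.

- The helper name `count_days_above` is hypothetical.
- `as.vector()` only drops the time-series attributes that `stats::filter()` returns.
- Changing `per` changes the window length.

```r
ts <- 1:1000
examp <- data.frame(ts = ts, stage = sin(ts))
per <- 7
elev1 <- 0.6
elev2 <- 0.85

# Hypothetical helper: for each day, count how many of the last 'per' days
# (including the current one) had stage above 'elev'.
# sides = 1 uses only current and past values, so the first per - 1
# entries are NA, as in the original loop.
count_days_above <- function(stage, elev, per) {
  as.vector(stats::filter(stage > elev, filter = rep(1, per),
                          method = "convolution", sides = 1))
}

examp$days_abv_0.6_in_last_7  <- count_days_above(examp$stage, elev1, per)
examp$days_abv_0.85_in_last_7 <- count_days_above(examp$stage, elev2, per)
head(examp, 10)
```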


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 12, 2017 at 5:36 PM, Morway, Eric  wrote:




[R] inefficient for loop, is there a better way?

2017-12-12 Thread Morway, Eric
The code below is a small reproducible example of a much larger problem.
While the script below works, it is really slow on the true dataset with
many more rows and columns.  I'm hoping to get the same result in examp,
but with significant time savings.

The example below sets up a data.frame for an ensuing regression
analysis.  The purpose of the script is to append columns to 'examp'
that contain the total number of days in the previous 7 ('per') on which
the stage was above some level ('elev1' or 'elev2').  Is there a
faster method that leverages existing R functionality?  I feel like the
hack below is pretty clunky and can be sped up on the true dataset.  I
would like to run a more efficient script many times, adjusting the value of
'per'.

ts <- 1:1000
examp <- data.frame(ts=ts, stage=sin(ts))

hi1 <- list()
hi2 <- list()
per <- 7
elev1 <- 0.6
elev2 <- 0.85
for(i in per:nrow(examp)){
examp_per <- examp[seq(i - (per - 1), i, by=1),]
stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)

hi1 <- c(hi1, nrow(stg_hi_cond1))
hi2 <- c(hi2, nrow(stg_hi_cond2))
}
examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
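Another vectorized technique, not one proposed in this thread, is a running count via `cumsum()`. The number of days above the level in the window ending at day i equals the cumulative count at day i minus the cumulative count at day i - per.

The sketch below assumes the same objects as the question. The helper `roll_count_above` is hypothetical.

```r
ts <- 1:1000
examp <- data.frame(ts = ts, stage = sin(ts))
per <- 7
elev1 <- 0.6
elev2 <- 0.85

# Hypothetical helper: number of days with stage > elev in the window of
# 'per' days ending at each day; the first per - 1 days get NA.
roll_count_above <- function(stage, elev, per) {
  n <- length(stage)
  out <- rep(NA_real_, n)
  if (n < per) return(out)
  cs <- cumsum(stage > elev)          # running count of exceedances
  i <- per:n
  # window count = cs[i] - cs[i - per], with cs[0] taken as 0
  out[i] <- cs[i] - c(0, cs)[i - per + 1]
  out
}

examp$days_abv_0.6_in_last_7  <- roll_count_above(examp$stage, elev1, per)
examp$days_abv_0.85_in_last_7 <- roll_count_above(examp$stage, elev2, per)
head(examp, 10)
```

The work is one pass over the data regardless of `per`, so rerunning with different window lengths stays cheap.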



Re: [R] nls() and loop

2017-10-21 Thread Martin Maechler
Dear Vangi,

> Evangelina Viotto 
> on Fri, 20 Oct 2017 11:37:12 -0300 writes:

> Hello, I need to fit a growth curve to length-age data. I want to evaluate
> which function best predicts my data; to do so I compare the AICs of
> different models. I now need to evaluate whether changing the initial
> values changes the parameter estimates, and which initial values do not
> allow the model to be estimated.

> To do this I use the function nls(); 

good!

> and I randomize the initial values (positive real numbers).

Not a very good idea, I'm sorry to say:
You have enough observations to fit such a simple 3-parameter model:

I'm using your data to show you two models, both already provided with R
as "self-starting models".
Self-starting means they use the data (and math and linear least
squares) to get good initial (aka "starting") values for the parameters.

The first model, SSasymp(), is equivalent to yours but more smartly
parametrized: it uses exp(lrc) (!) -- see   help(SSasymp)   in R.
The 2nd model assumes the true curve goes through the origin (0,0)
and hence uses one parameter less.
As we will see, both models fit ok, but the simpler model
may be preferable.

Here is the (self-contained) R code, including your data at the beginning:



ANO <- c( 1.65, 1.69, 1.72, 1.72, 1.72, 1.72, 1.73, 2.66 ,2.66, 2.66, 2.66, 
2.76, 2.76, 2.76 ,2.76, 2.78, 2.8, 3.65, 3.65 ,3.65, 3.78, 3.78, 5.07, 7.02, 
7.1, 7.81, 8.72, 8.74, 8.78, 8.8, 8.8, 8.83, 8.98, 9.1, 9.11, 9.75, 9.82, 9.84, 
9.87, 9.87, 10.99, 11.67, 11.8, 11.81, 13.93, 14.83, 15.82, 15.99, 16.87, 
16.88, 16.9, 17.68, 17.79, 17.8, 17.8)

SVL <- 
c(26.11,29.02,41.13,30.96,37.74,29.02,33.38,24.18,34.35,35.8,29.99,42.59,27.57,47.43,46.95,30.47,29.75,35.8,40.65,36.29,34.83,29.02,43.5,75,68,70,67.5,80,77.5,68,68,73.84,72.14,68,64.5,58.5,72.63,78.44,71.17,70.69,77,79,78,68.5,69.72,71.66,77,77,79,76.5,78.5,79,73,80,69.72)

d.SA <- data.frame(SVL, ANO) # create data frame {but do _not_ call it 'data'}
str(d.SA) ## 55 x 2
summary(d.SA) # to get an idea;  the plot below is more useful

## MM: just do this: it's equivalent to your model (but better parametrized!)
fm1 <- nls(SVL ~ SSasymp(ANO, Asym, lrc, c0), data = d.SA)
summary(fm1)

## Compute nicely spaced predicted values for plotting:
ANO. <- seq(-1/2, 30, by = 1/8)
SVL. <- predict(fm1, newdata = list(ANO = ANO.))

plot(SVL ~ ANO, d.SA, xlim = range(ANO, ANO.), ylim = range(SVL, SVL.))
lines(SVL. ~ ANO., col="red", lwd=2)
abline(h = coef(fm1)[["Asym"]], col = "tomato", lty=2, lwd = 1.5)
abline(h=0, v=0, lty=3, lwd = 1/2)

## Both from the summary (where 'lrc' has a large variance) and from the fit,
## try the asymptotic model through the origin instead ==> 1 parameter less:
fmO <- nls(SVL ~ SSasympOrig(ANO, Asym, lrc), data = d.SA)
summary(fmO)## (much better determined)
SVL.O <- predict(fmO, newdata = list(ANO = ANO.))
lines(SVL.O ~ ANO., col="blue", lwd=2)# does go through origin (0,0)
abline(h = coef(fmO)[["Asym"]], col = "skyblue", lty=2, lwd = 1.5)

## Looking closely, I'd probably take the simpler one,
## and classical statistical inference also does not see a significant
## difference between the two fitted models:
anova(fm1, fmO)
## Analysis of Variance Table
##
## Model 1: SVL ~ SSasymp(ANO, Asym, lrc, c0)
## Model 2: SVL ~ SSasympOrig(ANO, Asym, lrc)
##   Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
## 1     52     2635.1
## 2     53     2702.6 -1 -67.55   1.333 0.2535
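The original question was about comparing models by AIC. The two self-starting fits above can also be compared directly with `AIC()` and `BIC()`, which accept several fitted models at once; lower values are preferred.

The sketch below reuses the data and the two model formulas from this message. It is an illustration, not part of the reply.

```r
ANO <- c( 1.65, 1.69, 1.72, 1.72, 1.72, 1.72, 1.73, 2.66 ,2.66, 2.66, 2.66,
2.76, 2.76, 2.76 ,2.76, 2.78, 2.8, 3.65, 3.65 ,3.65, 3.78, 3.78, 5.07, 7.02,
7.1, 7.81, 8.72, 8.74, 8.78, 8.8, 8.8, 8.83, 8.98, 9.1, 9.11, 9.75, 9.82, 9.84,
9.87, 9.87, 10.99, 11.67, 11.8, 11.81, 13.93, 14.83, 15.82, 15.99, 16.87,
16.88, 16.9, 17.68, 17.79, 17.8, 17.8)
SVL <- c(26.11,29.02,41.13,30.96,37.74,29.02,33.38,24.18,34.35,35.8,29.99,
42.59,27.57,47.43,46.95,30.47,29.75,35.8,40.65,36.29,34.83,29.02,43.5,75,68,
70,67.5,80,77.5,68,68,73.84,72.14,68,64.5,58.5,72.63,78.44,71.17,70.69,77,79,
78,68.5,69.72,71.66,77,77,79,76.5,78.5,79,73,80,69.72)
d.SA <- data.frame(SVL, ANO)

fm1 <- nls(SVL ~ SSasymp(ANO, Asym, lrc, c0), data = d.SA)   # 3 parameters
fmO <- nls(SVL ~ SSasympOrig(ANO, Asym, lrc), data = d.SA)   # 2 parameters

# One row per model; 'df' also counts the residual standard deviation
AIC(fm1, fmO)
BIC(fm1, fmO)
```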

---

I see that the 10 nice self-starting models that came with nls()
already in the 1990s are not well known and probably not
understood by many.

I'm working on making their help pages nicer, notably by
slightly enhancing the nice model-visualizing plot you already
get in R when you run

example(SSasymp)
or
example(SSasympOrig)

(but unfortunately, they currently use 'lwd = 0' to draw the asymptote,
 which shows fine in a PDF but not on a typical screen graphics device.)


Martin Maechler
ETH Zurich and R Core team



Re: [R] create a loop

2017-10-21 Thread Ismail SEZEN
You should look at the "boot" package. Also search Google for the keywords
"bootstrap R".
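Following the pointer to the boot package: `boot::boot()` takes a statistic function of `(data, indices)`. Its `strata` argument resamples within groups, which matches the within-group resampling in the original post.

The sketch below makes several assumptions:

- It uses the estimator from the post, re-expressed with base `tapply()` instead of `plyr::ddply()`.
- The helper name `total_temp_stat` is hypothetical.
- Only the columns the estimator uses are rebuilt from `dat1` and `dat2`.

```r
library(boot)

# Columns of dat1/dat2 from the original post that the estimator uses
dat1 <- data.frame(temp = c(23, 21, 10, 15, 16, 8, 13, 1, 23, 19,
                            25, 19, 12, 16, 19, 21, 12, 5, 7),
                   group = factor(rep(c("A", "B", "C"), times = c(3, 7, 9))))
dat2 <- data.frame(group = c("A", "B", "C"),
                   total.pop = c(250, 375, 180),
                   sampled.pop = c(25, 37, 27))

# Hypothetical statistic: estimated total and its SE, computed as in the post
total_temp_stat <- function(d, idx) {
  r <- d[idx, ]
  mean_g <- tapply(r$temp, r$group, mean, na.rm = TRUE)  # per-group mean
  var_g  <- tapply(r$temp, r$group, var)                 # per-group sd^2
  pop <- dat2[match(names(mean_g), dat2$group), ]
  adj.mean <- pop$total.pop * mean_g / sum(pop$total.pop)
  adj.var  <- var_g / (pop$sampled.pop / (1 - pop$sampled.pop / pop$total.pop))
  c(Estimated.TotalTemp = sum(adj.mean) * sum(pop$total.pop),
    SE = sqrt(sum(adj.var)) * sum(pop$total.pop))
}

set.seed(2017)  # arbitrary seed, for reproducibility only
b <- boot(dat1, total_temp_stat, R = 1000, strata = dat1$group)
b$t0        # statistic on the original data
head(b$t)   # one row per replicate; columns as in b$t0
```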

On 20 Oct 2017 23:12, "Marna Wagley" wrote:



[R] create a loop

2017-10-20 Thread Marna Wagley
Hi R Users,
I have very big data sets and want to run some of the analyses many
times (1000 times) with randomization.
I have done the analysis on example data, but it needs to be done with
randomized data (1000 times). Doing it manually, one replicate at a time,
takes a lot of time, so I wonder whether it is possible to perform the
analysis in a loop over many replicated data sets.  The code and
the example data sets are below.

I would be very grateful if someone could help me create the loop for the
following example data and analyses.

I appreciate your help.


MW

#

library(plyr)  # provides ddply() and summarise()

dat1<-structure(list(RegionA = structure(c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("Ra", "Rb", "Rc",
"Rd", "Re", "Rf"), class = "factor"), site = structure(c(1L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("s1", "s10", "s11", "s12", "s13", "s14", "s15", "s16", "s17", "s18",
"s19", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9"), class = "factor"),
temp = c(23L, 21L, 10L, 15L, 16L, 8L, 13L, 1L, 23L, 19L, 25L, 19L, 12L, 16L,
19L, 21L, 12L, 5L, 7L), group = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("RegionA", "site", "temp", "group"),
class = "data.frame", row.names = c(NA, -19L))

head(dat1)

dat2<-structure(list(group = structure(1:3, .Label = c("A", "B", "C"),
class = "factor"), totalP = c(250L, 375L, 180L), sampled = c(25L, 37L, 27L)),
.Names = c("group", "total.pop", "sampled.pop"), class = "data.frame",
row.names = c(NA, -3L))

##
# row indices of dat1, split by group
idx <- 1:nrow(dat1)
lll <- split(idx, dat1$group)

##
# Replication 1: create a resampled data set
# (rows resampled with replacement within each group)
Replication1<-dat1[unlist(lapply(lll, sample, replace=TRUE)),]

Summary.Rep1<-ddply(Replication1, c("group"), summarise,
   N = length(group),
   mean = mean(temp, na.rm=TRUE),
   sd   = sd(temp),
   se   = sd / sqrt(N),
   variance = sd^2
)

# merge the two data sets (group summaries and dat2)
Rep1<-merge(Summary.Rep1, dat2, by="group")

# calculate adjusted mean and variance
Rep1$adj.mean<-(Rep1$total.pop*Rep1$mean)/sum(Rep1$total.pop)
Rep1$adj.var<-(Rep1$variance)/(Rep1$sampled.pop/(1-(Rep1$sampled.pop/Rep1$total.pop)))
Rep1$over.adj.var<-(Rep1$total.pop/sum(Rep1$total.pop))^2*Rep1$adj.var
Rep1$total<-Rep1$adj.mean*(Rep1$total.pop)

##
Estimated.TotalTemp<-sum(Rep1$adj.mean)*sum(Rep1$total.pop)
Estimated.totalvar<-sum(Rep1$adj.var)
Estimated.SE<-sqrt(Estimated.totalvar)*sum(Rep1$total.pop)
RESULTS.R1<-data.frame(Estimated.TotalTemp, SE=Estimated.SE)
RESULTS.R1

##
# Replication 2: create a resampled data set
Replication2<-dat1[unlist(lapply(lll, sample, replace=TRUE)),]

Summary.Rep2<-ddply(Replication2, c("group"), summarise,
   N = length(group),
   mean = mean(temp, na.rm=TRUE),
   sd   = sd(temp),
   se   = sd / sqrt(N),
   variance = sd^2
)

# merge the two data sets
Rep2<-merge(Summary.Rep2, dat2, by="group")

# calculate adjusted mean and variance
Rep2$adj.mean<-(Rep2$total.pop*Rep2$mean)/sum(Rep2$total.pop)
Rep2$adj.var<-(Rep2$variance)/(Rep2$sampled.pop/(1-(Rep2$sampled.pop/Rep2$total.pop)))
Rep2$over.adj.var<-(Rep2$total.pop/sum(Rep2$total.pop))^2*Rep2$adj.var
Rep2$total<-Rep2$adj.mean*(Rep2$total.pop)

##
Estimated.TotalTemp<-sum(Rep2$adj.mean)*sum(Rep2$total.pop)
Estimated.totalvar<-sum(Rep2$adj.var)
Estimated.SE<-sqrt(Estimated.totalvar)*sum(Rep2$total.pop)
RESULTS.R2<-data.frame(Estimated.TotalTemp, SE=Estimated.SE)

##
# combine all results (ultimately from 1000 runs)
ALL.Results <- rbind(RESULTS.R1, RESULTS.R2)
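The manual replications above can be wrapped in a function, repeated with `lapply()` and stacked with `rbind()`.

The sketch below makes these assumptions:

- It uses the same estimator as the post.
- It uses base `tapply()` in place of `plyr::ddply()`, so that it runs without extra packages.
- `one_replicate` and `n_rep` are hypothetical names.
- Only the columns of `dat1` that the analysis uses are rebuilt.

```r
dat1 <- data.frame(site = paste0("s", 1:19),
                   temp = c(23, 21, 10, 15, 16, 8, 13, 1, 23, 19,
                            25, 19, 12, 16, 19, 21, 12, 5, 7),
                   group = factor(rep(c("A", "B", "C"), times = c(3, 7, 9))))
dat2 <- data.frame(group = c("A", "B", "C"),
                   total.pop = c(250, 375, 180),
                   sampled.pop = c(25, 37, 27))

# Row indices of dat1 split by group, as in the post
# (every group has more than one row, so sample() resamples the indices)
lll <- split(seq_len(nrow(dat1)), dat1$group)

# Hypothetical helper: one resampled replicate of the post's analysis
one_replicate <- function() {
  r <- dat1[unlist(lapply(lll, sample, replace = TRUE)), ]
  summ <- data.frame(group = levels(r$group),
                     mean = as.vector(tapply(r$temp, r$group, mean, na.rm = TRUE)),
                     variance = as.vector(tapply(r$temp, r$group, var)))
  rep_df <- merge(summ, dat2, by = "group")
  rep_df$adj.mean <- rep_df$total.pop * rep_df$mean / sum(rep_df$total.pop)
  rep_df$adj.var <- rep_df$variance /
    (rep_df$sampled.pop / (1 - rep_df$sampled.pop / rep_df$total.pop))
  data.frame(Estimated.TotalTemp = sum(rep_df$adj.mean) * sum(rep_df$total.pop),
             SE = sqrt(sum(rep_df$adj.var)) * sum(rep_df$total.pop))
}

set.seed(1)  # arbitrary seed, for reproducibility only
n_rep <- 1000
ALL.Results <- do.call(rbind, lapply(seq_len(n_rep), function(i) one_replicate()))
head(ALL.Results)
```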



Re: [R] nls() and loop

2017-10-20 Thread J C Nash
Yes, some form of try() is often needed with nls() to keep scripts from stopping.

You might also find nlxb() from package nlsr more reliable in finding
solutions. It uses analytic derivatives, if available, when the model is given
as an expression, and a Marquardt-stabilized solver. But do expect it to take
more iterations. The syntax is close, but not perfectly equivalent, to that of nls().

JN

On 2017-10-20 11:20 AM, Jeff Newmiller wrote:
> ?tryCatch
>
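Combining the `?tryCatch` pointer with the original goal (record the start values and any error for each run) could look roughly like the sketch below.

The sketch makes these assumptions:

- It keeps the logistic model and the `runif()` ranges from the question.
- It names the list elements so they can be pulled out with `sapply()`.
- `fit_random_start`, `runs` and `results` are hypothetical names.

With `warnOnly = TRUE`, non-convergence only produces warnings. Errors such as the `chol2inv()` failure inside `summary()` are caught and stored as text.

```r
ANO <- c( 1.65, 1.69, 1.72, 1.72, 1.72, 1.72, 1.73, 2.66 ,2.66, 2.66, 2.66,
2.76, 2.76, 2.76 ,2.76, 2.78, 2.8, 3.65, 3.65 ,3.65, 3.78, 3.78, 5.07, 7.02,
7.1, 7.81, 8.72, 8.74, 8.78, 8.8, 8.8, 8.83, 8.98, 9.1, 9.11, 9.75, 9.82, 9.84,
9.87, 9.87, 10.99, 11.67, 11.8, 11.81, 13.93, 14.83, 15.82, 15.99, 16.87,
16.88, 16.9, 17.68, 17.79, 17.8, 17.8)
SVL <- c(26.11,29.02,41.13,30.96,37.74,29.02,33.38,24.18,34.35,35.8,29.99,
42.59,27.57,47.43,46.95,30.47,29.75,35.8,40.65,36.29,34.83,29.02,43.5,75,68,
70,67.5,80,77.5,68,68,73.84,72.14,68,64.5,58.5,72.63,78.44,71.17,70.69,77,79,
78,68.5,69.72,71.66,77,77,79,76.5,78.5,79,73,80,69.72)
d.SA <- data.frame(SVL, ANO)

# Hypothetical helper: one fit from random start values; on error the start
# values are still returned, together with the error message
fit_random_start <- function(dat) {
  start <- list(alfa  = runif(1, min = 0, max = 150),
                gamma = runif(1, min = 0, max = 1),
                DE    = runif(1, min = 0, max = 100))
  res <- tryCatch({
    fit <- nls(SVL ~ DE + alfa / (1 + exp(-gamma * ANO)), data = dat,
               start = start,
               control = nls.control(maxiter = 100, warnOnly = TRUE))
    list(coef = coef(summary(fit))[, "Estimate"],
         AIC = AIC(fit), BIC = BIC(fit), error = NA_character_)
  }, error = function(e) {
    list(coef = c(alfa = NA_real_, gamma = NA_real_, DE = NA_real_),
         AIC = NA_real_, BIC = NA_real_, error = conditionMessage(e))
  })
  c(list(start = unlist(start)), res)
}

set.seed(123)  # arbitrary seed, for reproducibility only
runs <- lapply(1:10, function(i) fit_random_start(d.SA))

# One row per run: start values, estimates, AIC/BIC and any error message
starts <- t(sapply(runs, `[[`, "start"))
colnames(starts) <- paste0("start.", colnames(starts))
results <- data.frame(starts, t(sapply(runs, `[[`, "coef")),
                      AIC = sapply(runs, `[[`, "AIC"),
                      BIC = sapply(runs, `[[`, "BIC"),
                      error = sapply(runs, `[[`, "error"),
                      stringsAsFactors = FALSE)
results
```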



Re: [R] nls() and loop

2017-10-20 Thread Jeff Newmiller
?tryCatch
-- 
Sent from my phone. Please excuse my brevity.

On October 20, 2017 7:37:12 AM PDT, Evangelina Viotto wrote:

[R] nls() and loop

2017-10-20 Thread Evangelina Viotto
Hello, I need to fit a growth curve to length-age data. I want to evaluate
which function best predicts my data; to do so I compare the AICs of
different models. I now need to evaluate whether changing the initial values
changes the parameter estimates, and which initial values do not allow the
model to be estimated.
To do this I use the function nls() and I randomize the initial values
(positive real numbers). I put this inside a function that changes the
initial parameters every time it is executed, then run it in a loop and
save a list of the results I am interested in. The problem is that when the
fit does not converge from the initial values, the loop stops and throws
an error.
I need the loop to continue and to save the initial values together with
the error that those values generate.


Cheers

Vangi



ANO<- c( 1.65, 1.69, 1.72, 1.72, 1.72, 1.72, 1.73, 2.66 ,2.66, 2.66, 2.66,
2.76, 2.76, 2.76 ,2.76, 2.78, 2.8, 3.65, 3.65 ,3.65, 3.78, 3.78, 5.07, 7.02,
7.1, 7.81, 8.72, 8.74, 8.78, 8.8, 8.8, 8.83, 8.98, 9.1, 9.11, 9.75, 9.82,
9.84, 9.87, 9.87, 10.99, 11.67, 11.8, 11.81, 13.93, 14.83, 15.82, 15.99,
16.87, 16.88, 16.9, 17.68, 17.79, 17.8, 17.8)


SVL<-c(26.11,29.02,41.13,30.96,37.74,29.02,33.38,24.18,34.35,35.8,29.99,42.59,27.57,47.43,46.95,30.47,29.75,35.8,40.65,36.29,34.83,29.02,43.5,75,68,70,67.5,80,77.5,68,68,73.84,72.14,68,64.5,58.5,72.63,78.44,71.17,70.69,77,79,78,68.5,69.72,71.66,77,77,79,76.5,78.5,79,73,80,69.72)

data<-data.frame (SVL, ANO)# create data frame
data
> Logiscorri<-function(){
+   a<-runif(1, min=0, max=150)# returns 1 random value between min and max
+   b<-runif(1, min=0, max=100)
+   g<-runif (1, min=0, max=1)
+   d<-runif (1,min=0, max=100)
+
+   ## estimate the distribution curve of my data
+   caiman<-nls(SVL~DE+(alfa/(1+exp(-gamma*ANO))),
+   data=data,
+   start=list(alfa= a  ,gamma= g, DE= d),
+   control=nls.control(maxiter = 100, warnOnly=TRUE),
+   trace=FALSE)
+   caimansum<-summary(caiman)# gives me the estimated parameters and the number of iterations
+   ## analyse the AIC
+   akaike<-AIC(caiman)
+   Bayesiano<-BIC(caiman)
+   alfa<-coef(caiman)[1]
+   beta<-coef(caiman)[2]
+   gamma<- coef(caiman)[3]
+   DE<- coef(caiman)[4]
+   formu<-formula(caiman)
+
+   ValoresIniciales<-c(a, g, d)
+   resultados<-list(formu, caimansum, ValoresIniciales, akaike, Bayesiano)
+   return(resultados)
+ }
> Logiscorri()
[[1]]
SVL ~ DE + (alfa/(1 + exp(-gamma * ANO)))


[[2]]

Formula: SVL ~ DE + (alfa/(1 + exp(-gamma * ANO)))

Parameters:
      Estimate Std. Error t value Pr(>|t|)
alfa  133.0765     6.9537  19.138  < 2e-16 ***
gamma   0.2746     0.0371   7.401 1.13e-09 ***
DE    -54.0467     7.1047  -7.607 5.34e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.821 on 52 degrees of freedom

Number of iterations to convergence: 30
Achieved convergence tolerance: 4.995e-06


[[3]]
[1] 112.2528283   0.4831461  38.5151401

[[4]]
[1] 372.2001

[[5]]
[1] 380.2294

> resultados<-list()
> resultados
list()
> for(i in 1:10){
+   resultados[i]<- list(Logiscorri())
+ }
Error in chol2inv(object$m$Rmat()) :
  element (2, 2) is zero, so the inverse cannot be computed
In addition: Warning message:
In nls(SVL ~ DE + (alfa/(1 + exp(-gamma * ANO))), data = data, start =
list(alfa = a,  :
  singular gradient
> names(resultados)<- 1:10
Error in names(resultados) <- 1:10 :
  'names' attribute [10] must be the same length as the vector [4]
> parametros<- t(sapply(LogisticoConCorri, "[[", "parámetros")) # this goes item by item through the list and extracts the parameters
Error in FUN(X[[i]], ...) : subscript out of bounds
> colnames(parametros)<- c("alfa", "beta", "gamma", "DE")
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent
> akaikefinal<- sapply(LogisticoConCorri, "[[", "akaike")# this goes item by item through the list and extracts the AIC
Error in FUN(X[[i]], ...) : subscript out of bounds
> bayesfinal<- sapply(LogisticoConCorri, "[[", "Bayesiano")
Error in FUN(X[[i]], ...) : subscript out of bounds
>
> --
Biól. Evangelina V. Viotto
Laboratorio Ecología Animal
Centro de investigaciones Científicas y de Transferencias de
Tecnología Aplicada a la Producción
(CICyTTP-CONICET-UADER)
Diamante, Entre Ríos
Argentina


Re: [R] Nested for loop

2017-08-08 Thread S Ellison
> The code I've attached works for a population of 400 and samples 100 times.
> I'd like to extend this to 300 samples and 3 populations. So, the x-axis would
> range from 0-300 samples.
> 
> What I'm having trouble with is finding a way to change the population mid-
> way through the function. I want samples 1-100 to be taken from a
> population of 400, samples 101-200 to be taken from a population of 800 and
> samples 201-300 from a population of 300. The end result should look
> something like a heart rate monitor.

You could write your function to take a list of either subpopulations or sets 
of population parameters, lapply your simulation generator over the list and 
(assuming the output from each of those is a vector) use c(that.list, 
recursive=TRUE) to concatenate the resulting list of vectors into a single 
vector.
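The suggestion above can be sketched as follows. The generator `simulate_schnabel` is hypothetical:

- For one population size, it draws `n.samples` samples of 5% to 15% of the population, as in Kirsten's code.
- After each sample it returns the running Schnabel estimate sum(C_t M_t) / sum(R_t). Here C_t is the sample size, M_t the number of items marked before sample t, and R_t the number of recaptures in sample t.

`lapply()` runs the generator once per population size. `c(..., recursive = TRUE)` then joins the three result vectors into one 300-element vector.

```r
# Hypothetical generator: n.samples mark-recapture samples from a population
# of size N (each sample 5%-15% of N, without replacement); returns the
# running Schnabel estimate sum(C * M) / sum(R) after each sample
simulate_schnabel <- function(N, n.samples = 100) {
  marked <- logical(N)   # TRUE once an item has been caught
  num <- 0
  den <- 0
  est <- rep(NA_real_, n.samples)
  for (s in seq_len(n.samples)) {
    size <- sample(round(0.05 * N):round(0.15 * N), 1)  # C_s
    caught <- sample(N, size)
    M <- sum(marked)              # items marked before this sample
    R <- sum(marked[caught])      # recaptures in this sample
    num <- num + size * M
    den <- den + R
    if (den > 0) est[s] <- num / den   # undefined until the first recapture
    marked[caught] <- TRUE
  }
  est
}

set.seed(1)  # arbitrary seed, for reproducibility only
pops <- list(400, 800, 300)  # populations for samples 1-100, 101-200, 201-300

# One vector per population, flattened into a single vector
est <- c(lapply(pops, simulate_schnabel), recursive = TRUE)
length(est)
```

Because `est` is a single vector, something like `plot(est, type = "l")` draws it as one continuous line rather than three overlaid ones.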


S Ellison




Re: [R] Nested for loop

2017-08-07 Thread Kirsten Morehouse
Hi Caitlin and Ben,

Thanks for your responses! My issue is that I'd like to create one
continuous line, rather than 3 lines overlaid.

The code I've attached works for a population of 400 and samples 100 times.
I'd like to extend this to 300 samples and 3 populations. So, the x-axis
would range from 0-300 samples.

What I'm having trouble with is finding a way to change the population
mid-way through the function. I want samples 1-100 to be taken from a
population of 400, samples 101-200 to be taken from a population of 800 and
samples 201-300 from a population of 300. The end result should look
something like a heart rate monitor.

Aside from the rationale, does what I'm explaining make sense?

Best,

Kirsten

On Mon, Aug 7, 2017 at 3:18 PM, Caitlin  wrote:

> Hi.
>
> A nested for loop is not terribly efficient (it's O(n^2)). Can you
> vectorize it? If so, this would be a far more efficient and faster approach.
>
> ~Caitlin
>
> On Saturday, August 5, 2017, Kirsten Morehouse 
> wrote:
>
>> Hi! Thanks for taking the time to read this.
>>
>> The code below creates a graph that takes 100 samples that are between 5%
>> and 15% of the population (400).
>>
>> What I'd like to do, however, is add two other sections to the graph. It
>> would look something like this:
>>
>> For samples 1-100, take 100 samples that are each between 5% and 15% of the
>> population (400). For samples 101-200, take 100 samples that are between 5%
>> and 15% of the population (800). For samples 201-300, take 100 samples that
>> are between 5% and 15% of the population (300).
>>
>> I assume this would require a nested for loop. Does anyone have advice as
>> to how to do this?
>>
>> Thanks for your time. Kirsten
>>
>> ## Mark-Recapture
>> ## Estimate population from repeated sampling
>>
>> ## Population size
>> N <- 400
>> N
>>
>> ## Vector labeling each item in the population
>> pop <- c(1:N)
>> pop
>>
>> ## Lower and upper bounds of sample size
>> lower.bound <- round(x = .05 * N, digits = 0)
>> lower.bound ## Smallest possible sample size
>>
>> upper.bound <- round(x = .15 * N, digits = 0)
>> upper.bound ## Largest possible sample size
>>
>> ## Length of sample size interval
>> length.ss.interval <- length(c(lower.bound:upper.bound))
>> length.ss.interval ## total possible sample sizes, ranging from
>> ## lower.bound to upper.bound
>>
>> ## Determine a sample size randomly (not a global variable...simply for
>> ## test purposes)
>> ## Between lower and upper bounds set previously
>> ## Give equal weight to each possible sample size in this interval
>> sample(x = c(lower.bound:upper.bound),
>>size = 1,
>>prob = c(rep(1/length.ss.interval, length.ss.interval)))
>>
>> ## Specify number of samples to take
>> n.samples <- 100
>>
>> ## Initiate empty matrix
>> ## 1st column is population (item 1 through item 400)
>> ## 2nd through nth column are all rounds of sampling
>> dat <- matrix(data = NA,
>>   nrow = length(pop),
>>   ncol = n.samples + 1)
>>
>> dat[,1] <- pop
>>
>> dat
>>
>> ## Take samples of random sizes
>> ## Record results in columns 2 through n
>> ## 1 = sampled (marked)
>> ## 0 = not sampled (not marked)
>> for(i in 2:ncol(dat)) {
>>   a.sample <- sample(x = pop,
>>  size = sample(x = c(lower.bound:upper.bound),
>>size = 1,
>>prob = c(rep(1/length.ss.interval,
>> length.ss.interval))),
>>  replace = FALSE)
>>   dat[,i] <- dat[,1] %in% a.sample
>> }
>>
>> ## How large was each sample size?
>> apply(X = dat, MARGIN = 2, FUN = sum)
>> ## 1st element is irrelevant
>> ## 2nd element through nth element: sample size for each of the 100
>> ## samples
>>
>> ## At this point, all computations can be done using dat
>>
>> ## Create Schnabel dataframe using dat
>> ## Google the Schnabel formula
>>
>> schnabel.comp <- data.frame(sample = 1:n.samples,
>> n.sampled = apply(X = dat, MARGIN = 2, FUN =
>> sum)[2:length(apply(X = dat, MARGIN = 2, FUN = sum))]
>> )
>>
>> ## First column: which sample, 1-100
>> ## Second column: number selected in that sample
>>
>>
>> ## How many items were previously sampled?
>> ## For 1st sample, it's 0
>> ## For 2nd sample, code is different than for remaining samples
>>
>> n.prev.sampled <- c(0, rep(NA, n.samples-1))
>> n.prev.sampled
>>
>> n.prev.sampled[2] <- sum(ifelse(test = dat[,3] == 1 & dat[,2] == 1,
>> yes = 1,
>> no = 0))
>>
>> n.prev.sampled
>>
>> for(i in 4:ncol(dat)) {
>>   n.prev.sampled[i-1] <- sum(ifelse(test = dat[,i] == 1 &
>> rowSums(dat[,2:(i-1)]) > 0,
>> yes = 1,
>> no = 0))
>> }
>>
>> schnabel.comp$n.prev.sampled <- n.prev.sampled
>>
>> ## n.newly.sampled: in each sample, how many items were newly sampled?
>> ## i.e., never seen before?
>> 

Re: [R] Nested for loop

2017-08-07 Thread Ben Tupper
Hmmm.

If I understand you correctly, your question has to do with adding lines to 
your graph?  If so, my ggplot2 skills are sort of floppy, but you could append 
your sampling results to your data frame (one for each sample set) and then 
simply add layers.  Sort of like this.

library(ggplot2)

N <- 10
x <- 1:N
df <- data.frame(
    x = x,
    y1 = sample(x, N, replace = TRUE),
    y2 = sample(x, N, replace = TRUE),
    y3 = sample(x, N, replace = TRUE))

ggplot(df, mapping = aes(x = x, y = y1)) +
    geom_point(aes(y = y1), col = 'orange') + geom_line(aes(y = y1), col = 'orange') +
    geom_point(aes(y = y2), col = 'blue') + geom_line(aes(y = y2), col = 'blue') +
    geom_point(aes(y = y3), col = 'gray') + geom_line(aes(y = y3), col = 'gray')

If plotting is not the issue then I don't understand what your question is.

Cheers,
Ben


> On Aug 6, 2017, at 3:44 PM, Kirsten Morehouse  wrote:
> 
> Hi Ben,
> 
> That's exactly right! Except for each set it's the sample population that is 
> 400, 800 or 300. I want to take 3 samples, each of 100, where only the 
> population differs. I can do this separately, but I'm having trouble putting 
> them all on the same graph. 
> 
> I'd like to have sample on the x axis (1-300) and estimate on the y axis. I 
> want to show how population affects the estimates. 
> 
> Does this make more sense?
> 
> Thanks for your time!
> 
> Kirsten 

Re: [R] Nested for loop

2017-08-06 Thread Kirsten Morehouse
Hi Ben,

That's exactly right! Except for each set it's the sample population that
is 400, 800 or 300. I want to take 3 samples, each of 100, where only the
population differs. I can do this separately, but I'm having trouble
putting them all on the same graph.

I'd like to have sample on the x axis (1-300) and estimate on the y axis. I
want to show how population affects the estimates.

Does this make more sense?

Thanks for your time!

Kirsten
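Rather than nesting loops, one way to get the three blocks onto a single graph is to wrap the simulation in a function, call it once per population size, and stack the results into one long data frame with a grouping column that ggplot2 can colour by. A minimal sketch under these assumptions: base R for the simulation and ggplot2 only if it is installed; `run.schnabel()`, `res` and `population` are hypothetical names, and the function re-implements the post's steps with a running "already marked" vector instead of the `dat` matrix.

```r
## Hypothetical helper: one Schnabel run (n.samples samples) for population size N
run.schnabel <- function(N, n.samples = 100, lower = 0.05, upper = 0.15) {
  pop <- seq_len(N)
  sizes <- round(lower * N):round(upper * N)   # allowed sample sizes
  marked <- rep(FALSE, N)                      # TRUE once an item has been caught
  n.sampled <- n.prev.sampled <- cum.sampled <- numeric(n.samples)
  for (i in seq_len(n.samples)) {
    ss <- sizes[sample.int(length(sizes), 1)]  # uniform choice of sample size
    caught <- pop %in% sample(pop, ss)         # items caught in sample i
    n.sampled[i] <- ss
    n.prev.sampled[i] <- sum(caught & marked)  # recaptures
    cum.sampled[i] <- sum(marked)              # items marked before sample i
    marked <- marked | caught
  }
  data.frame(N = N,
             n.sampled = n.sampled,
             pop.estimate = cumsum(n.sampled * cum.sampled) /
                            cumsum(n.prev.sampled))
}

set.seed(42)                        # arbitrary seed, for reproducibility only
pops <- c(400, 800, 300)
res <- do.call(rbind, lapply(pops, run.schnabel))
res$sample <- seq_len(nrow(res))    # 1-300 across the three blocks
res$population <- factor(res$N, levels = pops)

## Plot only if ggplot2 is available; the dashed line is the true N of each block
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(res, aes(x = sample, y = pop.estimate, colour = population)) +
    geom_point(size = 1) +
    geom_line() +
    geom_line(aes(y = N), linetype = "dashed") +
    labs(x = "Sample", y = "Population estimate", colour = "True N")
  print(p)
}
```

Samples 1-100 come from N = 400, 101-200 from N = 800 and 201-300 from N = 300. The first estimate in every block is NaN (0/0), as in the original code, so ggplot2 drops those rows with a warning.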

Re: [R] Nested for loop

2017-08-06 Thread Ben Tupper
Hi Kirsten,

I can run your example code but I can't quite follow your division of sampling.
Can you restate the task?  Below is what I think you are asking for, but I
have the feeling I may be off the mark.


Set A: 400 samples, draw 100 in range of 5 to 15

Set B: 800 samples, draw 100 in range of 5 to 15

Set C: 300 samples, draw 100 in range of 5 to 15

Ben


[R] Nested for loop

2017-08-05 Thread Kirsten Morehouse
Hi! Thanks for taking the time to read this.

The code below creates a graph that takes 100 samples that are between 5%
and 15% of the population (400).

What I'd like to do, however, is add two other sections to the graph. It
would look something like this:

from 1-100 samples take 100 samples that are between 5% and 15% of the
population (400). From 101-200 take 100 samples that are between 5% and 15%
of the population (800). From 201-300 take 100 samples that are between 5%
and 15% of the population (300).

I assume this would require a nested for loop. Does anyone have advice as
to how to do this?

Thanks for your time. Kirsten

## Mark-Recapture
## Estimate population from repeated sampling

## Population size
N <- 400
N

## Vector labeling each item in the population
pop <- c(1:N)
pop

## Lower and upper bounds of sample size
lower.bound <- round(x = .05 * N, digits = 0)
lower.bound ## Smallest possible sample size

upper.bound <- round(x = .15 * N, digits = 0)
upper.bound ## Largest possible sample size

## Length of sample size interval
length.ss.interval <- length(c(lower.bound:upper.bound))
length.ss.interval ## total possible sample sizes, ranging from lower.bound
## to upper.bound

## Determine a sample size randomly (not a global variable, simply for
## test purposes)
## Between lower and upper bounds set previously
## Give equal weight to each possible sample size in this interval
sample(x = c(lower.bound:upper.bound),
   size = 1,
   prob = c(rep(1/length.ss.interval, length.ss.interval)))

## Specify number of samples to take
n.samples <- 100

## Initiate empty matrix
## 1st column is population (item 1 through item 400)
## 2nd through nth column are all rounds of sampling
dat <- matrix(data = NA,
  nrow = length(pop),
  ncol = n.samples + 1)

dat[,1] <- pop

dat

## Take samples of random sizes
## Record results in columns 2 through n
## 1 = sampled (marked)
## 0 = not sampled (not marked)
for(i in 2:ncol(dat)) {
  a.sample <- sample(x = pop,
 size = sample(x = c(lower.bound:upper.bound),
   size = 1,
   prob = c(rep(1/length.ss.interval,
length.ss.interval))),
 replace = FALSE)
  dat[,i] <- dat[,1] %in% a.sample
}

## How large was each sample size?
apply(X = dat, MARGIN = 2, FUN = sum)
## 1st element is irrelevant
## 2nd element through nth element: sample size for each of the 100 samples

## At this point, all computations can be done using dat

## Create Schnabel dataframe using dat
## Google the Schnabel formula

schnabel.comp <- data.frame(sample = 1:n.samples,
n.sampled = apply(X = dat, MARGIN = 2, FUN =
sum)[2:length(apply(X = dat, MARGIN = 2, FUN = sum))]
)

## First column: which sample, 1-100
## Second column: number selected in that sample


## How many items were previously sampled?
## For 1st sample, it's 0
## For 2nd sample, code is different than for remaining samples

n.prev.sampled <- c(0, rep(NA, n.samples-1))
n.prev.sampled

n.prev.sampled[2] <- sum(ifelse(test = dat[,3] == 1 & dat[,2] == 1,
yes = 1,
no = 0))

n.prev.sampled

for(i in 4:ncol(dat)) {
  n.prev.sampled[i-1] <- sum(ifelse(test = dat[,i] == 1 &
rowSums(dat[,2:(i-1)]) > 0,
yes = 1,
no = 0))
}

schnabel.comp$n.prev.sampled <- n.prev.sampled

## n.newly.sampled: in each sample, how many items were newly sampled?
## i.e., never seen before?
schnabel.comp$n.newly.sampled <- with(schnabel.comp,
  n.sampled - n.prev.sampled)

## cum.sampled: how many total items have you seen?
schnabel.comp$cum.sampled <- c(0,
cumsum(schnabel.comp$n.newly.sampled)[1:(n.samples - 1)])

## numerator of schnabel formula
schnabel.comp$numerator <- with(schnabel.comp,
n.sampled * cum.sampled)

## denominator of Schnabel formula is n.prev.sampled

## pop.estimate -- after each sample (starting with 2nd -- need at least
## two samples)
schnabel.comp$pop.estimate <- NA

for(i in 1:length(schnabel.comp$pop.estimate)) {
  schnabel.comp$pop.estimate[i] <- sum(schnabel.comp$numerator[1:i]) /
sum(schnabel.comp$n.prev.sampled[1:i])
}
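The special-cased second sample and the estimate loop can both be replaced by vectorised column sums. The Schnabel estimate after sample t is sum(C_i * M_i) / sum(R_i) over samples i = 1..t, where C_i is the number caught in sample i (`n.sampled`), M_i the number already marked before it (`cum.sampled`) and R_i the number of recaptures (`n.prev.sampled`). A minimal sketch, assuming the 0/1 layout of `dat` (items in rows, samples in columns); `schnabel.from.dat()` is a hypothetical helper:

```r
## Hypothetical helper (not from the original post): Schnabel estimates
## from a 0/1 capture matrix with rows = items and columns = samples
schnabel.from.dat <- function(m) {
  stopifnot(ncol(m) >= 2)
  caught <- m > 0
  ## seen.before[j, t] is TRUE if item j was caught in any sample before t
  seen.before <- t(apply(m, 1, function(r) c(0, head(cummax(r), -1)))) > 0
  n.sampled      <- colSums(caught)                # C_t
  n.prev.sampled <- colSums(caught & seen.before)  # R_t (recaptures)
  cum.sampled    <- colSums(seen.before)           # M_t (marked beforehand)
  cumsum(n.sampled * cum.sampled) / cumsum(n.prev.sampled)
}

## Tiny worked example: 4 items, 3 samples
m <- cbind(c(1, 1, 0, 0),
           c(0, 1, 1, 0),
           c(1, 0, 1, 1))
schnabel.from.dat(m)   # NaN, then 4, then 13/3
```

Applied to `dat[, -1]`, it should give the same values as `schnabel.comp$pop.estimate`, since it counts recaptures and previously marked items the same way.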


## Plot population estimate after each sample
if (!require("ggplot2")) {install.packages("ggplot2"); require("ggplot2")}
if (!require("scales")) {install.packages("scales"); require("scales")}


small.sample.dat <- schnabel.comp

small.sample <- ggplot(data = small.sample.dat,
   mapping = aes(x = sample, y = pop.estimate)) +
  geom_point(size = 2) +
  geom_line() +
  geom_hline(yintercept = N, col = "red", lwd = 1) +
  coord_cartesian(xlim = c(0, 100), ylim = c(300, 500)) +
  scale_x_continuous(breaks = pretty_breaks(11)) +
  scale_y_continuous(breaks = pretty_breaks(11)) +
  labs(x = "\nSample", y = "Population estimate\n",
   title = 

Re: [R] Rainbow in loop

2017-06-08 Thread WRAY NICHOLAS
Yep fabbo I can then call each vector as a separate element in a list and it
gives the colours...  Thanks aleph-null times Nick
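The list-based approach can be sketched as follows: put each event vector in a named list, size the palette with `rainbow(length(events))`, and extract each vector with `[[ ]]` (single `[ ]` returns a one-element list rather than the numeric vector). A minimal sketch, assuming the event vectors hold times, so they are passed to `abline()` directly instead of being used to index `sec.time`; the name `events` is illustrative. Note that `rainbow(3)` runs red, green, blue, so wiper comes out green and call blue rather than the colours in Nick's example.

```r
## Event times from Nick's example, gathered in a named list
events <- list(horn  = c(7, 23, 52, 67, 81, 90),
               wiper = c(4, 18, 34, 47, 62, 78, 89),
               call  = c(27, 58, 93))

sec.time <- seq(0, 100)
distance <- c(rep(0, 10), rep(1, 5), rep(2, 20), rep(3, 10),
              rep(4, 20), rep(5, 5), rep(6, 31))

## One colour per event type, however many types a given loco has
cols <- rainbow(length(events))
names(cols) <- names(events)

plot(sec.time, distance, type = "l")
for (k in seq_along(events)) {
  ## [[k]] returns the numeric vector itself; [k] would return a sub-list
  abline(v = events[[k]], col = cols[k])
}
legend("topleft", legend = names(events), col = cols, lty = 1)
```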

> 
> On 08 June 2017 at 14:13 Boris Steipe  wrote:
> 
> 
> Does:
> 
> rainbow(3)[1]
> rainbow(3)[2]
> rainbow(3)[3]
> 
> ... solve your issue?
> 
> B.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rainbow in loop

2017-06-08 Thread Boris Steipe
Does:

  rainbow(3)[1]
  rainbow(3)[2]
  rainbow(3)[3]

... solve your issue?

B.


[R] Rainbow in loop

2017-06-08 Thread WRAY NICHOLAS
Hi R folk  I have a distance time graph for a locomotive and at various times
different events occur on board the loco.  I want to put a vertical line on the
speed time graph for each event, but I want to colour each different kind of
event differently to see visually whether there's any pattern to these events
happening.  I could just create a vector of colours and use abline which is easy
obviously, but there's a different number of events for each loco and what I
would like to do is to use rainbow to create a new palette for each graph

To illustrate I have some model code (made-up and v simplified)  Real times are
not necessarily whole numbers so  there's  not a one to one correspondence
between times and index of time elements

sec.time<-seq(0,100)

distance<-c(rep(0,
10),rep(1,5),rep(2,20),rep(3,10),rep(4,20),rep(5,5),rep(6,31))
plot(sec.time,distance,type="l")
horntime<-c(7,23,52,67,81,90)
wipertime<-c(4,18,34,47,62,78,89)
calltime<-c(27,58,93)

abline(v=sec.time[horntime], col="red")
abline(v=sec.time[wipertime], col="blue")
abline(v=sec.time[calltime], col="green")

what I want, in this case as there are three events, is to have horn in red,
wiper in blue and call in green using rainbow with 3.  The problem is that I
can't see how to call rainbow using a sequence (for loop doesn't work) and
putting horn/wiper/call as vectors in a list doesn't work as it's too recursive.
 I suppose that I could put horn/wiper/call etc into a matrix of width the
longest vector and fill the other spaces with dummy -1 but this seems a bit
inelegant and also in principle I'd like to be able to call rainbow in the
desired way as I have to prepare various graphs of different aspects of loco
behaviour

If anyone has any ideas I'd be v grateful

Thanks Nick


Re: [R] odfWeave - A loop of the "same" data

2017-06-01 Thread Henrik Bengtsson
This is what R.rsp (https://cran.r-project.org/package=R.rsp; I'm
the author) and its RSP markup are good at and were designed to handle.
We're using it lots in report generation where we iterate over elements,
e.g. over the 24 chromosomes.  See Section 2.3 in
https://cran.r-project.org/web/packages/R.rsp/vignettes/Dynamic_document_creation_using_RSP.pdf.
RSP is independent of input format - all it requires is that it's
text-based - so you can use RSP-embedded LaTeX, HTML, Markdown, ...,
and even RSP-embedded Sweave, knitr, Rmarkdown (where it then
effectively works as a pre-processor to those formats).

Hope this helps

Henrik
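The "unroll the loop into an intermediate file" idea discussed in this thread can also be sketched with nothing but base R: generate one Sweave chunk per car as text, write it to a file, and pull that file into the master document with `\SweaveInput{}` before weaving. This is a minimal sketch; `make.car.page()` and the file name are hypothetical, it targets Sweave/LaTeX rather than odfWeave, and whether the same chunks work unchanged in an .odt template is an untested assumption.

```r
## Hypothetical helper: Sweave source for one catalogue page (car number n)
make.car.page <- function(n) {
  paste(
    sprintf("\\section*{%s}", rownames(mtcars)[n]),
    "<<fig=TRUE, echo=FALSE>>=",
    ## highlight car n in red (colour 2), all others in black (colour 1)
    sprintf("barplot(mtcars$mpg, col = ifelse(seq_len(nrow(mtcars)) == %d, 2, 1))", n),
    "@",
    "\\clearpage",
    sep = "\n")
}

## One page per car, in row order
pages <- vapply(seq_len(nrow(mtcars)), make.car.page, character(1))

## Write the unrolled body; include it via \SweaveInput{} in the master .Rnw
out <- file.path(tempdir(), "catalogue-body.Rnw")
writeLines(pages, out)
```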



On Thu, Jun 1, 2017 at 9:35 AM, Charles C. Berry  wrote:
> On Thu, 1 Jun 2017, POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS
> FOUNDATION TRUST) via R-help wrote:
>
>> Before I go and do this another way - can I check if anyone has a way of
>> looping through data in odfWeave (or possibly sweave) to do a repeating
>> analysis on subsets of data?
>>
>> For simplicity let's use mtcars dataset in R to explain.  Dataset looks
>> like this:
>>
>>> mtcars
>>
>>                mpg cyl disp  hp drat   wt ...
>> Mazda RX4     21.0   6  160 110 3.90 2.62 ...
>> Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ...
>> Datsun 710    22.8   4  108  93 3.85 2.32 ...
>>   
>>
>> Say I wanted to have a 'catalogue' style report from mtcars, where on each
>> page I would perhaps have the Rowname as a heading and then plot a graph of
>> mpg highlighting that specific car
>>
>> Then add a page break and *do the same for the next car*.  I can manually
>> do this of course, but it is effectively a loop something like this:
>>
>> for (n in seq_along(mtcars$mpg)) {
>> barplot (mtcars$mpg, col=c(rep(1,n-1),2,rep(1,length(mtcars$mpg)-n)))
>> }
>>
>> There is an odfWeave page break function so I can do that sort of thing (I
>> think).  But I don't think I can output more than one image can I? In
>> reality I will want several images and a table per "catalogue" page.
>>
>> At the moment I think I need to create a master odt document, and create
>> individual catalogue pages.  And merge them into one document - but that
>> feels clunky (unless I can script the merge!)
>>
>> Anyone got a better way?
>
>
>
> For a complex template inside a loop, I'd probably do as Jeff suggests and
> use a knitr child document for ease of developing and debugging the
> template.
>
> But for the simple case you describe I'd use a brew script to
> unroll the loop.
>
> You would write your input file as usual, but put a brew script in the
> right place, then run brew on the input file to produce an
> intermediate file that unrolls the loop, then weave the intermediate
> file to get your desired result.  Here is a simple example of such you can
> run in an R session (assuming the brew package is installed) and see the
> results printed out.
>
> --8<---cut here---start->8---
>
> brew::brew(text="
>
> Everything before the loop
>
> <% for (i in 1:10) { %>
> Print the value of i
> <% print(i) %> or better yet
> \\Sexpr{<%= i %>}
> <% } %>
>
> everything after
>
> ")
>
> --8<---cut here---end--->8---
>
> The double backslash is needed in the literal string used here.  If
> you put that script in a file using an editor, you would just use a
> single backslash.
>
> HTH,
>
> Chuck


Re: [R] odfWeave - A loop of the "same" data

2017-06-01 Thread Charles C. Berry

On Thu, 1 Jun 2017, POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION 
TRUST) via R-help wrote:

Before I go and do this another way - can I check if anyone has a way of 
looping through data in odfWeave (or possibly sweave) to do a repeating 
analysis on subsets of data?


For simplicity let's use mtcars dataset in R to explain.  Dataset looks like
this:


mtcars

  mpg cyl disp  hp drat   wt ...
Mazda RX4 21.0   6  160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ...
Datsun 71022.8   4  108  93 3.85 2.32 ...
  

Say I wanted to have a 'catalogue' style report from mtcars, where on 
each page I would perhaps have the Rowname as a heading and then plot a 
graph of mpg highlighting that specific car


Then add a page break and *do the same for the next car*.  I can manually do 
this of course, but it is effectively a loop something like this:

for (n in length(mtcars$mpg)) {
barplot (mtcars$mpg, col=c(rep(1,n-1),2,rep(1,length(mtcars$mpg)-n)))
}

There is a odfWeave page break function so I can do that sort of thing 
(I think).  But I don't think I can output more than one image can I? 
In reality I will want several images and a table per "catalogue" page.


At the moment I think I need to create a master odt document, and create 
individual catalogue pages.  And merge them into one document - but that 
feels clunky (unless I can script the merge!)


Anyone got a better way?



For a complex template inside a loop, I'd probably do as Jeff suggests and 
use a knitr child document for ease of developing and debugging the 
template.


But for the simple case you describe I'd use a brew script to
unroll the loop.

You would write your input file as usual, but put a brew script in the
right place, then run brew on the input file to produce an
intermediate file that unrolls the loop, then weave the intermediate
file to get your desired result.  Here is a simple example of such a
script that you can run in an R session (assuming the brew package is
installed) and see the results printed out.


--8<---cut here---start->8---

brew::brew(text="

Everything before the loop

<% for (i in 1:10) { %>
Print the value of i
<% print(i) %> or better yet
\\Sexpr{<%= i %>}
<% } %>

everything after

")

--8<---cut here---end--->8---

The double backslash is needed in the literal string used here.  If
you put that script in a file using an editor, you would just use a
single backslash.
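Applied to the mtcars catalogue from the question, the same idea might look like the sketch below. It assumes the brew package is installed; the template (a \section* heading plus one Sweave figure chunk per car) is an illustrative assumption, not part of the original post. brew writes the unrolled loop to an intermediate .Rnw file, which would then be woven with Sweave() as a separate step.

```r
# Sketch: use brew to unroll a per-car loop into an intermediate Sweave file.
# The template is hypothetical: <% %> tags are brew, <<>>= ... @ is a Sweave chunk.
tpl <- '
<% for (car in rownames(mtcars)) { %>
\\section*{<%= car %>}
<<fig=TRUE, echo=FALSE>>=
barplot(mtcars$mpg, col = ifelse(rownames(mtcars) == "<%= car %>", 2, 1))
@
<% } %>
'

if (requireNamespace("brew", quietly = TRUE)) {
  rnw_file <- tempfile(fileext = ".Rnw")
  # brew runs the <% %> code and writes the unrolled text to rnw_file
  brew::brew(text = tpl, output = rnw_file)
  rnw <- readLines(rnw_file)
  # Count the generated section headings (one per car)
  print(sum(grepl("\\section*{", rnw, fixed = TRUE)))
  # Next step (not run here): Sweave(rnw_file), then LaTeX on the .tex file
}
```

Because brew only generates text, each generated chunk can hold several plots and a table, and the page breaks can be written into the template itself.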

HTH,

Chuck



Re: [R] odfWeave - A loop of the "same" data

2017-06-01 Thread Jeff Newmiller
I do this regularly with knitr [1]. I have never used odfWeave, but would 
imagine that similar principles apply. 

If you make a child document that assumes that the desired data are stored in 
one or more objects, then you can use a for loop in the master document that 
repeatedly extracts the desired subsets, puts them into the objects the child 
document expects, knits the child document, and then "cat"s the results into 
the master document output.

[1] https://yihui.name/knitr/demo/child/
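A minimal sketch of the child-document approach, assuming knitr is installed. The child template, the object name `car` and the three-car subset are hypothetical; knitr::knit_child() is called from inside a results='asis' chunk of the master document, as described above. Here knitr::knit(text = ...) stands in for knitting a real master .Rmd file, and the child prints a small table rather than a plot so the sketch runs without writing figure files.

```r
# Child template: assumes an object called `car` exists when it is knitted.
child_tpl <- c(
  "## `r car`",
  "",
  "```{r, echo=FALSE}",
  "mtcars[car, c('mpg', 'cyl', 'hp')]",
  "```"
)

# Master document: one results='asis' chunk loops over the cars, knits the
# child once per car, and cat()s each result into the master output.
master <- c(
  "```{r, echo=FALSE, results='asis'}",
  "for (car in rownames(mtcars)[1:3]) {",
  "  res <- knitr::knit_child(text = child_tpl, envir = environment(), quiet = TRUE)",
  "  cat(res, sep = '\\n')",
  "}",
  "```"
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Knitting the master returns the Markdown for all three "pages"
  out <- knitr::knit(text = master, quiet = TRUE)
  cat(out)
}
```

Keeping the per-page layout in the child means it can be developed and debugged on its own with a single `car` value before looping over all of them.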
-- 
Sent from my phone. Please excuse my brevity.

On June 1, 2017 3:55:33 AM PDT, "POLWART,  Calum (COUNTY DURHAM AND DARLINGTON 
NHS FOUNDATION TRUST) via R-help"  wrote:
>Before I go and do this another way - can I check if anyone has a way
>of looping through data in odfWeave (or possibly Sweave) to do a
>repeating analysis on subsets of data?
>
>For simplicity let's use the mtcars dataset in R to explain.  The
>dataset looks like this:
>
>> mtcars
>               mpg cyl disp  hp drat   wt ...
>Mazda RX4     21.0   6  160 110 3.90 2.62 ...
>Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ...
>Datsun 710    22.8   4  108  93 3.85 2.32 ...
>
>Say I wanted to have a 'catalogue' style report from mtcars, where on
>each page I would perhaps have the row name as a heading and then plot a
>graph of mpg highlighting that specific car.
>
>Then add a page break and *do the same for the next car*.  I can
>manually do this of course, but it is effectively a loop something like
>this:
>
>for (n in seq_along(mtcars$mpg)) {
>  barplot(mtcars$mpg, col = c(rep(1, n - 1), 2, rep(1, length(mtcars$mpg) - n)))
>}
>
>There is an odfWeave page break function so I can do that sort of thing
>(I think).  But I don't think I can output more than one image, can I?
>In reality I will want several images and a table per "catalogue" page.
>
>At the moment I think I need to create a master odt document, and
>create individual catalogue pages, and merge them into one document -
>but that feels clunky (unless I can script the merge!)
>
>Anyone got a better way?



[R] odfWeave - A loop of the "same" data

2017-06-01 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
Before I go and do this another way - can I check if anyone has a way of 
looping through data in odfWeave (or possibly Sweave) to do a repeating 
analysis on subsets of data?

For simplicity let's use the mtcars dataset in R to explain.  The dataset looks 
like this:

> mtcars
               mpg cyl disp  hp drat   wt ...
Mazda RX4     21.0   6  160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ...
Datsun 710    22.8   4  108  93 3.85 2.32 ...

Say I wanted to have a 'catalogue' style report from mtcars, where on each page 
I would perhaps have the row name as a heading and then plot a graph of mpg 
highlighting that specific car.

Then add a page break and *do the same for the next car*.  I can manually do 
this of course, but it is effectively a loop something like this:

for (n in seq_along(mtcars$mpg)) {
  barplot(mtcars$mpg, col = c(rep(1, n - 1), 2, rep(1, length(mtcars$mpg) - n)))
}

There is an odfWeave page break function so I can do that sort of thing (I 
think).  But I don't think I can output more than one image, can I?  In reality 
I will want several images and a table per "catalogue" page.

At the moment I think I need to create a master odt document, and create 
individual catalogue pages, and merge them into one document - but that feels 
clunky (unless I can script the merge!)

Anyone got a better way?








Re: [R] nested for loop with data table

2017-05-06 Thread Ek Esawi
Thank you Jeff. Your idea, as I mentioned in my previous posting, did
indeed work. I read somewhere that both data.table and dplyr do a great many
things, and I plan to learn both as much as I can. Suggestions on this list
either get you the answer you are looking for or give you a lead to an answer.

Thanks again

On Thu, May 4, 2017 at 12:04 AM, Jeff Newmiller 
wrote:

> You seem to be unaware of the "aggregate" data processing concept. There
> are many ways to accomplish aggregation; I am not fluent in data.table
> methods, but knowing the concept is the first step.
>
> Perhaps look closely at [1], or Google for data table aggregation yourself?
>
> [1] https://www.r-bloggers.com/efficient-aggregation-and-more-using-data-table/amp/
> --
> Sent from my phone. Please excuse my brevity.
>
> On May 3, 2017 8:17:21 AM PDT, Ek Esawi  wrote:
> >Thank you both Boris and Jim. Thank you, Boris, for advising me to read
> >the posting guide; I had, and I have just read it again.
> >
> >Jim’s idea is exactly what I want; however, I could not pass sset1,
> >sset2, etc. to the j nested loop and collect the results in a vector.
> >
> >Below are my code, my file, and my question, which should be clearer now.
> >The question again: instead of using a separate loop for each of sset1
> >and sset2, I want one nested loop, because I have at least 10 subsets
> >(sset1, sset2, sset3, …, sset10).
> >
> >Thanks again, EK
> >
> >
> >---The code--
> >
> >install.packages("data.table")
> >library(data.table)
> >File1 <-  "C:/Users/SampleData.csv"
> >DT <- fread(File1)
> >sset1 <- DT[Num<10<10]
> >sset2 <- DT[Num>10<15]
> >
> ># Count how many combinations of A,B,C,D,E,F in each subset
> >for ( i in 1:length(sset1)){
> >  aa <- c(sset1[Grade=="A",.N],sset1[Grade=="D",.N])
> >  bb <- c(sset1[Grade=="B",.N],sset1[Grade=="F",.N])
> >  cc <- c(sset1[Grade=="C",.N],sset1[Grade=="A",.N])
> >  counts <- c(aa, bb,cc)
> >}
> >
> >for ( i in 1:length(sset2)){
> >  aa1 <- c(sset2[Grade=="A",.N],sset2[Grade=="D",.N])
> >  bb1 <- c(sset2[Grade=="B",.N],sset2[Grade=="F",.N])
> >  cc1 <- c(sset2[Grade=="C",.N],sset2[Grade=="A",.N])
> >  counts <-  c(aa1,bb1,cc1)
> >}
> >
> >---The File
> >
> >    Num  Color Grade Value    Month Day
> > 1:   1 yellow     A    20      May   1
> > 2:   2  green     B    25     June   2
> > 3:   3  green     A    10    April   3
> > 4:   4  black     A    17   August   3
> > 5:   5    red     C     5 December   5
> > 6:   6 orange     D     0  January  13
> > 7:   7 orange     E    12  January   5
> > 8:   8 orange     F    11 February   8
> > 9:   9 orange     F    99     July  23
> >10:  10 orange     F    70      May   7
> >11:  11  black     A    77     June  11
> >12:  12  green     B    87    April  33
> >13:  13  black     A    79   August   9
> >14:  14  green     A    68 December  14
> >15:  15  black     C    90  January  31
> >16:  16  green     D    79  January  11
> >17:  17  black     E   101 February  17
> >18:  18    red     F    90     July  21
> >19:  19    red     F   112 February  13
> >20:  20    red     F   101     July  20
> >
> >On Tue, May 2, 2017 at 12:35 PM, Ek Esawi  wrote:
> >
> >> I have a huge data file; a sample is listed below. I am using the package
> >> data table to process the file and I am stuck on one issue and need some
> >> feedback. I used fread to create a data table. Then I divided the data
> >> table (named File1) into 10 general subsets using common table commands
> >> such as:
> >>
> >>
> >>
> >> AAA <- File1[Num<5>15]
> >>
> >> BBB <- File1[Num>15<10]
> >>
> >> …..
> >>
> >> …..
> >>
> >> …..
> >>
> >> …..
> >>
> >> …..
> >>
> >> …..
> >>
> >>
> >>
> >> I wanted to divide and count each of the above subsets based on a set of
> >> parameters common to all subsets. I did the following to go through each
> >> subset and it works:
> >>
> >> For (I in 1: length (AAA)) {
> >>
> >>   aa <- c(AAA[color==”green”==”a”,month==”Januray”
> >> .N],[ AAA[color==”green”==”b”& month==”June”’ .N])
> >>
> >> }
> >>
> >>
> >>
> >> The question: I don’t want to have a separate loop for each subset (10
> >> loops). Instead, I was hoping to have 2 nested loops in the form below:
> >>
> >>
> >>
> >> For (I in 1:N)){
> >>
> >>   For (j in 1:M){
> >>
> >>
> >>
> >> }
> >>
> >> }
> >>
> >>
> >>
> >> Sample
> >>
> >> Num  Color Grade Value    Month Day
> >>   1 yellow     A    20      May   1
> >>   2  green     B    25     June   2
> >>   3  green     A    10    April   3
> >>   4  black     A    17   August   3
> >>   5    red     C     5 December   5
> >>   6 orange     D     0  January

Re: [R] nested for loop with data table

2017-05-04 Thread PIKAL Petr
Hi

It is better to present your data with dput(), so that they can be used directly.

> dput(dat)
dat <- structure(list(Num = 1:20, Color = structure(c(5L, 2L, 2L, 1L,
4L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 4L, 4L, 4L
), .Label = c("black", "green", "orange", "red", "yellow"), class = "factor"),
Grade = structure(c(1L, 2L, 1L, 1L, 3L, 4L, 5L, 6L, 6L, 6L,
1L, 2L, 1L, 1L, 3L, 4L, 5L, 6L, 6L, 6L), .Label = c("A",
"B", "C", "D", "E", "F"), class = "factor"), value = c(20L,
25L, 10L, 17L, 5L, 0L, 12L, 11L, 99L, 70L, 77L, 87L, 79L,
68L, 90L, 79L, 101L, 90L, 112L, 101L), Month = structure(c(8L,
7L, 1L, 2L, 3L, 5L, 5L, 4L, 6L, 8L, 7L, 1L, 2L, 3L, 5L, 5L,
4L, 6L, 4L, 6L), .Label = c("April", "August", "December",
"February", "January", "July", "June", "May"), class = "factor"),
Day = c(1L, 2L, 3L, 3L, 5L, 13L, 5L, 8L, 23L, 7L, 11L, 33L,
9L, 14L, 31L, 11L, 17L, 21L, 13L, 20L)), .Names = c("Num",
"Color", "Grade", "value", "Month", "Day"), class = "data.frame",
row.names = c(NA, -20L))
>

I do not know your exact intention, nor do I know data.table commands. You can 
get some summary numbers simply with

table(dat$Grade[dat$Num<10 & dat$Day<10])

A B C D E F
3 1 1 0 1 1

It is probably preferable to obtain logical vectors for Num and Day before 
starting tabulation.
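Building on that one-liner, several subsets can be tabulated at once without writing one loop per subset. A base-R sketch: each subset is stored as a logical vector in a named list, and sapply() tabulates Grade for each one. The two conditions are assumptions standing in for Ek's sset1 and sset2 (their exact conditions are garbled in the archive); fixing the factor levels keeps zero counts in the table.

```r
# Minimal copy of the example data (Num, Grade and Day columns only)
dat <- data.frame(
  Num   = 1:20,
  Grade = factor(c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
                   "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
                 levels = c("A", "B", "C", "D", "E", "F")),
  Day   = c(1, 2, 3, 3, 5, 13, 5, 8, 23, 7, 11, 33, 9, 14, 31, 11, 17, 21, 13, 20)
)

# One logical vector per subset (assumed conditions, for illustration)
subsets <- list(
  sset1 = with(dat, Num < 10 & Day < 10),
  sset2 = with(dat, Num > 10 & Day < 15)
)

# Tabulate Grade within each subset; the result is a Grade x subset matrix
counts <- sapply(subsets, function(keep) table(dat$Grade[keep]))
counts
```

Adding sset3 ... sset10 then means adding entries to the list, not writing more loops.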

Cheers
Petr


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ek Esawi
> Sent: Wednesday, May 3, 2017 5:17 PM
> To: r-help@r-project.org
> Subject: Re: [R] nested for loop with data table
>
> Thank you both Boris and Jim. Thank you, Boris, for advising me to read the
> posting guide; I had, and I have just read it again.
>
> Jim’s idea is exactly what I want; however, I could not pass sset1, sset2,
> etc. to the j nested loop and collect the results in a vector.
>
> Below are my code, my file, and my question, which should be clearer now.
> The question again: instead of using a separate loop for each of sset1 and
> sset2, I want one nested loop, because I have at least 10 subsets
> (sset1, sset2, sset3, …, sset10).
>
> Thanks again, EK
>
>
> ---The code--
>
> install.packages("data.table")
> library(data.table)
> File1 <-  "C:/Users/SampleData.csv"
> DT <- fread(File1)
> sset1 <- DT[Num<10<10]
> sset2 <- DT[Num>10<15]
>
> # Count how many combinations of A,B,C,D,E,F in each subset
> for ( i in 1:length(sset1)){
>   aa <- c(sset1[Grade=="A",.N],sset1[Grade=="D",.N])
>   bb <- c(sset1[Grade=="B",.N],sset1[Grade=="F",.N])
>   cc <- c(sset1[Grade=="C",.N],sset1[Grade=="A",.N])
>   counts <- c(aa, bb,cc)
> }
>
> for ( i in 1:length(sset2)){
>   aa1 <- c(sset2[Grade=="A",.N],sset2[Grade=="D",.N])
>   bb1 <- c(sset2[Grade=="B",.N],sset2[Grade=="F",.N])
>   cc1 <- c(sset2[Grade=="C",.N],sset2[Grade=="A",.N])
>   counts <-  c(aa1,bb1,cc1)
> }
>
> ---The File
>
>     Num  Color Grade Value    Month Day
>  1:   1 yellow     A    20      May   1
>  2:   2  green     B    25     June   2
>  3:   3  green     A    10    April   3
>  4:   4  black     A    17   August   3
>  5:   5    red     C     5 December   5
>  6:   6 orange     D     0  January  13
>  7:   7 orange     E    12  January   5
>  8:   8 orange     F    11 February   8
>  9:   9 orange     F    99     July  23
> 10:  10 orange     F    70      May   7
> 11:  11  black     A    77     June  11
> 12:  12  green     B    87    April  33
> 13:  13  black     A    79   August   9
> 14:  14  green     A    68 December  14
> 15:  15  black     C    90  January  31
> 16:  16  green     D    79  January  11
> 17:  17  black     E   101 February  17
> 18:  18    red     F    90     July  21
> 19:  19    red     F   112 February  13
> 20:  20    red     F   101     July  20
>
> On Tue, May 2, 2017 at 12:35 PM, Ek Esawi <esaw...@gmail.com> wrote:
>
> > I have a huge data file; a sample is listed below. I am using the
> > package data table to process the file and I am stuck on one issue and
> > need some feedback. I used fread to create a data table. Then I
> > divided the data table (named File1) into 10 general subsets using
> > common table commands such as:
> >
> >
> >
> > AAA <- File1[Num<5>15]
> >
> > BBB <- File1[Num>15<10]
> >
> 

Re: [R] nested for loop with data table

2017-05-03 Thread Jeff Newmiller
You seem to be unaware of the "aggregate" data processing concept. There are 
many ways to accomplish aggregation; I am not fluent in data.table methods, 
but knowing the concept is the first step.

Perhaps look closely at [1], or Google for data table aggregation yourself? 

[1] https://www.r-bloggers.com/efficient-aggregation-and-more-using-data-table/amp/
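As one illustration of the aggregation idea, assuming the data.table package is installed: label each row with the subset it belongs to, then count by subset and Grade in a single grouped call instead of one loop per subset. The labels and cut points are hypothetical stand-ins for sset1 ... sset10, and this labelling only works when the subsets do not overlap.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT <- data.table(
    Num   = 1:20,
    Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
              "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
    Day   = c(1, 2, 3, 3, 5, 13, 5, 8, 23, 7, 11, 33, 9, 14, 31, 11, 17, 21, 13, 20)
  )
  # Tag each row with the (hypothetical) subset it belongs to; NA = none
  DT[, grp := NA_character_]
  DT[Num < 10 & Day < 10, grp := "sset1"]
  DT[Num > 10 & Day < 15, grp := "sset2"]
  # One grouped count replaces the per-subset loops
  agg <- DT[!is.na(grp), .N, by = .(grp, Grade)]
  setorder(agg, grp, Grade)
  print(agg)
}
```

Grades with no rows in a subset (for example D in sset1) are simply absent from the result rather than shown as 0.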
-- 
Sent from my phone. Please excuse my brevity.

On May 3, 2017 8:17:21 AM PDT, Ek Esawi  wrote:
>Thank you both Boris and Jim. Thank you, Boris, for advising me to read
>the posting guide; I had, and I have just read it again.
>
>Jim’s idea is exactly what I want; however, I could not pass sset1,
>sset2, etc. to the j nested loop and collect the results in a vector.
>
>Below are my code, my file, and my question, which should be clearer now.
>The question again: instead of using a separate loop for each of sset1
>and sset2, I want one nested loop, because I have at least 10 subsets
>(sset1, sset2, sset3, …, sset10).
>
>Thanks again, EK
>
>
>---The code--
>
>install.packages("data.table")
>library(data.table)
>File1 <-  "C:/Users/SampleData.csv"
>DT <- fread(File1)
>sset1 <- DT[Num<10<10]
>sset2 <- DT[Num>10<15]
>
># Count how many combinations of A,B,C,D,E,F in each subset
>for ( i in 1:length(sset1)){
>  aa <- c(sset1[Grade=="A",.N],sset1[Grade=="D",.N])
>  bb <- c(sset1[Grade=="B",.N],sset1[Grade=="F",.N])
>  cc <- c(sset1[Grade=="C",.N],sset1[Grade=="A",.N])
>  counts <- c(aa, bb,cc)
>}
>
>for ( i in 1:length(sset2)){
>  aa1 <- c(sset2[Grade=="A",.N],sset2[Grade=="D",.N])
>  bb1 <- c(sset2[Grade=="B",.N],sset2[Grade=="F",.N])
>  cc1 <- c(sset2[Grade=="C",.N],sset2[Grade=="A",.N])
>  counts <-  c(aa1,bb1,cc1)
>}
>
>---The File
>
>    Num  Color Grade Value    Month Day
> 1:   1 yellow     A    20      May   1
> 2:   2  green     B    25     June   2
> 3:   3  green     A    10    April   3
> 4:   4  black     A    17   August   3
> 5:   5    red     C     5 December   5
> 6:   6 orange     D     0  January  13
> 7:   7 orange     E    12  January   5
> 8:   8 orange     F    11 February   8
> 9:   9 orange     F    99     July  23
>10:  10 orange     F    70      May   7
>11:  11  black     A    77     June  11
>12:  12  green     B    87    April  33
>13:  13  black     A    79   August   9
>14:  14  green     A    68 December  14
>15:  15  black     C    90  January  31
>16:  16  green     D    79  January  11
>17:  17  black     E   101 February  17
>18:  18    red     F    90     July  21
>19:  19    red     F   112 February  13
>20:  20    red     F   101     July  20
>
>On Tue, May 2, 2017 at 12:35 PM, Ek Esawi  wrote:
>
>> I have a huge data file; a sample is listed below. I am using the package
>> data table to process the file and I am stuck on one issue and need some
>> feedback. I used fread to create a data table. Then I divided the data
>> table (named File1) into 10 general subsets using common table commands
>> such as:
>>
>>
>>
>> AAA <- File1[Num<5>15]
>>
>> BBB <- File1[Num>15<10]
>>
>> …..
>>
>> …..
>>
>> …..
>>
>> …..
>>
>> …..
>>
>> …..
>>
>>
>>
>> I wanted to divide and count each of the above subsets based on a set of
>> parameters common to all subsets. I did the following to go through each
>> subset and it works:
>>
>> For (I in 1: length (AAA)) {
>>
>>   aa <- c(AAA[color==”green”==”a”,month==”Januray”
>> .N],[ AAA[color==”green”==”b”& month==”June”’ .N])
>>
>> }
>>
>>
>>
>> The question: I don’t want to have a separate loop for each subset (10
>> loops). Instead, I was hoping to have 2 nested loops in the form below:
>>
>>
>>
>> For (I in 1:N)){
>>
>>   For (j in 1:M){
>>
>>
>>
>> }
>>
>> }
>>
>>
>>
>> Sample
>>
>> Num  Color Grade Value    Month Day
>>   1 yellow     A    20      May   1
>>   2  green     B    25     June   2
>>   3  green     A    10    April   3
>>   4  black     A    17   August   3
>>   5    red     C     5 December   5
>>   6 orange     D     0  January  13
>>   7 orange     E    12  January   5
>>   8 orange     F    11 February   8
>>   9 orange     F    99     July  23
>>  10 orange     F    70      May   7
>>  11  black     A    77     June  11
>>  12  green     B    87    April  33
>>  13  black     A    79   August   9
>>  14  green     A    68 December  14
>>  15  black     C    90  January  31
>>  16  green     D    79  January  11
>>  17  black     E   101 February  17
>>  18    red     F    90     July  21
>>  19    red     F   112 February  13
>>  20    red     F   101     July  20

Re: [R] nested for loop with data table

2017-05-03 Thread Ek Esawi
Thank you both Boris and Jim. Thank you, Boris, for advising me to read the
posting guide; I had, and I have just read it again.

Jim’s idea is exactly what I want; however, I could not pass sset1, sset2,
etc. to the j nested loop and collect the results in a vector.

Below are my code, my file, and my question, which should be clearer now.
The question again: instead of using a separate loop for each of sset1 and
sset2, I want one nested loop, because I have at least 10 subsets
(sset1, sset2, sset3, …, sset10).

Thanks again, EK


---The code--

install.packages("data.table")
library(data.table)
File1 <-  "C:/Users/SampleData.csv"
DT <- fread(File1)
sset1 <- DT[Num<10<10]
sset2 <- DT[Num>10<15]

# Count how many combinations of A,B,C,D,E,F in each subset
for ( i in 1:length(sset1)){
  aa <- c(sset1[Grade=="A",.N],sset1[Grade=="D",.N])
  bb <- c(sset1[Grade=="B",.N],sset1[Grade=="F",.N])
  cc <- c(sset1[Grade=="C",.N],sset1[Grade=="A",.N])
  counts <- c(aa, bb,cc)
}

for ( i in 1:length(sset2)){
  aa1 <- c(sset2[Grade=="A",.N],sset2[Grade=="D",.N])
  bb1 <- c(sset2[Grade=="B",.N],sset2[Grade=="F",.N])
  cc1 <- c(sset2[Grade=="C",.N],sset2[Grade=="A",.N])
  counts <-  c(aa1,bb1,cc1)
}
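A sketch of the single loop being asked for, in base R and under stated assumptions: keep the subsets in one named list rather than as separate sset1, sset2, ... variables, and apply one function to every element. The data frame below is a cut-down stand-in for what fread() returns, the subset conditions are guesses, and count_pairs() is a hypothetical helper reproducing the aa/bb/cc counts from the code above.

```r
# Hypothetical stand-in for the data read with fread(); only the columns used
DT <- data.frame(
  Num   = 1:20,
  Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
            "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
  Day   = c(1, 2, 3, 3, 5, 13, 5, 8, 23, 7, 11, 33, 9, 14, 31, 11, 17, 21, 13, 20)
)

# Keep the subsets in one named list instead of sset1, sset2, ... variables
ssets <- list(
  sset1 = DT[DT$Num < 10 & DT$Day < 10, ],
  sset2 = DT[DT$Num > 10 & DT$Day < 15, ]
)

# Count the grade pairs used above: (A, D), (B, F), (C, A)
count_pairs <- function(d) {
  n <- function(g) sum(d$Grade == g)
  c(n("A"), n("D"), n("B"), n("F"), n("C"), n("A"))
}

# One loop over all subsets; each column of the result is one subset
counts <- sapply(ssets, count_pairs)
counts
```

The same list-based pattern should carry over to data.table subsets, since count_pairs() only uses `$` and `==`.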

---The File

    Num  Color Grade Value    Month Day
 1:   1 yellow     A    20      May   1
 2:   2  green     B    25     June   2
 3:   3  green     A    10    April   3
 4:   4  black     A    17   August   3
 5:   5    red     C     5 December   5
 6:   6 orange     D     0  January  13
 7:   7 orange     E    12  January   5
 8:   8 orange     F    11 February   8
 9:   9 orange     F    99     July  23
10:  10 orange     F    70      May   7
11:  11  black     A    77     June  11
12:  12  green     B    87    April  33
13:  13  black     A    79   August   9
14:  14  green     A    68 December  14
15:  15  black     C    90  January  31
16:  16  green     D    79  January  11
17:  17  black     E   101 February  17
18:  18    red     F    90     July  21
19:  19    red     F   112 February  13
20:  20    red     F   101     July  20

On Tue, May 2, 2017 at 12:35 PM, Ek Esawi  wrote:

> I have a huge data file; a sample is listed below. I am using the package
> data table to process the file and I am stuck on one issue and need some
> feedback. I used fread to create a data table. Then I divided the data
> table (named File1) into 10 general subsets using common table commands
> such as:
>
>
>
> AAA <- File1[Num<5>15]
>
> BBB <- File1[Num>15<10]
>
> …..
>
> …..
>
> …..
>
> …..
>
> …..
>
> …..
>
>
>
> I wanted to divide and count each of the above subsets based on a set of
> parameters common to all subsets. I did the following to go through each
> subset and it works:
>
> For (I in 1: length (AAA)) {
>
>   aa <- c(AAA[color==”green”==”a”,month==”Januray”
> .N],[ AAA[color==”green”==”b”& month==”June”’ .N])
>
> }
>
>
>
> The question: I don’t want to have a separate loop for each subset (10
> loops). Instead, I was hoping to have 2 nested loops in the form below:
>
>
>
> For (I in 1:N)){
>
>   For (j in 1:M){
>
>
>
> }
>
> }
>
>
>
> Sample
>
> Num  Color Grade Value    Month Day
>   1 yellow     A    20      May   1
>   2  green     B    25     June   2
>   3  green     A    10    April   3
>   4  black     A    17   August   3
>   5    red     C     5 December   5
>   6 orange     D     0  January  13
>   7 orange     E    12  January   5
>   8 orange     F    11 February   8
>   9 orange     F    99     July  23
>  10 orange     F    70      May   7
>  11  black     A    77     June  11
>  12  green     B    87    April  33
>  13  black     A    79   August   9
>  14  green     A    68 December  14
>  15  black     C    90  January  31
>  16  green     D    79  January  11
>  17  black     E   101 February  17
>  18    red     F    90     July  21
>  19    red     F   112 February  13
>  20    red     F   101     July  20


Re: [R] nested for loop with data table

2017-05-02 Thread Jim Lemon
Hi Ek,
I think you want your example to look like this:

Sample<-read.table(text=
"Num Color Grade Value Month Day
1 yellow A 20 May 1
2 green B 25 June 2
3 green A 10 April 3
4 black A 17 August 3
5 red C 5 December 5
6 orange D 0 January 13
7 orange E 12 January 5
8 orange F 11 February 8
9 orange F 99 July 23
10 orange F 70 May 7
11 black A 77 June 11
12 green B 87 April 33
13 black A 79 August 9
14 green A 68 December 14
15 black C 90 January 31
16 green D 79 January 11
17 black E 101 February 17
18 red F 90 July 21
19 red F 112 February 13
20 red F 101 July 20",
header=TRUE)
AAA<-Sample[Sample$Num < 5 & Sample$Day < 3,]
BBB<-Sample[Sample$Num > 15 & Sample$Day > 13,]
for(i in 1:length(AAA)) {
 for(j in 1:length(BBB)) {
  ...
 }
}

except in data.table notation. However, I can't work out what you want
to do in the loop.
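One detail worth checking in loops like the one above: length() of a data frame is its number of columns, not rows, so 1:length(AAA) iterates over columns. A short base-R sketch with a cut-down copy of the Sample data (only three columns kept, for brevity); putting the subsets in a list and using seq_along() gives one loop over the subsets themselves.

```r
# Cut-down copy of the Sample data above (three columns only)
Sample <- data.frame(
  Num   = 1:20,
  Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
            "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
  Day   = c(1, 2, 3, 3, 5, 13, 5, 8, 23, 7, 11, 33, 9, 14, 31, 11, 17, 21, 13, 20)
)
AAA <- Sample[Sample$Num < 5 & Sample$Day < 3, ]
BBB <- Sample[Sample$Num > 15 & Sample$Day > 13, ]

length(AAA)  # 3: the number of columns
nrow(AAA)    # 2: the number of rows

# To loop over the subsets themselves, keep them in a list
ssets <- list(AAA = AAA, BBB = BBB)
for (i in seq_along(ssets)) {
  cat(names(ssets)[i], "has", nrow(ssets[[i]]), "rows\n")
}
```

seq_along() also yields an empty loop for an empty list, where 1:length(x) would iterate over c(1, 0).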

Jim


On Wed, May 3, 2017 at 2:35 AM, Ek Esawi  wrote:
> I have a huge data file; a sample is listed below. I am using the package
> data table to process the file and I am stuck on one issue and need some
> feedback. I used fread to create a data table. Then I divided the data
> table (named File1) into 10 general subsets using common table commands
> such as:
>
>
>
> AAA <- File1[Num<5>15]
>
> BBB <- File1[Num>15<10]
>
> …..
>
> …..
>
> …..
>
> …..
>
> …..
>
> …..
>
>
>
> I wanted to divide and count each of the above subsets based on a set of
> parameters common to all subsets. I did the following to go through each
> subset and it works:
>
> For (I in 1: length (AAA)) {
>
>   aa <- c(AAA[color==”green”==”a”,month==”Januray” .N],[
> AAA[color==”green”==”b”& month==”June”’ .N])
>
> }
>
>
>
> The question: I don’t want to have a separate loop for each subset (10
> loops). Instead, I was hoping to have 2 nested loops in the form below:
>
>
>
> For (I in 1:N)){
>
>   For (j in 1:M){
>
>
>
> }
>
> }
>
>
>
> Sample
>
> Num  Color Grade Value    Month Day
>   1 yellow     A    20      May   1
>   2  green     B    25     June   2
>   3  green     A    10    April   3
>   4  black     A    17   August   3
>   5    red     C     5 December   5
>   6 orange     D     0  January  13
>   7 orange     E    12  January   5
>   8 orange     F    11 February   8
>   9 orange     F    99     July  23
>  10 orange     F    70      May   7
>  11  black     A    77     June  11
>  12  green     B    87    April  33
>  13  black     A    79   August   9
>  14  green     A    68 December  14
>  15  black     C    90  January  31
>  16  green     D    79  January  11
>  17  black     E   101 February  17
>  18    red     F    90     July  21
>  19    red     F   112 February  13
>  20    red     F   101     July  20


Re: [R] nested for loop with data table

2017-05-02 Thread Boris Steipe
There's a lot that doesn't make sense here. I think what you need to do is 
produce a small, reproducible example, post that with dput() and state your 
question more clearly - including what you have tried and what didn't work. 
You'll probably be amazed how quickly you will get good advice if 
_you_only_follow_the_posting_guide_.

B.




> On May 2, 2017, at 12:35 PM, Ek Esawi  wrote:
> 
> I have a huge data file; a sample is listed below. I am using the package
> data table to process the file and I am stuck on one issue and need some
> feedback. I used fread to create a data table. Then I divided the data
> table (named File1) into 10 general subsets using common table commands
> such as:
> 
> 
> 
> AAA <- File1[Num<5>15]
> 
> BBB <- File1[Num>15<10]
> 
> …..
> 
> …..
> 
> …..
> 
> …..
> 
> …..
> 
> …..
> 
> 
> 
> I wanted to divide and count each of the above subsets based on a set of
> parameters common to all subsets. I did the following to go through each
> subset and it works:
> 
> For (I in 1: length (AAA)) {
> 
>  aa <- c(AAA[color==”green”==”a”,month==”Januray” .N],[
> AAA[color==”green”==”b”& month==”June”’ .N])
> 
> }
> 
> 
> 
> The question: I don’t want to have a separate loop for each subset (10
> loops). Instead, I was hoping to have 2 nested loops in the form below:
> 
> 
> 
> For (I in 1:N)){
> 
>  For (j in 1:M){
> 
> 
> 
> }
> 
> }
> 
> 
> 
> Sample
>
> Num  Color Grade Value    Month Day
>   1 yellow     A    20      May   1
>   2  green     B    25     June   2
>   3  green     A    10    April   3
>   4  black     A    17   August   3
>   5    red     C     5 December   5
>   6 orange     D     0  January  13
>   7 orange     E    12  January   5
>   8 orange     F    11 February   8
>   9 orange     F    99     July  23
>  10 orange     F    70      May   7
>  11  black     A    77     June  11
>  12  green     B    87    April  33
>  13  black     A    79   August   9
>  14  green     A    68 December  14
>  15  black     C    90  January  31
>  16  green     D    79  January  11
>  17  black     E   101 February  17
>  18    red     F    90     July  21
>  19    red     F   112 February  13
>  20    red     F   101     July  20


[R] nested for loop with data table

2017-05-02 Thread Ek Esawi
I have a huge data file; a sample is listed below. I am using the package
data table to process the file and I am stuck on one issue and need some
feedback. I used fread to create a data table. Then I divided the data
table (named File1) into 10 general subsets using common table commands
such as:



AAA <- File1[Num<5>15]

BBB <- File1[Num>15<10]

…..



I wanted to divide and count each of the above subsets based on a set of
parameters common to all subsets. I did the following to go through each
subset and it works:

For (I in 1: length (AAA)) {

  aa <- c(AAA[color==”green”==”a”,month==”Januray” .N],[
AAA[color==”green”==”b”& month==”June”’ .N])

}



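The question: I don’t want to have a separate loop for each subset (10
loops). Instead, I was hoping to have 2 nested loops of the form below:

for (i in 1:N) {      # i indexes the N subsets
  for (j in 1:M) {    # j indexes the M criteria
    # count the rows of subset i that match criterion j
  }
}
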
Sample:

Num  Color   Grade  Value  Month     Day
1    yellow  A      20     May       1
2    green   B      25     June      2
3    green   A      10     April     3
4    black   A      17     August    3
5    red     C      5      December  5
6    orange  D      0      January   13
7    orange  E      12     January   5
8    orange  F      11     February  8
9    orange  F      99     July      23
10   orange  F      70     May       7
11   black   A      77     June      11
12   green   B      87     April     33
13   black   A      79     August    9
14   green   A      68     December  14
15   black   C      90     January   31
16   green   D      79     January   11
17   black   E      101    February  17
18   red     F      90     July      21
19   red     F      112    February  13
20   red     F      101    July      20

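One way to get the effect of the two nested loops without writing them out is to keep the subsets in a named list and the shared criteria in a small data frame, then apply over both. This is a base R sketch under stated assumptions: File1 is rebuilt here as a plain data.frame from the sample; the subset boundaries (cut points 0, 5, 10, 15, 20 on Num) are hypothetical, because the original subsetting conditions are garbled; count_matches() is a made-up helper; and the third criterion row is added only for illustration.

```r
# Sample data from the post (only the columns used below)
File1 <- data.frame(
  Num   = 1:20,
  Color = c("yellow", "green", "green", "black", "red", "orange", "orange",
            "orange", "orange", "orange", "black", "green", "black", "green",
            "black", "green", "black", "red", "red", "red"),
  Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
            "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
  Month = c("May", "June", "April", "August", "December", "January",
            "January", "February", "July", "May", "June", "April", "August",
            "December", "January", "January", "February", "July",
            "February", "July"),
  stringsAsFactors = FALSE
)

# Hypothetical subsets: Num in (0,5], (5,10], (10,15], (15,20]
subsets <- split(File1, cut(File1$Num, breaks = c(0, 5, 10, 15, 20)))

# Criteria shared by all subsets, one per row (third row is illustrative)
criteria <- data.frame(Color = c("green", "green", "black"),
                       Grade = c("A", "B", "A"),
                       Month = c("January", "June", "August"),
                       stringsAsFactors = FALSE)

# Number of rows of d that match the one-row criterion crit
count_matches <- function(d, crit) {
  sum(d$Color == crit$Color & d$Grade == crit$Grade & d$Month == crit$Month)
}

# Outer "loop" over subsets, inner "loop" over criteria
counts <- sapply(subsets, function(d)
  vapply(seq_len(nrow(criteria)),
         function(j) count_matches(d, criteria[j, ]), integer(1)))
rownames(counts) <- paste(criteria$Color, criteria$Grade, criteria$Month)
print(counts)
```

Each column of counts is one subset and each row one criterion, i.e. the N x M table that the two nested loops would fill in.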

Re: [R] Creating a loop with code from the mblm package

2016-09-04 Thread Bert Gunter
Please go through an R tutorial or two before posting further here.
There are many good ones on the web. Some recommendations can be found
here:
https://www.rstudio.com/online-learning/

Your question:

"And my Year column is the first column in my csv file, which I
thought made it column 0. Am I mistaken, is it supposed to be column
1?"

is absolutely basic, and indicates that you have not yet made much
effort to learn R. The inevitable result, of course, is confusion and
errors of the sort you describe.

The answer is that R indices start at 1, but there is a great deal
more to them than that, which tutorials would tell you about.
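A small illustration of the indexing point, using a hypothetical data frame with Year in the first column (the real CSV is not shown in the thread, and the column names A, B, C are made up):

```r
# Hypothetical data: Year first, then three day-of-year columns
mydata <- data.frame(Year = 2001:2005,
                     A = c(10, 12, 11, 15, 14),
                     B = c(40, 38, 41, 37, 36),
                     C = c(100, 99, 101, 97, 98))

mydata[, 1]                      # first column (Year): indexing starts at 1
ncol(mydata[, 0])                # index 0 selects no columns, so this is 0
which(names(mydata) == "Year")   # a column's position looked up by name: 1
setdiff(names(mydata), "Year")   # the response columns, by name
```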

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Sep 4, 2016 at 7:10 AM, Bailey Hewitt <bails...@hotmail.com> wrote:
> Hi Jim,
>
>
> Thank you for the suggestion. Unfortunately, when I tried this it gave me the
> same error as I was getting before. I was wondering the same thing as you:
> given the way my function is set up, is it even possible to iterate
> through columns in that type of function? And my Year column is the first
> column in my csv file, which I thought made it column 0. Am I mistaken, is it
> supposed to be column 1?
>
>
> Thanks!
>
>
> Bailey
>
>


Re: [R] Creating a loop with code from the mblm package

2016-09-04 Thread Jim Lemon
Hi Bailey,
Treat it as a guess, but try this:

for (i in c(1:3)){
 y<-mydata[,i]
 x <- mblm(y ~ Year, mydata, repeated = FALSE)
 print(x)
}

I'm not sure that you can mix indexed columns with column names. Also,
Year is column 4, no?
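A sketch of a possible fix, resting on an assumption: mblm() appears to look up the variables named in the formula inside the data frame, so a term such as mydata[,i] (or y, which is not a column of mydata) is not found and the fit receives empty vectors, which would be consistent with the "vector [0]" in the error. Building the formula from real column names with reformulate() sidesteps that. The data frame and its columns A, B, C are hypothetical, and the loop calls mblm only if the package is installed.

```r
# Hypothetical stand-in for the CSV: Year plus three day-of-year columns
mydata <- data.frame(Year = 2001:2008,
                     A = c(120, 118, 121, 115, 117, 112, 114, 110),
                     B = c(300, 305, 302, 309, 311, 310, 315, 318),
                     C = c(60, 61, 59, 62, 60, 63, 64, 62))

responses <- setdiff(names(mydata), "Year")  # every column except Year
fits <- list()
for (col in responses) {
  # e.g. A ~ Year: both terms are real column names of mydata
  fml <- reformulate("Year", response = col)
  if (requireNamespace("mblm", quietly = TRUE)) {
    fits[[col]] <- mblm::mblm(fml, mydata, repeated = FALSE)
    print(fits[[col]])
  } else {
    message("mblm is not installed; would fit ", deparse(fml))
  }
}
```

Looping over names rather than positions also avoids the column 0 versus column 1 question altogether.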

Jim
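For a cross-check that does not depend on mblm, the Theil-Sen slope is the median of the slopes between all pairs of points. The base R sketch below computes it directly; theil_sen_slope() is a hypothetical helper, it skips pairs with equal x, and it does not reproduce mblm's intercept or its exact handling of ties, so its results should be compared with mblm's rather than assumed identical. The data frame at the end is made up.

```r
# Theil-Sen slope: the median of the slopes over all pairs of points
theil_sen_slope <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)        # keep complete (x, y) pairs only
  x <- x[ok]
  y <- y[ok]
  ij <- combn(length(x), 2)          # every index pair i < j, one per column
  dx <- x[ij[2, ]] - x[ij[1, ]]
  dy <- y[ij[2, ]] - y[ij[1, ]]
  keep <- dx != 0                    # pairs with equal x define no slope
  median(dy[keep] / dx[keep])
}

# Robust to the outlier: six of the ten pairwise slopes equal 1
theil_sen_slope(1:5, c(1, 2, 3, 4, 100))   # 1

# Hypothetical data frame: one slope per response column
mydata <- data.frame(Year = 2001:2006,
                     A = c(10, 12, 11, 15, 14, 16),
                     B = c(40, 38, 41, 37, 36, 35),
                     C = c(100, 99, 101, 97, 98, 96))
slopes <- sapply(mydata[setdiff(names(mydata), "Year")], theil_sen_slope,
                 x = mydata$Year)
print(slopes)
```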


On Sun, Sep 4, 2016 at 11:43 AM, Bailey Hewitt wrote:
> Hello,
>
>
> I am a novice in coding in R and have come across an error I am having a hard 
> time fixing. I am trying to use the mblm package to run a Theil-Sen linear 
> model. The code for this function is:
>
> mblm(Y ~ X, dataframe, repeated = FALSE)
>
> My goal is to put this into a loop so that I can calculate the Theil-Sen
> slope of each column in my CSV file. The file contains one column of years
> (x values) and 3 columns of days of the year (y values). All columns are the
> same length. The code I currently have is:
>
> read.csv("~/Documents/NH- Lake Mendota_SenSlope_Data2.csv", header = TRUE, sep = ",")
>
> mydata= read.csv("~/Documents/NH- Lake Mendota_SenSlope_Data2.csv", header = TRUE, sep = ",")
>
> attach(mydata)
>
> install.packages("mblm")
> library("mblm")
>
> for (i in c(1:3)){
>   x <- mblm(mydata[,i] ~ Year, mydata, repeated = FALSE)
>   print(x)
> }
>
> Which gives me the following error:
>
> Error in names(res$residuals) = as.character(1:length(res$residuals)) :
>   'names' attribute [2] must be the same length as the vector [0]
>
> I cannot seem to solve this, although as I understand it, it is an error
> that I am causing in the mblm package. If anyone has any insight into how I
> could start fixing this, that would be greatly appreciated!
>
>
> Bailey