
Re: [R] Looping

2024-02-20 Thread Steven Yen



Steven from iPhone

> On Feb 19, 2024, at 4:56 PM, Steven Yen  wrote:
> 
> Thanks to all. Glad there are many options.
> 
> Steven from iPhone
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Looping

2024-02-19 Thread Marc Girondot via R-help

In my package HelpersMG, I have included a function that reads all the files 
of a folder in one call and stores them in a list:


read_folder(
  folder = try(file.choose(), silent = TRUE),
  file = NULL,
  wildcard = "*.*",
  read = read.delim,
  ...
)

In your case, for example:

library("HelpersMG")
data_list <- read_folder(folder=".", file=paste0("data", 
as.character(1:24),".csv"), read=read.csv)

data_df <-   do.call("rbind", data_list)

Marc


Re: [R] Looping

2024-02-18 Thread Rui Barradas



Hello,

Here is a way of reading the files in a *apply loop. The file names are 
either obtained from the file system (list.files) or built with a string 
formatting function (sprintf).



# file_names_vec <- list.files(pattern = "data\\d+\\.csv")
file_names_vec <- sprintf("data%d.csv", 1:24)
data_list <- sapply(file_names_vec, read.csv, simplify = FALSE)

# access the 1st data.frame
data_list[[1L]]
# same as above
data_list[["data1.csv"]]
# same as above
data_list$data1.csv
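
If the data frames share the same columns, the list can afterwards be stacked into one data frame while recording which file each row came from. The following is only a sketch under assumptions: it writes three small CSV files to a temporary folder so that it runs anywhere, and the folder, the columns `x` and `y`, and the `source` column are made up for the illustration.

```r
# Create three small CSV files in a temporary folder (stand-ins for data1.csv, ...)
demo_dir <- tempfile("looping_demo_")
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(data.frame(x = i, y = i * 10),
            file.path(demo_dir, sprintf("data%d.csv", i)),
            row.names = FALSE)
}

# Read them into a named list, as in the reply above
file_names_vec <- file.path(demo_dir, sprintf("data%d.csv", 1:3))
data_list <- sapply(file_names_vec, read.csv, simplify = FALSE)
names(data_list) <- basename(file_names_vec)

# Add a column with the source file name, then stack the pieces
tagged <- Map(function(df, nm) cbind(source = nm, df),
              data_list, names(data_list))
data_all <- do.call(rbind, tagged)
rownames(data_all) <- NULL
```

Here `data_all` has one row per row of the input files, with `source` identifying the file.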


Hope this helps,

Rui Barradas




Re: [R] Looping

2024-02-18 Thread Richard O'Keefe

f <- function (filename) {
  data<- read.csv(filename)
  ..
}
for (filename in paste0("data", 1:24, ".csv")) f(filename)

Depending on what exactly you have in your file system,

for (filename in system("ls data*.csv", TRUE)) f(filename)

might work.
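
The `ls` call depends on a Unix-like shell being available. A portable variant can be sketched with base R's `Sys.glob()`; the example also orders the names by their numeric part, because alphabetical order would put data10.csv before data2.csv. The temporary folder and the `id` column are assumptions made only for this demonstration.

```r
# Demo files in a temporary folder (hypothetical stand-ins for the real data)
demo_dir <- tempfile("glob_demo_")
dir.create(demo_dir)
for (i in c(1, 2, 10)) {
  write.csv(data.frame(id = i),
            file.path(demo_dir, sprintf("data%d.csv", i)),
            row.names = FALSE)
}

# Expand the wildcard without calling the shell
files <- Sys.glob(file.path(demo_dir, "data*.csv"))

# Order by the number embedded in the file name
file_number <- as.integer(sub("^data([0-9]+)\\.csv$", "\\1", basename(files)))
files <- files[order(file_number)]

# Process each file with a function, as in the loop above
f <- function(filename) {
  data <- read.csv(filename)
  as.numeric(data$id)
}
ids <- vapply(files, f, numeric(1), USE.NAMES = FALSE)
```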


Re: [R] Looping

2024-02-18 Thread avi.e.gross

Steven,

It depends what you want to do. What you are showing seems to replace the 
values stored in "data" each time.

Many kinds of loops will do that, with one simple way being to store all the 
filenames in a list and loop on the contents of the list as arguments to 
read.csv.

Since you show filenames as having a number from 1 to 24 in the middle, you can 
make such a vector using paste().

A somewhat related question is if you want to concatenate all the data into one 
larger data.frame. 
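
As a rough sketch of these points (not the only way to do it), the loop below still assigns to `data` on every pass, but also keeps each result in a list so nothing is lost, and then concatenates the pieces. The temporary folder, the three demo files and the column `x` are assumptions made so the example is self-contained.

```r
# Build the file names with paste0() and create small demo files
demo_dir <- tempfile("loop_demo_")
dir.create(demo_dir)
file_names <- file.path(demo_dir, paste0("data", 1:3, ".csv"))
for (k in seq_along(file_names)) {
  write.csv(data.frame(x = k), file_names[k], row.names = FALSE)
}

# Loop over the names, keeping every data frame in a list
results <- vector("list", length(file_names))
for (k in seq_along(file_names)) {
  data <- read.csv(file_names[k])  # replaced on every pass
  results[[k]] <- data             # kept for later use
}

# One larger data frame, assuming every file has the same columns
combined <- do.call(rbind, results)
```

After the loop, `data` holds only the last file, while `results` holds all of them.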



Re: [R] Looping

2024-02-18 Thread Peter Langfelder

Try

for (ind in 1:24)
{
   data = read.csv(paste0("data", ind, ".csv"))
   ...
}


Peter


[R] Looping

2024-02-18 Thread Steven Yen

I need to read csv files repeatedly, named data1.csv, data2.csv,… data24.csv, 
24 altogether. That is, 

data<-read.csv("data1.csv")
…
data<-read.csv("data24.csv")
…

Is there a way to do this in a loop? Thank you.

Steven from iPhone

Re: [R] Looping through all matrix columns each 1 row at a time

2022-04-21 Thread Eric Berger

Hi Paul,
I am not sure I understand your question, but perhaps the following is helpful.
In particular, the apply() function used with MARGIN=1 applies a
function to a matrix row-wise.

set.seed(123)
m <- matrix(sample(1:6,5*12,replace=TRUE),ncol=12) ## dummy data
m
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    3    6    6    3    1    1    1    5    4     1     5     1
[2,]    6    3    1    3    5    6    3    1    5     2     2     6
[3,]    3    5    2    1    3    3    5    1    5     5     1     5
[4,]    2    4    3    4    2    4    4    2    3     5     1     1
[5,]    2    6    5    1    2    6    2    3    6     4     3     2

apply(m,MARGIN=1,function(v) length(setdiff(1:6,v))==0)

[1] FALSE FALSE FALSE FALSE  TRUE  ## only the last row has all numbers from 1-6
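
An equivalent way to write the row-wise test, sketched here on a small hand-made matrix so that the result does not depend on the random number generator, is to ask whether every face is present with `%in%`. The matrix values and the variable names are chosen only for illustration.

```r
# Two rows of twelve rolls each: the first has no 2, the second has every face
m <- rbind(
  c(3, 6, 6, 3, 1, 1, 1, 5, 4, 1, 5, 1),
  c(2, 6, 5, 1, 2, 6, 2, 3, 6, 4, 3, 2)
)

# TRUE for a row only if all of 1..6 occur somewhere in that row
has_all_faces <- apply(m, MARGIN = 1, function(v) all(1:6 %in% v))

# Number of distinct faces seen in each row
n_distinct_faces <- apply(m, MARGIN = 1, function(v) length(unique(v)))
```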

HTH,
Eric



[R] Looping through all matrix columns each 1 row at a time

2022-04-20 Thread Paul Bernal

Dear R friends,

One question: thanks to Bert's kind feedback, I was able to create
my matrix using the following code:
dice_rolls = 120
num_dice   = 1
dice_sides = 6

#performing simulation
dice_simul = data.frame(dice(rolls = dice_rolls, ndice = num_dice, sides =
dice_sides, plot.it = TRUE))

dice_simul


prob_matrix <- matrix(dice_simul[,1], ncol = 12, byrow = TRUE)

colnames(prob_matrix) <-
c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")

Now, I need to perform an analysis across all columns, one row at a time. For
example, I need to know whether the numbers 1 through 6 all appear in the
twelve columns of row 1, then of row 2, etc.

Any guidance will be greatly appreciated.

Best,
Paul


Re: [R] Looping through data error

2021-04-13 Thread Bert Gunter

Well, if I understand your query, wouldn't the following simple approach
suffice -- it assumes that the results for each company are ordered by
year, as your example seems to show:

## test is your example data
## first remove NA's
test2 <- na.omit(test)

## Now just use tapply():
> out <- with(test2, tapply(CLOSE_SHARE_PRICE, COMPANY_NUMBER,
+   FUN = function(x) 100 / x[1]))
> out
 1091347 11356069    22705 SC192761
12.28501 91.74312 15.26718 91.74312
## essentially a labelled vector
## You can use %/% if you only want the whole number of shares that can be
## purchased

It's somewhat messier if the results are not ordered by date within company
-- you could use by() and POSIXct to order the dates within company to get
the right one.
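
A minimal sketch of the unordered case follows. Note that it swaps in `as.Date()` and `order()` for the by()/POSIXct route mentioned above, simply because the dates have no time component; the small `test` data frame is a made-up, deliberately shuffled subset of the posted data.

```r
# Shuffled subset of the example data (dates given as day/month/year strings)
test <- data.frame(
  COMPANY_NUMBER    = c("22705", "22705", "22705", "1091347", "1091347"),
  YEAR_END_DATE     = c("30/09/2006", "30/09/2002", "30/09/2005",
                        "31/01/2011", "31/01/2010"),
  CLOSE_SHARE_PRICE = c(7.5, NA, 6.55, 11.38, 8.14),
  stringsAsFactors  = FALSE
)

# Convert to Date so that sorting is chronological, not alphabetical
test$YEAR_END_DATE <- as.Date(test$YEAR_END_DATE, format = "%d/%m/%Y")

# Drop missing prices, then order by company and date
test2 <- na.omit(test)
test2 <- test2[order(test2$COMPANY_NUMBER, test2$YEAR_END_DATE), ]

# Shares bought with 100 at the earliest available price of each company
shares <- with(test2, tapply(CLOSE_SHARE_PRICE, COMPANY_NUMBER,
                             function(x) 100 / x[1]))
```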

Cheers,

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] Looping through data error

2021-04-13 Thread e-mail ma015k3113 via R-help

Rui, excellent diagnosis and suggestion. It worked, but my damn logic is still 
not delivering what I want. I will spend more time on it tomorrow.


Kind regards

Ahson


Re: [R] Looping through data error

2021-04-13 Thread e-mail ma015k3113 via R-help

Jim, thanks for taking the time to look into this. Yes, these if else 
statements are so confusing.

I tried your amended code and it does not work. The errors are as follows:


Error: unexpected '}' in " }"
> NUMBER_OF_SHARES[i] = 100 / is.na(CLOSE_SHARE_PRICE[i])
> }
Error: unexpected '}' in " }"
> }
Error: unexpected '}' in " }"
> }
Error: unexpected '}' in "}"
>

I have spent so much time on this; hopefully I will come to grips with it 
sooner or later. In the meantime, any further suggestions?


Kind regards


Ahson


Re: [R] Looping through data error

2021-04-13 Thread Rui Barradas


Hello,

Typo, inline.

On 13/04/21 at 17:06, Rui Barradas wrote:

> Hello,
>
> A close parenthesis is missing in the nd if.

Should be "the 2nd if".

Rui Barradas





Re: [R] Looping through data error

2021-04-13 Thread Rui Barradas


Hello,

A close parenthesis is missing in the nd if.


for (i in 1:(nrow(PLC_Return)-1)){
  if (i == 1){
    NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
  } else if(is.na(PLC_Return[i, 1]) == is.na(PLC_Return[i + 1, 1])){
    NUMBER_OF_SHARES[i]=0
  } else {
    NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
  }
}
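
The code above only repairs the syntax. As a side note on the logic, `is.na()` returns TRUE or FALSE, so `100/is.na(x)` evaluates to 100 when the price is missing and to Inf otherwise, never to 100 divided by the price. The sketch below shows this and one possible reading of the task. It assumes that "first year listed" means the first row of a company with a non-missing price and that rows are ordered by date within company; the small data frame is a made-up subset of the posted data.

```r
# is.na() gives a logical, so this divides by 1 (NA) or 0 (real price)
price <- c(NA, 6.55, 7.5)
division_by_flag <- 100 / is.na(price)

# Hypothetical subset of the data, ordered by date within company
PLC_Return <- data.frame(
  COMPANY_NUMBER    = c("22705", "22705", "22705", "1091347", "1091347"),
  CLOSE_SHARE_PRICE = c(NA, 6.55, 7.5, 8.14, 11.38),
  NUMBER_OF_SHARES  = 0,
  stringsAsFactors  = FALSE
)

for (i in seq_len(nrow(PLC_Return))) {
  p <- PLC_Return$CLOSE_SHARE_PRICE[i]
  if (is.na(p)) next  # no price yet, leave 0

  # Has this company already had a price in an earlier row?
  earlier <- seq_len(i - 1)
  same_company <- PLC_Return$COMPANY_NUMBER[earlier] == PLC_Return$COMPANY_NUMBER[i]
  earlier_priced <- any(same_company & !is.na(PLC_Return$CLOSE_SHARE_PRICE[earlier]))

  # Only the first priced row of each company gets a share count
  if (!earlier_priced) PLC_Return$NUMBER_OF_SHARES[i] <- 100 / p
}
```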


Hope this helps,

Rui Barradas


Re: [R] Looping through data error

2021-04-13 Thread jim holtman

Your code was formatted incorrectly.  There is always a problem with the
'else' statement after an 'if' since in R there is no semicolon to mark the
end of a line.  Here might be a better format for your code.  I would
recommend the liberal use of "{}"s when using 'if/else'



i <- 0

for (i in 1:(nrow(PLC_Return) - 1)) {
  if (i == 1) {
    NUMBER_OF_SHARES[i] = 100 / is.na(CLOSE_SHARE_PRICE[i])
  } else {
    if (is.na(PLC_Return[i, 1]) == is.na(PLC_Return[i + 1, 1]) {
      NUMBER_OF_SHARES[i] = 0
    } else {
      NUMBER_OF_SHARES[i] = 100 / is.na(CLOSE_SHARE_PRICE[i])
    }
  }
}
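
The point about 'else' can be checked without running any data code by asking R's parser whether a piece of text is syntactically valid. This is only an illustrative sketch; the helper `parses()` and the code strings are made up for the demonstration.

```r
# TRUE if the text is syntactically valid R, FALSE otherwise
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

# At top level the 'if' is complete after its closing brace,
# so an 'else' that starts the next line is unexpected
else_on_new_line  <- "if (x > 0) {\n  y <- 1\n}\nelse {\n  y <- 2\n}"

# Keeping '} else {' on one line tells the parser that more is coming
else_on_same_line <- "if (x > 0) {\n  y <- 1\n} else {\n  y <- 2\n}"

# Inside an enclosing pair of braces the new-line form is accepted
wrapped_in_braces <- paste0("{\n", else_on_new_line, "\n}")

new_line_ok  <- parses(else_on_new_line)
same_line_ok <- parses(else_on_same_line)
wrapped_ok   <- parses(wrapped_in_braces)
```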


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*



[R] Looping through data error

2021-04-13 Thread e-mail ma015k3113 via R-help

Dear All, I have a dataframe with 4 variables and I am trying to calculate how 
many shares can be purchased with £100 in the first year when the company was 
listed.

The data looks like:

COMPANY_NUMBER  YEAR_END_DATE  CLOSE_SHARE_PRICE  NUMBER_OF_SHARES
22705           30/09/2002     NA                 0
22705           30/09/2004     NA                 0
22705           30/09/2005     6.55               0
22705           30/09/2006     7.5                0
22705           30/09/2007     9.65               0
22705           30/09/2008     6.55               0
1091347         31/01/2010     8.14               0
1091347         31/01/2011     11.38              0
11356069        30/06/2019     1.09               0
SC192761        31/01/2000     NA                 0
SC192761        31/01/2001     NA                 0
SC192761        31/01/2002     NA                 0
SC192761        31/01/2004     NA                 0
SC192761        31/01/2005     NA                 0
SC192761        31/01/2006     1.09               0
SC192761        31/01/2008     1.24               0
SC192761        31/01/2009     0.9                0
SC192761        31/01/2010     1.14               0
SC192761        31/01/2011     1.25               0
SC192761        31/01/2012     1.29               0


The code I have written is

i <- 0

for (i in 1:(nrow(PLC_Return)-1))
if (i == 1)
{
NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
} else if
(is.na(PLC_Return[i, 1]) == is.na(PLC_Return[i + 1, 1])
{
NUMBER_OF_SHARES[i]=0
} else
{
NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
}


The error I get is Error: unexpected 'else' in:

" NUMBER_OF_SHARES[i] = 0
} else"
> {NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])}
>
> }
Error: unexpected '}' in "}"


Don't know how to fix it; any help will be appreciated.


Kind regards


Ahson

Re: [R] Looping thorugh dataframe

2020-07-22 Thread William Dunlap via R-help

> library(dplyr, warn.conflicts=FALSE)
> d <- data.frame(Company=c("MATH","IFUL","SSI","MATH","MATH","SSI"),
+                 Turnover=c(2,3,5,7,9,11))
> d %>% group_by(Company) %>%
+   summarize(Count=n(), MeanTurnover=mean(Turnover), TotalTurnover=sum(Turnover))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 4
  Company Count MeanTurnover TotalTurnover
1 IFUL        1            3             3
2 MATH        3            6            18
3 SSI         2            8            16

[The 'override with .groups' comment arose in a recent version of
dplyr.  It is a bit annoying.]
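
A small sketch of how that message can be silenced, assuming dplyr 1.0.0 or
later, where summarize() accepts a .groups argument. The data are the same toy
data as above; res is just an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)

d <- data.frame(Company  = c("MATH", "IFUL", "SSI", "MATH", "MATH", "SSI"),
                Turnover = c(2, 3, 5, 7, 9, 11))

# .groups = "drop" returns an ungrouped tibble and suppresses the message
res <- d %>%
  group_by(Company) %>%
  summarize(Count         = n(),
            MeanTurnover  = mean(Turnover),
            TotalTurnover = sum(Turnover),
            .groups       = "drop")
res
```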

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] Looping through dataframe

2020-07-22 Thread Sarah Goslee

Hi,

Your sample code suggests that you don't yet understand how R works,
and might benefit from a tutorial or two. However, your verbal
description of what you want is quite straightforward. Here's an
R-style way to count the number of times each company appears, and to
get the mean value of Turnover for each company:


All_companies <- read.table(text =
"COMPANY_NUMBER  COMPANY_NAME  YEAR_END_DATE  Turnover
22705    AA  30/09/10  420000
22705    AA  30/09/09  406000
113560   BB  30/06/19  474000
192761   CC  31/01/19  796000
192761   CC  31/01/18  909000
192761   CC  31/01/17  788000
5625107  DD  30/06/19  3254002
5625107  DD  30/06/18  1840436", header=TRUE)

table(All_companies$COMPANY_NAME)

AA BB CC DD
 2  1  3  2

aggregate(Turnover ~ COMPANY_NAME, data = All_companies, FUN = mean)

  COMPANY_NAME Turnover
1           AA   413000
2           BB   474000
3           CC   831000
4           DD  2547219
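
If the real file keeps the thousands separators shown in the original post
(420,000 and so on), Turnover will be read as text rather than numbers. A
rough sketch of one way to handle that and to get the count and the mean in a
single table, assuming the same sample data; summary_df is an illustrative
name:

```r
All_companies <- read.table(text = "
COMPANY_NUMBER  COMPANY_NAME  YEAR_END_DATE  Turnover
22705    AA  30/09/10  420,000
22705    AA  30/09/09  406,000
113560   BB  30/06/19  474,000
192761   CC  31/01/19  796,000
192761   CC  31/01/18  909,000
192761   CC  31/01/17  788,000
5625107  DD  30/06/19  3,254,002
5625107  DD  30/06/18  1,840,436", header = TRUE, stringsAsFactors = FALSE)

# Strip the thousands separators, then convert the text to numbers
All_companies$Turnover <- as.numeric(gsub(",", "", All_companies$Turnover,
                                          fixed = TRUE))

# Count and mean per company in one call; FUN returns a named vector,
# so aggregate() stores the result as a matrix column
summary_df <- aggregate(Turnover ~ COMPANY_NAME, data = All_companies,
                        FUN = function(x) c(count = length(x), mean = mean(x)))

# Flatten the matrix column into ordinary columns
# (Turnover.count and Turnover.mean)
summary_df <- do.call(data.frame, summary_df)
summary_df
```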



-- 
Sarah Goslee (she/her)
http://www.numberwright.com


Re: [R] Looping through dataframe

2020-07-22 Thread e-mail ma015k3113 via R-help

Bert, thanks for responding to my email. I do realise that newbies like me can
expect curt answers, but not to worry. I am definitely learning R, and what I
posted are also statements from R. The statements run perfectly well but don't
do what I want them to do. My mistake; I have now posted sample data. Here is
the data:

COMPANY_NUMBER  COMPANY_NAME  YEAR_END_DATE  Turnover
22705           AA            30/09/10       420,000
22705           AA            30/09/09       406,000
113560          BB            30/06/19       474,000
192761          CC            31/01/19       796,000
192761          CC            31/01/18       909,000
192761          CC            31/01/17       788,000
5625107         DD            30/06/19       3,254,002
5625107         DD            30/06/18       1,840,436

All_companies$count <-0
while All_companies$COMPANY_NAME == All_companies$COMPANY_NAME + 1
+ {All_companies$count=All_companies$count+1}

I want to find out how many times each company appears in the data frame and
the average turnover across its years. For example, company AA appears twice
and its average turnover is 413,000.

'All_companies' is the name of the dataframe.

Finally, apologies for not being clearer the first time around, and of
course many thanks in advance for your help.

Kind regards


Ahson


Re: [R] Looping through a dataframe

2020-07-21 Thread Jim Lemon

Hi Ahson,
Guessing what your data frame might look like, here are two easy ways:

All_companies<-data.frame(year=c(1970:2015,2000:2015,2010:2015),
 COMPANY_NUMBER=c(rep(1,46),rep(2,16),rep(3,6)),
 COMPANY_NAME=c(rep("IBM",46),rep("AMAZON",16),rep("SPACE-X",6)))
# easy ways
table(All_companies$COMPANY_NAME)
table(All_companies$COMPANY_NUMBER)

I'm too lazy to provide a difficult way.

Jim
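
A possible follow-up to the easy ways above, sketched with the same guessed
data frame: the counts can be turned into a data frame of their own, or
written back onto every row, which is what the count column in the question
seems to be aiming for. The names counts and count are illustrative only.

```r
All_companies <- data.frame(
  year           = c(1970:2015, 2000:2015, 2010:2015),
  COMPANY_NUMBER = c(rep(1, 46), rep(2, 16), rep(3, 6)),
  COMPANY_NAME   = c(rep("IBM", 46), rep("AMAZON", 16), rep("SPACE-X", 6))
)

# One row per company with its number of rows
counts <- as.data.frame(table(COMPANY_NAME = All_companies$COMPANY_NAME),
                        responseName = "count")
counts

# The same count repeated on every row of the original data frame
All_companies$count <- ave(All_companies$year, All_companies$COMPANY_NAME,
                           FUN = length)
head(All_companies)
```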


Re: [R] Looping through a dataframe

2020-07-21 Thread Roger Bos

What you are asking is one area where the package data.table really
shines.  You didn't provide an example, but based on your question you
would do something like:

library(data.table)
dt <- as.data.table(All_companies)
dt[, .N, by=COMPANY_NAME]

You will have to read up on data.table, but .N gives you the number of
observations, and when using data.table (unlike data.frames) you can use the
column name directly in the 'by' parameter without needing to prefix it with
the name of the R object or use quotes.
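
As a rough illustration of the same idea, the sketch below counts rows and
averages a numeric column in one step. It assumes the data.table package is
installed; the small data frame and the Turnover column are stand-ins, since
no example data came with the question.

```r
library(data.table)

# Stand-in data: one row per company and year
All_companies <- data.frame(
  COMPANY_NAME = c("AA", "AA", "BB", "CC", "CC", "CC", "DD", "DD"),
  Turnover     = c(420000, 406000, 474000, 796000, 909000, 788000,
                   3254002, 1840436)
)

dt <- as.data.table(All_companies)

# .N is the number of rows in each group; .() builds the output columns
res <- dt[, .(N = .N, MeanTurnover = mean(Turnover)), by = COMPANY_NAME]
res
```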

Obviously this is just one of many ways to do what you are asking.

HTH, Roger


Re: [R] Looping through a dataframe

2020-07-21 Thread John Kane

It occurs to me that a simple table command will do what you say you want, but
I suspect the real analysis is more complicated.

dat1  <-  data.frame(aa =  sample(letters[1:5], 10, replace = TRUE),
   bb  =  1:10)

 table(dat1$aa)



-- 
John Kane
Kingston ON Canada


Re: [R] Looping through a dataframe

2020-07-21 Thread John Kane

As Bert says, that does not look like R.

Have a look at these links for some suggestions on asking questions here.

 http://adv-r.had.co.nz/Reproducibility.html

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example



-- 
John Kane
Kingston ON Canada


Re: [R] Looping through a dataframe

2020-07-21 Thread Bert Gunter

What language are you programming in? -- it certainly isn't R.

I suggest that you stop what you're doing and go through an R tutorial or
two before proceeding. This list cannot serve as a substitute for doing
such homework (is this homework, btw? -- that's off topic here) nor can we
provide such tutorials.

I'm pretty sure the answer is quite simple, though it's a bit unclear as
you did not provide a reprex (see the posting guide linked below for how to
post here). However, I see no purpose in my blurting it out when you do not
seem aware of even the most basic R constructs -- e.g. see ?while. Of
course, others may disagree and provide you what you seek.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


[R] Looping through a dataframe

2020-07-21 Thread e-mail ma015k3113 via R-help

Dear All, I have a data frame with a few thousand companies, each with a unique
company number and name. Each company has data for several years, and each
year is stored in a separate row.

I want to get the total number of years of data for each company. When I
loop through the data with the following command, I get a value of ‘1’ rather
than a total of the rows for each company:

All_companies$count <-0

while All_companies$COMPANY_NAME == All_companies$COMPANY_NAME + 1

+ {All_companies$count=All_companies$count+1}

Can you kindly help me on this?

Ahson

Re: [R] Looping with looping

2019-04-19 Thread ani jaya

Thank you very much, Dr. Snow, your suggestion helps a lot.

Best,
Saat


Re: [R] Looping with looping

2019-04-18 Thread Greg Snow

When the goal of looping is to compute something and save each
iteration into a vector or list, then it is usually easier to use the
lapply/sapply/replicate functions and save the result into a single
list rather than a bunch of global variables.

Here is a quick example that does the same computations as your code
but saves the results into a list in which each element is a vector of
length 100:

sam<-c(9,7,8,6,6,7,8,6,7,3)
a <- lapply(2:9, function(k){
  replicate(100, mean(sample(sam, k, replace=TRUE)))
})

# optional
names(a) <- sprintf("a%i", 2:9)

hist(a[["a2"]])
hist(a$a9)
w <- "a5"
hist(a[[w]])


Saving everything into a single list (or matrix/array/etc.) makes it
easier to loop over all of the results later on (and prevents the hard
to track down bugs from using dynamically named global variables).
Here is an example based on the results from above:

par(mfrow=c(3,3))
for(i in seq_along(a)) {
  hist(a[[i]], xlab='x', main=sprintf("k = %i", (2:9)[i]))
}
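
Because every element has the same length, the results could also be kept in
a matrix instead of a list. The following is only a sketch of that variant;
the seed value and the names sizes and m are arbitrary choices for
illustration, not part of the code above.

```r
set.seed(42)  # arbitrary seed, only so that the sketch is reproducible
sam <- c(9, 7, 8, 6, 6, 7, 8, 6, 7, 3)
sizes <- 2:9

# sapply() binds the eight length-100 vectors into a 100 x 8 matrix:
# one column per sample size, one row per repetition
m <- sapply(sizes, function(k) {
  replicate(100, mean(sample(sam, k, replace = TRUE)))
})
colnames(m) <- sprintf("a%i", sizes)

# Spread of the sample means for each sample size
apply(m, 2, sd)

# A single column is still easy to pull out by name
hist(m[, "a5"], xlab = "x", main = "k = 5")
```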








-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


[R] Looping with looping

2019-04-18 Thread ani jaya

Dear R community,

I'm trying to create a loop to see the effect of the number of samples drawn
from one dataset.
Let's say I have 10 values in a single data frame and I want to see the mean
of each sample for sample sizes from 2 to 9. I want to repeat the sampling,
say, 100 times for each sample size and put the results in different data
frames, say a2, a3, a4, ..., where a2[1] is the mean of the first repetition,
and so on. I believe this is possible, but I'm a newbie here.

> version

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          5.3
year           2019
month          03
day            11
svn rev        76217
language       R
version.string R version 3.5.3 (2019-03-11)
nickname       Great Truth


 The simple code that I have:

sam<-c(9,7,8,6,6,7,8,6,7,3)
for (k in seq(2,9,1)){
a <- numeric(100)
  for (i in 1:100){
  a[i] <- mean(sample(sam,k,replace=T))

  }
  }

This code does most of what I need, but I want the variable name to change
with k as well.

I have googled a lot and found the assign and paste commands, but they did not
really help. Any help would be appreciated.



Best,

Saat M.


Re: [R] looping using 'diverse' package measures

2017-10-19 Thread David L Carlson

You really need to spend some time learning the basics of R. There are 
thousands of R packages, so you also need to spend time reading the 
documentation for the package so that you can show us what the data format 
should be like. Here are some simple ways to transform the data. You should 
also use dput() to include your data in your email, not just a listing which 
can remove important information about the structure of the original data:

> Example <- structure(list(companyid = c(85390L, 85390L, 85390L, 85390L, 
85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 
85390L, 85390L, 85390L, 4391076L, 4391076L, 4391076L, 4391076L, 
4391076L, 4391076L, 4391076L, 4391076L, 4391076L, 4391076L), 
year = c(1999L, 1999L, 1999L, 1999L, 1999L, 2000L, 2000L, 
2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 2005L, 
2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L
), workerid = c(46446384, 12680, 16330, 60225451, 
60195422, 60225451, 3.571e+09, 16330, 16330, 12680, 
60195422, 60225451, 46446384, 60195422, 60225451, 13753759, 
49988911, 11240, 18550, 35649643, 65809705, 11420, 
19210, 64258701, 1.212e+09), gender = c(0L, 1L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L)), .Names = c("companyid", "year", 
"workerid", "gender"), class = "data.frame", row.names = c(NA, 
-25L))
> aggregate(gender~companyid+year, Example, mean)
  companyid year gender
1     85390 1999    0.2
2     85390 2000    0.2
3     85390 2001    0.2
4   4391076 2005    0.1

> aggregate(gender~companyid+year, Example, table)
  companyid year gender.0 gender.1
1     85390 1999        4        1
2     85390 2000        4        1
3     85390 2001        4        1
4   4391076 2005        9        1

> x <- xtabs(~gender+companyid+year, Example)
> ftable(x, row.vars=2:3, col.vars=1)
               gender 0 1
companyid year
85390     1999        4 1
          2000        4 1
          2001        4 1
          2005        0 0
4391076   1999        0 0
          2000        0 0
          2001        0 0
          2005        9 1

You should read these manual pages:
?dput
?aggregate
?xtabs
?ftable
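
The same by-group pattern extends to a diversity measure. The sketch below
does not use the 'diverse' package at all; it assumes a hand-written
Gini-Simpson (Blau) index, 1 - sum(p^2), as a stand-in for whichever measure
is wanted, and blau is a hypothetical helper name. The data are the companyid,
year and gender columns of Example, rebuilt so that the sketch runs on its
own.

```r
# companyid, year and gender as in Example above
Example <- data.frame(
  companyid = c(rep(85390L, 15), rep(4391076L, 10)),
  year      = c(rep(1999L, 5), rep(2000L, 5), rep(2001L, 5), rep(2005L, 10)),
  gender    = c(0L, 1L, 0L, 0L, 0L,
                0L, 1L, 0L, 0L, 0L,
                0L, 1L, 0L, 0L, 0L,
                0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)
)

# Hypothetical helper: Gini-Simpson (Blau) index of a categorical vector
blau <- function(x) {
  p <- table(x) / length(x)  # share of each category
  1 - sum(p^2)
}

# One index value per firm and year, analogous to "by firm year" in Stata
res <- aggregate(gender ~ companyid + year, data = Example, FUN = blau)
res
```

Any function that maps a vector to a single number can be passed as FUN in
the same way, so a measure taken from a package could be slotted in once its
expected input format is known.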


David L Carlson
Department of Anthropology
Texas A University
College Station, TX 77843-4352




-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Li Jiang
Sent: Thursday, October 19, 2017 4:08 AM
To: r-help@r-project.org
Subject: [R] looping using 'diverse' package measures

Hi everyone,

I'm new to R (although I have been a Stata user for some time and am somewhat
proficient in it), and I'm trying to use the 'diverse' R package to compute a
few diversity measures on a sample of firms over a period of about 10 years. I
was wondering if you could give me some hints on how best to proceed with the
'diverse' package.

My sample has the following setup. It comprises a number of firms that varies
from year to year, identified by the companyid and year variables (an
unbalanced panel). In addition, I have a variable identifying the worker,
workerid. I then have a set of variables which I want to use as the basis for
calculating some of the measures in the 'diverse' package. An example of the
sample is as follows, using the gender variable (0 for male and 1 for female)
as the variable of interest:

companyid   year    workerid    gender
85390       1999    46446384    0
85390       1999    12680       1
85390       1999    16330       0
85390       1999    60225451    0
85390       1999    60195422    0
85390       2000    60225451    0
85390       2000    357100      1
85390       2000    16330       0
85390       2000    16330       0
85390       2000    12680       0
85390       2001    60195422    0
85390       2001    60225451    1
85390       2001    46446384    0
85390       2001    60195422    0
85390       2001    60225451    0
4391076     2005    13753759    0
4391076     2005    49988911    0
4391076     2005    11240       0
4391076     2005    18550       0
4391076     2005    35649643    0
4391076     2005    65809705    0
4391076     2005    11420       0
4391076     2005    19210       0
4391076     2005    64258701    0
4391076     2005    121200      1

Using the 'diverse' package, I need to calculate, for each firm and each year, 
a measure such as diversity(gender). In Stata this would be obtained simply by 
issuing a by firm year command, but I have no idea how to tackle this issue in 
R. Any ideas?

Best wishes,

Li


Re: [R] looping using 'diverse' package measures

2017-10-19 Thread Michael Dewey


Dear Li

Not absolutely sure what you want to do but try
?aggregate
?by
?apply (or one of the other apply functions, possibly tapply)
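
As a rough illustration of the tapply() route, the sketch below computes the share of women per firm and year. The data frame toy and its values are made up for the example and are not taken from the original post.

```r
# Illustrative data: two firms observed in different years.
toy <- data.frame(
  companyid = c(1L, 1L, 1L, 2L, 2L),
  year      = c(1999L, 1999L, 2000L, 2005L, 2005L),
  gender    = c(0L, 1L, 0L, 1L, 1L)
)

# tapply() with two grouping vectors returns a firm-by-year matrix;
# combinations that do not occur in the data are filled with NA.
share_female <- tapply(toy$gender, list(toy$companyid, toy$year), mean)
print(share_female)
```

Unlike aggregate(), which returns one row per observed combination, tapply() returns the full cross-tabulation, so an unbalanced panel shows up as NA cells.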


On 19/10/2017 11:29, Li Jiang wrote:

Hi everyone,

I'm new at R (although I'm a Stata user for some time and somehow
proficient in it) and I'm trying to use the 'diverse' R package to compute
a few diversity measures on a sample of firms for a period of about 10
years. I was wondering if you can give me some hints on how to best proceed
on using the 'diverse' package.

My sample has the following setup. It's comprised of a annual variable
number of firms which are identified by the companyid variable and the year
variable (unbalanced panel). In addition I also have a variable identifying
the worker, workerid. I then have a set of variables which i want to use as
the basis for calculating some of the measures in the 'diverse' package. An
example of the sample is as follows, using the gender variable (0 for male
and 1 for female) as the variable of interest:

companyid   year    workerid    gender
85390       1999    46446384    0
85390       1999    12680       1
85390       1999    16330       0
85390       1999    60225451    0
85390       1999    60195422    0
85390       2000    60225451    0
85390       2000    357100      1
85390       2000    16330       0
85390       2000    16330       0
85390       2000    12680       0
85390       2001    60195422    0
85390       2001    60225451    1
85390       2001    46446384    0
85390       2001    60195422    0
85390       2001    60225451    0
4391076     2005    13753759    0
4391076     2005    49988911    0
4391076     2005    11240       0
4391076     2005    18550       0
4391076     2005    35649643    0
4391076     2005    65809705    0
4391076     2005    11420       0
4391076     2005    19210       0
4391076     2005    64258701    0
4391076     2005    121200      1

Using the 'diverse' package, I need to calculate, for each firm and each
year, a measure such as diversity(gender). In Stata this would be obtained
simply by issuing a by firm year command, but I have no idea how to tackle
this issue in R. Any ideas?

Best wishes,

Li





--
Michael
http://www.dewey.myzen.co.uk/home.html


[R] looping using 'diverse' package measures

2017-10-19 Thread Li Jiang

Hi everyone,

I'm new at R (although I'm a Stata user for some time and somehow
proficient in it) and I'm trying to use the 'diverse' R package to compute
a few diversity measures on a sample of firms for a period of about 10
years. I was wondering if you can give me some hints on how to best proceed
on using the 'diverse' package.

My sample has the following setup. It's comprised of a annual variable
number of firms which are identified by the companyid variable and the year
variable (unbalanced panel). In addition I also have a variable identifying
the worker, workerid. I then have a set of variables which i want to use as
the basis for calculating some of the measures in the 'diverse' package. An
example of the sample is as follows, using the gender variable (0 for male
and 1 for female) as the variable of interest:

companyid   year    workerid    gender
85390       1999    46446384    0
85390       1999    12680       1
85390       1999    16330       0
85390       1999    60225451    0
85390       1999    60195422    0
85390       2000    60225451    0
85390       2000    357100      1
85390       2000    16330       0
85390       2000    16330       0
85390       2000    12680       0
85390       2001    60195422    0
85390       2001    60225451    1
85390       2001    46446384    0
85390       2001    60195422    0
85390       2001    60225451    0
4391076     2005    13753759    0
4391076     2005    49988911    0
4391076     2005    11240       0
4391076     2005    18550       0
4391076     2005    35649643    0
4391076     2005    65809705    0
4391076     2005    11420       0
4391076     2005    19210       0
4391076     2005    64258701    0
4391076     2005    121200      1

Using the 'diverse' package, I need to calculate, for each firm and each
year, a measure such as diversity(gender). In Stata this would be obtained
simply by issuing a by firm year command, but I have no idea how to tackle
this issue in R. Any ideas?

Best wishes,

Li


Re: [R] Looping Through QuantMod Objects

2017-08-02 Thread Joshua Ulrich

Use auto.assign = FALSE in your getFinancials() call, and use
viewFinancials() to extract the data you want.

items <- c("Cash & Equivalents",
           "Short Term Investments",
           "Cash and Short Term Investments")
tickers <- c("AAPL", "IBM", "MSFT")
for (ticker in tickers) {
  Data <- getFinancials(ticker, auto.assign = FALSE)
  HoldQuart <- viewFinancials(Data, "BS", "Q")
  CashHold <- subset(HoldQuart,rownames(HoldQuart) %in% items)
  CashT <- t(CashHold)
  Cashdf <- data.frame(CashT)
  Cashdf$tic <- ticker
  assign(paste0(ticker, ".c"), Cashdf)
}

If you want to continue processing the data in each [ticker].c object,
it would be better to put the body of the loop into a function and
call lapply(tickers, myfunction).  Then you can use lapply() on the
result to continue applying functions to the data.
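
The function-plus-lapply() pattern described above might look roughly like this. It is only a sketch: fake_balance_sheet() is a hypothetical stand-in that returns made-up numbers in place of viewFinancials(getFinancials(...), "BS", "Q"), so the example runs without quantmod or a network connection, and cash_for_ticker is an assumed name.

```r
# Hypothetical stand-in for the quarterly balance sheet of one ticker:
# line items as rows, reporting periods as columns, invented values.
fake_balance_sheet <- function(ticker) {
  matrix(
    c(10, 20, 30, 11, 21, 31),
    nrow = 3,
    dimnames = list(
      c("Cash & Equivalents", "Short Term Investments", "Total Assets"),
      c("2017-03-31", "2016-12-31")
    )
  )
}

items <- c("Cash & Equivalents", "Short Term Investments")

# Body of the loop, wrapped in a function that returns one data frame.
cash_for_ticker <- function(ticker) {
  bs   <- fake_balance_sheet(ticker)
  cash <- bs[rownames(bs) %in% items, , drop = FALSE]
  out  <- data.frame(t(cash), check.names = FALSE)
  out$tic <- ticker
  out
}

tickers   <- c("AAPL", "IBM", "MSFT")
cash_list <- lapply(tickers, cash_for_ticker)
names(cash_list) <- tickers

# Stack the per-ticker data frames into one.
big_cash <- do.call(rbind, cash_list)
print(big_cash)
```

Keeping the results in a named list avoids assign() and get() altogether: each element is reached as cash_list[["AAPL"]], and further steps can be applied with another lapply().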

You might also be interested in the stackFinancials function I wrote about:
http://blog.fosstrading.com/2017/02/stack-financials.html

Best,
Josh


On Wed, Aug 2, 2017 at 10:00 AM, Sparks, John James  wrote:
> Dear R Helpers,
>
> I have run into a problem trying to perform a number of actions on a set
> of quantmod data objects through a loop and I am hoping that this is an
> easy problem for someone else as opposed to very  difficult for me.
>
> The example task is to get the first three objects of the quarterly
> balance sheet for a number of companies from the getFinancials object and
> put them together into a single file.  I can do this one by one, but if I
> try to build a loop and use the get function then the results are not
> anticipated and leave me baffled.
>
> If I do it one at a time all is good.
>
>
> require(quantmod)
>
> getFinancials("AAPL")
> getFinancials("IBM")
> getFinancials("MSFT")
>
>
> items=c("Cash & Equivalents","Short Term Investments","Cash and Short Term
> Investments")
>
> HoldQuart<-AAPL.f$BS$Q
> CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
> CashT<-t(CashHold)
> Cashdf<-data.frame(CashT)
> Cashdf$tic<-"AAPL"
> AAPL.c<-Cashdf
>
> HoldQuart<-IBM.f$BS$Q
> CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
> CashT<-t(CashHold)
> Cashdf<-data.frame(CashT)
> Cashdf$tic<-"IBM"
> IBM.c<-Cashdf
>
>
> HoldQuart<-MSFT.f$BS$Q
> CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
> CashT<-t(CashHold)
> Cashdf<-data.frame(CashT)
> Cashdf$tic<-"MSFT"
> MSFT.c<-Cashdf
>
>
> BigCash<-rbind(AAPL.c, IBM.c, MSFT.c)
> #setwd<-("C:/Users/HP USER/Documents")
> #write.csv(BigCash,file="CashList.csv")
>
>
> When I try to process through this using a loop, however, things go south
> pretty quickly.
>
> tickerlist<-ls(pattern="^[A-Z]+\\.f")
>
> for( i in 1:1)
> {
> test<-get(paste0(tickerlist[i],"$BS$Q"))
> }
>
> Error in get(paste0(tickerlist[i], "$BS$Q")) :
>   object 'AAPL.f$BS$Q' not found
>
> So I tried to break it up into smaller steps, but the resulting matrix
> seems to have lost its structure (see below).
>
> If someone could help me out, I sure would appreciate.
>
> Thanks.
> --John Sparks
>
>
> tickerlist<-ls(pattern="^[A-Z]+\\.f")
> for( i in 1:1)
> {
> HoldFin<-get(tickerlist[i])
> BSQ<-as.matrix(paste0(HoldFin,"$BS$Q"))
> }
> BSQ
>
> [1,] "list(Q = c(52896, NA, 52896, 32305, 20591, 3718, 2776, NA, NA, NA,
> NA, 38799, 14097, NA, NA, -165, 14684, 11029, NA, NA, 11029, NA, NA, NA,
> 11029, NA, 11029, 11029, NA, NA, NA, NA, 5261.69, 2.1, NA, 0.57, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.1, 78351, NA, 78351, 48175,
> 30176, 3946, 2871, NA, NA, NA, NA, 54992, 23359, NA, NA, 122, 24180,
> 17891, NA, NA, 17891, NA, NA, NA, 17891, NA, 17891, 17891, NA, NA, NA, NA,
> 5327.99, 3.36, NA, 0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> 3.36, \n46852, NA, 46852, 29039, 17813, 3482, 2570, NA, NA, NA, NA, 35091,
> 11761, NA, NA, -159, 12188, 9014, NA, NA, 9014, NA, NA, NA, 9014, NA,
> 9014, 9014, NA, NA, NA, NA, 5393.33, 1.67, NA, 0.57, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, 1.67, 42358, NA, 42358, 26252, 16106, 3441,
> 2560, NA, NA, NA, NA, 32253, 10105, NA, NA, -263, 10469, 7796, NA, NA,
> 7796, NA, NA, NA, 7796, NA, 7796, 7796, NA, NA, NA, NA, 5472.78, 1.42, NA,
> 0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.42, 50557, NA,
> 50557, \n30636, 19921, 3423, 2511, NA, NA, NA, NA, 36570, 13987, NA, NA,
> -510, 14142, 10516, NA, NA, 10516, NA, NA, NA, 10516, NA, 10516, 10516,
> NA, NA, NA, NA, 5540.89, 1.9, NA, 0.52, NA, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, 1.9), A = c(215639, NA, 215639, 131376, 84263, 14194,
> 10045, NA, NA, NA, NA, 155615, 60024, NA, NA, -1195, 61372, 45687, NA, NA,
> 45687, NA, NA, NA, 45687, NA, 45687, 45687, NA, NA, NA, NA, 5500.28, 8.31,
> NA, 2.18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8.31, 233715,
> NA, \n233715, 140089, 93626, 14329, 8067, NA, NA, NA, NA, 162485, 71230,
> NA, NA, -903, 72515, 53394, NA, NA, 53394, NA, NA, NA, 53394, NA, 53394,
> 53394, NA, NA, NA, NA, 5793.07, 9.22, NA, 1.98, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, 9.22, 182795, NA,

[R] Looping Through QuantMod Objects

2017-08-02 Thread Sparks, John James

Dear R Helpers,

I have run into a problem trying to perform a number of actions on a set
of quantmod data objects through a loop and I am hoping that this is an
easy problem for someone else as opposed to very  difficult for me.

The example task is to get the first three objects of the quarterly
balance sheet for a number of companies from the getFinancials object and
put them together into a single file.  I can do this one by one, but if I
try to build a loop and use the get function then the results are not
anticipated and leave me baffled.

If I do it one at a time all is good.


require(quantmod)

getFinancials("AAPL")
getFinancials("IBM")
getFinancials("MSFT")


items=c("Cash & Equivalents","Short Term Investments","Cash and Short Term
Investments")

HoldQuart<-AAPL.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"AAPL"
AAPL.c<-Cashdf

HoldQuart<-IBM.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"IBM"
IBM.c<-Cashdf


HoldQuart<-MSFT.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"MSFT"
MSFT.c<-Cashdf


BigCash<-rbind(AAPL.c, IBM.c, MSFT.c)
#setwd<-("C:/Users/HP USER/Documents")
#write.csv(BigCash,file="CashList.csv")


When I try to process through this using a loop, however, things go south
pretty quickly.

tickerlist<-ls(pattern="^[A-Z]+\\.f")

for( i in 1:1)
{
test<-get(paste0(tickerlist[i],"$BS$Q"))
}

Error in get(paste0(tickerlist[i], "$BS$Q")) :
  object 'AAPL.f$BS$Q' not found

So I tried to break it up into smaller steps, but the resulting matrix
seems to have lost its structure (see below).

If someone could help me out, I sure would appreciate.

Thanks.
--John Sparks


tickerlist<-ls(pattern="^[A-Z]+\\.f")
for( i in 1:1)
{
HoldFin<-get(tickerlist[i])
BSQ<-as.matrix(paste0(HoldFin,"$BS$Q"))
}
BSQ

[1,] "list(Q = c(52896, NA, 52896, 32305, 20591, 3718, 2776, NA, NA, NA,
NA, 38799, 14097, NA, NA, -165, 14684, 11029, NA, NA, 11029, NA, NA, NA,
11029, NA, 11029, 11029, NA, NA, NA, NA, 5261.69, 2.1, NA, 0.57, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.1, 78351, NA, 78351, 48175,
30176, 3946, 2871, NA, NA, NA, NA, 54992, 23359, NA, NA, 122, 24180,
17891, NA, NA, 17891, NA, NA, NA, 17891, NA, 17891, 17891, NA, NA, NA, NA,
5327.99, 3.36, NA, 0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
3.36, \n46852, NA, 46852, 29039, 17813, 3482, 2570, NA, NA, NA, NA, 35091,
11761, NA, NA, -159, 12188, 9014, NA, NA, 9014, NA, NA, NA, 9014, NA,
9014, 9014, NA, NA, NA, NA, 5393.33, 1.67, NA, 0.57, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1.67, 42358, NA, 42358, 26252, 16106, 3441,
2560, NA, NA, NA, NA, 32253, 10105, NA, NA, -263, 10469, 7796, NA, NA,
7796, NA, NA, NA, 7796, NA, 7796, 7796, NA, NA, NA, NA, 5472.78, 1.42, NA,
0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.42, 50557, NA,
50557, \n30636, 19921, 3423, 2511, NA, NA, NA, NA, 36570, 13987, NA, NA,
-510, 14142, 10516, NA, NA, 10516, NA, NA, NA, 10516, NA, 10516, 10516,
NA, NA, NA, NA, 5540.89, 1.9, NA, 0.52, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1.9), A = c(215639, NA, 215639, 131376, 84263, 14194,
10045, NA, NA, NA, NA, 155615, 60024, NA, NA, -1195, 61372, 45687, NA, NA,
45687, NA, NA, NA, 45687, NA, 45687, 45687, NA, NA, NA, NA, 5500.28, 8.31,
NA, 2.18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8.31, 233715,
NA, \n233715, 140089, 93626, 14329, 8067, NA, NA, NA, NA, 162485, 71230,
NA, NA, -903, 72515, 53394, NA, NA, 53394, NA, NA, NA, 53394, NA, 53394,
53394, NA, NA, NA, NA, 5793.07, 9.22, NA, 1.98, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 9.22, 182795, NA, 182795, 112258, 70537, 11993,
6041, NA, NA, NA, NA, 130292, 52503, NA, NA, -311, 53483, 39510, NA, NA,
39510, NA, NA, NA, 39510, NA, 39510, 39510, NA, NA, NA, 0, 6122.66, 6.45,
NA, 1.81, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6.45, 170910,
\nNA, 170910, 106606, 64304, 10830, 4475, NA, NA, NA, NA, 121911, 48999,
NA, NA, -24, 50155, 37037, NA, NA, 37037, NA, NA, NA, 37037, NA, 37037,
37037, NA, NA, NA, 0, 6521.5, 5.68, NA, 1.63, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 5.68))$BS$Q"
[2,] "list(Q = c(NA, 59501, 67101, 11579, NA, 20612, 2910, NA, 11367,
101990, 65124, -37961, 5473, 2617, 189740, 7549, 334532, 28573, 21665,
9992, 3999, 9113, 73342, 84531, NA, 84531, 98522, 28226, NA, 14351,
200450, NA, NA, 33579, NA, 100925, NA, -902, 134082, 334532, NA, 5205.81,
NA, 51093, 60452, 14057, NA, 27977, 2712, NA, 12191, 103332, 62759,
-36249, 5423, 2848, 185638, 7390, 331141, 38510, 21895, 10493, 3499, 9733,
84130, 73557, NA, 73557, 87549, 26948, NA, 14116, 198751, NA, NA, 32144,
NA, 11, \nNA, -1567, 132390, 331141, NA, 5255.42, NA, 58554, 67155,
15754, NA, 29299, 2132, NA, 8283, 106869, 61245, -34235, 5414, 3206,
170430, 8757, 321686, 37294, 20951, 8105, 3500, 9156, 79006, 75427, NA,
75427, 87032, 26019, NA, 12985,

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread David L Carlson

You have multiple problems. You do not seem to understand read.csv() or 
as.Date() so you really need to read the manual pages:

?read.csv
?as.Date

> Data <- read.csv("Container.csv")
> str(Data)
'data.frame':   362 obs. of  1 variable:
 $ TransitDate.Transits: Factor w/ 362 levels "1-Apr-00\t25",..: 319 289 78 140 
110 229 18 259 199 169 ...

Notice you have a single factor that combines the TransitDate and Transits 
because the file you sent was NOT a .csv file, but a tab-delimited file:

> Data <- read.delim("Container.csv")
> str(Data)
'data.frame':   362 obs. of  2 variables:
 $ TransitDate: Factor w/ 362 levels "1-Apr-00","1-Apr-01",..: 319 289 78 140 
110 229 18 259 199 169 ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...

Now we get two variables, but the date is still a factor.

> Data <- read.delim("Container.csv", stringsAsFactors=FALSE)
> str(Data)
'data.frame':   362 obs. of  2 variables:
 $ TransitDate: chr  "1-Oct-85" "1-Nov-85" "1-Dec-85" "1-Jan-86" ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...
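
The effect of the three calls above can be reproduced without the attachment by reading from a string. This is a minimal sketch; tab_text is a made-up two-row sample in the same layout. Note that from R 4.0.0 onward stringsAsFactors = FALSE is already the default, so the argument matters mainly on older versions.

```r
# Tab-delimited sample text, mimicking the layout of the posted file.
tab_text <- "TransitDate\tTransits\n1-Oct-85\t4\n1-Nov-85\t4\n"

# read.csv() splits on commas, so each line stays a single field.
as_csv <- read.csv(text = tab_text)

# read.delim() splits on tabs and yields the two expected columns.
as_tab <- read.delim(text = tab_text, stringsAsFactors = FALSE)

str(as_csv)
str(as_tab)
```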

Now we get the date as characters, but as Ng Bo Lin pointed out, it is not in 
the format you indicated, "%Y-%m-%d". %Y means a year with the century (e.g. 
1985), but you have 2-digit years (85); %m means the month as a decimal number 
(e.g. 10 for October), but you have a 3-letter abbreviation for the month. And 
the order is backwards. What you need is

> TDate <- as.Date(Data$TransitDate, "%e-%B-%y")
> head(TDate)
[1] "1985-10-01" "1985-11-01" "1985-12-01" "1986-01-01" "1986-02-01" 
"1986-03-01"

You probably should preserve the original date and not overwrite it so 
something like

> Data$Transit <- TDate
> str(Data)
'data.frame':   362 obs. of  3 variables:
 $ TransitDate: chr  "1-Oct-85" "1-Nov-85" "1-Dec-85" "1-Jan-86" ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...
 $ Transit: Date, format: "1985-10-01" "1985-11-01" ...

Would be preferable. 
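
The format mismatch discussed above can be checked on a few strings. This is a minimal sketch with made-up values in the same day-month-year layout; it uses "%d-%b-%y", which reads these strings the same way, and sets LC_TIME to "C" only because month abbreviations depend on the locale.

```r
# Month abbreviations are locale-dependent; "C" gives English names.
invisible(Sys.setlocale("LC_TIME", "C"))

raw_dates <- c("1-Oct-85", "1-Nov-85", "1-Apr-00")

# The format from the original attempt does not match the strings: all NA.
wrong <- as.Date(raw_dates, format = "%Y-%m-%d")

# Day, abbreviated month name, 2-digit year.
right <- as.Date(raw_dates, format = "%d-%b-%y")

print(wrong)
print(format(right, "%Y-%m-%d"))
```

The format argument of as.Date() describes how the input is written; the YYYY-MM-DD display is simply how Date objects print afterwards.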

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


From: Paul Bernal [mailto:paulberna...@gmail.com] 
Sent: Tuesday, March 28, 2017 9:41 AM
To: David L Carlson <dcarl...@tamu.edu>
Cc: Ng Bo Lin <ngboli...@gmail.com>; r-help@r-project.org
Subject: Re: [R] Looping Through DataFrames with Differing Lenghts

Dear friend David,

Thank you for your valuable suggestion. So here is the file in .txt format.

Best of regards,

Paul

2017-03-28 9:35 GMT-05:00 David L Carlson <dcarl...@tamu.edu>:
We did not get the file on the list. You need to rename your file to 
"Container.txt" or the mailing list will strip it from your message. The 
read.csv() function returns a data frame so Data is already a data frame. The 
command DataFrame<-data.frame(Data) just makes a copy of Data.

Without the file, it is difficult to be certain, but your dates are probably 
stored as character strings and read.csv() will turn those to factors unless 
you tell it not to do that. Try

Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
str(Data) # To see how the dates are stored

and see if things work better. If not, rename the file or use dput(Data) and 
copy the result into your email message. If the data is very long, use 
dput(head(Data, 15)).
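
As a rough illustration of why dput() is preferred over a printed listing, the sketch below writes a small data frame as R code and rebuilds it from that text. The data frame toy is invented for the example.

```r
# Invented stand-in for the first rows of the posted data.
toy <- data.frame(
  TransitDate = c("1-Oct-85", "1-Nov-85"),
  Transits    = c(4L, 4L),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, column types included.
dput(head(toy, 15))

# Round trip: capture that code as text, then evaluate it again.
txt     <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

A reader can paste the dput() output straight into an R session and work with exactly the same structure.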

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Bernal
Sent: Tuesday, March 28, 2017 9:12 AM
To: Ng Bo Lin <ngboli...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Looping Through DataFrames with Differing Lenghts

Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
valuable replies,

I am trying to reformat a date as follows:

Data<-read.csv("Container.csv")

DataFrame<-data.frame(Data)

DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")

#trying to put it in YYYY-MM-DD format

However, when I do this, I get a bunch of NAs for the dates.

I am providing a sample dataset as a reference.

Any help will be greatly appreciated,

Best regards,

Paul

2017-03-28 8:15 GMT-05:00 Ng Bo Lin <ngboli...@gmail.com>:

> Hi Paul,
>
> Using the example provided by Ulrik, where
>
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> c(15,20)),
>
> You could also try the following function:
>
> for (i in 1:dim(exdf1)[1]){
>         if (!ex

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Paul Bernal

Dear friend David,

Thank you for your valuable suggestion. So here is the file in .txt format.

Best of regards,

Paul

2017-03-28 9:35 GMT-05:00 David L Carlson <dcarl...@tamu.edu>:

> We did not get the file on the list. You need to rename your file to
> "Container.txt" or the mailing list will strip it from your message. The
> read.csv() function returns a data frame so Data is already a data frame.
> The command DataFrame<-data.frame(Data) just makes a copy of Data.
>
> Without the file, it is difficult to be certain, but your dates are
> probably stored as character strings and read.csv() will turn those to
> factors unless you tell it not to do that. Try
>
> Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
> str(Data) # To see how the dates are stored
>
> and see if things work better. If not, rename the file or use dput(Data)
> and copy the result into your email message. If the data is very long, use
> dput(head(Data, 15)).
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Paul
> Bernal
> Sent: Tuesday, March 28, 2017 9:12 AM
> To: Ng Bo Lin <ngboli...@gmail.com>
> Cc: r-help@r-project.org
> Subject: Re: [R] Looping Through DataFrames with Differing Lenghts
>
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
> valuable replies,
>
> I am trying to reformat a date as follows:
>
> Data<-read.csv("Container.csv")
>
> DataFrame<-data.frame(Data)
>
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
>
> #trying to put it in YYYY-MM-DD format
>
> However, when I do this, I get a bunch of NAs for the dates.
>
> I am providing a sample dataset as a reference.
>
> Any help will be greatly appreciated,
>
> Best regards,
>
> Paul
>
> 2017-03-28 8:15 GMT-05:00 Ng Bo Lin <ngboli...@gmail.com>:
>
> > Hi Paul,
> >
> > Using the example provided by Ulrik, where
> >
> > > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> > c(15,20)),
> >
> > You could also try the following function:
> >
> > for (i in 1:dim(exdf1)[1]){
> > if (!exdf1[i, 1] %in% exdf2[, 1]){
> > exdf2 <- rbind(exdf2, exdf1[i,])
> > }
> > }
> >
> > Basically, what the function does is that it runs through the number of
> > rows in exdf1, and checks if the Date of the exdf1 row already exists in
> > Date column of exdf2. If so, it skips it. Otherwise, it binds the row to
> > df2.
> >
> > Hope this helps!
> >
> >
> > Side note.: Computational efficiency wise, think Ulrik’s answer is
> > probably better. Presentation wise, his is also much better.
> >
> > Regards,
> > Bo Lin
> >
> > > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo <ulrik.ster...@gmail.com>
> > wrote:
> > >
> > > Hi Paul,
> > >
> > > does this do what you want?
> > >
> > > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> > c(15,
> > > 20))
> > >
> > > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> > >
> > > rbind(exdf2, tmpdf)
> > >
> > > HTH,
> > > Ulrik
> > >
> > > On Tue, 28 Mar 2017 at 10:50 Paul Bernal <paulberna...@gmail.com>
> wrote:
> > >
> > > Dear friend Mark,
> > >
> > > Great suggestion! Thank you for replying.
> > >
> > > I have two dataframes, dataframe1 and dataframe2.
> > >
> > > dataframe1 has two columns, one with the dates in YYYY-MM-DD format and
> > the
> > > other colum with number of transits (all of which were set to NA
> values).
> > > dataframe1 starts in 1985-10-01 (october 1st 1985) and ends in
> 2017-03-01
> > > (march 1 2017).
> > >
> > > dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
> > > format, and the other column with number of transits. dataframe2 starts
> > > have the same

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Ng Bo Lin

Hi Paul,

Using the example provided by Ulrik, where

exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
                             "1986-01-01"), Transits = c(NA, NA, NA, NA))
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20))

You could also try the following loop:

# Append every row of exdf1 whose Date is not yet present in exdf2
for (i in seq_len(nrow(exdf1))) {
  if (!exdf1[i, 1] %in% exdf2[, 1]) {
    exdf2 <- rbind(exdf2, exdf1[i, ])
  }
}

Basically, the loop runs through the rows of exdf1 and checks whether the Date 
of each row already exists in the Date column of exdf2. If so, it skips the 
row. Otherwise, it binds the row to exdf2.

Hope this helps!


Side note: in terms of computational efficiency, I think Ulrik’s answer is 
probably better. His presentation is also much better.
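
Another option, not proposed in the thread, is a left join with merge(), which also returns the rows in date order. A minimal sketch, assuming the all-NA Transits column of exdf1 is dropped first so that the two data frames share only the Date key:

```r
# Full list of dates; the all-NA Transits column is left out on purpose.
exdf1 <- data.frame(
  Date = c("1985-10-01", "1985-11-01", "1985-12-01", "1986-01-01"),
  stringsAsFactors = FALSE
)
# Observed transits, with some dates missing.
exdf2 <- data.frame(
  Date     = c("1985-10-01", "1986-01-01"),
  Transits = c(15, 20),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every date from exdf1; dates without a match get NA.
filled <- merge(exdf1, exdf2, by = "Date", all.x = TRUE)
print(filled)
```

Because merge() sorts on the key by default, the filled-in dates land between the observed ones rather than at the end, as they do with rbind().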

Regards,
Bo Lin

> On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo  wrote:
> 
> Hi Paul,
> 
> does this do what you want?
> 
> exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
> 20))
> 
> tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> 
> rbind(exdf2, tmpdf)
> 
> HTH,
> Ulrik
> 
> On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:
> 
> Dear friend Mark,
> 
> Great suggestion! Thank you for replying.
> 
> I have two dataframes, dataframe1 and dataframe2.
> 
> dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
> other column with the number of transits (all of which were set to NA
> values). dataframe1 starts on 1985-10-01 (October 1st, 1985) and ends on
> 2017-03-01 (March 1st, 2017).
> 
> dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
> format, and the other column with the number of transits. dataframe2 has
> the same start and end dates; however, dataframe2 has missing dates
> between the start and end dates, so it has fewer observations.
> 
> dataframe1 has a total of 378 observations and dataframe2 has a  total of
> 362 observations.
> 
> I would like to come up with a code that could do the following:
> 
> Get the dates of dataframe1 that are missing in dataframe2 and add them as
> records to dataframe 2 but with NA values.
> 
>  
> Date        Transits    Date        Transits
> 1985-10-01  NA          1985-10-01  15
> 1985-11-01  NA          1986-01-01  20
> 1985-12-01  NA          1986-02-01  5
> 1986-01-01  NA
> 1986-02-01  NA
> 2017-03-01  NA
> 
> I would like to fill in the missing dates in dataframe2, with NA as value
> for the missing transits, so that I  could end up with a dataframe3 looking
> as follows:
> 
> Date        Transits
> 1985-10-01  15
> 1985-11-01  NA
> 1985-12-01  NA
> 1986-01-01  20
> 1986-02-01  5
> 2017-03-01  NA
> 
> This is what I want to accomplish.
> 
> Thanks, beforehand for your help,
> 
> Best regards,
> 
> Paul
> 
> 
> 2017-03-27 15:15 GMT-05:00 Mark Sharp :
> 
>> Make some small dataframes of just a few rows that illustrate the problem
>> structure. Make a third that has the result you want. You will get an
>> answer very quickly. Without a self-contained reproducible problem,
> results
>> vary.
>> 
>> Mark
>> R. Mark Sharp, Ph.D.
>> msh...@txbiomed.org
>> 
>> 
>> 
>> 
>> 
>>> On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
>>> 
>>> Dear friends,
>>> 
>>> I have one dataframe which contains 378 observations, and another one,
>>> containing 362 observations.
>>> 
>>> Both dataframes have two columns, one date column and another one with
>> the
>>> number of transits.
>>> 
>>> I wanted to come up with a code so that I could fill in the dates that
>> are
>>> missing in one of the dataframes and replace the column of transits with
>>> the value NA.
>>> 
>>> I have tried several things but R obviously complains that the length of
>>> the dataframes are different.
>>> 
>>> How can I solve this?
>>> 
>>> Any guidance will be greatly appreciated,
>>> 
>>> Best regards,
>>> 
>>> Paul
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Ng Bo Lin

Hi Paul,

The date format that you have supplied to R isn’t exactly right.

Instead of the format "%Y-%m-%d", your data appear to follow the "%e-%B-%y"
format. Here %e is the day of the month (1-31), %B is the month name (on
input it matches both the full and the abbreviated name), and %y is the
two-digit year.
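
As a rough illustration of how the format string has to mirror the stored
text, here is a minimal sketch. The sample strings are invented (the attached
file never reached the list), and numeric-only formats are used so that the
result does not depend on the locale:

```r
# Hypothetical strings; the real Container.csv was not available
iso_text <- c("1985-10-01", "1986-01-15")
dmy_text <- c("01/10/85", "15/01/86")

# The format must describe the text exactly as it is stored
iso_dates <- as.Date(iso_text, format = "%Y-%m-%d")
dmy_dates <- as.Date(dmy_text, format = "%d/%m/%y")

# A format that does not match the text gives NA instead of an error
mismatch <- as.Date(dmy_text, format = "%Y-%m-%d")

# Once parsed, Date objects print as YYYY-MM-DD whatever the input layout was
print(dmy_dates)
print(mismatch)
```

A column full of NA after as.Date() is therefore usually a sign that the
format string and the text disagree, not that the data are missing.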

Hope this helps!

Thank you.

Regards,
Bo Lin
> On 28 Mar 2017, at 10:12 PM, Paul Bernal  wrote:
> 
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and 
> valuable replies,
> 
> I am trying to reformat a date as follows:
> 
> Data<-read.csv("Container.csv")
> 
> DataFrame<-data.frame(Data)
> 
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
> 
> # trying to put it in YYYY-MM-DD format
> 
> However, when I do this, I get a bunch of NAs for the dates.
> 
> I am providing a sample dataset as a reference.
> 
> Any help will be greatly appreciated,
> 
> Best regards,
> 
> Paul
> 
> 2017-03-28 8:15 GMT-05:00 Ng Bo Lin:
> Hi Paul,
> 
> Using the example provided by Ulrik, where
>
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> > c(15, 20))
>
> you could also try the following loop:
>
> for (i in seq_len(nrow(exdf1))) {
>   # append the row only if its date is not yet in exdf2
>   if (!exdf1[i, 1] %in% exdf2[, 1]) {
>     exdf2 <- rbind(exdf2, exdf1[i, ])
>   }
> }
>
> The loop runs through the rows of exdf1 and checks whether the Date of each
> row already exists in the Date column of exdf2. If so, it skips the row;
> otherwise, it binds the row to exdf2.
>
> Hope this helps!
>
>
> Side note: in terms of computational efficiency, I think Ulrik's answer is
> probably better. Its presentation is also much better.
> 
> Regards,
> Bo Lin
> 
> > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo wrote:
> >
> > Hi Paul,
> >
> > does this do what you want?
> >
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
> > 20))
> >
> > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> >
> > rbind(exdf2, tmpdf)
> >
> > HTH,
> > Ulrik
> >
> > On Tue, 28 Mar 2017 at 10:50 Paul Bernal wrote:
> >
> > Dear friend Mark,
> >
> > Great suggestion! Thank you for replying.
> >
> > I have two dataframes, dataframe1 and dataframe2.
> >
> > dataframe1 has two columns, one with the dates in -MM-DD format and the
> > other colum with number of transits (all of which were set to NA values).
> > dataframe1 starts in 1985-10-01 (october 1st 1985) and ends in 2017-03-01
> > (march 1 2017).
> >
> > dataframe2 has the same  two columns, one with the dates in -MM-DD
> > format, and the other column with number of transits. dataframe2 starts
> > have the same start and end dates, however, dataframe2 has missing dates
> > between the start and end dates, so it has fewer observations.
> >
> > dataframe1 has a total of 378 observations and dataframe2 has a  total of
> > 362 observations.
> >
> > I would like to come up with a code that could do the following:
> >
> > Get the dates of dataframe1 that are missing in dataframe2 and add them as
> > records to dataframe 2 but with NA values.
> >
> > Date        Transits    Date        Transits
> > 1985-10-01  NA          1985-10-01  15
> > 1985-11-01  NA          1986-01-01  20
> > 1985-12-01  NA          1986-02-01  5
> > 1986-01-01  NA
> > 1986-02-01  NA
> > 2017-03-01  NA
> >
> > I would like to fill in the missing dates in dataframe2, with NA as value
> > for the missing transits, so that I could end up with a dataframe3 looking
> > as follows:
> >
> > Date        Transits
> > 1985-10-01  15
> > 1985-11-01  NA
> > 1985-12-01  NA
> > 1986-01-01  20
> > 1986-02-01  5
> > 2017-03-01  NA
> >
> > This is what I want to accomplish.
> >
> > Thanks, beforehand for your help,
> >
> > Best regards,
> >
> > Paul
> >
> >
> > 2017-03-27 15:15 GMT-05:00 Mark Sharp:
> >
> >> Make some small dataframes of just a few rows that illustrate the problem
> >> structure. Make a third that has the result you want. You will get an
> >> answer very quickly. Without a self-contained reproducible problem,
> > results
> >> vary.
> >>
> >> Mark
> >> R. Mark Sharp, Ph.D.
> >>

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread David L Carlson

We did not get the file on the list. You need to rename your file to 
"Container.txt" or the mailing list will strip it from your message. The 
read.csv() function returns a data frame so Data is already a data frame. The 
command DataFrame<-data.frame(Data) just makes a copy of Data. 

Without the file, it is difficult to be certain, but your dates are probably 
stored as character strings and read.csv() will turn those to factors unless 
you tell it not to do that. Try

Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
str(Data) # To see how the dates are stored

and see if things work better. If not, rename the file or use dput(Data) and 
copy the result into your email message. If the data is very long, use 
dput(head(Data, 15)).
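
The steps above could look roughly like this. It is only a sketch: the inline
CSV text is an invented stand-in for Container.csv, and the column names
TransitDate and Transits are assumed from the earlier messages.

```r
# Hypothetical stand-in for Container.csv, which the list stripped
csv_text <- "TransitDate,Transits
1985-10-01,15
1986-01-01,20
1986-02-01,5"

# stringsAsFactors = FALSE keeps the dates as character strings
Data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(Data)  # shows how TransitDate is stored (chr here)

# Convert only after confirming how the column is stored
Data$TransitDate <- as.Date(Data$TransitDate, format = "%Y-%m-%d")

# dput() writes a copy-and-paste representation suitable for an email
dput(head(Data, 15))
```

Since R 4.0.0 the default is already stringsAsFactors = FALSE, so the
argument mainly matters on older installations.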

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Bernal
Sent: Tuesday, March 28, 2017 9:12 AM
To: Ng Bo Lin <ngboli...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Looping Through DataFrames with Differing Lenghts

Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
valuable replies,

I am trying to reformat a date as follows:

Data<-read.csv("Container.csv")

DataFrame<-data.frame(Data)

DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")

# trying to put it in YYYY-MM-DD format

However, when I do this, I get a bunch of NAs for the dates.

I am providing a sample dataset as a reference.

Any help will be greatly appreciated,

Best regards,

Paul

2017-03-28 8:15 GMT-05:00 Ng Bo Lin <ngboli...@gmail.com>:

> Hi Paul,
>
> Using the example provided by Ulrik, where
>
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> > c(15, 20))
>
> you could also try the following loop:
>
> for (i in seq_len(nrow(exdf1))) {
>   # append the row only if its date is not yet in exdf2
>   if (!exdf1[i, 1] %in% exdf2[, 1]) {
>     exdf2 <- rbind(exdf2, exdf1[i, ])
>   }
> }
>
> The loop runs through the rows of exdf1 and checks whether the Date of each
> row already exists in the Date column of exdf2. If so, it skips the row;
> otherwise, it binds the row to exdf2.
>
> Hope this helps!
>
>
> Side note: in terms of computational efficiency, I think Ulrik's answer is
> probably better. Its presentation is also much better.
>
> Regards,
> Bo Lin
>
> > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo <ulrik.ster...@gmail.com>
> wrote:
> >
> > Hi Paul,
> >
> > does this do what you want?
> >
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> c(15,
> > 20))
> >
> > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> >
> > rbind(exdf2, tmpdf)
> >
> > HTH,
> > Ulrik
> >
> > On Tue, 28 Mar 2017 at 10:50 Paul Bernal <paulberna...@gmail.com> wrote:
> >
> > Dear friend Mark,
> >
> > Great suggestion! Thank you for replying.
> >
> > I have two dataframes, dataframe1 and dataframe2.
> >
> > dataframe1 has two columns, one with the dates in -MM-DD format and
> the
> > other colum with number of transits (all of which were set to NA values).
> > dataframe1 starts in 1985-10-01 (october 1st 1985) and ends in 2017-03-01
> > (march 1 2017).
> >
> > dataframe2 has the same  two columns, one with the dates in -MM-DD
> > format, and the other column with number of transits. dataframe2 starts
> > have the same start and end dates, however, dataframe2 has missing dates
> > between the start and end dates, so it has fewer observations.
> >
> > dataframe1 has a total of 378 observations and dataframe2 has a  total of
> > 362 observations.
> >
> > I would like to come up with a code that could do the following:
> >
> > Get the dates of dataframe1 that are missing in dataframe2 and add them
> as
> > records to dataframe 2 but with NA values.
> >
> > Date        Transits    Date        Transits
> > 1985-10-01  NA          1985-10-01  15
> > 1985-11-01  NA          1986-01-01  20
> > 1985-12-01  NA

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Paul Bernal

Dear Bo Lin,

I tried doing
Containerdata$TransitDate<-as.Date(Containerdata$TransitDate, "%e-%B-%y")
but I keep getting NAs.

I also tried a solution that I saw in stackoverflow doing:

> lct<-Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
[1] "C"
>
> Sys.setlocale("LC_TIME", lct)
[1] "English_United States.1252"

but it didn't work.
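
One possible reading of that suggestion is that the conversion has to run
while the "C" locale is active, that is, between the two Sys.setlocale()
calls; the transcript above switches the locale and restores it without
parsing anything in between. A minimal sketch, assuming the file holds
strings such as "1-Oct-85" (invented values, since the real data were not
posted):

```r
# Invented sample values; replace with the real TransitDate column
transit_text <- c("1-Oct-85", "1-Nov-85", "1-Mar-17")

lct <- Sys.getlocale("LC_TIME")            # remember the current setting
invisible(Sys.setlocale("LC_TIME", "C"))   # English month names

# Parse while the "C" locale is in effect
transit_date <- as.Date(transit_text, format = "%d-%b-%y")

invisible(Sys.setlocale("LC_TIME", lct))   # restore the previous setting

transit_date
```

If the column was read as a factor, wrapping it in as.character() before the
conversion is a cheap extra precaution.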

Any other suggestion?

Thank you for your valuable help,

Regards,

Paul

2017-03-28 9:19 GMT-05:00 Ng Bo Lin :

> Hi Paul,
>
> The date format that you have supplied to R isn’t exactly right.
>
> Instead of the format "%Y-%m-%d", your data appear to follow the
> "%e-%B-%y" format. Here %e is the day of the month (1-31), %B is the month
> name (on input it matches both the full and the abbreviated name), and %y
> is the two-digit year.
>
> Hope this helps!
>
> Thank you.
>
> Regards,
> Bo Lin
>
> On 28 Mar 2017, at 10:12 PM, Paul Bernal  wrote:
>
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
> valuable replies,
>
> I am trying to reformat a date as follows:
>
> Data<-read.csv("Container.csv")
>
> DataFrame<-data.frame(Data)
>
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
>
> # trying to put it in YYYY-MM-DD format
>
> However, when I do this, I get a bunch of NAs for the dates.
>
> I am providing a sample dataset as a reference.
>
> Any help will be greatly appreciated,
>
> Best regards,
>
> Paul
>
> 2017-03-28 8:15 GMT-05:00 Ng Bo Lin :
>
>> Hi Paul,
>>
>> Using the example provided by Ulrik, where
>>
>> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
>> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
>> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
>> > c(15, 20))
>>
>> you could also try the following loop:
>>
>> for (i in seq_len(nrow(exdf1))) {
>>   # append the row only if its date is not yet in exdf2
>>   if (!exdf1[i, 1] %in% exdf2[, 1]) {
>>     exdf2 <- rbind(exdf2, exdf1[i, ])
>>   }
>> }
>>
>> The loop runs through the rows of exdf1 and checks whether the Date of each
>> row already exists in the Date column of exdf2. If so, it skips the row;
>> otherwise, it binds the row to exdf2.
>>
>> Hope this helps!
>>
>>
>> Side note: in terms of computational efficiency, I think Ulrik's answer is
>> probably better. Its presentation is also much better.
>>
>> Regards,
>> Bo Lin
>>
>> > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo 
>> wrote:
>> >
>> > Hi Paul,
>> >
>> > does this do what you want?
>> >
>> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
>> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
>> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
>> c(15,
>> > 20))
>> >
>> > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
>> >
>> > rbind(exdf2, tmpdf)
>> >
>> > HTH,
>> > Ulrik
>> >
>> > On Tue, 28 Mar 2017 at 10:50 Paul Bernal 
>> wrote:
>> >
>> > Dear friend Mark,
>> >
>> > Great suggestion! Thank you for replying.
>> >
>> > I have two dataframes, dataframe1 and dataframe2.
>> >
>> > dataframe1 has two columns, one with the dates in -MM-DD format and
>> the
>> > other colum with number of transits (all of which were set to NA
>> values).
>> > dataframe1 starts in 1985-10-01 (october 1st 1985) and ends in
>> 2017-03-01
>> > (march 1 2017).
>> >
>> > dataframe2 has the same  two columns, one with the dates in -MM-DD
>> > format, and the other column with number of transits. dataframe2 starts
>> > have the same start and end dates, however, dataframe2 has missing dates
>> > between the start and end dates, so it has fewer observations.
>> >
>> > dataframe1 has a total of 378 observations and dataframe2 has a  total
>> of
>> > 362 observations.
>> >
>> > I would like to come up with a code that could do the following:
>> >
>> > Get the dates of dataframe1 that are missing in dataframe2 and add them
>> as
>> > records to dataframe 2 but with NA values.
>> >
>> > Date        Transits    Date        Transits
>> > 1985-10-01  NA          1985-10-01  15
>> > 1985-11-01  NA          1986-01-01  20
>> > 1985-12-01  NA          1986-02-01  5
>> > 1986-01-01  NA
>> > 1986-02-01  NA
>> > 2017-03-01  NA
>> >
>> > I would like to fill in the missing dates in dataframe2, with NA as value
>> > for the missing transits, so that I could end up with a dataframe3
>> > looking as follows:
>> >
>> > Date        Transits
>> > 1985-10-01  15
>> > 1985-11-01  NA
>> > 1985-12-01  NA
>> > 1986-01-01  20
>> > 1986-02-01  5
>> > 2017-03-01  NA
>> >
>> > This is what I want to

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Paul Bernal

Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
valuable replies,

I am trying to reformat a date as follows:

Data<-read.csv("Container.csv")

DataFrame<-data.frame(Data)

DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")

# trying to put it in YYYY-MM-DD format

However, when I do this, I get a bunch of NAs for the dates.

I am providing a sample dataset as a reference.

Any help will be greatly appreciated,

Best regards,

Paul

2017-03-28 8:15 GMT-05:00 Ng Bo Lin :

> Hi Paul,
>
> Using the example provided by Ulrik, where
>
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> > c(15, 20))
>
> you could also try the following loop:
>
> for (i in seq_len(nrow(exdf1))) {
>   # append the row only if its date is not yet in exdf2
>   if (!exdf1[i, 1] %in% exdf2[, 1]) {
>     exdf2 <- rbind(exdf2, exdf1[i, ])
>   }
> }
>
> The loop runs through the rows of exdf1 and checks whether the Date of each
> row already exists in the Date column of exdf2. If so, it skips the row;
> otherwise, it binds the row to exdf2.
>
> Hope this helps!
>
>
> Side note: in terms of computational efficiency, I think Ulrik's answer is
> probably better. Its presentation is also much better.
>
> Regards,
> Bo Lin
>
> > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo 
> wrote:
> >
> > Hi Paul,
> >
> > does this do what you want?
> >
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> c(15,
> > 20))
> >
> > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> >
> > rbind(exdf2, tmpdf)
> >
> > HTH,
> > Ulrik
> >
> > On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:
> >
> > Dear friend Mark,
> >
> > Great suggestion! Thank you for replying.
> >
> > I have two dataframes, dataframe1 and dataframe2.
> >
> > dataframe1 has two columns, one with the dates in -MM-DD format and
> the
> > other colum with number of transits (all of which were set to NA values).
> > dataframe1 starts in 1985-10-01 (october 1st 1985) and ends in 2017-03-01
> > (march 1 2017).
> >
> > dataframe2 has the same  two columns, one with the dates in -MM-DD
> > format, and the other column with number of transits. dataframe2 starts
> > have the same start and end dates, however, dataframe2 has missing dates
> > between the start and end dates, so it has fewer observations.
> >
> > dataframe1 has a total of 378 observations and dataframe2 has a  total of
> > 362 observations.
> >
> > I would like to come up with a code that could do the following:
> >
> > Get the dates of dataframe1 that are missing in dataframe2 and add them
> as
> > records to dataframe 2 but with NA values.
> >
> > Date        Transits    Date        Transits
> > 1985-10-01  NA          1985-10-01  15
> > 1985-11-01  NA          1986-01-01  20
> > 1985-12-01  NA          1986-02-01  5
> > 1986-01-01  NA
> > 1986-02-01  NA
> > 2017-03-01  NA
> >
> > I would like to fill in the missing dates in dataframe2, with NA as value
> > for the missing transits, so that I could end up with a dataframe3
> > looking as follows:
> >
> > Date        Transits
> > 1985-10-01  15
> > 1985-11-01  NA
> > 1985-12-01  NA
> > 1986-01-01  20
> > 1986-02-01  5
> > 2017-03-01  NA
> >
> > This is what I want to accomplish.
> >
> > Thanks, beforehand for your help,
> >
> > Best regards,
> >
> > Paul
> >
> >
> > 2017-03-27 15:15 GMT-05:00 Mark Sharp :
> >
> >> Make some small dataframes of just a few rows that illustrate the
> problem
> >> structure. Make a third that has the result you want. You will get an
> >> answer very quickly. Without a self-contained reproducible problem,
> > results
> >> vary.
> >>
> >> Mark
> >> R. Mark Sharp, Ph.D.
> >> msh...@txbiomed.org
> >>
> >>
> >>
> >>
> >>
> >>> On Mar 27, 2017, at 3:09 PM, Paul Bernal 
> wrote:
> >>>
> >>> Dear friends,
> >>>
> >>> I have one dataframe which contains 378 observations, and another one,
> >>> containing 362 observations.
> >>>
> >>> Both dataframes have two columns, one date column and another one with
> >> the
> >>> number of transits.
> >>>
> >>> I wanted to come up with a code so that I could fill in the dates that
> >> are
> >>> missing in one of the dataframes and replace the column of transits
> with
> >>> the value NA.
> >>>
> >>> I have tried several things but R obviously complains that the length
> of
> >>> the dataframes are different.
> >>>
> >>>

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Ulrik Stervbo

Hi Paul,

does this do what you want?

exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
"1986-01-01"), Transits = c(NA, NA, NA, NA))
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
20))

tmpdf <- subset(exdf1, !Date %in% exdf2$Date)

rbind(exdf2, tmpdf)
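
The rbind() result keeps the appended rows at the bottom. If the combined
data frame should be in date order, as in the desired output, one possible
follow-up is sketched below; the name exdf3 and the explicit
stringsAsFactors = FALSE are additions for illustration, not part of the
code above.

```r
exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01", "1986-01-01"),
                    Transits = c(NA, NA, NA, NA),
                    stringsAsFactors = FALSE)
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20),
                    stringsAsFactors = FALSE)

# Rows of exdf1 whose Date is absent from exdf2
tmpdf <- subset(exdf1, !Date %in% exdf2$Date)

# Append them, then sort chronologically and renumber the rows
exdf3 <- rbind(exdf2, tmpdf)
exdf3 <- exdf3[order(as.Date(exdf3$Date)), ]
rownames(exdf3) <- NULL
exdf3
```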

HTH,
Ulrik

On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:

Dear friend Mark,

Great suggestion! Thank you for replying.

I have two dataframes, dataframe1 and dataframe2.

dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
other column with the number of transits (all of which were set to NA values).
dataframe1 starts on 1985-10-01 (October 1st, 1985) and ends on 2017-03-01
(March 1st, 2017).

dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
format, and the other column with the number of transits. dataframe2 has
the same start and end dates; however, dataframe2 has missing dates
between the start and end dates, so it has fewer observations.

dataframe1 has a total of 378 observations and dataframe2 has a total of
362 observations.

I would like to come up with code that could do the following:

Get the dates of dataframe1 that are missing in dataframe2 and add them as
records to dataframe2, but with NA values.

:

> Make some small dataframes of just a few rows that illustrate the problem
> structure. Make a third that has the result you want. You will get an
> answer very quickly. Without a self-contained reproducible problem,
results
> vary.
>
> Mark
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org
>
>
>
>
>
> > On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
> >
> > Dear friends,
> >
> > I have one dataframe which contains 378 observations, and another one,
> > containing 362 observations.
> >
> > Both dataframes have two columns, one date column and another one with
> the
> > number of transits.
> >
> > I wanted to come up with a code so that I could fill in the dates that
> are
> > missing in one of the dataframes and replace the column of transits with
> > the value NA.
> >
> > I have tried several things but R obviously complains that the length of
> > the dataframes are different.
> >
> > How can I solve this?
> >
> > Any guidance will be greatly appreciated,
> >
> > Best regards,
> >
> > Paul
> >

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Paul Bernal

Dear friend Mark,

Great suggestion! Thank you for replying.

I have two dataframes, dataframe1 and dataframe2.

dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
other column with the number of transits (all of which were set to NA values).
dataframe1 starts on 1985-10-01 (October 1st, 1985) and ends on 2017-03-01
(March 1st, 2017).

dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
format, and the other column with the number of transits. dataframe2 has
the same start and end dates; however, dataframe2 has missing dates
between the start and end dates, so it has fewer observations.

dataframe1 has a total of 378 observations and dataframe2 has a total of
362 observations.

I would like to come up with code that could do the following:

Get the dates of dataframe1 that are missing in dataframe2 and add them as
records to dataframe2, but with NA values.

:

> Make some small dataframes of just a few rows that illustrate the problem
> structure. Make a third that has the result you want. You will get an
> answer very quickly. Without a self-contained reproducible problem, results
> vary.
>
> Mark
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org
>
>
>
>
>
> > On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
> >
> > Dear friends,
> >
> > I have one dataframe which contains 378 observations, and another one,
> > containing 362 observations.
> >
> > Both dataframes have two columns, one date column and another one with
> the
> > number of transits.
> >
> > I wanted to come up with a code so that I could fill in the dates that
> are
> > missing in one of the dataframes and replace the column of transits with
> > the value NA.
> >
> > I have tried several things but R obviously complains that the length of
> > the dataframes are different.
> >
> > How can I solve this?
> >
> > Any guidance will be greatly appreciated,
> >
> > Best regards,
> >
> > Paul
> >

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Anthoni, Peter (IMK)

Hi Paul,

match might help, but without a real data sample, it is hard to check if the 
following might work.

mm=match(df.col378[,"Date"],df.col362[,"Date"])
#mm will have NAs, where there is no matching date in df.col362
#and have the index of the match, where the two dates match
new.df=cbind(df.col378,"transits.col362"=df.col362[mm,"transits"])
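
To make the idea checkable without the real data, here is a small
self-contained sketch. The two miniature data frames are invented stand-ins
for the 378-row and 362-row ones, and the column names Date and transits are
assumptions.

```r
# Hypothetical miniature versions of the two data frames
df.col378 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01", "1986-01-01"),
                        transits = NA,
                        stringsAsFactors = FALSE)
df.col362 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                        transits = c(15, 20),
                        stringsAsFactors = FALSE)

# Position of each full-calendar date in the shorter data frame (NA if absent)
mm <- match(df.col378[, "Date"], df.col362[, "Date"])
mm

# Indexing by mm yields NA for the dates without a match
new.df <- cbind(df.col378, "transits.col362" = df.col362[mm, "transits"])
new.df
```

The appeal of match() here is that the result always has as many rows as the
longer data frame, so the differing lengths stop being a problem.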

cheers
Peter



> On 27 Mar 2017, at 22:09, Paul Bernal  wrote:
> 
> Dear friends,
> 
> I have one dataframe which contains 378 observations, and another one,
> containing 362 observations.
> 
> Both dataframes have two columns, one date column and another one with the
> number of transits.
> 
> I wanted to come up with a code so that I could fill in the dates that are
> missing in one of the dataframes and replace the column of transits with
> the value NA.
> 
> I have tried several things but R obviously complains that the length of
> the dataframes are different.
> 
> How can I solve this?
> 
> Any guidance will be greatly appreciated,
> 
> Best regards,
> 
> Paul
> 

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Ulrik Stervbo

You could use merge() or %in%.
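
A minimal sketch of both options, using invented example data (the names
exdf1, exdf2, exdf3 and missing_dates are only illustrative):

```r
# Full calendar of dates, and the observed data with some dates missing
exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01", "1986-01-01"),
                    stringsAsFactors = FALSE)
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20),
                    stringsAsFactors = FALSE)

# merge(): keep every calendar date, with NA where exdf2 has no matching row
exdf3 <- merge(exdf1, exdf2, by = "Date", all.x = TRUE)
exdf3

# %in%: list the calendar dates that are absent from the observed data
missing_dates <- exdf1$Date[!exdf1$Date %in% exdf2$Date]
missing_dates
```

With all.x = TRUE the merge behaves like a left join, and by default the
result is sorted on the Date column.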

Best,
Ulrik

Mark Sharp  schrieb am Mo., 27. März 2017, 22:20:

> Make some small dataframes of just a few rows that illustrate the problem
> structure. Make a third that has the result you want. You will get an
> answer very quickly. Without a self-contained reproducible problem, results
> vary.
>
> Mark
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org
>
>
>
>
>
> > On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
> >
> > Dear friends,
> >
> > I have one dataframe which contains 378 observations, and another one,
> > containing 362 observations.
> >
> > Both dataframes have two columns, one date column and another one with
> the
> > number of transits.
> >
> > I wanted to come up with a code so that I could fill in the dates that
> are
> > missing in one of the dataframes and replace the column of transits with
> > the value NA.
> >
> > I have tried several things but R obviously complains that the length of
> > the dataframes are different.
> >
> > How can I solve this?
> >
> > Any guidance will be greatly appreciated,
> >
> > Best regards,
> >
> > Paul
> >

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Mark Sharp

Make some small dataframes of just a few rows that illustrate the problem 
structure. Make a third that has the result you want. You will get an answer 
very quickly. Without a self-contained reproducible problem, results vary.

Mark
R. Mark Sharp, Ph.D.
msh...@txbiomed.org

> On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
>
> Dear friends,
>
> I have one dataframe which contains 378 observations, and another one,
> containing 362 observations.
>
> Both dataframes have two columns, one date column and another one with the
> number of transits.
>
> I wanted to come up with a code so that I could fill in the dates that are
> missing in one of the dataframes and replace the column of transits with
> the value NA.
>
> I have tried several things but R obviously complains that the length of
> the dataframes are different.
>
> How can I solve this?
>
> Any guidance will be greatly appreciated,
>
> Best regards,
>
> Paul
>

[R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Paul Bernal

Dear friends,

I have one dataframe which contains 378 observations, and another one,
containing 362 observations.

Both dataframes have two columns, one date column and another one with the
number of transits.

I wanted to come up with a code so that I could fill in the dates that are
missing in one of the dataframes and replace the column of transits with
the value NA.

I have tried several things but R obviously complains that the length of
the dataframes are different.

How can I solve this?

Any guidance will be greatly appreciated,

Best regards,

Paul


Re: [R] looping problem

2017-02-02 Thread greg holly

Thanks so much, Petr. I do appreciate this.

Greg

On Thu, Feb 2, 2017 at 10:28 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:

> Hi.
>
> Your messages are rather confusing. Well, if you could get correct final
> data.frame in loop why not just add inside of loop new column(s) by
>
> psT$chr <- i
>
> Maybe it is time to read R intro as good source for starting with R. It
> has about 100 pages, but you can pick as start only pages 2-40 which are
> basic for data input and manipulation.
>
> Cheers
> Petr
>
>

Re: [R] looping problem

2017-02-02 Thread PIKAL Petr

Hi.

Your messages are rather confusing. If you can already build the correct
final data.frame in the loop, why not just add the new column(s) inside
the loop with

psT$chr <- i

Maybe it is time to read "An Introduction to R", a good source for starting
with R. It has about 100 pages, but to start you can pick just pages 2-40,
which cover the basics of data input and manipulation.
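
To show where that assignment goes, here is a minimal runnable sketch. It first writes three small tab-separated files into a temporary folder; that layout, the column names SNP and Z, and the use of 3 instead of 22 directories are assumptions made only so the example can run on its own.

```r
# Hypothetical directory layout (chr1..chr3) in a temp folder so the sketch
# is runnable; the real data would have 22 directories.
base <- tempfile("zscore_")
for (i in 1:3) {
  dir.create(file.path(base, paste0("chr", i)), recursive = TRUE)
  write.table(
    data.frame(SNP = paste0("rs", i, 1:2), Z = c(-0.07, 0.04) * i),
    file.path(base, paste0("chr", i), "Z-score.imputed"),
    sep = "\t", quote = FALSE, row.names = FALSE
  )
}

temp <- data.frame()
for (i in 1:3) {
  infile <- file.path(base, paste0("chr", i), "Z-score.imputed")
  psT <- read.table(infile, header = TRUE, as.is = TRUE, sep = "\t")
  psT$chr <- i                 # tag every row with its chromosome number
  temp <- rbind(temp, psT)     # then append to the combined data.frame
}
str(temp)
```

The column has to be added before rbind(), while i still identifies the file the rows came from.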

Cheers
Petr


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of greg holly
> Sent: Thursday, February 2, 2017 4:13 PM
> To: Rui Barradas <ruipbarra...@sapo.pt>
> Cc: r-help mailing list <r-help@r-project.org>
> Subject: Re: [R] looping problem
>
> Thanks so much for this. Unfortunately, cbind did not work. Basically, I
> would like to put an extra column named "chr" in the combined file from
> the 22 chr directories. So the chr column will be 1 for the portion from
> chr1 in the combined file, 2 for the portion from chr2, and so on.
>
> Regards,
>
> Greg
>
> On Thu, Feb 2, 2017 at 9:39 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> > Hello,
> >
> > If I understand correctly, just use ?cbind.
> >
> > Rui Barradas
> >
> > On 02-02-2017 13:33, greg holly wrote:
> >
> >> Hi Rui;
> >>
> >> Is there any way to insert the chr ids in numeric as 1,2..,22 in
> >> the final output. Here is output from str(temp). So I need also chr
> >> ids in a column.
> >> 1  rs58108140 10583 G A -0.070438 0.059903
> >> 2 rs189107123 10611 C G -0.044916 0.085853
> >>
> >> Regards,
> >> Greg
> >>
> >>
> >> On Wed, Feb 1, 2017 at 1:32 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >>
> >> Hello,
> >>
> >> If what you want is to combine the files into one data.frame then
> >> there are 2 things you should see:
> >>
> >> 1) You create a variable named 'temp' and don't ever use it.
> >> 2) You never combine the data.frames you read in.
> >>
> >> Try instead the following.
> >>
> >> temp <- data.frame()
> >> for(i in 1:22) {
> >>  infile<-paste("chr",i,"/Z-score.imputed",sep="")
> >>  psT<-read.table(infile,header=T,as.is=T,sep="\t")
> >>  temp <- rbind(temp, psT)
> >> }
> >>
> >> str(temp)  # to see what you have
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >>
> >>
> >>
> >> On 01-02-2017 17:25, greg holly wrote:
> >>
> >> Hi all;
> >>
> >> I have 22 directories named chr1, chr2, ..., chr22. Each directory has
> >> a file named "Z-score.imputed". I would like to combine the
> >> Z-score.imputed files into one. I wrote the following loop but did not
> >> get any results. Your help is highly appreciated.
> >>
> >> regards,
> >>
> >> Greg
> >>
> >> temp<-c()
> >>
> >> for(i in 1:22) {
> >> infile<-paste("chr",i,"/Z-score.imputed",sep="")
> >> psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
> >> ps<-psT[psT$Var>0.6,]
> >> ratio=nrow(ps)/nrow(psT)
> >> print(ratio)
> >> }
> >>

Re: [R] looping problem

2017-02-02 Thread greg holly

Thanks so much for this. Unfortunately, cbind did not work. Basically, I
would like to put an extra column named "chr" in the combined file from
the 22 chr directories. So the chr column will be 1 for the portion from
chr1 in the combined file, 2 for the portion from chr2, and so on.

Regards,

Greg

On Thu, Feb 2, 2017 at 9:39 AM, Rui Barradas  wrote:

> Hello,
>
> If I understand correctly, just use ?cbind.
>
> Rui Barradas
>
> On 02-02-2017 13:33, greg holly wrote:
>
>> Hi Rui;
>>
>> Is there any way to insert the chr ids in numeric as 1,2..,22 in the
>> final output. Here is output from str(temp). So I need also chr ids in a
>> column.
>> 1  rs58108140 10583 G A -0.070438 0.059903
>> 2 rs189107123 10611 C G -0.044916 0.085853
>>
>> Regards,
>> Greg
>>
>>
>> On Wed, Feb 1, 2017 at 1:32 PM, Rui Barradas wrote:
>>
>> Hello,
>>
>> If what you want is to combine the files into one data.frame then
>> there are 2 things you should see:
>>
>> 1) You create a variable named 'temp' and don't ever use it.
>> 2) You never combine the data.frames you read in.
>>
>> Try instead the following.
>>
>> temp <- data.frame()
>> for(i in 1:22) {
>>  infile<-paste("chr",i,"/Z-score.imputed",sep="")
>>  psT<-read.table(infile,header=T,as.is=T,sep="\t")
>>  temp <- rbind(temp, psT)
>> }
>>
>> str(temp)  # to see what you have
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>>
>>
>> On 01-02-2017 17:25, greg holly wrote:
>>
>> Hi all;
>>
>> I have 22 directories named chr1, chr2, ..., chr22. Each directory has
>> a file named "Z-score.imputed". I would like to combine the
>> Z-score.imputed files into one. I wrote the following loop but did not
>> get any results. Your help is highly appreciated.
>>
>> regards,
>>
>> Greg
>>
>> temp<-c()
>>
>> for(i in 1:22) {
>> infile<-paste("chr",i,"/Z-score.imputed",sep="")
>> psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
>> ps<-psT[psT$Var>0.6,]
>> ratio=nrow(ps)/nrow(psT)
>> print(ratio)
>> }
>>

Re: [R] looping problem

2017-02-02 Thread Rui Barradas


Hello,

If I understand correctly, just use ?cbind.

Rui Barradas

On 02-02-2017 13:33, greg holly wrote:

Hi Rui;

Is there any way to insert the chr ids in numeric as 1,2..,22 in the
final output. Here is output from str(temp). So I need also chr ids in a
column.
1  rs58108140 10583 G A -0.070438 0.059903
2 rs189107123 10611 C G -0.044916 0.085853

Regards,
Greg


On Wed, Feb 1, 2017 at 1:32 PM, Rui Barradas wrote:

Hello,

If what you want is to combine the files into one data.frame then
there are 2 things you should see:

1) You create a variable named 'temp' and don't ever use it.
2) You never combine the data.frames you read in.

Try instead the following.

temp <- data.frame()
for(i in 1:22) {
 infile<-paste("chr",i,"/Z-score.imputed",sep="")
 psT<-read.table(infile,header=T,as.is=T,sep="\t")
 temp <- rbind(temp, psT)
}

str(temp)  # to see what you have

Hope this helps,

Rui Barradas




On 01-02-2017 17:25, greg holly wrote:

Hi all;

I have 22 directories named chr1, chr2, ..., chr22. Each directory has
a file named "Z-score.imputed". I would like to combine the
Z-score.imputed files into one. I wrote the following loop but did not
get any results. Your help is highly appreciated.

regards,

Greg

temp<-c()

for(i in 1:22) {
infile<-paste("chr",i,"/Z-score.imputed",sep="")
psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
ps<-psT[psT$Var>0.6,]
ratio=nrow(ps)/nrow(psT)
print(ratio)
}


Re: [R] looping problem

2017-02-02 Thread greg holly

Hi Rui;

Is there any way to insert the chr ids as numbers (1, 2, ..., 22) in the
final output? Here is output from str(temp). I also need the chr ids in a
column.
1  rs58108140 10583 G A -0.070438 0.059903
2 rs189107123 10611 C G -0.044916 0.085853

Regards,
Greg


On Wed, Feb 1, 2017 at 1:32 PM, Rui Barradas  wrote:

> Hello,
>
> If what you want is to combine the files into one data.frame then there
> are 2 things you should see:
>
> 1) You create a variable named 'temp' and don't ever use it.
> 2) You never combine the data.frames you read in.
>
> Try instead the following.
>
> temp <- data.frame()
> for(i in 1:22) {
> infile<-paste("chr",i,"/Z-score.imputed",sep="")
> psT<-read.table(infile,header=T,as.is=T,sep="\t")
> temp <- rbind(temp, psT)
> }
>
> str(temp)  # to see what you have
>
> Hope this helps,
>
> Rui Barradas
>
>
>
>
> On 01-02-2017 17:25, greg holly wrote:
>
>> Hi all;
>>
>> I have 22 directories named chr1, chr2, ..., chr22. Each directory has a
>> file named "Z-score.imputed". I would like to combine the Z-score.imputed
>> files into one. I wrote the following loop but did not get any results.
>> Your help is highly appreciated.
>>
>> regards,
>>
>> Greg
>>
>> temp<-c()
>>
>> for(i in 1:22) {
>> infile<-paste("chr",i,"/Z-score.imputed",sep="")
>> psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
>> ps<-psT[psT$Var>0.6,]
>> ratio=nrow(ps)/nrow(psT)
>> print(ratio)
>> }
>>

Re: [R] looping problem

2017-02-01 Thread greg holly

Hi Rui;

I do appreciate this. Thanks so much. I will try it ASAP.

Regards,

Greg

On Wed, Feb 1, 2017 at 1:32 PM, Rui Barradas  wrote:

> Hello,
>
> If what you want is to combine the files into one data.frame then there
> are 2 things you should see:
>
> 1) You create a variable named 'temp' and don't ever use it.
> 2) You never combine the data.frames you read in.
>
> Try instead the following.
>
> temp <- data.frame()
> for(i in 1:22) {
> infile<-paste("chr",i,"/Z-score.imputed",sep="")
> psT<-read.table(infile,header=T,as.is=T,sep="\t")
> temp <- rbind(temp, psT)
> }
>
> str(temp)  # to see what you have
>
> Hope this helps,
>
> Rui Barradas
>
>
>
>
> On 01-02-2017 17:25, greg holly wrote:
>
>> Hi all;
>>
>> I have 22 directories named chr1, chr2, ..., chr22. Each directory has a
>> file named "Z-score.imputed". I would like to combine the Z-score.imputed
>> files into one. I wrote the following loop but did not get any results.
>> Your help is highly appreciated.
>>
>> regards,
>>
>> Greg
>>
>> temp<-c()
>>
>> for(i in 1:22) {
>> infile<-paste("chr",i,"/Z-score.imputed",sep="")
>> psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
>> ps<-psT[psT$Var>0.6,]
>> ratio=nrow(ps)/nrow(psT)
>> print(ratio)
>> }
>>

Re: [R] looping problem

2017-02-01 Thread Rui Barradas


Hello,

If what you want is to combine the files into one data.frame then there 
are 2 things you should see:


1) You create a variable named 'temp' and don't ever use it.
2) You never combine the data.frames you read in.

Try instead the following.

temp <- data.frame()
for(i in 1:22) {
infile<-paste("chr",i,"/Z-score.imputed",sep="")
psT<-read.table(infile,header=T,as.is=T,sep="\t")
temp <- rbind(temp, psT)
}

str(temp)  # to see what you have
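
An alternative to growing `temp` with rbind() on every pass, shown as a sketch and not as part of the original reply: read all files into a list with lapply() and bind them once at the end. The temporary folder, the column names SNP and Z, and the use of 3 instead of 22 directories are assumptions made so the example runs on its own.

```r
# Hypothetical directory layout (chr1..chr3) in a temp folder.
base <- tempfile("zscore_")
for (i in 1:3) {
  dir.create(file.path(base, paste0("chr", i)), recursive = TRUE)
  write.table(
    data.frame(SNP = paste0("rs", i, 1:2), Z = c(-0.07, 0.04) * i),
    file.path(base, paste0("chr", i), "Z-score.imputed"),
    sep = "\t", quote = FALSE, row.names = FALSE
  )
}

# Build all file paths at once, read each file, then combine in one step.
files  <- file.path(base, paste0("chr", 1:3), "Z-score.imputed")
pieces <- lapply(files, read.table, header = TRUE, as.is = TRUE, sep = "\t")
temp   <- do.call(rbind, pieces)
str(temp)
```

For 22 files the difference in speed is unlikely to matter; the list version mainly keeps the individual data.frames available in `pieces` if they are needed later.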

Hope this helps,

Rui Barradas



On 01-02-2017 17:25, greg holly wrote:

Hi all;

I have 22 directories named chr1, chr2, ..., chr22. Each directory has a
file named "Z-score.imputed". I would like to combine the Z-score.imputed
files into one. I wrote the following loop but did not get any results.
Your help is highly appreciated.

regards,

Greg

temp<-c()

for(i in 1:22) {
infile<-paste("chr",i,"/Z-score.imputed",sep="")
psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
ps<-psT[psT$Var>0.6,]
ratio=nrow(ps)/nrow(psT)
print(ratio)
}


[R] looping problem

2017-02-01 Thread greg holly

Hi all;

I have 22 directories named chr1, chr2, ..., chr22. Each directory has a
file named "Z-score.imputed". I would like to combine the Z-score.imputed
files into one. I wrote the following loop but did not get any results.
Your help is highly appreciated.

regards,

Greg

temp<-c()

for(i in 1:22) {
infile<-paste("chr",i,"/Z-score.imputed",sep="")
psT<-read.table(as.character(infile),header=T,as.is=T,sep="\t")
ps<-psT[psT$Var>0.6,]
ratio=nrow(ps)/nrow(psT)
print(ratio)
}


Re: [R] Looping through data tables (or data frames) by removing previous individuals

2016-10-03 Thread Charles C. Berry


On Mon, 3 Oct 2016, Frank S. wrote:


Dear R users,




[deleted]

I want to get a list of "k" data tables (or data frames) so that each 
contains those individuals who for the first time are at least 65, 
looping on each of the dates of vector "v". Let's consider the following 
example with 5 individuals:



dt <- data.table(
  id = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07", "1943-09-07", 
"1943-12-31")),
  sex = as.factor(rep(c(0, 1), c(2, 3)))
  )

v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4


I would expect to obtain k=4 data tables so that:
dt_p1: contains id = 1 (he is for the first time at least 65 on date v[1])
dt_p2: is NULL (no subject reaches 65 for the first time on date v[2])
dt_p3: contains id = 2 & id = 3 (they are for the first time at least 65 on v[3])
dt_p4: contains id = 4 & id = 5 (they are for the first time at least 65 on v[4])




Here is a start (using a data.frame for dt):


vp <- as.POSIXlt( c( as.Date("1000-01-01"), v ))
vp$year <- vp$year-65
dt.cut <- as.numeric(cut(as.POSIXlt(dt$fborn),vp))
split(dt,factor(dt.cut, 1:length(v)))

$`1`
  id      fborn sex
1  1 1935-07-25   0

$`2`
[1] id    fborn sex
<0 rows> (or 0-length row.names)

$`3`
  id      fborn sex
2  2 1942-10-05   0
3  3 1942-09-07   1

$`4`
  id      fborn sex
4  4 1943-09-07   1
5  5 1943-12-31   1


See
  ?as.POSIXlt
  ?cut.POSIXt
  ?split
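
The same grouping can also be sketched with findInterval() on plain Dates, which may be easier to follow than the cut() call above. This is a variant under stated assumptions, not the code from this message: it computes each person's 65th birthday and then looks up the first reference date on or after it. The helper names `b65` and `first_idx` are made up for the illustration.

```r
dt <- data.frame(
  id = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31")),
  sex = factor(rep(c(0, 1), c(2, 3)))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

# Date of each person's 65th birthday (shift the year field by 65).
b65 <- as.POSIXlt(dt$fborn)
b65$year <- b65$year + 65
b65 <- as.Date(b65)

# Index of the first reference date on or after the 65th birthday.
# With left.open = TRUE the intervals are (v[i], v[i + 1]].
first_idx <- findInterval(b65, v, left.open = TRUE) + 1L

# One data.frame per reference date; dates with nobody give 0-row frames.
res <- split(dt, factor(first_idx, levels = seq_along(v)))
print(res)
```

Anyone who turns 65 after the last date in `v` gets an index beyond length(v) and is dropped by the factor levels.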

HTH,

Chuck


Re: [R] Looping through data tables (or data frames) by removing previous individuals

2016-10-03 Thread Ista Zahn

Hi Frank,

How about

library(lubridate)
dtf <- merge(dt, expand.grid(id = dt$id, refdate = v), by = "id")
dtf[, gt65 := as.period(interval(fborn, refdate), unit = "years") > years(65)]
dtf <- dtf[gt65 == TRUE,][, .SD[refdate == min(refdate)], by = id]
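
If a plain loop is preferred, the loop quoted below can be repaired along the following lines. This is a base R sketch using a data.frame instead of a data.table, and the vector `seen` is a name introduced here for illustration. The quoted version fails on the first pass because 1:(i - 1) is c(1, 0) when i is 1, and `[[` does not select a range of list elements in any case; keeping a running vector of ids that were already assigned avoids indexing earlier list elements altogether.

```r
dt <- data.frame(
  id = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31")),
  sex = factor(rep(c(0, 1), c(2, 3)))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

seen <- integer(0)                      # ids already placed in an earlier table
dt_p <- vector("list", length(v))
for (i in seq_along(v)) {
  age  <- as.numeric(v[i] - dt$fborn) / 365.25   # approximate age in years
  keep <- age >= 65 & !(dt$id %in% seen)
  dt_p[[i]] <- dt[keep, ]
  seen <- c(seen, dt$id[keep])
}
names(dt_p) <- paste0("dt_p", seq_along(v))
print(dt_p)
```

Naming the list elements gives dt_p$dt_p1, dt_p$dt_p2 and so on, so assign() is not needed.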

Best,
Ista

On Mon, Oct 3, 2016 at 1:17 PM, Frank S.  wrote:
> Dear R users,
>
> This is the third and last question I wanted to ask these days. First of
> all, many thanks for the support received in my previous mails! My
> question is this: starting from a series of (for example) "k" different
> dates (all contained in vector "v"), I want to get a list of "k" data
> tables (or data frames) so that each contains those individuals who are
> at least 65 for the first time, looping over each of the dates in vector
> "v". Let's consider the following example with 5 individuals:
>
>
> dt <- data.table(
>   id = 1:5,
>   fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
>                     "1943-09-07", "1943-12-31")),
>   sex = as.factor(rep(c(0, 1), c(2, 3)))
> )
>
> v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4
>
>
> I would expect to obtain k=4 data tables so that:
> dt_p1: contains id = 1 (he is for the first time at least 65 on date v[1])
> dt_p2: is NULL (no subject reaches 65 for the first time on date v[2])
> dt_p3: contains id = 2 & id = 3 (they are for the first time at least 65 on v[3])
> dt_p4: contains id = 4 & id = 5 (they are for the first time at least 65 on v[4])
>
>
> I have tried:
>
> dt_p <- list()  # Empty list to allocate data tables
>
> for (i in 1:length(v)) {
>   # Remove subjects from previous dt_p's
>   dt_p[[i]] <- dt[ !(id %in% dt_p[[1:(i-1)]]$id) &
>     round((v[i] - fborn)/365.25, 2) >= 65, ][ , list(id, fborn, sex)]
>
>   dt.names <- paste0("dt_p", 1:length(v))
>   assign(dt.names[i], dt_p[[i]])  # Assign a name to each data table
> }
>
> However, I cannot correctly refer to the previous data tables, because
> for the first data table in the loop there are none. Consequently, I get
> an error message:
>
> # Error in dt_p[[1:(i - 1)]] : no such index at level 1
>
>
> I would be very grateful for any suggestion!
>
> Frank S.
>

Re: [R] Looping through different groups of variables in models

2016-09-01 Thread Kai Mx

Thanks so much everybody, especially to Dennis. It didn't really occur to me
that I could put the models into a list. I have used dplyr for simple data
transformations and will definitely look into it.

On Thu, Sep 1, 2016 at 8:10 AM, Dennis Murphy  wrote:

> Hi:
>
> See inline.
>
> On Wed, Aug 31, 2016 at 2:58 PM, Kai Mx  wrote:
> > Hi all,
> >
> > I am having trouble wrapping my head around a probably simple issue:
> >
> > After using the reshape package, I have a melted dataframe with the
> columns
> > group (factor), time (int), condition (factor), value(int).
> >
> > These are experimental data. The data were obtained from different
> > treatment groups (group) under different conditions at different time
> > points.
> >
> > I would now like to perform ANOVA, boxplots and calculate means to
> compare
> > groups for all combinations of conditions and time points with something
> > like
> >
> > fit <- lm(value~group, data=[subset of data with combination of
> > condition/timepoint])
> > summary (fit)
> > p <- ggplot([subset of data with combination of condition/timepoint],
> > aes(x= group, y=value)) + geom_boxplot ()
> > print (p)
> > tapply ([subset of data with combination of condition/timepoint]$value,
> > [subset of data with combination of condition/timepoint]$group, mean)
>
> There is a traditional approach to this class of problem and an
> evolving 'modern' approach. The traditional approach is to use
> lapply() to produce a list of model objects (one per subgroup); the
> more recent approach is to take advantage of the pipeline operations
> enabled by the magrittr package, e.g.,via the dplyr and tidyr
> packages.  Related packages are purrr and broom; the former applies
> functional programming ideas to map a function recursively across a
> list, while the latter focuses on converting information from model
> objects into data frames that are more compatible with dplyr and
> friends. A package released to CRAN within the past couple of days,
> modelr, adds a few bells and whistles (e.g., bootstrap regression,
> adding columns of predicted values or residuals to a data frame), but
> you don't need it for your immediate purposes.
>
> Below is a generic approach to solving the types of problems you
> described above, which is the best one can do in the absence of a
> reproducible example. Therefore, if this doesn't work out of the box,
> you'll have to fix your data. (I won't do it for you, sorry.)
>
> You could do something like what you have in mind in plyr as follows,
> where md is a surrogate for the name of your melted data frame:
>
> library(plyr)
> L <- dlply(md, .(condition, time), function(d) lm(value ~ group, data = d))
>
> This would produce a list of models, one per condition * time
> combination. You could then use do.call() or another plyr function to
> extract elements of interest from the list, which generally would
> require that you write one or more (anonymous) functions to extract
> the information of interest. A similar approach can be used to
> generate a list of ggplots. It's cleaner if you put your code into
> functions and have it return the output you want, but you have to be
> careful about the form of the input and output - for dlply(), you want
> a function that takes a data frame as input. If you just want the
> plots printed, you could write a function to do that for a single plot
> (again with a data frame as input) and then use the d_ply() function
> in plyr to print them en masse, but it would generally make more sense
> to write them to files, so you'd probably be better off writing a
> function that ends with a ggsave() call and call d_ply(). [Note: the _
> is used when your function creates a side effect, such as printing or
> saving a plot object - it returns nothing to the R console.]
>
> As for the numeric summaries,
>
> ddply(md, .(condition, time, group), function(d) mean(d$value, na.rm =
> TRUE))
>
> would work. The advantage of plyr (and its successor, dplyr) is that
> you can pass arbitrary functions as the third argument as long as the
> input is a data frame and the output is a data frame (or something
> that can be coerced to a data frame). This is more robust than
> tapply().
>
> Comment: plyr/reshape2 is a good starter package combination as it
> teaches you the value of the split-apply-combine approach to data
> analysis, but it can be (very) slow. The dplyr/tidyr package
> combination is a faster, more computationally efficient version of
> plyr::ddply() and reshape2 and is recommended for use, although you
> have to learn a somewhat different approach to R programming in the
> process. If you're fairly new to R, that shouldn't matter much.
>
> There has been a lot of work in the last year or two to improve the
> flow of programming for tasks such as recursive plotting or model
> fitting. The dplyr and tidyr packages are meant to be replacements for
> plyr and reshape[2], respectively; both are written by Hadley

Re: [R] Looping through different groups of variables in models

2016-08-31 Thread Jim Lemon

Hi Kai,
Perhaps something like this:

kmdf<-data.frame(group=factor(rep(c("exp","cont"),each=50)),
 time=factor(rep(1:5,20)),
 condition=factor(rep(rep(c("hot","cold"),each=25),2)),
 value=sample(100:200,100))
# factor() is explicit because R >= 4.0 no longer converts strings to factors
for(timeindx in levels(kmdf$time)) {
 for(condindx in levels(kmdf$condition)) {
  cat("Time",timeindx,"Condition",condindx,"\n")
  subdat<-kmdf[kmdf$time == timeindx & kmdf$condition == condindx,]
  fit<-lm(value~group,subdat)
  print(summary(fit))
  plot(subdat$group,subdat$value)
  # results are not auto-printed inside a loop, so print explicitly
  print(by(subdat$value,subdat$group,mean))
 }
}

Getting elegant output is another matter. Have a look at packages
meant to produce fancier R output.
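
One way to keep the results around for later formatting, sketched in base R with simulated data: split() the data by condition and time, fit the models with lapply() into a named list, and collect the group means in one data frame with aggregate(). The data frame `md` and its values are made up for the illustration; only the column names come from the question.

```r
set.seed(1)
md <- data.frame(
  group = factor(rep(c("exp", "cont"), each = 50)),
  time = rep(1:5, 20),
  condition = factor(rep(rep(c("hot", "cold"), each = 25), 2)),
  value = sample(100:200, 100)
)

# One data frame per condition/time combination, named e.g. "cold.1".
subsets <- split(md, list(md$condition, md$time), drop = TRUE)

# One model per subset, stored in a list with the same names.
fits <- lapply(subsets, function(d) lm(value ~ group, data = d))

# Group means for every combination, returned as a single data frame.
means <- aggregate(value ~ condition + time + group, data = md, FUN = mean)

print(names(fits))
print(head(means))
```

Individual results can then be pulled out by name, for example summary(fits[["hot.3"]]).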

Jim


On Thu, Sep 1, 2016 at 7:58 AM, Kai Mx  wrote:
> Hi all,
>
> I am having trouble wrapping my head around a probably simple issue:
>
> After using the reshape package, I have a melted dataframe with the columns
> group (factor), time (int), condition (factor), value(int).
>
> These are experimental data. The data were obtained from different
> treatment groups (group) under different conditions at different time
> points.
>
> I would now like to perform ANOVA, boxplots and calculate means to compare
> groups for all combinations of conditions and time points with something
> like
>
> fit <- lm(value~group, data=[subset of data with combination of
> condition/timepoint])
> summary (fit)
> p <- ggplot([subset of data with combination of condition/timepoint],
> aes(x= group, y=value)) + geom_boxplot ()
> print (p)
> tapply ([subset of data with combination of condition/timepoint]$value,
> [subset of data with combination of condition/timepoint]$group, mean)
>
> How can I loop through these combinations and output the data in an elegant
> way?
>
> Thanks so much!
>
> Best,
>
> Kai
>

Re: [R] Looping through different groups of variables in models

2016-08-31 Thread Bert Gunter

Kai:

1. I think that this is a very bad idea, statistically, if I
understand you correctly. Generally, your model should incorporate all
groups, time points, and conditions together, not individually.

2. But plotting results in "small multiples" -- aka "trellis plots"
may be useful. This is done in ggplot through "faceting" which you
could read up on and try (I use lattice, not ggplot, to do this sort
of thing, so can't help with code).

3. However, I think your question is mostly statistical in nature
(define "elegant"), and if so, is off topic here. You might therefore
try stats.stackexchange.com instead to get ideas on how to approach
your data, solicit other opinions on whether what you want to do makes
sense (and if not, what else), etc. Or, perhaps better yet, consult a
local statistical resource.
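
As an illustration of point 1 only, here is a sketch of fitting one model that contains group, condition and time together instead of one lm() per subset. The data are simulated and the column names are taken from the question; whether time should be treated as a factor, and which interactions to keep, are modeling decisions that this thread does not settle.

```r
set.seed(42)
md <- data.frame(
  group = factor(rep(c("exp", "cont"), each = 50)),
  time = factor(rep(1:5, 20)),
  condition = factor(rep(rep(c("hot", "cold"), each = 25), 2)),
  value = sample(100:200, 100)
)

# A single model with all main effects and interactions.
fit_all <- lm(value ~ group * condition * time, data = md)
print(anova(fit_all))
```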

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Aug 31, 2016 at 2:58 PM, Kai Mx  wrote:
> Hi all,
>
> I am having trouble wrapping my head around a probably simple issue:
>
> After using the reshape package, I have a melted dataframe with the columns
> group (factor), time (int), condition (factor), value(int).
>
> These are experimental data. The data were obtained from different
> treatment groups (group) under different conditions at different time
> points.
>
> I would now like to perform ANOVA, boxplots and calculate means to compare
> groups for all combinations of conditions and time points with something
> like
>
> fit <- lm(value~group, data=[subset of data with combination of
> condition/timepoint])
> summary (fit)
> p <- ggplot([subset of data with combination of condition/timepoint],
> aes(x= group, y=value)) + geom_boxplot ()
> print (p)
> tapply ([subset of data with combination of condition/timepoint]$value,
> subset of data with combination of condition/timepoint]$group, mean)
>
> How can I loop through these combinations and output the data in an elegant
> way?
>
> Thanks so much!
>
> Best,
>
> Kai
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Looping through different groups of variables in models

2016-08-31 Thread Kai Mx

Hi all,

I am having trouble wrapping my head around a probably simple issue:

After using the reshape package, I have a melted dataframe with the columns
group (factor), time (int), condition (factor), value(int).

These are experimental data. The data were obtained from different
treatment groups (group) under different conditions at different time
points.

I would now like to perform ANOVA, boxplots and calculate means to compare
groups for all combinations of conditions and time points with something
like

fit <- lm(value~group, data=[subset of data with combination of
condition/timepoint])
summary (fit)
p <- ggplot([subset of data with combination of condition/timepoint],
aes(x= group, y=value)) + geom_boxplot ()
print (p)
tapply ([subset of data with combination of condition/timepoint]$value,
[subset of data with combination of condition/timepoint]$group, mean)

How can I loop through these combinations and output the data in an elegant
way?

Thanks so much!

Best,

Kai


Re: [R] looping through rows of a logical matrix for combination with and testing of a numeric vector

2015-12-03 Thread debra ragland via R-help

Oh sorry, I should clarify: I struggle with putting components together when it 
comes to looping. 




Re: [R] looping through rows of a logical matrix for combination with and testing of a numeric vector

2015-12-03 Thread debra ragland via R-help

Thanks again!

And no Bert, this is not homework. I have a very minimal background in R and 
struggle with putting concepts together. 

But thanks anyway. 




Re: [R] looping through rows of a logical matrix for combination with and testing of a numeric vector

2015-12-03 Thread Boris Steipe

Use the rows of your logical matrix to extract the x, y values for the test 
from the numeric vector:
  x <- x2[mat[3, ]]
  y <- x2[!mat[3, ]]

Or: use the formula version of wilcox.test as explained in ?wilcox.test
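
A minimal sketch of the row-by-row test, assuming the example objects from the question (`mat`, `x2`): each row of the logical matrix is used to split the numeric vector into a TRUE group and a FALSE group. The name `p_values` and the choice to return NA for rows where one group is empty are assumptions made here for illustration; wilcox.test() cannot run on an empty group.

```r
# Rebuild the example from the question
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split = NULL)
standard <- unlist(strsplit("AABBCC", split = NULL))
mat <- sapply(list.2, function(x) x == standard)  # rows = positions, columns = sequences

set.seed(42)
x2 <- runif(ncol(mat), 5.0, 7.5)
names(x2) <- seq_len(ncol(mat))

# One Wilcoxon test per row: values where the row is TRUE vs. values where it is FALSE
p_values <- apply(mat, 1, function(is_match) {
  x <- x2[is_match]
  y <- x2[!is_match]
  # Rows that are all TRUE (or all FALSE) have nothing to compare
  if (length(x) == 0 || length(y) == 0) return(NA_real_)
  wilcox.test(x, y)$p.value
})
print(p_values)
```

The formula interface takes the same split, but it also needs both TRUE and FALSE to be present in the grouping variable, so the empty-group check is needed either way.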


B.



[R] looping through rows of a logical matrix for combination with and testing of a numeric vector

2015-12-03 Thread debra ragland via R-help

I have read in a sequence alignment and have done the necessary steps to 
separate and store the elements of the original input into a new list of 
character vectors. I have compared the sequence list to a "standard" vector, 
such that the return is a matrix of logical values indicating TRUE if there is 
a match to the standard and FALSE where there is no match.

An example;

mylist=c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split=NULL)
# setting a standard for comparison
std.string <- "AABBCC"
standard <- unlist(strsplit(std.string, split=NULL))
#create a logical matrix 
mat<-sapply(list.2, function(x) x==standard)
>mat


[,1]  [,2]  [,3]
[1,]  TRUE  TRUE  TRUE
[2,]  TRUE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE
[4,]  TRUE FALSE  TRUE
[5,]  TRUE  TRUE  TRUE
[6,]  TRUE  TRUE FALSE

Where the number of columns is the same as the number of input strings I 
compared (15) and the number of rows is the same as the length of each 
input string (99).

I also have a named numeric vector (of length 15), where the "names" of the 
values match those of the columns of the logical matrix. For the example

x2 = runif(3, 5.0, 7.5)
names(x2) = 1:3
> x2 
       1        2        3 
5.352611 7.058169 6.993105

For each row in the logical matrix I want to combine the logical values 
with the values from the numeric vector so that I can run a wilcox.test using 
those values that are "TRUE" against those that are "FALSE".

For instance, if each row pairing were a mini data.frame it would look like

df=data.frame(x2, mat[3,])
>df
1 5.352611 FALSE
2 7.058169 TRUE
3 6.993105 TRUE
wilcox.test(df) #based on all true values vs. all false values

How can this be achieved?


Re: [R] looping through rows of a logical matrix for combination with and testing of a numeric vector

2015-12-03 Thread Bert Gunter

I should have added -- is this homework? There is a no homework policy
on this list.

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Dec 3, 2015 at 7:54 AM, Bert Gunter  wrote:
> Have you spent any time with an R tutorial or two? I ask, because you
> do not seem to have much knowledge of the language and its features.
> Have you made any effort to figure this out yourself? -- if so, show
> us your code and where/how it goes wrong. I ask, because you seem to
> be asking this list to do your work for you, which is not its purpose.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>-- Clifford Stoll
>
>

[R] Looping through multiple sub elements of a list to compare to multiple components of a vector

2015-12-02 Thread debra ragland via R-help

I think I am making this problem harder than it has to be and so I keep getting 
stuck on what might be a trivial problem. 

I have used the seqinr package to load a protein sequence alignment containing 
15 protein sequences;

    > library(seqinr)
    > x = read.alignment("proteins.fasta",format="fasta",forceToLower=FALSE)

This automatically loads in a list of 4 elements including the sequences and 
other information.

I store the sequences to a new list;

    > mylist = x$seq

which returns a character vector of 15 strings.

I have found that if I split the long character strings into individual 
characters it is easy to use lapply to loop over this list. So I use strsplit;

    > list.2 = strsplit(mylist, split = NULL)

From this list I can determine which proteins have changes at certain 
positions by using;

    > lapply(list.2, "[", 10) == "L"

This returns a logical T/F vector for those elements of the list that do/do 
not have the letter L at position 10. 

Because each of the protein sequences contains 99 amino acids, I want to 
automate this process so that I do not have to compare positions one by one. 
Most of the changes occur between positions/letters 10-95. I have a standard 
character vector that I wish to use for comparison when looping through the 
list. 

Should I perhaps combine all -- the standard "letter"/aa vector, the list of 
protein sequences -- into one list? Or is it better to leave them separate for 
this comparison? I'm not sure what the output should be as I need to use it for 
another statistical test. Would a list of logical vectors be the most 
suitable output to return? 

Re: [R] Looping through multiple sub elements of a list to compare to multiple components of a vector

2015-12-02 Thread Adams, Jean

First, a couple posting tips.  It's helpful to provide some example data
people can work with.  Also, please post in plain text (not html).

If you have a single standard for comparison, you might find an approach
like this helpful.

# example data
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split=NULL)

# setting a standard for comparison
std.string <- "AABBCC"
standard <- unlist(strsplit(std.string, split=NULL))

sapply(list.2, function(x) x==standard)

This gives you a matrix of logicals with the number of rows the same length
as your original strings (the 99 amino acids) and the number of columns the
same length as the number of strings you're comparing (the 15 sequences).

  [,1]  [,2]  [,3]
[1,]  TRUE  TRUE  TRUE
[2,]  TRUE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE
[4,]  TRUE FALSE  TRUE
[5,]  TRUE  TRUE  TRUE
[6,]  TRUE  TRUE FALSE
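
As a possible follow-up, the logical matrix can be summarised per position. This is only a sketch on the same toy data; the names `n_changed`, `changed_positions` and `mismatches` are made up here for illustration.

```r
# Same toy data as above
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split = NULL)
standard <- unlist(strsplit("AABBCC", split = NULL))
mat <- sapply(list.2, function(x) x == standard)

# Number of sequences that differ from the standard at each position
n_changed <- rowSums(!mat)

# Positions where at least one sequence differs
changed_positions <- which(n_changed > 0)

# (position, sequence) index pairs for every mismatch
mismatches <- which(!mat, arr.ind = TRUE)
print(mismatches)
```

Keeping the result as a matrix rather than a list of logical vectors makes this kind of row-wise summary a one-liner.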

Jean


Re: [R] Looping help

2015-07-31 Thread PIKAL Petr

Hi

Your question is a bit cloudy. A simple loop can be used to populate a list:


res <- vector("list", 100)
for (i in 1:100) {

  # do something based on the value of i (i^2 is only a placeholder)
  lll <- i^2

  # put lll in the i-th place of the list
  res[[i]] <- lll
}

Cheers
Petr


Re: [R] Looping help

2015-07-31 Thread Jim Lemon

Hi April,
You need nested loops for something like this

qs <- c(0,0.25,0.5,1,2,4,8,16,32,64)
nrows <- dim(Data)[1]
nqs <- length(qs)
D.mat <- SE.mat <- matrix(NA,nrow=nrows,ncol=nqs)
for(row in 1:nrows) {
 for(qval in 1:nqs) {
  # perform your calculation and set D.mat[row,qval] and
  # SE.mat[row,qval] to the return values
 }
}
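
A filled-in version of the same nested loop, as a sketch only. The function `fake_d()` is a hypothetical stand-in for the poster's `d()` call (its arguments and return values are assumptions, not the real function), and `Data` is simulated with an id column followed by six numeric columns.

```r
set.seed(1)
# Simulated data: first column is an id, the rest are the values to analyse
Data <- data.frame(id = 1:4, matrix(rpois(4 * 6, 5), nrow = 4))

# Hypothetical stand-in for d(): returns two values per call
fake_d <- function(x, q) {
  x <- unlist(x)
  list(D = sum(x > q), st.err = sd(x) / sqrt(length(x)))
}

qs <- c(0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64)
nrows <- nrow(Data)
nqs <- length(qs)

# One matrix per returned value: rows = data rows, columns = q values
D.mat <- SE.mat <- matrix(NA_real_, nrow = nrows, ncol = nqs,
                          dimnames = list(NULL, paste0("q", qs)))

for (row in seq_len(nrows)) {      # outer loop: one data row at a time
  for (qval in seq_len(nqs)) {     # inner loop: every q before moving to the next row
    res <- fake_d(Data[row, -1], q = qs[qval])
    D.mat[row, qval] <- res$D
    SE.mat[row, qval] <- res$st.err
  }
}
print(D.mat)
```

The inner loop finishes all q values for one row before the outer loop advances, which is the order asked for in the question.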

Jim


[R] Looping help

2015-07-30 Thread April Smith

I have never looped before and know I need to.  I am unsure how to proceed:


   - Action I need done: d(Data[1,2:399], q=0, boot=TRUE,
   boot.arg=list(num.iter=1000))
   - I need this to happen to all rows: I need All[1,2:399] to increase to
   All[2,2:399], etc.
   - But I also need the results from q increasing from 0 to 0.25, 0.5, 1,
   2, 4, 8, 16, 32, 64 before the loop moves on to the next row.
   - For each iteration I will receive two values: D and st.err.  I need
   these put into a matrix.


I feel like this should be pretty simple to learn, but I have never looped
before.

I am hoping to get more of a tutorial on how to write loop code, rather than
just be given the loop code.

Thanks,
April


[R] Looping Through List of .csv Files to Work with Subsets of the Data

2015-06-08 Thread Chad Danyluck

Hello,

I want to subset specific rows of data from 80 .csv files and write those
subsets into new .csv files. The data I want to subset starts on a
different row for each original .csv file. I've created variables that
identify which row the subset should start and end on, but I want to loop
through this process and I am not sure what to do. I've attempted to write
the loop below, albeit, much of it is pseudo code. If anyone can provide me
with some tips I'd appreciate it.

#### This data file is used to create the variables where the subsetting
#### starts and ends for each participant ####
mig.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data  Syntax/mig.data.csv")

# These are the variable names for the start and end of each subset of
# relevant data (baseline, audio, and free)
participant.ids <- mig.processed.data$participant.id
participant.baseline.start <- mig.processed.data$baseline.row.start
participant.baseline.end <- mig.processed.data$baseline.row.end
participant.audio.start <- mig.processed.data$audio.meditation.row.start
participant.audio.end <- mig.processed.data$audio.meditation.row.end
participant.free.start <- mig.processed.data$free.meditation.row.start
participant.free.end <- mig.processed.data$free.meditation.row.end

# read into a list the individual files from which to subset the data
participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data  Syntax/MIG_RAW DATA  TXT Files/Plain Text Files")

# loop through each participant
for (i in 1:length(participant.files)) {

# get baseline rows
results.baseline <-
participant.files[participant.baseline.start[i]:participant.baseline.end[i],]

# get audio rows
results.audio <-
participant.files[participant.audio.start[i]:participant.audio.end[i],]

# get free rows
results.free <-
participant.files[participant.free.start[i]:participant.free.end[i],]

# write out participant relevant data
write.csv(results.baseline, file="baseline[i].csv")
write.csv(results.audio, file = "audio[i].csv")
write.csv(results.free, file = "free[i].csv")

}

-- 
Chad M. Danyluck, MA
PhD Candidate, Psychology
University of Toronto



“There is nothing either good or bad but thinking makes it so.” - William
Shakespeare


Re: [R] Looping Through List of .csv Files to Work with Subsets of the Data

2015-06-08 Thread Chad Danyluck

Thank you Don.

I've incorporated your suggestions which have helped me to understand how
loops work better than previously. However, the loop gets stuck trying to
read the current file:

mig.processed.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data  Syntax/mig.log.data.addition.csv")

## ASSUMPTION: Starting with augmented processedbook and correct
## free.meditation.end
## Read in all data files and Loop through to create new data files
## segmented by the rows identified before

# get required data
participant.ids <- mig.processed.data$participant.id
participant.baseline.start <- mig.processed.data$baseline.row.start
participant.baseline.end <- mig.processed.data$baseline.row.end
participant.audio.start <- mig.processed.data$audio.meditation.row.start
participant.audio.end <- mig.processed.data$audio.meditation.row.end
participant.free.start <- mig.processed.data$free.meditation.row.start
participant.free.end <- mig.processed.data$free.meditation.row.end

participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data  Syntax/MIG_RAW DATA  TXT Files/Plain Text Files")

for (i in 1:length(participant.files)) {

  id <- participant.files[i]

  ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
  ## to ensure that the files sort properly when viewed by the operating
  ## system
  idc <- formatC(id, width=3, flag='0')

  # current file
  crnt.file[i] <- read.csv( participant.files[i] )

  ## base
  tmp.base <- crnt.file[participant.baseline.start:participant.baseline.end, ]
  write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))

  ## audio
  tmp.audio <- crnt.file[participant.audio.start:participant.audio.end, ]
  write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))

  ## free
  tmp.free <- crnt.file[participant.free.start:participant.free.end, ]
  write.csv(tmp.free, file=paste0('free',idc,'.csv'))

}

The error message reads:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '103.csv': No such file or directory

So it seems to be calling the first file in the list but getting stuck. Any
suggestions?

Best,

Chad


Re: [R] Looping Through List of .csv Files to Work with Subsets of the Data

2015-06-08 Thread William Dunlap

> participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data  Syntax/MIG_RAW DATA  TXT Files/Plain Text Files")

Try adding the argument full.names=TRUE to that call to list.files().
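
A minimal sketch of why this helps, using a throwaway temporary directory and made-up file names (both are assumptions for illustration, not the poster's data). By default list.files() returns bare file names, which read.csv() then looks for in the current working directory; with full.names=TRUE the directory path is prepended to each name.

```r
# Hypothetical directory and file names, created only for this illustration
data.dir <- file.path(tempdir(), "plain_text_files")
dir.create(data.dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(data.dir, "103.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(data.dir, "104.csv"), row.names = FALSE)

# Default: bare names such as "103.csv"; read.csv() would look for these
# in the current working directory and fail if they are not there
bare.names <- list.files(data.dir, pattern = "\\.csv$")

# full.names = TRUE: each name is prefixed with the directory path
full.paths <- list.files(data.dir, pattern = "\\.csv$", full.names = TRUE)

# Reading via the full paths works regardless of the working directory
first.file <- read.csv(full.paths[1])
```

An alternative with the same effect would be to build each path with file.path(data.dir, bare.names[i]) inside the loop.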

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] Looping Through List of .csv Files to Work with Subsets of the Data

2015-06-08 Thread MacQueen, Don

So you have 80 files, one for each participant?

It appears that from each of the 80 files you want to extract three
subsets of rows,
  one set for baseline
  one set for audio
  one set for free

What I think I would do, if the above is correct, is create one master
file. This file will have eight columns:
(I'll show an example column name, followed by a description)
  id  participant id
  fn   file name for that participant
  srb  start row for baseline
  erb  end row for baseline
  sra  start row for audio
  era  end row for audio
  srf  start row for free
  erf  end row for free

This may be fairly close to what you already have, but I'm not sure.

I would then load the master file into R
  mstf <- read.csv( {the master file} )

Then loop through its rows, and since each row has all the information
necessary to read the participant's individual file and identify which
rows to subset, a loop like this should work.

for (irow in seq_len(nrow(mstf))) {

  id <- mstf$id[irow]
  ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
  ## to ensure that the files sort properly when viewed by the operating
  ## system
  idc <- formatC(id, width=2, flag='0')

  ## read this participant's file
  crnt.file <- read.csv( mstf$fn[irow] )

  ## base
  tmp.base <- crnt.file[ mstf$srb[irow]:mstf$erb[irow] , ]
  write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))

  ## audio
  tmp.audio <- crnt.file[ mstf$sra[irow]:mstf$era[irow] , ]
  write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))

  ## free
  tmp.free <- crnt.file[ mstf$srf[irow]:mstf$erf[irow] , ]
  write.csv(tmp.free, file=paste0('free',idc,'.csv'))

}


Obviously, I can't test this. And there may be (likely are!) some typos in
it.

Note that it's not necessary to create variables that identify which row
the subset should start and end on; these are just looked up from the
master file when needed. Similarly, the three respective subsets are
stored in temporary data frames, because they are not (I presume) needed
when the whole thing is done. (if they were needed, then a different
strategy would be more appropriate)

There are different ways to index the loop. I just picked one.
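
The master-file approach described above can be sketched end to end as follows. This is only an illustration under stated assumptions: the participant files, the row ranges, and the temporary directory are invented here so that the example is self-contained, and the master table is built in memory rather than read from a file.

```r
# Hypothetical setup: two small participant files in a temporary directory
work.dir <- file.path(tempdir(), "master_file_demo")
dir.create(work.dir, showWarnings = FALSE)
for (p in 1:2) {
  write.csv(data.frame(row = 1:10, value = (1:10) * p),
            file.path(work.dir, paste0("participant", p, ".csv")),
            row.names = FALSE)
}

# Master table: one row per participant, using the column names suggested above
# (the start/end rows are made-up values)
mstf <- data.frame(
  id  = 1:2,
  fn  = file.path(work.dir, paste0("participant", 1:2, ".csv")),
  srb = c(1, 2), erb = c(3, 4),
  sra = c(4, 5), era = c(6, 7),
  srf = c(7, 8), erf = c(10, 10),
  stringsAsFactors = FALSE
)

for (irow in seq_len(nrow(mstf))) {
  idc <- formatC(mstf$id[irow], width = 2, flag = "0")
  crnt.file <- read.csv(mstf$fn[irow])

  # Start and end row of each segment for this participant
  segments <- list(baseline = c(mstf$srb[irow], mstf$erb[irow]),
                   audio    = c(mstf$sra[irow], mstf$era[irow]),
                   free     = c(mstf$srf[irow], mstf$erf[irow]))

  # Extract each segment by row range and write it to its own file
  for (seg in names(segments)) {
    rows <- segments[[seg]][1]:segments[[seg]][2]
    write.csv(crnt.file[rows, ],
              file = file.path(work.dir, paste0(seg, idc, ".csv")),
              row.names = FALSE)
  }
}
```

Looping over the segment names keeps the three extractions in one place; writing them out as three separate blocks, as in the loop above, works just as well.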

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





[R] Looping and break

2015-03-02 Thread Scott Colwell

Hello,

I apologize for bringing up next and break in loops given that there is so
much on the net about it, but I've tried numerous examples found using
Google and just can't seem to get this to work.

This is a simple version of what I am doing with matrices but it shows the
issue. I need to have the loop indexed as n to perform a calculation on the
variable total. But if total is greater than 8, it goes to the next loop
indexed a.  For example, it does condition a = 1 for n = 1 to 50 but
within n if total is greater than 8 it goes to the next condition of a which
would be a = 2, and so on.

for (a in 1:3){
  
  if (a == 1) { b <- c(1:5) }
  if (a == 2) { b <- c(1:5) }
  if (a == 3) { b <- c(1:5) }
  
  for (n in 1:50){
  
     if (n  15) next

     total <- 2*b
  
     if (total > 8) next

  }
}

Any help would be greatly appreciated.

Thanks,

Scott




Re: [R] Looping and break

2015-03-02 Thread Rolf Turner


On 03/03/15 15:04, Jeff Newmiller wrote:

> Your example is decidedly not expressed in R, though it looks like
> you tried. Can you provide the hand-computed result that you are
> trying to obtain?
>
> Note that the reason you cannot find anything about next or break in
> R is that they don't exist.


Point of order, Mr. Chairman, but they ***do*** exist.  See e.g ?next 
(which actually takes you to the help for Control Flow).
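
As a minimal sketch of these control-flow statements (the function name, the scalar value of b, and the stopping rule are illustrative assumptions, not the original poster's calculation): next skips to the following iteration of the innermost loop, while break leaves the innermost loop entirely, so execution carries on with the next value of the outer index.

```r
# Hypothetical helper: for each outer index a, run the inner loop over n and
# record the value of n at which the inner loop was abandoned
run_loops <- function() {
  stopped_at <- integer(0)
  for (a in 1:3) {
    b <- a                      # illustrative scalar; one value per condition
    for (n in 1:50) {
      if (n < 3) next           # skip the first two iterations of the inner loop
      total <- b * n            # illustrative calculation
      if (total > 8) break      # leave the inner loop; the outer loop moves on
    }
    stopped_at[a] <- n          # n keeps the value it had when the loop ended
  }
  stopped_at
}

run_loops()
```

Note that if() expects a single TRUE or FALSE, so a vector-valued quantity such as 2*b with b of length 5 would first have to be reduced, for example with any() or all().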



> There are generally alternative ways to
> accomplish the kinds of things you might want to accomplish without
> them, and those alternatives often don't involve explicit loops at
> all.


Otherwise I concur with everything you say.

cheers,

Rolf

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619


Re: [R] Looping and break

2015-03-02 Thread Jeff Newmiller

Sigh. To be positive is to be wrong at the top of one's lungs. Next I will be 
told R has a goto statement.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


Re: [R] Looping and break

2015-03-02 Thread Jeff Newmiller

Your example is decidedly not expressed in R, though it looks like you tried. 
Can you provide the hand-computed result that you are trying to obtain?

Note that the reason you cannot find anything about next or break in R is that 
they don't exist. There are generally alternative ways to accomplish the kinds 
of things you might want to accomplish without them, and those alternatives 
often don't involve explicit loops at all.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


Re: [R] Looping and break

2015-03-02 Thread Rolf Turner



On 03/03/15 16:08, Jeff Newmiller wrote:


> Sigh. To be positive is to be wrong at the top of one's lungs. Next I
> will be told R has a goto statement.


I am ***positive*** that it hasn't! :-)  Well, 99.999% confident.
Although I guess it's not inconceivable that some misguided nerd might
construct one.  In R all things are possible.  It'd be tough, though, in
view of the fact that statements are not identified/identifiable in R,
so it would be hard to tell the code, uh, where to go.


cheers,

Rolf

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619


[R] looping multipanel plots to different figures

2015-02-19 Thread efisio solazzo


Dear all,
I cannot find a way to direct multipanel plots to different figures
(files) from within a loop.


Say the loop creates two plots at each step: one plot should go to figure
1 and the other to figure 2.
The same holds for the next steps of the loop: the plots should go to
figure 1 and figure 2 in a multipanel fashion.


I am not sure at which point to open the files and set the multipanel
parameters. All I can get is two files with all plots overlaid in the
same position.


Thanks


Re: [R] looping multipanel plots to different figures

2015-02-19 Thread Jim Lemon

Hi efisio,
I read this as wanting to start a new graphics device, then set some plot
parameters, display two plots and then close the graphics device at each
iteration of the loop. If so,

plot_filenames <- c("plot1.png","plot2.png","plot3.png")
for(plotfn in plot_filenames) {
 png(plotfn)
 par(mfrow=c(1,2))
 hist(sample(1:5,30,TRUE))
 hist(sample(1:5,30,TRUE))
 dev.off()
}

Jim




Re: [R] looping multipanel plots to different figures

2015-02-19 Thread efisio solazzo

Thanks Jim,
actually I need to keep two devices open at the same time and, within
the loop, access each of them in alternation. In MatLab there is the
command Figure(#), which keeps track of the open devices and directs the
output of the plot to whichever of them is selected.

For example:
plot_filenames <- c("plot1.png","plot2.png","plot3.png")
for (i in 1:5) {
  png(plot_filenames[1])
  par(mfrow=c(1,2))
  hist(sample(i:10,30,TRUE)) #???
  png(plot_filenames[2]) #???
  par(mfrow=c(1,2))
  hist(sample(i+1:15,30,TRUE))
  dev.off() #???
}

Hope I've been clear.

Thanks, Efisio
===




-- 
Efisio SOLAZZO, Ph.D.
European Commission, Joint Research Centre,
Institute for Environment and Sustainability,
TP123, Via E. Fermi, 2749 I-21027 Ispra (VA), Italy
Tel: +390332789944 Fax: +390332785837




Re: [R] looping multipanel plots to different figures

2015-02-19 Thread Jim Lemon

Hi efisio,
Okay, you can switch devices using the dev.* functions in the grDevices
package. If you only have two devices open at one time, this is not too
difficult:

#open both devices
png(...)
par(mfrow=c(1,2))
png(...)
par(mfrow=c(1,2))
hist(...)
dev.set(dev.next())
hist(...)
dev.set(dev.next())
# after end of loop
dev.off()
dev.off()

Use dev.set(dev.next()) to switch devices. Remember to shut both down when the loop is complete.
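
A runnable sketch of the outline above, under some assumptions made only for illustration: the output goes to a temporary directory, the file names and the 2 x 2 layout are invented, and each device number is stored with dev.cur() so that dev.set() can select a figure explicitly instead of cycling with dev.next().

```r
# Hypothetical output files in a temporary directory
out.dir <- tempdir()
fig.files <- file.path(out.dir, c("figure1.png", "figure2.png"))

n.steps <- 4  # illustrative number of loop iterations

# Open both devices up front and remember each device number
png(fig.files[1], width = 800, height = 800)
dev.fig1 <- dev.cur()
par(mfrow = c(2, 2))   # one panel per iteration on figure 1

png(fig.files[2], width = 800, height = 800)
dev.fig2 <- dev.cur()
par(mfrow = c(2, 2))   # one panel per iteration on figure 2

for (i in seq_len(n.steps)) {
  dev.set(dev.fig1)    # make figure 1 the current device
  hist(sample(1:10, 30, TRUE), main = paste("Figure 1, step", i))
  dev.set(dev.fig2)    # make figure 2 the current device
  hist(sample(1:15, 30, TRUE), main = paste("Figure 2, step", i))
}

# Close both devices so that the files are written out
dev.off(dev.fig1)
dev.off(dev.fig2)
```

Because par() settings belong to the device that is current when they are set, mfrow is set once per device right after it is opened, and each later plot fills the next free panel of whichever device is current.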

Jim





[R] looping in R

2014-04-02 Thread Abugri James

I ran the following loop on my SNP data and got an error message as
indicated
for (i in genenames){
+   current <- fst1[which(fst1$Gene == i),]
+   num <- nrow(current)
+   fst <- max(current$fst)
+   position <- mean(current$pos)
+   nposition <- mean(current$newpos)
+   numhigh <- nrow(current[which(current$fst  threshold),])
+   colors <- mean(current$colors)
+   output <- matrix(NA,nrow=1,ncol=8)
+   numthresh <- paste("# SNPs  Fst = ", threshold, sep="")
+   colnames(output) <- c("gene", "gene_old", "pos", "newpos", "# Snps",
+                         numthresh, "Max.Fst", "colors")
+   output[1,1] <- i
+   output[1,2] <- as.character(current[1, gene_old])
+   output[1,3] <- position
+   output[1,4] <- nposition
+   output[1,5] <- num
+   output[1,6] <- numhigh
+   output[1,7] <- fst
+   output[1,8] <- colors
+   maxfstgene <- rbind(maxfstgene, output)
+ }
Error in output[1, 2] <- as.character(current[1, gene_old]) :
  replacement has length zero
In addition: Warning message:
In mean.default(current$pos) :
  argument is not numeric or logical: returning NA
--


Re: [R] looping in R

2014-04-02 Thread Jeff Newmiller

You desperately need to read the Posting Guide (mentioned in the footer of this 
email) which warns you not to post in HTML format, and learn how to make a 
reproducible example (e.g. 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example).

The problem lies in some interaction between your data and code, and without 
both we cannot help you.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On April 2, 2014 12:15:51 PM PDT, Abugri James jabu...@uds.edu.gh wrote:
I ran the following loop on my SNP data and got an error message as
indicated
for (i in genenames){
+   current <- fst1[which(fst1$Gene == i),]
+   num <- nrow(current)
+   fst <- max(current$fst)
+   position <- mean(current$pos)
+   nposition <- mean(current$newpos)
+   numhigh <- nrow(current[which(current$fst > threshold),])
+   colors <- mean(current$colors)
+   output <- matrix(NA,nrow=1,ncol=8)
+   numthresh <- paste("# SNPs > Fst = ", threshold, sep="")
+   colnames(output) <- c("gene", "gene_old", "pos", "newpos", "# Snps",
+ numthresh, "Max.Fst", "colors")
+   output[1,1] <- i
+   output[1,2] <- as.character(current[1, "gene_old"])
+   output[1,3] <- position
+   output[1,4] <- nposition
+   output[1,5] <- num
+   output[1,6] <- numhigh
+   output[1,7] <- fst
+   output[1,8] <- colors
+   maxfstgene <- rbind(maxfstgene, output)
+ }
Error in output[1, 2] <- as.character(current[1, "gene_old"]) :
  replacement has length zero
In addition: Warning message:
In mean.default(current$pos) :
  argument is not numeric or logical: returning NA
--


Re: [R] looping in R

2014-04-02 Thread Duncan Murdoch


On 02/04/2014, 3:15 PM, Abugri James wrote:

I ran the following loop on my SNP data and got an error message as
indicated


I would assume that the error message is accurate: 
as.character(current[1, "gene_old"]) has length zero.  You'll need to 
debug why that happened.
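
One way to start that debugging, as a minimal sketch on made-up data: the data frame below is a hypothetical stand-in (the real fst1 was never posted) in which several expected columns are simply absent. It shows that a missing column yields NULL, that a zero-length value cannot be assigned into a matrix cell, and how to list the columns the loop expects but the data lacks. Whether missing columns are the actual cause in the original data is an assumption.

```r
# Hypothetical stand-in for the poster's data; the real 'fst1' was never posted.
fst1 <- data.frame(Gene = c("g1", "g1", "g2"),
                   fst = c(0.10, 0.40, 0.25),
                   stringsAsFactors = FALSE)

current <- fst1[which(fst1$Gene == "g1"), ]

# A column that does not exist comes back as NULL from `$` ...
is.null(current$pos)

# ... and as.character(NULL) has length zero, which cannot fill a matrix cell.
output <- matrix(NA, nrow = 1, ncol = 2)
msg <- tryCatch({
  output[1, 2] <- as.character(current$gene_old)
  "no error"
}, error = function(e) conditionMessage(e))
msg

# A check worth running before the loop: which expected columns are missing?
needed <- c("Gene", "gene_old", "pos", "newpos", "fst", "colors")
missing_cols <- setdiff(needed, names(fst1))
missing_cols
```

Running str(current) inside the loop for the first gene is a similar quick check on column names and types.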


Duncan Murdoch






Re: [R] looping through 3D array

2014-01-11 Thread arun





Hi Alex,

Not sure if this is what you wanted.
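length(res) # from the previous example, using indx <- combn(dim(results)[1],2)
#[1] 45

mat1 <- matrix(0,10,10)
 mat1[lower.tri(mat1)] <- res  # lower.tri() is filled column by column, which matches the combn() pair order
 mat1 <- mat1 + t(mat1)        # mirror into the upper triangle so that mat1[i,j] == mat1[j,i]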

A.K.
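
A small check of the fill order, as a sketch with made-up values rather than real distances: each stand-in "distance" encodes its own pair (12 for the pair 1,2 and so on), so it is easy to see where every value lands. Filling only the lower triangle and then adding the transpose gives a symmetric matrix; assigning the same vector directly into upper.tri() would not, because upper.tri() walks the pairs in a different order than combn().

```r
n <- 5
pairs <- combn(n, 2)                  # columns are (1,2), (1,3), ..., (4,5)
res <- pairs[1, ] * 10 + pairs[2, ]   # stand-in "distances": 12, 13, ..., 45

mat <- matrix(0, n, n)
mat[lower.tri(mat)] <- res            # column-by-column fill matches combn() order
mat <- mat + t(mat)                   # mirror across the diagonal

isSymmetric(mat)
mat
```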









On Saturday, January 11, 2014 12:22 AM, alex padron alexpadron1...@gmail.com 
wrote:

Thanks for these. 

I am using 

library(multicore)
res <- unlist(mclapply(seq_len(ncol(indx)), function(i) { x1 <- 
indx[,i];emd2d(results2[x1[1],,],results2[x1[2],,])}, mc.cores=48))

How can I write the output of res as a matrix that has rows and columns for 
each of the calculated distances? So if there is a total of say 100 distances 
calculated the output matrix should have 10 rows and 10 columns. 


-Alex


On Thu, Jan 9, 2014 at 2:05 PM, arun smartpink...@yahoo.com wrote:

Also,
Check these links:

http://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session

http://www.bigmemory.org/
A.K.



On Thursday, January 9, 2014 4:58 PM, arun smartpink...@yahoo.com wrote:
Hi Alex,

Regarding the code:

indx <- combn(dim(results)[1],2) #shouldn't be hard

dim(results)[1]
#[1] 10

seq(dim(results)[1])
# [1]  1  2  3  4  5  6  7  8  9 10
 combn(seq(dim(results)[1]),2) #gives the same result

l1 <- lapply(seq_len(ncol(indx)),function(i) {x1 <- indx[,i]}) ##check the results of this one
#Here, I am looping through the columns of the 'indx' matrix, i.e. each 
#element of 'l1' is a vector of length 2.

l1[[1]]
#[1] 1 2
 is.vector(l1[[1]])
#[1] TRUE


#The idea is to apply the function emd2d() to each of these vectors, whose two 
#elements act as indices into the array 'results'.
#For example:
 lapply(l1,function(x1) results[x1[1],,])
 lapply(l1,function(x1) results[x1[2],,])
#When we combine that:

 lapply(l1,function(x1) emd2d(results[x1[1],,],results[x1[2],,]))
sapply(l1,function(x1) emd2d(results[x1[1],,],results[x1[2],,])) #gets the result as a vector.

For the second part, you are using combn() on a vector of 13,000. If there is 
enough memory to hold indx, you could split indx into smaller matrices and 
run the calculation on each piece.
E.g.
indx1 <- indx[,1:20]  #in your big dataset, change accordingly
indx2 <- indx[,21:35]

etc.
A.K.
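
The splitting idea above can be sketched as follows. This is only an illustration under stated assumptions: pair_dist() is a hypothetical stand-in for emd2d() so the example runs without emdist, the array is small random data, and chunk_size is an arbitrary choice. It does not reduce the memory needed for indx itself.

```r
# Hypothetical stand-in for emd2d(): any function of two matrices returning one number.
pair_dist <- function(a, b) sqrt(sum((a - b)^2))

set.seed(1)
results <- array(runif(6 * 3 * 4), dim = c(6, 3, 4))
indx <- combn(dim(results)[1], 2)

# Split the column numbers of indx into blocks of at most chunk_size columns.
chunk_size <- 4
cols_all <- seq_len(ncol(indx))
blocks <- split(cols_all, ceiling(cols_all / chunk_size))

# Process one block at a time; each block's result could be saved to disk here.
res_blocks <- lapply(blocks, function(cols) {
  vapply(cols,
         function(j) pair_dist(results[indx[1, j], , ], results[indx[2, j], , ]),
         numeric(1))
})
res_chunked <- unlist(res_blocks, use.names = FALSE)
length(res_chunked)
```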









On Thursday, January 9, 2014 4:34 PM, alex padron alexpadron1...@gmail.com 
wrote:

I don't fully understand your code; it is beyond me. Could you explain it a 
bit? Also, when I run it on my real data set, R crashes because it runs out 
of memory. Is there any way around this?
On Jan 9, 2014 1:15 PM, alex padron alexpadron1...@gmail.com wrote:

You are awesome. 


-Alex


On Thu, Jan 9, 2014 at 1:10 PM, arun smartpink...@yahoo.com wrote:

Hi,
No problem.
You can use ?lower.tri() or ?upper.tri()

res[lower.tri(res)]
res[lower.tri(res,diag=TRUE)]
#Other way would be to use:
?combn
indx <- combn(dim(results)[1],m=2)


res2 <- sapply(seq_len(ncol(indx)),function(i) {x1 <- indx[,i]; 
emd2d(results[x1[1],,],results[x1[2],,]) })
 identical(res[lower.tri(res)], res2)
#[1] TRUE
A.K.





On Thursday, January 9, 2014 4:03 PM, alex padron alexpadron1...@gmail.com 
wrote:

Thanks. This works. I just noticed that half of the matrix repeats. For 
example, res[1,2] is the same as res[2,1]. Is there any way to get only half 
of the matrix as output (notice the diagonal of 0s across the output matrix)?



-Alex


On Thu, Jan 9, 2014 at 12:57 PM, arun smartpink...@yahoo.com wrote:

#or
you can use ?expand.grid() and then loop over:
indx <- expand.grid(rep(list(seq(dim(results)[1])),2))
res1 <- matrix(sapply(seq_len(nrow(indx)),function(i) {x1 <- indx[i,]; 
emd2d(results[x1[,1],,],results[x1[,2],,]) }),ncol=10)
identical(res,res1)
#[1] TRUE





On Thursday, January 9, 2014 3:46 PM, arun smartpink...@yahoo.com wrote:
Hi,
Try:
library(emdist)

set.seed(435)
results <- array(sample(1:400,120,replace=TRUE),dim=c(10,3,4))
res <- sapply(seq(dim(results)[1]),function(i) {x1 <- results[i,,]; x2 <- 
results; sapply(seq(dim(x2)[1]),function(i) emd2d(x1,x2[i,,]))})
dim(res)
#[1] 10 10
A.K.







On Thursday, January 9, 2014 3:25 PM, alex padron 
alexpadron1...@gmail.com wrote:

I'll try to be clearer. In your example we have: results <- 
array(1:120,dim=c(10,3,4)) 

I want to do the following: compare results[1,,] with every matrix inside 
results. I then want to jump to results[2,,] and compare it to all 10 
matrices inside results, and so on. emd2d from the emdist package outputs 
a single value when comparing two matrices, and since your example has 10 
matrices that are all being compared with each other, the output should be 
100 values. 

Does that make sense?


-Alex






Re: [R] looping through 3D array

2014-01-10 Thread apadr007

It seems like emdist does not like to compare matrices with all 0 values. I
ended up removing those from my 3D array and have ~8000 matrices instead of
13000.

I am using res2 <- unlist(mclapply(seq_len(ncol(indx)),function(i) {x1 <-
indx[,i]; emd2d(results[x1[1],,],results[x1[2],,]) }) )

But even with mclapply it is taking extremely long. Any way to speed this
up?
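
One thing worth checking, offered as a sketch rather than a guaranteed fix: in the parallel package (which superseded multicore), mclapply() uses getOption("mc.cores", 2L) when mc.cores is not given, so a call like the one above may be running on only two cores. The example below sets mc.cores explicitly. pair_dist() is a hypothetical stand-in for emd2d(), the array is small random data, and how much this helps on the real data is an assumption.

```r
library(parallel)

# Hypothetical stand-in for emd2d(), so the sketch runs without emdist.
pair_dist <- function(a, b) sqrt(sum((a - b)^2))

set.seed(1)
results <- array(runif(8 * 3 * 4), dim = c(8, 3, 4))
indx <- combn(dim(results)[1], 2)

# mclapply() relies on forking, which Windows does not support,
# so fall back to a single core there (or when the core count is unknown).
n_cores <- detectCores()
if (is.na(n_cores) || .Platform$OS.type == "windows") n_cores <- 1L

res2 <- unlist(mclapply(seq_len(ncol(indx)), function(i) {
  x1 <- indx[, i]
  pair_dist(results[x1[1], , ], results[x1[2], , ])
}, mc.cores = n_cores))
length(res2)
```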



 

Re: [R] looping through 3D array

2014-01-09 Thread arun

Hi,
Try:
library(emdist)

set.seed(435)
results <- array(sample(1:400,120,replace=TRUE),dim=c(10,3,4)) 
res <- sapply(seq(dim(results)[1]),function(i) {x1 <- results[i,,]; x2 <- 
results; sapply(seq(dim(x2)[1]),function(i) emd2d(x1,x2[i,,]))})
dim(res)
#[1] 10 10
A.K.






On Thursday, January 9, 2014 3:25 PM, alex padron alexpadron1...@gmail.com 
wrote:

I'll try to be clearer. In your example we have: results <- 
array(1:120,dim=c(10,3,4)) 

I want to do the following: compare results[1,,] with every matrix inside 
results. I then want to jump to results[2,,] and compare it to all 10 
matrices inside results, and so on. emd2d from the emdist package outputs a 
single value when comparing two matrices, and since your example has 10 
matrices that are all being compared with each other, the output should be 
100 values. 

Does that make sense?


-Alex


Re: [R] looping through 3D array

2014-01-09 Thread arun

Hi,
No problem.
You can use ?lower.tri() or ?upper.tri()

res[lower.tri(res)]
res[lower.tri(res,diag=TRUE)]
#Other way would be to use:
?combn
indx <- combn(dim(results)[1],m=2)


res2 <- sapply(seq_len(ncol(indx)),function(i) {x1 <- indx[,i]; 
emd2d(results[x1[1],,],results[x1[2],,]) })
 identical(res[lower.tri(res)], res2)
#[1] TRUE
A.K.
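
To see why the two approaches agree, here is a self-contained sketch that compares the full 10 x 10 computation with the unique-pairs computation. pair_dist() is a hypothetical stand-in for emd2d() (so the numbers differ from the real earth mover's distances), and the array is the same random example used earlier in the thread.

```r
# Hypothetical stand-in for emd2d(), so the sketch runs without emdist.
pair_dist <- function(a, b) sqrt(sum((a - b)^2))

set.seed(435)
results <- array(sample(1:400, 120, replace = TRUE), dim = c(10, 3, 4))
n <- dim(results)[1]

# Full n x n matrix: n^2 calls, with every pair computed twice.
res <- sapply(seq_len(n), function(i)
  sapply(seq_len(n), function(j) pair_dist(results[i, , ], results[j, , ])))

# Unique pairs only: n * (n - 1) / 2 calls.
indx <- combn(n, 2)
res2 <- apply(indx, 2, function(p) pair_dist(results[p[1], , ], results[p[2], , ]))

length(res2)
all.equal(res[lower.tri(res)], res2)
```

The combn() version does a little under half the work of the full matrix, which matters once the number of matrices runs into the thousands.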




On Thursday, January 9, 2014 4:03 PM, alex padron alexpadron1...@gmail.com 
wrote:

Thanks. This works. I just noticed that half of the matrix repeats. For example, 
res[1,2] is the same as res[2,1]. Is there any way to get only half of the matrix 
as output (notice the diagonal of 0s across the output matrix)?



-Alex





Re: [R] looping function through list

2014-01-02 Thread arun

Hi,
Maybe this helps:
 set.seed(42)
 output1 <- list(list(matrix(0,8,11),matrix(0,8,11)), 
list(matrix(rnorm(80),8,10),matrix(rnorm(80),8,10)))
 library(emdist)
 sapply(output1,function(x) {emd2d(x[[seq_along(x)[1]]],x[[seq_along(x)[2]]]) })
#[1]   NaN -6.089909

A.K.

I'm trying to apply a function to a list using rapply but I'm having 
trouble doing so. I'm trying to calculate the earth-movers distance 
using the emdist package. Every index in the list has two subindices. I 
want to calculate the earth-movers distance for these subindices 
iteratively. An example of the list: 

head(output)

[[1]] 
[[1]][[1]] 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] 
[1,]    0    0    0    0    0    0    0    0    0     0     0 
[2,]    0    0    0    0    0    0    0    0    0     0     0 
[3,]    0    0    0    0    0    0    0    0    0     0     0 
[4,]    0    0    0    0    0    0    0    0    0     0     0 
[5,]    0    0    0    0    0    0    0    0    0     0     0 
[6,]    0    0    0    0    0    0    0    0    0     0     0 
[7,]    0    0    0    0    0    0    0    0    0     0     0 
[8,]    0    0    0    0    0    0    0    0    0     0     0 

[[1]][[2]] 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] 
[1,]    0    0    0    0    0    0    0    0    0     0     0 
[2,]    0    0    0    0    0    0    0    0    0     0     0 
[3,]    0    0    0    0    0    0    0    0    0     0     0 
[4,]    0    0    0    0    0    0    0    0    0     0     0 
[5,]    0    0    0    0    0    0    0    0    0     0     0 
[6,]    0    0    0    0    0    0    0    0    0     0     0 
[7,]    0    0    0    0    0    0    0    0    0     0     0 
[8,]    0    0    0    0    0    0    0    0    0     0     0 


[[2]] 
[[2]][[1]]
         [,1]     [,2]     [,3]     [,4]     [,5]    [,6]     [,7]     [,8]     [,9]    [,10]
[1,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.549675 5.834462 5.401988 5.933774
[2,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.304306 5.834462 5.401988 5.933774
[3,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.485151 5.834462 5.401988 5.933774
[4,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.691836 5.834462 5.401988 5.933774
[5,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.691836 5.834462 5.401988 5.933774
[6,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.691836 5.834462 5.401988 5.933774
[7,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.691836 5.834462 5.401988 5.933774
[8,] 1.336731 3.264521 4.567414 4.871417 4.386032 5.75678 6.691836 5.834462 5.401988 5.933774
        [,11] 
[1,] 6.549675 
[2,] 6.304306 
[3,] 6.485151 
[4,] 6.790983 
[5,] 7.102360 
[6,] 7.211278 
[7,] 7.211278 
[8,] 7.164059 

[[2]][[2]]
         [,1]     [,2]     [,3]     [,4]      [,5]     [,6]     [,7]     [,8]     [,9]
[1,] 6.886406 8.814196 10.11709 10.42109  9.935707 11.30645 12.24151 11.38414 10.95166
[2,] 6.641038 8.568828  9.87172 10.17572  9.690339 11.06109 11.99614 11.13877 10.70629
[3,] 6.821883 8.749673 10.05257 10.35657  9.871184 11.24193 12.17699 11.31961 10.88714
[4,] 7.127715 9.055504 10.35840 10.66240 10.177015 11.54776 12.48282 11.62545 11.19297
[5,] 7.439092 9.366881 10.66977 10.97378 10.488392 11.85914 12.79420 11.93682 11.50435
[6,] 7.749465 9.677255 10.98015 11.28415 10.798766 12.16951 13.10457 12.24720 11.81472
[7,] 7.783697 9.711487 11.01438 11.31838 10.832998 12.20375 13.13880 12.28143 11.84895
[8,] 7.500790 9.428580 10.73147 11.03548 10.550091 11.92084 12.85590 11.99852 11.56605
        [,10]    [,11] 
[1,] 11.48345 12.76095 
[2,] 11.23808 12.51558 
[3,] 11.41893 12.69643 
[4,] 11.72476 13.00226 
[5,] 12.03613 13.31364 
[6,] 12.34651 13.62401 
[7,] 12.38074 13.65824 
[8,] 12.09783 13.37534 

I have tried combining rapply and do.call in this fashion but it has failed so 
far: 

library(emdist)
do.call(rbind, rapply(output, function(x,y) emd2d))

The error message I get is: 
Error in (function (..., deparse.level = 1)  : 
  cannot coerce type 'closure' to vector of type 'list'

Any ideas?
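
A likely reading of that error, sketched on a tiny made-up list: function(x,y) emd2d returns the function object itself instead of calling it, so rbind() is handed closures. In addition, rapply() visits every leaf matrix on its own, so a two-argument comparison never receives a pair. Walking the outer list hands each pair to the function. pair_dist() is a hypothetical stand-in for emd2d(), and the list below only mimics the structure of 'output'.

```r
# Hypothetical stand-in for emd2d(), so the sketch runs without emdist.
pair_dist <- function(a, b) sqrt(sum((a - b)^2))

# Same shape as 'output': every element holds exactly two matrices.
output <- list(
  list(matrix(0, 2, 3), matrix(1, 2, 3)),
  list(matrix(1:6, 2, 3), matrix(6:1, 2, 3))
)

# rapply() calls its function once per leaf: four matrices, four separate calls.
n_leaves <- length(rapply(output, function(m) 1L, how = "unlist"))
n_leaves

# Iterating over the outer list passes both matrices of a pair together.
d <- vapply(output, function(p) pair_dist(p[[1]], p[[2]]), numeric(1))
d
```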


Re: [R] looping function through list

2014-01-02 Thread arun



#or
 mapply(emd2d,sapply(output1,`[`,1),sapply(output1,`[`,2))
#[1]   NaN -6.089909
A.K.




Re: [R] looping function through list

2014-01-02 Thread arun

Hi,
I tested it on R 3.0.2 console (Linux) and also on RStudio version 0.98.490.  
It seems all right.
A.K.  


Thanks for this. I am trying to run the code you posted but RStudio keeps 
crashing. I am trying to run it on the example output1 since it's small, but 
that crashes as well. 





Re: [R] looping through columns in a matrix or data frame

2014-01-01 Thread arun

Hi,
Maybe this helps:

Using your function:
mapply(less,test,4)

#or
 invisible(mapply(less,test,4))
#[1] 2 3
#[1] 3

#or

 for(i in 1:ncol(test)){
 less(test[,i],4)}
#[1] 2 3
#[1] 3


A.K.
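
A variation on the same idea, as a sketch: less_than() below is a hypothetical rewrite of less() that returns the values instead of printing them, so the per-column results can be kept in a list. A data frame is a list of columns, so lapply() visits one column at a time; for a matrix, looping over column numbers does the same job.

```r
test <- data.frame(column1 = c(2, 3, 4), column2 = c(3, 4, 5))

# Same filter as less(), but returning the values rather than printing them.
less_than <- function(x, y) x[x < y]

# Data frame: one call per column, results collected in a named list.
by_column <- lapply(test, less_than, y = 4)
by_column

# Matrix: loop over the column numbers instead.
m <- as.matrix(test)
by_index <- lapply(seq_len(ncol(m)), function(j) less_than(m[, j], 4))
by_index
```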



Hi, I'm trying to figure out how to loop through columns in a matrix or 
data frame, but what I've been finding online has not been very clear. 
I've written the following simple function that I can use on a column to 
extract all values that are less than a specified number. Consider the 
following example using that function to extract all values less than 4 
from column1 of the table test:

> less <- function(x,y){print(x[which(x < y)])}

> test
  column1 column2
1       2       3
2       3       4
3       4       5

> less(test[,1],4)
[1] 2 3


What I want to do is loop that function over all the columns in the table. 
Note: I realize that this is a silly example and there are better ways to 
do this particular task in R, so please don't respond with better ways to 
extract values less than a given number. The question that I am interested 
in is merely how to loop over the columns. If you could respond by 
modifying my silly function so that it will loop, that would be the most 
helpful response. Thanks for the advice!


Re: [R] Looping over names of multiple data frames in an R for() loop XXXX

2013-10-13 Thread Mikkel Grum

You might want to try:
assign(d[1], read.csv("yourfile.csv"))

...
write.csv(d1, "yourfile.csv")  # write.csv() ignores 'append' (with a warning), so it is omitted

Regards
Mikkel



On Friday, October 11, 2013 2:53 PM, Dan Abner dan.abne...@gmail.com wrote:
 
Hi everybody,

I thought I was using the get() fn correctly here to loop over multiple
data frame names in an R for() loop. Can someone advise?


> miss<-c("#NULL!",999)
> d<-c("d1","d2","d3","d4")
>
> for(i in 1:4){
+
+ miss1<-ifelse(i<=2,miss[1],miss[2])
+ miss1
+
+ get(d[i])<-read.csv(paste("C:\\DATA\\Data\\Original\\",dsn[i],sep=""),
+ na.strings=c(miss1,9))
+
+ head(get(d[i]))
+
+ write.csv(get(d[i]),paste("C:\\DATA\\Data\\",dsn[i],sep=""),
+ na=".")
+
+ }
Error in get(d[i]) <- read.csv(paste("C:\\DATA\\Data\\Original\\", dsn[i],  :
  could not find function "get<-"

Thanks!

Dan



[R] Looping over names of multiple data frames in an R for() loop XXXX

2013-10-11 Thread Dan Abner

Hi everybody,

I thought I was using the get() fn correctly here to loop over multiple
data frame names in an R for() loop. Can someone advise?


> miss<-c("#NULL!",999)
> d<-c("d1","d2","d3","d4")
>
> for(i in 1:4){
+
+ miss1<-ifelse(i<=2,miss[1],miss[2])
+ miss1
+
+ get(d[i])<-read.csv(paste("C:\\DATA\\Data\\Original\\",dsn[i],sep=""),
+ na.strings=c(miss1,9))
+
+ head(get(d[i]))
+
+ write.csv(get(d[i]),paste("C:\\DATA\\Data\\",dsn[i],sep=""),
+ na=".")
+
+ }
Error in get(d[i]) <- read.csv(paste("C:\\DATA\\Data\\Original\\", dsn[i],  :
  could not find function "get<-"


Thanks!

Dan


Re: [R] Looping over names of multiple data frames in an R for() loop XXXX

2013-10-11 Thread jim holtman

I think you want 'assign' at that point.  Would suggest using a 'list'
to store the input instead of unique named objects.  'list's are
easier to manage.
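
A rough sketch of the list approach, assuming made-up file names in a temporary directory and a single missing-value code of 999; none of these specifics come from the original post.

```r
# Hypothetical setup: two small CSV files that use 999 as a missing-value code.
tmp <- tempdir()
dsn <- c("list_a.csv", "list_b.csv")        # made-up file names
for (f in dsn) {
  write.csv(data.frame(x = c(1, 999, 3)), file.path(tmp, f), row.names = FALSE)
}

# Read every file into one named list instead of separate objects d1, d2, ...
data_list <- lapply(file.path(tmp, dsn), read.csv, na.strings = "999")
names(data_list) <- c("d1", "d2")

# Each element is an ordinary data frame.
head(data_list[["d1"]])

# Write each data frame back out, printing NA as "." as in the original post.
out_dir <- file.path(tmp, "out")
dir.create(out_dir, showWarnings = FALSE)
for (i in seq_along(data_list)) {
  write.csv(data_list[[i]], file.path(out_dir, dsn[i]), na = ".", row.names = FALSE)
}
```

With a list, the loop index selects the element directly, so neither get() nor assign() is needed.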

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Looping an lapply linear regression function

2013-09-06 Thread arun

Hi,
Using the example dataset (Test_data.csv):
dat1 <- read.csv("Test_data.csv",header=TRUE,sep="\t",row.names=1)
indx2 <- expand.grid(names(dat1),names(dat1),stringsAsFactors=FALSE)
indx2New <- indx2[indx2[,1]!=indx2[,2],]
res2 <- t(sapply(seq_len(nrow(indx2New)),function(i) {x1 <- indx2New[i,];
x2 <- cbind(dat1[x1[,1]],dat1[x1[,2]]);summary(lm(x2[,1]~x2[,2]))$coef[,4]}))
dat2 <- cbind(indx2New,value=res2[,2])
library(reshape2)
res2New <- dcast(dat2,Var1~Var2,value.var="value")
row.names(res2New) <- res2New[,1]
res2New <- as.matrix(res2New[,-1])
dim(res2New)
#[1] 28 28
head(res2New,3)
#            AgriEmi   AgriMach  AgriValAd     AgrVaGDP       AIL     ALAre
#AgriEmi          NA 0.23401895 0.45697412 4.644877e-01 0.6398030 0.4039855
#AgriMach  0.2340189         NA 0.01449519 4.922558e-06 0.3890046 0.9279044
#AgriValAd 0.4569741 0.01449519         NA 5.135269e-02 0.5325943 0.4872555
#              ALPer          ANS     AraLa  AraLaPer    CombusRen      ForArea
#AgriEmi   0.4039855 2.507257e-01 0.2303275 0.2303275 0.9438409125 0.0004473563
#AgriMach  0.9279044 6.072123e-05 0.3154370 0.3154370 0.0040254771 0.2590309747
#AgriValAd 0.4872555 2.060412e-01 0.8449600 0.8449600 0.0008077264 0.5152352072
#             ForArePer  ForProTon ForProTonSKm      ForRen          GDP
#AgriEmi   0.0004473563 0.01714768 0.0007089448 0.900222038 0.6022470671
#AgriMach  0.2590309748 0.20170800 0.2305335762 0.005584703 0.4199684378
#AgriValAd 0.5152352071 0.80983446 0.4368256400 0.208975126 0.0003534226
#                   GEF GroAgriProVal PermaCrop  RoadDens   RoadTot  RurPopGro
#AgriEmi   0.0008580856    0.01078593 0.6863110 0.6398030 0.6398030 0.40734903
#AgriMach  0.1315182244    0.14074612 0.2530378 0.3064186 0.3064186 0.33705434
#AgriValAd 0.7520803684    0.31556633 0.1151395 0.4374599 0.4374599 0.04837586
#          RurPopPerc    TerrPA         Trac      Vehi WaterWith
#AgriEmi    0.4835676 0.4504239 2.279566e-01 0.6398030 0.3056195
#AgriMach   0.6401556 0.1707857 4.730759e-33 0.3064186 0.9502553
#AgriValAd  0.2383507 0.0223124 1.513169e-02 0.1251843 0.3307148


#or
res3 <- xtabs(value~Var1+Var2,data=dat2) #here the diagonals are 0s
attr(res3,"class") <- NULL
attr(res3,"call") <- NULL
names(dimnames(res3)) <- NULL

#You can change it in the first solution also.
res2New <- dcast(dat2,Var1~Var2,value.var="value",fill=0)
row.names(res2New) <- res2New[,1]
res2New <- as.matrix(res2New[,-1])
identical(res2New,res3)
#[1] TRUE
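
As a possible base-R alternative to dcast() and xtabs(), a long table of (Var1, Var2, value) rows can be written into a square matrix with character matrix indexing. This is only a sketch: the small dat2 below is a made-up stand-in for the real one, and its values are invented.

```r
# Toy long-format result, standing in for dat2 above (values are made up).
dat2 <- data.frame(
  Var1  = c("b", "c", "a", "c", "a", "b"),
  Var2  = c("a", "a", "b", "b", "c", "c"),
  value = c(0.20, 0.03, 0.20, 0.50, 0.03, 0.50),
  stringsAsFactors = FALSE
)

# Empty square matrix with the variable names on both dimensions.
vars <- sort(unique(c(dat2$Var1, dat2$Var2)))
pmat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

# A two-column character matrix indexes cells by (row name, column name).
pmat[cbind(dat2$Var1, dat2$Var2)] <- dat2$value
pmat
```

The diagonal stays NA because no row of dat2 pairs a variable with itself.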

A.K.




Arun,

That does exactly what I wanted to do, but how would I
reshape it into a matrix where the independent variable is on the x axis and
the dependent variable on the y axis, or vice versa, rather than a 736, 2 matrix?

     V1   V2   V3   V4   V5 ... Vn
V1   -
V2        -
V3             -
V4                  -
V5                       -
Vn                              -



Re: [R] Looping an lapply linear regression function

2013-09-05 Thread Flavio Barros

Hello Arun. Can you provide some data? To help you better, I will need a
complete reproducible example, OK?


On Thu, Sep 5, 2013 at 1:49 PM, arun smartpink...@yahoo.com wrote:

 Hi,
 Maybe this helps:
 set.seed(28)
 dat1 <- setNames(as.data.frame(matrix(sample(1:40,10*5,replace=TRUE),ncol=5)),letters[1:5])
 indx <- as.data.frame(combn(names(dat1),2),stringsAsFactors=FALSE)
 res <- t(sapply(indx,function(x)
 {x1 <- cbind(dat1[x[1]],dat1[x[2]]);summary(lm(x1[,1]~x1[,2]))$coef[,4]}))
 rownames(res) <- apply(indx,2,paste,collapse="_")
 colnames(res)[2] <- "Coef1"
 head(res,3)
 #    (Intercept)     Coef1
 #a_b  0.39862676 0.8365606
 #a_c  0.02427885 0.6094141
 #a_d  0.37521423 0.7578723


 #permutation
 indx2 <- expand.grid(names(dat1),names(dat1),stringsAsFactors=FALSE)
 #or
 indx2 <- expand.grid(rep(list(names(dat1)),2),stringsAsFactors=FALSE)
 indx2New <- indx2[indx2[,1]!=indx2[,2],]
 res2 <- t(sapply(seq_len(nrow(indx2New)),function(i) {x1 <- indx2New[i,];
 x2 <- cbind(dat1[x1[,1]],dat1[x1[,2]]);summary(lm(x2[,1]~x2[,2]))$coef[,4]}))
 row.names(res2) <- apply(indx2New,1,paste,collapse="_")
 colnames(res2) <- colnames(res)


 A.K.


 Hi everyone,

 First off, I'd just like to say thanks for everyone's contributions.
 Up until now, I've never had to post, as I've always found the answers
 by trawling through the database. I've finally managed to stump
 myself, and although I'm sure the answer to my problem is fairly simple
 for someone out there, I have spent the whole day in front of
 my computer struggling. I know I'll probably get an absolute ribbing
 for making a basic mistake, or not understanding something fully, but
 I'm blind to the mistake now after looking at it for so long.

 What I'm looking to do is build a matrix ([28,28]) of
 p-values produced by running linear regressions of 28 variables
 against each other (e.g. a~b, a~c, a~d, ..., b~a, b~c, etc.), if that makes
 sense. I've managed to get this to work if I just input each variable
 by hand, but this isn't going to help when I have to make 20 matrices.

 My script is as follows:


 for (j in 1:28)
 {
 ## This section works perfectly if I don't try to loop it. I know
 ## this won't work at the moment, because I haven't designated what j is,
 ## but I'm showing it to highlight what I'm attempting to do.


 models <- lapply(varlist, function(x) {
   lm(substitute(ANS ~ i, list(i = as.name(x))), data = con.i)
 })

 abc <- lapply(models, function(f) summary(f)$coefficients[,4])

 abc <- do.call(rbind, abc)



 }

 I get the following error when I try to loop it...

 Error in model.frame.default(formula = substitute(j ~ i, list(i = as.name(x))),  :
   variable lengths differ (found for 'ANS') ## ANS being my first variable

 All variables are of the same length, with 21 recordings for each.


 If anyone can suggest a method of looping, or another means
 of producing 'models' for each of my 28 variables, without having to do
 it by hand, that would be fantastic.

 Thanks in advance!!
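
One plausible reading of the error above: substitute() only replaces i, so j stays a bare symbol in the formula, and lm() then compares the loop counter (a single number) with a 21-row column. A minimal sketch that sidesteps this by building each formula from column names with reformulate(). The data frame is simulated and its four column names are borrowed from the output earlier in the thread purely for illustration.

```r
# Simulated stand-in for con.i: 21 recordings of four made-up variables.
set.seed(1)
con.i <- as.data.frame(matrix(rnorm(21 * 4), ncol = 4,
                              dimnames = list(NULL, c("ANS", "GDP", "GEF", "AIL"))))

vars <- names(con.i)
pmat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

for (resp in vars) {
  for (pred in setdiff(vars, resp)) {
    # reformulate() builds the formula resp ~ pred from the two names.
    fit <- lm(reformulate(pred, response = resp), data = con.i)
    # Row 2, column 4 of the coefficient table is the slope's p-value.
    pmat[resp, pred] <- summary(fit)$coefficients[2, 4]
  }
}
pmat
```

Rows are the response variables and columns the predictors. For simple regressions the slope p-value is the same in both directions, so the matrix comes out symmetric.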
