Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread David L Carlson
You have multiple problems. You do not seem to understand read.csv() or 
as.Date(), so you really need to read the manual pages:

?read.csv
?as.Date

> Data <- read.csv("Container.csv")
> str(Data)
'data.frame':   362 obs. of  1 variable:
 $ TransitDate.Transits: Factor w/ 362 levels "1-Apr-00\t25",..: 319 289 78 140 110 229 18 259 199 169 ...

Notice that you have a single factor combining TransitDate and Transits, 
because the file you sent was NOT a .csv (comma-separated) file but a 
tab-delimited file:

> Data <- read.delim("Container.csv")
> str(Data)
'data.frame':   362 obs. of  2 variables:
 $ TransitDate: Factor w/ 362 levels "1-Apr-00","1-Apr-01",..: 319 289 78 140 110 229 18 259 199 169 ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...
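A minimal sketch of the delimiter problem, using a small hypothetical sample string (`txt`) in the tab-delimited layout that the str() output above suggests. The `text =` argument lets read.csv() and read.delim() read from a string instead of a file:

```r
# Hypothetical sample in the tab-delimited layout described above
txt <- "TransitDate\tTransits\n1-Oct-85\t4\n1-Nov-85\t4"

# read.csv() splits on commas, so each line stays a single field and the
# header "TransitDate\tTransits" becomes the name "TransitDate.Transits"
wrong <- read.csv(text = txt)
str(wrong)

# read.delim() splits on tabs, giving the two intended columns
right <- read.delim(text = txt)
str(right)
```

The single column name TransitDate.Transits matches what read.csv() produced for the real file.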

Now we get two variables, but the date is still a factor.

> Data <- read.delim("Container.csv", stringsAsFactors=FALSE)
> str(Data)
'data.frame':   362 obs. of  2 variables:
 $ TransitDate: chr  "1-Oct-85" "1-Nov-85" "1-Dec-85" "1-Jan-86" ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...

Now we get the date as characters, but as Ng Bo Lin pointed out, it is not in 
the format you indicated ("%Y-%m-%d"). %Y means a year with the century (e.g. 
1985), but you have 2-digit years (85); %m means month as a decimal number 
(e.g. 10 for October), but you have a 3-letter abbreviation for the month; and 
the order is backwards. What you need is

> TDate <- as.Date(Data$TransitDate, "%e-%B-%y")
> head(TDate)
[1] "1985-10-01" "1985-11-01" "1985-12-01" "1986-01-01" "1986-02-01" "1986-03-01"

You should probably preserve the original date rather than overwrite it, so 
something like

> Data$Transit <- TDate
> str(Data)
'data.frame':   362 obs. of  3 variables:
 $ TransitDate: chr  "1-Oct-85" "1-Nov-85" "1-Dec-85" "1-Jan-86" ...
 $ Transits   : int  4 4 5 4 3 6 4 3 4 5 ...
 $ Transit    : Date, format: "1985-10-01" "1985-11-01" ...

would be preferable.
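Putting the steps together, a minimal sketch on a few hypothetical sample rows. It uses %d-%b-%y, the more common spelling of this format; on input R's %e and %B also accept strings like "1-Oct-85", so %e-%B-%y should give the same dates. Month abbreviations are matched in the current LC_TIME locale, so this assumes an English or "C" locale:

```r
# Hypothetical sample rows in the tab-delimited layout of Container.csv
txt <- "TransitDate\tTransits\n1-Oct-85\t4\n1-Nov-85\t4\n1-Dec-85\t5"

# Tab-delimited input; keep the date strings as character, not factor
Data <- read.delim(text = txt, stringsAsFactors = FALSE)

# Add a parsed Date column and leave the original TransitDate untouched
Data$Transit <- as.Date(Data$TransitDate, format = "%d-%b-%y")
str(Data)
```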

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


From: Paul Bernal [mailto:paulberna...@gmail.com] 
Sent: Tuesday, March 28, 2017 9:41 AM
To: David L Carlson <dcarl...@tamu.edu>
Cc: Ng Bo Lin <ngboli...@gmail.com>; r-help@r-project.org
Subject: Re: [R] Looping Through DataFrames with Differing Lenghts

Dear friend David,

Thank you for your valuable suggestion. So here is the file in .txt format.

Best of regards,

Paul

2017-03-28 Thread Paul Bernal
Dear friend David,

Thank you for your valuable suggestion. So here is the file in .txt format.

Best of regards,

Paul

2017-03-28 9:35 GMT-05:00 David L Carlson <dcarl...@tamu.edu>:

> We did not get the file on the list. You need to rename your file to
> "Container.txt" or the mailing list will strip it from your message. The
> read.csv() function returns a data frame so Data is already a data frame.
> The command DataFrame<-data.frame(Data) just makes a copy of Data.
>
> Without the file, it is difficult to be certain, but your dates are
> probably stored as character strings and read.csv() will turn those to
> factors unless you tell it not to do that. Try
>
> Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
> str(Data) # To see how the dates are stored
>
> and see if things work better. If not, rename the file or use dput(Data)
> and copy the result into your email message. If the data is very long, use
> dput(head(Data, 15)).
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352

2017-03-28 Thread Ng Bo Lin
Hi Paul,

Using the example provided by Ulrik, where

> exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> c(15, 20))

You could also try the following loop:

# Append each row of exdf1 whose Date does not already appear in exdf2
for (i in seq_len(nrow(exdf1))) {
  if (!exdf1[i, 1] %in% exdf2[, 1]) {
    exdf2 <- rbind(exdf2, exdf1[i, ])
  }
}

The loop runs through the rows of exdf1 and checks whether the Date of each 
row already exists in the Date column of exdf2. If so, it skips that row; 
otherwise, it appends the row to exdf2.

Hope this helps!


Side note: in terms of computational efficiency, I think Ulrik's answer is 
probably better. His is also much better presented.
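Neither approach returns the rows in date order: the rows taken from exdf1 end up after the original rows of exdf2. A minimal sketch of sorting afterwards, reusing the example data frames with stringsAsFactors = FALSE set explicitly so it behaves the same before and after R 4.0.0; `combined` is a hypothetical name:

```r
exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01", "1986-01-01"),
                    Transits = c(NA, NA, NA, NA), stringsAsFactors = FALSE)
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20), stringsAsFactors = FALSE)

# Ulrik's vectorised step: rows of exdf1 whose Date is not in exdf2
combined <- rbind(exdf2, subset(exdf1, !Date %in% exdf2$Date))

# Sort chronologically; converting to Date avoids relying on string order
combined <- combined[order(as.Date(combined$Date)), ]
rownames(combined) <- NULL
combined
```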

Regards,
Bo Lin

> On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo  wrote:
> 
> Hi Paul,
> 
> does this do what you want?
> 
> exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
> 20))
> 
> tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> 
> rbind(exdf2, tmpdf)
> 
> HTH,
> Ulrik
> 
> On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:
> 
> Dear friend Mark,
> 
> Great suggestion! Thank you for replying.
> 
> I have two dataframes, dataframe1 and dataframe2.
> 
> dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
> other column with the number of transits (all of which were set to NA values).
> dataframe1 starts on 1985-10-01 (October 1, 1985) and ends on 2017-03-01
> (March 1, 2017).
> 
> dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
> format and the other with the number of transits. dataframe2 has the same
> start and end dates; however, it has missing dates between the start and
> end dates, so it has fewer observations.
> 
> dataframe1 has a total of 378 observations and dataframe2 has a total of
> 362 observations.
> 
> I would like to write code that does the following:
> 
> Get the dates of dataframe1 that are missing in dataframe2 and add them as
> records to dataframe2, with NA values.
> 
> 
> dataframe1               dataframe2
> Date        Transits     Date        Transits
> 1985-10-01  NA           1985-10-01  15
> 1985-11-01  NA           1986-01-01  20
> 1985-12-01  NA           1986-02-01  5
> 1986-01-01  NA
> 1986-02-01  NA
> 2017-03-01  NA
> 
> I would like to fill in the missing dates in dataframe2, with NA as the value
> for the missing transits, so that I could end up with a dataframe3 looking
> as follows:
> 
> Date        Transits
> 1985-10-01  15
> 1985-11-01  NA
> 1985-12-01  NA
> 1986-01-01  20
> 1986-02-01  5
> 2017-03-01  NA
> 
> This is what I want to accomplish.
> 
> Thanks in advance for your help,
> 
> Best regards,
> 
> Paul
> 
> 
> 2017-03-27 15:15 GMT-05:00 Mark Sharp :
> 
>> Make some small dataframes of just a few rows that illustrate the problem
>> structure. Make a third that has the result you want. You will get an
>> answer very quickly. Without a self-contained reproducible problem, results
>> vary.
>> 
>> Mark
>> R. Mark Sharp, Ph.D.
>> msh...@txbiomed.org
>> 
>> 
>> 
>> 
>> 
>>> On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
>>> 
>>> Dear friends,
>>> 
>>> I have one dataframe which contains 378 observations, and another one,
>>> containing 362 observations.
>>> 
>>> Both dataframes have two columns, one date column and another one with
>>> the number of transits.
>>> 
>>> I wanted to write code so that I could fill in the dates that are
>>> missing in one of the dataframes, with the value NA in the transits column.
>>> 
>>> I have tried several things, but R complains that the dataframes have
>>> different lengths.
>>> 
>>> How can I solve this?
>>> 
>>> Any guidance will be greatly appreciated,
>>> 
>>> Best regards,
>>> 
>>> Paul
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> CONFIDENTIALITY NOTICE: This e-mail and any files and/or attachments
>> transmitted, may contain privileged and confidential information and is
>> intended solely for the exclusive use of the individual or entity to whom
>> it is addressed. If you are not the intended recipient, you are 

2017-03-28 Thread Ng Bo Lin
Hi Paul,

The date format that you have supplied to R isn’t exactly right.

Instead of the format "%Y-%m-%d", you need one that matches your data, which 
appears to follow the "%e-%B-%y" format. Here %e is the day of the month as a 
number (1-31); %B is the month name (strictly the full name, but on input R 
also accepts the 3-letter abbreviation, which is what %b stands for); and %y 
is the year given as 2 digits.
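One quick way to check these codes is to parse a single value from the data with each candidate format. A minimal sketch, assuming an English or "C" LC_TIME locale (month names are matched in the current locale):

```r
x <- "1-Oct-85"

as.Date(x, format = "%Y-%m-%d")  # NA: "Oct" is not a numeric month
as.Date(x, format = "%e-%B-%y")  # day, month name, 2-digit year
as.Date(x, format = "%d-%b-%y")  # the more common spelling of the same format

# %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
as.Date("1-Mar-17", format = "%d-%b-%y")
```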

Hope this helps!

Thank you.

Regards,
Bo Lin
> On 28 Mar 2017, at 10:12 PM, Paul Bernal  wrote:
> 
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and 
> valuable replies,
> 
> I am trying to reformat a date as follows:
> 
> Data<-read.csv("Container.csv")
> 
> DataFrame<-data.frame(Data)
> 
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
> 
> #trying to put it in YYYY-MM-DD format
> 
> However, when I do this, I get a bunch of NAs for the dates.
> 
> I am providing a sample dataset as a reference.
> 
> Any help will be greatly appreciated,
> 
> Best regards,
> 
> Paul

2017-03-28 Thread David L Carlson
We did not get the file on the list. You need to rename your file to 
"Container.txt" or the mailing list will strip it from your message. The 
read.csv() function returns a data frame, so Data is already a data frame. The 
command DataFrame<-data.frame(Data) just makes a copy of Data. 

Without the file, it is difficult to be certain, but your dates are probably 
stored as character strings and read.csv() will turn those to factors unless 
you tell it not to do that. Try

Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
str(Data) # To see how the dates are stored

and see if things work better. If not, rename the file or use dput(Data) and 
copy the result into your email message. If the data is very long, use 
dput(head(Data, 15)).
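To see the difference stringsAsFactors makes, a minimal sketch that reads a hypothetical two-row comma-separated sample from a string. R 4.0.0 later changed the default to stringsAsFactors = FALSE, but setting it explicitly works in every version:

```r
txt <- "TransitDate,Transits\n1-Oct-85,4\n1-Nov-85,4"

as_factor <- read.csv(text = txt, stringsAsFactors = TRUE)
as_chr    <- read.csv(text = txt, stringsAsFactors = FALSE)

str(as_factor)  # TransitDate is a factor
str(as_chr)     # TransitDate is character

# A paste-able copy of the first rows, as suggested above
dput(head(as_chr, 2))
```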

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Bernal
Sent: Tuesday, March 28, 2017 9:12 AM
To: Ng Bo Lin <ngboli...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Looping Through DataFrames with Differing Lenghts

Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
valuable replies,

I am trying to reformat a date as follows:

Data<-read.csv("Container.csv")

DataFrame<-data.frame(Data)

DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")

#trying to put it in YYYY-MM-DD format

However, when I do this, I get a bunch of NAs for the dates.

I am providing a sample dataset as a reference.

Any help will be greatly appreciated,

Best regards,

Paul

2017-03-28 Thread Paul Bernal
Dear Bo Lin,

I tried doing
Containerdata$TransitDate<-as.Date(Containerdata$TransitDate, "%e-%B-%y")
but I keep getting NAs.

I also tried a solution that I saw on Stack Overflow:

> lct<-Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
[1] "C"
>
> Sys.setlocale("LC_TIME", lct)
[1] "English_United States.1252"

but it didn't work.

Any other suggestion?

Thank you for your valuable help,

Regards,

Paul
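In the transcript above, the locale is set to "C" and then restored immediately, with no as.Date() call in between, so the temporary locale never applies to the conversion. A minimal sketch of the intended pattern; parse_c_locale() is a hypothetical helper, and its default %d-%b-%y format assumes dates like "1-Oct-85":

```r
# Hypothetical helper: parse dates under the "C" LC_TIME locale, then
# restore whatever locale was active before, even if as.Date() fails
parse_c_locale <- function(x, format = "%d-%b-%y") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

parse_c_locale(c("1-Oct-85", "1-Mar-17"))
```

Since the locale reported here is already English, if this still returns NA the problem more likely lies in the input column itself, so running str() on the data is worth doing first.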

2017-03-28 9:19 GMT-05:00 Ng Bo Lin :

> Hi Paul,
>
> The date format that you have supplied to R isn’t exactly right.
>
> Instead of the format "%Y-%m-%d", you need one that matches your data, which
> appears to follow the "%e-%B-%y" format. Here %e is the day of the month as a
> number (1-31); %B is the month name (strictly the full name, but on input R
> also accepts the 3-letter abbreviation, which is what %b stands for); and %y
> is the year given as 2 digits.
>
> Hope this helps!
>
> Thank you.
>
> Regards,
> Bo Lin

2017-03-28 Thread Paul Bernal
Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
valuable replies,

I am trying to reformat a date as follows:

Data<-read.csv("Container.csv")

DataFrame<-data.frame(Data)

DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")

#trying to put it in YYYY-MM-DD format

However, when I do this, I get a bunch of NAs for the dates.
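The format argument of as.Date() describes how the input strings are written, not how the result should look; when it does not match the input, the result is NA. A Date object already prints as YYYY-MM-DD, and format() produces other layouts. A minimal sketch with two sample strings in the "1-Oct-85" style, assuming an English or "C" LC_TIME locale:

```r
x <- c("1-Oct-85", "1-Nov-85")

bad  <- as.Date(x, format = "%Y-%m-%d")  # format does not match the input
good <- as.Date(x, format = "%d-%b-%y")  # format matches the input

bad
good                       # Date objects print as YYYY-MM-DD
format(good, "%d/%m/%Y")   # format() controls the output layout
```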

I am providing a sample dataset as a reference.

Any help will be greatly appreciated,

Best regards,

Paul

2017-03-28 8:15 GMT-05:00 Ng Bo Lin :

> Hi Paul,
>
> Using the example provided by Ulrik, where
>
> > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits =
> c(15, 20))
>
> You could also try the following loop:
>
> # Append each row of exdf1 whose Date does not already appear in exdf2
> for (i in seq_len(nrow(exdf1))) {
>   if (!exdf1[i, 1] %in% exdf2[, 1]) {
>     exdf2 <- rbind(exdf2, exdf1[i, ])
>   }
> }
>
> The loop runs through the rows of exdf1 and checks whether the Date of each
> row already exists in the Date column of exdf2. If so, it skips that row;
> otherwise, it appends the row to exdf2.
>
> Hope this helps!
>
>
> Side note: in terms of computational efficiency, I think Ulrik's answer is
> probably better. His is also much better presented.
>
> Regards,
> Bo Lin

2017-03-28 Thread Ulrik Stervbo
Hi Paul,

does this do what you want?

exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
"1986-01-01"), Transits = c(NA, NA, NA, NA))
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
20))

tmpdf <- subset(exdf1, !Date %in% exdf2$Date)

rbind(exdf2, tmpdf)

HTH,
Ulrik
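Another option is merge() with all.x = TRUE, which keeps every date from the first data frame, fills NA where the second has no match, and sorts by the key. Since dataframe1 is just a monthly sequence, it could also be generated with seq(). The sketch below assumes monthly dates on the 1st of each month and uses the hypothetical names `grid` and `filled`:

```r
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15, 20))
exdf2$Date <- as.Date(exdf2$Date)  # compare Dates with Dates, not strings

# Complete monthly grid; for the real data the end would be 2017-03-01,
# which gives 378 months, the row count of dataframe1
grid <- data.frame(Date = seq(as.Date("1985-10-01"), as.Date("1986-01-01"),
                              by = "month"))

# Left join: keep every grid date, NA where exdf2 has no Transits value
filled <- merge(grid, exdf2, by = "Date", all.x = TRUE)
filled
```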

On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:

Dear friend Mark,

Great suggestion! Thank you for replying.

I have two dataframes, dataframe1 and dataframe2.

dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
other column with the number of transits (all of which were set to NA values).
dataframe1 starts on 1985-10-01 (October 1, 1985) and ends on 2017-03-01
(March 1, 2017).

dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
format and the other with the number of transits. dataframe2 has the same
start and end dates; however, it has missing dates between the start and
end dates, so it has fewer observations.

dataframe1 has a total of 378 observations and dataframe2 has a  total of
362 observations.

I would like to come up with a code that could do the following:

Get the dates of dataframe1 that are missing in dataframe2 and add them as
records to dataframe 2 but with NA values.

:

> Make some small dataframes of just a few rows that illustrate the problem
> structure. Make a third that has the result you want. You will get an
> answer very quickly. Without a self-contained reproducible problem,
results
> vary.
>
> Mark
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org
>
>
>
>
>
> > On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
> >
> > Dear friends,
> >
> > I have one dataframe which contains 378 observations, and another one,
> > containing 362 observations.
> >
> > Both dataframes have two columns, one date column and another one with
> the
> > number of transits.
> >
> > I wanted to come up with a code so that I could fill in the dates that
> are
> > missing in one of the dataframes and replace the column of transits with
> > the value NA.
> >
> > I have tried several things but R obviously complains that the length of
> > the dataframes are different.
> >
> > How can I solve this?
> >
> > Any guidance will be greatly appreciated,
> >
> > Best regards,
> >
> > Paul
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> CONFIDENTIALITY NOTICE: This e-mail and any files and/or attachments
> transmitted, may contain privileged and confidential information and is
> intended solely for the exclusive use of the individual or entity to whom
> it is addressed. If you are not the intended recipient, you are hereby
> notified that any review, dissemination, distribution or copying of this
> e-mail and/or attachments is strictly prohibited. If you have received
this
> e-mail in error, please immediately notify the sender stating that this
> transmission was misdirected; return the e-mail to sender; destroy all
> paper copies and delete all electronic copies from your system without
> disclosing its contents.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Paul Bernal
Dear friend Mark,

Great suggestion! Thank you for replying.

I have two dataframes, dataframe1 and dataframe2.

dataframe1 has two columns: one with the dates in YYYY-MM-DD format and the
other with the number of transits (all of which are set to NA).
dataframe1 starts on 1985-10-01 (October 1, 1985) and ends on 2017-03-01
(March 1, 2017).

dataframe2 has the same two columns: one with the dates in YYYY-MM-DD
format and the other with the number of transits. dataframe2 has the same
start and end dates, but some dates in between are missing, so it has
fewer observations.

dataframe1 has a total of 378 observations and dataframe2 has a total of
362 observations.

I would like to write code that does the following:

Get the dates of dataframe1 that are missing in dataframe2 and add them as
records to dataframe2, with NA as the number of transits.
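A minimal base-R sketch of that step, offered as an illustration rather than a tested answer for the real files. It assumes both Date columns are already of class Date (see ?as.Date) and that the columns are named Date and Transits; the three-row dataframe2 below is made-up toy data. Because dataframe1 is simply every first-of-month date in the range, seq() can rebuild it, and merge() with all.x = TRUE then keeps every date:

```r
# Made-up stand-in for dataframe2 (real data would be read from file);
# the column names Date and Transits are assumptions
dataframe2 <- data.frame(
  Date     = as.Date(c("1985-10-01", "1986-01-01", "2017-03-01")),
  Transits = c(15, 20, 7)
)

# Every first-of-month date from 1985-10-01 through 2017-03-01
all_dates <- seq(as.Date("1985-10-01"), as.Date("2017-03-01"), by = "month")
length(all_dates)  # 378, the size of dataframe1

# Left join on Date: every month is kept, and months absent from
# dataframe2 get Transits = NA
filled <- merge(data.frame(Date = all_dates), dataframe2,
                by = "Date", all.x = TRUE)
head(filled)
```

With the real data, merge(dataframe1["Date"], dataframe2, by = "Date", all.x = TRUE) should do the same job, since dataframe1 already holds the full date range.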


Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Anthoni, Peter (IMK)
Hi Paul,

match() might help, but without a real data sample it is hard to check
whether the following works.

mm <- match(df.col378[, "Date"], df.col362[, "Date"])
# mm is NA where df.col362 has no matching date,
# and holds the row index of the match where the two dates agree
new.df <- cbind(df.col378, "transits.col362" = df.col362[mm, "transits"])

cheers
Peter
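To see what match() returns, here is the same idea run on a made-up four-month example. The data frames and the lower-case transits column are hypothetical stand-ins, and the Date columns are assumed to be of class Date (identically formatted character strings would also work):

```r
# Hypothetical toy versions of the two data frames
df.col378 <- data.frame(Date = as.Date(c("1985-10-01", "1985-11-01",
                                         "1985-12-01", "1986-01-01")))
df.col362 <- data.frame(Date = as.Date(c("1985-10-01", "1986-01-01")),
                        transits = c(15, 20))

# Row index in df.col362 for each date in df.col378; NA where absent
mm <- match(df.col378$Date, df.col362$Date)
mm  # c(1, NA, NA, 2)

# Indexing with NA yields NA, so the missing months get NA transits
new.df <- cbind(df.col378, transits.col362 = df.col362$transits[mm])
new.df
```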





Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Ulrik Stervbo
You could use merge() or %in%.

Best,
Ulrik
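A minimal sketch of the merge() route, using made-up toy data and assuming both data frames share a Date column of the same class. all.x = TRUE makes it a left join, so every date in exdf1 survives and dates absent from exdf2 get NA in Transits:

```r
# Hypothetical toy data: exdf1 holds every month, exdf2 only some of them
exdf1 <- data.frame(Date = as.Date(c("1985-10-01", "1985-11-01",
                                     "1985-12-01", "1986-01-01")))
exdf2 <- data.frame(Date = as.Date(c("1985-10-01", "1986-01-01")),
                    Transits = c(15, 20))

# Left join on Date; the result is sorted by Date by default
merged <- merge(exdf1, exdf2, by = "Date", all.x = TRUE)
merged
```

Using all = TRUE instead would keep dates found in either data frame.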


Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Mark Sharp
Make some small dataframes of just a few rows that illustrate the problem 
structure. Make a third that has the result you want. You will get an answer 
very quickly. Without a self-contained reproducible problem, results vary.

Mark
R. Mark Sharp, Ph.D.
msh...@txbiomed.org
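As a hedged illustration of that advice, the toy inputs below (all hypothetical) are small enough to paste into a message. dput() prints R code that recreates an object exactly, and all.equal() gives a quick check of a candidate answer against the hand-written target; the merge() call is just one possible candidate:

```r
# Hypothetical minimal inputs: a few rows are enough to show the structure
full <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01"),
                   Transits = NA, stringsAsFactors = FALSE)
partial <- data.frame(Date = c("1985-10-01", "1985-12-01"),
                      Transits = c(15, 7), stringsAsFactors = FALSE)

# The result you want, written out by hand
wanted <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01"),
                     Transits = c(15, NA, 7), stringsAsFactors = FALSE)

# Prints code that rebuilds 'partial'; paste this output into the message
dput(partial)

# Compare a candidate solution with the target
attempt <- merge(full["Date"], partial, by = "Date", all.x = TRUE)
all.equal(attempt, wanted)
```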





CONFIDENTIALITY NOTICE: This e-mail and any files and/or...{{dropped:10}}



[R] Looping Through DataFrames with Differing Lenghts

2017-03-27 Thread Paul Bernal
Dear friends,

I have one dataframe which contains 378 observations, and another one,
containing 362 observations.

Both dataframes have two columns, one date column and another one with the
number of transits.

I would like to write code that fills in the dates that are missing in one
of the dataframes and sets the number of transits for those dates to NA.

I have tried several things, but R complains that the lengths of the
dataframes are different.

How can I solve this?

Any guidance will be greatly appreciated,

Best regards,

Paul
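Since the subject mentions looping, here is a minimal loop-based sketch with made-up data. The names full and obs are hypothetical, and both Date columns are assumed to be of class Date. The loop runs over the rows of the longer data frame and looks each date up in the shorter one, so the two lengths never have to match:

```r
# Hypothetical toy data: 'full' has every month, 'obs' only some of them
full <- data.frame(Date = as.Date(c("1985-10-01", "1985-11-01",
                                    "1985-12-01", "1986-01-01")),
                   Transits = NA_real_)
obs  <- data.frame(Date = as.Date(c("1985-10-01", "1986-01-01")),
                   Transits = c(15, 20))

# For each date in 'full', copy the transit count from 'obs' if present;
# dates with no match keep their NA
for (i in seq_len(nrow(full))) {
  hit <- which(obs$Date == full$Date[i])
  if (length(hit) == 1) full$Transits[i] <- obs$Transits[hit]
}
full
```

Vectorised calls such as match() or merge() do the same lookup in one step and are generally simpler and faster than an explicit loop.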
