Re: [R] missing and replace

2017-04-26 Thread Ng Bo Lin
Apologies, I re-read the question and realised you want the missing values 
replaced with the column mean, rounded to the nearest whole number.

Here’s the code in full.

df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

# Column means, ignoring NA values
means <- sapply(df1, mean, na.rm = TRUE)

# Return y (the column mean) if x is NA; otherwise return x unchanged
return_mean_if_NA <- function(x, y) {
  if (is.na(x)) y else x
}

# Empty data frame with the same columns as df1, filled in cell by cell
df2 <- df1[0, ]

for (i in seq_len(ncol(df1))) {
  for (j in seq_len(nrow(df1))) {
    df2[j, i] <- round(return_mean_if_NA(df1[j, i], means[i]), 0)
  }
}
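
As an aside, the same result can be had without explicit loops. The following is only a sketch, assuming every column of df1 is numeric; impute_mean is a helper name made up for illustration, not a function from any package.

```r
df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

# Hypothetical helper: replace the NAs in one numeric column
# with that column's mean, rounded to the nearest whole number
impute_mean <- function(col) {
  col[is.na(col)] <- round(mean(col, na.rm = TRUE))
  col
}

# Apply the helper to every column and rebuild the data frame
df2 <- as.data.frame(lapply(df1, impute_mean))
print(df2)
```

Unlike the loop, this rounds only the imputed values and leaves the observed values untouched.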

HTH.

Regards,
Bo Lin

>> On 27 Apr 2017, at 8:45 AM, Val <valkr...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a data frame with three variables. Some of the variables
>> have missing values and I want to replace those missing values
>> (represented by NA) with the mean value of that variable. In this
>> sample data, variables y and z have missing values. The mean values
>> of y and z are 152.25 and 359.5, respectively. I want to replace those
>> missing values by the respective mean value (rounded to the nearest
>> whole number).
>> 
>> DF1 <- read.table(header=TRUE, text='ID1 x y z
>> 1  25  122  352
>> 2  30  135  376
>> 3  40   NA  350
>> 4  26  157   NA
>> 5  60  195  360')
>> mean x= 36.2
>> mean y=152.25
>> mean z= 359.5
>> 
>> output
>> ID1  x  y  z
>> 1   25 122   352
>> 2   30 135   376
>> 3   40 152   350
>> 4   26 157   360
>> 5   60 195   360
>> 
>> 
>> Thank you in advance
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Re: [R] missing and replace

2017-04-26 Thread Ng Bo Lin
Hi Val,

You could do this by nesting 2 for loops, and defining a function such that it 
returns the mean of the column when the value is ‘NA’.

df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

# Empty data frame with the same columns as df1, filled in cell by cell
df2 <- df1[0, ]

# Column means, ignoring NA values
means <- sapply(df1, mean, na.rm = TRUE)

# Return y (the column mean) if x is NA; otherwise return x unchanged
return_mean_if_NA <- function(x, y) {
  if (is.na(x)) y else x
}

for (i in seq_len(ncol(df1))) {
  for (j in seq_len(nrow(df1))) {
    df2[j, i] <- return_mean_if_NA(df1[j, i], means[i])
  }
}
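
If you want to check the column means the loop relies on against the ones in your post, something like the following should do. It is a small sketch that assumes all columns are numeric; col_means is just an illustrative name.

```r
df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

# Mean of each column, skipping NA values
col_means <- colMeans(df1, na.rm = TRUE)
print(col_means)
```

These should agree with the means of 36.2, 152.25 and 359.5 quoted in the question.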


Hope this helps!

Regards,
Bo Lin


Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Ng Bo Lin
Hi Paul,

Using the example provided by Ulrik, where

exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
                             "1986-01-01"), Transits = c(NA, NA, NA, NA))
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20))

you could also try the following loop:

for (i in seq_len(nrow(exdf1))) {
  # Append the exdf1 row only if its Date is not already in exdf2
  if (!exdf1[i, 1] %in% exdf2[, 1]) {
    exdf2 <- rbind(exdf2, exdf1[i, ])
  }
}

The loop runs through the rows of exdf1 and checks whether the Date of each 
row already exists in the Date column of exdf2. If so, it skips the row. 
Otherwise, it binds the row to exdf2. Note that the new rows are appended at 
the end, so exdf2 is not sorted by Date afterwards.
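
If you need the result ordered by date, as in your desired dataframe3, one possible alternative is a left join with base R's merge(). This is only a sketch: it assumes the Date columns are stored as character strings in YYYY-MM-DD form (hence stringsAsFactors = FALSE), and exdf3 is simply a name chosen here for the result.

```r
exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
                             "1986-01-01"),
                    Transits = c(NA, NA, NA, NA),
                    stringsAsFactors = FALSE)
exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
                    Transits = c(15, 20),
                    stringsAsFactors = FALSE)

# Left join: keep every Date from exdf1 and take Transits from exdf2
# where a matching Date exists; unmatched dates get NA.
# merge() sorts the result by the key column by default.
exdf3 <- merge(exdf1["Date"], exdf2, by = "Date", all.x = TRUE)
print(exdf3)
```

Because the dates are in YYYY-MM-DD form, sorting them as text also puts them in chronological order.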

Hope this helps!


Side note: in terms of computational efficiency, I think Ulrik’s answer is 
probably better. His is also much cleaner to read.

Regards,
Bo Lin

> On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo  wrote:
> 
> Hi Paul,
> 
> does this do what you want?
> 
> exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> "1986-01-01"), Transits = c(NA, NA, NA, NA))
> exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"), Transits = c(15,
> 20))
> 
> tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> 
> rbind(exdf2, tmpdf)
> 
> HTH,
> Ulrik
> 
> On Tue, 28 Mar 2017 at 10:50 Paul Bernal  wrote:
> 
> Dear friend Mark,
> 
> Great suggestion! Thank you for replying.
> 
> I have two dataframes, dataframe1 and dataframe2.
> 
> dataframe1 has two columns, one with the dates in YYYY-MM-DD format and the
> other column with number of transits (all of which were set to NA values).
> dataframe1 starts in 1985-10-01 (October 1st 1985) and ends in 2017-03-01
> (March 1 2017).
> 
> dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
> format, and the other column with number of transits. dataframe2 has the
> same start and end dates, however, dataframe2 has missing dates
> between the start and end dates, so it has fewer observations.
> 
> dataframe1 has a total of 378 observations and dataframe2 has a  total of
> 362 observations.
> 
> I would like to come up with a code that could do the following:
> 
> Get the dates of dataframe1 that are missing in dataframe2 and add them as
> records to dataframe 2 but with NA values.
> 
> 
> dataframe1                dataframe2
> Date        Transits      Date        Transits
> 1985-10-01  NA            1985-10-01  15
> 1985-11-01  NA            1986-01-01  20
> 1985-12-01  NA            1986-02-01  5
> 1986-01-01  NA
> 1986-02-01  NA
> 2017-03-01  NA
> 
> I would like to fill in the missing dates in dataframe2, with NA as value
> for the missing transits, so that I  could end up with a dataframe3 looking
> as follows:
> 
> Date        Transits
> 1985-10-01  15
> 1985-11-01  NA
> 1985-12-01  NA
> 1986-01-01  20
> 1986-02-01  5
> 2017-03-01  NA
> 
> This is what I want to accomplish.
> 
> Thanks, beforehand for your help,
> 
> Best regards,
> 
> Paul
> 
> 
> 2017-03-27 15:15 GMT-05:00 Mark Sharp :
> 
>> Make some small dataframes of just a few rows that illustrate the problem
>> structure. Make a third that has the result you want. You will get an
>> answer very quickly. Without a self-contained reproducible problem,
>> results vary.
>> 
>> Mark
>> R. Mark Sharp, Ph.D.
>> msh...@txbiomed.org
>> 
>> 
>> 
>> 
>> 
>>> On Mar 27, 2017, at 3:09 PM, Paul Bernal  wrote:
>>> 
>>> Dear friends,
>>> 
>>> I have one dataframe which contains 378 observations, and another one,
>>> containing 362 observations.
>>> 
>>> Both dataframes have two columns, one date column and another one with
>>> the number of transits.
>>> 
>>> I wanted to come up with code so that I could fill in the dates that
>>> are missing in one of the dataframes and replace the column of transits
>>> with the value NA.
>>> 
>>> I have tried several things but R obviously complains that the lengths of
>>> the dataframes are different.
>>> 
>>> How can I solve this?
>>> 
>>> Any guidance will be greatly appreciated,
>>> 
>>> Best regards,
>>> 
>>> Paul
>>> 

Re: [R] Looping Through DataFrames with Differing Lenghts

2017-03-28 Thread Ng Bo Lin
Hi Paul,

The date format that you have supplied to R does not match your data. The 
format argument of as.Date() describes how the dates are written in your 
file, not the format you want as output. Once parsed, Date objects print as 
YYYY-MM-DD by default.

Instead of “%Y-%m-%d”, your data appears to follow the “%e-%b-%y” format. 
Here %e is the day of the month (1 to 31), %b is the abbreviated month name 
(%B, the full month name, also matches abbreviations on input), and %y is the 
two-digit year.
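
To illustrate, here is a minimal sketch. The sample strings are made up, since the attached file is not shown in this thread, and the example assumes an English locale so that month abbreviations such as "Oct" are recognised.

```r
# Made-up sample values; the real contents of Container.csv are not shown here
raw_dates <- c("1-Oct-85", "15-Nov-85", "1-Mar-17")

# The format string describes the input: day, abbreviated month, 2-digit year
parsed <- as.Date(raw_dates, format = "%e-%b-%y")

# Date objects print as YYYY-MM-DD by default
print(parsed)

# format() turns them back into character strings in whatever layout you need
as_text <- format(parsed, "%Y-%m-%d")
print(as_text)
```

If as.Date() still returns NA for some rows, it is worth looking at those raw strings directly, as they probably follow a different layout.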

Hope this helps!

Thank you.

Regards,
Bo Lin
> On 28 Mar 2017, at 10:12 PM, Paul Bernal <paulberna...@gmail.com> wrote:
> 
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and 
> valuable replies,
> 
> I am trying to reformat a date as follows:
> 
> Data<-read.csv("Container.csv")
> 
> DataFrame<-data.frame(Data)
> 
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
> 
> #trying to put it in YYYY-MM-DD format
> 
> However, when I do this, I get a bunch of NAs for the dates.
> 
> I am providing a sample dataset as a reference.
> 
> Any help will be greatly appreciated,
> 
> Best regards,
> 
> Paul
> 

Re: [R] Fw: Averaging without NAs

2017-03-02 Thread Ng Bo Lin
Hi Elahe,

You can do so using the mean function, mean(), by specifying an additional 
argument, na.rm = TRUE. This tells mean() to remove (rm) all NA values from 
the column before averaging:

mean(df$X2016.Q1, na.rm = TRUE)

where df stands for the name of your data frame.

By default, na.rm is set to FALSE, so mean() returns NA whenever the column 
contains an NA value.
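
For example, using the values from your column (a small sketch; q1 is just a stand-in name for the column):

```r
# Values copied from the X2016.Q1 column in the question
q1 <- c(47, 53, 75, 97, NA, NA, 23, NA, 43, NA)

# Default behaviour: any NA in the input makes the result NA
mean_default <- mean(q1)

# With na.rm = TRUE the four NAs are dropped and the mean is taken
# over the six remaining values
mean_no_na <- mean(q1, na.rm = TRUE)

print(mean_default)
print(mean_no_na)
```

Note that the divisor is the number of non-missing values (6 here), not the full length of the column.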

Hope this helps!

Regards,
Bo Lin
> On 2 Mar 2017, at 6:57 PM, ch.elahe via R-help  wrote:
> 
> 
> 
> The question seems easy but I could not find an answer for it. I have the 
> following column in my data frame and I want to take average of the column 
> excluding the number of NAs. 
> 
> $X2016.Q1 : int 47 53 75 97 NA NA 23 NA 43 NA 
> 
> Does anyone know how to do that?
> Thanks for any help 
> Elahe
> 