Re: [R] how to ignore NA with NA or NULL
Hello,

I added your flags to my code, but there are still errors. I tried a few things. In function na.fill, I changed

  if(all(!is.na(y[1:8700,1]))) return(NA)

to

  if(all(!is.finite(y[1:8700,1]))) return(y)

in order to keep this file unchanged. That removed my dimension problem. I no longer get errors in

  refill <- process.all(lst, corhiver2008capt1)

only some warning messages, readable with warnings().

Then I noticed that in refill (the object which should be filled by my code), files containing only NAs are turned into NULL. So I have 0 rows for these objects instead of having them unchanged (35000 rows). When I convert refill to a data.frame, it doesn't work because of a new dimension problem caused by these NULL files. But I don't understand where in my code these files are turned into NULL.

Could you tell me how I can get my NA-only files in the output not as NULL, but unchanged, as they were at the beginning?

Thanks again.

--
View this message in context: http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632506.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
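The change described above returns y (the donor) rather than x (the data frame being filled). A minimal sketch of a guard that hands back x itself when no usable donor exists is given here. The name na.fill.keep and the argument n (standing in for the hard-coded 8700) are assumptions made for illustration and do not appear in the thread.

```r
# Hypothetical variant of na.fill; 'n' replaces the hard-coded row count.
na.fill.keep <- function(x, y, n = nrow(x)) {
  rows <- seq_len(n)
  # No usable donor: return x itself so its dimensions are preserved.
  if (is.null(y) || all(is.na(y[rows, 1]))) return(x)
  # Nothing observed in x: no regression can be fitted, keep x as it is.
  if (all(is.na(x[rows, 1]))) return(x)
  i  <- is.na(x[rows, 1])
  xx <- y[rows, 1]
  fit <- lm(x[rows, 1] ~ xx, na.action = na.exclude)
  x[rows, 1][i] <- predict(fit, data.frame(xx = xx))[i]
  x
}

# Made-up data: an all-NA frame and a complete donor.
all.na <- data.frame(x1 = rep(NA_real_, 10))
donor  <- data.frame(x1 = as.numeric(1:10))
str(na.fill.keep(all.na, donor))  # still a 10-row data frame
str(na.fill.keep(donor, NULL))    # unchanged as well
```

The design choice is that every early exit returns x, so each list element keeps its original shape whatever the donor looks like.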
Re: [R] how to ignore NA with NA or NULL
Please read the posting guide mentioned at the bottom of every message. You might also benefit from reading http://stackoverflow.com/questions/5963269/how-to-make-a-great-reproducible-example. We would certainly benefit from not having to guess what problems you are really encountering.

Also, it seems that you refer to in-memory data as files... this is imprecise and confusing. Learn to use the str() function to know what kinds of objects you are referring to... in this case I believe you are referring to data frames.

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
Re: [R] how to ignore NA with NA or NULL
OK Jeff, but then it'll be a big one. I'm working on a list of files, and my problem depends on several functions used earlier, so it's very hard for me to reduce it to something that reproduces my error. But here is a reproducible example, with the error at the last line of the code (just copy and paste it).

You'll notice that the data.frame with only NAs is set to NULL in refill, and I just want to have it unchanged in the output (so the same as the input). The aim of the function is to fill the NAs of my data.frames. It will not fill anything in this example, because there are only big NA gaps, which are my problem for the moment. But maybe now you can see where the problem is (change NULL for an NA-only data frame in the output to the same data frame as in the input). For the example, we are just testing x1.

I hope you understand my problem now :) Thanks Jeff, Rui or anyone else!

  # my data for example
  DF1 <- data.frame(x1=rnorm(1:20),x2=c(31:50))
  write.table(DF1,"ST001_2008.csv",sep=";")
  DF2 <- data.frame(x1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,rnorm(1:10)),x2=c(1:20))
  write.table(DF2,"ST002_2008.csv",sep=";")
  DF3 <- data.frame(x1=rnorm(81:100),x2=NA)
  write.table(DF3,"ST003_2008.csv",sep=";")
  DF4 <- data.frame(x1=c(21:40),x2=rnorm(1:20))
  write.table(DF4,"ST004_2008.csv",sep=";")

  # list my data
  filenames <- list.files(pattern="\\_2008.csv$")
  Sensors <- paste("x", 1:2,sep="")
  Stations <- substr(filenames,1,5)
  nsensors <- length(Sensors)
  nstations <- length(Stations)
  nobs <- nrow(read.table(filenames[1], header=TRUE))
  yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))
  for(i in seq_len(nstations)){
    tmp <- read.table(filenames[i], header=TRUE, sep=";")
    yr2008[ , , i] <- as.matrix(tmp[, Sensors])
  }
  dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)
  yr2008capt1hiver <- yr2008[1:10,1,]
  yr2008capt1hiver <- as.data.frame(yr2008capt1hiver)

  # correlation between my data for x1 (for the example)
  corhiver2008capt1 <- cor(yr2008capt1hiver,use="pairwise.complete.obs")
  capt1hiver <- c(1:length(yr2008capt1hiver))
  for(i in 1:length(capt1hiver)) {
    if(sum(!is.na(yr2008capt1hiver[,capt1hiver[i]])) < (length(yr2008capt1hiver[[capt1hiver[i]]])/2)) {
      corhiver2008capt1[i,]=NA
      corhiver2008capt1[,i]=NA
    }
  }

  lst <- lapply(list.files(pattern="\\_2008.csv$"), read.table,sep=";", header=TRUE, stringsAsFactors=FALSE)
  names(lst) <- Stations

  # searching the highest correlation for each data.frame
  get.max.cor <- function(station, mat){
    mat[row(mat) == col(mat)] <- -Inf
    m <- max(mat[station, ],na.rm=TRUE)
    if (is.finite(m)) {return(which( mat[station, ] == m ))} else {return(NA)}
  }

  # fill the data.frame with the data.frame which has the highest correlation coefficient
  na.fill <- function(x, y){
    if(all(!is.finite(y[1:10,1]))) return(y)
    i <- is.na(x[1:10,1])
    xx <- y[1:10,1]
    new <- data.frame(xx=xx)
    x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),new)[i]
    x
  }

  process.all <- function(df.list, mat){
    f <- function(station) na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
    g <- function(station){
      x <- df.list[[station]]
      if(any(!is.finite(x[1:10,1]))){
        mat[row(mat) == col(mat)] <- -Inf
        nas <- which(is.na(x[1:10,1]))
        ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
        for(y in ord){
          if(all(!is.na(df.list[[y]][1:10,1][nas]))){
            xx <- df.list[[y]][1:10,1]
            new <- data.frame(xx=xx)
            x[1:10,1][nas] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude), new)[nas]
            break
          }
        }
      }
      x
    }
    n <- length(df.list)
    nms <- names(df.list)
    max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
  }

  refill <- process.all(lst, corhiver2008capt1)
  refill <- as.data.frame(refill) ## HERE IS THE PROBLEM ##
  head(refill)
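One plausible route to the NULL entries, offered as a reading of the posted code and not confirmed anywhere in the thread: get.max.cor returns NA when a station has no finite correlation, and indexing a list with NA via [[ ]] yields NULL. That NULL then reaches na.fill as y, where the all(!is.finite(...)) test is TRUE for an empty vector, so return(y) hands back NULL. The toy objects below are made-up stand-ins for the thread's df.list and max.cor.

```r
# Toy stand-ins for the thread's objects (hypothetical values).
df.list <- list(ST001 = data.frame(x1 = 1:3),
                ST002 = data.frame(x1 = c(NA, NA, NA)))
max.cor <- c(NA_integer_, 1L)   # NA: no finite correlation was found

y <- df.list[[ max.cor[1] ]]    # an NA index into a list gives NULL
is.null(y)                      # TRUE

# Indexing NULL gives NULL, is.finite(NULL) is logical(0),
# and all() of an empty logical vector is TRUE.
is.null(y[1:3, 1])              # TRUE
all(!is.finite(y[1:3, 1]))      # TRUE, so return(y) would return NULL
```

If this reading is right, checking is.null(y) before any other test and returning x in that case would keep the list element intact.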
Re: [R] how to ignore NA with NA or NULL
Still not clear what solution you would consider a success. On the one hand, you said you needed the NULLs, but you want one big data frame also. Does

  refill <- refill[ -which( sapply( refill, is.null ), arr.ind=TRUE ) ]
  refill <- as.data.frame( refill )

do what you want? If you need to keep the NULLs, perhaps don't overwrite the refill list?

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
---
Sent from my phone. Please excuse my brevity.
Re: [R] how to ignore NA with NA or NULL
Thanks again for your help, Jeff. Sorry if I'm not very clear. It's hard to explain in programming terms, and even harder in English, as I'm French. But I'll try again.

Your suggestion removes the error, but it's not the result I'm expecting. You've removed the NULL data.frames, but I need to keep them, or rather to turn them into something non-NULL. I'll try to show you with a very small, fake example what I want the results to be.

Imagine these are my 3 input data frames (10 rows each):

  ST1 <- data.frame(x1=c(1:10))
  ST2 <- data.frame(x2=c(1:5,NA,NA,8:10))
  ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

The aim of my code is to fill all the NAs of my data.frames with data, according to the correlation coefficients between my data.frames (for example, if there are NAs in ST1, ST1 must be filled with data from the file best correlated with ST1, chosen between ST2 and ST3 in this example). As ST3 has no data, I cannot get any correlation coefficient for it. So the NAs in ST3 cannot be filled, and ST3 cannot be used to fill another file either. ST3 is of no use, if you like. Nevertheless I want to keep ST3 unchanged throughout my code.

At the moment my code would give this for refill (NAs filled in my data.frames):

  ST1 <- data.frame(x1=c(1:10))
  ST2 <- data.frame(x2=c(1:5,6,7,8:10))
  ST3 <- NULL

But what I actually want in refill is this:

  ST1 <- data.frame(x1=c(1:10))
  ST2 <- data.frame(x2=c(1:5,6,7,8:10))
  ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

So for data.frames with only NAs, I don't want them to be NULL in refill; I want them to be identical to the input. I need this so that the data.frames have the same dimensions in input and output. If they are set to NULL (as happens at the moment, though I don't understand why, and I want to change this), the data.frame has 0 rows instead of 10 rows like the others. So I think there's something wrong in my code, in function process.all or na.fill, or maybe in lst. We don't seem to be far from the solution, but I still haven't found it.

For information, in functions process.all and na.fill: x is the data.frame I want to fill, and y is the file which will be used to fill x (so the file best correlated with x).

I really hope I've been clear enough this time. Thank you!
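The desired output described above (NA-only data frames coming back identical to the input) can also be obtained after the fact, by copying the input back into any slot that came out as NULL. This is a minimal sketch with made-up stand-ins for lst and refill; it is a workaround and does not locate the origin of the NULLs.

```r
# Hypothetical stand-ins for 'lst' (input) and 'refill' (output of process.all).
lst <- list(ST1 = data.frame(x1 = 1:10),
            ST2 = data.frame(x1 = c(1:5, NA, NA, 8:10)),
            ST3 = data.frame(x1 = rep(NA_real_, 10)))
refill <- list(ST1 = lst$ST1,
               ST2 = data.frame(x1 = 1:10),
               ST3 = NULL)

# Find the NULL slots and restore the corresponding inputs.
was.null <- vapply(refill, is.null, logical(1))
refill[was.null] <- lst[was.null]

sapply(refill, nrow)               # 10 rows in every element
combined <- as.data.frame(refill)  # now succeeds: all elements have equal length
dim(combined)                      # 10 rows, 3 columns
```

Single-bracket assignment with a list on the right-hand side replaces the slots in place, so names and order of the list are preserved.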
Re: [R] how to ignore NA with NA or NULL
In reply to jeff6868 (06-06-2012 16:42):

Hello,

Why don't you test an all(is.na(x)) condition? If TRUE, return(NA), not NULL.

Rui Barradas
Re: [R] how to ignore NA with NA or NULL
Thanks again, but my errors are still here. Maybe they come from the next function (I combine these 2 functions, but I thought the problem came from the first one):

  process.all <- function(df.list, mat){
    f <- function(station) na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
    g <- function(station){
      x <- df.list[[station]]
      if(any(is.na(x[1:8700,1]))){
        mat[row(mat) == col(mat)] <- -Inf
        nas <- which(is.na(x[1:8700,1]))
        ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
        for(y in ord){
          if(all(!is.na(df.list[[y]][1:8700,1][nas]))){
            xx <- df.list[[y]][1:8700,1]
            new <- data.frame(xx=xx)
            x[1:8700,1][nas] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[nas]
            break
          }
        }
      }
      x
    }
    n <- length(df.list)
    nms <- names(df.list)
    max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
  }

  refill <- process.all(lst, corhiver2008capt1)
  refill <- as.data.frame(refill)

The error occurs when refill is created. It applies process.all, in which na.fill is also used. Do you see any error or missing code which could create this NA problem when I introduce NA-only files?
Re: [R] how to ignore NA with NA or NULL
In reply to jeff6868 (05-06-2012 10:54):

Hello,

I believe the error is in function 'g'. If I'm right, follow these steps:

1. Just before the first if, include
     flag <- TRUE
2. Just before for(y in ord), include
     flag <- FALSE
3. Just before break, include
     flag <- TRUE
4. Change the return value from simply
     x
   to
     if(flag) x else NA

The code loops through the ordered matrix until it finds no NAs in the respective df.list element. Nothing guarantees that there are such list elements. The changes above check it by setting a flag.

Rui Barradas
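The four steps can be sketched as a standalone function. This is an illustration under assumptions: the name g.flag is invented, and df.list, mat and n.rows are passed as arguments for self-containment, whereas the thread's g reads them from the enclosing process.all. The station data and correlation values are made up.

```r
# Sketch of 'g' with the four flag changes applied (hypothetical signature).
g.flag <- function(station, df.list, mat, n.rows = 10) {
  rows <- seq_len(n.rows)
  x <- df.list[[station]]
  flag <- TRUE                                   # step 1
  if (any(is.na(x[rows, 1]))) {
    mat[row(mat) == col(mat)] <- -Inf
    nas <- which(is.na(x[rows, 1]))
    ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
    flag <- FALSE                                # step 2
    for (y in ord) {
      if (all(!is.na(df.list[[y]][rows, 1][nas]))) {
        xx <- df.list[[y]][rows, 1]
        new <- data.frame(xx = xx)
        x[rows, 1][nas] <- predict(lm(x[rows, 1] ~ xx, na.action = na.exclude), new)[nas]
        flag <- TRUE                             # step 3
        break
      }
    }
  }
  if (flag) x else NA                            # step 4
}

# Made-up stations and correlation matrix.
stations <- list(ST1 = data.frame(x1 = c(2, 4, NA, 8, 10)),
                 ST2 = data.frame(x1 = as.numeric(1:5)),
                 ST3 = data.frame(x1 = c(5, 4, NA, 2, 1)),
                 ST4 = data.frame(x1 = c(1, 3, 2, 5, 4)))
mat <- matrix(c(1, .9, .5, .7,
                .9, 1, .4, .6,
                .5, .4, 1, .3,
                .7, .6, .3, 1), 4, 4)
g.flag(1, stations, mat, n.rows = 5)   # gap in ST1 filled from ST4
```

Note that the NA returned in step 4 is a length-one vector, not a data frame, so code that later indexes the result by row and column needs to handle that case.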
[R] how to ignore NA with NA or NULL
Hello dear R-users,

I have a problem in my code with ignoring NA values without removing them. I'm working on a list of files. The aim is to fill one file from another according to the highest correlation (correlation coefficients between all my files, so the donor is the file most similar to the one I want to fill).

When I have just small gaps of NA, my function works well. The problem is when some files contain only NAs. In that case no correlation coefficient can be calculated (my previous function returns NA for the correlation coefficient when a file has only NAs), and so the file cannot be filled or used in any calculation. Nevertheless, in my work I need to keep these NA files in my list (and so keep their dimensions). Otherwise it creates dimension problems, and my function needs to be automatic for every file.

So my question in this post is: how can I ignore (or do nothing with, if you prefer) NA files with NA correlation coefficients?

The function for filling files (where the problem is) is:

  na.fill <- function(x, y){
    i <- is.na(x[1:8700,1])
    xx <- y[1:8700,1]
    new <- data.frame(xx=xx)
    x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[i]
    x
  }

My error message is:

  Error in model.frame.default(formula = x[1:8700, 1] ~ xx, na.action = na.exclude, :
    invalid type (NULL) for variable 'xx'

I tried to add in the function:

  ifelse( all(is.null(xx))==TRUE,return(NA),xx)

or

  ifelse( all(is.null(xx))==TRUE,return(NULL),xx)

but it still doesn't work. How can I write that in my function? With NA, NULL or in another way?

Thank you very much for your answers.
Re: [R] how to ignore NA with NA or NULL
I find that avoiding using the return() function at all makes my code easier to follow. In your case it is simply incorrect, though, since ifelse is a vector function and return is a control flow function. Your code is not reproducible and your description isn't clear about how you are handling the return result from this function, so I can't be sure what you are really asking, but I suspect you just want flow control, so use (untested):

  na.fill <- function(x, y){
    i <- is.na(x[1:8700,1])
    xx <- y[1:8700,1]
    new <- data.frame(xx=xx)
    if ( !all(is.na(xx)) ) {
      x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude),new)[i]
    }
    x
  }

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
---
Sent from my phone. Please excuse my brevity.
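The distinction between ifelse as a vector function and if as flow control can be seen in a few lines. The helper first.or.na is a made-up name used only for this illustration.

```r
xx <- c(NA, NA, NA)

# ifelse() is vectorised: it returns one value per element of the test.
ifelse(is.na(xx), 0, xx)        # 0 0 0

# if () takes a single TRUE/FALSE and controls flow, so it can guard a return().
first.or.na <- function(v) {
  if (all(is.na(v))) return(NA)
  v[!is.na(v)][1]
}
first.or.na(xx)                 # NA
first.or.na(c(NA, 7, 9))        # 7
```

Because ifelse evaluates its yes and no arguments as ordinary values, placing return() inside one of them does not express a conditional exit cleanly; a plain if states the intent directly.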
Re: [R] how to ignore NA with NA or NULL
In reply to jeff6868 (04-06-2012 09:56):

Hello,

'ifelse' is vectorized, what you want is the plain 'if'.

  if(all(is.na(xx))) return(NA)

Hope this helps,

Rui Barradas
Re: [R] how to ignore NA with NA or NULL
Thanks for answering, Jeff. Yes, sorry, it's not easy to explain my problem. I'll try to give you a reproducible example (even if it's not exactly like my files), and I'll try to explain my function and what I want to do more precisely.

Imagine, for the example, that df1, df2 and df3 are my files:

  df1 <- data.frame(x1=c(rnorm(1:5),NA,NA,rnorm(8:10)))
  df2 <- data.frame(x2=rnorm(1:10))
  df3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
  df <- list(df1,df2,df3)

I want to fill all the NA gaps in my files. If I have only df1 and df2 in my list, it works. If I introduce df3 (a file with only NAs), R doesn't know what to do. In my function:

  na.fill <- function(x, y){
    i <- is.na(x[1:10,1])
    xx <- y[1:10,1]
    new <- data.frame(xx=xx)
    x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude), new)[i]
    x
  }

x is the file I want to fill, so i marks all the NA gaps of that file. xx is the file that will be used to fill x (actually the file best correlated with x among all my files). I then fit a linear regression between my 2 files, x and xx, and take the predicted values from xx to put in the gaps of x.

Before I had files containing only NAs, it worked well. But since I introduced some files with no data, and so only NAs, I have my problem. I got different NA errors when I tried a few solutions:

  Error in model.frame.default(formula = x[1:8700,1] ~ xx, na.action = na.exclude, :
    invalid type (NULL) for variable 'xx'

or

  0 (non-NA) cases

or

  is.na() applied to non-(list or vector) of type 'NULL'

I'm looking for a solution in na.fill that avoids these problems by leaving the NA-only files out of the calculation (maybe something like na.pass), while keeping them in the list. So the aim would be to keep them unchanged (if, for example, I have a file ST1 with 30 NAs and nothing else as input, I want ST1 with 30 NAs as output), and the calculation should still work with these kinds of files in my list, even if the code does nothing with them.

I hope you understand. Thanks again for your help.
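The regression fill described in this message can be reduced to plain vectors to see the mechanism. The values below are made up and chosen so that x is exactly 2 * xx, which makes the filled value easy to check by hand.

```r
x  <- c(2, 4, NA, 8, 10)     # series with a gap (made-up values)
xx <- c(1, 2, 3, 4, 5)       # complete donor series

i   <- is.na(x)                              # positions to fill
fit <- lm(x ~ xx, na.action = na.exclude)    # fitted on the complete pairs only
x[i] <- predict(fit, data.frame(xx = xx))[i] # predictions taken at the gaps

x   # the gap now holds the fitted value, 6 here since x = 2 * xx
```

When xx is entirely NA, or is NULL, there are no complete pairs to fit on and lm stops with an error, which is why a guard is needed before the call.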
Re: [R] how to ignore NA with NA or NULL
Hello Rui,

Sorry, I read your post after I had answered Jeff. It does seem to be better than ifelse, thanks. But I still have some errors:

  Error in x[1:8700, 1] : incorrect number of dimensions

and

  In is.na(xx) : is.na() applied to non-(list or vector) of type 'NULL'

It seems that the length of my data has been modified because of these NAs.
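Both messages can be reproduced in isolation. The sketch below shows one way they can arise, on the assumption (not confirmed in the thread) that a list element has been replaced by a bare NA, or that NULL reaches is.na; the variable res is made up for the illustration.

```r
res <- NA   # e.g. what a function hands back after return(NA)

# A length-one vector has no rows and columns to index.
e <- tryCatch(res[1:10, 1], error = function(err) err)
inherits(e, "error")                    # TRUE
conditionMessage(e)

# is.na(NULL) warns and returns a zero-length logical vector.
w <- tryCatch(is.na(NULL), warning = function(wrn) wrn)
inherits(w, "warning")                  # TRUE
length(suppressWarnings(is.na(NULL)))   # 0
```

Running str() on each list element before indexing it shows quickly which elements are no longer data frames.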
Re: [R] how to ignore NA with NA or NULL
In reply to Rui Barradas (04-06-2012 16:05) and jeff6868 (04-06-2012 09:56):

Hello again,

The complete function would be

  na.fill <- function(x, y){
    # do this immediately, may save copying
    if(all(is.na(y[1:8700,1]))) return(NA)
    i <- is.na(x[1:8700,1])
    xx <- y[1:8700,1]
    new <- data.frame(xx=xx)
    x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[i]
    x
  }

Rui Barradas