Hi Philippos,

Try this:

dat1<- read.csv("Validation_data_set3.csv",sep=",",stringsAsFactors=FALSE) #converted to csv
str(dat1)
#'data.frame':    12573 obs. of  17 variables:
# $ Removed.AGC                              : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST                              : chr  "" "46.1658" "41.2566" "14.0931" ...
# $ Removed.Kurtosis                         : num  NA NA NA NA 5.38 ...
# $ Removed.Skewness                         : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC17999                          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC16200                          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC                          : chr  "" "46.1658" "41.2566" "14.0931" ...
# $ Removed.Kurtosis.Skewness                : num  NA NA NA NA 5.38 ...
# $ Removed.AGC.QC16200                      : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999                      : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.3.stdevs             : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.less.than.1          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC17999                  : chr  "" "46.1658" "41.2566" "14.0931" ...
# $ Removed.SST.AGC.QC16200                  : chr  "" "46.1658" "41.2566" "14.0931" ...
# $ Removed.SST.AGC.Kurtosis.Skewness        : chr  "" "" "" "" ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC17999: chr  "" "" "" "" ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC16200: chr  "" "" "" "" ...
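As a possible alternative to cleaning the columns after reading, the conversion can often be done at read time with the `na.strings` argument of read.csv. The sketch below is only an illustration: the inline CSV text and the column names a, b, c are made up, and it assumes the only non-numeric entries in the real file are blank cells and Excel's "#DIV/0!" error string. With the real file, the file name would replace the `text=` argument.

```r
# Made-up sample standing in for the real file (hypothetical columns a, b, c).
csv_text <- "a,b,c\n65.6738,,#DIV/0!\n46.1658,46.1658,\n41.2566,41.2566,5.38\n"

# Treat blank cells and Excel's "#DIV/0!" as missing while reading.
# read.csv uses comment.char = "" by default, so "#" is not read as a comment.
dat <- read.csv(text = csv_text,
                na.strings = c("", "NA", "#DIV/0!"),
                stringsAsFactors = FALSE)

# Every column should now be numeric, with NA where the cell was blank
# or held "#DIV/0!".
str(dat)
```

If the file holds other error strings (for example "#N/A" or "#VALUE!"), they would need to be added to `na.strings` as well, otherwise the affected columns stay character.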
#Found these characters in columns that are not numeric
do.call(rbind,lapply(dat1,function(x) {x1<- x[is.character(x)];x1[grepl("\\#",x1)]}))
#                                          [,1]      [,2]      [,3]
#Removed.SST                               "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC                           "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC.QC17999                   "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC.QC16200                   "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness         "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness.QC17999 "#DIV/0!" "#DIV/0!" "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness.QC16200 "#DIV/0!" "#DIV/0!" "#DIV/0!"
#                                          [,4]
#Removed.SST                               "#DIV/0!"
#Removed.SST.AGC                           "#DIV/0!"
#Removed.SST.AGC.QC17999                   "#DIV/0!"
#Removed.SST.AGC.QC16200                   "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness         "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness.QC17999 "#DIV/0!"
#Removed.SST.AGC.Kurtosis.Skewness.QC16200 "#DIV/0!"

dat2<-as.data.frame(sapply(dat1,function(x) { x[is.character(x)][grep("\\#",x[is.character(x)])]<- NA; x1<- as.numeric(x)}))
str(dat2)
#'data.frame':    12573 obs. of  17 variables:
# $ Removed.AGC                              : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST                              : num  NA 46.17 41.26 14.09 5.38 ...
# $ Removed.Kurtosis                         : num  NA NA NA NA 5.38 ...
# $ Removed.Skewness                         : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC17999                          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC16200                          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC                          : num  NA 46.17 41.26 14.09 5.38 ...
# $ Removed.Kurtosis.Skewness                : num  NA NA NA NA 5.38 ...
# $ Removed.AGC.QC16200                      : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999                      : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.3.stdevs             : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.less.than.1          : num  65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC17999                  : num  NA 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC16200                  : num  NA 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness        : num  NA NA NA NA 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC17999: num  NA NA NA NA 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC16200: num  NA NA NA NA 5.38 ...

head(dat2,3)
#  Removed.AGC Removed.SST Removed.Kurtosis Removed.Skewness Removed.QC17999
#1     65.6738          NA               NA          65.6738         65.6738
#2     46.1658     46.1658               NA          46.1658         46.1658
#3     41.2566     41.2566               NA          41.2566         41.2566
#  Removed.QC16200 Removed.SST.AGC Removed.Kurtosis.Skewness Removed.AGC.QC16200
#1         65.6738              NA                        NA             65.6738
#2         46.1658         46.1658                        NA             46.1658
#3         41.2566         41.2566                        NA             41.2566
#  Removed.AGC.QC17999 Removed.AGC.QC17999.3.stdevs
#1             65.6738                      65.6738
#2             46.1658                      46.1658
#3             41.2566                      41.2566
#  Removed.AGC.QC17999.less.than.1 Removed.SST.AGC.QC17999
#1                         65.6738                      NA
#2                         46.1658                 46.1658
#3                         41.2566                 41.2566
#  Removed.SST.AGC.QC16200 Removed.SST.AGC.Kurtosis.Skewness
#1                      NA                                NA
#2                 46.1658                                NA
#3                 41.2566                                NA
#  Removed.SST.AGC.Kurtosis.Skewness.QC17999
#1                                        NA
#2                                        NA
#3                                        NA
#  Removed.SST.AGC.Kurtosis.Skewness.QC16200
#1                                        NA
#2                                        NA
#3                                        NA

I work as a postdoc at Wayne State University, Detroit.

Regards,
A.K.

________________________________
From: Philippos Tsourkas <ptsour...@hotmail.com>
To: "smartpink...@yahoo.com" <smartpink...@yahoo.com>
Sent: Tuesday, April 16, 2013 6:07 PM
Subject: R question

Hello Arun, and thank you for your offer to help.

I am sending you the xlsx file I am trying to use. I save it as a csv, read it in R using read.csv, and then extract the columns. Some columns are numeric and contain NA instead of blank spaces (e.g. column 1), while other columns (e.g. column 2) contain blank spaces instead of NA and are not numeric. I can't figure out what's causing this or how to deal with it. Basically, all columns should be numeric with NAs instead of blank spaces.

What do you do by the way?

Thanks again,
Philippos

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.