Rich, I have to wonder about how your data was placed in the CSV file based on what you report.
Functions like read.table() (which is called by read.csv()) ultimately make guesses about how many columns to expect and what the contents are likely to be. The number of columns is guessed from the first few lines of the file, and each column is then given the most specific type that is compatible with every entry in it.

The fact that it shows this:

  'data.frame': 565675 obs. of 6 variables:
   $ year : chr "2016" "2016" "2016" "2016" ...
   $ month: int 3 3 3 3 3 3 3 3 3 3 ...
   $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
   $ hour : chr "12" "12" "12" "12" ...
   $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
   $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...

is odd. It suggests that somewhere in the year column at least one entry is not a plain integer like 2016 but something else, such as a value with stray characters or an unquoted word like missing. A single such entry is enough to make the whole column character. Something similar seems to have happened with hour and fps but not the rest.

Nonetheless, you did convert back to what you wanted, BUT if a single anomalous entry remains, then as.integer("missing") returns NA (with a warning) and so does as.double("missing"). So it is wise to check for any unexpected NA values after converting. If the source cannot be changed, then the R program can filter out such cases from your data.frame in various ways.

Your way of reading the CSV in was this:

  vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)

The header = TRUE and sep = ',' options you added are already the defaults for read.csv(), so they are harmless. Since R 4.0.0 the default is also not to read strings in as factors. But what you did not include may be something you can look at, given that your data may be a bit off. Without the underlying file, we cannot easily diagnose what may be wrong in it. Do you get any error messages when reading in the file?

You can specify additional arguments to read.csv() about what, if any, quoting characters are used (quote), what sequences should be recognized as an NA (na.strings), what type each column should be assumed to be (colClasses), what to do with blank lines (blank.lines.skip), what a comment looks like (comment.char) and so on.
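As a rough sketch of how such a check might look, here is a small made-up sample standing in for vel.dat. The stray entries "missing" and "n/a" and the helper name bad_rows() are invented for illustration; I have not seen what is actually in your file.

```r
# Made-up sample in place of vel.dat; the bad entries are assumptions
csv_text <- "year,month,day,hour,min,fps
2016,3,3,12,0,1.74
2016,3,3,12,10,1.75
2016,3,3,missing,20,1.76
2016,3,3,12,30,n/a
"

# Read it the same way as the real file; hour and fps come back as chr
vel <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(vel)

# Hypothetical helper: positions of non-NA entries that fail numeric conversion
bad_rows <- function(x) {
  which(!is.na(x) & is.na(suppressWarnings(as.numeric(x))))
}

bad_hour <- bad_rows(vel$hour)
bad_fps  <- bad_rows(vel$fps)

# Look at the offending rows before deciding what to do with them
vel[union(bad_hour, bad_fps), ]

# Re-read, declaring which strings mean NA and what type each column should be
vel2 <- read.csv(text = csv_text,
                 na.strings = c("NA", "missing", "n/a"),
                 colClasses = c(rep("integer", 5), "numeric"))
str(vel2)
```

On the real file, the same helper applied to vel$year, vel$hour and vel$fps should point at the rows that forced those columns to character. If colClasses is given and an entry is neither a valid number nor listed in na.strings, the read stops with an error, which is also a way of finding the problem.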
One thing I sometimes have had to do is open the original CSV file in Excel and examine it in various ways, or even change it and save it again. That is beyond the scope of this mailing list, so if needed, ask me in private. You have been working on this kind of stuff, but I assume often using other tools outside R and dplyr.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:49 AM
To: R mailing list <r-help@r-project.org>
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are
> unnecessary (unless you are preparing input for C code; also, all R
> non-integers are double precision) and may be the source of your problems.

Bert,

When I remove coercions the script produces warnings like this:

1: In mean.default(fps, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

and str(vel) displays this:

'data.frame': 565675 obs. of 6 variables:
 $ year : chr "2016" "2016" "2016" "2016" ...
 $ month: int 3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr "12" "12" "12" "12" ...
 $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps are seen as characters. I don't understand why.

Regards,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.