Re: [R] Need fresh eyes to see what I'm missing
Rich,

You have helped us understand, and at this point suppose we are now sure how missing information is supplied. What you show is not the same as the CSV sample you posted earlier, but I will assume you know that "Eqp" is the one and only way they signaled bad data.

One choice is to fix the original data before reading it into R. Placing exactly NA in those places, perhaps with a global substitution of some sort, would probably do it.

But as Bert noted, R is a very powerful environment and you can use it. One argument you can give read.csv() is na.strings, which tells it that "Eqp" is to be treated as an NA. The substitution is then made as the file is read in, AND you should then see that the column is properly read in as a column of doubles.

Alternatively, suppose you read in this data and make sure the column involved is read as character strings instead. You can use any number of tools in base R or dplyr to replace "Eqp" with NA, such as in a pipeline:

    ... %>% mutate(fps=ifelse(fps=="Eqp", NA, fps)) %>% ...

The above is one of many ways and, of course, afterward you may want to convert the character column back to floating point. I note dplyr can do both in the same call, as it applies them in order:

    mutate(fps=ifelse(fps=="Eqp", NA, fps), fps=as.double(fps))

The point is that in many cases the data must be carefully examined, cleaned and set up. In some cases it may also be useful to treat some columns as factors, such as the hours and minutes. If you continue on your road and use ggplot() to make graphs, factors may be useful in various kinds of fine tuning.

-----Original Message-----
From: R-help On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 1:59 PM
To: r-help@r-project.org
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> **Don't do this.** You will make errors. Use fit-for-purpose tools.
> That's what R is for. Also, be careful **how** you "download", as that
> already may bake in problems.

Bert,

Haven't had downloading errors saving displayed files.
The problem with the velocities data is shown here:

2020-11-24 11:00PST Eqp
2020-11-24 11:05PST Eqp
2020-11-24 11:10PST Eqp
2020-11-24 11:15PST Eqp
2020-11-24 11:20PST Eqp
2020-11-24 11:25PST Eqp
2020-11-24 11:30PST Eqp
2020-11-24 11:35PST Eqp
2020-11-24 11:40PST Eqp
2020-11-24 11:45PST Eqp
2020-11-24 11:50PST Eqp
2021-01-08 16:26PST Eqp

Equipment failure during the period shown. What's the best way to replace these lines? Just remove them or change them to NA?

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
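A minimal sketch of the two options described in this message, in base R. It is not code from the thread: the sample rows and the names csv_text, vel_a and vel_b are made up for illustration, and it assumes "Eqp" is the only marker for bad data.

```r
# Made-up sample in the layout of the original post, with one "Eqp" marker
csv_text <- "year,month,day,hour,min,fps
2020,11,24,10,55,1.74
2020,11,24,11,00,Eqp
2020,11,24,11,05,1.81"

# Option 1: treat "Eqp" as NA while reading, so fps arrives as numeric
vel_a <- read.csv(text = csv_text, na.strings = c("NA", "Eqp"))
str(vel_a$fps)

# Option 2: read fps as character, then replace the marker and convert
vel_b <- read.csv(text = csv_text, colClasses = c(fps = "character"))
vel_b$fps[vel_b$fps == "Eqp"] <- NA
vel_b$fps <- as.numeric(vel_b$fps)

# Both routes should give the same column
all.equal(vel_a$fps, vel_b$fps)
```

Option 1 keeps the cleaning step inside the read call, which may be easier to keep consistent across several data files.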
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Bert Gunter wrote:

> **Don't do this.** You will make errors. Use fit-for-purpose tools.
> That's what R is for. Also, be careful **how** you "download", as that
> already may bake in problems.

Bert,

Haven't had downloading errors saving displayed files. The problem with the velocities data is shown here:

2020-11-24 11:00 PST Eqp
2020-11-24 11:05 PST Eqp
2020-11-24 11:10 PST Eqp
2020-11-24 11:15 PST Eqp
2020-11-24 11:20 PST Eqp
2020-11-24 11:25 PST Eqp
2020-11-24 11:30 PST Eqp
2020-11-24 11:35 PST Eqp
2020-11-24 11:40 PST Eqp
2020-11-24 11:45 PST Eqp
2020-11-24 11:50 PST Eqp
2021-01-08 16:26 PST Eqp

Equipment failure during the period shown. What's the best way to replace these lines? Just remove them or change them to NA?

Regards,

Rich
Re: [R] Need fresh eyes to see what I'm missing
Inline.

On Tue, Sep 14, 2021 at 10:42 AM Rich Shepard wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > My suggestion was not 'to make a difference'. It was to determine whether
> > the NAs or NaNs appear before the dplyr commands. You confirmed that they
> > do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
> > appear.
>
> Eric,
>
> Yes, you're all correct. I've just downloaded the raw data again for mean
> velocities and suspended sediments. I'll go through them line-by-line and
> look for discrepancies.

**Don't do this.** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.

-- Bert

>
> Thanks again,
>
> Rich
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Bert Gunter wrote:

> Input problems of this sort are often caused by stray or extra characters
> (commas, dashes, etc.) in the input files, which then can trigger
> automatic conversion to character. Excel files are somewhat notorious for
> this.

Bert,

Large volume of missing data at the end of last year. See attached plot. I'll go through the raw data file to see how those missing data are presented.

Regards,

Rich
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Eric Berger wrote:

> My suggestion was not 'to make a difference'. It was to determine whether
> the NAs or NaNs appear before the dplyr commands. You confirmed that they
> do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
> appear.

Eric,

Yes, you're all correct. I've just downloaded the raw data again for mean velocities and suspended sediments. I'll go through them line-by-line and look for discrepancies.

Thanks again,

Rich
Re: [R] Need fresh eyes to see what I'm missing
Hi Rich,

My suggestion was not 'to make a difference'. It was to determine whether the NAs or NaNs appear before the dplyr commands. You confirmed that they do. There are 2321 NAs in vel. Bert suggested some ways that an NA might appear.

Best,
Eric

On Tue, Sep 14, 2021 at 6:42 PM Rich Shepard wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > Before you create vel_by_month you can check vel for NAs and NaNs by
> >
> > sum(is.na(vel))
> > sum(unlist(lapply(vel,is.nan)))
>
> Eric,
>
> There should not be any missing values in the data file. Regardless, I added
> those lines to the script and it made no difference.
>
> Running those commands on the R command line showed these results:
>
> sum(is.na(vel))
> [1] 2321
>
> sum(unlist(lapply(vel,is.nan)))
> [1] 0
>
> Yet the monthly summaries retain the initial line:
>
> vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>     year month  flow
>  1     0    NA   NaN
>
> I've another data set with the same issue (that's 2 out of 5) and I assume
> the source of the problem is the same with both.
>
> The data sets have no NAs or missing values at the end of a line.
>
> Thanks for the ideas,
>
> Rich
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Bert Gunter wrote:

> Input problems of this sort are often caused by stray or extra characters
> (commas, dashes, etc.) in the input files, which then can trigger
> automatic conversion to character. Excel files are somewhat notorious for
> this.

Bert,

Yes, I'm going to closely review the original data file and work forward from there.

Thanks for your comments; I do appreciate them. Back when I have more information ... perhaps even a fix.

Best regards,

Rich
Re: [R] Need fresh eyes to see what I'm missing
Input problems of this sort are often caused by stray or extra characters (commas, dashes, etc.) in the input files, which then can trigger automatic conversion to character. Excel files are somewhat notorious for this.

A couple of comments, and then I'll quit, as others should have greater insight (and may correct any of my errors).

1.
> as.numeric("1,")
[1] NA
Warning message:
NAs introduced by coercion

So if a stray character caused your "numeric" input to be read in as character, and you then converted it with as.numeric() (do not use as.integer or as.double), you get that warning.

2. So I would say that you need to check those columns in your data frame that were read in as character instead of numeric. I'd also check the others with unique() or some such, just to make sure they have the handful of right values. One way of doing this would be to look for NA's in as.numeric, as above. But I thought you said you did this already and found none, so I don't get it. Other approaches would be to examine your .csv file with ?count.fields or try reading it with ?read.delim. Any discrepancies or errors you get from these may help you to pinpoint problems like stray characters, too many fields in a line, etc.

3. As for your "fps as factors" question, note that:
> as.numeric(factor("3"))
[1] 1

So it depends on how you read stuff in. The answer should be "no" with read.csv(..., stringsAsFactors = FALSE), but I'm not sure what all you did or what kind of junk in your .csv file may be causing R to misread the numeric data as character.

As I said, others may be wiser and correct any errors in my "advice." This is as far as I can go -- and it may already be too far.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 14, 2021 at 9:01 AM Rich Shepard wrote:
>
> On Tue, 14 Sep 2021, Bert Gunter wrote:
>
> > Remove all your as.integer() and as.double() coercions. They are
> > unnecessary (unless you are preparing input for C code; also, all R
> > non-integers are double precision) and may be the source of your problems.
>
> Bert,
>
> When I remove coercions the script produces warnings like this:
> 1: In mean.default(fps, na.rm = TRUE) :
>    argument is not numeric or logical: returning NA
>
> and str(vel) displays this:
> 'data.frame': 565675 obs. of 6 variables:
>  $ year : chr "2016" "2016" "2016" "2016" ...
>  $ month: int 3 3 3 3 3 3 3 3 3 3 ...
>  $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
>  $ hour : chr "12" "12" "12" "12" ...
>  $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
>  $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...
>
> so month, day, and min are recognized as integers but year, hour, and fps
> are seen as characters. I don't understand why.
>
> Regards,
>
> Rich
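The checks suggested in points 1 and 2 of this message could look roughly like the following. This is only a sketch: the vector fps, the sample lines and the helper names (converted, bad, n_fields) are invented for illustration and are not taken from the poster's data.

```r
# Hypothetical character column as it might look after a bad import
fps <- c("1.74", "1.75", "1,", "Eqp", "1.81", NA)

# Entries that are present but do not convert to a number
converted <- suppressWarnings(as.numeric(fps))
bad <- is.na(converted) & !is.na(fps)

which(bad)          # positions of the offending entries
unique(fps[bad])    # the distinct offending strings

# count.fields() gives the number of fields on each line, so a line with
# a stray comma stands out. The lines below are made up.
lines <- c("year,month,fps", "2016,3,1.74", "2016,3,1,75")
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",")
close(con)
which(n_fields != n_fields[1])   # lines whose field count differs from the header
```

On a real file, count.fields() would be given the file path instead of a text connection.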
Re: [R] Need fresh eyes to see what I'm missing
Rich,

I have to wonder about how your data was placed in the CSV file, based on what you report. Functions like read.table() (which is called by read.csv()) ultimately make guesses about how many columns to expect and what the contents are likely to be. The number of columns is guessed from the first few lines, and each column gets the most compatible type for its contents.

The fact that it shows this:

'data.frame': 565675 obs. of 6 variables:
 $ year : chr "2016" "2016" "2016" "2016" ...
 $ month: int 3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr "12" "12" "12" "12" ...
 $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...

is odd. It suggests that somewhere in the data a year entry was not an integer like 2016 but something else, such as a word like `missing`. Something similar seems to have happened with hour and fps but not the rest.

Nonetheless, you did convert back to what you wanted, BUT if a single anomalous entry remains, then as.integer("missing") would return an NA and as.double("missing") also an NA. So it is wise to check for any unexpected values. If the source cannot be changed, then the R program can filter out such cases from your data.frame in various ways.

Your way of reading the CSV in was this:

    vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)

The options you added, header=TRUE and sep=",", are the defaults, so they are harmless. The default now is also not to read in strings as factors. But what you did not include may be something you can look at, given your data may be a bit off. Without the underlying file, we cannot trivially diagnose what may be wrong in it. Do you get any error messages when reading in the file?

You can specify additional arguments to read.csv() about what, if any, quoting characters are used, what sequences should be recognized as an NA, suggestions of what type each column should be assumed to be, what to do with blank lines, what a comment looks like and so on.

One thing I sometimes have had to do is open the original CSV file in EXCEL and examine it in various ways, or even change it and save it again. That is beyond the scope of this mailing list, so if needed, ask me in private. You have been working on this kind of stuff, but I assume often using other tools outside R and dplyr.

-----Original Message-----
From: R-help On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:49 AM
To: R mailing list
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are
> unnecessary (unless you are preparing input for C code; also, all R
> non-integers are double precision) and may be the source of your problems.

Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
   argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame': 565675 obs. of 6 variables:
 $ year : chr "2016" "2016" "2016" "2016" ...
 $ month: int 3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr "12" "12" "12" "12" ...
 $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps are seen as characters. I don't understand why.

Regards,

Rich
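To illustrate the additional read.csv() arguments mentioned in this message, here is a hedged sketch using colClasses and na.strings. The sample text is made up, and treating "Eqp" as the NA marker is an assumption taken from later in the thread. Declaring column types up front is one way to make a stray string fail loudly instead of silently turning a column into character.

```r
# Made-up sample with one non-numeric entry in fps
csv_text <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,Eqp"

col_types <- c(rep("integer", 5), "numeric")

# With declared types, the stray string stops the read with an error
result <- tryCatch(
  read.csv(text = csv_text, colClasses = col_types),
  error = function(e) conditionMessage(e)
)
print(result)

# Declaring the marker as NA lets the same call succeed
vel <- read.csv(text = csv_text, colClasses = col_types,
                na.strings = c("NA", "Eqp"))
str(vel)
```

The error message names the offending value, which may help locate it in a large file.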
Re: [R] Need fresh eyes to see what I'm missing
Rich,

I reproduced your problem after re-arranging the code the mailer mangled. I tried variations like not using pipes or changing what it is grouped by, and they all show your results on the abbreviated data, with the message:

`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

I think I fixed summarise(), but it makes me wonder if there is an inconsistency introduced along the way, as what you used is supposed to work and has worked for me in the past. I note the man page for summarise() mentions that the .groups="..." argument is experimental and a tad confusing.

I changed your code to this, telling it to keep the grouping in the output the same:

    vel_by_month = vel %>%
      group_by(year, month) %>%
      summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The change from your code is the addition at the very end of the .groups="keep" argument. Since I used your limited data, this is all I get:

> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
1  2016     3  1.77

For now, all I did was shut summarise() up. Not having the rest of your data, the question is where your NA and NaN are introduced. If the change I made above does not resolve it, then, as others suggested, you begin by looking at your data more carefully, perhaps starting with the .CSV file and then the data structures in R, along the lines of what you were shown. I find the table() function useful for categorical data with limited choices, as it would show the anomaly as happening once.

I see your point about needing fresh eyes. My eyes do not see what you did wrong, but I am just following clues you may be ignoring.

-----Original Message-----
From: R-help On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:21 AM
To: r-help@r-project.org
Subject: [R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
  group_by(year, month) %>%
  summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
> head(vel)
  year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month  flow
 1     0    NA   NaN
 2  2016     3  2.40
 3  2016     4  3.00
 4  2016     5  2.86
 5  2016     6  2.51
 6  2016     7  2.18
 7  2016     8  1.89
 8  2016     9  1.38
 9  2016    10  1.73
10  2016    11  2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich
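One plausible reading of the extra first row, offered as an assumption rather than a diagnosis of the poster's file: rows whose fields failed to convert form a group of their own, and a group whose fps values are all NA has nothing left to average once na.rm = TRUE drops them, so its mean is NaN. A small base R sketch with made-up rows:

```r
# Made-up rows: the last one stands for a line whose fields failed to convert
vel <- data.frame(year  = c(2016L, 2016L, NA),
                  month = c(3L, 3L, NA),
                  fps   = c(1.74, 1.75, NA))

# Mean of a group holding only NA: na.rm = TRUE leaves nothing to average
mean(vel$fps[is.na(vel$year)], na.rm = TRUE)

# Rows with an NA in a grouping column, worth looking up in the raw file
vel[!complete.cases(vel[, c("year", "month")]), ]
```

Inspecting or dropping such rows before group_by() is one way to see whether the odd summary row goes away.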
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are
> unnecessary (unless you are preparing input for C code; also, all R
> non-integers are double precision) and may be the source of your problems.

Bert,

Are all columns but the fps factors?

Rich
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are
> unnecessary (unless you are preparing input for C code; also, all R
> non-integers are double precision) and may be the source of your problems.

Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
   argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame': 565675 obs. of 6 variables:
 $ year : chr "2016" "2016" "2016" "2016" ...
 $ month: int 3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int 3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr "12" "12" "12" "12" ...
 $ min  : int 0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps are seen as characters. I don't understand why.

Regards,

Rich
Re: [R] Need fresh eyes to see what I'm missing
On Tue, 14 Sep 2021, Eric Berger wrote:

> Before you create vel_by_month you can check vel for NAs and NaNs by
>
> sum(is.na(vel))
> sum(unlist(lapply(vel,is.nan)))

Eric,

There should not be any missing values in the data file. Regardless, I added those lines to the script and it made no difference.

Running those commands on the R command line showed these results:

sum(is.na(vel))
[1] 2321

sum(unlist(lapply(vel,is.nan)))
[1] 0

Yet the monthly summaries retain the initial line:

vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month  flow
 1     0    NA   NaN

I've another data set with the same issue (that's 2 out of 5) and I assume the source of the problem is the same with both.

The data sets have no NAs or missing values at the end of a line.

Thanks for the ideas,

Rich
Re: [R] Need fresh eyes to see what I'm missing
Remove all your as.integer() and as.double() coercions. They are unnecessary (unless you are preparing input for C code; also, all R non-integers are double precision) and may be the source of your problems.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 14, 2021 at 8:31 AM Eric Berger wrote:
>
> Before you create vel_by_month you can check vel for NAs and NaNs by
>
> sum(is.na(vel))
> sum(unlist(lapply(vel,is.nan)))
>
> HTH,
> Eric
>
>
> On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard wrote:
>
> > The data file begins this way:
> > year,month,day,hour,min,fps
> > 2016,03,03,12,00,1.74
> > 2016,03,03,12,10,1.75
> > 2016,03,03,12,20,1.76
> > 2016,03,03,12,30,1.81
> > 2016,03,03,12,40,1.79
> > 2016,03,03,12,50,1.75
> > 2016,03,03,13,00,1.78
> > 2016,03,03,13,10,1.81
> >
> > The script to process it:
> > library('tidyverse')
> > vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> > stringsAsFactors = FALSE)
> > vel$year <- as.integer(vel$year)
> > vel$month <- as.integer(vel$month)
> > vel$day <- as.integer(vel$day)
> > vel$hour <- as.integer(vel$hour)
> > vel$min <- as.integer(vel$min)
> > vel$fps <- as.double(vel$fps, length = 6)
> >
> > # use dplyr to filter() by year, month, day; summarize() to get monthly
> > # means
> > vel_by_month = vel %>%
> >   group_by(year, month) %>%
> >   summarize(flow = mean(fps, na.rm = TRUE))
> >
> > R's display after running the script:
> > > source('vel.R')
> > `summarise()` has grouped output by 'year'. You can override using the
> > `.groups` argument.
> > Warning messages:
> > 1: In eval(ei, envir) : NAs introduced by coercion
> > 2: In eval(ei, envir) : NAs introduced by coercion
> > 3: In eval(ei, envir) : NAs introduced by coercion
> >
> > The dataframe created by the read.csv() command:
> > > head(vel)
> >   year month day hour min  fps
> > 1 2016     3   3   12   0 1.74
> > 2 2016     3   3   12  10 1.75
> > 3 2016     3   3   12  20 1.76
> > 4 2016     3   3   12  30 1.81
> > 5 2016     3   3   12  40 1.79
> > 6 2016     3   3   12  50 1.75
> >
> > and the resulting grouping:
> > > vel_by_month
> > # A tibble: 67 × 3
> > # Groups:   year [8]
> >     year month  flow
> >  1     0    NA   NaN
> >  2  2016     3  2.40
> >  3  2016     4  3.00
> >  4  2016     5  2.86
> >  5  2016     6  2.51
> >  6  2016     7  2.18
> >  7  2016     8  1.89
> >  8  2016     9  1.38
> >  9  2016    10  1.73
> > 10  2016    11  2.01
> > # … with 57 more rows
> >
> > I cannot find why line 1 is there. Other data sets don't produce this
> > result.
> >
> > TIA,
> >
> > Rich
Re: [R] Need fresh eyes to see what I'm missing
Before you create vel_by_month you can check vel for NAs and NaNs by

    sum(is.na(vel))
    sum(unlist(lapply(vel,is.nan)))

HTH,
Eric

On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard wrote:

> The data file begins this way:
> year,month,day,hour,min,fps
> 2016,03,03,12,00,1.74
> 2016,03,03,12,10,1.75
> 2016,03,03,12,20,1.76
> 2016,03,03,12,30,1.81
> 2016,03,03,12,40,1.79
> 2016,03,03,12,50,1.75
> 2016,03,03,13,00,1.78
> 2016,03,03,13,10,1.81
>
> The script to process it:
> library('tidyverse')
> vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> stringsAsFactors = FALSE)
> vel$year <- as.integer(vel$year)
> vel$month <- as.integer(vel$month)
> vel$day <- as.integer(vel$day)
> vel$hour <- as.integer(vel$hour)
> vel$min <- as.integer(vel$min)
> vel$fps <- as.double(vel$fps, length = 6)
>
> # use dplyr to filter() by year, month, day; summarize() to get monthly
> # means
> vel_by_month = vel %>%
>   group_by(year, month) %>%
>   summarize(flow = mean(fps, na.rm = TRUE))
>
> R's display after running the script:
> > source('vel.R')
> `summarise()` has grouped output by 'year'. You can override using the
> `.groups` argument.
> Warning messages:
> 1: In eval(ei, envir) : NAs introduced by coercion
> 2: In eval(ei, envir) : NAs introduced by coercion
> 3: In eval(ei, envir) : NAs introduced by coercion
>
> The dataframe created by the read.csv() command:
> > head(vel)
>   year month day hour min  fps
> 1 2016     3   3   12   0 1.74
> 2 2016     3   3   12  10 1.75
> 3 2016     3   3   12  20 1.76
> 4 2016     3   3   12  30 1.81
> 5 2016     3   3   12  40 1.79
> 6 2016     3   3   12  50 1.75
>
> and the resulting grouping:
> > vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>     year month  flow
>  1     0    NA   NaN
>  2  2016     3  2.40
>  3  2016     4  3.00
>  4  2016     5  2.86
>  5  2016     6  2.51
>  6  2016     7  2.18
>  7  2016     8  1.89
>  8  2016     9  1.38
>  9  2016    10  1.73
> 10  2016    11  2.01
> # … with 57 more rows
>
> I cannot find why line 1 is there. Other data sets don't produce this
> result.
>
> TIA,
>
> Rich
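The total from sum(is.na(vel)) says that NAs exist but not where. As a small extension of the check in this message, the sketch below counts them per column and lists the affected rows. The data frame is made up for illustration and only stands in for the poster's vel.

```r
# Made-up stand-in for vel with a few missing values
vel <- data.frame(year = c(2016L, NA, 2016L),
                  hour = c(12L, 12L, NA),
                  fps  = c(1.74, NA, 1.76))

sum(is.na(vel))               # total number of NAs, as in the message
colSums(is.na(vel))           # NAs per column
which(!complete.cases(vel))   # row numbers to look up in the raw file
```

Row numbers in the data frame are offset by one from line numbers in a file with a header line.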