Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

You have helped us understand, so suppose we are now sure about the way
missing info is supplied. What you show is not the same as the CSV sample
earlier, but I will assume you know that "Eqp" is the one and only way they
signaled bad data.

One choice is to fix the original data before reading into R. Chances are
placing exactly NA in those places, perhaps using a global substitute of
some sort, might do it.

But as Bert noted, R is a very powerful environment and you can use it.

One argument you can use with read.csv() is na.strings, which tells it that
"Eqp" is to be treated as an NA. The substitution is then made as the file is
read in AND you should then see the column properly read in as doubles.
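
A minimal sketch of that approach, assuming the real file looks like the
sample posted earlier in the thread. The inline data below is made up for
illustration; only the "Eqp" token comes from Rich's message.

```r
# Hypothetical sample mimicking the file layout from this thread;
# the "Eqp" rows stand in for the equipment-failure records.
csv_text <- "year,month,day,hour,min,fps
2020,11,24,10,55,1.74
2020,11,24,11,00,Eqp
2020,11,24,11,05,Eqp
2020,11,24,11,55,1.81"

# na.strings lists every token read.csv() should turn into NA.
vel <- read.csv(text = csv_text, na.strings = c("NA", "Eqp"))

str(vel$fps)                  # numeric column, NA where "Eqp" appeared
sum(is.na(vel$fps))           # number of equipment-failure records
mean(vel$fps, na.rm = TRUE)   # summary that skips the missing readings
```

If the token in the real file carries stray white space, it may need to be
listed in na.strings in that exact form, or the file read with
strip.white = TRUE; that depends on the file and is not something I can
confirm from here.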

Suppose you read in this data and make sure the column involved is read as
character strings, instead. You can use any number of tools in base R or
dplyr to replace Eqp with NA such as in a pipeline ... %>%
mutate(fps=ifelse(fps=="Eqp", NA, fps)) %>% ...

The above is one of many ways and of course afterward, you may want to
reconvert the character column back to floating point. I note dplyr can do
both in the same function as it applies them in order:

mutate(fps=ifelse(fps=="Eqp", NA, fps), fps=as.double(fps))
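
For comparison, the same two steps can be sketched in base R. This is only
an equivalent illustration on a made-up three-row column, not a replacement
for the dplyr version.

```r
# Hypothetical character column as it would arrive if "Eqp" were left in place.
vel <- data.frame(fps = c("1.74", "Eqp", "1.81"), stringsAsFactors = FALSE)

# Step 1: mark the equipment-failure token as missing.
vel$fps[vel$fps == "Eqp"] <- NA

# Step 2: convert the remaining strings to doubles. No coercion warning is
# raised because every non-missing entry is now a valid number.
vel$fps <- as.numeric(vel$fps)

vel$fps
```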

The point is that in many cases, the data must be carefully examined and
cleaned and set up. In some cases, it may also be useful to treat some as
factors as in the hours and minutes. If you continue on your road and hit
ggplot() to make graphs, factors may be useful in various kinds of fine
tuning.

-----Original Message-----
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 1:59 PM
To: r-help@r-project.org
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> **Don't do this.** You will make errors. Use fit-for-purpose tools.
> That's what R is for. Also, be careful **how** you "download", as that 
> already may bake in problems.

Bert,

Haven't had downloading errors saving displayed files.

The problem with the velocities data is shown here:
2020-11-24 11:00 PST Eqp
2020-11-24 11:05 PST Eqp
2020-11-24 11:10 PST Eqp
2020-11-24 11:15 PST Eqp
2020-11-24 11:20 PST Eqp
2020-11-24 11:25 PST Eqp
2020-11-24 11:30 PST Eqp
2020-11-24 11:35 PST Eqp
2020-11-24 11:40 PST Eqp
2020-11-24 11:45 PST Eqp
2020-11-24 11:50 PST Eqp
2021-01-08 16:26 PST Eqp

Equipment failure during the period shown.

What's the best way to replace these lines? Just remove them or change them
to NA?

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


**Don't do this.** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.


Bert,

Haven't had downloading errors saving displayed files.

The problem with the velocities data is shown here:
2020-11-24 11:00	PST	Eqp 
2020-11-24 11:05	PST	Eqp 
2020-11-24 11:10	PST	Eqp 
2020-11-24 11:15	PST	Eqp 
2020-11-24 11:20	PST	Eqp 
2020-11-24 11:25	PST	Eqp 
2020-11-24 11:30	PST	Eqp 
2020-11-24 11:35	PST	Eqp 
2020-11-24 11:40	PST	Eqp 
2020-11-24 11:45	PST	Eqp 
2020-11-24 11:50	PST	Eqp 
2021-01-08 16:26	PST	Eqp


Equipment failure during the period shown.

What's the best way to replace these lines? Just remove them or change them
to NA?

Regards,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Inline.


On Tue, Sep 14, 2021 at 10:42 AM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > My suggestion was not 'to make a difference'. It was to determine whether
> > the NAs or NaNs appear before the dplyr commands. You confirmed that they
> > do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
> > appear.
>
> Eric,
>
> Yes, you're all correct. I've just downloaded the raw data again for mean
> velocities and suspended sediments. I'll go through them line-by-line and
> look for discrepancies.

**Don't do this.** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.

-- Bert
>
> Thanks again,
>
> Rich
>


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.


Bert,

Large volume of missing data at the end of last year. See attached plot.

I'll go through the raw data file to see how those missing data are
presented.

Regards,

Rich


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Eric Berger wrote:


My suggestion was not 'to make a difference'. It was to determine whether
the NAs or NaNs appear before the dplyr commands. You confirmed that they
do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
appear.


Eric,

Yes, you're all correct. I've just downloaded the raw data again for mean
velocities and suspended sediments. I'll go through them line-by-line and
look for discrepancies.

Thanks again,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Eric Berger
Hi Rich,
My suggestion was not 'to make a difference'. It was to determine
whether the NAs or NaNs appear before the dplyr commands. You
confirmed that they do. There are 2321 NAs in vel. Bert suggested some
ways that an NA might appear.

Best,
Eric

On Tue, Sep 14, 2021 at 6:42 PM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > Before you create vel_by_month you can check vel for NAs and NaNs by
> >
> > sum(is.na(vel))
> > sum(unlist(lapply(vel,is.nan)))
>
> Eric,
>
> There should not be any missing values in the data file. Regardless, I added
> those lines to the script and it made no difference.
>
> Running those commands on the R command line showed these results:
> > sum(is.na(vel))
> [1] 2321
> > sum(unlist(lapply(vel,is.nan)))
> [1] 0
>
> Yet the monthly summaries retain the initial line:
> > vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>     year month   flow
>    <int> <int>  <dbl>
>  1     0    NA NaN
>
> I've another data set with the same issue (that's 2 out of 5) and I assume
> the source of the problem is the same with both.
>
> The data sets have no NAs or missing values at the end of a line.
>
> Thanks for the ideas,
>
> Rich
>


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.


Bert,

Yes, I'm going to closely review the original data file and work forward
from there. Thanks for your comments; I do appreciate them.

Back when I have more information ... perhaps even a fix.

Best regards,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Input problems of this sort are often caused by stray or extra
characters (commas, dashes, etc.) in the input files, which then can
trigger automatic conversion to character. Excel files are somewhat
notorious for this.

A couple of comments, and then I'll quit, as others should have
greater insight (and may correct any of my errors).

1.
> as.numeric("1,")
[1] NA
Warning message:
NAs introduced by coercion

So if a stray character caused your "numeric" input to be read in as
character, and you then converted it with as.numeric() (do not use
as.integer or as.double), you get that warning and an NA.

2. So I would say that you need to check those columns in your data
frame that were read in as character instead of numeric.  I'd also
check the others with unique() or some such just to make sure they
have the handful of right values.

One way of doing this would be to look for NA's in as.numeric, as
above. But I thought you said you did
this already and found none, so I don't get it. Other approaches would
be to examine your .csv file with ?count.fields or try reading it with
?read.delim. Any discrepancies or errors you get from these may help
you to pinpoint problems like stray characters, too many fields in a
line, etc.
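
A rough sketch of both checks, using made-up values since the real file is
not available here. The entries in fps and the three-line temporary file are
hypothetical.

```r
# Hypothetical column read in as character because of a few stray entries.
fps <- c("1.74", "1.75", "Eqp", "1,81", "1.79", NA)

# Entries that are present but fail numeric conversion.
converted <- suppressWarnings(as.numeric(fps))
bad <- !is.na(fps) & is.na(converted)

which(bad)         # positions of the offending entries
unique(fps[bad])   # the distinct offending strings

# count.fields() reports how many fields each line has; a line with a
# stray separator stands out from the rest.
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,month,fps", "2016,3,1.74", "2016,3,1,75"), tmp)
n_fields <- count.fields(tmp, sep = ",")
which(n_fields != n_fields[1])   # lines whose field count differs from the header
```

Looking at unique(fps[bad]) rather than the whole column keeps the output
short even when the data frame has hundreds of thousands of rows.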

3. As for your "fps as factors" question, note that:
> as.numeric(factor("3"))
[1] 1

So it depends on how you read stuff in. The answer should be "no" with
read.csv(..., stringsAsFactors = FALSE), but I'm not sure what all you
did or what kind of junk in your .csv file may be causing R to misread
the numeric data as character.
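
To illustrate the factor pitfall on a slightly larger made-up vector:
as.numeric() on a factor returns the internal level codes, so the labels
have to be converted to character first.

```r
# Hypothetical factor whose labels happen to look like numbers.
f <- factor(c("3", "10", "3"))

as.numeric(f)                 # internal level codes, not the values
as.numeric(as.character(f))   # the values the labels spell out
as.numeric(levels(f))[f]      # same result, converting each level only once
```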

As I said, others may be wiser and correct any errors in my "advice."
This is as far as I can go -- and it may already be too far.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



On Tue, Sep 14, 2021 at 9:01 AM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Bert Gunter wrote:
>
> > Remove all your as.integer() and as.double() coercions. They are
> > unnecessary (unless you are preparing input for C code; also, all R
> > non-integers are double precision) and may be the source of your problems.
>
> Bert,
>
> When I remove coercions the script produces warnings like this:
> 1: In mean.default(fps, na.rm = TRUE) :
>    argument is not numeric or logical: returning NA
>
> and str(vel) displays this:
> 'data.frame':   565675 obs. of  6 variables:
>   $ year : chr  "2016" "2016" "2016" "2016" ...
>   $ month: int  3 3 3 3 3 3 3 3 3 3 ...
>   $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
>   $ hour : chr  "12" "12" "12" "12" ...
>   $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
>   $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...
>
> so month, day, and min are recognized as integers but year, hour, and fps
> are seen as characters. I don't understand why.
>
> Regards,
>
> Rich
>


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

I have to wonder about how your data was placed in the CSV file based on
what you report.

Functions like read.table() (which is called by read.csv()) make guesses
about what number of columns to expect and what the contents are likely to
be. The column count is judged from the first few lines, but a column is
typed as numeric only if every entry in it converts cleanly, so one stray
value anywhere is enough. The fact that it shows this:

'data.frame':   565675 obs. of  6 variables:
  $ year : chr  "2016" "2016" "2016" "2016" ...
  $ month: int  3 3 3 3 3 3 3 3 3 3 ...
  $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
  $ hour : chr  "12" "12" "12" "12" ...
  $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
  $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

is odd. It suggests that somewhere in the data, year holds something other
than a plain integer, perhaps a word like `missing`.

Something similar seems to have happened with hour and fps but not the rest.

Nonetheless, you did convert back to what you wanted BUT if a single
anomalous entry remains then as.integer("missing") would return an NA and
as.double("missing") also an NA. So it is wise to check for any unexpected
numbers. If the source cannot be changed, then the R program can filter out
such cases from your data.frame in various ways.

Your way of reading the CSV in was this:

vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)

The header = TRUE and sep = "," options you added are already the defaults
for read.csv(), so they are harmless. Since R 4.0.0 the default is also not
to read in strings as factors. But the arguments you did not include may be
worth a look, given that your data may be a bit off.

Without the underlying file, we can not trivially diagnose what may be wrong
in it. Do you get any error messages when reading in the file?  You can
specify additional arguments to read.csv() about what, if any, quoting
characters are used, what sequences should be recognized as an NA,
suggestions of what type each column should be assumed to be, what to do
with blank lines, what a comment looks like  and so on. 
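
As one possible use of those arguments, declaring the column types up front
with colClasses should make the read stop at the first entry that does not
fit, which can help locate the stray value. The data below is invented; only
the column names follow the sample from the thread.

```r
# Hypothetical two-row sample with one non-numeric fps entry.
csv_text <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,Eqp"

# Ask for specific types; tryCatch() captures the failure instead of
# stopping the script.
result <- tryCatch(
  read.csv(text = csv_text,
           colClasses = c("integer", "integer", "integer",
                          "integer", "integer", "numeric")),
  error = function(e) e
)

# If the read failed, the condition message names the offending token.
if (inherits(result, "error")) conditionMessage(result)
```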

One thing I sometimes have had to do is open the original CSV file in EXCEL
and examine it in various ways or even change it and save it again. That is
beyond the scope of this mailing list so if needed, ask me in private. You
have been working on this kind of stuff, but I assume often using other
tools outside R and dplyr.






-----Original Message-----
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:49 AM
To: R mailing list 
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are 
> unnecessary (unless you are preparing input for C code; also, all R 
> non-integers are double precision) and may be the source of your problems.

Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
   argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame':   565675 obs. of  6 variables:
  $ year : chr  "2016" "2016" "2016" "2016" ...
  $ month: int  3 3 3 3 3 3 3 3 3 3 ...
  $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
  $ hour : chr  "12" "12" "12" "12" ...
  $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
  $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps
are seen as characters. I don't understand why.

Regards,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

I reproduced your problem after rearranging the code the mailer mangled. I
tried variations like not using pipes or changing what it is grouped by, and
they all show your results on the abbreviated data, with the message:

`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.

I think I fixed summarise()  but it makes me wonder if there is an 
inconsistency introduced along the way as what you used is supposed to work and 
has worked for me in the past.

I note the man page for summarise() mentions that the .groups="..." argument
is experimental, and it is a tad confusing.

I changed your code to this by telling it to keep the grouping in the output 
the same:

vel_by_month = vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The change from your code is the addition at the very end of the .groups="keep" 
argument.

Since I used your limited data, this is all I get:

> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
  <int> <int> <dbl>
1  2016     3  1.77

For now, all I did was shut summarise() up.

Not having the rest of your data, the question is where your NA and NaN are
introduced. If the change I made above does not resolve it, then as others 
suggested, you begin by looking at your data more carefully perhaps starting 
with the .CSV file and then the data structures in R, along the lines of what 
you were shown. I find the table() function useful for categorical data with 
limited choices as it would spit out the anomaly as happening once.
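
A small sketch of that idea, with a made-up hour column in which one entry
is mistyped:

```r
# Hypothetical hour column read in as character; "1e" is the planted anomaly.
hour <- c("12", "12", "12", "13", "13", "1e", "14", "14")

# Tabulate every distinct value; useNA = "ifany" also reports NAs if present.
counts <- table(hour, useNA = "ifany")
counts

# Values that occur exactly once are candidates for a closer look.
names(counts)[counts == 1]
```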

I see your point about needing fresh eyes. My eyes do not see what you did
wrong, but I am just following clues you may be ignoring.


-----Original Message-----
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:21 AM
To: r-help@r-project.org
Subject: [R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
                stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
 group_by(year, month) %>%
 summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
> head(vel)
  year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month   flow
   <int> <int>  <dbl>
 1     0    NA NaN
 2  2016     3   2.40
 3  2016     4   3.00
 4  2016     5   2.86
 5  2016     6   2.51
 6  2016     7   2.18
 7  2016     8   1.89
 8  2016     9   1.38
 9  2016    10   1.73
10  2016    11   2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your
problems.


Bert,

Are all columns but the fps factors?

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your problems.


Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame':   565675 obs. of  6 variables:
 $ year : chr  "2016" "2016" "2016" "2016" ...
 $ month: int  3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr  "12" "12" "12" "12" ...
 $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps
are seen as characters. I don't understand why.

Regards,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Eric Berger wrote:


Before you create vel_by_month you can check vel for NAs and NaNs by

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))


Eric,

There should not be any missing values in the data file. Regardless, I added
those lines to the script and it made no difference.

Running those commands on the R command line showed these results:

sum(is.na(vel))

[1] 2321

sum(unlist(lapply(vel,is.nan)))

[1] 0

Yet the monthly summaries retain the initial line:

vel_by_month

# A tibble: 67 × 3
# Groups:   year [8]
    year month   flow
   <int> <int>  <dbl>
 1     0    NA NaN

I've another data set with the same issue (that's 2 out of 5) and I assume
the source of the problem is the same with both.

The data sets have no NAs or missing values at the end of a line.

Thanks for the ideas,

Rich



Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your
problems.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 14, 2021 at 8:31 AM Eric Berger  wrote:
>
> Before you create vel_by_month you can check vel for NAs and NaNs by
>
> sum(is.na(vel))
> sum(unlist(lapply(vel,is.nan)))
>
> HTH,
> Eric
>
>
> On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard 
> wrote:
>
> > The data file begins this way:
> > year,month,day,hour,min,fps
> > 2016,03,03,12,00,1.74
> > 2016,03,03,12,10,1.75
> > 2016,03,03,12,20,1.76
> > 2016,03,03,12,30,1.81
> > 2016,03,03,12,40,1.79
> > 2016,03,03,12,50,1.75
> > 2016,03,03,13,00,1.78
> > 2016,03,03,13,10,1.81
> >
> > The script to process it:
> > library('tidyverse')
> > vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> > stringsAsFactors = FALSE)
> > vel$year <- as.integer(vel$year)
> > vel$month <- as.integer(vel$month)
> > vel$day <- as.integer(vel$day)
> > vel$hour <- as.integer(vel$hour)
> > vel$min <- as.integer(vel$min)
> > vel$fps <- as.double(vel$fps, length = 6)
> >
> > # use dplyr to filter() by year, month, day; summarize() to get monthly
> > # means
> > vel_by_month = vel %>%
> >  group_by(year, month) %>%
> >  summarize(flow = mean(fps, na.rm = TRUE))
> >
> > R's display after running the script:
> > > source('vel.R')
> > `summarise()` has grouped output by 'year'. You can override using the
> > `.groups` argument.
> > Warning messages:
> > 1: In eval(ei, envir) : NAs introduced by coercion
> > 2: In eval(ei, envir) : NAs introduced by coercion
> > 3: In eval(ei, envir) : NAs introduced by coercion
> >
> > The dataframe created by the read.csv() command:
> > > head(vel)
> >   year month day hour min  fps
> > 1 2016     3   3   12   0 1.74
> > 2 2016     3   3   12  10 1.75
> > 3 2016     3   3   12  20 1.76
> > 4 2016     3   3   12  30 1.81
> > 5 2016     3   3   12  40 1.79
> > 6 2016     3   3   12  50 1.75
> >
> > and the resulting grouping:
> > > vel_by_month
> > # A tibble: 67 × 3
> > # Groups:   year [8]
> >     year month   flow
> >    <int> <int>  <dbl>
> >  1     0    NA NaN
> >  2  2016     3   2.40
> >  3  2016     4   3.00
> >  4  2016     5   2.86
> >  5  2016     6   2.51
> >  6  2016     7   2.18
> >  7  2016     8   1.89
> >  8  2016     9   1.38
> >  9  2016    10   1.73
> > 10  2016    11   2.01
> > # … with 57 more rows
> >
> > I cannot find why line 1 is there. Other data sets don't produce this
> > result.
> >
> > TIA,
> >
> > Rich
> >


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Eric Berger
Before you create vel_by_month you can check vel for NAs and NaNs by

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))
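
If the total comes back non-zero, a per-column breakdown may narrow things
down. A small sketch on a made-up data frame (the values are hypothetical;
only the column names echo the thread):

```r
# Hypothetical data frame with NAs scattered across columns.
vel <- data.frame(
  year = c(2016L, 2016L, NA),
  hour = c(12L, NA, NA),
  fps  = c(1.74, 1.75, NA)
)

colSums(is.na(vel))           # missing values per column
vel[!complete.cases(vel), ]   # the rows that contain any NA
```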

HTH,
Eric


On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard 
wrote:

> The data file begins this way:
> year,month,day,hour,min,fps
> 2016,03,03,12,00,1.74
> 2016,03,03,12,10,1.75
> 2016,03,03,12,20,1.76
> 2016,03,03,12,30,1.81
> 2016,03,03,12,40,1.79
> 2016,03,03,12,50,1.75
> 2016,03,03,13,00,1.78
> 2016,03,03,13,10,1.81
>
> The script to process it:
> library('tidyverse')
> vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> stringsAsFactors = FALSE)
> vel$year <- as.integer(vel$year)
> vel$month <- as.integer(vel$month)
> vel$day <- as.integer(vel$day)
> vel$hour <- as.integer(vel$hour)
> vel$min <- as.integer(vel$min)
> vel$fps <- as.double(vel$fps, length = 6)
>
> # use dplyr to filter() by year, month, day; summarize() to get monthly
> # means
> vel_by_month = vel %>%
>  group_by(year, month) %>%
>  summarize(flow = mean(fps, na.rm = TRUE))
>
> R's display after running the script:
> > source('vel.R')
> `summarise()` has grouped output by 'year'. You can override using the
> `.groups` argument.
> Warning messages:
> 1: In eval(ei, envir) : NAs introduced by coercion
> 2: In eval(ei, envir) : NAs introduced by coercion
> 3: In eval(ei, envir) : NAs introduced by coercion
>
> The dataframe created by the read.csv() command:
> > head(vel)
>   year month day hour min  fps
> 1 2016     3   3   12   0 1.74
> 2 2016     3   3   12  10 1.75
> 3 2016     3   3   12  20 1.76
> 4 2016     3   3   12  30 1.81
> 5 2016     3   3   12  40 1.79
> 6 2016     3   3   12  50 1.75
>
> and the resulting grouping:
> > vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>     year month   flow
>    <int> <int>  <dbl>
>  1     0    NA NaN
>  2  2016     3   2.40
>  3  2016     4   3.00
>  4  2016     5   2.86
>  5  2016     6   2.51
>  6  2016     7   2.18
>  7  2016     8   1.89
>  8  2016     9   1.38
>  9  2016    10   1.73
> 10  2016    11   2.01
> # … with 57 more rows
>
> I cannot find why line 1 is there. Other data sets don't produce this
> result.
>
> TIA,
>
> Rich
>