Re: [R] Changing time intervals in data set

2021-12-16 Thread Rich Shepard

On Wed, 15 Dec 2021, Avi Gross wrote:


> I still do not see what you want to do, sorry.


Avi,

Backing up to my original post on this thread I've realized that no one
addressed my main question: do variable measurement intervals affect
analyses of the data? And, if so, how, and how can I compensate for the changes?
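
One way to see whether uneven intervals matter is to compare a plain row-wise mean with a time-weighted mean. The following is only a minimal sketch on invented values (the `datetime` and `discharge` vectors are hypothetical, and UTC is assumed), using trapezoidal weighting as one possible compensation, not a recommended analysis:

```r
# Hypothetical readings: 30-minute spacing, with a burst of 5-minute
# readings during a short high-flow event (all values invented).
datetime <- as.POSIXct(c(
  "2010-09-07 07:00", "2010-09-07 07:30", "2010-09-07 08:00",
  "2010-09-07 08:05", "2010-09-07 08:10", "2010-09-07 08:15",
  "2010-09-07 08:20", "2010-09-07 08:25", "2010-09-07 08:30",
  "2010-09-07 09:00", "2010-09-07 09:30"), tz = "UTC")
discharge <- c(10, 10, 10, 50, 50, 50, 50, 50, 50, 10, 10)

# Row-wise mean: every reading counts equally, so the densely sampled
# event is over-represented.
naive_mean <- mean(discharge)

# Time-weighted mean (trapezoid rule): each pair of neighbouring readings
# is weighted by the time elapsed between them.
t_min    <- as.numeric(difftime(datetime, datetime[1], units = "mins"))
dt       <- diff(t_min)
mid_vals <- (head(discharge, -1) + tail(discharge, -1)) / 2
tw_mean  <- sum(mid_vals * dt) / sum(dt)

print(c(naive = naive_mean, time_weighted = tw_mean))
```

In this toy series the row-wise mean comes out higher than the time-weighted one, because the 5-minute burst contributes six of the eleven rows but only a small share of the elapsed time.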

Since this is not an R question I'll take it to Stack Overflow (I think
that's the appropriate platform for this question).

My thanks to you and all others who commented on this thread.

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing time intervals in data set

2021-12-16 Thread Rich Shepard

On Thu, 16 Dec 2021, Chris Evans wrote:


> What you said earlier was:
>
> For me the next step, in tidyverse pseudocode, might be something like:
>
> tibData %>%
>   arrange(site_nbr, datetime) %>% # just in case things are not ordered nicely
>   group_by(site_nbr) %>% # as you want to get changes within site I think
>   mutate(gapTime = datetime - lag(datetime)) %>% # get the simple gaps
>   summarise(nGaps = n_distinct(gapTime)) # get the number of distinct gaps per site


Chris,

Thank you. Each tibble has the data for the same site_nbr. Your pseudocode
should put me on the path to doing what I want.

If the rows were grouped by the minute (the measurement variable period) ...
I'll need to think deeply how only the time interval, not the individual
values, could be extracted.


> From what you are saying that will get you numbers of time gap changes per
> site. That will help you work out how many are simple failures of sensors
> etc. (would they come up as multiples of that site's then usual interval,
> or might they be more complex?) In the light of that you can start the
> somewhat more challenging issue of disentangling those from more long
> lasting switches in a site's gapTime value. I am sure I can offer some
> thoughts on that in the light of what you find but the best solutions will
> depend on the number of sites and on what those distributions of changes
> within site look like.


There are only four sites, and I've identified the dates and times of
measurement frequencies for one of them. My interest is on when that
measurement frequency changes. Because I've not before seen such frequency
changes in data sets I've used for projects I've no idea whether these
changes affect analytic results.


> Disclaimer: I am not a professional statistician nor a professional R
> coder though I do spend much of each week hacking up R code that works and
> supports publications. Others here are professional statisticians _and_
> professional R coders.


Neither am I a professional statistician nor do I use R every day or with
every project.

Regards,

Rich



Re: [R] Changing time intervals in data set

2021-12-15 Thread Chris Evans
What you said earlier was:

> >> The data.frame/tibble has columns for year, month, day, hour, minute, and
> >> datetime.

> As well as a site_nbr.

> What I asked is,

> >> Would difftime() allow me to find the dates when the changes occurred?

For me the next step, in tidyverse pseudocode, might be something like:

library(dplyr)

tibData %>%
   arrange(site_nbr, datetime) %>% # sort within site, in case rows are out of order
   group_by(site_nbr) %>% # changes are wanted within each site
   mutate(gapTime = datetime - lag(datetime)) %>% # gap to the previous reading (NA on a site's first row)
   summarise(nGaps = n_distinct(gapTime, na.rm = TRUE)) # number of distinct gap lengths per site

(untested, may be flawed but it conveys the ideas)

From what you are saying that will get you numbers of time gap changes per
site.  That will help you work out how many are simple failures of sensors
etc. (would they come up as multiples of that site's then usual interval,
or might they be more complex?)  In the light of that you can start the 
somewhat more challenging issue of disentangling those from more long 
lasting switches in a site's gapTime value.  I am sure I can offer some
thoughts on that in the light of what you find but the best solutions will
depend on the number of sites and on what those distributions of changes
within site look like.  
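
The check for "multiples of the usual interval" could be sketched in base R roughly as follows. The timestamps are invented, UTC is assumed, and taking the most frequent gap over the whole series as the "usual" interval is a simplification (the paragraph above speaks of the interval usual at that time for the site):

```r
# Hypothetical timestamps for one site: mostly 15-minute readings, one
# 45-minute hole (two missed readings), then a switch to 5 minutes.
datetime <- as.POSIXct("2011-08-18 17:45", tz = "UTC") +
  60 * c(0, 15, 30, 45, 90, 105, 120, 125, 130, 135)

# Gaps between consecutive readings, in whole minutes.
gap_min <- round(as.numeric(diff(datetime), units = "mins"))

# Distribution of gap lengths; the most common one is taken as "usual".
gap_table <- table(gap_min)
usual <- as.numeric(names(which.max(gap_table)))

# Gaps longer than usual that are exact multiples of it look like dropped
# readings; other gaps are candidates for a real change of interval.
likely_dropout  <- gap_min > usual & gap_min %% usual == 0
possible_switch <- gap_min != usual & !likely_dropout

print(gap_table)
print(data.frame(reading = datetime[-1], gap_min, likely_dropout, possible_switch))
```

With several sites, the same steps could be run once per site, for example by splitting the data on `site_nbr` first.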

Disclaimer: I am not a professional statistician nor a professional R 
coder though I do spend much of each week hacking up R code that works
and supports publications.  Others here are professional statisticians
_and_ professional R coders.

Very best and seasonal greetings to all,

Chris


- Original Message -
> From: "Rich Shepard" 
> To: "r-help mailing list" 
> Sent: Wednesday, 15 December, 2021 23:42:42
> Subject: Re: [R] Changing time intervals in data set

> On Thu, 16 Dec 2021, Jim Lemon wrote:
> 
>> From what you sent, it seems like you want to find where the change in
>> _measurement interval_ occurred. That looks to me as though it is the
>> first datetime in each row. In the first row, there is a week gap between
>> the ten and fifteen minute intervals. This may indicate that no
>> measurements were taken or perhaps they were lost.
> 
> Jim,
> 
> Yes, there are times when the equipment fails, but not all changes in
> measurement intervals have a time gap other than a few minutes.
> 
> Normally I work with much smaller data sets so if there are interval changes
> they've not appeared in the data I've used.
> 
> I will learn from the USGS why there are so many measurement interval
> changes.
> 
> Because these data are so different from what I've seen in the past I want
> to explore whether (or how) they affect discharge variability calculations.
> 
> Regards,
> 
> Rich


-- 
Chris Evans (he/him)  
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, University of 
Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/ 
CORE site: https://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/
OMbook: https://ombook.psyctc.org/book/



Re: [R] Changing time intervals in data set

2021-12-15 Thread Rich Shepard

On Thu, 16 Dec 2021, Jim Lemon wrote:


> From what you sent, it seems like you want to find where the change in
> _measurement interval_ occurred. That looks to me as though it is the
> first datetime in each row. In the first row, there is a week gap between
> the ten and fifteen minute intervals. This may indicate that no
> measurements were taken or perhaps they were lost.


Jim,

Yes, there are times when the equipment fails, but not all changes in
measurement intervals have a time gap other than a few minutes.

Normally I work with much smaller data sets so if there are interval changes
they've not appeared in the data I've used.

I will learn from the USGS why there are so many measurement interval
changes.

Because these data are so different from what I've seen in the past I want
to explore whether (or how) they affect discharge variability calculations.

Regards,

Rich



Re: [R] Changing time intervals in data set

2021-12-15 Thread Jim Lemon
Hi Rich,
From what you sent, it seems like you want to find where the change in
_measurement interval_ occurred. That looks to me as though it is the
first datetime in each row. In the first row, there is a week gap
between the ten and fifteen minute intervals. This may indicate that
no measurements were taken or perhaps they were lost.
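
The size of the holes between periods can be checked directly with difftime(). This small sketch uses the end and start times from the first three rows of Rich's table; the time zone is not stated in the thread, so UTC is an assumption here:

```r
# End of the 10-minute period and start of the 15-minute period.
end_10min   <- as.POSIXct("1991-03-12 23:50", tz = "UTC")
start_15min <- as.POSIXct("1991-03-19 00:00", tz = "UTC")
gap_1 <- difftime(start_15min, end_10min, units = "days")

# End of the 15-minute period and start of the first 30-minute period.
end_15min   <- as.POSIXct("1994-07-05 23:45", tz = "UTC")
start_30min <- as.POSIXct("2003-04-03 00:00", tz = "UTC")
gap_2 <- difftime(start_30min, end_15min, units = "days")

print(gap_1)
print(gap_2)
```

The first gap is a little over six days; the second spans several years, so it is a much larger hole in the record than the first.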

Jim

On Thu, Dec 16, 2021 at 8:13 AM Rich Shepard  wrote:
>
> On Wed, 15 Dec 2021, jim holtman wrote:
>
> > At least show a sample of the data and then what you would like as output.
>
> Jim,
>
> There are 813,694 rows of data. As I wrote,
> >> A 33-year set of river discharge data at one gauge location has recording
> >> intervals of 5, 10, and 30 minutes over the period of record.
>
> Also, 15 minute intervals
>
> >> The data.frame/tibble has columns for year, month, day, hour, minute, and
> >> datetime.
>
> As well as a site_nbr.
>
> What I asked is,
>
> >> Would difftime() allow me to find the dates when the changes occurred?
>
> I'm scrolling through the data file (not the R dataframe). The output I've
> so far generated manually:
>
> 1988-10-01 00:10 - 1991-03-12 23:50  10 min
> 1991-03-19 00:00 - 1994-07-05 23:45  15 min
> 2003-04-03 00:00 - 2004-01-31 10:30  30 min
> 2004-01-31 10:40 - 2004-01-31 15:30   5 min
> 2004-01-31 16:00 - 2008-06-27 11:00  30 min
> 2008-06-27 11:10 - 2008-07-07 15:30   5 min
> 2008-07-07 16:00 - 2009-05-26 13:00  30 min
> 2009-05-26 13:15 - 2009-05-27 06:30   5 min
> 2009-05-27 07:00 - 2009-06-11 08:00  30 min
> 2009-06-11 08:15 - 2009-06-12 08:50   5 min
> 2009-06-12 09:00 - 2010-09-07 08:00  30 min
> 2010-09-07 08:20 - 2010-09-07 18:00   5 min
> 2010-09-07 18:30 - 2010-09-09 07:30  30 min
> 2010-09-09 07:45 - 2010-09-13 11:15   5 min
> 2010-09-13 11:30 - 2010-12-01 07:30  30 min
> 2010-12-01 07:50 - 2010-12-02 15:50   5 min
> 2010-12-02 16:00 - 2010-12-17 10:00  30 min
> 2010-12-17 10:05 - 2010-12-20 08:35   5 min
> 2010-12-20 09:00 - 2011-05-20 09:00  30 min
> 2011-05-20 09:05 - 2011-05-23 06:40   5 min
> 2011-05-23 07:00 - 2011-08-18 17:30  30 min
> 2011-08-18 17:45 - 2011-12-14 06:15  15 min
> 2011-12-14 06:20 - 2011-12-14 17:35   5 min
> 2011-12-14 17:45 - 2012-06-28 06:15  15 min
> 2012-06-28 06:25 - 2012-06-28 14:30   5 min
> 2012-06-28 14:45 - 2012-10-12 06:15  15 min
> 2012-10-12 06:25 - 2012-10-12 15:45   5 min
> 2012-10-12 16:00 - 2014-01-17 07:00  15 min
> 2014-01-17 07:05 - 2014-01-21 07:05   5 min
> 2014-01-21 07:15 - 2015-04-03 07:30  15 min
>
> Rich
>



Re: [R] Changing time intervals in data set

2021-12-15 Thread Rich Shepard

On Wed, 15 Dec 2021, jim holtman wrote:


> At least show a sample of the data and then what you would like as output.


Jim,

There are 813,694 rows of data. As I wrote,

>> A 33-year set of river discharge data at one gauge location has recording
>> intervals of 5, 10, and 30 minutes over the period of record.


Also, 15 minute intervals


>> The data.frame/tibble has columns for year, month, day, hour, minute, and
>> datetime.


As well as a site_nbr.

What I asked is,


>> Would difftime() allow me to find the dates when the changes occurred?


I'm scrolling through the data file (not the R dataframe). The output I've
so far generated manually:

1988-10-01 00:10 - 1991-03-12 23:50  10 min
1991-03-19 00:00 - 1994-07-05 23:45  15 min
2003-04-03 00:00 - 2004-01-31 10:30  30 min
2004-01-31 10:40 - 2004-01-31 15:30   5 min
2004-01-31 16:00 - 2008-06-27 11:00  30 min
2008-06-27 11:10 - 2008-07-07 15:30   5 min
2008-07-07 16:00 - 2009-05-26 13:00  30 min
2009-05-26 13:15 - 2009-05-27 06:30   5 min
2009-05-27 07:00 - 2009-06-11 08:00  30 min
2009-06-11 08:15 - 2009-06-12 08:50   5 min
2009-06-12 09:00 - 2010-09-07 08:00  30 min
2010-09-07 08:20 - 2010-09-07 18:00   5 min
2010-09-07 18:30 - 2010-09-09 07:30  30 min
2010-09-09 07:45 - 2010-09-13 11:15   5 min
2010-09-13 11:30 - 2010-12-01 07:30  30 min
2010-12-01 07:50 - 2010-12-02 15:50   5 min
2010-12-02 16:00 - 2010-12-17 10:00  30 min
2010-12-17 10:05 - 2010-12-20 08:35   5 min
2010-12-20 09:00 - 2011-05-20 09:00  30 min
2011-05-20 09:05 - 2011-05-23 06:40   5 min
2011-05-23 07:00 - 2011-08-18 17:30  30 min
2011-08-18 17:45 - 2011-12-14 06:15  15 min
2011-12-14 06:20 - 2011-12-14 17:35   5 min
2011-12-14 17:45 - 2012-06-28 06:15  15 min
2012-06-28 06:25 - 2012-06-28 14:30   5 min
2012-06-28 14:45 - 2012-10-12 06:15  15 min
2012-10-12 06:25 - 2012-10-12 15:45   5 min
2012-10-12 16:00 - 2014-01-17 07:00  15 min
2014-01-17 07:05 - 2014-01-21 07:05   5 min
2014-01-21 07:15 - 2015-04-03 07:30  15 min
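
A table like the one above could probably be produced without scrolling through the file, by run-length encoding the gaps between consecutive readings. This is a minimal base R sketch on invented timestamps (UTC assumed); `periods` and the other names are illustrative only:

```r
# Hypothetical timestamps: 30-minute, then 5-minute, then 30-minute readings.
datetime <- as.POSIXct("2004-01-31 09:30", tz = "UTC") +
  60 * c(0, 30, 60, 65, 70, 75, 105, 135)

# Gap i is the time between reading i and reading i + 1, in whole minutes.
gap_min <- round(as.numeric(diff(datetime), units = "mins"))

# Collapse consecutive identical gaps into runs.
runs      <- rle(gap_min)
last_gap  <- cumsum(runs$lengths)
first_gap <- last_gap - runs$lengths + 1

# One row per run: the readings that bound it and the interval in minutes.
periods <- data.frame(
  start        = datetime[first_gap],
  end          = datetime[last_gap + 1],
  interval_min = runs$values,
  n_gaps       = runs$lengths
)
print(periods)
```

Note that in this sketch neighbouring periods share their boundary reading, whereas the manual table starts each period at the first reading taken at the new interval. On real data, single missed readings would show up as runs with `n_gaps` of 1, which could be filtered out before reading off the longer-lasting periods.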

Rich



Re: [R] Changing time intervals in data set

2021-12-15 Thread Avi Gross via R-help
I think Rich has shared aspects of the data before and may have forgotten we
want something here and now.

Besides a small sample of what the relevant columns look like and a
suggestion of what he wants some new column to look like, we probably need
more to understand what he wants.

The issue could be a bit like people who want to group their data by quarter,
for example, or by some other aspect such as when someone started and ended
one topic and switched to another. No way we can guess what he actually
wants.

What Rich writes may be perfectly clear to him but not others. It does sound
like there are periods people sit there and record measurements in seemingly
multiple (?contiguous) records with each recording the time at intervals
such as every five minutes, and/or 10 or 30. So a wild guess might be to
cluster them together, starting a new cluster wherever there is a GAP, that
is, wherever the next record is not close enough in time to the previous
one. In essence, the condition for a gap seems to be that:

 time-of-current-record - time-of-previous-record > threshold

Where threshold may simply be thirty minutes, assuming that all the records
are also in the same series as in locations of measurement and do not
intertwine.
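
The condition above can be evaluated for every record at once with difftime(). A minimal sketch, assuming a single series sorted by time, a thirty-minute threshold, and invented timestamps in UTC:

```r
# Hypothetical timestamps from a single series, already sorted.
datetime <- as.POSIXct(c(
  "1991-03-12 23:30", "1991-03-12 23:40", "1991-03-12 23:50",
  "1991-03-19 00:00", "1991-03-19 00:15"), tz = "UTC")

threshold_min <- 30

# time-of-current-record - time-of-previous-record, in minutes.
gap_min <- as.numeric(
  difftime(datetime[-1], datetime[-length(datetime)], units = "mins"))

# TRUE where the gap to the previous record exceeds the threshold.
exceeds <- gap_min > threshold_min

# The records that come right after a large gap.
print(datetime[-1][exceeds])
```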

I assume, as usual, there are umpteen ways to deal with such sliding window
problems but am loath to suggest any ideas till Rich has more clearly
defined the issue, perhaps by including a small amount of data in a format
trivial to copy/paste into our R implementation to play with and verify that
the solution seems to work.

But very loosely speaking, a simple sliding window of one might work. In
base R, you can use some form of loop, obviously, starting with row 2,
that compares row N to row N-1 and sets some new
column value to something like 1 until it encounters a big enough gap, when
it starts setting it to 2, and so on. A later pass on the new data could use
grouping by that column, IF all of what I assume makes sense.
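
The loop described here might look something like the sketch below, shown next to a vectorised equivalent built on cumsum(). The timestamps and the 30-minute threshold are invented for illustration, and UTC is assumed:

```r
# Hypothetical timestamps: three clusters separated by large gaps.
datetime <- as.POSIXct("2010-09-07 08:00", tz = "UTC") +
  60 * c(0, 5, 10, 120, 125, 400)
threshold_min <- 30

# Loop version: compare row n with row n - 1 and bump the group number
# whenever the gap is larger than the threshold.
group <- integer(length(datetime))
group[1] <- 1L
for (n in 2:length(datetime)) {
  gap <- as.numeric(difftime(datetime[n], datetime[n - 1], units = "mins"))
  group[n] <- group[n - 1] + (gap > threshold_min)
}

# Vectorised version: every large gap starts a new group.
gap_min   <- as.numeric(diff(datetime), units = "mins")
group_vec <- cumsum(c(TRUE, gap_min > threshold_min))

print(data.frame(datetime, group, group_vec))

# A later pass can then summarise by the new column, e.g. rows per group.
print(table(group))
```

On a data set with hundreds of thousands of rows the vectorised form is likely to be much faster than the explicit loop, though both give the same grouping column.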

And, of course, the tidyverse has perhaps easier to use functionality such
as their non-base functions of lag() and lead() used within something like
mutate()

https://dplyr.tidyverse.org/reference/lead-lag.html

But again, you need clearer requirements. You asked how to find when DATES
change. That is not the same as my guess, since the date changes at midnight
local time, so two measures taken seconds apart on either side of midnight
would count as a change. If you want to know when clusters of non-overlapping
measures change, that is another issue.

And what exactly do you want to do after determining when things change?
Depending on what you want, you may need a different way to solve the
initial problem. I mentioned the idea of grouping by another variable you
create as one such possibility. But many other solutions would not make a
grouping variable on every row, but insert some kind of cut mark in just the
first row or add a special row between groups and anything else your
imagination supplies.

Clearly, you do not want us to solve the entire problem you are working on,
but more context may get you answers to the specific thing you are working
on. And, note that adding a new time column may not be required as they can
be created on the fly too in some places, given the other columns. But it
does help to have it in place, at least for a while, if you want to provide
answers such as how many measures were made in what total amount of time
(first to last.)
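
As an illustration of creating the time value on the fly, the separate columns can be combined with ISOdatetime() and then used to answer the "how many measures in what total time" question. The `readings` data frame below is invented, seconds are taken as 0, and UTC is assumed:

```r
# Hypothetical rows with the separate date and time columns only.
readings <- data.frame(
  year   = c(1988, 1988, 1988, 1988),
  month  = c(10, 10, 10, 10),
  day    = c(1, 1, 1, 1),
  hour   = c(0, 0, 0, 0),
  minute = c(10, 20, 30, 40)
)

# Build the date-time without storing it as a column.
datetime <- with(readings,
                 ISOdatetime(year, month, day, hour, minute, 0, tz = "UTC"))

# Number of measures and the total time from first to last.
n_measures <- length(datetime)
total_span <- difftime(max(datetime), min(datetime), units = "mins")

print(n_measures)
print(total_span)
```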



-Original Message-
From: R-help  On Behalf Of jim holtman
Sent: Wednesday, December 15, 2021 1:05 PM
To: Rich Shepard 
Cc: R mailing list 
Subject: Re: [R] Changing time intervals in data set

At least show a sample of the data and then what you would like as output.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Wed, Dec 15, 2021 at 6:40 AM Rich Shepard 
wrote:

> A 33-year set of river discharge data at one gauge location has 
> recording intervals of 5, 10, and 30 minutes over the period of record.
>
> The data.frame/tibble has columns for year, month, day, hour, minute, 
> and datetime.
>
> Would difftime() allow me to find the dates when the changes occurred?
>
> TIA,
>
> Rich
>


Re: [R] Changing time intervals in data set

2021-12-15 Thread jim holtman
At least show a sample of the data and then what you would like as output.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Wed, Dec 15, 2021 at 6:40 AM Rich Shepard 
wrote:

> A 33-year set of river discharge data at one gauge location has recording
> intervals of 5, 10, and 30 minutes over the period of record.
>
> The data.frame/tibble has columns for year, month, day, hour, minute, and
> datetime.
>
> Would difftime() allow me to find the dates when the changes occurred?
>
> TIA,
>
> Rich
>



[R] Changing time intervals in data set

2021-12-15 Thread Rich Shepard

A 33-year set of river discharge data at one gauge location has recording
intervals of 5, 10, and 30 minutes over the period of record.

The data.frame/tibble has columns for year, month, day, hour, minute, and
datetime.

Would difftime() allow me to find the dates when the changes occurred?
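
In principle, yes: difftime() on consecutive readings gives the gap sizes, and the dates of interest are where the gap differs from the one before it. A minimal sketch on invented timestamps (UTC assumed, and `datetime` taken to be a sorted POSIXct column):

```r
# Hypothetical readings: 15-minute spacing, then a switch to 5 minutes.
datetime <- as.POSIXct("2012-06-28 05:45", tz = "UTC") +
  60 * c(0, 15, 30, 35, 40, 45)

# Gap between each reading and the one before it.
gap <- difftime(datetime[-1], datetime[-length(datetime)], units = "mins")
gap_min <- as.numeric(gap)

# A change is a gap that differs from the previous gap.
changed <- c(FALSE, gap_min[-1] != gap_min[-length(gap_min)])

# First reading taken at each new interval.
change_times <- datetime[-1][changed]
print(change_times)
```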

TIA,

Rich
