Re: [GENERAL] partial time stamp query

Kirk Wythers Mon, 04 Feb 2013 19:41:40 -0800

On Feb 4, 2013, at 7:03 PM, Misa Simic <misa.si...@gmail.com> wrote:


> Select time2::date, extract('hour' from time2), AVG(avg) from tablename group 
> by time2::date, extract('hour' from time2)

Thanks Misa, 

But this gives the same result as the way I was using date_trunc (not GROUPING 
BY the hour portion of the timestamp, or in this case the re-cast date). I have 
simplified the query, as much as I can, and it is below:

--COPY (
SELECT
        derived_tsoil_fifteen_min_stacked.time2::date,  
        extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
        data_key.plot,
        data_key.variable_name,
        AVG(derived_tsoil_fifteen_min_stacked.value)
FROM
        data_key,
        derived_tsoil_fifteen_min_stacked
WHERE
        data_key.variable_channel = derived_tsoil_fifteen_min_stacked.variable 
AND data_key.block_name = derived_tsoil_fifteen_min_stacked.block_name
        AND data_key.plot = 'a2'
        AND derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
        AND derived_tsoil_fifteen_min_stacked.variable = 'tsoil_avg1_sc'
GROUP BY
        derived_tsoil_fifteen_min_stacked.time2::date, 
        extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
        derived_tsoil_fifteen_min_stacked.time2,
        data_key.variable_name,
        data_key.plot
ORDER BY
        derived_tsoil_fifteen_min_stacked.time2
--) TO '/tmp/derived_tsoil_hourly.csv' WITH CSV HEADER
;

This query returns 96 records (again, one for each 15 minute interval in the 24 
hour day). 

2010-07-07      0       a2      tsoil_sc        21.0599994659424
2010-07-07      0       a2      tsoil_sc        20.9599990844727
2010-07-07      0       a2      tsoil_sc        20.8799991607666
2010-07-07      0       a2      tsoil_sc        20.7999992370605
2010-07-07      1       a2      tsoil_sc        20.7199993133545
2010-07-07      1       a2      tsoil_sc        20.6399993896484
2010-07-07      1       a2      tsoil_sc        20.5499992370605
2010-07-07      1       a2      tsoil_sc        20.4699993133545
2010-07-07      2       a2      tsoil_sc        20.3899993896484
2010-07-07      2       a2      tsoil_sc        20.3099994659424
2010-07-07      2       a2      tsoil_sc        20.25
2010-07-07      2       a2      tsoil_sc        20.1599998474121
2010-07-07      3       a2      tsoil_sc        20.1000003814697
2010-07-07      3       a2      tsoil_sc        20.0100002288818
2010-07-07      3       a2      tsoil_sc        19.9400005340576
2010-07-07      3       a2      tsoil_sc        19.8600006103516
2010-07-07      4       a2      tsoil_sc        19.7700004577637
2010-07-07      4       a2      tsoil_sc        19.7199993133545
2010-07-07      4       a2      tsoil_sc        19.6499996185303
2010-07-07      4       a2      tsoil_sc        19.5599994659424
etc….

Could there be anything in the JOIN part of this query that is causing 
problems? I'm really grasping at straws now!

Thanks again,

Kirk


> 
> 
> On Monday, February 4, 2013, Kirk Wythers wrote:
> Hi Brent,
> 
> Nice to hear from you. I hope your world is good.
> 
> On Feb 4, 2013, at 2:14 PM, Brent Wood <brent.w...@niwa.co.nz> wrote:
> 
> > Hi Kirk,
> >
> > We have a (near) real time data database for instrument observations from 
> > our research vessels. All observations (summarised to one minute intervals 
> > - the actual raw data is in netCDF, this database makes for easier access & 
> > meets most users needs) go into a single table, with other tables for 
> > metadata about platforms, instruments, etc. Now approaching 350m records, 
> > so reasonably substantial.
> >
> > Underlying optimisations include
> >
> > partitioned readings table, with a separate partition for each year (now 23 
> > years)
> > clustered index on timestamp for the previous years partitions.
> > largeish filesystem block size - tested to work well with the clustered 
> > index & small size records)
> >
> > These generally make a big difference to performance. To address one issue, 
> > much like yours, where some users want hourly data for a year, some want 
> > daily data for 10 years & some want 1 minute data for the last month (& 
> > some, no doubt, want one minute data for 20+ years!) I introduced an 
> > integer column called timer. This value is set according to the time (not 
> > date) of each record.
> 
> Very similar to what I need to do. Our main table consists of records that 
> have been standardized to 15 minute timestamps. Here is a simplified example
> 
> record  timestamp               variable                value
> 1               12:00:00                temp            12.6
> 2               12:15:00                temp            12.3
> 3               12:30:00                temp            11.7
> 4               12:45:00                temp            12.3
> 5               13:00:00                temp            13.9
> 6               13:15:00                temp            12.5
> 7               13.30:00                temp            13.7
> 8               13:45:00                temp            12.0
> 
> You are exactly right, some people will want the original 15 minute version, 
> some people will want these summarized to hourly data, and others will want 
> these summarized to daily data. Still others may be satisfied with monthly 
> summaries.
> 
> >
> > Along the lines of (from memory) :an even no of minutes after the hour is 
> > 2, 5 minutes is 4, 10 minutes is 8, 15 minute is 16, 30 minutes is 32, 60 
> > minutes is 64, 6 hourly is 128, 12:00 AM is 256 & 12:00PM is 512.   When 
> > any timestamp is in more than one category (eg: 12:00 is all of even, 5, 
> > 15m 30m 60 minutes), the timer value is set to the largest appropriate one.
> 
> I'm not quite following. In my case, if I want hourly data, I'd be looking 
> for…
> 
> record  timestamp               variable                value
> 1               12:00:00                temp            12.225
> 2               13:00:00                temp            13.025
> 
> Are you saying that I could use an approach that WHILE statement? Something 
> like:
> 
> WHILE data_truc('hour', timestamp) = 12:00:00, then calulate AVG(value)?
> 
> >
> > So a request for:
> > 1 minute data is select from table;
> > 2 minute data is select from table where timer >=2 and timer !=15 and timer 
> > !=4;
> > hourly data is select from table where timer >=64 and timer != 15 and timer 
> > != 4;
> > etc
> >
> > 5 & 15 minute add a bit of complexity, but we gave the users what they 
> > wanted. This has worked well for years now, & we have an internal web 
> > (mapserver/openlayers based) application allowing users to visualise & 
> > download their selected data - they choose from an interval pick list & the 
> > SQL is hidden. Some extra enhancements are the automatic collation of lat & 
> > lon gps readings into a Postgis point for each reading record, & the 
> > automatic aggregation of daily points into daily track lines, so the track 
> > for any selected set of dates can easily be displayed on a map (the 
> > platforms are mobile vessels - not fixed sites)
> >
> > You might adapt some of these ideas for your use case?
> >
> > Cheers
> >
> > Brent Wood
> >
> > Programme leader: Environmental Information Delivery
> > NIWA
> > DDI:  +64 (4) 3860529
> > ________________________________________
> > From: pgsql-general-ow...@postgresql.org 
> > [pgsql-general-ow...@postgresql.org] on behalf of Kirk Wythers 
> > [wythe...@umn.edu]
> > Sent: Tuesday, February 05, 2013 5:58 AM
> > To: pgsql-general@postgresql.org
> > Subject: Fwd: [GENERAL] partial time stamp query
> >
> > Thanks. That worked great! Now I am trying to aggregate these same fifteen 
> > minute to hourly. I have tried using date_trunk:
> >
> > date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
> >
> > but date_truck only seems to aggriage the timestamp. I thought I could use
> >
> > AVG(derived_tsoil_fifteen_min_stacked.value)
> >
> > in combination with date_trunk, but I still get 15 minute values, not the 
> > hourly average from the four 15 minute records.
> >
> > rowid date_truck time2 site canopy plot variable name value avg
> > 2010-07-07_00:00:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:00:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 21.06 21.0599994659424
> > 2010-07-07_00:15:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:15:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.96 20.9599990844727
> > 2010-07-07_00:30:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:30:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.88 20.8799991607666
> > 2010-07-07_00:45:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:45:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.8 20.7999992370605
> > 2010-07-07_01:00:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:00:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.72 20.7199993133545
> > 2010-07-07_01:15:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:15:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.64 20.6399993896484
> > 2010-07-07_01:30:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:30:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.55 20.5499992370605
> > 2010-07-07_01:45:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:45:00 cfc 
> > closed a2 tsoil_sc tsoil_avg1_sc 20.47 20.4699993133545
> >
> > I was tying to get two records out of this set, with the 'avg" column 
> > representing the mean of the first and last four of each 15 minute records.
> >
> > Perhaps date_trunk only works for the timestamp?
> >
> >
> >
> > On Feb 4, 2013, at 8:50 AM, Misa Simic 
> > <misa.si...@gmail.com<mailto:misa.si...@gmail.com>> wrote:
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
> >
> > On Monday, February 4, 2013, Kirk Wythers wrote:
> > I am trying to write a query that grabs one particular day from a timestamp 
> > column. The data are ordered in 15 minute chunks like this:
> >
> > 2010-07-07 12:45:00
> > 2010-07-07 13:00:00
> > 2010-07-07 13:15:00
> > 2010-07-07 13:30:00
> > etc…
> >
> > If I wanted all records from july 7th 2010, I would expect 4 x 24 = 96 
> > records per day.
> >
> > I have tried the '=' operator, like this
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2 = '2010-07-07*'
> >
> > but that grabs nothing, and using the '~' operator grabs everything with a 
> > 2010 or 07 in it… in other word

Re: [GENERAL] partial time stamp query

Reply via email to