[GENERAL] How to INSERT INTO one table from another table, WHERE

2013-05-03 Thread Kirk Wythers
I am trying to insert data from 2 columns in tableB (colX and colY) into the 
same two columns of tableA, with a join-like WHERE clause. Is this possible? 

For example:

INSERT INTO tableA (colX, colY) 
(SELECT colX, colY
FROM tableB
WHERE
tableA.blockname = tableB.block_name
AND tableA.timestamp = tableB.timestamp)
;
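
If the aim is to fill colX and colY on rows that already exist in tableA
(matched on block name and timestamp), an INSERT won't do it, and the
sub-select above can't reference tableA anyway, since tableA isn't in its FROM
list. An UPDATE ... FROM is the usual tool for that; a sketch reusing the
column names from the example above:

UPDATE tableA
SET colX = tableB.colX,
    colY = tableB.colY
FROM tableB
WHERE tableA.blockname = tableB.block_name
  AND tableA.timestamp = tableB.timestamp;

If the goal really is to add new rows to tableA, a plain
INSERT INTO tableA (colX, colY) SELECT colX, colY FROM tableB works, with any
filtering done against tableB columns only.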






Re: [GENERAL] run COPY as user other than postgres

2013-04-24 Thread Kirk Wythers

On Apr 24, 2013, at 6:14 AM, Bill Moran  wrote:
>>> 
> 
> Write your own client that uses the copy interface to
> load a file from wherever and send it to the server.
> 
> Or just use the one built in to psql, as Jasen suggested.
> 

I am using "copy to" to write data from the db out to csv files. The psql 
suggestion of using \copy worked like a charm. 







[GENERAL] run COPY as user other than postgres

2013-04-23 Thread Kirk Wythers
I would like to run the COPY command as a user other than "postgres". I find it 
a bit of a pain (or at least requiring an extra step or two) to have the 
postgres user own the files that I am creating with COPY TO. Here is a simple 
example where the location '/some/path/to/file/file.csv' is owned by another 
user and it would be very spiffy if I could run the COPY TO as that user. Any 
ideas? 



COPY (
SELECT * FROM
some_table
WHERE
2012 = EXTRACT (YEAR FROM some_column) --AND value IS NOT NULL
) 
TO '/some/path/to/file/file.csv' WITH CSV HEADER;
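
One way around the ownership issue, as a sketch: psql's client-side \copy runs
the same query but writes the file as whatever OS user invoked psql, so the
result is not owned by postgres. The whole \copy command has to sit on a
single line:

\copy (SELECT * FROM some_table WHERE 2012 = EXTRACT(YEAR FROM some_column)) TO '/some/path/to/file/file.csv' WITH CSV HEADER

Server-side COPY always runs as the postgres server process, which is why the
file ends up owned by it.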



Re: [GENERAL] pulling year out of a timestamp

2013-04-11 Thread Kirk Wythers

On Apr 11, 2013, at 10:55 AM, Ryan Kelly  wrote:

> You want date_part, not date_trunc.
> 
> -Ryan

Thanks Ryan. It looks like " EXTRACT(YEAR FROM table2.time2)" works as well.

[GENERAL] pulling year out of a timestamp

2013-04-11 Thread Kirk Wythers
I am trying to perform a join between two tables where I need to join "year" in 
table 1 with the year component of a timestamp in table 2.

Something like this:

table1.year = table2.timestamp 
where timestamp has the format:  "2009-01-01 00:00:00"

I've tried 

date_trunc('year', table2.timestamp) = table1.year

but am getting this error:

ERROR:  operator does not exist: timestamp without time zone = integer
LINE 15:  AND date_trunc('year', _60min_user.time2) = power.year
^
HINT:  No operator matches the given name and argument type(s). You might need 
to add explicit type casts.

Am I barking up the completely wrong tree?
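
For reference, the error is only because date_trunc returns a timestamp, which
can't be compared to the integer year. EXTRACT (and date_part) return a
number, so a join condition along these lines avoids the type mismatch (a
sketch using the table names from the description above):

SELECT table1.*, table2.*
FROM table1
JOIN table2
  ON table1.year = EXTRACT(YEAR FROM table2."timestamp");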






Re: [GENERAL] big un stacking query - help save me from myself

2013-03-15 Thread Kirk Wythers

On Mar 14, 2013, at 10:27 AM, Kevin Grittner  wrote:
>> 
> 
> I didn't understand your description of what you are trying to do,
> and the example has so many columns and cases that it would take a
> long time to understand it.  Can you distill this down to just a
> few columns and cases so that it is easier to understand what you
> are trying to accomplish?  Even better would be a self-contained
> test case with just a few rows so people can see "before" and
> "after" data.  What you have already posted will help give context
> on how it needs to scale, which is important, too; but if you make
> the issue easier to understand, the odds improve that someone will
> volunteer the time needed to make a suggestion.


Here is a simplified version of the query approach I am attempting. First of 
all, this query works fine with a limited number of columns. There are some 
columns that I am leaving alone (those are the var1, var2, var3 variables) and a 
limited number of variables that I am trying to "unstack" (those are unstack1, 
unstack2, unstack3…. variables). 

The problem is that the real table I am working with is very large. There 
are 30 plus var1, var2… columns I am not unstacking, and 30 plus variables 
(unstack1, unstack2…) that I am unstacking, from a 25 million row table.  

I have looked at the tablefunc approach, and I am wondering if it is any more 
efficient than using the CASE approach I am trying here. I let the full version 
of the below query run for 2 days before killing it when it threatened to fill 
the entire hard drive (250 Gigs). 

CREATE TABLE unstacked_table AS (
SELECT
    var1,
    var2,
    var3,
    MAX(CASE WHEN variable_name = 'unstack1' THEN value END) AS unstack1,
    MAX(CASE WHEN variable_name = 'unstack2' THEN value END) AS unstack2,
    MAX(CASE WHEN variable_name = 'unstack3' THEN value END) AS unstack3,
    MAX(CASE WHEN variable_name = 'unstack4' THEN value END) AS unstack4
FROM
    stacked_table
GROUP BY
    var1,
    var2,
    var3
);
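
For comparison, a sketch of the tablefunc route mentioned above, using the
same simplified column names and assuming the group columns are text (cast
them otherwise) and that value is real. The group columns are concatenated
into a single key because crosstab() starts a new output row whenever the
first column of its source query changes:

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    $$ SELECT var1 || '|' || var2 || '|' || var3 AS grp,
              variable_name,
              value
       FROM stacked_table
       ORDER BY 1, 2 $$,
    $$ VALUES ('unstack1'), ('unstack2'), ('unstack3'), ('unstack4') $$
) AS ct(grp text, unstack1 real, unstack2 real, unstack3 real, unstack4 real);

Whether this beats the MAX(CASE ...) form is not a given; both still have to
group the same 25 million rows.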



[GENERAL] big un stacking query - help save me from myself

2013-03-13 Thread Kirk Wythers
I have a fairly large table (235 million rows) with two columns that I need to 
"de-normalize". There has got to be a better (i.e. faster) approach than what I 
am doing. I am using a MAX(CASE ...) on each of the 24 variables (column names 
"variable" and "value") that I want to unstack. Any suggestions would be most 
appreciated. Here is the EXPLAIN output: 

 GroupAggregate  (cost=107156950.16..174988710.12 rows=23511876 width=286)
   ->  Sort  (cost=107156950.16..107744747.04 rows=235118752 width=286)
 Sort Key: site, canopy, block, plot, measurement_interval, 
warming_treatment, treatment_code, treatment_abbr, water_treatment, 
variable_name, variable_channel, variable_id, rowid, time2, "timestamp", 
block_name, table_name, batt_volt, program, flag1, flag2, flag3, amb_a_avg, 
amb_closed_avg, airtc_avg, airtemp_avg, airtemp_max, airtemp_min, all_avgt, 
am25tref1, ptemp_avg, rh, s_all_avgt_avg, vp_avg, tabove_sdupper, 
tabove_sdlower, tabove_meantrim, tabove_mean_dc, tsoil_sdupper, tsoil_sdlower, 
tsoil_meantrim, tsoil_mean_dc
 ->  Seq Scan on derived_15min_joined  (cost=0.00..11782048.52 
rows=235118752 width=286)
(4 rows)


And here is the table:


b4warmed3=# \d derived_15min_joined
  Table "public.derived_15min_joined"
Column|Type | Modifiers 
--+-+---
 site | character varying(6)| 
 canopy   | character varying(24)   | 
 block| character(2)| 
 plot | character(6)| 
 measurement_interval | interval| 
 warming_treatment| character varying(24)   | 
 treatment_code   | character varying(24)   | 
 treatment_abbr   | character varying(24)   | 
 water_treatment  | character varying(24)   | 
 variable_name| character varying(24)   | 
 variable_channel | character varying(24)   | 
 variable_id  | character varying(24)   | 
 rowid| character varying(48)   | 
 time2| timestamp without time zone | 
 timestamp| timestamp without time zone | 
 block_name   | character varying(8)| 
 table_name   | character varying(10)   | 
 batt_volt| real| 
 program  | character varying(48)   | 
 flag1| integer | 
 flag2| integer | 
 flag3| integer | 
 amb_a_avg| real| 
 amb_closed_avg   | real| 
 airtc_avg| real| 
 airtemp_avg  | real| 
 airtemp_max  | real| 
 airtemp_min  | real| 
 all_avgt | real| 
 am25tref1| real| 
 ptemp_avg| real| 
 rh   | real| 
 s_all_avgt_avg   | real| 
 vp_avg   | real| 
 tabove_sdupper   | real| 
 tabove_sdlower   | real| 
 tabove_meantrim  | real| 
 tabove_mean_dc   | real| 
 tsoil_sdupper| real| 
 tsoil_sdlower| real| 
 tsoil_meantrim   | real| 
 tsoil_mean_dc| real| 
 variable | text| 
 value| real| 
Indexes:
"derived_15min_joined_lower_idx" btree (lower(variable_name::text))
"derived_15min_joined_time2_idx" btree (time2)

b4warmed3=# 

And here is my query. 

SELECT
site,
canopy,
block,
plot,
measurement_interval,
warming_treatment,
treatment_code,
treatment_abbr,
water_treatment,
variable_name,
variable_channel,
variable_id,
rowid,
time2,
timestamp,
block_name,
table_name,
batt_volt,
program,
flag1,
flag2,
flag3,
amb_a_avg,
amb_closed_avg,
airtc_avg,
airtemp_avg,
airtemp_max,
airtemp_min,
all_avgt,
am25tref1,
ptemp_avg,
rh,
s_all_avgt_avg,
vp_avg,
tabove_sdupper,
tabove_sdlower,
tabove_meantrim,
tabove_mean_dc,
tsoil_sdupper,
tsoil_sdlower,
tsoil_meantrim,
tsoil_mean_dc,
MAX(CASE WHEN lower(variable_name) = 'tabove' THEN value END) AS tabove,

Re: [GENERAL] PostgreSQL took over my login

2013-02-20 Thread Kirk Wythers

On Feb 20, 2013, at 5:42 PM, Adrian Klaver  wrote:

> On 02/19/2013 07:04 PM, NiRE wrote:
>> All of a sudden my macbook pro force quit and when I restarted it it is
>> run by user PostgreSQL user - i do not know the password and cannot
>> change it. I cannot find out online how to fix this back. Any help would
>> be much appreciated and also as soon as possible. Thank you.
> 
> Can't you just log out and then log in as you?
>> 
>> Erin
> 

Try booting into "single user" mode or "safe" mode. Single user mode will boot 
the machine as root. You can read up on the different boot modes here:

http://support.apple.com/kb/ht1533




Re: [GENERAL] partial time stamp query

2013-02-04 Thread Kirk Wythers

On Feb 4, 2013, at 7:03 PM, Misa Simic  wrote:

> Select time2::date, extract('hour' from time2), AVG(avg) from tablename group 
> by time2::date, extract('hour' from time2)

Thanks Misa, 

But this gives the same result as the way I was using date_trunc (not GROUPING 
BY the hour portion of the timestamp, or in this case the re-cast date). I have 
simplified the query, as much as I can, and it is below:

--COPY (
SELECT
derived_tsoil_fifteen_min_stacked.time2::date,  
extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
data_key.plot,
data_key.variable_name,
AVG(derived_tsoil_fifteen_min_stacked.value)
FROM
data_key,
derived_tsoil_fifteen_min_stacked
WHERE
data_key.variable_channel = derived_tsoil_fifteen_min_stacked.variable 
AND data_key.block_name = derived_tsoil_fifteen_min_stacked.block_name
AND data_key.plot = 'a2'
AND derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
AND derived_tsoil_fifteen_min_stacked.variable = 'tsoil_avg1_sc'
GROUP BY
derived_tsoil_fifteen_min_stacked.time2::date, 
extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
derived_tsoil_fifteen_min_stacked.time2,
data_key.variable_name,
data_key.plot
ORDER BY
derived_tsoil_fifteen_min_stacked.time2
--) TO '/tmp/derived_tsoil_hourly.csv' WITH CSV HEADER
;

This query returns 96 records (again, one for each 15 minute interval in the 24 
hour day). 

2010-07-07  0  a2  tsoil_sc  21.054659424
2010-07-07  0  a2  tsoil_sc  20.950844727
2010-07-07  0  a2  tsoil_sc  20.871607666
2010-07-07  0  a2  tsoil_sc  20.792370605
2010-07-07  1  a2  tsoil_sc  20.713133545
2010-07-07  1  a2  tsoil_sc  20.633896484
2010-07-07  1  a2  tsoil_sc  20.542370605
2010-07-07  1  a2  tsoil_sc  20.463133545
2010-07-07  2  a2  tsoil_sc  20.383896484
2010-07-07  2  a2  tsoil_sc  20.304659424
2010-07-07  2  a2  tsoil_sc  20.25
2010-07-07  2  a2  tsoil_sc  20.158474121
2010-07-07  3  a2  tsoil_sc  20.103814697
2010-07-07  3  a2  tsoil_sc  20.012288818
2010-07-07  3  a2  tsoil_sc  19.945340576
2010-07-07  3  a2  tsoil_sc  19.866103516
2010-07-07  4  a2  tsoil_sc  19.774577637
2010-07-07  4  a2  tsoil_sc  19.713133545
2010-07-07  4  a2  tsoil_sc  19.646185303
2010-07-07  4  a2  tsoil_sc  19.554659424
etc….
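
For what it's worth, the likely culprit is the bare
derived_tsoil_fifteen_min_stacked.time2 in the GROUP BY: grouping by the raw
15-minute timestamp as well as its date and hour keeps every 15-minute row in
its own group. A sketch of the same query with it removed:

SELECT
    derived_tsoil_fifteen_min_stacked.time2::date,
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    data_key.plot,
    data_key.variable_name,
    AVG(derived_tsoil_fifteen_min_stacked.value)
FROM
    data_key,
    derived_tsoil_fifteen_min_stacked
WHERE
    data_key.variable_channel = derived_tsoil_fifteen_min_stacked.variable
    AND data_key.block_name = derived_tsoil_fifteen_min_stacked.block_name
    AND data_key.plot = 'a2'
    AND derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
    AND derived_tsoil_fifteen_min_stacked.variable = 'tsoil_avg1_sc'
GROUP BY
    derived_tsoil_fifteen_min_stacked.time2::date,
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    data_key.variable_name,
    data_key.plot
ORDER BY
    1, 2;

With only the date, hour, plot, and variable_name in the GROUP BY, the four
15-minute rows in each hour collapse into one averaged row, 24 rows for the day.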

Could there be anything in the JOIN part of this query that is causing 
problems? I'm really grasping at straws now!

Thanks again,

Kirk


> 
> 
> On Monday, February 4, 2013, Kirk Wythers wrote:
> Hi Brent,
> 
> Nice to hear from you. I hope your world is good.
> 
> On Feb 4, 2013, at 2:14 PM, Brent Wood  wrote:
> 
> > Hi Kirk,
> >
> > We have a (near) real time data database for instrument observations from 
> > our research vessels. All observations (summarised to one minute intervals 
> > - the actual raw data is in netCDF, this database makes for easier access & 
> > meets most users needs) go into a single table, with other tables for 
> > metadata about platforms, instruments, etc. Now approaching 350m records, 
> > so reasonably substantial.
> >
> > Underlying optimisations include
> >
> > partitioned readings table, with a separate partition for each year (now 23 
> > years)
> > clustered index on timestamp for the previous years partitions.
> > largeish filesystem block size - tested to work well with the clustered 
> > index & small size records)
> >
> > These generally make a big difference to performance. To address one issue, 
> > much like yours, where some users want hourly data for a year, some want 
> > daily data for 10 years & some want 1 minute data for the last month (& 
> > some, no doubt, want one minute data for 20+ years!) I introduced an 
> > integer column called timer. This value is set according to the time (not 
> > date) of each record.
> 
> Very similar to what I need to do. Our main table consists of records that 
> have been standardized to 15 minute timestamps. Here is a simplified example
> 
> record  timestamp   variablevalue
> 1   12:00:00temp12.6
> 2   12:15:00temp12.3
> 3   12:30:00

Re: [GENERAL] date_trunc to aggregate values?

2013-02-04 Thread Kirk Wythers
On Feb 4, 2013, at 3:26 PM, Jason Dusek  wrote:

> 2013/2/4 Kirk Wythers :
>> I am looking for suggestions on aggregation techniques using a timestamp 
>> column. In my case I have tried:
>> 
>> date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
>> 
>> but date_truck only seems to aggregate the timestamp. I thought I could use
>> 
>> AVG(derived_tsoil_fifteen_min_stacked.value)
>> 
>> in combination with date_trunk, but I still get 15 minute values, not the 
>> hourly average from the four 15 minute records.
>> 
>> rowid   date_truck   
>>time2   sitecanopy  plot
>> variablenamevalue   avg
>> 2010-07-07_00:00:00_b4warm_a2010-07-07 00:00:00 1   2010-07-07 
>> 00:00:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   21.06   
>> 21.054659424
>> 2010-07-07_00:15:00_b4warm_a2010-07-07 00:00:00 1   2010-07-07 
>> 00:15:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.96   
>> 20.950844727
>> 2010-07-07_00:30:00_b4warm_a2010-07-07 00:00:00 1   2010-07-07 
>> 00:30:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.88   
>> 20.871607666
>> 2010-07-07_00:45:00_b4warm_a2010-07-07 00:00:00 1   2010-07-07 
>> 00:45:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.8
>> 20.792370605
>> 2010-07-07_01:00:00_b4warm_a2010-07-07 01:00:00 1   2010-07-07 
>> 01:00:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.72   
>> 20.713133545
>> 2010-07-07_01:15:00_b4warm_a2010-07-07 01:00:00 1   2010-07-07 
>> 01:15:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.64   
>> 20.633896484
>> 2010-07-07_01:30:00_b4warm_a2010-07-07 01:00:00 1   2010-07-07 
>> 01:30:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.55   
>> 20.542370605
>> 2010-07-07_01:45:00_b4warm_a2010-07-07 01:00:00 1   2010-07-07 
>> 01:45:00 cfc closed  a2  tsoil_sctsoil_avg1_sc   20.47   
>> 20.463133545
>> 
>> I was tying to get two records out of this set, with the 'avg" column 
>> representing the mean of the first and last four of each 15 minute records.
>> 
>> Suggestions?
> 
> Are you using an explicit GROUP BY?
> 

Here is what I have in the GROUP BY clause

GROUP BY
date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
derived_tsoil_fifteen_min_stacked.time2,
data_key.site,
data_key.canopy,
data_key.variable_name,
data_key.plot





Re: [GENERAL] partial time stamp query

2013-02-04 Thread Kirk Wythers
Hi Brent, 

Nice to hear from you. I hope your world is good. 

On Feb 4, 2013, at 2:14 PM, Brent Wood  wrote:

> Hi Kirk,
> 
> We have a (near) real time data database for instrument observations from our 
> research vessels. All observations (summarised to one minute intervals - the 
> actual raw data is in netCDF, this database makes for easier access & meets 
> most users needs) go into a single table, with other tables for metadata 
> about platforms, instruments, etc. Now approaching 350m records, so 
> reasonably substantial.
> 
> Underlying optimisations include
> 
> partitioned readings table, with a separate partition for each year (now 23 
> years)
> clustered index on timestamp for the previous years partitions.
> largeish filesystem block size - tested to work well with the clustered index 
> & small size records)
> 
> These generally make a big difference to performance. To address one issue, 
> much like yours, where some users want hourly data for a year, some want 
> daily data for 10 years & some want 1 minute data for the last month (& some, 
> no doubt, want one minute data for 20+ years!) I introduced an integer column 
> called timer. This value is set according to the time (not date) of each 
> record.

Very similar to what I need to do. Our main table consists of records that have 
been standardized to 15 minute timestamps. Here is a simplified example

record  timestamp   variablevalue
1   12:00:00temp12.6
2   12:15:00temp12.3
3   12:30:00temp11.7
4   12:45:00temp12.3
5   13:00:00temp13.9
6   13:15:00temp12.5
7   13:30:00temp13.7
8   13:45:00temp12.0

You are exactly right, some people will want the original 15 minute version, 
some people will want these summarized to hourly data, and others will want 
these summarized to daily data. Still others may be satisfied with monthly 
summaries. 

> 
> Along the lines of (from memory) :an even no of minutes after the hour is 2, 
> 5 minutes is 4, 10 minutes is 8, 15 minute is 16, 30 minutes is 32, 60 
> minutes is 64, 6 hourly is 128, 12:00 AM is 256 & 12:00PM is 512.   When any 
> timestamp is in more than one category (eg: 12:00 is all of even, 5, 15m 30m 
> 60 minutes), the timer value is set to the largest appropriate one.

I'm not quite following. In my case, if I want hourly data, I'd be looking for…

record  timestamp   variablevalue
1   12:00:00temp12.225
2   13:00:00temp13.025

Are you saying that I could use an approach with a WHILE statement? Something 
like:

WHILE date_trunc('hour', timestamp) = 12:00:00, then calculate AVG(value)?

> 
> So a request for:
> 1 minute data is select from table;
> 2 minute data is select from table where timer >=2 and timer !=15 and timer 
> !=4;
> hourly data is select from table where timer >=64 and timer != 15 and timer 
> != 4;
> etc
> 
> 5 & 15 minute add a bit of complexity, but we gave the users what they 
> wanted. This has worked well for years now, & we have an internal web 
> (mapserver/openlayers based) application allowing users to visualise & 
> download their selected data - they choose from an interval pick list & the 
> SQL is hidden. Some extra enhancements are the automatic collation of lat & 
> lon gps readings into a Postgis point for each reading record, & the 
> automatic aggregation of daily points into daily track lines, so the track 
> for any selected set of dates can easily be displayed on a map (the platforms 
> are mobile vessels - not fixed sites)
> 
> You might adapt some of these ideas for your use case?
> 
> Cheers
> 
> Brent Wood
> 
> Programme leader: Environmental Information Delivery
> NIWA
> DDI:  +64 (4) 3860529
> 
> From: pgsql-general-ow...@postgresql.org [pgsql-general-ow...@postgresql.org] 
> on behalf of Kirk Wythers [wythe...@umn.edu]
> Sent: Tuesday, February 05, 2013 5:58 AM
> To: pgsql-general@postgresql.org
> Subject: Fwd: [GENERAL] partial time stamp query
> 
> Thanks. That worked great! Now I am trying to aggregate these same fifteen 
> minute to hourly. I have tried using date_trunk:
> 
> date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
> 
> but date_truck only seems to aggriage the timestamp. I thought I could use
> 
> AVG(derived_tsoil_fifteen_min_stacked

[GENERAL] date_trunc to aggregate values?

2013-02-04 Thread Kirk Wythers
I am looking for suggestions on aggregation techniques using a timestamp 
column. In my case I have tried:

date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),

but date_trunc only seems to aggregate the timestamp. I thought I could use 

AVG(derived_tsoil_fifteen_min_stacked.value)

in combination with date_trunc, but I still get 15 minute values, not the 
hourly average from the four 15 minute records.

rowid                         date_trunc           time2                sitecanopyplotvariable  name           value  avg
2010-07-07_00:00:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  21.06  21.054659424
2010-07-07_00:15:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.96  20.950844727
2010-07-07_00:30:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.88  20.871607666
2010-07-07_00:45:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.8   20.792370605
2010-07-07_01:00:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.72  20.713133545
2010-07-07_01:15:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.64  20.633896484
2010-07-07_01:30:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.55  20.542370605
2010-07-07_01:45:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.47  20.463133545

I was trying to get two records out of this set, with the "avg" column 
representing the mean of the first four and the last four 15 minute records. 

Suggestions?





Fwd: [GENERAL] partial time stamp query

2013-02-04 Thread Kirk Wythers
Thanks. That worked great! Now I am trying to aggregate these same 
fifteen-minute records to hourly. I have tried using date_trunc:

date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),

but date_trunc only seems to aggregate the timestamp. I thought I could use 

AVG(derived_tsoil_fifteen_min_stacked.value)

in combination with date_trunc, but I still get 15 minute values, not the 
hourly average from the four 15 minute records.

rowid                         date_trunc           time2                sitecanopyplotvariable  name           value  avg
2010-07-07_00:00:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  21.06  21.054659424
2010-07-07_00:15:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.96  20.950844727
2010-07-07_00:30:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.88  20.871607666
2010-07-07_00:45:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.8   20.792370605
2010-07-07_01:00:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.72  20.713133545
2010-07-07_01:15:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.64  20.633896484
2010-07-07_01:30:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.55  20.542370605
2010-07-07_01:45:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.47  20.463133545

I was trying to get two records out of this set, with the "avg" column 
representing the mean of the first four and the last four 15 minute records. 

Perhaps date_trunc only works on the timestamp itself? 



On Feb 4, 2013, at 8:50 AM, Misa Simic  wrote:

> WHERE derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
> 
> On Monday, February 4, 2013, Kirk Wythers wrote:
> I am trying to write a query that grabs one particular day from a timestamp 
> column. The data are ordered in 15 minute chunks like this:
> 
> 2010-07-07 12:45:00
> 2010-07-07 13:00:00
> 2010-07-07 13:15:00
> 2010-07-07 13:30:00
> etc…
> 
> If I wanted all records from july 7th 2010, I would expect 4 x 24 = 96 
> records per day.
> 
> I have tried the '=' operator, like this
> 
> WHERE derived_tsoil_fifteen_min_stacked.time2 = '2010-07-07*'
> 
> but that grabs nothing, and using the '~' operator grabs everything with a 
> 2010 or 07 in it… in other words all days from July of 2010.
> 
> Any suggestions would be much appreciated.
> 





[GENERAL] partial time stamp query

2013-02-04 Thread Kirk Wythers
I am trying to write a query that grabs one particular day from a timestamp 
column. The data are ordered in 15 minute chunks like this:

2010-07-07 12:45:00
2010-07-07 13:00:00
2010-07-07 13:15:00
2010-07-07 13:30:00
etc…

If I wanted all records from July 7th, 2010, I would expect 4 x 24 = 96 records 
per day.

I have tried the '=' operator, like this 

WHERE derived_tsoil_fifteen_min_stacked.time2 = '2010-07-07*'

but that grabs nothing, and using the '~' operator grabs everything with a 2010 
or 07 in it… in other words all days from July of 2010. 

Any suggestions would be much appreciated. 





[GENERAL] date_trunc to aggregate by timestamp?

2013-01-24 Thread Kirk Wythers
I am trying to come up with an approach that uses date_trunc to aggregate 15 
minute time series data to hourly bins. My current query, which utilizes a view, 
performs a join, after which I use a series of WHERE statements to specify 
which of the 15 minute records I want to look at. 

I think what I need to do is add a date_trunc call to this query, which would 
aggregate the 15 minute records to hourly means by plot. In other words, each 
of the bolded records listed below (the four records from plot e2 with a 
timestamp from hour 15) would get averaged to a single record. 

I can see in the docs that date_trunc can be used in a SELECT to truncate a 
timestamp to the specified time unit. 

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00
However, I am unclear as to how I need to implement this in my query in such a 
way that the aggregation averages "value" by "plot". Any suggestions would be 
appreciated. 


SELECT
data_key.site,
data_key.canopy,
data_key.interval,
data_key.treatment_code,
data_key.treatment_abbr,
data_key.plot, 
fifteen_min_stacked_proper.*
FROM
data_key,
fifteen_min_stacked_proper
WHERE
data_key.variable_channel = fifteen_min_stacked_proper.variable AND 
data_key.block_name = fifteen_min_stacked_proper.block_name
AND 2012 = EXTRACT(YEAR FROM time2)
--AND fifteen_min_stacked_proper.block ~ 'b4warm_[ace]'
AND fifteen_min_stacked_proper.value IS NOT NULL
AND fifteen_min_stacked_proper.variable ~ 'scld'


Here is a snip of the query output: 

sitecanopy  intervalplotrowid   
time2   block   variable
value
cfc open0:00:15 e2  2009-03-19_15:00:00_b4warm_e3/19/09 15:00   
b4warm_escldout_avg10
cfc open0:00:15 e8  2009-03-19_15:00:00_b4warm_e3/19/09 15:00   
b4warm_escldout_avg20
cfc open0:00:15 e1  2009-03-19_15:00:00_b4warm_e3/19/09 15:00   
b4warm_escldout_avg30
cfc open0:00:15 e5  2009-03-19_15:00:00_b4warm_e3/19/09 15:00   
b4warm_escldout_avg40
cfc open0:00:15 e2  2009-03-19_15:15:00_b4warm_e3/19/09 15:15   
b4warm_escldout_avg10
cfc open0:00:15 e8  2009-03-19_15:15:00_b4warm_e3/19/09 15:15   
b4warm_escldout_avg20
cfc open0:00:15 e1  2009-03-19_15:15:00_b4warm_e3/19/09 15:15   
b4warm_escldout_avg30
cfc open0:00:15 e5  2009-03-19_15:15:00_b4warm_e3/19/09 15:15   
b4warm_escldout_avg40
cfc open0:00:15 e2  2009-03-19_15:30:00_b4warm_e3/19/09 15:30   
b4warm_escldout_avg17999
cfc open0:00:15 e8  2009-03-19_15:30:00_b4warm_e3/19/09 15:30   
b4warm_escldout_avg27999
cfc open0:00:15 e1  2009-03-19_15:30:00_b4warm_e3/19/09 15:30   
b4warm_escldout_avg33579
cfc open0:00:15 e5  2009-03-19_15:30:00_b4warm_e3/19/09 15:30   
b4warm_escldout_avg43579
cfc open0:00:15 e2  2009-03-19_15:45:00_b4warm_e3/19/09 15:45   
b4warm_escldout_avg17999
cfc open0:00:15 e8  2009-03-19_15:45:00_b4warm_e3/19/09 15:45   
b4warm_escldout_avg27999
cfc open0:00:15 e1  2009-03-19_15:45:00_b4warm_e3/19/09 15:45   
b4warm_escldout_avg34000
cfc open0:00:15 e5  2009-03-19_15:45:00_b4warm_e3/19/09 15:45   
b4warm_escldout_avg44000
cfc open0:00:15 e2  2009-03-19_16:00:00_b4warm_e3/19/09 16:00   
b4warm_escldout_avg17999
cfc open0:00:15 e8  2009-03-19_16:00:00_b4warm_e3/19/09 16:00   
b4warm_escldout_avg27999
cfc open0:00:15 e1  2009-03-19_16:00:00_b4warm_e3/19/09 16:00   
b4warm_escldout_avg34000
cfc open0:00:15 e5  2009-03-19_16:00:00_b4warm_e3/19/09 16:00   
b4warm_escldout_avg44000
cfc open0:00:15 e2  2009-03-19_16:15:00_b4warm_e3/19/09 16:15   
b4warm_escldout_avg17999
cfc open0:00:15 e8  2009-03-19_16:15:00_b4warm_e3/19/09 16:15   
b4warm_escldout_avg27999
cfc open0:00:15 e1  2009-03-19_16:15:00_b4warm_e3/19/09 16:15   
b4warm_escldout_avg34000
cfc open0:00:15 e5  2009-03-19_16:15:00_b4warm_e3/19/09 16:15   
b4warm_escldout_avg44000
cfc open0:00:15 e2  2009-03-19_16:30:00_b4warm_e3/19/09 16:30   
b4warm_escldout_avg17999
cfc open0:00:15 e8  2009-03-19_16:30:00_b4warm_e3/19/09 16:30   
b4warm_escldout_avg27999
cfc open0:00:15 e1  2009-03-19_16:30:00_b4warm_e3/19/09 16:30   
b4warm_escldout_avg34000
cfc open0:00:15 e5  2009-03-19_16:30:00_b4warm_e3/19/
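
A sketch of one way to fold date_trunc into the query above so that value is
averaged to hourly bins per plot: the truncated timestamp goes in both the
select list and the GROUP BY, and only grouped or aggregated columns stay in
the select list (any other data_key descriptors wanted in the output would
need to be added to the GROUP BY as well):

SELECT
    data_key.plot,
    date_trunc('hour', fifteen_min_stacked_proper.time2) AS hour,
    AVG(fifteen_min_stacked_proper.value) AS value
FROM
    data_key,
    fifteen_min_stacked_proper
WHERE
    data_key.variable_channel = fifteen_min_stacked_proper.variable
    AND data_key.block_name = fifteen_min_stacked_proper.block_name
    AND 2012 = EXTRACT(YEAR FROM time2)
    AND fifteen_min_stacked_proper.value IS NOT NULL
    AND fifteen_min_stacked_proper.variable ~ 'scld'
GROUP BY
    data_key.plot,
    date_trunc('hour', fifteen_min_stacked_proper.time2)
ORDER BY
    1, 2;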

Re: [GENERAL] speeding up a join query that utilizes a view

2013-01-18 Thread Kirk Wythers

On Jan 18, 2013, at 10:05 AM, Igor Neyman  wrote:

> Kirk,
> 
> Are you doing un-pivoting in most of your queries?
> Did you try normalized design for fifteen_minute table?
> Is there specific reason for de-normalization?
> 
> Regards,
> Igor Neyman

Thanks Igor. The only reason I'm de-normalizing with unnest is so I can 
perform a join on variable_name with the table "data_key". It's kind of a crazy 
design, but it is what I was given to work with. Here is the join that takes so 
dang long to perform:

SELECT
data_key.site,
data_key.canopy,
data_key.measurement_interval,
data_key.treatment_code,
data_key.treatment_abbr,
data_key.plot, 
fifteen_min_stacked_propper.*
FROM
data_key,
fifteen_min_stacked_propper
WHERE
data_key.variable_channel = fifteen_min_stacked_propper.variable AND 
data_key.block_name = fifteen_min_stacked_propper.block_name
--AND 2012 = EXTRACT(YEAR FROM time2)
--AND fifteen_min_stacked_propper.block_name ~ 'b4warm_[ace]'
--AND fifteen_min_stacked_propper.value IS NOT NULL
AND fifteen_min_stacked_propper.variable ~ 'tsoil'

The whole point of the de-normalized table "fifteen_min_stacked_propper" is so 
that variable names in fifteen_min_stacked_propper.variable can be used to join 
on data_key.variable_channel.

Does that make sense? 

Kirk


> 
>> -Original Message-
>> From: Kirk Wythers [mailto:kwyth...@umn.edu]
>> Sent: Friday, January 18, 2013 10:50 AM
>> To: Igor Neyman
>> Cc: Kirk Wythers; pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] speeding up a join query that utilizes a view
>> 
>> 
>> On Jan 18, 2013, at 8:10 AM, Igor Neyman 
>> wrote:
>> 
>>> Yes, my mistake, I meant to ask about fifteen_min_stacked_view
>> definition, and Postgres parameters from postgresql.conf configuration
>> file, at least those - modified from default setting and related to
>> "resource consumption" and "query tuning".
>>> 
>>> Regards,
>>> Igor Neyman
>> 
>> Here some extra bits form the postgresql.conf file.  As you can see, I
>> have not changed much from the default settings.
>> 
>> #--
>> 
>> # RESOURCE USAGE (except WAL)
>> #--
>> 
>> 
>> # - Memory -
>> 
>> shared_buffers = 3GB # 7GB  # min 128kB
>># (change requires restart)
>> temp_buffers = 80MB # 8MB   # min 800kB
>> #max_prepared_transactions = 0  # zero disables the feature
>># (change requires restart) #
>> Note:  Increasing max_prepared_transactions costs ~600 bytes of shared
>> memory # per transaction slot, plus lock space (see
>> max_locks_per_transaction).
>> # It is not advisable to set max_prepared_transactions nonzero unless
>> you # actively intend to use prepared transactions.
>> work_mem = 64MB #8MB# min 64kB
>> maintenance_work_mem = 128MB# min 1MB
>> #max_stack_depth = 2MB  # min 100kB
>> 
>> # - Kernel Resource Usage -
>> 
>> #max_files_per_process = 1000   # min 25
>># (change requires restart)
>> #shared_preload_libraries = ''  # (change requires restart)
>> 
>> # - Cost-Based Vacuum Delay -
>> 
>> #vacuum_cost_delay = 0ms# 0-100 milliseconds
>> #vacuum_cost_page_hit = 1   # 0-1 credits
>> #vacuum_cost_page_miss = 10 # 0-1 credits
>> #vacuum_cost_page_dirty = 20# 0-1 credits
>> #vacuum_cost_limit = 200# 1-1 credits
>> 
>> # - Background Writer -
>> 
>> #bgwriter_delay = 200ms # 10-1ms between rounds
>> #bgwriter_lru_maxpages = 100# 0-1000 max buffers
>> written/round
>> #bgwriter_lru_multiplier = 2.0  # 0-10.0 multipler on buffers
>> scanned/round
>> 
>> # - Asynchronous Behavior -
>> 
>> #effective_io_concurrency = 1   # 1-1000. 0 disables
>> prefetching
>> 
>> #--
>> 
>> # QUERY TUNING
>> #--
>> 
>> 
>> # - Planner Method Configuration -
>> 
>> #enable_bitmapscan = on

Re: [GENERAL] speeding up a join query that utilizes a view

2013-01-18 Thread Kirk Wythers

On Jan 18, 2013, at 8:10 AM, Igor Neyman  wrote:

> Yes, my mistake, I meant to ask about fifteen_min_stacked_view definition, 
> and Postgres parameters from postgresql.conf configuration file, at least 
> those - modified from default setting and related to "resource consumption" 
> and "query tuning".
> 
> Regards,
> Igor Neyman

Here are some extra bits from the postgresql.conf file.  As you can see, I have not 
changed much from the default settings. 

#--
# RESOURCE USAGE (except WAL)
#--

# - Memory -

shared_buffers = 3GB # 7GB  # min 128kB
# (change requires restart)
temp_buffers = 80MB # 8MB   # min 800kB
#max_prepared_transactions = 0  # zero disables the feature
# (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 64MB #8MB# min 64kB
maintenance_work_mem = 128MB# min 1MB
#max_stack_depth = 2MB  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
# (change requires restart)
#shared_preload_libraries = ''  # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0ms# 0-100 milliseconds
#vacuum_cost_page_hit = 1   # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20# 0-10000 credits
#vacuum_cost_limit = 200# 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multipler on buffers 
scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1   # 1-1000; 0 disables prefetching

#--
# QUERY TUNING
#--

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0# measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01  # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 6GB #13GB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5# range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0   # selects default based on effort
#geqo_selection_bias = 2.0  # range 1.5-2.0
#geqo_seed = 0.0# range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100# range 1-10000
#constraint_exclusion = partition   # on, off, or partition
#cursor_tuple_fraction = 0.1# range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8# 1 disables collapsing of explicit
# JOIN clauses


Here is a snip from earlier that includes info about both the table that is 
used to build the view and the view. In short, I use the UNNEST function to 
un-pivot all the variables of interest in the "fifteen_min" table into the 
columns "variable" and "value" in the "fifteen_min_stacked_proper" view.

Thanks again.

Kirk


b4warmed3=# \d fifteen_min
Table "public.fifteen_min"
 Column|Type | Modifiers
-+-+---
rowid   | character varying(48)   | not null
time2   | timestamp without time zone |
timestamp   | timestamp without time zone |
block_name  | character varying(8)|
stat_name   | character varying(8)|
table_name  | character varying(10)   |
program | character varying(48)   |
a_dc_avg1   | real|
a_dc_avg2   | real|
a_dc_avg3   | real|
a_dc_avg4   | real|
a_dif_avg1  

Re: [GENERAL] speeding up a join query that utilizes a view

2013-01-17 Thread Kirk Wythers

On Jan 17, 2013, at 3:51 PM, Igor Neyman  wrote:

> What about index definition, Postgres version, config parameters?
> Hardware configuration would be helpful too.
> 

Sorry

pg 9.1

OS X 10.8 server. 
32 G ram 8 cores

I thought what you meant by index definition is what appears at the bottom of 
the \d table-name output. For example: 
>> Indexes:
>>"data_key_pkey" PRIMARY KEY, btree (variable_id)
>>"data_key_lower_idx" btree (lower(block_name::text))
>>"data_key_lower_idx1" btree (lower(variable_channel::text))

on data_key. 

I'm not sure what you mean by config parameters? Output from pg_config?

~$ pg_config 
BINDIR = /usr/bin
DOCDIR = /usr/share/doc/postgresql
HTMLDIR = /usr/share/postgresql
INCLUDEDIR = /usr/include
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/man
SHAREDIR = /usr/share/postgresql
SYSCONFDIR = /private/etc/postgresql
PGXS = /usr/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--infodir=/usr/share/info' '--disable-dependency-tracking' 
'--prefix=/usr' '--sbindir=/usr/libexec' '--sysconfdir=/private/etc' 
'--mandir=/usr/share/man' '--localstatedir=/private/var/pgsql' 
'--htmldir=/usr/share/postgresql' '--enable-thread-safety' '--enable-dtrace' 
'--with-tcl' '--with-perl' '--with-python' '--with-gssapi' '--with-krb5' 
'--with-pam' '--with-ldap' '--with-bonjour' '--with-openssl' '--with-libxml' 
'--with-libxslt' '--with-system-tzdata=/usr/share/zoneinfo' 
'CC=/Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain/usr/bin/cc'
 'CFLAGS=-arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations' 
'LDFLAGS=-arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations' 
'LDFLAGS_EX=-mdynamic-no-pic'
CC = 
/Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain/usr/bin/cc
CPPFLAGS = -I/usr/include/libxml2
CFLAGS = -arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations -Wall 
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = 
LDFLAGS = -arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations 
-Wl,-dead_strip_dylibs
LDFLAGS_EX = -mdynamic-no-pic
LDFLAGS_SL = 
LIBS = -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline 
-lm 
VERSION = PostgreSQL 9.1.4

Does that help? 

> 
>> -Original Message-
>> From: Kirk Wythers [mailto:kwyth...@umn.edu]
>> Sent: Thursday, January 17, 2013 3:59 PM
>> To: Igor Neyman
>> Cc: Kirk Wythers; pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] speeding up a join query that utilizes a view
>> 
>>> 
>>> Not enough information:
>>> 
>>> Postgres version?
>>> OS?
>>> Some Postgres configuration parameters, specifically related to
>> "RESOURCE USAGE" and " QUERY TUNING"?
>>> Table structures (including indexes) for:
>> fifteen_min_stacked_propper, fifteen_min, and data_key?
>>> View definition for fifteen_min_stacked_view?
>>> 
>> 
>> 
>> Here is some additional information:
>> 
>> b4warmed3=# \d data_key
>> Table "public.data_key"
>>Column| Type  |
>> Modifiers
>> --+---+
>> -
>> --+---+-
>> site | character varying(6)  |
>> canopy   | character varying(24) |
>> block| character(2)  |
>> plot | character(2)  |
>> measurement_interval | interval  |
>> warming_treatment| character varying(24) |
>> treatment_code   | character varying(24) |
>> treatment_abbr   | character varying(24) |
>> water_treatment  | character varying(24) |
>> block_name   | character varying(24) |
>> variable_name| character varying(24) |
>> variable_channel | character varying(24) |
>> variable_id  | character varying(24) | not null default
>> NULL::character varying
>> Indexes:
>>"data_key_pkey" PRIMARY KEY, btree (variable_id)
>>"data_key_lower_idx" btree (lower(block_name::text))
>>"data_key_lower_idx1" btree (lower(variable_channel::text))
>> 
>> b4war

Re: [GENERAL] speeding up a join query that utilizes a view

2013-01-17 Thread Kirk Wythers
> 
> Not enough information:
> 
> Postgres version?
> OS?
> Some Postgres configuration parameters, specifically related to "RESOURCE 
> USAGE" and " QUERY TUNING"?
> Table structures (including indexes) for: fifteen_min_stacked_propper, 
> fifteen_min, and data_key?
> View definition for fifteen_min_stacked_view?
> 


Here is some additional information:

b4warmed3=# \d data_key
 Table "public.data_key"
Column| Type  |Modifiers
 
--+---+--
 site | character varying(6)  | 
 canopy   | character varying(24) | 
 block| character(2)  | 
 plot | character(2)  | 
 measurement_interval | interval  | 
 warming_treatment| character varying(24) | 
 treatment_code   | character varying(24) | 
 treatment_abbr   | character varying(24) | 
 water_treatment  | character varying(24) | 
 block_name   | character varying(24) | 
 variable_name| character varying(24) | 
 variable_channel | character varying(24) | 
 variable_id  | character varying(24) | not null default 
NULL::character varying
Indexes:
"data_key_pkey" PRIMARY KEY, btree (variable_id)
"data_key_lower_idx" btree (lower(block_name::text))
"data_key_lower_idx1" btree (lower(variable_channel::text))

b4warmed3=# SELECT COUNT(*) FROM data_key;
 count 
---
  4728
(1 row)

b4warmed3=# \d fifteen_min
  Table "public.fifteen_min"
   Column|Type | Modifiers 
-+-+---
 rowid   | character varying(48)   | not null
 time2   | timestamp without time zone | 
 timestamp   | timestamp without time zone | 
 block_name  | character varying(8)| 
 stat_name   | character varying(8)| 
 table_name  | character varying(10)   | 
 program | character varying(48)   | 
 a_dc_avg1   | real| 
 a_dc_avg2   | real| 
 a_dc_avg3   | real| 
 a_dc_avg4   | real| 
 a_dif_avg1  | real| 
 a_dif_avg2  | real| 
 a_dif_avg3  | real| 
 a_dif_avg4  | real| 
 a_targettemp_avg1   | real| 
 a_targettemp_avg2   | real| 
 a_targettemp_avg3   | real| 
 a_targettemp_avg4   | real| 
 a_targettemp_avg5   | real| 
 a_targettemp_avg6   | real| 
 a_targettemp_avg7   | real| 
 a_targettemp_avg8   | real| 
 a_tc_avg1   | real| 
 a_tc_avg10  | real| 
 a_tc_avg11  | real| 
 a_tc_avg12  | real| 
 a_tc_avg2   | real| 
 a_tc_avg3   | real| 
 a_tc_avg4   | real| 
 a_tc_avg5   | real| 
 a_tc_avg6   | real| 
 a_tc_avg7   | real| 
 a_tc_avg8   | real| 
 a_tc_avg9   | real| 
 a_tc_std1   | real| 
 a_tc_std10  | real| 
 a_tc_std11  | real| 
 a_tc_std12  | real| 
 a_tc_std2   | real| 
 a_tc_std3   | real| 
 a_tc_std4   | real| 
 a_tc_std5   | real| 
 a_tc_std6   | real| 
 a_tc_std7   | real| 
 a_tc_std8   | real| 
 a_tc_std9   | real| 
 airtc_avg   | real| 
 airtemp_avg | real| 
 airtemp_max | real| 
 airtemp_min | real| 
 all_avgt| real| 
 am25tref1   | real| 
 amb_a_avg   | real| 
 amb_avg1| real| 
 amb_avg2| real| 
 amb_closed_avg  | real| 
 b_dc_avg1   | real| 
 b_dc_avg2   | real| 
 b_dc_avg3   | real

[GENERAL] speeding up a join query that utilizes a view

2013-01-16 Thread Kirk Wythers
I am looking for advice on a performance problem. I'm pretty sure that the 
culprit of my slow performance is a view that is several hundred million 
records in size. Because it is a view, I can only index the underlying table, 
but because the view generates an "un-pivoted" version of the underlying table 
with an unnest function, I can't index the important column in the underlying 
table, because it doesn't exist until after the un-pivot or stacking function 
of the view… I know… this is all very circular. 
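
One way out of that circle, sketched here rather than taken from the thread:
materialize the un-pivoted rows into a real table once (fifteen_min_stacked_mat
is a made-up name) and index the join columns on that copy. It costs disk space
and has to be rebuilt when the base table changes, but it makes the
post-unnest columns indexable.

CREATE TABLE fifteen_min_stacked_mat AS
    SELECT * FROM fifteen_min_stacked_view;

CREATE INDEX ON fifteen_min_stacked_mat (variable);
CREATE INDEX ON fifteen_min_stacked_mat (block_name);
ANALYZE fifteen_min_stacked_mat;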

Here is the join query that uses the view.

SELECT
    data_key.site,
    data_key.canopy,
    data_key.measurement_interval,
    data_key.treatment_code,
    data_key.treatment_abbr,
    data_key.plot,
    fifteen_min_stacked_view.*
FROM
    data_key,
    fifteen_min_stacked_view
WHERE
    data_key.variable_channel = fifteen_min_stacked_view.variable
    AND data_key.block_name = fifteen_min_stacked_view.block_name
    AND fifteen_min_stacked_view.variable ~ 'tsoil'

I have tried adding indexes where I can on the join columns in the data_key 
table. Here is the EXPLAIN:


  QUERY PLAN
  
--
 Hash Join  (cost=195.20..548004.70 rows=196 width=192) (actual 
time=3.295..443523.222 rows=28779376 loops=1)
   Hash Cond: ((fifteen_min_stacked_propper.variable = 
(data_key.variable_channel)::text) AND 
((fifteen_min_stacked_propper.block_name)::text = (data_key.block_name)::text))
   ->  Subquery Scan on fifteen_min_stacked_propper  (cost=0.00..547620.47 
rows=2878 width=156) (actual time=0.247..424911.643 rows=28779376 loops=1)
 Filter: (fifteen_min_stacked_propper.variable ~ 'tsoil'::text)
 ->  Index Scan using fifteen_min_pkey on fifteen_min  
(cost=0.00..525136.58 rows=1798711 width=1072) (actual time=0.034..96077.588 
rows=428093218 loops=1)
   -

Re: [GENERAL] psql copy from through bash

2013-01-11 Thread Kirk Wythers

On Jan 11, 2013, at 12:18 PM, Szymon Guz  wrote:

> 
> 
> 
> On 11 January 2013 19:13, Kirk Wythers  wrote:
> Can anyone see what I'm misisng? I am trying to run a psql "copy from" 
> command through a bash script to load a buch of cdv files into the same 
> table. I'm getting an error about the file "infile" not existing?
> 
> #!/bin/sh
> 
> for infile in /path_to_files/*.csv
> do
>cat infile | psql dbname -c "\copy table_name FROM stdin with delimiter as 
> ',' NULL AS 'NA' CSV HEADER"
> done
> 
> 
> Thanks in advance
> 
> 
> 
> Hi Kirk,
> maybe try this:
> 
> cat $infile |
> 
> 

Oh my goodness! Thank you. 

One more quickie. It seems that I am going to be asked for my password every 
time psql loops through the copy statement. 

What is considered best practice for handling authentication? I am connecting 
locally as myself, and I'm being asked for my user password. I added the -w 
(no-password) option to the psql statement, but now assume I need to add a 
.pgpass file or something. 

Suggestions?
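
A ~/.pgpass file is the usual answer; psql reads it before prompting. A sketch
of its contents (your_db, your_user, and your_password are placeholders):

# contents of ~/.pgpass, one entry per line:
# hostname:port:database:username:password
localhost:5432:your_db:your_user:your_password

The file must be chmod 600 (no group or world access) or libpq ignores it.
With that in place the prompt goes away; -w only tells psql to fail rather
than ask if a password would still be needed.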



[GENERAL] psql copy from through bash

2013-01-11 Thread Kirk Wythers
Can anyone see what I'm missing? I am trying to run a psql "copy from" command 
through a bash script to load a bunch of csv files into the same table. I'm 
getting an error about the file "infile" not existing.

#!/bin/sh

for infile in /path_to_files/*.csv
do
   cat infile | psql dbname -c "\copy table_name FROM stdin with delimiter as 
',' NULL AS 'NA' CSV HEADER"
done


Thanks in advance



Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Kirk Wythers

On Jan 8, 2013, at 6:48 PM, Tom Lane  wrote:

> The OP didn't
> suggest how many years his data covers, but it's quite possible that
> pulling a full year's worth of data will read enough of the table that
> there's no point in worrying about whether an index could be used
> anyway.

There are only a few years' worth of data, 2008 - 2012. However, the data 
consists of 15 min measurements and when renormalized (un-pivoted) is several 
hundred million records. It is conceivable that someone will want to query by 
month, or even hour of the day. 

[GENERAL] query by partial timestamp

2013-01-08 Thread Kirk Wythers
I have a column of type TIMESTAMP and I'd like to query all records from 2011. 
If it were text I could use a partial match such as:

WHERE
text ~ '2011'

There must be a simple way to pull the year part out of a timestamp format. 
Thanks in advance.
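
Two common forms, as a sketch with hypothetical names my_table and ts_column;
the second keeps the comparison directly on the column, so an ordinary index
on ts_column can be used:

SELECT *
FROM my_table
WHERE EXTRACT(YEAR FROM ts_column) = 2011;

-- or, a range form that can use an index on ts_column:
SELECT *
FROM my_table
WHERE ts_column >= '2011-01-01'
  AND ts_column <  '2012-01-01';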






[GENERAL] recasting to timestamp from varchar

2013-01-04 Thread Kirk Wythers
I am trying to re-cast a column as a timestamp:

ALTER TABLE sixty_min ALTER COLUMN time2 TYPE timestamp;
ERROR:  column "time2" cannot be cast to type timestamp without time zone

The column time2 is currently a varchar. I actually do not want to mess with 
time zones, but it seems that postgres will not let me do this. The actual 
values in the column time2 look like this:

7/15/08 12:00

Is this possible? 

Thanks in advance.
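
A minimal sketch of the usual fix: supply a USING clause that parses the text through to_timestamp(). The format string below is an assumption about month/day/two-digit-year ordering for values like '7/15/08 12:00'; to_timestamp() returns timestamptz, so the trailing cast brings the result back to a plain timestamp.

ALTER TABLE sixty_min
  ALTER COLUMN time2 TYPE timestamp
  USING to_timestamp(time2, 'MM/DD/YY HH24:MI')::timestamp;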

[GENERAL] un-pivot with crosstab?

2013-01-02 Thread Kirk Wythers
I am trying to follow the example given below, but in reverse. I have something 
like table2, that I am trying to "un-pivot" to look like table1… And I seem to 
be having a booger of a time. 

My attempt:

SELECT * FROM crosstab(
 'SELECT rowid, key1,key1 
   FROM test WHERE key1= ''value'' OR key2 = ''value'' 
   ORDER BY 1,2'
 ) AS ct(rowid text, value int8);

Can anyone point out where my small brain is failing me?


PostgreSQL crosstab query - Rotate a table about a pivot
An interesting feature of relational databases(postgres in this case) is the 
ability to rotate the table about a pivot. So if you have data like this-
 id | rowid | key | value 
---+--++---
  1 | test1 | key1  | val1
  2 | test1 | key2  | val2
  3 | test1 | key3  | val3
  4 | test1 | key4  | val4
  5 | test2 | key1  | val5
  6 | test2 | key2  | val6
  7 | test2 | key3  | val7
  8 | test2 | key4  | val8

And want to have a result set like this -

rowid | key1 | key2 | key3 | key4 
--+--+-+-+--
 test1  | val1  | val2  | val3  | val4
 test2  | val5  | val6  | val7  | val8


It can be achieved by a "crosstab" query in a postgres database -

SELECT * FROM crosstab(
 'SELECT rowid, key, value 
   FROM test WHERE key= ''key1'' OR key = ''key2'' 
   OR key = ''key3'' OR key = ''key4''
   ORDER BY 1,2'
 ) AS ct(rowid text, key1 text, key2 text,
 key3 text, key4 text);
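
Going the other way (the original question) does not need crosstab at all. A minimal sketch, assuming the wide table is named "wide" with columns rowid and key1..key4, stacks one UNION ALL branch per column:

SELECT rowid, 'key1' AS key, key1 AS value FROM wide
UNION ALL
SELECT rowid, 'key2', key2 FROM wide
UNION ALL
SELECT rowid, 'key3', key3 FROM wide
UNION ALL
SELECT rowid, 'key4', key4 FROM wide
ORDER BY rowid, key;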

[GENERAL] do I need a table function to do this

2012-12-29 Thread Kirk Wythers
I have been given an interesting problem to solve in a postgres db. I was given 
two tables

t1:


site    treatment   variable    id (pk)
----------------------------------------
A       X           BLUE        1A
B       Y           RED         2B
A       Y           GREEN       3A



t2:

rowid (pk)  timestamp   BLUE    RED     GREEN
---------------------------------------------
1           1332493200  3.4     2.1     5.8
2           1332496800  3.2     2.0     5.8
3           1332500400  3.3     2.2     6.0


I need to combine the site and treatment information from t1 with the variable 
records in t2. I think I will have to combine these one variable at a time. 
Something like this (I'm not using the word join, because I don't think this is 
a join in the regular sense. It's more like some kind of crazy pivot table 
thing!):

t3:

rowid (pk)  timestamp   BLUE    site    treatment
-------------------------------------------------
1           1332493200  3.4     A       X
2           1332496800  3.2     A       X
3           1332500400  3.3     A       X

and then:

t4

rowid (pk)  timestamp   RED     site    treatment
-------------------------------------------------
1           1332493200  2.1     B       Y
2           1332496800  2.0     B       Y
3           1332500400  2.2     B       Y


Is this even possible?
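
A minimal sketch of one way to do it, assuming t2's column names come through in lower case (blue, red, green) and that t1.variable holds the matching column name. Each measurement column is joined to its own t1 row; the per-variable queries can then be stacked if one long table is wanted.

-- t3: BLUE readings tagged with BLUE's site and treatment
SELECT t2.rowid, t2.timestamp, t2.blue, t1.site, t1.treatment
FROM   t2
       JOIN t1 ON t1.variable = 'BLUE';

-- one long table for all variables
SELECT t2.rowid, t2.timestamp, 'BLUE' AS variable, t2.blue AS value, t1.site, t1.treatment
FROM   t2 JOIN t1 ON t1.variable = 'BLUE'
UNION ALL
SELECT t2.rowid, t2.timestamp, 'RED', t2.red, t1.site, t1.treatment
FROM   t2 JOIN t1 ON t1.variable = 'RED'
UNION ALL
SELECT t2.rowid, t2.timestamp, 'GREEN', t2.green, t1.site, t1.treatment
FROM   t2 JOIN t1 ON t1.variable = 'GREEN';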








[GENERAL] update from a csv file?

2012-12-27 Thread Kirk Wythers
I have been using COPY FROM to do a mass import of records from CSV files into 
a new database. I have discover however, a small number of records ( a few 
thousand) in one of the files that contain new data that needs to be added to 
the database, but on rows that have a primary key and have already been 
inserted (so I can't use COPY FROM because it violates the primary key). 

If the structure of the table is 

id  data1   data2   data3

and the structure of the CSV file is 

id  data1   data2   data3

and I need to update all the rows in data3 where the id = id.

Is this a job for the UPDATE command? or is there a better way to pull data 
from a CSV file in order to do a mass update?

Thanks



Re: [GENERAL] update table from a csv file

2012-12-27 Thread Kirk Wythers

On Dec 27, 2012, at 10:39 AM, Adrian Klaver  wrote:

> No. Some questions though.

Thanks for the reply Adrian. 

> 
> What version pf Postgres?

9.1

> Is that the actual UPDATE statement, I see no SET?

I was reading the docs but obviously don't understand the syntax of the update 
statement. 

> Have you tried it?
> If so and it failed what was the error?

Yes and I got an error at or near from. Like this:

b4warmed3=# UPDATE sixty_min FROM tmp_60 WHERE sixty_min.rowid = tmp_60.rowid;
ERROR:  syntax error at or near "FROM"
LINE 1: UPDATE sixty_min FROM tmp_60 WHERE sixty_min.rowid = tmp_60
 ^
b4warmed3=# 
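
The missing piece is the SET list: UPDATE ... FROM needs to say which columns to overwrite. A minimal sketch, with data1..data3 standing in for whichever columns of sixty_min actually need refreshing from the staging table:

UPDATE sixty_min
SET    data1 = tmp_60.data1,
       data2 = tmp_60.data2,
       data3 = tmp_60.data3
FROM   tmp_60
WHERE  sixty_min.rowid = tmp_60.rowid;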



[GENERAL] update table from csv file

2012-12-27 Thread Kirk Wythers
I have been using COPY FROM to do a mass import of records from CSV files into 
a new database. I have discover however, a small number of records ( a few 
thousand) in one of the files that contain new data that needs to be added to 
the database, but on rows that have a primary key and have already been 
inserted (so I can't use COPY FROM because it violates the primary key). 

If the structure of the table is 

id  data1   data2   data3

and the structure of the CSV file is 

id  data1   data2   data3

and I need to update all the rows in data3 where the id = id.

I have created a temporary table and used COPY FROM to load the update data 
into the temporary table. I seem to be stuck however. I thought I should be 
able to use the UPDATE command to update all columns and all rows the table.id 
= tmp_table.id

Something like:

UPDATE table FROM tmp_table WHERE table.id = tmp_table.id;

Or am I completely off course?




[GENERAL] data type troubles

2012-12-21 Thread Kirk Wythers
I seem to be dealing with a data type issue when I try and import data into a 
new and empty database. 

Error Message: ERROR:  date/time field value out of range: "1332471600:00:00"
LINE 1: ...tc_avg25") values ('2012-03-22 21:00:00_B4WARM_A','133247160…

Here are the first two rows in the files I was sent. I cast ROWID as varchar32, 
TIME2 as time (I was told that TIME2 was a posix time value), and TIMESTAMP as 
timestamp. 

Any advice that would help me out of this hole is appreciated.

ROWID   TIME2   TIMESTAMP   BLOCK   STATNAMETABLE   PROGRAM 
A_TC_AVG1   A_TC_AVG10  A_TC_AVG11  A_TC_AVG12  A_TC_AVG2   
A_TC_AVG3   A_TC_AVG4   A_TC_AVG5   A_TC_AVG6   A_TC_AVG7   
A_TC_AVG8   A_TC_AVG9   AIRTC_AVG   ECODE1  ECODE10 ECODE11 ECODE12 
ECODE2  ECODE3  ECODE4  ECODE5  ECODE6  ECODE7  ECODE8  ECODE9  ETC_AVG13   
ETC_AVG14   ETC_AVG15   ETC_AVG16   ETC_AVG17   ETC_AVG18   
ETC_AVG19   ETC_AVG20   ETC_AVG21   ETC_AVG22   ETC_AVG23   
ETC_AVG24   ETC_AVG25   FLAG1   FLAG10  FLAG11  FLAG12  FLAG2   FLAG3   
FLAG4   FLAG5   FLAG6   FLAG7   FLAG8   FLAG9   PAR_DEN_AVG PAR_TOT_TOT 
PERIOD10PERIOD11PERIOD12PERIOD13PERIOD14
PERIOD15PERIOD16PERIOD9 RAIN_IN_TOT RH  S_TC_AVG1   
S_TC_AVG10  S_TC_AVG11  S_TC_AVG12  S_TC_AVG2   S_TC_AVG3   
S_TC_AVG4   S_TC_AVG5   S_TC_AVG6   S_TC_AVG7   S_TC_AVG8   
S_TC_AVG9   SLR_MJ_TOT  SLR_W_AVG   SOILTEMP10_AVG  SOILTEMP100_AVG 
SOILTEMP20_AVG  SOILTEMP30_AVG  SOILTEMP40_AVG  SOILTEMP50_AVG  SOILTEMP75_AVG  
VP_AVG  VWC10   VWC11   VWC12   VWC13   VWC14   VWC15   VWC16   VWC9WCODE1  
WCODE10 WCODE11 WCODE12 WCODE2  WCODE3  WCODE4  WCODE5  WCODE6  WCODE7  WCODE8  
WCODE9  WTC_AVG13   WTC_AVG14   WTC_AVG15   WTC_AVG16   
WTC_AVG17   WTC_AVG18   WTC_AVG19   WTC_AVG20   WTC_AVG21   
WTC_AVG22   WTC_AVG23   WTC_AVG24   WTC_AVG25
2012-03-22 21:00:00_B4WARM_A1332471600  1332471600  B4WARM_A
B4WARM_AB4Warm_60   CPU:B4warm2011_Sauron_ALT4.2.CR1NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  0   NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  0   
0   25.63   24.73   23.27   23.87   23.63   25.09   23.59   21.76   0   
0   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  0   0   NA  NA  NA  NA  NA  
NA  NA  0   0.232   0.206   0.166   0.182   0.176   0.216   0.175   
0.128   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA

Re: [GENERAL] copy from questions

2012-12-19 Thread Kirk Wythers
Thanks for the reply Steve. These suggestions are new to me, so I'd like to 
rephrase them back to you in order to make sure I understand the bits and 
details. 



On Dec 19, 2012, at 10:48 AM, Steve Crawford  
wrote:
>> 
> I suppose you could use a trigger to check each record before inserting but 
> that is likely to be inefficient for bulk loads. A quick bash loop is 
> probably your best bet. Something along the lines of:
> 
> for inputfile in /infiledirectory/*.csv
> do
>cat inputfile | psql [connection-params] -c '\copy rawinput from stdin csv 
> header...'
> done

I get this… except for the "header" bit. Here is my interpretation of your 
code including my specific connection parameters. 

#!/bin/sh
for inputfile in '/Volumes/disk7/b4warmed3/export/60min2/*.txt'
do
   cat inputfile | psql -p 54321 -h localhost -c '\copy rawinput FROM stdin 
WITH CSV HEADER DELIMTER AS ',' NULL AS 'NA '
done

I added single quotes around the path to the input files. Correct right?

> 
> This imports everything into a "staging" table (I called it rawinput). From 
> there you can create your final table with SELECT DISTINCT…

This bit needs to be as a separate step right? (rowid is the primary key)

SELECT DISTINCT ON (rowid) *
FROM rawinput;

From here do I need another COPY FROM or some kind of INSERT statement? 

> 
> For speed make sure that you create your staging table as "unlogged".

I understand that I need to create the rawinput table first, but I am 
unfamiliar with the "UNLOGGED" option. I assume it makes things faster… Does it 
go something like:

CREATE TABLE UNLOGGED rawinput; 

Do I need to create all the variables (including types) in rawinput as well? If 
so, then I assume that I do not want rowid to have a primary key… or else I 
would be back where I started. 


> 
> Cheers,
> Steve
> 

Thanks again,

Kirk
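
A hedged sketch of the two open questions above (column names are placeholders; the real staging table needs one column per field in the CSV files, and deliberately no primary key so duplicate rows can load):

CREATE UNLOGGED TABLE rawinput (
    rowid  varchar(32),
    time2  text,
    value1 real
    -- ... remaining columns
);

-- after the \copy loop has filled rawinput, keep one row per rowid
-- and move the result into the destination table (called final_table here)
INSERT INTO final_table (rowid, time2, value1)
SELECT DISTINCT ON (rowid) rowid, time2, value1
FROM   rawinput
ORDER  BY rowid;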




[GENERAL] copy from questions

2012-12-19 Thread Kirk Wythers
I am using version 9.1 and have a large number of files to insert. I am trying 
to use a simple COPY FROM command but have a couple questions.

1. There are a small number of instances where there are duplicate records that 
are being caught by the primary key (as it is supposed to do). However, the 
duplicate records are artifacts of a recording system and can be ignored. I 
would like to be able to simply skip the duplicate or UPDATE the table with the 
duplicate… Anything that allows the COPY FROM to proceed while adding only one 
of the duplicate records to the table. 

2. SInce I have several hundred files to perform a COPY FROM on, I'd like to 
automate the import in some way… sort of a, grab all files in the directory 
approach:

COPY newtable FROM '/directory_of_files/*' WITH CSV HEADER DELIMITER AS ',' 
NULL AS 'NA'; 

Is this possible?

Thanks in advance



Re: [GENERAL] data type - import problem

2012-12-17 Thread Kirk Wythers
Thanks. 

Is there a way to use the float4 data type for a variable when empty records 
contain "NA", instead of nothing? As you can see below the variable "A_TC_AVG1" 
has the first record filled with "NA" signifying that the record is empty? 
Records that contain values will have decimal values like 12.3456. 


ROWID   TIME2   TIMESTAMP   BLOCK   STATNAMETABLE   PROGRAM 
A_TC_AVG1   A_TC_AVG10  A_TC_AVG11  A_TC_AVG12  A_TC_AVG2   
A_TC_AVG3   A_TC_AVG4   A_TC_AVG5   A_TC_AVG6   A_TC_AVG7   
A_TC_AVG8   A_TC_AVG9   AIRTC_AVG   ECODE1  ECODE10 ECODE11 ECODE12 
ECODE2  ECODE3  ECODE4  ECODE5  ECODE6  ECODE7  ECODE8  ECODE9  ETC_AVG13   
ETC_AVG14   ETC_AVG15   ETC_AVG16   ETC_AVG17   ETC_AVG18   
ETC_AVG19   ETC_AVG20   ETC_AVG21   ETC_AVG22   ETC_AVG23   
ETC_AVG24   ETC_AVG25   FLAG1   FLAG10  FLAG11  FLAG12  FLAG2   FLAG3   
FLAG4   FLAG5   FLAG6   FLAG7   FLAG8   FLAG9   PAR_DEN_AVG PAR_TOT_TOT 
PERIOD10PERIOD11PERIOD12PERIOD13PERIOD14
PERIOD15PERIOD16PERIOD9 RAIN_IN_TOT RH  S_TC_AVG1   
S_TC_AVG10  S_TC_AVG11  S_TC_AVG12  S_TC_AVG2   S_TC_AVG3   
S_TC_AVG4   S_TC_AVG5   S_TC_AVG6   S_TC_AVG7   S_TC_AVG8   
S_TC_AVG9   SLR_MJ_TOT  SLR_W_AVG   SOILTEMP10_AVG  SOILTEMP100_AVG 
SOILTEMP20_AVG  SOILTEMP30_AVG  SOILTEMP40_AVG  SOILTEMP50_AVG  SOILTEMP75_AVG  
VP_AVG  VWC10   VWC11   VWC12   VWC13   VWC14   VWC15   VWC16   VWC9WCODE1  
WCODE10 WCODE11 WCODE12 WCODE2  WCODE3  WCODE4  WCODE5  WCODE6  WCODE7  WCODE8  
WCODE9  WTC_AVG13   WTC_AVG14   WTC_AVG15   WTC_AVG16   
WTC_AVG17   WTC_AVG18   WTC_AVG19   WTC_AVG20   WTC_AVG21   
WTC_AVG22   WTC_AVG23   WTC_AVG24   WTC_AVG25
2012-03-22 21:00:00_B4WARM_A1332471600  1332471600  B4WARM_A
B4WARM_AB4Warm_60   CPU:B4warm2011_Sauron_ALT4.2.CR1NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  0   NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  0   
0   25.63   24.73   23.27   23.87   23.63   25.09   23.59   21.76   0   
0   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  0   0   NA  NA  NA  NA  NA  
NA  NA  0   0.232   0.206   0.166   0.182   0.176   0.216   0.175   
0.128   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA




On Dec 17, 2012, at 4:05 PM, Adrian Klaver  wrote:

> On 12/17/2012 01:53 PM, Kirk Wythers wrote:
>> 
>> I seem to be dealing with a data type issue when I try and import data
>> into a new and empty database.
>> 
>> Error Message: ERROR:  date/time field value out of range:
>> "1332471600:00:00"
>> LINE 1: ...tc_avg25") values ('2012-03-22 21:00:00_B4WARM_A','133247160…
>> 
>> Here are the first two rows in the files I was sent. I cast ROWID as
>> varchar32, TIME2 as time (I was told that TIME2 was a posix time value),
>> and TIMESTAMP as timestamp.
>> 
>> Any advice that would help me out of this hole is appreciated.
> 
> Assuming by POSIX time you mean seconds since epoch you will need to do 
> something like this:
> 
> test=# SELECT to_timestamp(1332471600);
>  to_timestamp
> 
> 2012-03-22 20:00:00-07
> 
> For time:
> test=# SELECT to_timestamp('1332471600')::time;
> to_timestamp
> --
> 20:00:00
> 
> 
> In your INSERT statement use the to_timestamp() in the VALUES portion.
> -- 
> Adrian Klaver
> adrian.kla...@gmail.com



[GENERAL] data type - import problem

2012-12-17 Thread Kirk Wythers

I seem to be dealing with a data type issue when I try and import data into a 
new and empty database. 

Error Message: ERROR:  date/time field value out of range: "1332471600:00:00"
LINE 1: ...tc_avg25") values ('2012-03-22 21:00:00_B4WARM_A','133247160…

Here are the first two rows in the files I was sent. I cast ROWID as varchar32, 
TIME2 as time (I was told that TIME2 was a posix time value), and TIMESTAMP as 
timestamp. 

Any advice that would help me out of this hole is appreciated.

ROWID   TIME2   TIMESTAMP   BLOCK   STATNAMETABLE   PROGRAM 
A_TC_AVG1   A_TC_AVG10  A_TC_AVG11  A_TC_AVG12  A_TC_AVG2   
A_TC_AVG3   A_TC_AVG4   A_TC_AVG5   A_TC_AVG6   A_TC_AVG7   
A_TC_AVG8   A_TC_AVG9   AIRTC_AVG   ECODE1  ECODE10 ECODE11 ECODE12 
ECODE2  ECODE3  ECODE4  ECODE5  ECODE6  ECODE7  ECODE8  ECODE9  ETC_AVG13   
ETC_AVG14   ETC_AVG15   ETC_AVG16   ETC_AVG17   ETC_AVG18   
ETC_AVG19   ETC_AVG20   ETC_AVG21   ETC_AVG22   ETC_AVG23   
ETC_AVG24   ETC_AVG25   FLAG1   FLAG10  FLAG11  FLAG12  FLAG2   FLAG3   
FLAG4   FLAG5   FLAG6   FLAG7   FLAG8   FLAG9   PAR_DEN_AVG PAR_TOT_TOT 
PERIOD10PERIOD11PERIOD12PERIOD13PERIOD14
PERIOD15PERIOD16PERIOD9 RAIN_IN_TOT RH  S_TC_AVG1   
S_TC_AVG10  S_TC_AVG11  S_TC_AVG12  S_TC_AVG2   S_TC_AVG3   
S_TC_AVG4   S_TC_AVG5   S_TC_AVG6   S_TC_AVG7   S_TC_AVG8   
S_TC_AVG9   SLR_MJ_TOT  SLR_W_AVG   SOILTEMP10_AVG  SOILTEMP100_AVG 
SOILTEMP20_AVG  SOILTEMP30_AVG  SOILTEMP40_AVG  SOILTEMP50_AVG  SOILTEMP75_AVG  
VP_AVG  VWC10   VWC11   VWC12   VWC13   VWC14   VWC15   VWC16   VWC9WCODE1  
WCODE10 WCODE11 WCODE12 WCODE2  WCODE3  WCODE4  WCODE5  WCODE6  WCODE7  WCODE8  
WCODE9  WTC_AVG13   WTC_AVG14   WTC_AVG15   WTC_AVG16   
WTC_AVG17   WTC_AVG18   WTC_AVG19   WTC_AVG20   WTC_AVG21   
WTC_AVG22   WTC_AVG23   WTC_AVG24   WTC_AVG25
2012-03-22 21:00:00_B4WARM_A1332471600  1332471600  B4WARM_A
B4WARM_AB4Warm_60   CPU:B4warm2011_Sauron_ALT4.2.CR1NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  0   NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  0   
0   25.63   24.73   23.27   23.87   23.63   25.09   23.59   21.76   0   
0   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  0   0   NA  NA  NA  NA  NA  
NA  NA  0   0.232   0.206   0.166   0.182   0.176   0.216   0.175   
0.128   NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  NA  NA



Re: [GENERAL] db not dumping properly, or at least not restoring

2009-10-16 Thread Kirk Wythers
On Oct 16, 2009, at 4:51 PM, Scott Marlowe   
wrote:


On Fri, Oct 16, 2009 at 11:25 AM, Kirk Wythers   
wrote:



Any ideas what the problem could be here?


Use the pg_dump from the target (i.e. newer) pgsql.  I.e. if going
from 8.3.8 to 8.4.1, use the pg_dump that comes with 8.4.1 to dump the
8.3.8 database.


Can I assume that this is even more critical if going from 8.2 to 8.4?



I usually just do it like so:

(First migrate accounts:)
pg_dumpall --globals -h oldserver | psql -h newserver postgres


I'm a little confused here. Are you saying to use the network  
connections between the two servers and to pipe the dumpall directly to  
the psql load?



(then each database:)
createdb -h newserver dbname


Then create new databases on the new server to match the names from  
the old server?




pg_dump -h oldserver dbname | psql -h newserver dbname
(repeat as needed, save output for error messages)


Then dump each database individually and pipe the dump to the psql load?


These two procedures seem to duplicate the goal? Or am I missing  
something?




[GENERAL] db not dumping properly, or at least not restoring

2009-10-16 Thread Kirk Wythers
I am trying to move databases to another machine (and update from 8.2  
to 8.4 along the way). I first tried pg_dumpall, but I found that one  
of the databases did not restore any data, just an empty db with no  
tables. Since then I have tried pg_dump with the following:


bash-3.2$ /usr/local/pgsql/bin/pg_dump -o mn_timber > /Volumes/disk3/ 
backup_db/mn_timber20091016.out


then restore on the new machine with:

kwythers$ pg_restore -C -d postgres mn_timber20091016.out

But I am getting the error:

pg_restore: [archiver] input file does not appear to be a valid archive
onceler:~ kwythers$

Looking at the db on the original machine, all looks good.

mn_timber=# \d
   List of relations
 Schema |  Name  |   Type   |  Owner
++--+--
 public | all_timber_data| view | kwythers
 public | county | table| kwythers
 public | forties| table| kwythers
 public | geometry_columns   | table| kwythers
 public | grid_cell  | view | kwythers
 public | mn_pls_grid| table| kwythers
 public | mn_pls_grid_gid_seq| sequence | kwythers
 public | rdir   | table| kwythers
 public | session| table| kwythers
 public | session_session_id_seq | sequence | kwythers
 public | spatial_ref_sys| table| kwythers
 public | timber_type| table| kwythers
 public | timber_volume  | table| kwythers
 public | timber_volume_seq  | sequence | kwythers
(14 rows)

mn_timber=# SELECT * FROM timber_volume;
 grid_id | tt_id | year | cords | mbm | poles | posts | tv_id
-+---+--+---+-+---+---+---
  263515 |17 | 1920 |11 |   2 |   |   | 10176
  266999 | 6 | 1920 | 7 |  19 |   |   | 10869
 1141653 | 5 | 1920 |10 |   1 |   |   | 10238
 1143744 | 5 | 1920 | 2 |   1 |   |   | 10293
  263560 | 9 | 1920 |   |   5 |   |   | 10346
  264027 | 3 | 1920 |49 |   1 |   |   | 10391
  264180 | 9 | 1920 |70 |   5 |   |   | 10430
  263728 | 4 | 1920 |   | |   919 |  1495 | 10468
  263667 |17 | 1920 | 1 | |   |   | 10501
  263658 |17 | 1920 |15 | |   |   | 10528
  263984 | 3 | 1920 |98 | |   |   | 10554
  264289 |17 | 1920 | 1 | |   |   | 10579
  263973 | 4 | 1920 |   | |40 |40 | 10601
.
.
.
(38437 rows)

Any ideas what the problem could be here?

Thanks in advance.







[GENERAL] query help

2007-08-14 Thread Kirk Wythers
I need some help with rewriting a query. I have a fairly complicated  
query (for me anyway) that dumps daily climate data, filling in  
missing data with monthly averages (one line per day).


I want to output monthly averages (one line per month). I am having a  
hard time wrapping my head around this. Particularly how to deal with  
the doy column (day of year). I have tried several approaches and my  
forehead is starting to get my keyboard bloody.


Thanks in advance for any suggestions.

Here is the daily query:

SELECT CASE
WHEN w.station_id = site_near.station_id THEN w.obs_id
ELSE s.obs_id
END AS obs_id,
site_near.station_id,
site_near.longname,
w.year,
w.doy,
--replace missing values (-999) with the monthly average
   CASE w.tmax
 WHEN -999 THEN avgtmax.avg
 ELSE w.tmax
   END,
CASE w.tmin
 WHEN -999 THEN avgtmin.avg
 ELSE w.tmin
   END,
CASE s.par
 WHEN -999 THEN avgpar.avg
 ELSE s.par
   END,
CASE w.precip
 WHEN -999 THEN avgprecip.avg
 ELSE w.precip
   END
FROM  site_near
   INNER JOIN solar s
 ON (site_near.ref_solar_station_id = s.station_id
 AND site_near.obs_year = s.year)
   INNER JOIN weather w
 ON (site_near.ref_weather_station_id = w.station_id
 AND site_near.obs_year = w.year
 AND s.date = w.date)
   INNER JOIN (SELECT   MONTH,
round(avg(tmax)::numeric, 2) AS avg
   FROM weather
   WHEREtmax != -999
   GROUP BY MONTH) AS avgtmax
 ON (w.month = avgtmax.month)
INNER JOIN (SELECT   MONTH,
round(avg(tmin)::numeric, 2) AS avg
   FROM weather
   WHEREtmin != -999
   GROUP BY MONTH) AS avgtmin
 ON (w.month = avgtmin.month)
   INNER JOIN (SELECT   MONTH,
round(avg(par)::numeric, 2) AS avg
   FROM solar
   WHEREpar != -999
   GROUP BY MONTH) AS avgpar
 ON (s.month = avgpar.month)
INNER JOIN (SELECT   MONTH,
round(avg(precip)::numeric, 2) AS avg
   FROM weather
   WHEREprecip != -999
   GROUP BY MONTH) AS avgprecip
 ON (w.month = avgprecip.month)
--select station to output climate data by id number
WHERE  w.station_id = 219101
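
A hedged, simplified sketch of the monthly version: aggregate by month instead of returning one row per day. Only the weather columns are shown here, and the -999 sentinels are turned into NULLs with nullif() so they simply drop out of the averages (rather than being back-filled with the long-term monthly means as in the daily query); the solar join above would slot into the FROM clause the same way.

SELECT w.station_id,
       w.year,
       w.month,
       round(avg(nullif(w.tmax, -999))::numeric, 2)   AS tmax_avg,
       round(avg(nullif(w.tmin, -999))::numeric, 2)   AS tmin_avg,
       round(avg(nullif(w.precip, -999))::numeric, 2) AS precip_avg
FROM   weather w
WHERE  w.station_id = 219101
GROUP BY w.station_id, w.year, w.month
ORDER BY w.year, w.month;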

Re: [GENERAL] problem with a conditional statement

2007-05-09 Thread Kirk Wythers


On May 9, 2007, at 10:41 AM, Erik Jones wrote:



On May 9, 2007, at 10:32 AM, Kirk Wythers wrote:





Here is a more elaborate version, I'm trying to add 'avgsol' to
your original FROM clause:



SELECT CASE WHEN w.station_id = site_near.station_id THEN
w.obs_id ELSE
s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,
w.year, w.doy, w.precip, w.tmin, w.tmax,
--replace missing values (-999) with the monthly average
CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END




FROM site_near
  INNER JOIN solar s ON
(site_near.ref_solar_station_id = s.station_id
 AND site_near.obs_year = s.year)
  INNER JOIN weather w ON
(site_near.ref_weather_station_id = w.station_id
 AND site_near.obs_year = w.year
 AND s.date = w.date)
  INNER JOIN (SELECT month, AVG(par) FROM solar GROUP BY month) AS
avgsol ON
(s.month = avgsol.month)
WHERE ...

Still no claim for correctness.

Does it make more sense now?


Thanks again Laurenz. Hopefully I have nearly figured this out. I  
have a question that indicates to me that I am a little fuzzy on  
one more point.


In the line: "CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END"

I am getting an error that says, "column avgsol.par does not  
exist". I understand that you are creating avgsol as a subselect,  
but I also see the point of the error message that the column .par  
does not exist. If I change avgsol.par to the simple form avgsol  
(to match the subselect " INNER JOIN (SELECT month, AVG(par) FROM  
solar GROUP BY month) AS avgsol ON (s.month = avgsol.month)". Then  
I get an error about CASE types real and record cannot be matched.  
Any final ideas?


The "AVG(par)" should've been aliased.  "AVG(par) as par" would  
work.  As is, the column name returned is just "avg".


Got it! Thank you!




erik jones <[EMAIL PROTECTED]>
software developer
615-296-0838
emma(r)
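
Pulling the pieces of this thread together, a sketch of the form that works once the subselect column is aliased (so that avgsol.par exists):

SELECT CASE WHEN w.station_id = site_near.station_id THEN w.obs_id
            ELSE s.obs_id END AS obs_id,
       site_near.station_id, site_near.longname,
       w.year, w.doy, w.precip, w.tmin, w.tmax,
       CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END AS par
FROM   site_near
       INNER JOIN solar s
               ON site_near.ref_solar_station_id = s.station_id
              AND site_near.obs_year = s.year
       INNER JOIN weather w
               ON site_near.ref_weather_station_id = w.station_id
              AND site_near.obs_year = w.year
              AND s.date = w.date
       INNER JOIN (SELECT month, AVG(par) AS par
                   FROM solar
                   GROUP BY month) AS avgsol
               ON s.month = avgsol.month
WHERE  w.station_id = 211630;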








Re: [GENERAL] problem with a conditional statement

2007-05-09 Thread Kirk Wythers




Here is a more elaborate version, I'm trying to add 'avgsol' to
your original FROM clause:



SELECT CASE WHEN w.station_id = site_near.station_id THEN
w.obs_id ELSE
s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,
w.year, w.doy, w.precip, w.tmin, w.tmax,
--replace missing values (-999) with the monthly average
CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END




FROM site_near
  INNER JOIN solar s ON
(site_near.ref_solar_station_id = s.station_id
 AND site_near.obs_year = s.year)
  INNER JOIN weather w ON
(site_near.ref_weather_station_id = w.station_id
 AND site_near.obs_year = w.year
 AND s.date = w.date)
  INNER JOIN (SELECT month, AVG(par) FROM solar GROUP BY month) AS
avgsol ON
(s.month = avgsol.month)
WHERE ...

Still no claim for correctness.

Does it make more sense now?


Thanks again Laurenz. Hopefully I have nearly figured this out. I  
have a question that indicates to me that I am a little fuzzy on one  
more point.


In the line: "CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END"

I am getting an error that says, "column avgsol.par does not exist".  
I understand that you are creating avgsol as a subselect, but I also  
see the point of the error message that the column .par does not  
exist. If I change avgsol.par to the simple form avgsol (to match the  
subselect " INNER JOIN (SELECT month, AVG(par) FROM solar GROUP BY  
month) AS avgsol ON (s.month = avgsol.month)". Then I get an error  
about CASE types real and record cannot be matched. Any final ideas?




Yours,
Laurenz Albe





Re: [GENERAL] problem with a conditional statement

2007-05-08 Thread Kirk Wythers


On May 8, 2007, at 2:02 AM, Albe Laurenz wrote:


Kirk Wythers wrote:


I am struggling to get a CASE WHEN statement to work within another
CASE WHEN. Here is my original code:

SELECT CASE WHEN w.station_id = site_near.station_id THEN w.obs_id

ELSE

s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,
w.year, w.doy, w.precip, w.tmin, w.tmax,

--replace missing solar values (-999) with the average of all solar
--values from that month (s.month)

--CASE  s.par WHEN -999 THEN AVG( s.par) ELSE s.par END
--FROM solar s
--GROUP BY s.month;

FROM site_near INNER JOIN solar s ON
site_near.ref_solar_station_id = s.station_id  AND
site_near.obs_year = s.year
INNER JOIN weather w ON site_near.ref_weather_station_id =
w.station_id AND site_near.obs_year = w.year AND s.date = w.date
WHERE w.station_id = 211630;

I have commented out the troublesome bits in the middle of the code.
All I am trying to do here is to replace missing values with averages



from the same day of the year for all years. Does anyone see what I
am buggering up here?




Thank you for the reply. I see what you are doing in creating  
avgsol. That should work perfectly. However, I am unsure how you are  
working it into the existing code.



The problem here is the AVG().
All columns that appear outside of group functions in the SELECT list
must be in the GROUP BY clause.

Maybe something like this could help you:

SELECT ..., w.tmax,


I think you mean adding "CASE s.par WHEN -999 THEN avgsol.par ELSE s.par  
END" after


"SELECT CASE WHEN w.station_id = site_near.station_id THEN w.obs_id ELSE
s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,  
w.year, w.doy, w.precip, w.tmin, w.tmax,"


to look this like this:

SELECT CASE WHEN w.station_id = site_near.station_id THEN w.obs_id ELSE
s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,  
w.year, w.doy, w.precip, w.tmin, w.tmax,

--replace missing values (-999) with the monthly average
CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END

Correct?


  CASE s.par WHEN -999 THEN avgsol.par ELSE s.par END
  ...
  FROM solar s INNER JOIN ...,


I can't quite figure out what you are suggesting here?


(SELECT month, AVG(par) FROM solar GROUP BY month) AS avgsol
  WHERE s.month = avgsol.month
AND ...


Do you mean:

FROM site_near INNER JOIN solar s ON
(SELECT month, AVG(par) FROM solar GROUP BY month) AS avgsol WHERE  
s.month = avgsol.month
AND site_near.ref_solar_station_id = s.station_id  AND  
site_near.obs_year = s.year
INNER JOIN weather w ON site_near.ref_weather_station_id =  
w.station_id AND site_near.obs_year = w.year AND s.date = w.date

WHERE w.station_id = 211630;

I think my trouble is figuring how to place the code snipit:

   (SELECT month, AVG(par) FROM solar GROUP BY month) AS avgsol
  WHERE s.month = avgsol.month
AND ...

Sorry for being so dull





In this statement I create a subselect "avgsol" that I use like
a table.

Be warned that there will probably be a sequential scan of the whole
table "solar" whenever you run the statement, because the averages  
have

to be calculated first!


That is ok, I won't be running this query so often that the  
performance will be an issue.







[GENERAL] problem with a conditional statement

2007-05-07 Thread Kirk Wythers
I am struggling to get a CASE WHEN statement to work within another  
CASE WHEN. Here is my original code:


SELECT CASE WHEN w.station_id = site_near.station_id THEN w.obs_id ELSE
s.obs_id END AS obs_id,  site_near.station_id, site_near.longname,  
w.year, w.doy, w.precip, w.tmin, w.tmax,


--replace missing solar values (-999) with the average of all solar  
values from that month (s.month)


--CASE  s.par WHEN -999 THEN AVG( s.par) ELSE s.par END
--FROM solar s
--GROUP BY s.month;

FROM site_near INNER JOIN solar s ON
site_near.ref_solar_station_id = s.station_id  AND site_near.obs_year  
= s.year
INNER JOIN weather w ON site_near.ref_weather_station_id =  
w.station_id AND site_near.obs_year = w.year AND s.date = w.date

WHERE w.station_id = 211630;

I have commented out the troublesome bits in the middle of the code.  
All I am trying to do here is to replace missing values with averages  
from the same day of the year for all years. Does anyone see what I  
am buggering up here?


Thanks in advance



Re: [GENERAL] out of memory error on 3 table join

2006-12-11 Thread Kirk Wythers


On Dec 11, 2006, at 1:43 PM, Tom Lane wrote:


Kirk Wythers <[EMAIL PROTECTED]> writes:

I have an database (pg 8.1.0 on OS X) where a three table inner-join
gives the following errors:



psql(606) malloc: *** vm_allocate(size=8421376) failed (error code=3)
psql(606) malloc: *** error: can't allocate region
psql(606) malloc: *** set a breakpoint in szone_error to debug
out of memory for query result


How many rows are you expecting from this join?
(If you're not sure, try SELECT count(*) FROM same-from-and-where- 
clauses)


SELECT count (*) returns 33,061,700

met_data=# SELECT count(*) FROM climate, sites, solar WHERE  
climate.id = sites.id AND solar.id = sites.id AND climate.year = 1999;

--
33061700
(1 row)

However attempting the join
met_data=# SELECT climate.year, solar.doy, sites.longname FROM  
climate, sites, solar WHERE climate.id = sites.id AND solar.id =  
sites.id AND climate.year = 1999;


gives the error.


psql(394) malloc: *** vm_allocate(size=396742656) failed (error code=3)
psql(394) malloc: *** error: can't allocate region
psql(394) malloc: *** set a breakpoint in szone_error to debug
out of memory





My bet is that you've mistyped something leading to an unconstrained
join --- maybe an extra instance of a table, or something.  The query
doesn't show anything like that as given:


SELECT sites.longname, solar.year, solar.doy, solar.solar
climate.tmax FROM solar, sites, climate WHERE solar.id = sites.id AND
climate.id = sites.id AND solar.year = 1999;


but you have obviously not copied-and-pasted exactly what you typed.

regards, tom lane

---(end of  
broadcast)---

TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that  
your

   message can get through to the mailing list cleanly
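
Given the schemas in the original post below, a hedged way to check whether the 33 million rows come from pairing every climate day with every solar day for the same site is to also require the dates to match:

SELECT count(*)
FROM   climate, sites, solar
WHERE  climate.id = sites.id
  AND  solar.id = sites.id
  AND  climate.date = solar.date
  AND  climate.year = 1999;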





[GENERAL] out of memory error on 3 table join

2006-12-11 Thread Kirk Wythers
I have an database (pg 8.1.0 on OS X) where a three table inner-join  
gives the following errors:


psql(606) malloc: *** vm_allocate(size=8421376) failed (error code=3)
psql(606) malloc: *** error: can't allocate region
psql(606) malloc: *** set a breakpoint in szone_error to debug
out of memory for query result

In this case table 'a' references table 'b', and table 'c' references  
table 'b'. Two table joins between 'a' and 'b' work fine. Likewise a  
two table join between 'b' and 'c' work fine. However, trying to do a  
join between 'a', 'b', and 'c' results in the error. For example:


met_data=# SELECT sites.longname, climate.year, climate.doy,  
climate.tmax FROM climate, sites WHERE climate.id = sites.id AND  
climate.year = 1999;


returns the expected result, and a query like:

SELECT sites.longname, solar.year, solar.doy, solar.solar FROM solar,  
sites WHERE solar.id = sites.id AND solar.year = 1999;


also returns the expected result.

However, combining all three tables such as below results in the errors:

SELECT sites.longname, solar.year, solar.doy, solar.solar  
climate.tmax FROM solar, sites, climate WHERE solar.id = sites.id AND  
climate.id = sites.id AND solar.year = 1999;


I am beginning to suspect that I set up the tables incorrectly. Can  
anyone spot where I messed this up? Tables are as follows:


met_data=# \d climate
Table "public.climate"
Column |   Type   | Modifiers
+--+---
id | integer  |
year   | smallint |
month  | smallint |
day| smallint |
doy| smallint |
date   | date |
precip | real |
tmin   | real |
tmax   | real |
Foreign-key constraints:
"idfk" FOREIGN KEY (id) REFERENCES sites(id)

met_data=# \d solar
 Table "public.solar"
Column |   Type   | Modifiers
+--+---
id | integer  |
year   | smallint |
month  | smallint |
day| smallint |
doy| smallint |
date   | date |
solar  | real |
Foreign-key constraints:
"idfk" FOREIGN KEY (id) REFERENCES sites(id)

met_data=# \d sites
  Table "public.sites"
 Column  |   Type| Modifiers
-+---+---
id  | integer   | not null
name| character varying |
longname| character varying |
state   | character varying |
lat | double precision  |
lon | double precision  |
the_geom_lonlat | geometry  |
the_geom_meters | geometry  | not null
Indexes:
"sites_pkey" PRIMARY KEY, btree (id)
"sites_id_key" UNIQUE, btree (id)
"idx_sites_the_geom_lonlat" gist (the_geom_lonlat)
"idx_sites_the_geom_meters" gist (the_geom_meters) CLUSTER
Check constraints:
"enforce_dims_the_geom_meters" CHECK (ndims(the_geom_meters) = 2)
"enforce_dims_the_goem_lonlat" CHECK (ndims(the_geom_lonlat) = 2)
"enforce_geotype_the_geom_meters" CHECK (geometrytype 
(the_geom_meters) = 'POINT'::text OR the_geom_meters IS NULL)
"enforce_geotype_the_goem_lonlat" CHECK (geometrytype 
(the_geom_lonlat) = 'POINT'::text OR the_geom_lonlat IS NULL)
"enforce_srid_the_geom_meters" CHECK (srid(the_geom_meters) =  
32661)

"enforce_srid_the_goem_lonlat" CHECK (srid(the_geom_lonlat) = 4269)


