Re: [SQL] Inquiry From Form [pgsql]
Jack Kiss <[EMAIL PROTECTED]> writes:

> 1) Do you have a function which is similar to Oracle's DECODE.

Yes.

> 2) Can you extract day of week (Monday, etc.) from your date functions.

Yes.

Check out the "Date/Time Functions and Operators" and the "Conditional
Expressions" sections of this:

http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions.html

-- greg
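P.S. A quick sketch of both, against a hypothetical "orders" table:

  SELECT CASE status WHEN 'O' THEN 'Open'
                     WHEN 'C' THEN 'Closed'
                     ELSE 'Unknown' END   AS status_text,    -- DECODE equivalent
         to_char(order_date, 'Day')       AS weekday_name,
         extract(dow FROM order_date)     AS weekday_number  -- 0 = Sunday
    FROM orders;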
Re: [SQL] HardCORE QUERY HELP!!!
Metnetsky <[EMAIL PROTECTED]> writes:

> It's for a class and my professor has a thing for brain teaser type
> questions.

Incidentally, TAs and Profs aren't stupid, and have been known to check on
newsgroups and mailing lists for students asking for people to do their
homework for them.

-- greg
Re: [SQL] Forcing query to use an index
One suggestion I'll make about your data model -- I'm not sure it would
actually help this query, but might help elsewhere:

   WHERE ( C.Disabled > '2003-02-28' OR C.Disabled IS NULL )

Don't use NULL values like this. Most databases don't index NULLs (Oracle) or,
even if they do, don't make "IS NULL" an indexable operation (postgres).
There's been some talk of changing this in postgres but even then, it wouldn't
be able to use an index for an OR clause like this.

If you used a very large date, like '9999-01-01', as your "not deactivated"
value then the constraint would be C.disabled > '2003-02-28' and postgres
could use an index on "disabled".

Alternatively, if you have a disabled_flag and disabled_date then you could
have an index on disabled_flag,disabled_date and, uhm, there should be a way
to use that index though I'm not seeing it right now.

This won't matter at first when 99% of your customers are active. And ideally
in this query you find some way to use an index to find "kate" rather than
doing a full table scan. But later, when 90% of the clients are disabled, then
in a bigger batch job where you actually want to process every active record
it could prevent postgres from having to dig through a table full of old
inactive records.

> This may make better use of your index, because the planner will have a more
> accurate estimate of the number of rows returned from the outer join.
>
> AND:
>
> >    AND ( C.Accountnum      ~* 'kate'
> >          OR C.Firstname    ~* 'kate'
> >          OR C.Lastname     ~* 'kate'
> >          OR C.Organization ~* 'kate'
> >          OR C.Address      ~* 'kate'
> >          OR C.Postal       ~* 'kate'
> >          OR C.City         ~* 'kate'
> >          OR EA.Name || '@' || JoinDomain(EA.Domain) ~* 'kate'
>
> This set of expressions has "seq scan" written all over it. I highly suggest
> that you try to find a way to turn these into anchored text searches, perhaps
> using functional indexes on lower(column).

If you really need to find substring matches everywhere you might want to look
into the full text search module in contrib/tsearch. I haven't started using
it yet but I expect I will have to when I get to that part of my project.

> Finally:
>
> >    OR CMS.Package ~* 'kate'

*confusion*. Oooh, yeah, this one is a big problem. It means it's not clear
which end of the join to start with. Maybe it would be better to separate this
into two separate queries: give the user the option to search for a user
"kate" or a package "kate" but not both simultaneously.

-- greg
Re: [SQL] Forcing query to use an index
>  ->  Merge Join  (cost=6106.42..6335.30 rows=2679 width=265)
>                  (actual time=859.77..948.06 rows=1 loops=1)

Actually, another problem: notice the big discrepancy between the estimated
rows and the actual rows. That's because you have the big OR clause, so
postgres figures there's a good chance one of the clauses will be true and it
estimates a lot of rows will match. In fact of course they're all very
selective and you'll usually probably only get a few records.

If you're stuck with the unanchored text search it will always do a full table
scan, so it will never be lightning fast. But it would probably be a bit
faster if you put a limit clause (on a subquery) on the table that's doing the
full table scan. That will convince postgres that there won't be thousands of
resulting records, which might convince it to do a nested loop. Also, as a
beneficial side effect, it will limit the damage if one of your users does a
search for "e"...

This only really helps if you can get rid of the OR CMS.package clause...
otherwise it actually needs all the records in case they match a summary
record with a kate package.

-- greg
Re: [SQL] Gist indexes on int arrays
Greg Stark <[EMAIL PROTECTED]> writes:

> Can I have a GiST index on (foo_id, attribute_set_array) and have it be just
> as fast at narrowing the search to just foo_id = 900 but also speed up the ~
> operation?

Hm, so if I understand what I'm reading I can do this if I load the btree_gist
contrib module as well. I'm still not sure whether it'll be worthwhile for
this application though.

I have a bit of a problem though. Is building GiST indexes supposed to take
much much longer than building btree indexes? It's been running nearly an hour
and it's still going. The hard drive is hardly moving so it seems to be all
cpu usage. I don't even see any pgsql_tmp usage.

db=# CREATE INDEX cache_gist_idx on cache using gist ( foo_id, attribute_set gist__int_ops);

postgres 30176 86.3 22.2 64896 57344 ?  R  11:08  40:32 postgres: postgres slo [local] CREATE INDEX

I don't remember exact numbers but building the normal btree index took on the
order of 15m. This will have to be rebuilt nightly; an hour-long index build
won't be practical.

-- greg
Re: [SQL] Complex outer joins?
Tom Lane <[EMAIL PROTECTED]> writes:

> The SQL-standard way of writing this would presumably be either
>
>   from G left join L on (G.SELID = L.SELID)
>          left join C on (L.SELID = C.SELID and L.SELLEVEL = C.SELLEVEL)

I would think of it as this one.

>   from G left join
>        (L left join C on (L.SELID = C.SELID and L.SELLEVEL = C.SELLEVEL))
>        on (G.SELID = L.SELID)

I don't see how that would be at all different.

> depending on which join you think ought to be done first.  It might be
> that the results are the same in this case, but I'm not convinced of
> that.  In general the results of outer joins definitely depend on join
> order.

I'm pretty sure Oracle actually builds an abstract join representation where
the two queries above would actually be represented the same way, then decides
the order from amongst the equivalent choices based on performance decisions.

Can you show an example where the join order would affect the result set? I
can't think of any.

-- greg
Re: [SQL] LEAST and GREATEST functions?
Josh Berkus <[EMAIL PROTECTED]> writes:

> Stefan,
>
> > I know the LEAST and GREATEST functions are not part
> > of standard SQL, but they sure were handy where I came
> > from (Oracle-land).
>
> Um, what's wrong with MAX and MIN, exactly?

MAX and MIN are single-parameter aggregate functions. LEAST and GREATEST are
two-parameter (though in postgres they could be defined for 3 and more
parameters) scalar functions.

eg:

  SELECT max(a) FROM bar

would return a single tuple with the maximum value of a from amongst every
record. Whereas:

  SELECT greatest(a,b) FROM bar

would return one tuple for every record in the table, with a single value
representing the greater of bar.a and bar.b.

You could define your own functions to do this but it would be tiresome to
define one for every datatype.

-- greg
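P.S. For the record, a sketch of what one such definition might look like for
integers (you'd have to repeat it for each datatype you care about):

  CREATE FUNCTION greatest(integer, integer) RETURNS integer AS
    'SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END'
  LANGUAGE sql IMMUTABLE STRICT;

  SELECT greatest(a,b) FROM bar;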
Re: [SQL] summing tables
To solve this problem efficiently you probably need the lead/lag analytic
functions. Unfortunately Postgres doesn't have them.

You could do it with something like:

  update foo
     set c = a+b+(select c
                    from foo as x
                   where seq < foo.seq
                   order by seq desc
                   limit 1)

or the more standard, but likely to be way slower:

  update foo
     set c = a+b+(select c
                    from foo as x
                   where seq = (select max(seq)
                                  from foo as y
                                 where seq < foo.seq))

However, I would suggest that if you have an implicit relationship between
records you should make that relationship explicit with a foreign key. If you
had a column that contained the seq of the parent record then this would be
easy. I'm really puzzled how this query as currently specified could be
useful.

-- greg
Re: [SQL] summing tables
"Viorel Dragomir" <[EMAIL PROTECTED]> writes: > Anyway, in real life this update modifies only one row with a value wich is > diff of null. It was really handy if it was specified the option ORDER for > the update command. Are you hoping to produce a running total? That's very difficult in standard SQL. That would be very different from the query you asked for. Running totals, ranking, lead/lag, are all things that are very difficult to do in standard SQL. They don't fit in the unordered set model that SQL follows so doing them without special non-standard functions is very hard and inefficient. The functions to do them don't fit well within the SQL universe either, which might be why they don't exist yet in postgres. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] Recursive request ...
Joe Conway <[EMAIL PROTECTED]> writes:

> BenLaKnet wrote:
> > I see connect by in Oracle
> > ??? is there an equivalent in PostgreSQL or not ??
>
> Someone is working on the SQL99 equivalent, but it isn't done yet. Perhaps
> for 7.5.

There's a connectby hack in the contrib/tablefunc directory. I haven't used it
so I'm not clear on how powerful it is compared to the real deal, but people
on one of the pgsql lists seemed to find it useful when it came up in the
past.

  connectby(text relname, text keyid_fld, text parent_keyid_fld,
            text start_with, int max_depth [, text branch_delim])
    - returns keyid, parent_keyid, level, and an optional branch string
    - requires anonymous composite type syntax in the FROM clause. See
      the instructions in the documentation below.

-- greg
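P.S. Roughly, a call ends up looking something like this (adapted from the
tablefunc README, so check it for the exact details):

  SELECT *
    FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
         AS t(keyid text, parent_keyid text, level int, branch text);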
Re: [SQL] Table Partitioning and Rules
"Girish Bajaj" <[EMAIL PROTECTED]> writes: > I cant possibly index all the cols in the table. So I thought Id best manage > the data by splitting up the table into multiple partitions and eventually > depending on application logic, only scan those tables that are necessary to > scan sequentially instead of the whole big table. But that's only going to help if one of the columns they're searching on is the last name column isn't it? I'm a fan of partitioned tables but you have to consider what advantage you're trying to achieve to understand if it will actually be helpful for you: Partitioned tables only really improve query performance if virtually all queries use a common constraint. The canonical example is accounting tables being partitioned based on fiscal year. Virtually all the queries--even ones doing massive batch queries best served by sequential scans--will only scan the current fiscal year. In your case unless you can impose a constraint on the UI that users always perform their queries on a single letter of the alphabet at a time and only see results for people whose last names match that letter, it's not really a great match as far as query performance. The other advantage of partitioned tables is space management; it allows placing each partition on a separate physical storage space. However without native support in Postgres doing it via rules is going to be a headache. I would think you would be better off striping the disks together and storing it as a single large table. That's the only clean approach Postgres really allows at this point anyways. Finally, if I WAS going to partition based on the first letter of a text string, which I doubt I would, I would probably create 26 partitions right off the bat. Not try to make up arbitrary break points. If those arbitrary breakpoints turn out to be poorly chosen it'll be a complex manual job to move them. Whereas if you just have 26 partitions some will be large and some small and you can move partitions between physical storage freely to balance things. -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] casting to arrays
Joe Conway <[EMAIL PROTECTED]> writes:

> Not possible in current releases, but it will be in 7.4 (about to start
> beta). It looks like this:

Well, there is the int_array_aggregate function in the contrib/intagg
directory. It has to be compiled separately, and it has a few quirks (like the
arrays are zero-based instead of 1-based), but it works more or less, and it
does exactly what you describe.

But the 7.4 stuff should be much cleaner and more flexible, so if you don't
need it right now you're better off waiting.

-- greg
Re: [SQL] min() and NaN
Tom Lane <[EMAIL PROTECTED]> writes:

> NULL can be special, because it acts specially in comparisons anyway.
> But NaN is just a value of the datatype.

Does postgres intend to support all the different types of NaN? Do you intend
to have +Inf and -Inf and underflow detection and all the other goodies you
actually need to make it useful?

If not, it seems more useful to just use the handy unknown-value thing SQL
already has and turn NaN into a NULL.

-- greg
Re: [SQL] numeric and float converts to int differently?
"SZÛCS Gábor" <[EMAIL PROTECTED]> writes: > > cannot see is that the float values are not actually exactly 0.5 > > Yes I could guess that (floating point vs fixed), but is this a coincidence > that both '0.5'::float and '-0.5'::float are closer to 0, whereas they could > be closer to +/-1, as well as both closer to the lower or upper bound. Wouldn't 0.5 and -0.5 be representable exactly as floats? -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Table versions
Tom Lane <[EMAIL PROTECTED]> writes:

> Returning to the original problem, it seems to me that comparing "pg_dump -s"
> output is a reasonable way to proceed.

I've actually started checking in a pg_dump -s output file into my CVS tree.
However, I prune a few key lines from it: the TOC OID numbers, and anything
not owned by the user I'm interested in.

The makefile rule I use looks like:

schema.sql:
	pg_dump -U postgres -s user | sed '/^-- TOC entry/d;/^\\connect - postgres/,/^\\connect - user/d;/^SET search_path/d;/^$$/d;/^--$$/d' > $@

This still suffers from one major deficiency: the order the objects are output
in isn't necessarily consistent between databases. If I add tables to the
development server but then add them to the production server in a different
order, the schema still shows differences even though the objects in the two
databases are identical.

-- greg
Re: [SQL] Getting last insert value
Guillaume LELARGE <[EMAIL PROTECTED]> writes:

> Doing a "select currval() from my_table" after your insert should work.

That's "select currval('my_table_pkcol_seq')" actually.

The above would have called the currval() function for every record of the
table, which isn't what you want, and in any case currval takes an argument.

-- greg
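P.S. A complete (made-up) example, assuming a serial column pkcol on my_table:

  INSERT INTO my_table (name) VALUES ('foo');
  SELECT currval('my_table_pkcol_seq');

currval() is tied to your own session, so this is safe even with concurrent
inserts happening in other backends.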
Re: [SQL] Programatically switching database
ow <[EMAIL PROTECTED]> writes:

> My concern though ... wouldn't pgSql server collapse when faced with a
> transaction spanning across 100M+ records?

The number of records involved really doesn't faze Postgres at all.

However, the amount of time spent in the transaction could be an issue if
there is other activity in other schemas of the same database. As long as the
transaction is running, none of the deleted or old updated data in any schema
of the database can be cleaned up by vacuum, as postgres thinks the big
transaction "might" need to see it sometime.

So if the rest of the database is still active, the tables and indexes being
updated may grow larger than normal. If it goes on for a _really_ long time
they might need a VACUUM FULL at some point to clean them up.

-- greg
Re: [SQL] Expressional Indexes
"Randolf Richardson, DevNet SysOp 29" <[EMAIL PROTECTED]> writes: > For example, if I want to index on a date field but only have the index > keep track of the most recent 30 days (and then create a secondary index for > all dates) so as to improve performance on more heavily loaded systems. > > Am I understanding this new terminology correctly? Thanks in advance. No, you could do the above using "partial indexes" but it wouldn't work very well in this case because the "last 30 days" keeps moving and you would have to keep redefining the index periodically. It also wouldn't really help performance. Expression Indexes are just more powerful "functional indexes". In 7.3 they could be used for indexing expressions like "lower(foo)". In 7.4 they're more powerful and you can index expressions other than simple function calls. They still should be things that always return the same value, which excludes subqueries. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] Expressional Indexes
Tom Lane <[EMAIL PROTECTED]> writes:

> No, because the above represents a moving cutoff; it will (and should)
> be rejected as a non-immutable predicate condition.  You could do
> something like
>
>   CREATE INDEX my_Nov_03_index on my_table (create_date)
>     WHERE (create_date >= date '2003-11-01');
>
> and then a month from now replace this with
>
>   CREATE INDEX my_Dec_03_index on my_table (create_date)
>     WHERE (create_date >= date '2003-12-01');
>
> bearing in mind that this index can be used with queries that contain
> WHERE conditions like "create_date >= some-date-constant".  The planner
> must be able to convince itself that the right-hand side of the WHERE
> condition is >= the cutoff in the index's predicate condition.  Since
> the planner is not very bright, both items had better be simple DATE
> constants, or it won't be able to figure it out ...

Note that if you're just doing this to speed up regular queries where you have
create_date in some small range, then you'll likely not see much of an
increase. Mainly you'll just save space.

What can be interesting is to create a partial index like this but over a
second, unrelated column. Something like:

  CREATE INDEX my_dec_03_index on my_table (userid)
    WHERE (create_date >= date '2003-11-02');

Then you can do queries like:

  SELECT * FROM my_table
   WHERE userid = ? AND create_date >= date '2003-11-02'

And it'll be able to efficiently pull out just those records, even if there
are thousands more records that are older than 2003-11-02. This avoids having
to create a two-column index with a low-selectivity column like "month".

-- greg
Re: [SQL] Scaler forms as function arguments
Joe Conway <[EMAIL PROTECTED]> writes:

> In 7.4 you could use an array. It would look like this:

Though note that 7.4 doesn't know how to optimize this form:

db=> explain select * from foo where foo_id in (1,2);
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using foo_pkey, foo_pkey on foo  (cost=0.00..6.05 rows=2 width=756)
   Index Cond: ((foo_id = 1) OR (foo_id = 2))
(2 rows)

db=> explain select * from foo where foo_id = ANY (array[1,2]);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1132.82 rows=5955 width=756)
   Filter: (foo_id = ANY ('{1,2}'::integer[]))
(2 rows)

-- greg
Re: [SQL] OFFSET and subselects
[EMAIL PROTECTED] (Dmitri Bichko) writes:

> I am running into trouble with pagination here, somehow (rather naively) I
> assumed that when doing a LIMIT and OFFSET, the subselects on the records
> before the OFFSET would not be performed, which quite apparently is not the
> case. So, LIMIT 50 OFFSET 0 takes 100ms to run, LIMIT 50 OFFSET 50 takes
> 200ms, LIMIT 50 OFFSET 100 takes 300ms; and so forth, this really becomes
> unacceptable after a few pages.

If you don't need any of the results of the subqueries in your WHERE clause
then you can do this by introducing a view in your query like:

  SELECT *,
         (SELECT ...) AS sub_1,
         (SELECT ...) AS sub_2,
         (SELECT ...) AS sub_3
    FROM (
          SELECT x,y,z
            FROM ...
           WHERE ...
         ) AS t
   LIMIT 50 OFFSET 50

If you do use the results of the subqueries in your where clause or order by
clause then, well, you're SOL, since the OFFSET and LIMIT clauses only kick in
after the where clause restrictions are taken into account.

-- greg
Re: [SQL] Index not recognized
"Grace C. Unson" <[EMAIL PROTECTED]> writes: > Why is it that my index for text[] data type is not recognized by the > Planner? > > I did these steps: > > 1. create function textarr(text[]) returns text language sql as 'select > $1[1]' strict immutable > 2. create index org_idx on EmpData (textarr(org)); This index will only be used if you use the expression textarr(org) in your query. You would probably have some success if you did: select * from empdata where textarr(org) = 'math' > 3. vacuum full > 4. explain analyze select name from EmpData where org *= 'math'; Is this *= operator from the contrib/array directory? It's not an indexable operator at all using standard btree indexes. The GiST indexing does make indexable operators that can do things like *= but that's a whole other ball of wax. What are you really trying to do? > Result: > = > Seq Scan on EmpData (cost=0.00..3193.20 rows=102 width=488) > (actual time=3.71.35..371.35 rows=0 loops=1) > > Filter: (org[0]='math'::text) Well that's awfully odd. I don't know how that expression came out of the query you gave. You'll have to give a lot more information about how you're defining *= and why you think it's related to the function you used to define the index. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] How can I get the last element out of GROUP BY sets?
Tom Lane <[EMAIL PROTECTED]> writes:

> Robert Creager <[EMAIL PROTECTED]> writes:
> > ... one piece of data I need is the last value for each GROUP BY
> > period.  Alas, I cannot figure out how to do this.
>
> SELECT DISTINCT ON (rather than GROUP BY) could get this done for you.

Or, if you need to combine this with other aggregate functions like sum,
count, etc:

  CREATE FUNCTION first_accum (integer, integer) RETURNS integer
    AS 'select coalesce($1,$2)' LANGUAGE sql;
  CREATE FUNCTION last_accum (integer, integer) RETURNS integer
    AS 'select $2' LANGUAGE sql;
  CREATE AGGREGATE first (BASETYPE = integer, SFUNC = first_accum, STYPE = integer);
  CREATE AGGREGATE last (BASETYPE = integer, SFUNC = last_accum, STYPE = integer);

Then you can do first() and last(). These definitions only work for integer
but you can pattern match for other datatypes. You might be able to get a
universal function working using anyelement now, I haven't tried.

-- greg
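P.S. Usage would be something like this (a hypothetical readings(period, ts,
val) table; note the input order has to come from a sorted subquery, and you
should check the plan to make sure that sort actually survives into the
aggregation step):

  SELECT period, first(val), last(val), sum(val), count(*)
    FROM (SELECT period, val FROM readings ORDER BY period, ts) AS ordered
   GROUP BY period;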
Re: [SQL] limit 1 and functional indexes
Bruno Wolff III <[EMAIL PROTECTED]> writes:

> >                          QUERY PLAN
> >
> >  Sort  (cost=11824.16..11831.55 rows=2956 width=33)
> >        (actual time=248.17..248.17 rows=0 loops=1)
> >    Sort Key: order_date
> >    ->  Index Scan using transactions_pop_i on transactions
> >          (cost=0.00..11653.79 rows=2956 width=33)
> >          (actual time=126.13..126.13 rows=0 loops=1)
> >          Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111'::text)
> >  Total runtime: 248.25 msec

Yeah, the problem with functional indexes is that the optimizer doesn't have
any clue how the records are distributed, since it only has statistics for
columns, not your expression. Notice it's estimating 2956 rows where in fact
there are 0.

I think someone was actually working on this so it may be improved in 7.5, but
I'm not sure.

Given the type of data you're storing, which looks like hex strings, are you
sure you need to do a case-insensitive search here? Can't you just uppercase
it when you store it?

The other option would be to use a subquery and force the planner not to pull
it up, something like:

  select code, order_date
    from (
          select code, order_date
            from transactions
           where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111')
          offset 0
         ) as x
   order by order_date DESC;

The offset 0 prevents the optimizer from pulling the subquery into the outer
query. I think this will prevent it from even considering the order_date index
scan, but you'll have to try to be sure.

-- greg
Re: [SQL] limit 1 and functional indexes: SOLVED
"Alexandra Birch" <[EMAIL PROTECTED]> writes: > It works perfectly - thanks a million! > Strangely the offset 0 does not seem to make any difference. > Gotta read up more about subqueries :) > > explain analyze > select code,order_date >from ( > select code, order_date >from transactions > where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') > limit 1 > ) as foo > order by order_date DESC; I think what you're trying to do here is get the last order? Then you'll want the limit to be on the outer query where it's ordered by order_date: select code,order_date from ( select code, order_date from transactions where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') offset 0 ) as foo order by order_date DESC; limit 1 Note that in theory postgres should be able to find the same plan for this query as yours since it's equivalent. It really ought to use the order_date index since it thinks it would be more efficient. However it's unable to because postgres doesn't try every possible index, only the ones that look like they'll be useful for a where clause or an order by. And the order by on the outer query isn't considered when it's looking at the subquery. It normally handles this case by merging the subquery into the outer query, but it can't do that if there's a limit or offset. So an "offset 0" is convenient for fooling it into thinking the subquery can't be pulled up without actually changing the output. You could do "order by upper(pop)" instead which might be clearer for someone reading the query in that it makes it look like you're trying to encourage it to use the index on upper(pop). In theory "order by"s on subqueries are useless and postgres could ignore them, but it doesn't. -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] Index not used - now me
Christoph Haller <[EMAIL PROTECTED]> writes:

> Why appear? If the Index Scan has a Total runtime: 2.46 msec and the Seq
> Scan a Total runtime: 46.19 msec, then the Index Scan is much faster.
> Or am I completely off the track reading the explain analyze output?

To estimate the relative costs of a sequential scan and an index scan,
Postgres has to take into account the likelihood that the blocks needed will
be in the disk cache. In your example your database is otherwise idle and the
entire table is small enough that the entire index is probably in cache. This
means that the random access pattern of the index isn't really hurting the
index scan at all. Whereas in a busy database with less available RAM, the
random access pattern makes a big difference.

You could try raising effective_cache_size to give postgres a better chance at
guessing that all the blocks will be in cache. But that may no longer be true
when the query is run on a busy database.

You could also try lowering random_page_cost. Some people find values as low
as 1.2 or so to be useful, but that would almost certainly be lying to
postgres about the costs of random access and would cause it to use index
scans aggressively even when they're not faster.

-- greg
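P.S. For experimenting, both can be set per-session; the numbers here are just
illustrative (effective_cache_size is in 8k pages):

  SET effective_cache_size = 10000;   -- roughly 80MB of expected cache
  SET random_page_cost = 2;           -- the default is 4
  EXPLAIN ANALYZE SELECT * FROM foo WHERE bar = 42;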
Re: [SQL] Slow sub-selects, max and count(*)
Josh Berkus <[EMAIL PROTECTED]> writes:

> Max() and Count() cannot use indexes for technical reasons. Browse through
> the archives of SQL, PERFORM, and HACKERS for about 12,000 discussions on
> the subject.

Please don't confuse the issue by throwing Max() and Count() into the same
basket.

The issue with Min()/Max() is that the result could be generated efficiently
by scanning indexes, but it's just hard, especially given generic aggregate
functions, and the work simply hasn't been done, or even started, yet.

The issue with Count() is that people want the result to be cached in a single
per-table counter, but that can't be done as simply as that because of
transactions. People have discussed complex solutions to this but it's a much
more complex problem than it appears.

They're really two entirely separate issues.

-- greg
Re: [SQL] SQL challenge--top 10 for each key value?
Jeff Boes <[EMAIL PROTECTED]> writes:

> I headed off in the direction of groups of SELECTs and UNIONs, and quit when
> I got to something like four levels of "SELECT ... AS FOO" ...

four? wimp, that's nothing!

Ok, seriously, I think there's no way to do this directly with straight SQL.
You would have to define a non-immutable function that has some temporary
storage where it keeps track of how many it has seen.

The generic function that would help here would be some kind of rank(value)
that would give you the equivalent of rownum, except with a level break every
time value changes. I've been hoping to see something like this on the list
for a long time but haven't yet.

If the value of n is constant and small you could cheat with an aggregate
function with an array of the top n values:

db=> create function first3_accum(integer[],integer) returns integer[]
     as 'select case when array_upper($1,1) >= 3 then $1 else array_append($1,$2) end'
     language sql strict immutable;
CREATE FUNCTION
db=> create aggregate first3 (basetype = integer, sfunc = first3_accum, stype = integer[], initcond = '{}');
CREATE AGGREGATE

then something like:

  SELECT first3(id)
    FROM (SELECT query, id
            FROM my_table
           ORDER BY query, CASE WHEN include THEN 1 ELSE 2 END ASC, score DESC) AS x
   GROUP BY query

But then you'll have to go back to the table to refetch the original records
that you've found. The best way I find to do that is with the
int_array_enum() function from the int_aggregate contrib module:

  SELECT *
    FROM my_table
   WHERE id IN (
     SELECT int_array_enum(f3)
       FROM (
         SELECT first3(id) AS f3
           FROM (SELECT query, id
                   FROM my_table
                  ORDER BY query, CASE WHEN include THEN 1 ELSE 2 END ASC, score DESC) AS x
          GROUP BY query
       ) AS x
   )

This last step is kind of annoying since you've already seen all those
records. And it requires writing a new aggregate function every time the value
of n changes, which kind of sucks.

In theory, if the new work in 7.5 handling structured datatypes is as cool as
it sounds, you could have an array of complete records, and when UNNEST is
eventually incorporated into the array code you could expand those instead of
using the int_array_enum function. Neither of those things are ready yet as
far as I know though.

-- greg
Re: [SQL] SQL challenge--top 10 for each key value?
Josh Berkus <[EMAIL PROTECTED]> writes:

> Rod,
>
> > Something along the lines of the below would accomplish what you want
> > according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
> > SQL200N)
>
> Great leaping little gods! They added something called "row number" to the
> spec?
>
> Boy howdy, folks were right ... the ANSI committee really has completely
> blown off the relational model.

If it's like Oracle's rownum then it's the row number of the *output*, not the
position on disk. So it's not entirely blowing off the relational model any
more than ORDER BY does.

The weird thing is the number of cases where you want ORDER BY or rownum
inside subselects -- which the solution to the original question needed.

> When a standards committee becomes hostage to a handful of vendors, kiss
> real standards goodbye.

In the case of SQL was there ever any pretension otherwise? Was the SQL
standard ever really useful as a "real standard"?

I can write useful ANSI C89 code that will compile and work on any C compiler.
Trying to write portable SQL92 code that does any useful work is about as
productive as stapling bagels to your forehead.

-- greg
Re: [SQL] SQL challenge--top 10 for each key value?
elein <[EMAIL PROTECTED]> writes:

> create or replace function pycounter(integer)
>   returns integer as
> '
>    if args[0] == 0:
>       SD["nextno"] = 1
>       return SD["nextno"]
>    try:
>       SD["nextno"] += 1
>    except:
>       SD["nextno"] = 1
>    return SD["nextno"]
> ' language 'plpythonu';
>
> And clearly it can be done faster as a little
> C function.

Does this approach have a hope of working if it's used twice in the same
query?

-- greg
Re: [SQL] trigger/for key help
Tom Lane <[EMAIL PROTECTED]> writes:

> Bret Hughes <[EMAIL PROTECTED]> writes:
> > FWIW I tried to use alter table but could never get the parser to accept
> > $1 as a constraint name.  I used single and double quotes as well as a
> > lame attempt \$1.
>
> Hm, "$1" works for me ...

Hm, this reminds me. When I was first learning this stuff I was stymied by the
same issue. It took me quite a while to figure out how to drop constraints
because of the quoting issue. Of course now it seems obvious, but for someone
just starting it adds another roadblock.

Is there a reason postgres goes out of its way to pick names that will be
harder to work with than necessary? Or is it considered a good thing, on the
theory that if it's hard to reference it's also hard to accidentally use such
names in conflicting ways?

Perhaps names like _1 _2 ... would be easier to handle? Or perhaps making $
not require quoting would be helpful?

-- greg
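P.S. For anyone hitting this in the archives, the form that works is the
double-quoted one, e.g. for a system-generated foreign key constraint on some
table foo:

  ALTER TABLE foo DROP CONSTRAINT "$1";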
Re: [SQL] trigger/for key help
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > Is there a reason postgres goes out of its way to pick names that
> > will be harder to work with than necessary?
>
> If we use ordinary identifiers for system-generated names then we will
> be infringing on user name space --- ie, there's a potential for
> conflict.  I suppose we could use long randomly-generated names like
> ewjncm343cnlen, but are those really easier to work with?

I don't see an unseverable link between "user name space" and "identifiers
that don't need to be quoted". Mixed-case names, for instance, seem like
perfectly good user name space identifiers. Postgres could just as easily say
"the system reserves all identifiers starting with $" and still not require
quoting $.

> I think a more useful approach is to treat it as a documentation
> problem.  Perhaps an example in the ALTER TABLE man page would help.

-- greg
Re: [SQL] Concatenate results of a single column query
Marco Lazzeri <[EMAIL PROTECTED]> writes:

> SELECT
>   p.name, ARRAY(SELECT nicknames FROM people WHERE people.id = p.id)
> FROM people AS p
>
> Any suggestions?

Something like:

db=> create aggregate array_aggregate (basetype = integer, sfunc = array_append, stype = integer[], initcond = '{}');
CREATE AGGREGATE

db=> select array_aggregate(id) from tab;
                                                               array_aggregate
----------------------------------------------------------------------------------------------------------------------------------------------
 {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,28,29,30,31,32,33,34,36,37,38,39,40,41,42,43,27,26,44,45,46,47,48,49,50,51,52,53,54,35}
(1 row)

-- greg
Re: [SQL] Concatenate results of a single column query
Christoph Haller <[EMAIL PROTECTED]> writes:

> Interesting feature, but I cannot find function array_append:
> ERROR: AggregateCreate: function array_append(integer[], integer) does not exist

It's new in Postgres 7.4.

I think you could do this in 7.3 though, it would just be more awkward. Try ||
but I think that's new in 7.4 as well. Otherwise I think you would have to
pick out the upper bound of the array with array_dims and set the upper+1'th
element of the array.

If you're doing text you may want to go directly to a textual concatenation
like:

  CREATE FUNCTION concat_agg_accum(text, text) RETURNS text
    AS 'select $1 || '', '' || $2'
    LANGUAGE sql IMMUTABLE STRICT;

  CREATE AGGREGATE concat_agg (
    BASETYPE = text,
    SFUNC = concat_agg_accum,
    STYPE = text
  );

-- greg
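P.S. Used against the earlier people/nicknames example it would look roughly
like this (my guess at the table layout):

  SELECT p.id, p.name, concat_agg(n.nickname)
    FROM people AS p JOIN nicknames AS n ON n.person_id = p.id
   GROUP BY p.id, p.name;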
Re: [SQL] isolation level
Bruno Wolff III <[EMAIL PROTECTED]> writes:

> On Thu, May 13, 2004 at 18:13:23 +0000,
>   Jaime Casanova <[EMAIL PROTECTED]> wrote:
> > Hi all, is there a way to set the isolation level to something like the
> > sql standard dirty read.
>
> No. There will be a way to use the standard name in a SET command, but
> you will actually get READ COMMITTED isolation (which is the next step up).

I wonder how hard this would be to implement. It doesn't seem like it should
be very hard. It would be very convenient for debugging and for checking on
the progress of large batch updates or loads.

-- greg
Re: [SQL] Multiple outer join on same table
Marco Lazzeri <[EMAIL PROTECTED]> writes:

> Hi!
> I'm searching a better (quicker) way to retrieve data as I used to do
> using the following query...
>
> ==
>
> SELECT main.codice,
>        other.value AS value_one,
>        other.value AS value_two
>   FROM main
>   LEFT OUTER JOIN other ON (main.id = other.id_main)
>   LEFT OUTER JOIN other AS other2 ON (main.id = other2.id_main)
>  WHERE other.type = 'type_one'
>    AND other2.type = 'type_two'
> ;

a) You're better off sending the actual query rather than retyping it. I
assume you made a typo in the select column list and it should be
"other2.value AS value_two"? Also, the parentheses are required on the ON
clause.

b) The WHERE clause will effectively make this a plain inner join, not an
outer join at all, since any values that aren't found would have a NULL type
column and cause the row to not be selected.

I think the query you meant to write would be:

  SELECT codice,
         other1.value AS value_one,
         other2.value AS value_two
    FROM main
    LEFT OUTER JOIN other AS other1
           ON (main.id = other1.id_main AND other1.type = 'type_one')
    LEFT OUTER JOIN other AS other2
           ON (main.id = other2.id_main AND other2.type = 'type_two')

Another way to write this query, which might be faster or might not,
depending, would be:

  SELECT codice,
         (SELECT value FROM other
           WHERE id_main = main.id AND type = 'type_one') AS value_one,
         (SELECT value FROM other
           WHERE id_main = main.id AND type = 'type_two') AS value_two
    FROM main

In theory the two queries really ought to always result in the same plan
because they're equivalent. However, the Postgres optimizer, as clever as it
is, is incapable of seeing this. The first form with the outer join leaves the
optimizer with a lot more flexibility though, including at least one plan that
is effectively identical to what the optimizer is forced to do for the second
query. So really the first one should be no worse than the second.

If you find the second faster (or if they're both still slow) you might
consider posting explain analyze output for both queries. It may be that you
have other issues preventing the optimizer from finding a good plan. You have
run analyze on these tables recently? And you vacuum regularly?

And for the second query you would really want an index on other.id_main too.
For the first one it would depend on the data in the two tables.

-- greg
Re: [SQL] Selecting "sample" data from large tables.
> Joseph Turner <[EMAIL PROTECTED]> writes:
> > I have a table with a decent number of rows (let's say for example a
> > billion rows). I am trying to construct a graph that displays the
> > distribution of that data. However, I don't want to read in the
> > complete data set (as reading a billion rows would take a while). Can
> > anyone think of a way to do this in postgresql?

One way would be to have an indexed column with random values in it. Then you
could use an index scan to pull out samples.

However, this has a few downsides.

a) Index scans are a lot less efficient than sequential scans, effectively
reducing the sample size you can get for a given amount of time even further.
A 10% sample using this technique is probably almost as slow as reading the
entire table, for example. If you only need .1% though, this might be a good
approach.

b) The data in the random column would have to be static, meaning multiple
samples wouldn't be entirely independent. Depending on what you're doing with
the samples this might be a problem or not.

Oh, and if you're content with always using the same sample but want to
analyze it multiple different ways, you might want to use a partial index. You
could have partial indexes matching the order by and where clauses of your
analyses, but with a where clause of its own selecting on the random data.
Then make sure that where clause is in every query. But still, the sample has
to be small enough that even using an index scan you're winning over reading
the entire data.

-- greg
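P.S. A rough sketch of the first idea, with made-up names (big_table, and a
0.1% sample):

  ALTER TABLE big_table ADD COLUMN sample_key real;
  UPDATE big_table SET sample_key = random();
  CREATE INDEX big_table_sample_idx ON big_table (sample_key);

  SELECT * FROM big_table WHERE sample_key < 0.001;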
Re: [SQL] Empty array IS NULL?
Markus Bertheau <[EMAIL PROTECTED]> writes:

> oocms=# SELECT ARRAY(SELECT 1 WHERE FALSE);
>  ?column?
> ----------
>

This one seems strange to me. Shouldn't it result in an empty array?

-- greg
Re: FW: [SQL] "=" operator vs. "IS"
Rich Hall <[EMAIL PROTECTED]> writes:

> "(anything) = NULL" is always Null, this cannot be what the coder intended.

I often have such things in my SQL.

Consider what happens when you have SQL constructed dynamically. Or, more
frequently, consider that many drivers still don't use the new binary
placeholder syntax and emulate it by putting the parameters directly into the
SQL.

-- greg
Re: FW: [SQL] "=" operator vs. "IS"
Stephan Szabo <[EMAIL PROTECTED]> writes:

> IS TRUE and IS FALSE have a different effect from =true and =false when
> the left hand side is NULL. The former will return false, the latter will
> return NULL.

No, actually they both return false.

(But thanks, I didn't even realize they were special this way.)

-- greg
Re: [SQL] How to create an aggregate?
Ray Aspeitia <[EMAIL PROTECTED]> writes:

> I also would like to pass the delimiter to the aggregate as a parameter and
> I am not sure if it can handle that.

Currently aggregates that take multiple parameters are just not supported.

-- greg
Re: [SQL] reply to setting
Joe Conway <[EMAIL PROTECTED]> writes:

> This is very true. In fact, I get mildly annoyed when people *don't* include
> the direct reply to me, because I very actively filter/redirect my mail.
> Replies directly to me are pretty much guaranteed to be seen quickly, but
> the ones that go to the list might get lost among the hundreds of posts that
> go into my "postgres" inbox every day. I think many other people do
> something similar.

Just as a side comment, one trick I found very helpful in my mail filters is
to treat any message with one of my message-ids in the references as a
personal message as far as mail notifications go. This way I get notifications
for any message on a thread following a post of my own.

This is easy in Gnus, since the message-id has the sending hostname and the
first few characters have a base64-encoded copy of the unix userid. You would
have to figure out how to recognize message-ids from your MUA.

-- greg
Re: [SQL] sleep function
John DeSoi <[EMAIL PROTECTED]> writes:

> On Aug 10, 2004, at 10:57 AM, Bruce Momjian wrote:
>
> > I can't think of one, no.  I think you will have to use one of the
> > server-side languages and call a sleep in there.
>
> This is no good in the real world since it pounds the CPU, but it worked
> well enough for my testing purposes.

You went the long way around. I think what he meant was something as simple
as:

bash-2.05b$ /usr/lib/postgresql/bin/createlang -U postgres plperlu test

bash-2.05b$ psql -d test -U postgres
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# create or replace function sleep(integer) returns integer as 'return sleep(shift)' language plperlu;
CREATE FUNCTION
test=# \timing
Timing is on.
test=# select sleep(10) ;
 sleep
-------
    10
(1 row)

Time: 10002.493 ms

-- greg
Re: [SQL] only last records in subgroups
Dino Vliet <[EMAIL PROTECTED]> writes:

> x,0 and y,4 but how do I manage this in sql? I was
> hoping for a keyword LAST or so, where I can specify
> that when I've ordered my results with order by, I
> could only get the last of the subgroups (the first
> one is easy because I could use limit 1)

There's no concept of "first" and "last" in SQL outside of the ORDER BY clause
of your query. And you can easily reverse the order of the ORDER BY sort by
putting "DESC" after the columns you're sorting on.

But I don't understand how you intend to use "LIMIT 1" to solve your problem.
As you describe the problem, you want the last (or first) record of *each*
*group*. Solving that using LIMIT would require a complex query with a
subquery in the column list, which would be quite a pain.

As the other poster suggested, if you're just looking to fetch a single column
you can just use min() or max() to solve this.

If you're looking to fetch more than one column, Postgres provides a
non-standard SQL extension for dealing with this situation, "DISTINCT ON":

  SELECT DISTINCT ON (id) id, day, other, columns
    FROM tab
   ORDER BY id, day

That gets the lowest value of "day". Use "ORDER BY id, day DESC" to get the
greatest value of "day".

-- greg
Re: [SQL] from PG_DUMP to CVS
"Riccardo G. Facchini" <[EMAIL PROTECTED]> writes: > After searching throught the list, I assume you mean this link: > http://www.rbt.ca/autodoc/index.html > by Rod Taylor. > > Looks promising, but still what I need is a proper CVS output, as I > need to review the changes made to the specific database structure. Well, CVS can still be useful even if the changes are all in one file. Look at "cvs annotate" for example. And CVS diff would still give you useful information. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] Aggregate query for multiple records
Scott Gerhardt <[EMAIL PROTECTED]> writes:

> Hello, I am new to the list, my apology if this question is beyond the scope
> or charter of this list.
>
> My question is:
> What is the best method to perform an aggregate query to calculate sum()
> values for each distinct wid as in the example below, but except for all
> wid's (not just WHERE wid='01/1-6-1-30w1/0').
>
> Also, performance wise, would it be better to build a function for this
> query. The table has 9 million records and these aggregate queries take
> hours.

The "top n" type query (or in this case "first n" or "last n", but it's the
same thing) is actually very tricky to do in standard SQL. The best solution
seen here for postgres is to use arrays and custom aggregate functions. The
following is based on a previous answer from Tom Lane to a similar question.

(I thought I already posted this for you on pgsql-general, but the list
archives are down and you don't seem to have seen it, so I'm resending it.)

It allows you to do the whole query with a single sort for the grouping and
the ordering by date together. You would have to use it with something like:

  SELECT wid, sum_first_6(oil)
    FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS x
   GROUP BY wid

If you pump up sort_mem enough -- you can do it within the session for the
single connection using "SET sort_mem" -- it should be pretty fast. I think
it's the best you're going to get.

If you're absolutely sure the data is physically stored in chronological order
-- which I would only feel comfortable with if you've never done any updates
or deletes, only inserts and perhaps occasional truncates -- then you might be
able to get by without ordering and convince it to do a hash aggregate. That
would be the optimal result, no sorts at all. But it would be hard to make
sure it would always work.

test=> create or replace function first_6_accum (integer[], integer)
       returns integer[] language sql immutable
       as 'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';
CREATE FUNCTION
test=> create function sum_6(integer[]) returns integer immutable language sql
       as 'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum, stype=integer[], initcond='{}', finalfunc=sum_6);
CREATE AGGREGATE

test=> select sum_first_6(i)
         from (select i
                 from (select 1 as i union select 2 union select 3 union
                       select 4 union select 5 union select 6 union
                       select 7 union select 8) as x
                order by i desc) as x;
 sum_first_6
-------------
          33
(1 row)

test=> select sum_first_6(i)
         from (select i
                 from (select 1 as i union select 2 union select 3 union
                       select 4 union select 5 union select 6 union
                       select 7 union select 8) as x
                order by i asc) as x;
 sum_first_6
-------------
          21
(1 row)

This can easily be switched around to make it "last_6", and you can write
functions to handle 6 records or 9 records. And all of these could be combined
in a single query, so you only have to do the sort once.

Unfortunately you cannot make aggregate functions that take multiple
parameters, nor can you pass extra parameters to the state function. So you'll
have to create a separate set of functions for each variant. Also, you'll have
to change it to use reals.

-- greg
Re: [SQL] Isnumeric function?
Theo Galanakis <[EMAIL PROTECTED]> writes:

> error: btree item size 2744 exceeds maximum 2713.
>
> I assume I had to change some server settings to extend the maximum, however

I would guess the block size. But I'm just guessing.

> in the end this column holds content, and even applying an index would be
> incredibly slow to search across hundreds of thousands of "content" records
> looking for a primary key.

Perhaps you could have an indexed column that contains a crc32 hash? Then you
could do searches by comparing crc32 values, which makes for fast, efficient
integer index lookups. You should still include a comparison against the
original content column, since it is possible for there to be a rare crc32
collision.

This doesn't let you do range lookups efficiently. But it does let you look up
specific values quickly even when they aren't numeric.

-- greg
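P.S. Postgres doesn't ship a crc32() SQL function, but the same trick works
with md5() on 7.4 -- only a sketch, and note md5 gives you a text key rather
than an integer one:

  ALTER TABLE botched_table ADD COLUMN content_md5 text;
  UPDATE botched_table SET content_md5 = md5(content);
  CREATE INDEX idx_content_md5 ON botched_table (content_md5);

  SELECT *
    FROM botched_table
   WHERE content_md5 = md5('...the exact content you are looking for...')
     AND content = '...the exact content you are looking for...';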
Re: [SQL] Isnumeric function?
Theo Galanakis <[EMAIL PROTECTED]> writes:

> I created the Index you specified, however it chooses to run a seq scan on
> the column rather than an Index scan. How can you force it to use that
> Index?
>
> CREATE INDEX idx_content_numeric ON botched_table(content)
>   WHERE content ~ '^[0-9]{1,9}$';
>
> select * from botched_table where content = 200::integer

You need to put an "and content ~ '^[0-9]{1,9}$'" in your query; the clause
has to match the clause in the partial index pretty closely.

Perhaps you would find it convenient to make a view of

  select * from botched_table where content ~ '^[0-9]{1,9}$'

and then just always select these values from that view.

Also, the "::integer" is useless. It actually gets cast to text here anyways.
The index is on the text contents of the content column.

You might consider making the index a functional index on content::integer
instead. I suspect that would be faster and smaller than an index on the text
version of content:

slo=> create table botched_table (content text);
CREATE TABLE
slo=> create index idx_botched_table on botched_table ((content::integer)) where content ~ '^[0-9]{1,9}$';
CREATE INDEX
slo=> create view botched_view as (select content::integer as content_id, * from botched_table where content ~ '^[0-9]{1,9}$');
CREATE VIEW
slo=> explain select * from botched_view where content_id = 1;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using idx_botched_table on botched_table  (cost=0.00..3.72 rows=3 width=32)
   Index Cond: ((content)::integer = 1)
   Filter: (content ~ '^[0-9]{1,9}$'::text)
(3 rows)

-- greg
Re: [SQL] Isnumeric function?
Greg Stark <[EMAIL PROTECTED]> writes:

> Theo Galanakis <[EMAIL PROTECTED]> writes:
>
> > I created the Index you specified, however it chooses to run a seq scan on
> > the column rather than an Index scan. How can you force it to use that
> > Index?
> >
> > CREATE INDEX idx_content_numeric ON botched_table(content)
> >   WHERE content ~ '^[0-9]{1,9}$';
> >
> > select * from botched_table where content = 200::integer
>
> You need to put an "and content ~ '^[0-9]{1,9}$'" in your query; the clause
> has to match the clause in the partial index pretty closely.

Well, this is weird. I tried to come up with a cleaner way to arrange this
than the view I described before, using a function. But postgres isn't using
the partial index when it seems it ought to be available.

When I say it has to match "pretty closely", in this case I think it would
have to match exactly. However, in the case of simple range operators,
postgres knows how to figure out implications. Ie, "where a>1" should use a
partial index built on "where a>0":

slo=> create table test (a integer);
CREATE TABLE
slo=> create index idx_text on test (a) where a > 0;
CREATE INDEX
slo=> explain select * from test where a > 0;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using idx_text on test  (cost=0.00..17.50 rows=334 width=4)
   Index Cond: (a > 0)
(2 rows)

slo=> explain select * from test where a > 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using idx_text on test  (cost=0.00..17.50 rows=334 width=4)
   Index Cond: (a > 1)
(2 rows)

That's all well and good. But when I tried to make a version of your situation
that used a function, I found it doesn't work so well with functional indexes:

slo=> create function test(integer) returns integer as 'select $1' language plpgsql immutable;
CREATE FUNCTION
slo=> create index idx_test_2 on test (test(a)) where test(a) > 0;
CREATE INDEX
slo=> explain select test(a) from test where test(a) > 0;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using idx_test_2 on test  (cost=0.00..19.17 rows=334 width=4)
   Index Cond: (test(a) > 0)
(2 rows)

slo=> explain select test(a) from test where test(a) > 1;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on test  (cost=0.00..25.84 rows=334 width=4)
   Filter: (test(a) > 1)
(2 rows)

I can't figure out why this is happening. I would think it has something to do
with the lack of statistics on functional indexes, except a) none of the
tables is analyzed anyways and b) the estimated row count is the same anyways.

-- greg
Re: [SQL] Isnumeric function?
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > That's all well and good. But when I tried to make a version of your > > situation that used a function I found it doesn't work so well with > > functional indexes: > > ... > > I can't figure out why this is happening. > > You're using 7.3 or older? 7.4.3. -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] implementing an out-of-transaction trigger
"Iain" <[EMAIL PROTECTED]> writes: > Though, as far as I can tell, there is no way to have the notify activate a > pl/pgsql function directly. I'll still need to write a client program to > create a session and actually do the listening, that is if I havn't missed > anything else... Right, presumably some sort of daemon that sits and waits for events. Much like you would have with Oracle advanced queuing and such. The big difference is that NOTIFY doesn't pass along any parameters. You will need some way for your daemon to find any pending data it needs to process. You might need some kind of queue table, or you might be able to get by without one. -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] 1-byte integers
stig erikson <[EMAIL PROTECTED]> writes: > how can i specify an integer to be one byte byte or even 4 bits long? > int1, int(1), tinyint are nowhere to be seen. > smallest i can find is smallint that is 2 bytes. There's a type called "char" (the double quotes are needed). It's used by postgres system catalogues but it's available for users too. It's a little quirky since it has operators that treat it as a 1 byte text data type and other operators that treat it as an integer data type. But that doesn't normally lead to any problems, just strange symptoms when your code has a bug. Other than that there are things like bit(4) which has a cast to and from integer. But they won't save you any storage space. If you have multiple columns like this and want to define a new type that aggregates them all for storage but lets you access them individually that could be useful. -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
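A small illustration of both options mentioned above; the table and column names are invented, and the exact set of casts available varies a little by version, so treat this as a sketch:

    create table flags (f "char", nibble bit(4));
    insert into flags values ('a', B'0101');
    -- f behaves like one byte of text, but casts to its numeric byte value;
    -- bit(4) casts to and from integer, though it is not stored any smaller
    select f, f::int as f_as_int, nibble, nibble::int as nibble_as_int from flags;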
Re: [SQL] ORDER BY and NULLs
T E Schmitz <[EMAIL PROTECTED]> writes: > I just dug out the PostgreSQL book again because I thought I might've garbled > it: > > Quote: "PostgreSQL will not index NULL values. Because an index will never > include NULL values, it cannot be used to satisfy the ORDER BY clause of a > query that returns all rows in a table." You should just cross out that whole section. It's just flatly wrong. I had always assumed it was just people bringing assumptions over from Oracle where it is true. Perhaps this book is to blame for some of the confusion. Which book is it? Postgres indexes NULLs. It can use them for ORDER BY clauses. Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS NOT NULL" constraints though. That's an implementation detail, but it can be worked around with partial indexes. -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
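A sketch of the partial-index workaround mentioned above, with hypothetical table and column names:

    -- index only the rows where phone is missing, so "IS NULL" lookups can use it
    create index subscribers_phone_missing on subscribers (id) where phone is null;
    select id from subscribers where phone is null;

    -- or the mirror image, for "IS NOT NULL" queries
    create index subscribers_phone_present on subscribers (phone) where phone is not null;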
Re: [SQL] ORDER BY and NULLs
Tom Lane <[EMAIL PROTECTED]> writes: > > The paragraph continues: > > "If the SELECT command included the clause WHERE phone NOT NULL, > > PostgreSQL could use the index to satisfy the ORDER BY clause. > > An index that covers optional (NOT NULL) columns will not be used to > > speed table joins either." > > My goodness, it seems to be a veritable fount of misinformation :-( > > I wonder how much of this is stuff that is true for Oracle and they just > assumed it carried over? The first part is true for Oracle. You have to add the WHERE phone NOT NULL to convince Oracle it can use an index. Or just make the column NOT NULL to begin with I think. However as far as I recall the second part is not true. Oracle is smart enough to realize that an equijoin clause implies NOT NULL and therefore allows it to use the index. (This may have all changed in Oracle 9+. The last I saw of Oracle was 8i) I wonder if they just tried explain on a bunch of queries and noticed that postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up with explanations for the patterns they saw? -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] COUNT(*) to find records which have a certain number of dependencies ?
T E Schmitz <[EMAIL PROTECTED]> writes: > I want to select only those BRAND/MODEL combinations, where the MODEL has more > than one TYPE, but only where one of those has TYPE_NAME='xyz'. > I am not interested in MODELs with multiple TYPEs where none of them are called > 'xyz'. There are lots of approaches to this with various pros and cons. The simplest one off the top of my head: select * from brand join model on (brand_pk = brand_fk) where exists (select 1 from type where model_fk = model_pk and type_name = 'xyz') and (select count(*) from type where model_fk = model_pk) > 1 You could try to be clever about avoiding the redundant access to the type table: select * from brand join model on (brand_pk = brand_fk) where (select count(*) from type where model_fk = model_pk having sum(case when type_name = 'xyz' then 1 else 0 end) >= 1 ) > 1 I haven't tested that, it might need some tweaking. In any case I don't think it's worth the added complexity, assuming you have indexes on type. I'm not even sure it would run faster. You could try to be really clever about it by turning the whole thing into a join: select * from brand join model on (brand_pk = brand_fk) join (select model_fk from type group by model_fk having sum(case when type_name = 'xyz' then 1 else 0 end) >= 1 and count(*) > 1 ) on (model_fk = model_pk) This would let the planner have more plans to choose from and might be a big win if there are lots of brands and models but few that satisfy the criteria you're looking for. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] COUNT(*) to find records which have a certain number of
T E Schmitz <[EMAIL PROTECTED]> writes: > SELECT > BRAND_NAME,MODEL_NAME ... > intersect ... Huh, I never think of the set operation solutions. I'm curious how it compares speed-wise. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] COUNT(*) to find records which have a certain number of dependencies ?
T E Schmitz <[EMAIL PROTECTED]> writes: > ) as somealias on (model_fk = model_pk) > > (subquery in FROM must have an alias) ARGH! This is one of the most annoying things about postgres! It bites me all the time. Obviously it's totally insignificant since it's easy for me to just throw an "AS x" on the end of it. But damn. I see there's a comment foreseeing some annoyance value for this in the source: /* * The SQL spec does not permit a subselect * (<derived table>) without an alias clause, * so we don't either. This avoids the problem * of needing to invent a unique refname for it. * That could be surmounted if there's sufficient * popular demand, but for now let's just implement * the spec and see if anyone complains. * However, it does seem like a good idea to emit * an error message that's better than "syntax error". */ So where can I officially register my complaint? :) -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] JOIN performance
Tom Lane <[EMAIL PROTECTED]> writes: > Fixing this properly is a research project, and I haven't thought of any > quick-and-dirty hacks that aren't too ugly to consider :-( Just thinking out loud here. Instead of trying to peek inside the CASE couldn't the optimizer just wrap the non-strict expression in a conditional that tests whether the row was found? -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] [GENERAL] need ``row number``
Tom Lane <[EMAIL PROTECTED]> writes: > Karsten Hilbert <[EMAIL PROTECTED]> writes: > > I am not convinced I'll need a SRF. I am not trying to > > calculate something that isn't there yet. I am just trying to > > join two views appropriately. I might have to employ some > > variant of Celko's integer helper table but I'm not sure how > > to proceed. > > A fairly common hack for this is to use a sequence: > > create temp sequence tseq; > select nextval('tseq'), * from (select order by ...) ss; But I thought she wanted to get the row number within a group. Not the row number for the entire result set. A sequence can't do that. Or at least, I suppose it could but it would be mighty strange to see setval() in a SELECT query. And I can't think of how to detect the level break in a select query either. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Inserting into table only if the row does not already exist.
"C. Bensend" <[EMAIL PROTECTED]> writes: > INSERT INTO table ( column1, column2, column3 ) >SELECT column1, column2, column3 >WHERE NOT EXISTS ( > SELECT column1, column2, column3 FROM table WHERE > column1 = $column1 AND > column2 = $column2 AND > column3 = $column3 ) > >.. which gave me 'ERROR: column1 does not exist'. Nuts. Well you're not selecting from any table so "column1" isn't going to exist. You just have to put it in the select list as a constant. If you're feeling generous to the next programmer to read it you could put "AS column1" after each one, but the column name doesn't actually have to match the column you're inserting into. INSERT INTO table ( column1, column2, column3 ) ( SELECT $column1, $column2, $column3 WHERE NOT EXISTS ( SELECT 1 FROM table WHERE column1 = $column1 AND column2 = $column2 AND column3 = $column3 ) ) Note that this is going to have some concurrency issues. I think it will be possible for a second query to execute before the first commits. In that case it won't see the record the first query inserted and try to insert again. You'll just get a primary key violation though which I guess you can just ignore. Which raises a question. Why not forgoe this complicated SQL and try to do the insert. If you get a primary key violation, well there's your answer... If you don't care about the failure just ignore it and move on. I would suggest checking specifically for a primary key violation and still stopping execution on unexpected errors though. If you're doing this inside a bigger transaction that's a bit more of a pain. Until 8.0 postgres can't handle ignoring an error on a query without aborting the entire transaction. But if you're in autocommit mode then you can just ignore the primary key violation and continue. Incidentally, if you're putting your parameters directly into your queries using $column1 then you've got a potential security problem. Unless you're quoting every variable everywhere religiously using postgres's quoting functions an attacker can sneak extra SQL into your queries. Potentially including whole new statements such as "DELETE FROM table"... -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] Inserting into table only if the row does not already
"C. Bensend" <[EMAIL PROTECTED]> writes: > The risk of a cron gone wild is acceptable to me at this moment. Gee, now I have images of late-night advertisements for bofh-porn video tapes of Cron Jobs Gone Wild(tm) dancing through my head... thanks. -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] help on a query
Michelle Murrain <[EMAIL PROTECTED]> writes: > The OUTER JOIN version is quite a bit more efficient (by an order of magnitude) > than the option with WHERE NOT EXISTS subquery. This is going to be heavily dependent on the version of postgres. IN/NOT IN execution has improved a lot in 7.4 and later. If you're still on 7.3 then the outer join will be better. But in 7.4 I would actually expect the NOT IN or the NOT EXISTS to be faster. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] Cross tabulations
"Muhyiddin A.M Hayat" <[EMAIL PROTECTED]> writes: > Dear all, > > I need to do something similar to a cross tabulation, but without any > aggregation. join your table to itself four times: select * from (select check_time::date as date, employee_id, check_time-check_time::date as in from test where state = 'In') as a join (select check_time::date as date, employee_id, check_time-check_time::date as break_out from test where state = 'Break Out') as b using (employee_id,date) join (select check_time::date as date, employee_id, check_time-check_time::date as break_in from test where state = 'Break In') as d using (employee_id,date) join (select check_time::date as date, employee_id, check_time-check_time::date as out from test where state = 'Out') as e using (employee_id,date) ; Note that this will do strange things if you don't have precisely four records for each employee. Alternatively use subqueries: select date, employee_id, (select check_time-check_time::date from test where employee_id = x.employee_id and check_time::date = date and state = 'In') as in, (select check_time-check_time::date from test where employee_id = x.employee_id and check_time::date = date and state = 'Break Out') as break_out, (select check_time-check_time::date from test where employee_id = x.employee_id and check_time::date = date and state = 'Break In') as break_in, (select check_time-check_time::date from test where employee_id = x.employee_id and check_time::date = date and state = 'Out') as out from (select distinct employee_id, check_time::date as date from test) as x; This will at least behave fine if there are missing records and will give an error if there are multiple records instead of doing strange things. Neither of these will be particularly pretty on the performance front. -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] vacuum analyze slows sql query
patrick ~ <[EMAIL PROTECTED]> writes: > I noticed that a freshly created db with freshly inserted data (from > a previous pg_dump) would result in quite fast results. However, > after running 'vacuum analyze' the very same query slowed down about > 1250x (Time: 1080688.921 ms vs Time: 864.522 ms). If it gets slower immediately after a single vacuum analyze then the problem is that one of the queries is getting planned wrong when stats are available. If it takes a while to slow down then it could be other problems such as index bloat etc. Don't use "explain verbose" use "explain analyze". I don't know if it was in 7.1 but you say you reproduced the problem with 7.4. It would be helpful to see the results of "explain analyze select ..." on the query before and after the vacuum analyze. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Simple SQL Question
Tomasz Myrta <[EMAIL PROTECTED]> writes: > > select * from table1 LIMIT x > > gives me the first x row of the result. > > After that, I save the last value, and next time, I adjust > > the query as > > select * from table1 where itemkey>:lastvalue LIMIT x > > Why do you complicate it so much? Everything you need is: > > select * from table1 LIMIT x > select * from table1 LIMIT x OFFSET x > select * from table1 LIMIT x OFFSET 2*x > > Remember to sort rows before using limit/offset. There are two good reasons to prefer Andras' solution to yours. a) If the data is modified between the two queries his will continue from where the previous page left off. Yours will either skip records or overlap with the previous page. b) If itemkey is indexed his will be an efficient index scan that performs similarly regardless of what page is being fetched. Yours will perform more and more slowly as the user gets deeper into the results. Note that both queries are wrong however. You need an "ORDER BY itemkey" or else nothing guarantees the second page has any relation at all to the first page. -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
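For clarity, a sketch of the two queries with the missing ORDER BY added (x and :lastvalue are the placeholders from the discussion above):

    select * from table1 order by itemkey limit x;                            -- first page
    select * from table1 where itemkey > :lastvalue order by itemkey limit x; -- next page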
Re: [SQL] Comparing Dates
"Thomas F.O'Connell" <[EMAIL PROTECTED]> writes: > select 2004-06-08; > ?column? > -- > 1990 > > I'm not exactly sure how the bare string is converted internally, but it's > clearly not a complete date like you're expecting. What string? That's just integer arithmetic. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] Making dirty reads possible?
[EMAIL PROTECTED] writes: > But not possible for real at the moment? > > So, summarising: > - Nested transactions is not (yet) supported > - READ UNCOMMITTED isolation level is not (yet) supported > - the EXECUTE plpgsql construct does not circumvent the transaction Well nested transactions are in 8.0 but I don't think they help you much. I find I've been stymied using server-side functions for large batch jobs for pretty much the same reason. I find it works better and it's more flexible to write client-side programs in the language of my choice that connect to the database and do the batch jobs. They can output progress logs or keep information about their progress in some shared space. They can also control the transaction more freely committing in the middle of the job if it's safe. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] replacing mysql enum
Ian Barwick <[EMAIL PROTECTED]> writes: > What I still don't quite understand is why IN in a CHECK context is > handled differently to say: select 1 where 'x' in (null,'a','b','c') ? > This could be a bit of a gotcha for anyone constructing a constraint > similar to the original poster's and not realising it has no effect. well WHERE foo IN (null, ...) returns null if foo isn't explicitly in the list (ie, "it may or may not equal the unknown value in the list"). And I think constraints that return null are deemed to have succeeded. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
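A small demonstration of the gotcha being discussed; the table and column names are invented:

    create table t (x text check (x in (null, 'a', 'b', 'c')));
    insert into t values ('a');  -- accepted, as expected
    insert into t values ('z');  -- also accepted: the IN expression yields NULL,
                                 -- and a NULL check result is treated as passing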
Re: [SQL] Column with recycled sequence value
Andrew Sullivan <[EMAIL PROTECTED]> writes: > You can set the sequence up to cycle (so once it gets to the end, it > wraps around to the beginning again). The keyword is CYCLE at CREATE > SEQUENCE time. It defaults to NO CYCLE. > > One potential problem, of course, are collisions on the table, > because some value wasn't cleared out. It sounds like you don't have > that problem though. Alternatively you can go through the database and make sure all the foreign keys are declared and marked ON UPDATE CASCADE. Then go through and renumber all your entries sequentially starting at 1 and reset your sequence. I'm not sure this is such a hot idea really. But I don't really like the idea of letting the sequence wrap around much either. You'll have to consider the pros and cons of each approach (and of just moving to bigserial too). If you're going to do this you'll want an index on all the foreign key columns. That is, the columns referring to this value from other tables. Otherwise the automatic updates would be very slow. And will probably want to schedule down-time for this. Otherwise application code that holds values in local state might get very confused. I think I would do it with a program that connects and updates each record individually and commits periodically rather than with a single big update. Just because I like having control and having things that give me progress information and can be interrupted without losing work. Oh, and this won't work if you have any external references to these values from outside your database. Say if the value is something like a customer account number that you've previously sent to customers... -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] OFFSET impact on Performance???
"Andrei Bintintan" <[EMAIL PROTECTED]> writes: > > If you're using this to provide "pages" of results, could you use a cursor? > What do you mean by that? Cursor? > > Yes I'm using this to provide "pages", but If I jump to the last pages it goes > very slow. The best way to do pages for is not to use offset or cursors but to use an index. This only works if you can enumerate all the sort orders the application might be using and can have an index on each of them. To do this the query would look something like: SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50 Then you take note of the last value used on a given page and if the user selects "next" you pass that as the starting point for the next page. This query takes the same amount of time no matter how many records are in the table and no matter what page of the result set the user is on. It should actually be instantaneous even if the user is on the hundredth page of millions of records because it uses an index both for the finding the right point to start and for the ordering. It also has the advantage that it works even if the list of items changes as the user navigates. If you use OFFSET and someone inserts a record in the table then the "next" page will overlap the current page. Worse, if someone deletes a record then "next" will skip a record. The disadvantages of this are a) it's hard (but not impossible) to go backwards. And b) it's impossible to give the user a list of pages and let them skip around willy nilly. (If this is for a web page then specifically don't recommend cursors. It will mean you'll have to have some complex session management system that guarantees the user will always come to the same postgres session and has some garbage collection if the user disappears. And it means the URL is only good for a limited amount of time. If they bookmark it it'll break if they come back the next day.) -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Alex Turner <[EMAIL PROTECTED]> writes: > I am also very interesting in this very question.. Is there any way to > declare a persistant cursor that remains open between pg sessions? > This would be better than a temp table because you would not have to > do the initial select and insert into a fresh table and incur those IO > costs, which are often very heavy, and the reason why one would want > to use a cursor. TANSTAAFL. How would such a persistent cursor be implemented if not by building a temporary table somewhere behind the scenes? There could be some advantage if the data were stored in a temporary table marked as not having to be WAL logged. Instead it could be automatically cleared on every database start. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] same question little different test MSSQL vrs Postgres
"Joel Fradkin" <[EMAIL PROTECTED]> writes: > I also tried a simple select * from tblcase where clientum = 'SAKS' Try: explain analyze select * from tblcase where clientum = 'SAKS' Send the output. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] same question little different test MSSQL vrs Postgres
"Joel Fradkin" <[EMAIL PROTECTED]> writes: > QUERY PLAN > "Seq Scan on tblcase (cost=0.00..30066.21 rows=37401 width=996) (actual > time=0.344..962.260 rows=22636 loops=1)" > " Filter: ((clientnum)::text = 'SAKS'::text)" > "Total runtime: 1034.434 ms" Well that says it only took 1s. So it seems this is highly dependent on whether the data is in cache. Perhaps it was in cache on MSSQL when you profiled it there and not on postgres? You could put an index on clientnum, but if the data is usually in cache like this it might not even be necessary. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] same question little different test MSSQL vrs Postgres
"Joel Fradkin" <[EMAIL PROTECTED]> writes: > I tried the SET ENABLE_SEQSCAN=FALSE; > And the result took 29 secs instead of 117. > > After playing around with the cache and buffers etc I see I am no longer > doing any swapping (not sure how I got the 100 sec response might have been > shared buffers set higher, been goofing around with it all morning). If it's swapping you're definitely going to get bad results. You really want the *majority* of RAM left free for the OS to cache disk data. > My worry here is it should obviously use an index scan so something is not > setup correctly yet. I don't want to second guess the analyzer (or is this a > normal thing?) No that's not obvious. 22k out of 344k is a selectivity of 6.6% which is probably about borderline. The optimizer is estimating even worse at 10.9% which isn't far off but puts it well out of the range for an index scan. If you really want to get postgres using an index scan you'll have to a) improve the estimate using "alter table tblcase alter column clientnum set statistics" to raise the statistics target for that column and reanalyze. And b) lower random_page_cost. random_page_cost tells postgres how much slower indexes are than table scans and at the default setting it accurately represents most disk hardware. If your database fits within RAM and is often cached then you might have to lower it to model that fact. But you shouldn't do it based on a single query like this. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] order by question
Gary Stainburn <[EMAIL PROTECTED]> writes: > > Alternatively: (a<>6),(a<>4),a > > Although this does exactly what I want, at first glance it should do > exactly the opposite. > > I'm guessing that for each line it evaluates > not (a=6) 0 for true else 1 Not really, "not a=6" is an expression that evaluates to a boolean, true or false. true sorts as "greater" than false. That order is counterintuitive but it's because the default sort order is ascending. So the "lesser" false records appear first. If you put "not a=6" in your select column list you'll see the true and false values appear. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
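Making the sort keys visible, roughly as suggested above (the table name here is hypothetical):

    select a, (a<>6) as not_six, (a<>4) as not_four
      from t
     order by (a<>6), (a<>4), a;
    -- false sorts before true, so rows with a=6 come first, then a=4, then the rest by a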
Re: [SQL] Consecutive row count query
Leon Stringer <[EMAIL PROTECTED]> writes: > Hi, > > I wondered if anyone could answer the following question: > > If I have a table such as the one below: > > col1 col_order > --- > Apple 1 > Apple 2 > Orange 3 > Banana 4 > Apple 5 > > Is there a way I can get the following results: > > Apple 2 > Orange 1 > Banana 1 > Apple 1 Maybe. But not easily or efficiently. How about this: SELECT a.col1, a.col_order FROM tab as a LEFT OUTER JOIN tab as b ON (b.col_order = a.col_order+1 AND b.col1=a.col1) WHERE b.col1 IS NULL > But since (in my intended table) most rows will have col_count = 1, this > seems like unnecessary normalization (and semantically "wrong"). I think this looks like a better option. "unnecessary normalization" is an odd phrase. Unless you can point at some reason the denormalized layout is *more* convenient --and much *more* convenient at that-- you should go for it. Besides, that col_count column's only going to be four bytes. Unless the "Apple" data is really short it'll only take a few rows with col_count>1 to make it worthwhile. The only reason you might have a problem is if it's really "semantically wrong" which would be if there's data attached to Apple or Orange that might be different from one streak of results to the other. -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] Date/Time Conversion
James G Wilkinson <[EMAIL PROTECTED]> writes: > I hope that this is some silly beginner's mistake. I have spent quite a bit > of > time > reading the PostgreSQL documentation and cannot find my error. I have also > scanned the PostgreSQL archive and the web for help, but I have not found > anything > to get me over the hump (it is probably out there, I just cannot find it). > > All I am trying to do is convert GMT dates and times to an arbitrary > time zone. Here is my small test database: I think the closest thing to what you're trying to do is: slo=> set timezone = 'GMT'; SET slo=> select '2001-01-01'::date::timestamptz at time zone 'EST' ; timezone - 2000-12-31 19:00:00 (1 row) Why are you not just storing a "timestamp with timezone" with the actual time of the event, and a second column with the local time zone in which you can choose to use to display the time? -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] ignore single character in SELECT query?
[EMAIL PROTECTED] writes: > So basically I want to ignore a single character (the apostrophe > character), anywhere in the middle of my search word, in selecting > results. How can I do this? WHERE replace(name, '''', '') like '%dont%' Beware of quoting issues if "dont" is coming from user supplied inputs. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] row-attribute in EXPLAIN-output doesn't match count(*)
Andrew Sullivan <[EMAIL PROTECTED]> writes: > On Thu, Apr 14, 2005 at 07:21:38PM +0200, Andreas Joseph Krogh wrote: > > > > So, what you're suggesting is that a restart of the webapp should make > > vacuum > > able to delete those dead rows? > > Yes, but that'll only solve your problem for now. You'll have the > problem again soon. What's keeping open the transaction? This is presumably because of the long-standing issue that Postgres takes the snapshot as soon as the BEGIN is issued. A lot of drivers issue a "COMMIT; BEGIN;" right away even though it could be a long time before any actual work is done. Other databases (by which I mean Oracle) treat BEGIN as a noop. The snapshot starts when the first SQL statement that needs a snapshot is executed. So until a SELECT is issued the connection doesn't participate in any transactional issues like keeping old versions of records around. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] row-attribute in EXPLAIN-output doesn't match count(*)
Tom Lane <[EMAIL PROTECTED]> writes: > We have to start the transaction no later than event #2 since there has > to be something to hold the lock. But it'd be easy enough to decouple > this from BEGIN, and it'd be good enough to solve the "COMMIT;BEGIN" > problem. Oh I think I finally figured out what you're saying here. So vacuum doesn't really know what tuples are actually visible to the snapshots actually taken by a transaction? It's making the conservative estimate that a snapshot could have been taken as early as the start of the transaction even if no snapshot was taken until later? > Which of these three times do you think now() ought to correspond to? > I recall having argued that it should be event #3 since that corresponds > to the database snapshot you see. 100% backwards compatibility would > require setting now() at event #1, but will anyone weep if we change that? I think it would be weird to perform a select and see records with dates after now(). It would also be weird to perform a select and not see records inserted before now(). I'm not sure any of the above guarantees those criteria for READ COMMITTED mode, but I think I'm on the same general path as you. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] can you change an FK constraint from NOT DEFERRABLE to DEFERRABLE
Bruno Wolff III <[EMAIL PROTECTED]> writes: > It does look like you can only ADD and DROP constraints, not directly > alter or replace them. So making a reference deferable is go to require > a DROP and ADD which will need to recheck the constraint. I asked the same question a few days ago on pgsql-general. In short, if you want to skip the rechecking you have to update system tables directly and you have to do two of them. The updates you want would look something like these. But these would do *all* your constraints, make sure to get only the ones you really want to change: update pg_constraint set condeferrable = 't' where contype = 'f' update pg_trigger set tgdeferrable=true where tgisconstraint = true I think an ALTER CONSTRAINT to change these settings as well as the ON {UPDATE,DELETE} behaviour would be neat. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
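For completeness, the supported route (which does recheck the constraint) looks something like this, with hypothetical table and constraint names:

    begin;
    alter table orders drop constraint orders_customer_fk;
    alter table orders add constraint orders_customer_fk
        foreign key (customer_id) references customers (id)
        deferrable initially immediate;
    commit;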
Re: [SQL] interesting SQL puzzle - concatenating column with itself.
Nick Fankhauser <[EMAIL PROTECTED]> writes: > Alvaro Herrera wrote: > > > The order is not really guaranteed, though if this is a one-shot thing, > > you may get away with turning off hashed aggregates. > > > > When I read this, I assumed there was a runtime parameter I could set that was > similar to ENABLE_HASHJOIN. Are you referring to a different runtime parameter > or something else entirely? Similar but different. enable_hashagg. You can see all of these with "show all" in psql. However you do not have to worry. Even with (actually, *especially with*) hash aggregates the records will be processed in the order they're received. It's actually the normal aggregate method of sorting on the GROUP BY columns that risks damaging the order. However Postgres does a special check to see if the subquery is already sorted by the GROUP BY column and avoids the extra sort which could cause you problems. So this is not guaranteed by the SQL spec to work (but then the SQL spec doesn't have custom aggregates at all) but Postgres goes out of its way to make sure this doesn't break unnecessarily. [This is all for 8.0 and I think 7.4. Some versions before that might not have that check.] -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] SELECT * FROM foo OFFSET -1 LIMIT 1
Markus Bertheau <[EMAIL PROTECTED]> writes: > Offset for negative numbers means 0, as it seems. I think there is a > sensible meaning for negative offset numbers and wondered, what > arguments led to negative offsets being processed as 0 offset. Frankly I'm surprised it's not a syntax error. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] ARRAYs and INDEXes ...
"Marc G. Fournier" <[EMAIL PROTECTED]> writes: > SELECT * FROM customers WHERE monthly_balance[6] = 0.00; This, like the other poster said, can be accomplished with a set of simple expression indexes. > As an example ... or > > SELECT * FROM customers WHERE 0.00 = any (monthly_balance); This would require a GiST index. Look at the intarray contrib module. I don't think there's any equivalent for other data types. You might have to store these values as fixed precision numbers and divide or multiple by 100 to convert. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] PL/SQL Function: self-contained transaction?
"Marc G. Fournier" <[EMAIL PROTECTED]> writes: > In PostgreSQL, as everyone knows, a QUERY == a transaction, unless wrap'd in a > BEGIN/END explicitly ... how does that work with a function? is there an > implicit BEGIN/END around the whole transaction, or each QUERY within the > function itself? The whole outer query issued from your frontend is in one transaction. > If the whole function (and all QUERYs inside of it) are considered one > transaction, can you do a begin/end within the function itself to 'force' > commit on a specific part of the function? Functions cannot issue start or end transactions. They're a creature of the transaction you're in when you call them. Otherwise it wouldn't make sense to be able to call them from within a query. There is some discussion of "stored procedures" which would live outside of transactions and be able to create transactions, commit, and roll them back. But I don't think any of that work is committed yet. I'm not even sure it's been written yet. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] uuid type (moved from HACKERS)
Josh Berkus writes: > Mark, Nathan, > > I'm moving this over to the PGSQL-SQL list, away from -hackers, as it's no > longer a -hackers type discussion. Hope you don't mind! > > > On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote: > > > I'm also a little baffled to come up with any real application where > > > making an id number for most tables "unguessable" would provide any > > > kind of real protection not far better provided by other means. For > > > your "users" table, sure, but that's a very special case. > > > > It should never be the sole means of defense, however, it can be quite > > effective at prevention. > > > > For a rather simple example, consider a site that associates a picture > > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc. > > it makes it ridiculously easy to write a script to pull all of the > > pictures off the site. This can be bothersome, as the only type of > > person who would do this, is the type of person with an illegitimate > > motivation. I want the data to be easily and freely accessible as > > specific objects, but I do not wish to provide an easy way of > > dumping all of the data as a unit. Of course you could have just done the same thing using an hmac (or a simple hash like crypt) and not had to store an extraneous meaningless piece of information in your database. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
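A minimal sketch of that idea: derive the public name from the internal id plus a server-side secret, so it is stable but not guessable. The table and column names are hypothetical; md5() is built in, while a proper HMAC would need something like contrib/pgcrypto:

    select id, md5('server-side-secret' || id::text) || '.jpg' as picture_name
      from members;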
Re: [SQL] MOVE in SQL vs PLPGSQL
Tom Lane <[EMAIL PROTECTED]> writes: > Richard Huxton writes: > > Ah, now I see what you're saying. You're quite right in your suspicions, > > "MOVE..." isn't supported for plpgsql cursors. You could probably do > > something with EXECUTE and returning a refcursor from a previous > > function, but that sounds fiddly. > > > I must admit, on the odd occasion I want to skip a row, I just FETCH it > > and move on. Anyone else? > > There is something on the TODO list about improving plpgsql's cursor > functionality --- there's no reason it shouldn't have MOVE, except that > no one got around to it yet. Though the original poster should realize, a MOVE command would be only marginally more efficient than just fetching those records. It would save the network overhead and context switches involved in communicating those records, but there's no way it would let the server avoid reading all those records from disk. At least as far as I can see. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] pg, mysql comparison with "group by" clause
Stephan Szabo <[EMAIL PROTECTED]> writes: > On Tue, 11 Oct 2005, Rick Schumeyer wrote: > > > I'm not sure what I was thinking, but I tried the following query in pg: > > > > SELECT * FROM t GROUP BY state; > > > > pg returns an error. > > > > Mysql, OTOH, returns the first row for each state. (The first row with > > "AK", the first row with "PA", etc.) > > > > I'm no SQL expert, but it seems to me that the pg behavior is correct, and > > the mysql result is just weird. Am I correct? > > In your case, it sounds like the mysql result is wrong. I believe SQL99 > would allow it if the other columns were functionally dependant upon state > (as there'd by definition only be one value for the other columns per > group). I believe this is a documented feature. MySQL treats "select a,b from t group by a" equivalently to Postgres's "select distinct on (a) a,b from t" I suppose "equivalent" isn't quite true. It's more general since it allows aggregate functions as well. The equivalently general Postgres syntax is to have a first() aggregate function and do "select a,first(b) from t group by a". I'm sure it's very convenient. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
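A rough sketch of such a first() aggregate; it relies on the rule that a strict transition function keeps the first non-null input as its state, and of course "first" here means first in whatever order the rows happen to arrive:

    create function first_agg(anyelement, anyelement) returns anyelement
        as 'select $1' language sql immutable strict;
    create aggregate first (basetype = anyelement, sfunc = first_agg, stype = anyelement);
    -- usage: select a, first(b) from t group by a;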
Re: [SQL] Text->Date conversion in a WHERE clause
Collin Peters <[EMAIL PROTECTED]> writes: > I have a table that has some columns which store 'custom' fields so the > content varies according to the user that the row belongs to. For one > of the groups of users the field is a date (the type of the field is > 'text' though). I'm trying to perform a query where it only returns > values in a certain date range so in the WHERE clause I have > > WHERE cust3 <> '' > AND cust3::text::timestamp > CURRENT_DATE - interval '1 month' > > This results in the error 'ERROR: date/time field value out of range: "052-44-5863"'. Now that is obviously not a valid date but there > is actually more to the where clause and the first part of it excludes > all rows where the user is not even the correct type, so the row which > includes the field '052-44-5863' should really not even be checked. I think you have to use a CASE expression like: WHERE CASE WHEN user_type = 1 THEN cust3::timestamp > CURRENT_DATE - interval '1 month' ELSE false END AND ... There's no advantage to doing this kind of thing though. Good database design principles dictate having one column for each piece of data. Just leave the columns for which the data is inappropriate NULL. NULLs take effectively no space. -- greg ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] pg, mysql comparison with "group by" clause
Scott Marlowe <[EMAIL PROTECTED]> writes: > Hehe. When I turn on my windshield wipers and my airbag deploys, is it > a documented "feature" if the dealership told me about this behaviour > ahead of time? Well it's more like my car where the dashboard dims when I turn on my headlights which annoys me to no end since I learned to always put my headlights on even in the day. > In much the same way, while this behaviour may be documented by MySQL, I > can't imagine it really being called a feature. But at least this > misbehaviour is documented. However, I think most people in the MySQL > universe just stumble onto it by accident when they try it and it works. I'd > at least prefer it to throw a warning or notice or something. I don't see why you think people stumble on this by accident. I think it's actually an extremely common need. So common that Postgres has the same feature (though less general) and invented a whole syntax to handle it. I think most MySQL users don't stumble on it, they learn it as the way to handle the common use case when you join a master table against a detail table and then want to aggregate all the detail records. In standard SQL you have to write GROUP BY ... and list every single column you need from the master table. Forcing the database to do a lot of redundant comparisons and sort on uselessly long keys where in fact you only really need it to sort and group by the primary key. Remember, most MySQL users learn MySQL first, and only later learn what is standard SQL and what isn't. > A Subselect would let you do such a thing as well, and while it's more > complicated to write, it is likely to be easier to tell just what it's > doing. Subselects have their own problems here. Mainly Postgres's optimizer, as good as it is, doesn't treat them with the same code paths as joins and can't find all the same plans for them. But in any case you cannot always write a subselect that's equivalent to an arbitrary join. -- greg ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] pg, mysql comparison with "group by" clause
"Anthony Molinaro" <[EMAIL PROTECTED]> writes: > By changing the values in the select/group by you are changing > Group! How can you arbitrarily add or exclude a column? > You can't do it. Go back and reread the previous posts again. You missed the whole point. -- greg ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [SQL] pg, mysql comparison with "group by" clause
"Anthony Molinaro" <[EMAIL PROTECTED]> writes: > Greg, > You'll have to pardon me... > > I saw this comment: > > "I don't see why you think people stumble on this by accident. > I think it's actually an extremely common need." > > Which, if referring to the ability to have items in the select that do not > need to be included in the group, (excluding constants and the like) is just > silly. Well the "constants and the like" are precisely the point. There are plenty of cases where adding the column to the GROUP BY is unnecessary and since Postgres makes no attempt to prune them out, inefficient. And constants aren't the only such case. The most common case is columns that are coming from a table where the primary key is already included in the GROUP BY list. In the case of columns coming from a table where the primary key is already in the GROUP BY list it's possible for the database to deduce that it's unnecessary to group on that column. But it's also possible to have cases where the programmer has out of band knowledge that it's unnecessary but the database doesn't have that knowledge. The most obvious case that comes to mind is a denormalized data model that includes a redundant column. select dept_id, dept_name, count(*) from employee_list For example if dept_name is guaranteed to be the same for all records with the same dept_id. Of course that's generally considered poor design but it doesn't mean there aren't thousands of databases out there with data models like that. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] pg, mysql comparison with "group by" clause
Scott Marlowe <[EMAIL PROTECTED]> writes: > Sorry, but it's worse than that. It is quite possible that two people > could run this query at the same time and get different data from the > same set and the same point in time. That shouldn't happen accidentally > in SQL, you should know it's coming. I'm pretty unsympathetic to the "we should make a language less powerful and more awkward because someone might use it wrong" argument. > > In standard SQL you have to > > write GROUP BY ... and list every single column you need from the master > > table. Forcing the database to do a lot of redundant comparisons and sort on > > uselessly long keys where in fact you only really need it to sort and group > > by > > the primary key. > > But again, you're getting whatever row the database feels like giving > you. A use of a simple, stupid aggregate like an any() aggregate would > be fine here, and wouldn't require a lot of overhead, and would meet the > SQL spec. Great, so I have a user table with, oh, say, 40 columns. And I want to return all those columns plus their current account balance in a single query. The syntax under discussion would be: select user.*, sum(money) from user join user_money using (user_id) group by user_id You would prefer: select user_id, any(username) as username, any(firstname) as firstname, any(lastname) as lastname, any(address) as address, any(city) as city, any(street) as street, any(phone) as phone, any(last_update) as last_update, any(last_login) as last_login, any(referrer_id) as referrer_id, any(register_date) as register_date, ... sum(money) as balance, count(money) as num_txns from user join user_money using (user_id) group by user_id Having safeties is fine but when I have to disengage the safety for every single column it starts to get more than a little annoying. Note that you cannot write the above as a subquery since there are two aggregates. You could write it as a join against a view but don't expect to get the same plans from Postgres for that. > Actually, for things like aggregates, I've often been able to improve > performance with sub selects in PostgreSQL. If your experience is like mine it's a case of two wrongs cancelling each other out. The optimizer underestimates the efficiency of nested loops which is another problem. Since subqueries' only eligible plan is basically a nested loop it often turns out to be faster than the more exotic plans a join can reach. In an ideal world subqueries would be transformed into the equivalent join (or some more general join structure that can cover both sets of semantics) and then planned through the same code path. In an ideal world the user should be guaranteed that equivalent queries would always result in the same plan regardless of how they're written. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] pg, mysql comparison with "group by" clause
"Anthony Molinaro" <[EMAIL PROTECTED]> writes: > More awkward? What *you're* suggesting is more awkward. You realize that > right? How can syntax that is understood and accepted for years be more > awkward? Well gosh, I would say that that's something only a newbie could say about SQL of all things... I had a whole thing written after that but I just deleted it. I grow tired of this thread. I am pretty happy to hear that the SQL standard endorsed the idea having the right thing happen if the primary key is present in the grouping list. That would be a wonderful feature for Postgres. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] Storing HTML in table
Rob <[EMAIL PROTECTED]> writes: > I would like complete control over this information -- so if sometime in the > future it's decided to totally redesign the layout. Also, at some point a > tool will be created so novice computer users can enter nicely formatted > markup -- meaning you won't have to know HTML to use this tool. You should go look at some of the XML database books out there and maybe you'll find a tool to do what you want. But beware, there's a lot of snake-oil in this field. Anyone who promises a one-size-fits-all solution to every problem is probably deluded. The problem is hard. There's an inherent tension between being able to easily manipulate parts of the data and flexibility in the type of data you can represent. On one side is the traditional relational database model where every record must contain precisely the same fixed list of predetermined columns. On the other side is the decidedly non-relational model where you just store a large hunk of xml text. What you're looking for is one of the many systems that attempt to bridge these two extremes and provide the best of both worlds. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data
elein <[EMAIL PROTECTED]> writes: > > Note that the above are not inverses because you changed the lefthand > > input. You do get consistent results when you just add or omit NOT: > Yes, you are right. I skipped the permutations to get down to the point. Remember that NULL means "unknown". So "1 IN (1,2,NULL)" *should* be true because regardless of what that unknown value is it's still obvious that 1 really is in the list. And "3 NOT IN (1,2,NULL)" is unknown because it depends on whether that unknown quantity is 3 or not. IN is the same as "= ANY" so "1 IN (1,2,NULL)" is the same as "1=1 OR 1=2 OR 1=NULL" which is true even though the last of the three is null. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] How to implement Microsoft Access boolean (YESNO)
"Jesper K. Pedersen" <[EMAIL PROTECTED]> writes: > Having checked the I/O format it seems that MS Access exports the > values of a YESNO field as 0 and 1 If only Postgres's boolean type were as helpful. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] How to implement Microsoft Access boolean (YESNO)
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > "Jesper K. Pedersen" <[EMAIL PROTECTED]> writes: > >> Having checked the I/O format it seems that MS Access exports the > >> values of a YESNO field as 0 and 1 > > > If only Postgres's boolean type were as helpful. > > There's a cast to int in 8.1, and you can roll-your-own easily in prior > releases ... The annoying thing about it is that in just about any client language you'll get 't' and 'f' by default and both will evaluate to false. So any user who tries to do things the obvious way like this will get a surprise: if ($db->query("select canlogin from users where userid = ?",$userid)) { ... } Is there an implicit cast from ints? So if I pass a 0 or 1 argument for a boolean parameter now it'll work? That definitely didn't work in the past. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
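A quick sanity check on an 8.1 server might look like this (assuming the casts Tom mentions exist in both directions there):

    select true::int;    -- 1
    select 1::boolean;   -- t
    select 0::boolean;   -- f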
Re: [SQL] two count columns?
Jan Danielsson <[EMAIL PROTECTED]> writes: > select from_ip, count(from_ip) as entries, count(select * from log where > ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where > to_port=22 and direction='in' group by from_ip select from_ip, count(from_ip) as entries, (select count(*) from log as l where l.from_ip = log.from_ip ) as tot_entries, max(ts) as last_access from log where to_port=22 and direction='in' group by from_ip Expect it to be pretty slow though. For every from_ip it has to look up every other entry with that from_ip. > Thankful for any hints or tips. There is a trick you could use to make it faster but it gets cumbersome and pretty tricky to use when you're doing more than one thing at a time: select from_ip, sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries, count(*) as tot_entries, max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access from log group by from_ip having sum(case when to_port=22 and direction='in' then 1 else 0 end) > 0 Note that in either case you might want to look at EXPLAIN ANALYZE results for the query and try raising work_mem for this query using SET until you see the plan using a hash aggregate. If it can use a hash aggregate for your query (more likely for the first query than the second) without swapping it'll be faster than sorting. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] view of weekly data
"Ding Xiangguang" <[EMAIL PROTECTED]> writes: > Hi, friend, > > Suppose there is table of daily transaction data with 5 fields, > > time(date), open(float8), high(float8), low(float8), close(float8) > > Is it possible to create a view of weekly data, i.e. open is the first > day'open, high is the highest of the week, low is the lowest of the week, > close > is the last day's close. low and high are easy, they're just min() and max(). so you would get something like: select date_trunc('week', time) as startofweek, min(low) as weeklylow, max(high) as weeklyhigh from dailydata group by date_trunc('week', time) Unfortunately showing the open and close is much much harder. To get them efficiently requires a feature set called OLAP that Postgres doesn't have and isn't likely to get soon. In Postgres 8.1 (and 8.0?) you could actually write some custom aggregate functions using RECORD data type to store the earliest and latest time found so far and the corresponding open and close to get them efficiently. Maybe someone else would be able to show how to do that, I haven't tried it yet. The only way to do it in standardish SQL would be with terribly inefficient subqueries: select date_trunc('week', time) as startofweek, min(low) as weeklylow, max(high) as weeklyhigh, (select open from dailydata as d where date_trunc('week',time)=date_trunc('week',dailydata.time) order by time asc limit 1 ) as weeklyopen, (select close from dailydata as d where date_trunc('week',time)=date_trunc('week',dailydata.time) order by time desc limit 1 ) as weeklyclose from dailydata group by date_trunc('week', time) -- greg ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org