Re: [SQL] Inquiry From Form [pgsql]

2003-03-02 Thread Greg Stark

Jack Kiss <[EMAIL PROTECTED]> writes:

> 1) Do you have a function which is similar to Oracle's DECODE?

Yes

> 2) Can you extract the day of week (Monday, etc.) from your date functions?

Yes

Check out the "Date/Time Functions and Operators" and the "Conditional
Expressions" sections of this:

http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions.html
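
For instance, CASE is the standard-SQL equivalent of DECODE, and to_char()
can pull out the day name (table and column names here are invented):

  SELECT CASE grade WHEN 'A' THEN 'excellent'
                    WHEN 'B' THEN 'good'
                    ELSE 'other'
         END
    FROM report_cards;

  SELECT to_char(current_date, 'Day');   -- e.g. 'Monday   ' (blank-padded)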


-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] HardCORE QUERY HELP!!!

2003-03-03 Thread Greg Stark

Metnetsky <[EMAIL PROTECTED]> writes:

> It's for a class and my professor has a thing for brain teaser type
> questions.

Incidentally, TAs and Profs aren't stupid, and have been known to check on
newsgroups and mailing lists for students asking for people to do their
homework for them.

--
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [SQL] Forcing query to use an index

2003-03-03 Thread Greg Stark

One suggestion I'll make about your data model -- I'm not sure it would
actually help this query, but might help elsewhere:

 WHERE ( C.Disabled > '2003-02-28'
  OR C.Disabled IS NULL
   )

Don't use NULL values like this. Most databases either don't index NULLs at
all (Oracle) or, even if they do, don't make "IS NULL" an indexable operation
(postgres).
There's been some talk of changing this in postgres but even then, it wouldn't
be able to use an index for an OR clause like this.

If you used a very large date, like '9999-01-01', as your "not deactivated"
value then the constraint would be C.disabled > '2003-02-28' and postgres
could use an index on "disabled".
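
A minimal sketch of that change, assuming the customers table behind the C
alias above:

  UPDATE customers SET disabled = date '9999-01-01' WHERE disabled IS NULL;
  CREATE INDEX customers_disabled_idx ON customers (disabled);

  -- the filter becomes a single indexable range condition:
  SELECT * FROM customers WHERE disabled > date '2003-02-28';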

Alternatively, if you have a disabled_flag and disabled_date then you could
have an index on (disabled_flag, disabled_date) and, uhm, there should be a
way to use that index though I'm not seeing it right now.

This won't matter at first when 99% of your customers are active. And ideally
in this query you find some way to use an index to find "kate" rather than
doing a full table scan. But later when 90% of the clients are disabled, then
in a bigger batch job where you actually want to process every active record
it could prevent postgres from having to dig through a table full of old
inactive records.

> This may make better use of your index, because the planner will have a more 
> accurate estimate of the number of rows returned from the outer join.
> 
> AND:
> 
>AND ( C.Accountnum~* 'kate'
>   OR C.Firstname ~* 'kate'
>   OR C.Lastname  ~* 'kate'
>   OR C.Organization  ~* 'kate'
>   OR C.Address   ~* 'kate'
>   OR C.Postal~* 'kate'
>   OR C.City  ~* 'kate'
>   OR EA.Name || '@' || JoinDomain(EA.Domain) ~* 'kate'
> 
> This set of expressions has "seq scan" written all over it.  I highly suggest 
> that you try to find a way to turn these into anchored text searches, perhaps 
> using functional indexes on lower(column).

If you really need to find substring matches everywhere you might want to look
into the full text search module in contrib/tsearch. I haven't started using
it yet but I expect I will have to when I get to that part of my project. 

> Finally:
> 
>   OR CMS.Package ~* 'kate'

*confusion*. Oooh, yeah, this one is a big problem. It means it's not clear
which end of the join to start with. Maybe it would be better to separate this
into two separate queries: give the user the option to search for a user
"kate" or a package "kate", but not both simultaneously.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [SQL] Forcing query to use an index

2003-03-03 Thread Greg Stark


> ->  Merge Join  (cost=6106.42..6335.30 rows=2679 width=265) (actual time=859.77..948.06 rows=1 loops=1)

Actually another problem: notice the big discrepancy between the estimated
rows and the actual rows. That's because you have the big OR clause, so
postgres figures there's a good chance one of the clauses will be true and
estimates that a lot of rows will match. In fact, of course, they're all very
selective and you'll usually only get a few records.

If you're stuck with the unanchored text search it will always do a full table
scan so it will never be lightning fast. But it would probably be a bit
faster if you put a limit clause (on a subquery) on the table that's doing the
full table scan. 

That will convince postgres that there won't be thousands of resulting
records, which might convince it to do a nested loop.
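
A hedged sketch of the idea (table and column names invented):

  SELECT *
    FROM (SELECT id, firstname, lastname
            FROM customers
           WHERE firstname ~* 'kate' OR lastname ~* 'kate'
           LIMIT 100) AS c
    JOIN summaries s ON (s.customer_id = c.id);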

Also, as a beneficial side effect, it will limit the damage if one of your
users does a search for "e"...

This only really helps if you can get rid of the OR CMS.package clause...
otherwise it actually needs all the records in case they match a summary
record with a kate package.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [SQL] Gist indexes on int arrays

2003-03-04 Thread Greg Stark

Greg Stark <[EMAIL PROTECTED]> writes:

> Can I have a GiST index on (foo_id, attribute_set_array) and have it be just
> as fast at narrowing the search to just foo_id = 900 but also speed up the ~
> operation?

Hm, so if I understand what I'm reading I can do this if I load the btree_gist
contrib module as well. I'm still not sure whether it'll be worthwhile for
this application though.

I have a bit of a problem though. Is building GiST indexes supposed to take
much, much longer than building btree indexes? It's been running nearly an hour
and it's still going. The hard drive is hardly moving so it seems to be all
cpu usage. I don't even see any pgsql_tmp usage.

db=# CREATE INDEX cache_gist_idx on cache using gist ( foo_id, attribute_set gist__int_ops);

postgres 30176 86.3 22.2 64896 57344 ?  R  11:08  40:32 postgres: postgres slo [local] CREATE INDEX


I don't remember exact numbers but building the normal btree index took on the
order of 15 minutes. This index will have to be rebuilt nightly; an hour-long
index build won't be practical.

--
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [SQL] Complex outer joins?

2003-03-26 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> The SQL-standard way of writing this would presumably be either
> 
> from G left join L on (G.SELID = L.SELID)
>  left join C on (L.SELID = C.SELID and L.SELLEVEL = C.SELLEVEL)

I would think of it as this one.

> from G left join
>  (L left join C on (L.SELID = C.SELID and L.SELLEVEL = C.SELLEVEL))
>  on (G.SELID = L.SELID)

I don't see how that would be at all different.

> depending on which join you think ought to be done first.  It might be
> that the results are the same in this case, but I'm not convinced of
> that.  In general the results of outer joins definitely depend on join
> order.

I'm pretty sure Oracle actually builds an abstract join representation in
which the two queries above would be represented the same way, then decides
the order from amongst the equivalent choices based on performance
considerations.

Can you show an example where the join order would affect the result set? I
can't think of any.


-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [SQL] LEAST and GREATEST functions?

2003-07-01 Thread Greg Stark
Josh Berkus <[EMAIL PROTECTED]> writes:

> Stefan,
> 
> > I know the LEAST and GREATEST functions are not part
> > of standard SQL, but they sure were handy where I came
> > from (Oracle-land).
> 
> Um, what's wrong with MAX and MIN, exactly?

MAX and MIN are single-parameter aggregate functions. LEAST and GREATEST are
two-parameter (though in postgres they could be defined for 3 and more
parameters) scalar functions.

eg:

SELECT max(a) FROM bar 

would return a single tuple with the maximum value of a from amongst every
record, whereas:

SELECT greatest(a,b) FROM bar

would return one tuple for every record in the table with a single value
representing the greater of bar.a and bar.b.

You could define your own functions to do this but it would be tiresome to
define one for every datatype.
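
For example, one such per-datatype definition might look like this (just a
sketch; note that with STRICT it returns NULL if either argument is NULL,
which may not match Oracle's behaviour exactly):

  CREATE FUNCTION greatest(integer, integer) RETURNS integer AS
    'SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END'
  LANGUAGE sql IMMUTABLE STRICT;

  SELECT greatest(a, b) FROM bar;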

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] summing tables

2003-07-15 Thread Greg Stark

To solve this problem efficiently you probably need the lead/lag analytic
functions. Unfortunately Postgres doesn't have them.

You could do it with something like:

update foo set c = a+b+(select c from foo as x
                         where seq < foo.seq ORDER BY seq desc LIMIT 1)

or the more standard but likely to be way slower:

update foo set c = a+b+(select c from foo as x
                         where seq = (select max(seq) from foo as y
                                       where seq < foo.seq))


However, i would suggest that if you have an implicit relationship between
records you should make that relationship explicit with a foreign key. If you
had a column that contained the seq of the parent record then this would be
easy. I'm really puzzled how this query as currently specified could be
useful.


-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] summing tables

2003-07-15 Thread Greg Stark

"Viorel Dragomir" <[EMAIL PROTECTED]> writes:

> Anyway, in real life this update modifies only one row with a value which is
> different from null. It would be really handy if the UPDATE command supported
> an ORDER BY option.

Are you hoping to produce a running total? That's very difficult in standard
SQL. That would be very different from the query you asked for. 

Running totals, ranking, lead/lag, are all things that are very difficult to
do in standard SQL. They don't fit in the unordered set model that SQL follows
so doing them without special non-standard functions is very hard and
inefficient. 

The functions to do them don't fit well within the SQL universe either, which
might be why they don't exist yet in postgres.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] Recursive request ...

2003-07-17 Thread Greg Stark
Joe Conway <[EMAIL PROTECTED]> writes:

> BenLaKnet wrote:
> > I see connect by in Oracle
> > ??? is there an equivalent in PostgreSQL or not ??
> 
> Someone is working on the SQL99 equivalent, but it isn't done yet. Perhaps for
> 7.5.

There's a connectby hack in the contrib/tablefunc directory. I haven't used it
so I'm not clear on how powerful it is compared to the real deal, but people
on one of the pgsql lists seemed to find it useful when it came up in the
past.


connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
  - returns keyid, parent_keyid, level, and an optional branch string
  - requires anonymous composite type syntax in the FROM clause. See
the instructions in the contrib module's documentation.
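
A hypothetical invocation, assuming a table tree(id, parent_id) with text
keys, would look something like:

  SELECT * FROM connectby('tree', 'id', 'parent_id', '1', 0, '~')
    AS t(id text, parent_id text, level int, branch text);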


-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] Table Partitioning and Rules

2003-07-17 Thread Greg Stark

"Girish Bajaj" <[EMAIL PROTECTED]> writes:

> I cant possibly index all the cols in the table. So I thought Id best manage
> the data by splitting up the table into multiple partitions and eventually
> depending on application logic, only scan those tables that are necessary to
> scan sequentially instead of the whole big table.

But that's only going to help if one of the columns they're searching on is
the last name column, isn't it?

I'm a fan of partitioned tables but you have to consider what advantage you're
trying to achieve to understand if it will actually be helpful for you:


Partitioned tables only really improve query performance if virtually all
queries use a common constraint. The canonical example is accounting tables
being partitioned based on fiscal year. Virtually all the queries--even ones
doing massive batch queries best served by sequential scans--will only scan
the current fiscal year.

In your case unless you can impose a constraint on the UI that users always
perform their queries on a single letter of the alphabet at a time and only
see results for people whose last names match that letter, it's not really a
great match as far as query performance.


The other advantage of partitioned tables is space management; it allows
placing each partition on a separate physical storage space. However without
native support in Postgres doing it via rules is going to be a headache. I
would think you would be better off striping the disks together and storing it
as a single large table. That's the only clean approach Postgres really allows
at this point anyways. 


Finally, if I WAS going to partition based on the first letter of a text
string, which I doubt I would, I would probably create 26 partitions right off
the bat, not try to make up arbitrary break points. If those arbitrary break
points turn out to be poorly chosen it'll be a complex manual job to move
them. Whereas if you just have 26 partitions, some will be large and some
small, and you can move partitions between physical storage freely to balance
things.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] casting to arrays

2003-07-19 Thread Greg Stark

Joe Conway <[EMAIL PROTECTED]> writes:

> Not possible in current releases, but it will be in 7.4 (about to start beta).
> It looks like this:

Well there is the int_array_aggregate function in the contrib/intagg
directory. It has to be compiled separately, and it has a few quirks (like the
arrays are zero-based instead of 1-based) but it works more or less, and it
does exactly what you describe.
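
A hypothetical usage (made-up table and column names):

  SELECT person_id, int_array_aggregate(nickname_id)
    FROM person_nicknames
   GROUP BY person_id;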

But the 7.4 stuff should be much cleaner and more flexible, so if you don't
need it right now you're better off waiting.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] min() and NaN

2003-07-24 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> NULL can be special, because it acts specially in comparisons anyway.
> But NaN is just a value of the datatype.

Does postgres intend to support all the different types of NaN? Do you
intend to have +Inf and -Inf and underflow detection and all the other goodies
you actually need to make it useful?

If not it seems more useful to just use the handy unknown-value thing SQL
already has and turn NaN into a NULL. 

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] numeric and float converts to int differently?

2003-10-28 Thread Greg Stark

"SZÛCS Gábor" <[EMAIL PROTECTED]> writes:

> > cannot see is that the float values are not actually exactly 0.5
> 
> Yes I could guess that (floating point vs fixed), but is this a coincidence
> that both '0.5'::float and '-0.5'::float are closer to 0, whereas they could
> be closer to +/-1, as well as both closer to the lower or upper bound.

Wouldn't 0.5 and -0.5 be representable exactly as floats?


-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Table versions

2003-10-29 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Returning to the original problem, it seems to me that comparing "pg_dump
> -s" output is a reasonable way to proceed.  

I've actually started checking a pg_dump -s output file into my CVS tree.

However I prune a few key lines from it: the TOC OID numbers, and anything
not owned by the user I'm interested in.

The makefile rule I use looks like:

schema.sql:
	pg_dump -U postgres -s user | sed '/^-- TOC entry/d;/^\\connect - postgres/,/^\\connect - user/d;/^SET search_path/d;/^$$/d;/^--$$/d' > $@


This still suffers from one major deficiency: the order in which objects are
output isn't necessarily consistent between databases. If I add tables to
the development server but then add them to the production server in a
different order the schema still shows differences even though the objects in
the two databases are identical.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] Getting last insert value

2003-11-15 Thread Greg Stark

Guillaume LELARGE <[EMAIL PROTECTED]> writes:

> Doing a "select currval() from my_table" after your insert should work.

That's "select currval('my_table_pkcol_seq')" actually. 

The above would have called the currval() function once for every record of
the table, which isn't what you want, and in any case currval requires an
argument.
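
For example, assuming a serial primary key pkcol on my_table whose backing
sequence is named my_table_pkcol_seq:

  INSERT INTO my_table (name) VALUES ('foo');
  SELECT currval('my_table_pkcol_seq');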

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] Programatically switching database

2003-11-16 Thread Greg Stark
ow <[EMAIL PROTECTED]> writes:

> My concern though ... wouldn't pgSql server collapse when faced with
> transaction spawning across 100M+ records? 

The number of records involved really doesn't faze Postgres at all. However
the amount of time spent in the transaction could be an issue if there is
other activity in other schemas of the same database.

As long as the transaction is running none of the deleted or old updated data
in any schema of the database can be cleaned up by vacuum as postgres thinks
the big transaction "might" need to see it sometime.

So if the rest of the database is still active the tables and indexes being
updated may grow larger than normal. If it goes on for a _really_ long time
they might need a VACUUM FULL at some point to clean them up.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] Expressional Indexes

2003-11-18 Thread Greg Stark

"Randolf Richardson, DevNet SysOp 29" <[EMAIL PROTECTED]> writes:

>   For example, if I want to index on a date field but only have the index 
> keep track of the most recent 30 days (and then create a secondary index for 
> all dates) so as to improve performance on more heavily loaded systems.
> 
>   Am I understanding this new terminology correctly?  Thanks in advance.

No, you could do the above using "partial indexes" but it wouldn't work very
well in this case because the "last 30 days" keeps moving and you would have
to keep redefining the index periodically. It also wouldn't really help
performance.

Expression Indexes are just more powerful "functional indexes". In 7.3 they
could be used for indexing expressions like "lower(foo)". In 7.4 they're more
powerful and you can index expressions other than simple function calls. 

They still have to be expressions that always return the same value for the
same inputs, which excludes subqueries and volatile functions.
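
A sketch with invented names, showing the difference:

  CREATE INDEX foo_lower_idx ON my_table (lower(foo));                 -- works in 7.3
  CREATE INDEX foo_pair_idx ON my_table ((lower(foo) || '-' || bar));  -- 7.4 only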

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Expressional Indexes

2003-11-21 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> No, because the above represents a moving cutoff; it will (and should)
> be rejected as a non-immutable predicate condition.  You could do
> something like
> 
>   CREATE INDEX my_Nov_03_index on my_table (create_date)
> WHERE (create_date >= date '2003-11-01');
> 
> and then a month from now replace this with
> 
>   CREATE INDEX my_Dec_03_index on my_table (create_date)
> WHERE (create_date >= date '2003-12-01');
> 
> bearing in mind that this index can be used with queries that contain
> WHERE conditions like "create_date >= some-date-constant".  The planner
> must be able to convince itself that the right-hand side of the WHERE
> condition is >= the cutoff in the index's predicate condition.  Since
> the planner is not very bright, both items had better be simple DATE
> constants, or it won't be able to figure it out ...

Note that if you're just doing this to speed up regular queries where you have
create_date in some small range, then you'll likely not see much of an
increase. Mainly you'll just save space.

What can be interesting is to create a partial index like this but over a
second unrelated column. Something like:

CREATE INDEX my_dec_03_index on my_table (userid)
 WHERE (create_date >= date '2003-11-02');

Then you can do queries like

SELECT * FROM my_table WHERE userid = ? AND create_date >= date '2003-11-02'

And it'll be able to efficiently pull out just those records, even if there
are thousands more records that are older than 2003-11-02.

This avoids having to create a two-column index with a low-selectivity column
like "month".

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Scaler forms as function arguments

2003-11-28 Thread Greg Stark

Joe Conway <[EMAIL PROTECTED]> writes:

> In 7.4 you could use an array. It would look like this:


Though note that 7.4 doesn't know how to optimize this form:


db=> explain select * from foo where foo_id in (1,2);
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using foo_pkey, foo_pkey on foo  (cost=0.00..6.05 rows=2 width=756)
   Index Cond: ((foo_id = 1) OR (foo_id = 2))
(2 rows)


db=> explain select * from foo where foo_id = ANY (array[1,2]);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1132.82 rows=5955 width=756)
   Filter: (foo_id = ANY ('{1,2}'::integer[]))
(2 rows)

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] OFFSET and subselects

2003-11-28 Thread Greg Stark

[EMAIL PROTECTED] (Dmitri Bichko) writes:

> I am running in trouble with pagination here, somehow (rather naively) I
> assumed that when doing a LIMIT and OFFSET, the subselects on the records
> before the OFFSET would not be performed, which quite apparently is not the
> case. So, LIMIT 50 OFFSET 0 takes 100ms to run, LIMIT 50 OFFSET 50 takes
> 200ms, LIMIT 50 OFFSET 100 takes 300ms; and so forth, this really becomes
> unacceptable after a few pages.

If you don't need any of the results of the subqueries in your WHERE clause
then you can do this by introducing an inline view (a subquery in FROM) like:

SELECT *,
       (SELECT ...) AS sub_1,
       (SELECT ...) AS sub_2,
       (SELECT ...) AS sub_3
  FROM (SELECT x,y,z
          FROM ...
         WHERE ...
       ) AS inner_query
 LIMIT 50
OFFSET 50


If you do use the results of the subqueries in your where clause or order by
clause then, well, you're SOL, since the OFFSET and LIMIT clauses only kick in
after the where clause restrictions are taken into account.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] Index not recognized

2003-12-06 Thread Greg Stark

"Grace C. Unson" <[EMAIL PROTECTED]> writes:

> Why is it that my index for text[] data type is not recognized by the
> Planner?
> 
> I did these steps:
> 
> 1. create function textarr(text[]) returns text language sql as 'select
> $1[1]' strict immutable
> 2. create index org_idx on EmpData (textarr(org));

This index will only be used if you use the expression textarr(org) in your
query. You would probably have some success if you did:

 select * from empdata where textarr(org) = 'math' 

> 3. vacuum full
> 4. explain analyze select name from EmpData where org *= 'math';

Is this *= operator from the contrib/array directory? It's not an indexable
operator at all using standard btree indexes.

The GiST indexing does make indexable operators that can do things like *= but
that's a whole other ball of wax.

What are you really trying to do?

> Result:
> ========
> Seq Scan on EmpData (cost=0.00..3193.20 rows=102 width=488)
>   (actual time=371.35..371.35 rows=0 loops=1)
> 
>   Filter: (org[0]='math'::text)

Well that's awfully odd. I don't know how that expression came out of the
query you gave. You'll have to give a lot more information about how you're
defining *= and why you think it's related to the function you used to define
the index.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] How can I get the last element out of GROUP BY sets?

2004-01-22 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Robert Creager <[EMAIL PROTECTED]> writes:
> > ... one piece of data I need is the last value for each GROUP BY
> > period.  Alas, I cannot figure out how to do this.
> 
> SELECT DISTINCT ON (rather than GROUP BY) could get this done for you.

Or if you need to combine this with other aggregate functions like sum, count,
etc:

CREATE FUNCTION first_accum (integer, integer) RETURNS integer
  AS 'select coalesce($1,$2)' LANGUAGE sql;
CREATE FUNCTION last_accum (integer, integer) RETURNS integer
  AS 'select $2' LANGUAGE sql;
CREATE AGGREGATE first (BASETYPE = integer, SFUNC = first_accum, STYPE = integer);
CREATE AGGREGATE last (BASETYPE = integer, SFUNC = last_accum, STYPE = integer);

Then you can do first() and last(). These definitions only work for integer
but you can pattern match for other datatypes. You might be able to get a
universal function working using anyelement now, I haven't tried.
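
For instance, combined with ordinary aggregates (made-up table and columns;
the ORDER BY in the subquery is what gives first() and last() their meaning):

  SELECT account, sum(amount), first(balance), last(balance)
    FROM (SELECT account, amount, balance
            FROM ledger
           ORDER BY account, posted_at) AS ordered
   GROUP BY account;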

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] limit 1 and functional indexes

2004-01-29 Thread Greg Stark

Bruno Wolff III <[EMAIL PROTECTED]> writes:

> >                                   QUERY PLAN
> > --------------------------------------------------------------------------
> >  Sort  (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1)
> >    Sort Key: order_date
> >    ->  Index Scan using transactions_pop_i on transactions  (cost=0.00..11653.79 rows=2956 width=33) (actual time=126.13..126.13 rows=0 loops=1)
> >          Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111'::text)
> >  Total runtime: 248.25 msec


Yeah, the problem with functional indexes is that the optimizer doesn't have
any clue how the records are distributed since it only has statistics for
columns, not your expression. Notice it's estimating 2956 rows where in fact
there are 0.

I think someone was actually working on this so it may be improved in 7.5 but
I'm not sure.

Given the type of data you're storing, which looks like hex strings, are you
sure you need to do a case-insensitive search here? Can't you just uppercase
it when you store it?

The other option would be to use a subquery and force the planner not to pull
it up, something like:


 select code
   from (select code, order_date
           from transactions
          where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111')
         offset 0
        ) as x
  order by order_date DESC;


The offset 0 prevents the optimizer from pulling the subquery into the outer
query. I think this will prevent it from even considering the order_date index
scan, but you'll have to try it to be sure.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] limit 1 and functional indexes: SOLVED

2004-01-30 Thread Greg Stark

"Alexandra Birch" <[EMAIL PROTECTED]> writes:

> It works perfectly - thanks a million!
> Strangely the offset 0 does not seem to make any difference.
> Gotta read up more about subqueries :)
> 
>  explain analyze
>  select code,order_date
>from (
>  select code, order_date
>from transactions
>   where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')
>   limit 1
> ) as foo
>  order by order_date DESC;

I think what you're trying to do here is get the last order? Then you'll want
the limit to be on the outer query where it's ordered by order_date:

  select code, order_date
    from (select code, order_date
            from transactions
           where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')
          offset 0
         ) as foo
   order by order_date DESC
   limit 1;

Note that in theory postgres should be able to find the same plan for this
query as yours since it's equivalent. It really ought to use the order_date
index since it thinks it would be more efficient. 

However it's unable to because postgres doesn't try every possible index, only
the ones that look like they'll be useful for a where clause or an order by.
And the order by on the outer query isn't considered when it's looking at the
subquery. 

It normally handles this case by merging the subquery into the outer query,
but it can't do that if there's a limit or offset. So an "offset 0" is
convenient for fooling it into thinking the subquery can't be pulled up
without actually changing the output.

You could do "order by upper(pop)" instead which might be clearer for someone
reading the query in that it makes it look like you're trying to encourage it
to use the index on upper(pop). In theory "order by"s on subqueries are
useless and postgres could ignore them, but it doesn't.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] Index not used - now me

2004-02-09 Thread Greg Stark

Christoph Haller <[EMAIL PROTECTED]> writes:

> Why appear? If the Index Scan has a Total runtime: 2.46 msec and the Seq Scan 
> a Total runtime: 46.19 msec, then the Index Scan is much faster. 
> Or am I completely off the track reading the explain analyze output? 

To estimate the relative costs of a sequential scan and an index scan Postgres
has to take into account the likelihood that the blocks needed will be in the
disk cache. In your example your database is otherwise idle and the table is
small enough that the entire index is probably in cache.

This means that the random access pattern of the index isn't really hurting
the index scan at all. Whereas in a busy database with less available RAM the
random access pattern makes a big difference.

You could try raising effective_cache_size to give postgres a better chance at
guessing that all the blocks will be in cache. But that may no longer be true
when the query is run on a busy database.

You could also try lowering random_page_cost. Some people find as low as 1.2
or so to be useful, but that would almost certainly be lying to postgres about
the costs of random access and would cause it to use index scans aggressively
even when they're not faster.
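
For example (values purely illustrative; effective_cache_size is measured in
8k pages here, so tune both to your own hardware):

  SET effective_cache_size = 20000;  -- assume ~160MB of cache available
  SET random_page_cost = 2;          -- the default is 4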

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Slow sub-selects, max and count(*)

2004-03-27 Thread Greg Stark

Josh Berkus <[EMAIL PROTECTED]> writes:

> Max() and Count() cannot use indexes for technical reasons.   Browse through 
> the archives of SQL, PERFORM, and HACKERS for about 12,000 discussions on the 
> subject.

Please don't confuse the issue by throwing Max() and Count() into the same
basket.

The issue with Min()/Max() is that the result could be generated efficiently
by scanning indexes but it's just hard, especially given generic aggregate
functions, and the work simply hasn't been done, or even started, yet.
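
In the meantime the usual workaround is to spell min() or max() as an ordered
index scan yourself, assuming an index on tab(col):

  SELECT col FROM tab ORDER BY col ASC LIMIT 1;    -- min(col)
  SELECT col FROM tab ORDER BY col DESC LIMIT 1;   -- max(col)

(For the max() form you'd want a WHERE col IS NOT NULL clause if the column
contains NULLs, since NULLs sort first in descending order.)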

The issue with Count() is that people want the result to be cached in a single
per-table counter, but that can't be done as simply as that because of
transactions. People have discussed complex solutions to this but it's a much
more complex problem than it appears.

They're really two entirely separate issues.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-08 Thread Greg Stark

Jeff Boes <[EMAIL PROTECTED]> writes:

> I headed off in the direction of groups of SELECTs and UNIONs, and quit when I
> got to something like four levels of "SELECT ... AS FOO" ...

four? wimp, that's nothing!

OK, seriously, I think there's no way to do this directly with straight SQL.
You would have to define a non-immutable function with some temporary storage
where it keeps track of how many records it has seen.

The generic function that would help here would be some kind of rank(value)
that would give you the equivalent of rownum except with a level break every
time value changes. I've been hoping to see something like this on the list
for a long time but haven't yet.

If the value of n is constant and small you could cheat with an aggregate
function with an array of the top n values.

db=> create function first3_accum(integer[],integer) returns integer[] as 'select case when array_upper($1,1) >= 3 then $1 else array_append($1,$2) end' language sql strict immutable;
CREATE FUNCTION
db=> create aggregate first3 (basetype = integer, sfunc = first3_accum, stype = integer[], initcond = '{}');
CREATE AGGREGATE

then something like:

SELECT first3(id)
  FROM (SELECT query, id
          FROM my_table
         ORDER BY query,
                  CASE WHEN include THEN 1 ELSE 2 END ASC,
                  score DESC) AS x
 GROUP BY query

But then you'll have to go back to the table to refetch the original records
that you've found. The best way I find to do that is with the int_array_enum()
function from the int_aggregate contrib module.

SELECT *
  FROM my_table
 WHERE id IN (
       SELECT int_array_enum(f3)
         FROM (SELECT first3(id) AS f3
                 FROM (SELECT query, id
                         FROM my_table
                        ORDER BY query,
                                 CASE WHEN include THEN 1 ELSE 2 END ASC,
                                 score DESC) AS x
                GROUP BY query
              ) AS y
       )


This last step is kind of annoying since you've already seen all those
records. And it requires writing a new aggregate function every time the value
of n changes, which kind of sucks.

In theory if the new work in 7.5 handling structured datatypes is as cool as
it sounds you could have an array of complete records and when UNNEST is
eventually incorporated into the array code then you could expand those
instead of using the int_array_enum function. Neither of those things are
ready yet as far as I know though.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-09 Thread Greg Stark
Josh Berkus <[EMAIL PROTECTED]> writes:

> Rod,
> 
> > Something along the lines of the below would accomplish what you want
> > according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
> > SQL200N)
> 
> Great leaping little gods!   They added something called "row number" to the 
> spec? 
> 
> Boy howdy, folks were right ... the ANSI committee really has completely
> blown off the relational model.

If it's like Oracle's rownum then it's the row number of the *output*, not the
position on disk. So it's not entirely blowing off the relational model any
more than ORDER BY does.

The weird thing is the number of cases where you want ORDER BY or rownum
inside subselects, which the solution to the original question needed.

> When a standards committee becomes hostage to a handful of vendors, kiss
> real standards goodbye.

In the case of SQL was there ever any pretension otherwise? Was the SQL
standard ever really useful as a "real standard"? I can write useful ANSI C89
code that will compile and work on any C compiler. Trying to write portable
SQL92 code that does any useful work is about as productive as stapling bagels
to your forehead.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Greg Stark

elein <[EMAIL PROTECTED]> writes:

> create or replace function pycounter(integer)
> returns integer as
> '
>if args[0] == 0:
>   SD["nextno"] = 1
>   return SD["nextno"]
>try:
>   SD["nextno"] += 1
>except:
>   SD["nextno"] = 1
>return SD["nextno"]
> ' language 'plpythonu';
> 
> And clearly it can be done faster as a little
> C function.

Does this approach have a hope of working if it's used twice in the same
query?


-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] trigger/for key help

2004-04-12 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Bret Hughes <[EMAIL PROTECTED]> writes:
> > FWIW I tried to use alter table but could never get the parser to accept
> > $1 as a constraint name.  I used single and double quotes as well as a
> > lame attempt \$1.
> 
> Hm, "$1" works for me ...

Hm, this reminds me. When I was first learning this stuff I was stymied by the
same issue. It took me quite a while to figure out how to drop constraints
because of the quoting issue.

Of course now it seems obvious, but for someone just starting it adds another
roadblock. Is there a reason postgres goes out of its way to pick names that
will be harder to work with than necessary?

Or is it considered a good thing on the theory that if it's hard to reference
it's also hard to accidentally use such names in conflicting ways?

Perhaps names like _1 _2 ... would be easier to handle? 
Or perhaps making $ not require quoting would be helpful?
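
(For reference, the incantation that does work, with a hypothetical table
name, is:

  ALTER TABLE foo DROP CONSTRAINT "$1";

The double quotes are needed precisely because $ isn't legal in an unquoted
identifier.)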

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] trigger/for key help

2004-04-13 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > Is there a reason postgres goes out of its way to pick names that
> > will be harder to work with than necessary?
> 
> If we use ordinary identifiers for system-generated names then we will
> be infringing on user name space --- ie, there's a potential for
> conflict.  I suppose we could use long randomly-generated names like
> ewjncm343cnlen, but are those really easier to work with?

I don't see an unseverable link between "user name space" and "identifiers
that don't need to be quoted". Mixed case names for instance seem like
perfectly good user name space identifiers.

Postgres could just as easily say "the system reserves all identifiers
starting with $" and still not require quoting $.

> I think a more useful approach is to treat it as a documentation
> problem.  Perhaps an example in the ALTER TABLE man page would help.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Concatenate results of a single column query

2004-04-18 Thread Greg Stark

Marco Lazzeri <[EMAIL PROTECTED]> writes:

> SELECT
>   p.name, ARRAY(SELECT nicknames FROM people WHERE people.id = p.id)
> FROM people AS p

> Any suggestions?

Something like:

db=> create aggregate array_aggregate (basetype = integer, sfunc = array_append, stype = integer[], initcond = '{}');
CREATE AGGREGATE

db=> select array_aggregate(id) from tab;
                                                           array_aggregate
------------------------------------------------------------------------------------------------------------------------------------------
 {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,28,29,30,31,32,33,34,36,37,38,39,40,41,42,43,27,26,44,45,46,47,48,49,50,51,52,53,54,35}
(1 row)

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] Concatenate results of a single column query

2004-04-19 Thread Greg Stark
Christoph Haller <[EMAIL PROTECTED]> writes:

> Interesting feature, but I cannot find function array_append: 
> ERROR:  AggregateCreate: function array_append(integer[], integer) does not exist

It's new in Postgres 7.4.

I think you could do this in 7.3 though, it would just be more awkward. Try ||
but I think that's new in 7.4 as well. Otherwise I think you would have to
pick out the upper bound of the array with array_dims and set the upper+1'th
element of the array.

If you're doing text you may want to go directly to a textual concatenation
like:

CREATE FUNCTION concat_agg_accum(text, text) RETURNS text
AS 'select $1 || '', '' || $2'
LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE concat_agg (
BASETYPE = text,
SFUNC = concat_agg_accum,
STYPE = text
);
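
A hypothetical usage, collapsing each person's nicknames into one string
(invented tables and columns):

  SELECT p.id, p.name, concat_agg(n.nickname)
    FROM people p JOIN nicknames n ON (n.person_id = p.id)
   GROUP BY p.id, p.name;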


-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] isolation level

2004-05-13 Thread Greg Stark

Bruno Wolff III <[EMAIL PROTECTED]> writes:

> On Thu, May 13, 2004 at 18:13:23 +,
>   Jaime Casanova <[EMAIL PROTECTED]> wrote:
> > Hi all, is there a way to set the isolation level  to something like the 
> > sql standard dirty read.
> 
> No. There will be a way to use the standard name in a SET command, but
> you will actually get READ COMMITTED isolation (which is the next step up).

I wonder how hard this would be to implement. It doesn't seem like it should
be very hard.

It would be very convenient for debugging and for checking on the progress of
large batch updates or loads.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] Multiple outer join on same table

2004-05-13 Thread Greg Stark
Marco Lazzeri <[EMAIL PROTECTED]> writes:

> Hi!
> I'm searching a better (quicker) way to retrieve data as I used to do
> using the following query...
> 
> ==
> 
> SELECT main.codice,
>other.value AS value_one,
>other.value AS value_two
>   FROM main 
>   LEFT OUTER JOIN other   ON (main.id = other.id_main)
>   LEFT OUTER JOIN other AS other2 ON (main.id = other2.id_main)
>  WHERE other.type = 'type_one'
>AND other2.type = 'type_two'
> ;

a) you're better off sending the actual query rather than retyping it. I
assume you made a typo in the select column list and it should be
"other2.value AS value_two"? Also the parentheses are required on the ON
clause.

b) The WHERE clause will effectively make this a plain inner join, not an
outer join at all, since any rows without a match would have a NULL type
column and therefore not be selected.

I think the query you meant to write would be

SELECT codice,
       other1.value AS value_one,
       other2.value AS value_two
  FROM main
  LEFT OUTER JOIN other AS other1
    ON (main.id = other1.id_main AND other1.type = 'type_one')
  LEFT OUTER JOIN other AS other2
    ON (main.id = other2.id_main AND other2.type = 'type_two')

Another way to write this query that might be faster or might not depending
would be:

SELECT codice,
       (SELECT value FROM other WHERE id_main = main.id AND type = 'type_one') AS value_one,
       (SELECT value FROM other WHERE id_main = main.id AND type = 'type_two') AS value_two
  FROM main

In theory the two queries really ought to always result in the same plan
because they're equivalent. However the Postgres optimizer, as clever as it
is, is incapable of seeing this.

The first form with the outer join leaves the optimizer with a lot more
flexibility though, including at least one plan that is effectively identical
to what the optimizer is forced to do for the second query. So really the
first one should be no worse than the second.

If you find the second faster (or if they're both still slow) you might
consider posting explain analyze output for both queries. It may be that you
have other issues preventing the optimizer from finding a good plan.

Have you run analyze on these tables recently? And do you vacuum regularly?
For the second query you would really want an index on other.id_main too; for
the first one it would depend on the data in the two tables.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] Selecting "sample" data from large tables.

2004-06-03 Thread Greg Stark

Joseph Turner <[EMAIL PROTECTED]> writes:

> I have a table with a decent number of rows (let's say for example a
> billion rows).  I am trying to construct a graph that displays the
> distribution of that data.  However, I don't want to read in the
> complete data set (as reading a billion rows would take a while).  Can
> anyone think of a way to do this in postgresql?

One way would be to have an indexed column with random values in it. Then you
could use an index scan to pull out samples.

However this has a few downsides. 

a) Index scans are a lot less efficient than sequential scans, effectively
reducing the sample size you can get in a given amount of time even further.
A 10% sample using this technique is probably almost as slow as reading the
entire table, for example. If you only need 0.1%, though, this might be a good
approach.

b) The data in the random column would have to be static, meaning multiple
samples wouldn't be entirely independent. Depending on what you're doing with
the samples this might or might not be a problem.

Oh, and if you're content with always using the same sample but want to
analyze it multiple different ways, you might want to use a partial index. You
could have partial indexes matching the order by and where clauses of your
analyses but with a where clause of its own selecting on the random data. Then
make sure that where clause is in every query.

But still, the sample has to be small enough that even using an index scan
you're winning over reading the entire table.
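
A minimal sketch of the random-column approach (names invented; the 0.001
cutoff targets roughly a 0.1% sample):

  ALTER TABLE big_table ADD COLUMN sample_key real;
  UPDATE big_table SET sample_key = random();
  CREATE INDEX big_table_sample_idx ON big_table (sample_key);

  SELECT * FROM big_table WHERE sample_key < 0.001;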

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] Empty array IS NULL?

2004-06-28 Thread Greg Stark

Markus Bertheau <[EMAIL PROTECTED]> writes:

> oocms=# SELECT ARRAY(SELECT 1 WHERE FALSE);
>  ?column?
> --
>   

This one seems strange to me. Shouldn't it result in an empty array?


-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: FW: [SQL] "=" operator vs. "IS"

2004-06-29 Thread Greg Stark
Rich Hall <[EMAIL PROTECTED]> writes:

> "(anything) = NULL" is always Null, this cannot be what the coder intended.

I often have such things in my SQL. Consider what happens when you have SQL
constructed dynamically. Or more frequently, consider that many drivers still
don't use the new binary placeholder syntax and emulate it by putting the
parameters directly into the SQL.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: FW: [SQL] "=" operator vs. "IS"

2004-06-29 Thread Greg Stark

Stephan Szabo <[EMAIL PROTECTED]> writes:

> IS TRUE and IS FALSE have a different effect from =true and =false when
> the left hand side is NULL. The former will return false, the latter will
> return NULL.

No, actually they both return false.

(But thanks, I didn't even realize they were special this way)

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] How to create an aggregate?

2004-08-01 Thread Greg Stark

Ray Aspeitia <[EMAIL PROTECTED]> writes:

> I also would like to pass the delimiter to the aggregate as a parameter and
> I am not sure if it can handle that.

Currently aggregates that take multiple parameters are just not supported.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] reply to setting

2004-08-22 Thread Greg Stark

Joe Conway <[EMAIL PROTECTED]> writes:

> This is very true. In fact, I get mildly annoyed when people *don't* include
> the direct reply to me, because I very actively filter/redirect my mail.
> Replies directly to me are pretty much guaranteed to be seen quickly, but the
> ones that go to the list might get lost among the hundreds of posts that go
> into my "postgres" inbox every day. I think many other people do something
> similar.

Just as a side comment, one trick I found very helpful in my mail filters is
to treat any message with one of my own message-ids in the References header
as a personal message as far as mail notifications are concerned. This way I
get notifications for any message on a thread following a post of my own.

This is easy in Gnus since the message-id contains the sending hostname and
the first few characters contain a base64 encoded copy of the unix userid. You
would have to figure out how to recognize the message-ids from your own MUA.


-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] sleep function

2004-08-22 Thread Greg Stark

John DeSoi <[EMAIL PROTECTED]> writes:

> On Aug 10, 2004, at 10:57 AM, Bruce Momjian wrote:
> 
> > I can't think of one, no.  I think you will have to use one of the
> > server-side languages and call a sleep in there.
> 
> This is no good in the real world since it pounds the CPU, but it worked well
> enough for my testing purposes.

You went the long way around. I think what he meant was something as simple
as:

bash-2.05b$ /usr/lib/postgresql/bin/createlang -U postgres plperlu test

bash-2.05b$ psql -d test -U postgres
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help on internal slash commands
   \g or terminate with semicolon to execute query
   \q to quit

test=# create or replace function sleep(integer) returns integer as 'return sleep(shift)' language plperlu;
CREATE FUNCTION

test=# \timing
Timing is on.

test=# select sleep(10) ;
 sleep 
-------
    10
(1 row)

Time: 10002.493 ms


-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] only last records in subgroups

2004-08-22 Thread Greg Stark

Dino Vliet <[EMAIL PROTECTED]> writes:

> x,0 and y,4 but how do I manage this in sql? I was
> hoping for a keyword LAST or so, where I can specify
> that when I've ordered my results with order by, I
> could only get the last of the subgroups (the first
> one is easy because I could use limit 1)

There's no concept of "first" and "last" in SQL outside of the ORDER BY clause
of your query. And you can easily reverse the order of the ORDER BY sort by
putting "DESC" after the columns you're sorting on.

But I don't understand how you intend to use "LIMIT 1" to solve your problem.
As you describe the problem you want the last (or first) record of *each*
*group*. Solving that using LIMIT would require a complex query with a
subquery in the column list which would be quite a pain.

As the other poster suggested, if you're just looking to fetch a single column
you can just use min() or max() to solve this. 

If you're looking to fetch more than one column Postgres provides a
non-standard SQL extension for dealing with this situation, "DISTINCT ON".

SELECT DISTINCT ON (id) id,day,other,columns FROM tab ORDER BY id,day

That gets the lowest value of "day". Use "ORDER BY id, day DESC" to get the
greatest value of "day".

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] from PG_DUMP to CVS

2004-08-26 Thread Greg Stark
"Riccardo G. Facchini" <[EMAIL PROTECTED]> writes:

> After searching throught the list, I assume you mean this link:
> http://www.rbt.ca/autodoc/index.html
> by Rod Taylor.
> 
> Looks promising, but still what I need is a proper CVS output, as I
> need to review the changes made to the specific database structure.

Well, CVS can still be useful even if the changes are all in one file. Look at
"cvs annotate" for example. And CVS diff would still give you useful
information.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Aggregate query for multiple records

2004-08-28 Thread Greg Stark

Scott Gerhardt <[EMAIL PROTECTED]> writes:

> Hello, I am new to the list, my apology if this question is beyond the scope or
> charter of this list.
> 
> My questions is:
> What is the best method to perform an aggregate query to calculate sum() values
> for each distinct wid as in the example below, but except for all wid's (not
> just WHERE wid='01/1-6-1-30w1/0').
> 
> Also, performance wise, would it be better to build a function for this query.
> The table has 9 million records and these aggregate queries take hours.

The "top n" type query (or in this case "first n" or "last n" but it's the
same thing) is actually very tricky to do in standard SQL. The best solution
seen here for postgres is to use arrays and custom aggregate functions. 

The following is based on a previous answer from Tom Lane to a similar
question. (I thought I already posted this for you on pgsql-general but the
list archives are down and you don't seem to have seen it, so I'm resending
it)

It allows you to do the whole query with a single sort for the grouping and
the ordering by date together. You would have to use it with something like:

SELECT wid, sum_first_6(oil)
  FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS x
 GROUP BY wid

If you pump up sort_mem enough -- you can do it within the session for the
single connection using "SET sort_mem" -- it should be pretty fast.

I think it's the best you're going to get. If you're absolutely sure the data
is physically stored in chronological order -- which I would only feel
comfortable with if you've never done any updates or deletes, only inserts and
perhaps occasional truncates, then you might be able to get by without
ordering and convince it to do a hash aggregate. That would be the optimal
result, no sorts at all. But it would be hard to make sure it would always
work.

test=> create or replace function first_6_accum (integer[], integer) returns integer[] language sql immutable as 'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';
CREATE FUNCTION

test=> create function sum_6(integer[]) returns integer immutable language sql as 'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION

test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum, stype=integer[], initcond='{}', finalfunc=sum_6);
CREATE AGGREGATE

test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8) as x order by i desc) as x;
 sum_first_6 
-------------
          33
(1 row)

test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8) as x order by i asc) as x;
 sum_first_6 
-------------
          21
(1 row)

This can easily be switched around to make it "last_6" and you can write
functions to handle 6 records or 9 records. And all of these could be combined
in a single query, so you only have to do the sort once.

Unfortunately you cannot make aggregate functions that take multiple
parameters, nor can you pass extra parameters to the state function. So you'll
have to create a separate set of functions for each variant.

Also, you'll have to change it to use reals.


-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] Isnumeric function?

2004-09-08 Thread Greg Stark

Theo Galanakis <[EMAIL PROTECTED]> writes:

>   error: btree item size 2744 exceeds maximum 2713.
> 
> I assume I had to change some server settings to extend the maximum, however

I would guess the block size. But I'm just guessing.

> in the end this column holds content, and even applying an index would be
> incredible slow to search across hundred of thousands of "content" records
> looking for a primary key.

Perhaps you could have an indexed column that contains a crc32 hash? Then you
could do searches by comparing crc32 values, which makes for fast, efficient
integer index lookups. You should still include a comparison against the
original content column since it is possible for there to be a rare crc32
collision.

This doesn't let you do range lookups efficiently. But it does let you look up
specific values quickly even when they aren't numeric.
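
A sketch of the idea (names invented; postgres doesn't ship a crc32()
function, so assume one written in C or plperl):

  ALTER TABLE docs ADD COLUMN content_crc integer;
  UPDATE docs SET content_crc = crc32(content);
  CREATE INDEX docs_content_crc_idx ON docs (content_crc);

  SELECT * FROM docs
   WHERE content_crc = crc32('12345')
     AND content = '12345';   -- guard against rare crc32 collisions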

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Isnumeric function?

2004-09-09 Thread Greg Stark

Theo Galanakis <[EMAIL PROTECTED]> writes:

> I created the Index you specified, however it chooses to run a seq scan on
> the column rather than a Index scan. How can you force it to use that
> Index..
> 
> CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~
> '^[0-9]{1,9}$';
> 
> select * from botched_table where content = 200::integer

You need to put an "and content ~ '^[0-9]{1,9}$'" in your query; the clause
has to match the clause in the partial index pretty closely.

Perhaps you would find it convenient to make a view:
 select * from botched_table where content ~ '^[0-9]{1,9}$'
and then just always select these values from that view.

Also the "::integer" is useless. It actually gets cast to text here anyways.
The index is on the text contents of the content column.

You might consider making the index a functional index on content::integer
instead. I suspect that would be faster and smaller than an index on the text
version of content:

slo=> create table botched_table (content text);
CREATE TABLE
slo=> create index idx_botched_table on botched_table ((content::integer)) where 
content ~ '^[0-9]{1,9}$';
CREATE INDEX
slo=> create view botched_view as (select content::integer as content_id, * from 
botched_table where content ~ '^[0-9]{1,9}$');
CREATE VIEW
slo=> explain select * from botched_view where content_id = 1;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Index Scan using idx_botched_table on botched_table  (cost=0.00..3.72 rows=3 width=32)
   Index Cond: ((content)::integer = 1)
   Filter: (content ~ '^[0-9]{1,9}$'::text)
(3 rows)


-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Isnumeric function?

2004-09-10 Thread Greg Stark
Greg Stark <[EMAIL PROTECTED]> writes:

> Theo Galanakis <[EMAIL PROTECTED]> writes:
> 
> > I created the Index you specified, however it chooses to run a seq scan on
> > the column rather than a Index scan. How can you force it to use that
> > Index..
> > 
> > CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~
> > '^[0-9]{1,9}$';
> > 
> > select * from botched_table where content = 200::integer
> 
> You need to put a "and content ~ '^[0-9]{1,9}$'" in your query, the clause has
> to match the clause in the partial index pretty closely.

Well this is weird. I tried to come up with a cleaner way to arrange this than
the view I described before using a function. But postgres isn't using the
partial index when it seems it ought to be available.

When I say it has to match "pretty closely" in this case I think it would have
to match exactly, however in the case of simple range operators postgres knows
how to figure out implications. Ie, "where a>1" should use a partial index
built on "where a>0".

slo=> create table test (a integer);
CREATE TABLE
slo=> create index idx_text on test (a) where a > 0;
CREATE INDEX
slo=> explain select * from test where a > 0;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Index Scan using idx_text on test  (cost=0.00..17.50 rows=334 width=4)
   Index Cond: (a > 0)
(2 rows)

slo=> explain select * from test where a > 1;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Index Scan using idx_text on test  (cost=0.00..17.50 rows=334 width=4)
   Index Cond: (a > 1)
(2 rows)



That's all well and good. But when I tried to make a version of your situation
that used a function I found it doesn't work so well with functional indexes:



slo=> create function test(integer) returns integer as 'select $1' language sql 
immutable;
CREATE FUNCTION
slo=> create index idx_test_2 on test (test(a)) where test(a) > 0;
CREATE INDEX
slo=> explain select test(a) from test where test(a) > 0;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Scan using idx_test_2 on test  (cost=0.00..19.17 rows=334 width=4)
   Index Cond: (test(a) > 0)
(2 rows)

slo=> explain select test(a) from test where test(a) > 1;
                      QUERY PLAN                       
-------------------------------------------------------
 Seq Scan on test  (cost=0.00..25.84 rows=334 width=4)
   Filter: (test(a) > 1)
(2 rows)


I can't figure out why this is happening. I would think it has something to do
with the lack of statistics on functional indexes except a) none of the tables
is analyzed anyways and b) the estimated row count is the same anyways.


-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Isnumeric function?

2004-09-10 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > That's all well and good. But when I tried to make a version of your
> > situation that used a function I found it doesn't work so well with
> > functional indexes:
> > ...
> > I can't figure out why this is happening.
> 
> You're using 7.3 or older?

7.4.3.


-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] implementing an out-of-transaction trigger

2004-09-15 Thread Greg Stark

"Iain" <[EMAIL PROTECTED]> writes:

> Though, as far as I can tell, there is no way to have the notify activate a
> pl/pgsql function directly. I'll still need to write a client program to
> create a session and actually do the listening, that is if I havn't missed
> anything else...

Right, presumably some sort of daemon that sits and waits for events. Much
like you would have with Oracle advanced queuing and such.

The big difference is that NOTIFY doesn't pass along any parameters. You will
need some way for your daemon to find any pending data it needs to process.
You might need some kind of queue table, or you might be able to get by
without one.
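
A minimal sketch of that kind of queue table (names are made up; the daemon
LISTENs and rescans for pending rows each time a notification arrives):

create table work_queue (id serial primary key, payload text, done boolean default false);

-- producer, inside its transaction:
insert into work_queue (payload) values ('something to do');
notify work_available;

-- daemon, once at startup:
listen work_available;
-- then on each notification:
select id, payload from work_queue where not done;
update work_queue set done = true where id = 1;  -- for each id processed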

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] 1-byte integers

2004-09-18 Thread Greg Stark

stig erikson <[EMAIL PROTECTED]> writes:

> how can i specify an integer to be one byte byte or even 4 bits long?
> int1, int(1), tinyint are nowhere to be seen.
> smallest i can find is smallint that is 2 bytes.

There's a type called "char" (the double quotes are needed). It's used by
postgres system catalogues but it's available for users too. It's a little
quirky since it has operators that treat it as a 1 byte text data type and
other operators that treat it as an integer data type. But that doesn't
normally lead to any problems, just strange symptoms when your code has a bug.

Other than that there are things like bit(4) which has a cast to and from
integer. But they won't save you any storage space. If you have multiple
columns like this and want to define a new type that aggregates them all for
storage but lets you access them individually that could be useful.
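
A quick illustration of the integer casts:

test=> select 5::bit(4);
 bit  
------
 0101
(1 row)

test=> select b'0101'::integer;
 int4 
------
    5
(1 row)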

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] ORDER BY and NULLs

2004-09-19 Thread Greg Stark

T E Schmitz <[EMAIL PROTECTED]> writes:

> I just dug out the PostgreSQL book again because I thought I might've garbled
> it:
> 
> Quote: "PostgreSQL will not index NULL values. Because an index will never
> include NULL values, it cannot be used to satisfy the ORDER BY clause of a
> query that returns all rows in a table."

You should just cross out that whole section. It's just flatly wrong. 

I had always assumed it was just people bringing assumptions over from Oracle
where it is true. Perhaps this book is to blame for some of the confusion.
Which book is it?

Postgres indexes NULLs. It can use them for ORDER BY clauses. 

Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS
NOT NULL" constraints though. That's an implementation detail, but it can be
worked around with partial indexes.
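
For example (a sketch with made-up names):

-- index only the rows where the column is null:
create index idx_foo_null on foo (id) where bar is null;

-- this predicate matches the partial index, so it's indexable:
select id from foo where bar is null;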

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] ORDER BY and NULLs

2004-09-19 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> > The paragraph continues:
> > "If the SELECT command included the clause WHERE phone NOT NULL, 
> > PostgreSQL could use the index to satisfy the ORDER BY clause.
> > An index that covers optional (NOT NULL) columns will not be used to 
> > speed table joins either."
> 
> My goodness, it seems to be a veritable fount of misinformation :-(
> 
> I wonder how much of this is stuff that is true for Oracle and they just
> assumed it carried over?

The first part is true for Oracle. You have to add the WHERE phone NOT NULL to
convince Oracle it can use an index. Or just make the column NOT NULL to begin
with I think.

However as far as I recall the second part is not true. Oracle is smart enough
to realize that an equijoin clause implies NOT NULL and therefore allows it to
use the index.

(This may have all changed in Oracle 9+. The last I saw of Oracle was 8i)

I wonder if they just tried explain on a bunch of queries and noticed that
postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up
with explanations for the patterns they saw?

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] COUNT(*) to find records which have a certain number of dependencies ?

2004-09-20 Thread Greg Stark

T E Schmitz <[EMAIL PROTECTED]> writes:

> I want to select only those BRAND/MODEL combinations, where the MODEL has more
> than one TYPE, but only where one of those has TYPE_NAME='xyz'.
> I am not interested in MODELs with multiple TYPEs where none of them are called
> 'xyz'.


There are lots of approaches to this with various pros and cons.

The simplest one off the top of my head:

select * 
  from brand 
  join model on (brand_pk = brand_fk)
 where exists (select 1 from type where model_fk = model_pk and type_name = 'xyz')
   and (select count(*) from type where model_fk = model_pk) > 1


You could try to be clever about avoiding the redundant access to the type table:

select * 
  from brand 
  join model on (brand_pk = brand_fk)
 where (select count(*)
          from type
         where model_fk = model_pk
        having sum(case when type_name = 'xyz' then 1 else 0 end) >= 1
       ) > 1

I haven't tested that; it might need some tweaking. In any case I don't
think it's worth the added complexity, assuming you have indexes on type. I'm
not even sure it would run faster.

You could try to be really clever about it by turning the whole thing into a
join:

select * 
  from brand 
  join model on (brand_pk = brand_fk)
  join (select model_fk
          from type
         group by model_fk
        having sum(case when type_name = 'xyz' then 1 else 0 end) >= 1
           and count(*) > 1
       ) on (model_fk = model_pk)

This would let the planner have a more plans to choose from and might be a big
win if there are lots of brands and models but few that satisfy the criteria
you're looking for.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] COUNT(*) to find records which have a certain number of

2004-09-20 Thread Greg Stark

T E Schmitz <[EMAIL PROTECTED]> writes:

> SELECT
> BRAND_NAME,MODEL_NAME
...
> intersect
...

Huh, I never think of the set operation solutions. I'm curious how it compares
speed-wise.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] COUNT(*) to find records which have a certain number of dependencies ?

2004-09-20 Thread Greg Stark

T E Schmitz <[EMAIL PROTECTED]> writes:

> ) as somealias on (model_fk = model_pk)
> 
> (subquery in FROM must have an alias)

ARGH! This is one of the most annoying things about postgres! It bites me all
the time. Obviously it's totally insignificant since it's easy for me to just
throw an "AS x" on the end of it. But damn.

I see there's a comment foreseeing some annoyance value for this in the
source:

  /*
   * The SQL spec does not permit a subselect
   * (<derived_table>) without an alias clause,
   * so we don't either.  This avoids the problem
   * of needing to invent a unique refname for it.
   * That could be surmounted if there's sufficient
   * popular demand, but for now let's just implement
   * the spec and see if anyone complains.
   * However, it does seem like a good idea to emit
   * an error message that's better than "syntax error".
   */

So where can I officially register my complaint? :)

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] JOIN performance

2004-09-21 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Fixing this properly is a research project, and I haven't thought of any
> quick-and-dirty hacks that aren't too ugly to consider :-(

Just thinking out loud here. Instead of trying to peek inside the CASE
couldn't the optimizer just wrap the non-strict expression in a conditional
that tests whether the row was found?


-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] [GENERAL] need ``row number``

2004-09-24 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Karsten Hilbert <[EMAIL PROTECTED]> writes:
> > I am not convinced I'll need a SRF. I am not trying to
> > calculate something that isn't there yet. I am just trying to
> > join two views appropriately. I might have to employ some
> > variant of Celko's integer helper table but I'm not sure how
> > to proceed.
> 
> A fairly common hack for this is to use a sequence:
> 
>   create temp sequence tseq;
>   select nextval('tseq'), * from (select  order by ...) ss;

But I thought she wanted to get the row number within a group. Not the row
number for the entire result set. A sequence can't do that. Or at least, I
suppose it could but it would be mighty strange to see setval() in a SELECT
query. And I can't think of how to detect the level break in a select query
either.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Inserting into table only if the row does not already exist.

2004-10-14 Thread Greg Stark

"C. Bensend" <[EMAIL PROTECTED]> writes:

> INSERT INTO table ( column1, column2, column3 )
>SELECT column1, column2, column3
>WHERE NOT EXISTS (
>   SELECT column1, column2, column3 FROM table WHERE
>  column1 = $column1 AND
>  column2 = $column2 AND
>  column3 = $column3 )
> 
>.. which gave me 'ERROR: column1 does not exist'.  Nuts.

Well you're not selecting from any table so "column1" isn't going to exist.
You just have to put it in the select list as a constant. If you're feeling
generous to the next programmer to read it you could put "AS column1" after
each one, but the column name doesn't actually have to match the column you're
inserting into.

 INSERT INTO table ( column1, column2, column3 )
   (
SELECT $column1, $column2, $column3
 WHERE NOT EXISTS (
   SELECT 1
 FROM table 
WHERE column1 = $column1
  AND column2 = $column2
  AND column3 = $column3 )
)

Note that this is going to have some concurrency issues. I think it will be
possible for a second query to execute before the first commits. In that case
it won't see the record the first query inserted and will try to insert it
again. You'll just get a primary key violation though, which I guess you can
just ignore.

Which raises a question. Why not forgo this complicated SQL and just try to do
the insert? If you get a primary key violation, well there's your answer... If
you don't care about the failure just ignore it and move on. I would suggest
checking specifically for a primary key violation and still stopping execution
on unexpected errors though.

If you're doing this inside a bigger transaction that's a bit more of a pain.
Until 8.0 postgres can't handle ignoring an error on a query without aborting
the entire transaction. But if you're in autocommit mode then you can just
ignore the primary key violation and continue. 
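
With 8.0 you could sketch it with a savepoint so that only the failed statement
gets rolled back, something like this (hypothetical names):

begin;
savepoint try_insert;
insert into mytable (column1, column2, column3) values (1, 2, 3);
-- if that raised a primary key violation:
rollback to savepoint try_insert;
-- ... carry on with the rest of the transaction ...
commit;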

Incidentally, if you're putting your parameters directly into your queries
using $column1 then you've got a potential security problem. Unless you're
quoting every variable everywhere religiously using postgres's quoting
functions an attacker can sneak extra SQL into your queries. Potentially
including whole new statements such as "DELETE FROM table"...

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] Inserting into table only if the row does not already

2004-10-16 Thread Greg Stark

"C. Bensend" <[EMAIL PROTECTED]> writes:

> The risk of a cron gone wild is acceptable to me at this moment.

Gee, now I have images of late-night advertisements for bofh-porn video tapes
of Cron Jobs Gone Wild(tm) dancing through my head... thanks.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] help on a query

2004-10-08 Thread Greg Stark

Michelle Murrain <[EMAIL PROTECTED]> writes:

> The OUTER JOIN version is quite a bit more efficient (by an order of magnitude)
> than the option with WHERE NOT EXISTS subquery.

This is going to be heavily dependent on the version of postgres. IN/NOT IN
execution has improved a lot in 7.4 and later. If you're still on 7.3 then the
outer join will be better. But in 7.4 I would actually expect the NOT IN or
the NOT EXISTS to be faster.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Cross tabulations

2004-10-19 Thread Greg Stark
"Muhyiddin A.M Hayat" <[EMAIL PROTECTED]> writes:

> Dear all,
> 
> I need to do something similar to a cross tabulation, but without any
> aggregation.

join your table to itself four times:

select *
  from (select check_time::date as date, employee_id,
               check_time - check_time::date as in
          from test where state = 'In') as a
  join (select check_time::date as date, employee_id,
               check_time - check_time::date as break_out
          from test where state = 'Break Out') as b using (employee_id, date)
  join (select check_time::date as date, employee_id,
               check_time - check_time::date as break_in
          from test where state = 'Break In') as d using (employee_id, date)
  join (select check_time::date as date, employee_id,
               check_time - check_time::date as out
          from test where state = 'Out') as e using (employee_id, date);

Note that this will do strange things if you don't have precisely four records
for each employee.

Alternatively use subqueries:

select date, employee_id,
       (select check_time - check_time::date from test
         where employee_id = x.employee_id
           and check_time::date = date and state = 'In') as in,
       (select check_time - check_time::date from test
         where employee_id = x.employee_id
           and check_time::date = date and state = 'Break Out') as break_out,
       (select check_time - check_time::date from test
         where employee_id = x.employee_id
           and check_time::date = date and state = 'Break In') as break_in,
       (select check_time - check_time::date from test
         where employee_id = x.employee_id
           and check_time::date = date and state = 'Out') as out
  from (select distinct employee_id, check_time::date as date from test) as x;

This will at least behave fine if there are missing records and will give an
error if there are multiple records instead of doing strange things.

Neither of these will be particularly pretty on the performance front.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] vacuum analyze slows sql query

2004-11-03 Thread Greg Stark

patrick ~ <[EMAIL PROTECTED]> writes:

> I noticed that a freshly created db with freshly inserted data (from
> a previous pg_dump) would result in quite fast results.  However,
> after running 'vacuum analyze' the very same query slowed down about
> 1250x (Time: 1080688.921 ms vs Time: 864.522 ms).

If it gets slower immediately after a single vacuum analyze then the problem
is that one of the queries is getting planned wrong when stats are available.
If it takes a while to slow down then it could be other problems such as index
bloat etc.

Don't use "explain verbose", use "explain analyze". I don't know if it was in
7.1, but you say you reproduced the problem with 7.4. It would be helpful to
see the results of "explain analyze select ..." on the query before and after
the vacuum analyze.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Simple SQL Question

2004-11-06 Thread Greg Stark

Tomasz Myrta <[EMAIL PROTECTED]> writes:

> > select * from table1 LIMIT x
> > gives me the first x row of the result.
> > After that, I save the last value, and next time, I adjust
> > the query as
> > select * from table1 where itemkey>:lastvalue LIMIT x
> 
> Why do you complicate it so much? Everything you need is:
> 
> select * from table1 LIMIT x
> select * from table1 LIMIT x OFFSET x
> select * from table1 LIMIT x OFFSET 2*x
> 
> Remember to sort rows before using limit/offset.

There are two good reasons to prefer Andras' solution to yours.

a) If the data is modified between the two queries his will continue from
where the previous page left off. Yours will either skip records or overlap
with the previous page.

b) If itemkey is indexed his will be an efficient index scan that performs
similarly regardless of what page is being fetched. Yours will perform more
and more slowly as the user gets deeper into the results.


Note that both queries are wrong however. You need an "ORDER BY itemkey" or
else nothing guarantees the second page has any relation at all to the first
page.
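
Putting the two together, the paging query should be (a sketch):

select * from table1 where itemkey > :lastvalue order by itemkey limit x;

With an index on itemkey that's an index scan starting right where the previous
page left off.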

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] Comparing Dates

2004-11-18 Thread Greg Stark

"Thomas F.O'Connell" <[EMAIL PROTECTED]> writes:

> select 2004-06-08;
>   ?column?
> ----------
>   1990
> 
> I'm not exactly sure how the bare string is converted internally, but it's
> clearly not a complete date like you're expecting.

What string? That's just integer arithmetic.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Making dirty reads possible?

2004-12-06 Thread Greg Stark

[EMAIL PROTECTED] writes:

> But not possible for real at the moment?
> 
> So, summarising:
> - Nested transactions is not (yet) supported
> - READ UNCOMMITTED isolation level is not (yet) supported
> - the EXECUTE plpgsql construct does not circumvent the transaction

Well nested transactions are in 8.0 but I don't think they help you much.

I find I've been stymied using server-side functions for large batch jobs for
pretty much the same reason. I find it works better and it's more flexible to
write client-side programs in the language of my choice that connect to the
database and do the batch jobs.

They can output progress logs or keep information about their progress in some
shared space. They can also control the transaction more freely committing in
the middle of the job if it's safe.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] replacing mysql enum

2004-12-11 Thread Greg Stark

Ian Barwick <[EMAIL PROTECTED]> writes:

> What I still don't quite understand is why IN in a CHECK context is
> handled differently to say: select 1 where 'x' in (null,'a','b','c') ?
> This could be a bit of a gotcha for anyone constructing a constraint
> similar to the original poster's and not realising it has no effect.

well

WHERE foo IN (null, ...)

returns null if foo isn't explicitly in the list (ie, "it may or may not equal
the unknown value in the list"). And I think constraints that return null are
deemed to have succeeded.
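
You can see it directly (a throwaway example):

create table t (x text check (x in (null, 'a', 'b')));
insert into t values ('z');  -- succeeds: the check evaluates to null, not false

So a NULL in the IN list effectively disables the constraint.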

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] Column with recycled sequence value

2005-01-13 Thread Greg Stark
Andrew Sullivan <[EMAIL PROTECTED]> writes:

> You can set the sequence up to cycle (so once it gets to the end, it
> wraps around to the beginning again).  The keyword is CYCLE at CREATE
> SEQUENCE time.  It defaults to NO CYCLE.
> 
> One potential problem, of course, are collisions on the table,
> because some value wasn't cleared out.  It sounds like you don't have
> that problem though.

Alternatively you can go through the database and make sure all the foreign
keys are declared and marked ON UPDATE CASCADE. Then go through and renumber
all your entries sequentially starting at 1 and reset your sequence. 

I'm not sure this is such a hot idea really. But I don't really like the idea
of letting the sequence wrap around much either. You'll have to consider the
pros and cons of each approach (and of just moving to bigserial too).

If you're going to do this you'll want an index on all the foreign key
columns. That is, the columns referring to this value from other tables.
Otherwise the automatic updates would be very slow.

And will probably want to schedule down-time for this. Otherwise application
code that holds values in local state might get very confused.

I think I would do it with a program that connects and updates each record
individually and commits periodically rather than with a single big update.
Just because I like having control and having things that give me progress
information and can be interrupted without losing work.

Oh, and this won't work if you have any external references to these values
from outside your database. Say if the value is something like a customer
account number that you've previously sent to customers...
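
The declarations would look something like this (hypothetical tables):

alter table orders
  add constraint orders_customer_fk
  foreign key (customer_id) references customer (customer_id)
  on update cascade;

-- index the referring column so cascaded updates aren't table scans:
create index idx_orders_customer on orders (customer_id);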

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] OFFSET impact on Performance???

2005-01-20 Thread Greg Stark
"Andrei Bintintan" <[EMAIL PROTECTED]> writes:

> > If you're using this to provide "pages" of results, could you use a cursor?
> What do you mean by that? Cursor?
> 
> Yes I'm using this to provide "pages", but If I jump to the last pages it goes
> very slow.

The best way to do pages is not to use offset or cursors but to use an
index. This only works if you can enumerate all the sort orders the
application might be using and can have an index on each of them.

To do this the query would look something like:

SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50

Then you take note of the last value used on a given page and if the user
selects "next" you pass that as the starting point for the next page.

This query takes the same amount of time no matter how many records are in the
table and no matter what page of the result set the user is on. It should
actually be instantaneous even if the user is on the hundredth page of
millions of records because it uses an index both for the finding the right
point to start and for the ordering.

It also has the advantage that it works even if the list of items changes as
the user navigates. If you use OFFSET and someone inserts a record in the
table then the "next" page will overlap the current page. Worse, if someone
deletes a record then "next" will skip a record.

The disadvantages of this are a) it's hard (but not impossible) to go
backwards. And b) it's impossible to give the user a list of pages and let
them skip around willy nilly.
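
Concretely, if the last row on the current page had col = 123 then the next
page is just (a sketch):

SELECT * FROM tab WHERE col > 123 ORDER BY col LIMIT 50;

and going backwards is the same idea with "col < 123 ORDER BY col DESC",
reversing the rows before display.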


(If this is for a web page then specifically don't recommend cursors. It will
mean you'll have to have some complex session management system that
guarantees the user will always come to the same postgres session and has some
garbage collection if the user disappears. And it means the URL is only good
for a limited amount of time. If they bookmark it it'll break if they come
back the next day.)

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-25 Thread Greg Stark

Alex Turner <[EMAIL PROTECTED]> writes:

> I am also very interesting in this very question.. Is there any way to
> declare a persistant cursor that remains open between pg sessions? 
> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those IO
> costs, which are often very heavy, and the reason why one would want
> to use a cursor.

TANSTAAFL. How would such a persistent cursor be implemented if not by
building a temporary table somewhere behind the scenes?

There could be some advantage if the data were stored in a temporary table
marked as not having to be WAL logged. Instead it could be automatically
cleared on every database start.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] same question little different test MSSQL vrs Postgres

2005-01-25 Thread Greg Stark

"Joel Fradkin" <[EMAIL PROTECTED]> writes:

> I also tried a simple select * from tblcase where clientum = 'SAKS'

Try:

explain analyze select * from tblcase where clientum = 'SAKS'

Send the output.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] same question little different test MSSQL vrs Postgres

2005-01-26 Thread Greg Stark
"Joel Fradkin" <[EMAIL PROTECTED]> writes:

> QUERY PLAN
> "Seq Scan on tblcase  (cost=0.00..30066.21 rows=37401 width=996) (actual
> time=0.344..962.260 rows=22636 loops=1)"
> "  Filter: ((clientnum)::text = 'SAKS'::text)"
> "Total runtime: 1034.434 ms"

Well that says it only took 1s. So it seems this is highly dependent on
whether the data is in cache. Perhaps it was in cache on MSSQL when you
profiled it there and not on postgres?

You could put an index on clientnum, but if the data is usually in cache like
this it might not even be necessary.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] same question little different test MSSQL vrs Postgres

2005-01-26 Thread Greg Stark
"Joel Fradkin" <[EMAIL PROTECTED]> writes:

> I tried the SET ENABLE_SEQSCAN=FALSE;
> And the result took 29 secs instead of 117.
> 
> After playing around with the cache and buffers etc I see I am no longer
> doing any swapping (not sure how I got the 100 sec response might have been
> shared buffers set higher, been goofing around with it all morning).

If it's swapping you're definitely going to get bad results. You really want
the *majority* of RAM left free for the OS to cache disk data.

> My worry here is it should obviously use an index scan so something is not
> setup correctly yet. I don't want to second guess the analyzer (or is this a
> normal thing?)

No that's not obvious. 22k out of 344k is a selectivity of 6.6% which is
probably about borderline. The optimizer is estimating even worse at 10.9%
which isn't far off but puts it well out of the range for an index scan.

If you really want to get postgres using an index scan you'll have to a)
improve the estimate using "alter table tblcase alter column clientnum set
statistics" to raise the statistics target for that column and reanalyze. 

And b) lower random_page_cost. random_page_cost tells postgres how much slower
indexes are than table scans and at the default setting it accurately
represents most disk hardware. If your database fits within RAM and is often
cached then you might have to lower it to model that fact. But you shouldn't
do it based on a single query like this.
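
For (a) the commands would be something like this (the target of 100 is just a
plausible value to experiment with):

alter table tblcase alter column clientnum set statistics 100;
analyze tblcase;

and for (b), experimentally:

set random_page_cost = 2;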


-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] order by question

2005-03-09 Thread Greg Stark
Gary Stainburn <[EMAIL PROTECTED]> writes:

> > Alternatively: (a<>6),(a<>4),a
> 
> Although this does exactly what I want, at first glance it should do 
> exactly the opposite.
> 
> I'm guessing that for each line it evaluates
> not (a=6) 0 for true else 1

Not really: "not a=6" is an expression that evaluates to a boolean, true or
false. True sorts as "greater" than false. That order is counterintuitive, but
it's because the default sort order is ascending. So the "lesser" false
records appear first.

If you put "not a=6" in your select column list you'll see the true and false
values appear.


-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Consecutive row count query

2005-03-17 Thread Greg Stark
Leon Stringer <[EMAIL PROTECTED]> writes:

> Hi,
> 
> I wondered if anyone could answer the following question:
> 
> If I have a table such as the one below:
> 
> col1   col_order
> ---
> Apple  1
> Apple  2
> Orange 3
> Banana 4
> Apple  5
> 
> Is there a way I can get the following results:
> 
> Apple  2
> Orange 1
> Banana 1
> Apple  1

Maybe. But not easily or efficiently.

How about this:

SELECT a.col1, a.col_order
  FROM tab as a
  LEFT OUTER JOIN tab as b 
ON (b.col_order = a.col_order+1 AND b.col1=a.col1)
 WHERE b.col1 IS NULL


> But since (in my intended table) most rows will have col_count = 1, this
> seems like unnecessary normalization (and semantically "wrong").

I think this looks like the better option. "Unnecessary normalization" is an odd
phrase. Unless you can point at some reason the denormalized version is *more*
convenient --and much *more* convenient at that-- you should go with the
normalized design.

Besides, that col_count column's only going to be four bytes. Unless the
"Apple" data is really short it'll only take a few rows with col_count>1 to
make it worthwhile.

The only reason you might have a problem is if it's really "semantically
wrong" which would be if there's data attached to Apple or Orange that might
be different from one streak of results to the other.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] Date/Time Conversion

2005-04-03 Thread Greg Stark
James G Wilkinson <[EMAIL PROTECTED]> writes:

> I hope that this is some silly beginner's mistake.  I have spent quite a bit 
> of
> time
> reading the PostgreSQL documentation and cannot find my error.  I have also
> scanned the PostgreSQL archive and the web for help, but I have not found
> anything
> to get me over the hump (it is probably out there, I just cannot find it).
> 
> All I am trying to do is convert GMT dates and times to an arbitrary
> time zone.  Here is my small test database:

I think the closest thing to what you're trying to do is:

slo=> set timezone  =  'GMT';
SET
slo=>  select '2001-01-01'::date::timestamptz at time zone 'EST' ;
      timezone       
---------------------
 2000-12-31 19:00:00
(1 row)


Why are you not just storing a "timestamp with timezone" with the actual time
of the event, and a second column with the local time zone in which you can
choose to use to display the time?
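
That design would be something like this (a sketch, made-up names):

create table events (
    event_time  timestamp with time zone,
    display_tz  text
);

-- render in the stored zone at query time:
select event_time at time zone display_tz from events;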

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] ignore single character in SELECT query?

2005-04-12 Thread Greg Stark

[EMAIL PROTECTED] writes:

> So basically I want to ignore a single character (the apostrophe
> character), anywhere in the middle of my search word, in selecting
> results.  How can I do this?

WHERE replace(name, '''', '') like '%dont%'

Beware of quoting issues if "dont" is coming from user supplied inputs.
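
For example, to match both "dont" and "don't" -- note the doubled quotes to get
a literal apostrophe, and the made-up table:

select name from people where replace(name, '''', '') ilike '%dont%';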

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] row-attribute in EXPLAIN-output doesn't match count(*)

2005-04-14 Thread Greg Stark
Andrew Sullivan <[EMAIL PROTECTED]> writes:

> On Thu, Apr 14, 2005 at 07:21:38PM +0200, Andreas Joseph Krogh wrote:
> > 
> > So, what you're suggesting is that a restart of the webapp should make 
> > vacuum 
> > able to delete those dead rows?
> 
> Yes, but that'll only solve your problem for now.  You'll have the
> problem again soon.  What's keeping open the transaction?

This is presumably because of the long-standing issue that Postgres takes the
snapshot as soon as the BEGIN is issued. A lot of drivers issue a "COMMIT;
BEGIN;" right away even though it could be a long time before any actual work
is done.

Other databases (by which I mean Oracle) treat BEGIN as a noop. The snapshot
starts when the first SQL statement that needs a snapshot is executed. So
until a SELECT is issued the connection doesn't participate in any
transactional issues like keeping old versions of records around.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] row-attribute in EXPLAIN-output doesn't match count(*)

2005-04-14 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> We have to start the transaction no later than event #2 since there has
> to be something to hold the lock.  But it'd be easy enough to decouple
> this from BEGIN, and it'd be good enough to solve the "COMMIT;BEGIN"
> problem.

Oh I think I finally figured out what you're saying here. 

So vacuum doesn't really know what tuples are actually visible to the
snapshots actually taken by a transaction? It's making the conservative
estimate that a snapshot could have been taken as early as the start of the
transaction even if no snapshot was taken until later?


> Which of these three times do you think now() ought to correspond to?
> I recall having argued that it should be event #3 since that corresponds
> to the database snapshot you see.  100% backwards compatibility would
> require setting now() at event #1, but will anyone weep if we change that?

I think it would be weird to perform a select and see records with dates after
now(). It would also be weird to perform a select and not see records inserted
before now(). I'm not sure any of the above guarantees those criteria for READ
COMMITTED mode, but I think I'm on the same general path as you.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] can you change an FK constraint from NOT DEFERRABLE to DEFERRABLE

2005-03-30 Thread Greg Stark
Bruno Wolff III <[EMAIL PROTECTED]> writes:

> It does look like you can only ADD and DROP constraints, not directly
> alter or replace them. So making a reference deferable is go to require
> a DROP and ADD which will need to recheck the constraint.

I asked the same question a few days ago on pgsql-general.

In short, if you want to skip the rechecking you have to update system tables
directly and you have to do two of them. 

The updates you want would look something like these. But these would do *all*
your constraints, make sure to get only the ones you really want to change:

update pg_constraint set condeferrable = 't' where contype = 'f'
update pg_trigger set tgdeferrable=true where tgisconstraint = true
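
For a single constraint you would narrow them down by name, something like this
(the constraint name here is hypothetical):

update pg_constraint set condeferrable = 't'
 where contype = 'f' and conname = 'my_fk';
update pg_trigger set tgdeferrable = true
 where tgisconstraint = true and tgconstrname = 'my_fk';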


I think an ALTER CONSTRAINT to change these settings as well as the 
ON {UPDATE,DELETE} behaviour would be neat.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] interesting SQL puzzle - concatenating column with itself.

2005-05-12 Thread Greg Stark

Nick Fankhauser <[EMAIL PROTECTED]> writes:

> Alvaro Herrera wrote:
> 
> > The order is not really guaranteed, though if this is a one-shot thing,
> > you may get away with turning off hashed aggregates.
> >
> 
> When I read this, I assumed there was a runtime parameter I could set that was
> similar to ENABLE_HASHJOIN. Are you referring to a different runtime parameter
> or something else entirely?

Similar but different. enable_hashagg. You can see all of these with "show
all" in psql.

However you do not have to worry. Even with (actually, *especially with*) hash
aggregates the records will be processed in the order they're received.

It's actually the normal aggregate method of sorting on the GROUP BY columns
that risks damaging the order. However Postgres does a special check to see if
the subquery is already sorted by the GROUP BY column and avoids the extra
sort which could cause you problems.

So this is not guaranteed by the SQL spec to work (but then the SQL spec
doesn't have custom aggregates at all) but Postgres goes out of its way to
make sure this doesn't break unnecessarily.

[This is all for 8.0 and I think 7.4. Some versions before that might not be
so careful.]

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] SELECT * FROM foo OFFSET -1 LIMIT 1

2005-06-27 Thread Greg Stark

Markus Bertheau <[EMAIL PROTECTED]> writes:

> Offset for negative numbers means 0, as it seems. I think there is a
> sensible meaning for negative offset numbers and wondered, what
> arguments led to negative offsets being processed as 0 offset.

Frankly I'm surprised it's not a syntax error.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] ARRAYs and INDEXes ...

2005-08-16 Thread Greg Stark
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:

> SELECT * FROM customers WHERE monthly_balance[6] = 0.00;

This, like the other poster said, can be accomplished with a set of simple
expression indexes.

> As an example ... or
> 
> SELECT * FROM customers WHERE 0.00 = any (monthly_balance);

This would require a GiST index. Look at the intarray contrib module. I don't
think there's any equivalent for other data types. You might have to store
these values as fixed precision numbers and divide or multiple by 100 to
convert.
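
A sketch of the fixed-precision idea with intarray (gist__int_ops and the &&
overlap operator come from the contrib module; the names are made up):

create table customers (monthly_balance_cents integer[]);
create index idx_balances on customers
  using gist (monthly_balance_cents gist__int_ops);

-- customers with a zero balance in any month:
select * from customers where monthly_balance_cents && '{0}'::integer[];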

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] PL/SQL Function: self-contained transaction?

2005-08-22 Thread Greg Stark

"Marc G. Fournier" <[EMAIL PROTECTED]> writes:

> In PostgreSQL, as everyone knows, a QUERY == a transaction, unless wrap'd in a
> BEGIN/END explicitly ... how does that work with a function?  is there an
> implicit BEGIN/END around the whole transaction, or each QUERY within the
> function itself?

The whole outer query issued from your frontend is in one transaction.

> If the whole function (and all QUERYs inside of it) are considered one
> transaction, can you do a begin/end within the function itself to 'force'
> commit on a specific part of the function?

Functions cannot issue start or end transactions. They're a creature of the
transaction you're in when you call them. Otherwise it wouldn't make sense to
be able to call them from within a query.

There is some discussion of "stored procedures" which would live outside of
transactions and be able to create transactions, commit, and roll them back.
But I don't think any of that work is committed yet. I'm not even sure it's
been written yet.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] uuid type (moved from HACKERS)

2005-09-07 Thread Greg Stark
Josh Berkus  writes:

> Mark, Nathan,
> 
> I'm moving this over to the PGSQL-SQL list, away from -hackers, as it's no 
> longer a -hackers type discussion.   Hope you don't mind!
> 
> > On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote:
> > > I'm also a little baffled to come up with any real application where
> > > making an id number for most tables "unguessable" would provide any
> > > kind of real protection not far better provided by other means.   For
> > > your "users" table, sure, but that's a very special case.
> >
> > It should never be the sole means of defense, however, it can be quite
> > effective at prevention.
> >
> > For a rather simple example, consider a site that associates a picture
> > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> > it makes it ridiculously easy to write a script to pull all of the
> > pictures off the site. This can be bothersome, as the only type of
> > person who would do this, is the type of person with an illegitimate
> > motivation. I want the data to be easily and freely accessible as
> > specific objects, but I do not wish to provide an easy way of
> > dumping all of the data as a unit.

Of course you could have just done the same thing using an hmac (or a simple
hash like crypt) and not had to store an extraneous meaningless piece of
information in your database.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] MOVE in SQL vs PLPGSQL

2005-10-06 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Richard Huxton  writes:
> > Ah, now I see what you're saying. You're quite right in your suspicions, 
> > "MOVE..." isn't supported for plpgsql cursors. You could probably do 
> > something with EXECUTE and returning a refcursor from a previous 
> > function, but that sounds fiddly.
> 
> > I must admit, on the odd occasion I want to skip a row, I just FETCH it 
> > and move on. Anyone else?
> 
> There is something on the TODO list about improving plpgsql's cursor
> functionality --- there's no reason it shouldn't have MOVE, except that
> no one got around to it yet.

Though the original poster should realize, a MOVE command would be only
marginally more efficient than just fetching those records. It would save the
network overhead and context switches involved in communicating those records,
but there's no way it would let the server avoid reading all those records
from disk.

At least as far as I can see.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Greg Stark
Stephan Szabo <[EMAIL PROTECTED]> writes:

> On Tue, 11 Oct 2005, Rick Schumeyer wrote:
> 
> > I'm not sure what I was thinking, but I tried the following query in pg:
> >
> > SELECT * FROM t GROUP BY state;
> >
> > pg returns an error.
> >
> > Mysql, OTOH, returns the first row for each state.  (The first row with
> > "AK", the first row with "PA", etc.)
> >
> > I'm no SQL expert, but it seems to me that the pg behavior is correct, and
> > the mysql result is just weird.  Am I correct?
> 
> In your case, it sounds like the mysql result is wrong. I believe SQL99
> would allow it if the other columns were functionally dependant upon state
> (as there'd by definition only be one value for the other columns per
> group).

I believe this is a documented feature.

MySQL treats "select a,b from t group by a" equivalently to Postgres's 
"select distinct on (a) a,b from t"

I suppose "equivalent" isn't quite true. It's more general since it allows
aggregate functions as well. The equivalently general Postgres syntax is to
have a first() aggregate function and do "select a,first(b) from t group by a".

I'm sure it's very convenient.
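
Rolling your own first() only takes a few lines; here's a sketch for text
columns -- declaring the transition function strict means the first non-null
value just sticks:

create function first_accum(text, text) returns text
  as 'select $1' language sql immutable strict;
create aggregate first (basetype = text, sfunc = first_accum, stype = text);

select a, first(b) from t group by a;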

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Text->Date conversion in a WHERE clause

2005-10-12 Thread Greg Stark
Collin Peters <[EMAIL PROTECTED]> writes:

> I have a table that has some columns which store 'custom' fields so the
> content varies according to the user that the row belongs to.  For one
> of the groups of users the field is a date (the type of the field is
> 'text' though).  I'm trying to perform a query where it only returns
> values in a certain date range so in the WHERE clause I have
> 
> WHERE cust3 <> ''
> AND cust3::text::timestamp > CURRENT_DATE - interval '1 month'
> 
> This results in the error 'ERROR:  date/time field value out of range:
> "052-44-5863"'.  Now that is obviously not a valid date but there
> is actually more to the where clause and the first part of it excludes
> all rows where the user is not even the correct type, so the row which
> includes the field '052-44-5863' should really not even be checked.

I think you have to use a CASE expression like:

WHERE CASE WHEN user_type = 1
           THEN cust3::timestamp > CURRENT_DATE - interval '1 month'
           ELSE false
      END
  AND ...



There's no advantage to doing this kind of thing though. Good database design
principles dictate having one column for each piece of data. Just leave the
columns for which the data is inappropriate NULL. NULLs take effectively no
space.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Greg Stark
Scott Marlowe <[EMAIL PROTECTED]> writes:

> Hehe.  When I turn on my windshield wipers and my airbag deploys, is it
> a documented "feature" if the dealership told me about this behaviour
> ahead of time?  

Well it's more like my car, where the dashboard dims when I turn on my
headlights, which annoys me to no end since I learned to always put my
headlights on even in the day.

> In much the same way, while this behaviour may be documented by MySQL, I
> can't imagine it really being called a feature. But at least this
> misbehaviour is documented. However, I think most people in the MySQL
> universe just stumble onto it by accident when they try it and it works. I'd
> at least prefer it to throw a warning or notice or something.

I don't see why you think people stumble on this by accident. I think it's
actually an extremely common need. So common that Postgres has the same
feature (though less general) and invented a whole syntax to handle it.

I think most MySQL users don't stumble on it, they learn it as the way to
handle the common use case when you join a master table against a detail table
and then want to aggregate all the detail records. In standard SQL you have to
write GROUP BY ... and list every single column you need from the master
table. Forcing the database to do a lot of redundant comparisons and sort on
uselessly long keys where in fact you only really need it to sort and group by
the primary key.

Remember, most MySQL users learn MySQL first, and only later learn what is
standard SQL and what isn't. 

> A Subselect would let you do such a thing as well, and while it's more
> complicated to write, it is likely to be easier to tell just what it's
> doing.

Subselects have their own problems here. Mainly Postgres's optimizer, as good
as it is, doesn't treat them with the same code paths as joins and can't find
all the same plans for them. But in any case you cannot always write a
subselect that's equivalent to an arbitrary join.


-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Greg Stark
"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> By changing the values in the select/group by you are changing 
> Group! How can you arbitrarily add or exclude a column?
> You can't do it.

Go back and reread the previous posts again. You missed the whole point.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Greg Stark
"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> Greg,
>   You'll have to pardon me...
>  
> I saw this comment:
> 
> "I don't see why you think people stumble on this by accident. 
> I think it's actually an extremely common need."
> 
> Which, if referring to the ability to have items in the select that do not
> need to be included in the group, (excluding constants and the like) is just
> silly.

Well the "constants and the like" are precisely the point. There are plenty of
cases where adding the column to the GROUP BY is unnecessary and, since
Postgres makes no attempt to prune them out, inefficient. And constants aren't
the only such case. The most common case is columns that are coming from a
table whose primary key is already included in the GROUP BY list.

In the case of columns coming from a table where the primary key is already in
the GROUP BY list it's possible for the database to deduce that it's
unnecessary to group on that column. 

But it's also possible to have cases where the programmer has out of band
knowledge that it's unnecessary but the database doesn't have that knowledge.
The most obvious case that comes to mind is a denormalized data model that
includes a redundant column.

  select dept_id, dept_name, count(*) from employee_list

For example if dept_name is guaranteed to be the same for all records with the
same dept_id. Of course that's generally considered poor design but it doesn't
mean there aren't thousands of databases out there with data models like that.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-13 Thread Greg Stark

Scott Marlowe <[EMAIL PROTECTED]> writes:

> Sorry, but it's worse than that.  It is quite possible that two people
> could run this query at the same time and get different data from the
> same set and the same point in time.  That shouldn't happen accidentally
> in SQL, you should know it's coming.

I'm pretty unsympathetic to the "we should make a language less powerful and
more awkward because someone might use it wrong" argument.

> > In standard SQL you have to
> > write GROUP BY ... and list every single column you need from the master
> > table. Forcing the database to do a lot of redundant comparisons and sort on
> > uselessly long keys where in fact you only really need it to sort and group 
> > by
> > the primary key.
> 
> But again, you're getting whatever row the database feels like giving
> you.  A use of a simple, stupid aggregate like an any() aggregate would
> be fine here, and wouldn't require a lot of overhead, and would meet the
> SQL spec.

Great, so I have a user table with, oh, say, 40 columns. And I want to return
all those columns plus their current account balance in a single query.

The syntax under discussion would be:

select user.*, sum(money)
  from user join user_money using (user_id)
 group by user_id

You would prefer:

select user_id, 
   any(username) as username, any(firstname) as firstname, 
   any(lastname) as lastname, any(address) as address,
   any(city) as city, any(street) as street, any(phone) as phone,
   any(last_update) as last_update, any(last_login) as last_login,
   any(referrer_id) as referrer_id, any(register_date) as register_date,
   ...
   sum(money) as balance,
   count(money) as num_txns
  from user join user_money using (user_id) group by user_id


Having a safeties is fine but when I have to disengage the safety for every
single column it starts to get more than a little annoying. 

Note that you cannot write the above as a subquery since there are two
aggregates. You could write it as a join against a view but don't expect to
get the same plans from Postgres for that.


> Actually, for things like aggregates, I've often been able to improve
> performance with sub selects in PostgreSQL.  

If your experience is like mine it's a case of two wrongs cancelling each
other out. The optimizer underestimates the efficiency of nested loops which
is another problem. Since subqueries' only eligible plan is basically a nested
loop it often turns out to be faster than the more exotic plans a join can
reach.

In an ideal world subqueries would be transformed into the equivalent join (or
some more general join structure that can cover both sets of semantics) and
then planned through the same code path. In an ideal world the user should be
guaranteed that equivalent queries would always result in the same plan
regardless of how they're written.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-14 Thread Greg Stark

"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> More awkward? What *you're* suggesting is more awkward. You realize that
> right? How can syntax that is understood and accepted for years be more
> awkward?

Well gosh, I would say that that's something only a newbie could say about
SQL of all things...

I had a whole thing written after that but I just deleted it. I grow tired of
this thread.

I am pretty happy to hear that the SQL standard endorsed the idea having the
right thing happen if the primary key is present in the grouping list. That
would be a wonderful feature for Postgres.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Storing HTML in table

2005-11-27 Thread Greg Stark

Rob <[EMAIL PROTECTED]> writes:

> I would like complete control over this information -- so if sometime in the
> future it's decided to totally redesign the layout. Also, at some point a
> tool will be created so novice computer users can enter nicely formatted
> markup -- meaning you won't have to know HTML to use this tool.

You should go look at some of the XML database books out there and maybe
you'll find a tool to do what you want. But beware, there's a lot of snake-oil
in this field. Anyone who promises a one-size-fits-all solution to every
problem is probably deluded.

The problem is hard. There's an inherent tension between being able to easily
manipulate parts of the data and flexibility in the type of data you can
represent. On one side is the traditional relational database model, where
every record must contain precisely the same fixed list of predetermined
columns. On the other side is the decidedly non-relational model, where you
just store a large hunk of XML text.

What you're looking for is one of the many systems that attempt to bridge
these two extremes and provide the best of both worlds. 

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-28 Thread Greg Stark

elein <[EMAIL PROTECTED]> writes:

> > Note that the above are not inverses because you changed the lefthand
> > input.  You do get consistent results when you just add or omit NOT:
> Yes, you are right. I skipped the permutations to get down to the point.

Remember that NULL means "unknown". So "1 IN (1,2,NULL)" *should* be true
because regardless of what that unknown value is it's still obvious that 1
really is in the list. And "3 NOT IN (1,2,NULL)" is unknown because it depends
on whether that unknown quantity is 3 or not.

IN is the same as "= ANY" so "1 IN (1,2,NULL)" is the same as 
"1=1 OR 1=2 OR 1=NULL" which is true even though the last of the three is null.



-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] How to implement Microsoft Access boolean (YESNO)

2006-01-23 Thread Greg Stark

"Jesper K. Pedersen" <[EMAIL PROTECTED]> writes:

> Having checked the I/O format it seems that MS Access exports the
> values of a YESNO field as 0 and 1

If only Postgres's boolean type were as helpful.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] How to implement Microsoft Access boolean (YESNO)

2006-01-23 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > "Jesper K. Pedersen" <[EMAIL PROTECTED]> writes:
> >> Having checked the I/O format it seems that MS Access exports the
> >> values of a YESNO field as 0 and 1
> 
> > If only Postgres's boolean type were as helpful.
> 
> There's a cast to int in 8.1, and you can roll-your-own easily in prior
> releases ...

The annoying thing about it is that in just about any client language you'll
get 't' and 'f' by default, and since both are non-empty strings both will
evaluate to true. So any user who tries to do things the obvious way like
this will get a surprise:

 if ($db->query("select canlogin from users where userid = ?",$userid)) {
   # canlogin comes back as the string 't' or 'f'; both are truthy,
   # so this branch runs even when canlogin is false
   ...
 }

Is there an implicit cast from ints? So if I pass a 0 or 1 argument for a
boolean parameter now it'll work? That definitely didn't work in the past.
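
For what it's worth, a minimal roll-your-own conversion for older releases
might look something like this (the function name is made up):

create function bool_to_int(boolean) returns integer as
  'select case when $1 then 1 when not $1 then 0 end'
  language sql immutable;

select bool_to_int(canlogin) from users;  -- yields 1, 0, or null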

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] two count columns?

2006-02-17 Thread Greg Stark

Jan Danielsson <[EMAIL PROTECTED]> writes:

> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip

select from_ip,
       count(from_ip) as entries,
       (select count(*)
          from log as l
         where l.from_ip = log.from_ip
       ) as tot_entries,
       max(ts) as last_access
  from log
 where to_port=22
   and direction='in'
 group by from_ip

Expect it to be pretty slow though: for every from_ip it has to look up every
other entry with that from_ip.

> Thankful for any hints or tips.

There is a trick you could use to make it faster but it gets cumbersome and
pretty tricky to use when you're doing more than one thing at a time:

select from_ip,
       sum(case when to_port=22 and direction='in'
                then 1 else 0 end) as entries,
       count(*) as tot_entries,
       max(case when to_port=22 and direction='in'
                then ts::timestamp(0) else null end) as last_access
  from log
 group by from_ip
having sum(case when to_port=22 and direction='in' then 1 else 0 end) > 0
-- the select-list alias can't be referenced in HAVING, hence the repetition


Note that in either case you might want to look at EXPLAIN ANALYZE output for
the query and try raising work_mem for this query using SET until you see the
plan using a hash aggregate. If it can use a hash aggregate for your query
(more likely for the first query than the second) without swapping it'll be
faster than sorting.
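
A sketch of that tuning loop, with an illustrative setting (work_mem is
specified in kilobytes, so this is 64MB):

set work_mem = 65536;
explain analyze
select from_ip, count(from_ip) as entries, max(ts) as last_access
  from log
 where to_port=22 and direction='in'
 group by from_ip;
-- look for HashAggregate rather than GroupAggregate in the plan output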

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] view of weekly data

2006-02-17 Thread Greg Stark
"Ding Xiangguang" <[EMAIL PROTECTED]> writes:

> Hi, friend,
> 
> Suppose there is table of daily transaction data with 5 fields,
> 
> time(date), open(float8), high(float8), low(float8), close(float8)
> 
> Is it possible to create a view of weekly data, i.e. open is the first
> day'open, high is the highest of the week, low is the lowest of the week, 
> close
> is the last day's close.

low and high are easy; they're just min() and max(), so you would get
something like:

select date_trunc('week', time) as startofweek, 
   min(low) as weeklylow, 
   max(high) as weeklyhigh
  from dailydata
 group by date_trunc('week', time)


Unfortunately showing the open and close is much much harder. To get them
efficiently requires a feature set called OLAP that Postgres doesn't have and
isn't likely to get soon.

In Postgres 8.1 (and 8.0?) you could actually write custom aggregate
functions that use a composite type to store the earliest and latest time
found so far, along with the corresponding open and close, and get them
efficiently that way. Maybe someone else would be able to show how to do
that, I haven't tried it yet.
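
A rough, untested sketch of that idea for the open price (all names are made
up, and a named composite type stands in for RECORD; the close side would be
symmetric with the comparison reversed):

create type day_open as (ts date, px float8);

create function earlier_open(day_open, day_open) returns day_open as '
  select case when $1 is null then $2
              when $2 is null then $1
              when ($1).ts <= ($2).ts then $1
              else $2 end
' language sql immutable;

create aggregate first_open (
  basetype = day_open,
  sfunc = earlier_open,
  stype = day_open
);

-- usage, feeding (time, open) pairs in as the composite type (the
-- row-to-composite cast may need adjusting on older releases):
-- select date_trunc('week', time) as startofweek,
--        (first_open(row(time, open)::day_open)).px as weeklyopen
--   from dailydata
--  group by date_trunc('week', time);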

The only way to do it in standardish SQL would be with terribly inefficient
subqueries:

select date_trunc('week', time) as startofweek, 
   min(low) as weeklylow, 
   max(high) as weeklyhigh,
   (select open
      from dailydata as d
     where date_trunc('week', d.time) = date_trunc('week', dailydata.time)
     order by d.time asc
     limit 1
   ) as weeklyopen,
   (select close
      from dailydata as d
     where date_trunc('week', d.time) = date_trunc('week', dailydata.time)
     order by d.time desc
     limit 1
   ) as weeklyclose
  from dailydata
 group by date_trunc('week', time)


-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


  1   2   >