Re: [SQL] The Information Schema vs the PG Catalog

2005-12-27 Thread George Pavlov
I would say that pg_catalog is the more complete one whereas the
information_schema the more generic, standards-conformant place. I would
stick with the information_schema unless that becomes inadequate. A case
in point may be sequences. Apart from
information_schema.columns.column_default I haven't seen them
represented anywhere there (please someone correct me if I am wrong).
You can get more information about sequences from pg_catalog.pg_class
(look for pg_class.relkind='S') and various views that sit on top of
that (e.g. pg_statio_all_sequences).

George


> I'm writing PL/pgSQL routines that generate triggers, functions,
> and rules based on design characteristics of tables, columns, and
> other database objects.  These routines need to be able to look up
> the definitions of these objects.  I see that there are two places
> available to look up this info: the Information Schema and in the
> PG Catalog.
> 
> Which source is preferable?  Or if that answer isn't absolute,
> what are the reasons or conditions for preferring one over the
> other?
> 
> Also, a specific question:  Does the Information Schema offer any
> way to list the sequences that exist and their attributes?  I
> can't seem to find any.
> 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] order and order line item

2005-12-27 Thread David Garamond
Suppose we have the usual order & order line item entities:

 CREATE TABLE "order" (
   id INT PRIMARY KEY,
   date DATE NOT NULL
 );

 CREATE TABLE orderlineitem (
   id INT PRIMARY KEY,
   orderid INT REFERENCES "order"(id),
   seq INT NOT NULL,
   CONSTRAINT con1 UNIQUE (orderid, seq),
   productid INT NOT NULL,
   quantity INT NOT NULL,
   price NUMERIC(18,4) NOT NULL
 );

how do we enforce things like, for example: a) each order must contain
at least 1 line item; or b) each order must have a non-zero total value;
without making INSERT painful? I'm new to triggers, so I'm thinking of
something like a "before COMMIT" trigger but there's no such thing
apparently.

Regards,
dave

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] constraint and ordered value

2005-12-27 Thread David Garamond
Is it possible to use only CHECK constraint (and not triggers) to
completely enforce ordered value of a column (colx) in a table? By that
I mean:

1. Rows must be inserted in the order of colx=1, then colx=2, 3, and so on;

2. When deleting (or updating), "holes" must not be formed, e.g. if
there are three rows then row with colx=3 must be the first one deleted,
and then colx=2 the second, and so on.

I can see #1 being accomplished using a NOT NULL + UNIQUE constraint and
a CHECK constraint that calls some PL function where the function does a
simple checking (new.colx=1 if COUNT(colx)==0, or new.colx=MAX(colx)+1).

But is it possible to do #2 using only constraints?

Regards,
dave

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] The Information Schema vs the PG Catalog

2005-12-27 Thread Ken Winter
Thanks, George.  What you say fits with what I was finding.  I think that's
the way I will go.

~ Ken

> -Original Message-
> From: George Pavlov [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 27, 2005 12:11 PM
> To: pgsql-sql@postgresql.org
> Cc: [EMAIL PROTECTED]
> Subject: Re: The Information Schema vs the PG Catalog
> 
> I would say that pg_catalog is the more complete one whereas the
> information_schema the more generic, standards-conformant place. I would
> stick with the information_schema unless that becomes inadequate. A case
> in point may be sequences. Apart from
> information_schema.columns.column_default I haven't seen them
> represented anywhere there (please someone correct me if I am wrong).
> You can get more information about sequences from pg_catalog.pg_class
> (look for pg_class.relkind='S') and various views that sit on top of
> that (e.g. pg_statio_all_sequences).
> 
> George
> 
> 
> > I'm writing PL/pgSQL routines that generate triggers, functions,
> > and rules based on design characteristics of tables, columns, and
> > other database objects.  These routines need to be able to look up
> > the definitions of these objects.  I see that there are two places
> > available to look up this info: the Information Schema and in the
> > PG Catalog.
> >
> > Which source is preferable?  Or if that answer isn't absolute,
> > what are the reasons or conditions for preferring one over the
> > other?
> >
> > Also, a specific question:  Does the Information Schema offer any
> > way to list the sequences that exist and their attributes?  I
> > can't seem to find any.
> >



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread George Pavlov
The following looks like a bug to me, but please someone review and tell
me what I might be missing. Seems that past a certain result set size a
"[NOT] IN (subquery)" stops behaving as expected and returns 0 matches
even when there should be matches. No errors are returned, just faulty
data. The exact threshholds seem to depend on the data, subquery and
possibly the indexes in place. Nothing in the documentation mentions a
limit to what "IN subquery" can take so I am quite dismayed (if there is
a limit I would expect an ERROR/WARNING/something, but not wrong data).
Read below for details. This has been tried on PG 8.0.4 and 8.1.1 on
WinXP, and 8.0.4 and 8.0.5 on Linux.

I have two tables:
  t1 (id, name, a_type)
  t2 (id, name)

The t1 table is "big" (483211 rows), the name column contains
duplicates, the a_type has only two values. The t2 table is "small" (40
rows), the name values are unique. Some, but not all, t2.name values are
present in t1.name. To be precise, there are 10 t2.name values that do
not occur in t1.name (this is based on extraneous knowledge). I want to
find out which of the t2.name values are not ever used in t1.name:

select count(*) from t2 where t2.name not in (
  select t1.name from t1);
--> 0

This should return 10, instead it returns 0!!! Adding a LIMIT to the
subquery and doing some trial and error produces very interesting
results:

select count(*) from t2 where t2.name not in (
  select t1.name from t1 limit 261683) 
--> 13
select count(*) from t2 where t2.name not in (
  select t1.name from t1 limit 261684) 
--> 0

What is so magical about 261683? The JOIN alternative produces the
correct results no matter what:

select count(*) 
 from t2 left join t1 using (name)
 where t1.name is null 
--> 10

This pretty much summarizes the issue. Any thoughts greatly appreciated.
Follow a few variations to show how the threshhold varies.

-- restrict to only one a_type
-- 
select count(*) from t2 where t2.name not in (
  select t1.name from t1 where t1.a_type='P')
--> 0
select count(*) from t2 where t2.name not in (
  select t1.name from t1 where t1.a_type='P' limit 289014)
--> 11
select count(*) from t2 where t2.name not in (
  select t1.name from t1 where t1.a_type='P' limit 289015)
--> 0
select count(*) 
 from t2 left join (select * from t1 where a_type='P') t1 using (name)
 where t1.name is null 
--> 11

-- try distinct instead
--
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1)
--> 0
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1 limit 56001)
--> 10
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1 limit 56002)
--> 0

-- distinct + a_type restriction (adding the a_type restriction 
-- here _lowers_ the threshhold)
--
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1 where t1.a_type='P')
--> 0
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1 where t1.a_type='P' limit 33620)
--> 11 
select count(*) from t2 where t2.name not in (
  select distinct t1.name from t1 where t1.a_type='P' limit 33621)
--> 0


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread Tom Lane
"George Pavlov" <[EMAIL PROTECTED]> writes:
> select count(*) from t2 where t2.name not in (
>   select t1.name from t1 limit 261683) 
> --> 13
> select count(*) from t2 where t2.name not in (
>   select t1.name from t1 limit 261684) 
> --> 0

> What is so magical about 261683?

Most likely, the 261684'th row of t1 has a NULL value of name.
Many people find the behavior of NOT IN with nulls unintuitive,
but it's per SQL spec ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] "large" IN/NOT IN subqueries result in query returning

2005-12-27 Thread John McCawley
At a glance I would guess that NULL values in one or both of your tables 
is the culprit.  NULL values always return false.


Example:

A quick test on my database:

select count(*) FROM tbl_employee;

count
---
 2689

select count(*) FROM tbl_employee WHERE username IS NULL;
count
---
   35

So I have 35 null values.

create table tbl_foo (username varchar(32));

insert into tbl_foo (username) values ('poop');

select count(*) FROM tbl_employee WHERE tbl_employee.username NOT IN 
(select tbl_foo.username FROM tbl_foo);

count
---
 2654

So I only get 2654 values that are NOT IN the set 'poop'...i.e. the NULL 
values are not included when I use the "NOT IN" my query.  Nulls can be 
confusing.


Hope this helps.

George Pavlov wrote:


The following looks like a bug to me, but please someone review and tell
me what I might be missing. Seems that past a certain result set size a
"[NOT] IN (subquery)" stops behaving as expected and returns 0 matches
even when there should be matches. No errors are returned, just faulty
data. The exact threshholds seem to depend on the data, subquery and
possibly the indexes in place. Nothing in the documentation mentions a
limit to what "IN subquery" can take so I am quite dismayed (if there is
a limit I would expect an ERROR/WARNING/something, but not wrong data).
Read below for details. This has been tried on PG 8.0.4 and 8.1.1 on
WinXP, and 8.0.4 and 8.0.5 on Linux.

I have two tables:
 t1 (id, name, a_type)
 t2 (id, name)

The t1 table is "big" (483211 rows), the name column contains
duplicates, the a_type has only two values. The t2 table is "small" (40
rows), the name values are unique. Some, but not all, t2.name values are
present in t1.name. To be precise, there are 10 t2.name values that do
not occur in t1.name (this is based on extraneous knowledge). I want to
find out which of the t2.name values are not ever used in t1.name:

select count(*) from t2 where t2.name not in (
 select t1.name from t1);
--> 0

This should return 10, instead it returns 0!!! Adding a LIMIT to the
subquery and doing some trial and error produces very interesting
results:

select count(*) from t2 where t2.name not in (
 select t1.name from t1 limit 261683) 
--> 13

select count(*) from t2 where t2.name not in (
 select t1.name from t1 limit 261684) 
--> 0


What is so magical about 261683? The JOIN alternative produces the
correct results no matter what:

select count(*) 
from t2 left join t1 using (name)
where t1.name is null 
--> 10


This pretty much summarizes the issue. Any thoughts greatly appreciated.
Follow a few variations to show how the threshhold varies.

-- restrict to only one a_type
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread George Pavlov
Yes, yes, of course... There are nulls in my t1 table. And, of course,
NOT IN can return THREE possible values: not just TRUE or FALSE but also
NULL...

select distinct (moo.goo not in (null)) from moo;
--> null

select count(*) from moo where moo.goo not in ('gai', 'pan', null)
--> 0, no matter what the goo values are

Sorry... 

Thanks!

George

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread elein
On Tue, Dec 27, 2005 at 02:44:51PM -0500, Tom Lane wrote:
> "George Pavlov" <[EMAIL PROTECTED]> writes:
> > select count(*) from t2 where t2.name not in (
> >   select t1.name from t1 limit 261683) 
> > --> 13
> > select count(*) from t2 where t2.name not in (
> >   select t1.name from t1 limit 261684) 
> > --> 0
> 
> > What is so magical about 261683?
> 
> Most likely, the 261684'th row of t1 has a NULL value of name.
> Many people find the behavior of NOT IN with nulls unintuitive,
> but it's per SQL spec ...
> 
>   regards, tom lane

In 8.0 we get:

   elein=# select 1 in (NULL, 1, 2);
?column?
   --
t
   (1 row)
   
   elein=# select 3 not in (NULL, 1, 2);
?column?
   --
   
   (1 row)
 
For consistency, either both should return NULL or
both return true/false.

For completeness testing, the following are correct. 
Select NULL in/not in any list returns NULL.
   elein=# select NULL in (1,2);
?column?
   --

   (1 row)
   
   elein=# select NULL not in (1,2);
?column?
   --

   (1 row)
   
   elein=# select NULL in (NULL, 1,2);
?column?
   --

   (1 row)
   
   elein=# select NULL not in (NULL, 1,2);
?column?
   --

   (1 row)
   
elein
--
[EMAIL PROTECTED]Varlena, LLCwww.varlena.com
(510)655-2584(o) (510)543-6079(c)

  PostgreSQL Consulting, Support & Training   

PostgreSQL General Bits   http://www.varlena.com/GeneralBits/
--
AIM: varlenallc  Yahoo: AElein   Skype: varlenallc
--
I have always depended on the [QA] of strangers.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread Tom Lane
elein <[EMAIL PROTECTED]> writes:
> In 8.0 we get:

>elein=# select 1 in (NULL, 1, 2);
> ?column?
>--
> t
>(1 row)
   
>elein=# select 3 not in (NULL, 1, 2);
> ?column?
>--
   
>(1 row)
 
> For consistency, either both should return NULL or
> both return true/false.

The above behavior is correct per spec.  Feel free to argue its
consistency with the SQL committee ;-)

Note that the above are not inverses because you changed the lefthand
input.  You do get consistent results when you just add or omit NOT:

regression=# select 1 not in (NULL, 1, 2);
 ?column?
--
 f
(1 row)

regression=# select 3 in (NULL, 1, 2);
 ?column?
--

(1 row)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] "large" IN/NOT IN subqueries result in query returning wrong data

2005-12-27 Thread elein
On Tue, Dec 27, 2005 at 07:25:40PM -0500, Tom Lane wrote:
> elein <[EMAIL PROTECTED]> writes:
> > In 8.0 we get:
> 
> >elein=# select 1 in (NULL, 1, 2);
> > ?column?
> >--
> > t
> >(1 row)
>
> >elein=# select 3 not in (NULL, 1, 2);
> > ?column?
> >--
>
> >(1 row)
>  
> > For consistency, either both should return NULL or
> > both return true/false.
> 
> The above behavior is correct per spec.  Feel free to argue its
> consistency with the SQL committee ;-)

Oh, no! Not the committee! 

> 
> Note that the above are not inverses because you changed the lefthand
> input.  You do get consistent results when you just add or omit NOT:
Yes, you are right. I skipped the permutations to get down to the point.

> 
> regression=# select 1 not in (NULL, 1, 2);
>  ?column?
> --
>  f
> (1 row)
> 
> regression=# select 3 in (NULL, 1, 2);
>  ?column?
> --
> 
> (1 row)
> 
>   regards, tom lane

Thanks for your clarification.  

~elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend