[GENERAL] Select last week
I'm looking at ways to select rows with a timestamp column of "last week"
relative to the current server time.

In 8.3 there's "isoyear", so is this a good approach?

    extract( isoyear from foo_time ) = extract( isoyear from now() - interval '1 week' )
    AND extract( week from foo_time ) = extract( week from now() - interval '1 week' )

What about before isoyear was available in PostgreSQL? Find the week and
then add a week for the range?

    date_trunc( 'second', foo_time )
        BETWEEN date_trunc( 'week', now() - interval '1 week' )
        AND date_trunc( 'week', now() - interval '1 week' ) + interval '1 week' - interval '1 second'

Is there a better approach?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
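A common alternative (a sketch, not from the original post) is a half-open range on `date_trunc('week', ...)`, which avoids both the isoyear/week pair and the "- interval '1 second'" trick:

```sql
-- Sketch: half-open range over the previous ISO week. Assumes a table
-- "foo" with a timestamptz column "foo_time" (names are illustrative).
-- date_trunc('week', ...) snaps to Monday 00:00, so last week is
-- [this_week_start - 1 week, this_week_start).
SELECT *
FROM   foo
WHERE  foo_time >= date_trunc('week', now()) - interval '1 week'
AND    foo_time <  date_trunc('week', now());
```

The half-open comparison keeps sub-second timestamps intact and, unlike wrapping foo_time in extract() or date_trunc(), lets a plain index on foo_time be used.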
Re: [GENERAL] Tracking down a deadlock
I seem to have resolved this issue with a:

    LOCK TABLE account IN EXCLUSIVE MODE;

at the very start of the transaction, to prevent that transaction from
running in parallel.

The transaction has quite a few selects and inserts and a few updates.
I was hoping not to lock at the start of the transaction, which
effectively serializes that code. The update that was causing the
deadlock was just about the last command in the transaction. If I
remove that update, the deadlocks go away.

I had hoped that a LOCK near the end of the transaction (before the
UPDATE that deadlocks) would work. Oddly, it didn't: the deadlock was
then reported on the LOCK itself, plus my deadlock_timeout (60 seconds)
didn't seem to apply in that case. A mystery.

Thanks for the help,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
Re: [GENERAL] Tracking down a deadlock
On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote:
> > There are no other updates to that account table in the transaction,
> > so I'm confused how that is causing a deadlock.
>
> Is there more than one row with the target id?

No. It's a single SERIAL primary key.

> Does the account table have any foreign-key references to or from it?

Many. I had dropped all constraints, except foreign keys, from the
account table to see if that would help. (It didn't.) One CHECK
constraint involved checking the sum of two columns in the account
table, which seemed like a potential place for a deadlock. But I
didn't think the foreign keys would be a problem.

About 8 tables reference the account table, and the account table has
about 5 columns that reference other tables.

> It's sometimes possible to get a deadlock associated with trying to lock
> FK-referenced rows that several updated rows have in common.

The transaction has a number of selects and inserts not related to the
account table. There is an insert into a log table that references the
account table, though, that happens right before the update to the
account table.

I can't picture the deadlock, but I'm only familiar with the obvious
examples. What's the approach for dealing with this kind of deadlock
(assuming it's FK-related, as you suggest)? I assume at this point I
need to try explicit locking earlier in the transaction. Any
suggestions which lock to use and on what? SHARE ROW EXCLUSIVE on the
account table before issuing the update?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
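One row-level option (a sketch, not from the thread's resolution; table and column names taken from the thread, with $1 standing for the account id) is to take the exclusive row lock up front rather than locking the whole table:

```sql
BEGIN;

-- Grab the exclusive lock on the one contended account row first, so
-- later FK checks and the final UPDATE can't deadlock against a
-- concurrent transaction doing the same work in a different order.
SELECT 1 FROM account WHERE id = $1 FOR UPDATE;

-- ... the transaction's other selects/inserts, including the insert
-- into the log table that references account ...

UPDATE account SET foo = 123 WHERE id = $1;

COMMIT;
```

This serializes only transactions touching the same account row, instead of all writers to the table as LOCK TABLE ... IN EXCLUSIVE MODE does.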
Re: [GENERAL] Tracking down a deadlock
On Sat, May 02, 2009 at 11:48:21AM -0400, Tom Lane wrote:
> Bill Moseley writes:
> > Not getting any nibbles, so allow me to try a short question:
> > If I have a deadlock situation (that will be reported as such by
> > Postgresql once the deadlock_timeout passes), does pg_stat_activity
> > show the queries that are blocking each other?
>
> In 8.2 or later it should do so; in prior versions there could be some
> lag involved. Another thing to keep in mind is that backends like to
> cache copies of the pg_stat_activity view --- if you are watching it
> to see what is happening, your view is only current as of the start
> of your current transaction. Or you can do pgstat_clear_snapshot()
> to force collection of new info.

Thanks Tom,

I'm not clear how to run pgstat_clear_snapshot(), but I looked again
before I ran my test script, and pg_stat_activity doesn't list any
queries waiting. So, I don't believe it's showing stale data.

Then when I run the test script (which runs the same transaction in
two processes at the same time) and get a deadlock, the same query is
shown twice, both with "waiting" set true:

    UPDATE account set foo = 123 where id = $1

And if I remove that update from the transaction I no longer have the
deadlock. So, it seems like that is the problem update.

Is Postgresql telling me that it's deadlocked on two transactions
trying to run that same update? There are no other updates to that
account table in the transaction, so I'm confused how that is causing
a deadlock.

Is there something else I can do to understand what exactly is the
reason for the deadlock?

Thanks,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
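To see which backend holds the lock a waiter is stuck on, pg_locks can be self-joined on the lock identity columns (a sketch usable on 8.2/8.3, which predate the pg_blocking_pids() convenience function):

```sql
-- Sketch: pair each ungranted lock with the granted lock on the same
-- object. Each result row shows a waiting pid and the pid holding the
-- conflicting lock.
SELECT w.pid  AS waiting_pid,
       w.locktype,
       h.pid  AS holding_pid,
       h.mode AS held_mode
FROM   pg_locks w
JOIN   pg_locks h
  ON   h.granted
 AND   NOT w.granted
 AND   h.locktype       = w.locktype
 AND   h.database       IS NOT DISTINCT FROM w.database
 AND   h.relation       IS NOT DISTINCT FROM w.relation
 AND   h.page           IS NOT DISTINCT FROM w.page
 AND   h.tuple          IS NOT DISTINCT FROM w.tuple
 AND   h.transactionid  IS NOT DISTINCT FROM w.transactionid;
```

Joining the resulting pids against pg_stat_activity.procpid then shows the blocking queries themselves.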
Re: [GENERAL] Tracking down a deadlock
Not getting any nibbles, so allow me to try a short question:

If I have a deadlock situation (that will be reported as such by
Postgresql once the deadlock_timeout passes), does pg_stat_activity
show the queries that are blocking each other? I'm wondering if I'm
misinterpreting what I'm seeing below.

On Thu, Apr 30, 2009 at 10:30:26AM -0700, Bill Moseley wrote:
> I need a bit of help understanding what might be causing a deadlock.
>
> To duplicate the problem I'm running a test script that forks two
> child processes. Each child runs the same transaction and thus the
> order of execution is exactly the same. (i.e. not like the typical
> deadlock where the order of updates might be reversed between two
> sessions.)
>
> The transaction inserts a new document into a document management
> system. The transaction does a number of selects and inserts. At the
> end of the transaction they both try to update the same row in the
> "account" table.
>
> It does not happen every time I run my test script -- but if I run it
> enough I get a deadlock. If I fork more child processes I can make it
> happen more often. So, it does seem like a timing issue.
>
> No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
> I'm running in the default "read committed" isolation level.
> The initial problem was reported on PostgreSQL 8.3.5, but
> I'm now testing on PostgreSQL 8.2.9.
>
> I've set my deadlock_timeout high so I can review the locks.
> I see these entries:
>
> select * from pg_locks where not granted;
>
>    locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
> --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  transactionid |         |          |      |       |      18410123 |         |       |          |    18410135 | 13420 | ShareLock     | f
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
> (2 rows)
>
> select * from pg_locks where locktype='tuple';
>
>    locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
> --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 | ExclusiveLock | t
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
> (2 rows)
>
> And pg_stat_activity shows two of the exact same queries in "waiting"
> state. The "current_query" is just:
>
>     UPDATE account set foo = 123 where id = $1
>
> and $1 is indeed the same for both.
>
> If I comment out that update to the "account" table from the
> transaction I never get a deadlock.
>
> Maybe I'm missing something, but that by itself doesn't seem like a
> deadlock situation.
>
> The "account" table does have a number of constraints, and one looks
> like:
>
>     CHECK( ( foo + bar ) <= 0 );
>
> Could those be responsible? For a test I dropped all the constraints
> (except foreign keys) and I'm still getting a deadlock.
>
> In general, do the constraints need to be deferrable and then defer
> constraints at the start of the transaction?
>
> What else can I do to debug?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
[GENERAL] Tracking down a deadlock
I need a bit of help understanding what might be causing a deadlock.

To duplicate the problem I'm running a test script that forks two
child processes. Each child runs the same transaction and thus the
order of execution is exactly the same. (i.e. not like the typical
deadlock where the order of updates might be reversed between two
sessions.)

The transaction inserts a new document into a document management
system. The transaction does a number of selects and inserts. At the
end of the transaction they both try to update the same row in the
"account" table.

It does not happen every time I run my test script -- but if I run it
enough I get a deadlock. If I fork more child processes I can make it
happen more often. So, it does seem like a timing issue.

No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
I'm running in the default "read committed" isolation level.
The initial problem was reported on PostgreSQL 8.3.5, but
I'm now testing on PostgreSQL 8.2.9.

I've set my deadlock_timeout high so I can review the locks.
I see these entries:

    select * from pg_locks where not granted;

       locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
    --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
     transactionid |         |          |      |       |      18410123 |         |       |          |    18410135 | 13420 | ShareLock     | f
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
    (2 rows)

    select * from pg_locks where locktype='tuple';

       locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
    --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 | ExclusiveLock | t
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
    (2 rows)

And pg_stat_activity shows two of the exact same queries in "waiting"
state. The "current_query" is just:

    UPDATE account set foo = 123 where id = $1

and $1 is indeed the same for both.

If I comment out that update to the "account" table from the
transaction I never get a deadlock.

Maybe I'm missing something, but that by itself doesn't seem like a
deadlock situation.

The "account" table does have a number of constraints, and one looks
like:

    CHECK( ( foo + bar ) <= 0 );

Could those be responsible? For a test I dropped all the constraints
(except foreign keys) and I'm still getting a deadlock.

In general, do the constraints need to be deferrable and then defer
constraints at the start of the transaction?

What else can I do to debug?

Thanks,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
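For reference, identical transactions can in fact deadlock through a foreign key on 8.x, because an FK check takes a row-level share lock on the referenced row. A sketch of the interleaving (names are illustrative; "log" stands for any table with an FK to "account"):

```sql
-- Behavior of PostgreSQL 8.x: an INSERT into a referencing table runs
-- an implicit SELECT ... FOR SHARE on the referenced account row.
--
-- Session A: INSERT INTO log (account_id) VALUES (1);
--            -- FK check: share lock on account row 1 (granted)
-- Session B: INSERT INTO log (account_id) VALUES (1);
--            -- share lock on the same row (granted; share locks coexist)
-- Session A: UPDATE account SET foo = 123 WHERE id = 1;
--            -- needs exclusive lock; waits on B's share lock
-- Session B: UPDATE account SET foo = 123 WHERE id = 1;
--            -- needs exclusive lock; waits on A -> deadlock detected
```

So the statement order is the same in both sessions, yet each session holds a share lock the other must wait for before upgrading to an exclusive lock.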
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On Tue, Sep 18, 2007 at 11:26:17AM +0300, Marko Kreen wrote:
> Few bytes being same is normal. Those are PGP packet header,
> telling "this is symmetrically encrypted session key packet,
> with length X" plus some more details.

I see. So, you are saying there's no need to generate my own IV to
prepend to the plain text before encrypting.

> If more than a few bytes are same, and if the salt is not randomised,
> it _could_ be a sign of a problem. Either a pgcrypto bug or failure
> to initialize the random generator. If you suspect a problem, please
> send me a few example encryptions with keys and your setup details
> (postgres version, openssl or not, os version)

No, it was only a few bytes that were similar, so the headers explain
that.

> > Besides following the PCI DSS and external audit procedures, the plan
> > is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a
> > credit card storage server. The server and db are SSL only and the
> > key is passed from the application and never stored anyplace (except
> > in memcached on other servers during the session). The key is a
> > user's plain text password plus an application-specific secret. So,
> > each row has its own key. Passwords must be changed periodically,
> > etc.
>
> I don't know details of your setup, but I strongly suggest you
> look into using public-key crypto. That allows you separate keys
> for encryption and decryption. So in a webserver where users only
> input credit cards, you keep only public keys, so anybody cracking
> that won't be able to decrypt data.

I need to look at that more. But I've seen that suggested where one
needs to decrypt the data at a later time. We don't have that need.
Our plan was to never store any keys.

Every user must log in to the application with a password. Their
account passwords are only stored hashed on disk, so we don't know
their passwords. The plan is to encrypt their plain-text password with
a secret known only by the application and store that in memcached.
It's this plain-text password that will be sent to a separate server
to encrypt (and decrypt) their credit card data when the user makes a
transaction.

We only need to store the credit card data to allow subsequent charges
to their "card on file" -- and that only happens when a user logs in
and processes a transaction. We don't have any way to decrypt the data
without this password stored in the session.

If someone hacks an application server they could pluck active users'
passwords from memcached and also find the application's secret word.
Then if they also hacked the credit card server they could decrypt the
data using the passwords they were able to sniff.

See any glaring holes?

Thanks for the help!

-- 
Bill Moseley
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
I'm just starting with pgcrypto, and I'm curious if it's
needed/recommended to use an initialization vector/value (IV) with the
pgp_sym_encrypt() function.

The docs hint that an IV is used automatically, but encrypting plain
text that starts the same seems to result in initial common cipher
text. So, I'm not clear. The docs say:

    2. Data is prefixed with block of random bytes. This is equal to
       using random IV.

So, I'm currently generating a substring of an md5 hash of a few items
and prepending that to the plain text I need to encrypt as the IV.
Then when I decrypt I remove that prefix.

BTW, this is for credit card storage, which is a business requirement.
Besides following the PCI DSS and external audit procedures, the plan
is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a
credit card storage server. The server and db are SSL only and the key
is passed from the application and never stored anyplace (except in
memcached on other servers during the session). The key is a user's
plain text password plus an application-specific secret. So, each row
has its own key. Passwords must be changed periodically, etc.

I'd welcome any comments or recommendations from others that have
implemented something similar.

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
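The behavior the docs describe can be checked directly: pgp_sym_encrypt() generates a random session key and random data prefix internally, so no hand-rolled IV is needed. A quick sketch (requires the pgcrypto contrib module; values are illustrative):

```sql
-- Encrypting the same plaintext twice yields different ciphertext;
-- only the few PGP packet-header bytes match.
SELECT pgp_sym_encrypt('4111111111111111', 'secret-key',
                       'cipher-algo=aes256') AS ct1,
       pgp_sym_encrypt('4111111111111111', 'secret-key',
                       'cipher-algo=aes256') AS ct2;

-- Round trip: decryption needs only the passphrase, since the session
-- key and random prefix travel inside the PGP message itself.
SELECT pgp_sym_decrypt(
           pgp_sym_encrypt('4111111111111111', 'secret-key',
                           'cipher-algo=aes256'),
           'secret-key');
```

This also means the manual prepend-an-md5-prefix scheme adds no security and can simply be dropped.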
[GENERAL] Schema sanity check
I have an email client application where a person can have many
mailboxes. Thus, I have "person" and "mailbox" tables:

    create table person (
        id      SERIAL PRIMARY KEY,
        name    text
    );

    create table mailbox (
        id      SERIAL PRIMARY KEY,
        name    text,
        owner   integer NOT NULL REFERENCES person ON DELETE CASCADE,
        UNIQUE ( owner, id )  -- see below
    );

Now, a person might like to have a default mailbox that opens up when
they start the application. So I add a column to the person table:

    ALTER TABLE person ADD COLUMN default_mailbox integer
        REFERENCES mailbox ON DELETE SET NULL;

Of course, I want to make sure that the person actually owns that
mailbox, so I add a constraint (which is why the UNIQUE is required
above):

    ALTER TABLE person ADD CONSTRAINT default_mailbox_owner
        FOREIGN KEY (id, default_mailbox) REFERENCES mailbox(owner, id);

Is this a sane way to set up a "default mailbox"? The other option is
to have a column on the mailbox table to flag that it is a
default_mailbox -- but then I'd have to ensure there's only one row
for each "person" flagged that way.

Two more related questions:

First, if I delete a default mailbox, the default_mailbox column will
be set to NULL. If instead I never delete a mailbox but rather add a
boolean column "deleted", ON DELETE is no longer any help. Is my only
option to use a trigger to set to NULL any default_mailbox column(s)
that reference the mailbox when it is set "deleted"?

Second question: after a while the obvious problem happens and users
have too many mailboxes, and they want a way to group them into
"mailbox_groups" that are containers for mailboxes. So, we create a
new table and alter the mailbox table.
Each user has their own set of mailbox groups, so I include an "owner"
column:

    create table mailbox_group (
        id      SERIAL PRIMARY KEY,
        name    text,
        owner   integer NOT NULL REFERENCES person ON DELETE CASCADE
    );

    ALTER TABLE mailbox ADD COLUMN mailbox_group int NOT NULL
        REFERENCES mailbox_group(id);

Now, I'm wondering about the sanity of the design, since this results
in "owner" columns on both the mailbox and mailbox_group tables. Do I
add a constraint to make sure that mailbox.mailbox_group references a
group that has a matching owner? Or do I remove the "owner" column
from the mailbox table and alter all my access to mailbox to do a join
with the mailbox_group table (to find the owner)?

(Or do I wonder why I didn't expose the database only through views in
the first place?)

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
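For what it's worth, the same UNIQUE(owner, id) trick already used for default_mailbox can enforce the matching-owner rule declaratively (a sketch against the tables above; constraint names are illustrative):

```sql
-- Let (owner, id) on mailbox_group be a composite FK target.
ALTER TABLE mailbox_group ADD UNIQUE ( owner, id );

-- A mailbox may only point at a group with the same owner.
ALTER TABLE mailbox
    ADD CONSTRAINT mailbox_group_same_owner
    FOREIGN KEY ( owner, mailbox_group )
    REFERENCES mailbox_group ( owner, id );
```

This keeps the redundant "owner" column on mailbox but makes the redundancy impossible to get out of sync.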
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 09:20:53PM +0200, PFC wrote:
> > From an outside perspective it just seems odd that potentially a
> > large amount of data would be pulled off disk into memory that is
> > never used. Perhaps there's an overriding reason for this.
>
> Yeah, where would you put this data if you didn't put it where it is
> now?

Swish-e isn't a database by any means, but it does have a way to store
column-like meta data for each "row". When it does a search it only
explicitly pulls from disk the meta data that it's asked to return.
Granted, the OS is reading from disk more than the application is
asking for, but the application is only allocating memory for the data
it's going to return. And the column (meta data) is not always stored
together on disk.

Without knowing Pg internals I wasn't aware of how the actual table
data was organized and fetched into memory.

> "Premature optimization is the root of all evil"

Exactly what prompted this thread. ;)

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 01:42:18PM -0400, Chris Browne wrote:
> [EMAIL PROTECTED] (PFC) writes:
> >> SELECT o.id
> >> FROM order o
> >> JOIN customer c on o.customer = c.id
> >>
> >> Does that bring into memory all columns from both order and customer?
> >> Maybe that's not a good example due to indexes.
> >
> > No, it just pulls the columns you ask from the table, nothing
> > less, nothing more.
>
> That's not quite 100% accurate.
>
> In order to construct the join, the entire pages of the relevant
> tuples in tables "order" and "customer" will need to be drawn into
> memory.
>
> Thus, if there are a whole bunch of columns on each table, the data in
> those extra columns (e.g. - all columns aside from "id", the one that
> was asked for in the result set) will indeed be drawn into memory.

Is that specific to Postgresql? From an outside perspective it just
seems odd that potentially a large amount of data would be pulled off
disk into memory that is never used. Perhaps there's an overriding
reason for this.

> If you alter tables "customer" and "order", taking some columns off,
> and stowing them in separate tables, then you'll find that more tuples
> of "customer" and "order" will fit into a buffer page, and that the
> join will be assembled with somewhat less memory usage.
>
> Whether or not that is a worthwhile change to make will vary
> considerably.

Makes designing the schema a bit tough. ;)

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 07:51:44AM +0200, Dawid Kuroczko wrote:
> On 5/15/07, Bill Moseley <[EMAIL PROTECTED]> wrote:
> > On Tue, May 15, 2007 at 06:33:26AM +0200, Dawid Kuroczko wrote:
> >> Well, views are not going to help with memory consumption here.
> >> It is the table contents that gets cached in buffer cache, not the
> >> views contents. So if you have a view which returns only one
> >> column from a 15-column table, you will be caching that 15-column
> >> data nonetheless. View, as the name states, is converted into
> >> a select on a real table.
> >
> > Are you saying that in Postgresql:
> >
> >     select first_name, last_name from user_table;
> >
> > uses the same memory as this?
> >
> >     select first_name, last_name,
> >         password, email,
> >         [10 other columns]
> >     from user_table;
>
> Yes. You read a whole page (8KB) into the buffer cache,
> then extract these columns from that buffer. From the
> buffer cache point of view, the whole tuple is contained in the
> cache.

Sorry, I don't mean to drag this thread out much longer. But, I have
one more question regarding joins.

Say I have a customer table and an order table. I want a list of all
order ids for a given customer:

    SELECT o.id
    FROM order o
    JOIN customer c on o.customer = c.id

Does that bring into memory all columns from both order and customer?
Maybe that's not a good example due to indexes.

See, I've seen this splitting of one-to-one tables a number of times
(such as the user and user_preferences example) and I'm not sure if
that's just poor schema design, premature optimization, or someone
making smart use of their knowledge of the internal workings of
Postgresql.

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
Can anyone provide input on this question? I'm curious how to look at
this from a disk and memory usage perspective.

Would using a bit column type help much? I'm not thrilled by the loss
of referential integrity.

On Mon, May 14, 2007 at 01:37:18PM -0700, Bill Moseley wrote:
> Say I have a table "color" that has about 20 different rows ("red",
> "blue", "green", etc.). I want the user to be able to select zero or
> more favorite colors. I would typically use a link table:
>
>     create table favorite_colors (
>         color   int references color(id),
>         user    int references user(id)
>     );
>
> Now, that table can have a large number of rows if I have a large
> number of users and if everyone likes all the colors.
>
> For some value of "large", is there a time when one might consider
> using a single column in the user or user_prefs table to represent
> their color choices instead of a link table?
>
>     table user_prefs (
>         ...
>         favorite_colors bit varying,
>         ...
>     );
>
> Where each bit represents the primary key of the colors table.
>
> Seems like poor design, but I'm wondering if there might be overriding
> concerns at times.
>
> For example, if I have 1 million users and they each like all colors
> and thus have a 20 million row link table, how much space would be
> saved by using a bit column as above?

-- 
Bill Moseley
[EMAIL PROTECTED]
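A back-of-envelope sketch of the space question (not from the thread; the per-tuple overhead figure is approximate for 8.x-era PostgreSQL):

```sql
-- Link table: 20M rows x (~28 bytes tuple header + 8 bytes for two
-- int columns) is roughly 700 MB of heap, before indexes.
-- Bit column: 1M user rows each gain a ~20-bit varbit value -- a few
-- bytes of varbit overhead plus 3 data bytes per row.
--
-- An individual value's on-disk size can be measured directly:
SELECT pg_column_size(B'10000000000000000001'::bit varying);
```

So the bit column wins dramatically on space in the everyone-likes-everything worst case, at the cost of referential integrity and easy per-color queries, which is the trade-off being weighed above.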
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 06:33:26AM +0200, Dawid Kuroczko wrote:
> Well, views are not going to help with memory consumption here.
> It is the table contents that gets cached in buffer cache, not the
> views contents. So if you have a view which returns only one
> column from a 15-column table, you will be caching that 15-column
> data nonetheless. View, as the name states, is converted into
> a select on a real table.

Are you saying that in Postgresql:

    select first_name, last_name from user_table;

uses the same memory as this?

    select first_name, last_name,
        password, email,
        [10 other columns]
    from user_table;

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Mon, May 14, 2007 at 10:52:13PM +0200, PFC wrote:
> > Say I have a "user" table that has first, last, email, password, and
> > last_accessed columns. This user table will be accessed often. (It's
> > not really "user", but that's not important in this discussion)
> >
> > Say that there's also about 10 columns of settings or preferences for
> > each user. Are there any cases or reasons to have a separate
> > "user_preferences" table vs. just placing all the columns together in
> > one table?
>
> I did something like that on MySQL some time ago.
>
> In the Users table there was stuff that other users need to see
> (like his login name, etc), and stuff that only this user needs to
> see (like his preferences).
>
> So, when displaying posts in the forum, for instance, only a small
> part of the fields in the Users table was needed; the rest was just
> dead weight that made the table unable to fit in RAM.

Well, that's part of my question. If the common selects don't select
those columns, how much "dead weight" is brought along due to the
extra columns defined in the table, if any?

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] Performance issues of one vs. two split tables.
Is there any benefit to splitting up a table into two tables that will
always have a one-to-one relationship?

Say I have a "user" table that has first, last, email, password, and
last_accessed columns. This user table will be accessed often. (It's
not really "user", but that's not important in this discussion.)

Say that there's also about 10 columns of settings or preferences for
each user. Are there any cases or reasons to have a separate
"user_preferences" table vs. just placing all the columns together in
one table?

Another related question:

Say I have a table "color" that has about 20 different rows ("red",
"blue", "green", etc.). I want the user to be able to select zero or
more favorite colors. I would typically use a link table:

    create table favorite_colors (
        color   int references color(id),
        user    int references user(id)
    );

Now, that table can have a large number of rows if I have a large
number of users and if everyone likes all the colors.

For some value of "large", is there a time when one might consider
using a single column in the user or user_prefs table to represent
their color choices instead of a link table?

    table user_prefs (
        ...
        favorite_colors bit varying,
        ...
    );

Where each bit represents the primary key of the colors table.

Seems like poor design, but I'm wondering if there might be overriding
concerns at times. For example, if I have 1 million users and they
each like all colors and thus have a 20 million row link table, how
much space would be saved by using a bit column as above?

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Trigger for Audit Table
On Fri, Mar 09, 2007 at 01:27:51PM -0800, [EMAIL PROTECTED] wrote:
> You can/should create it as an AFTER UPDATE trigger. The OLD row will
> contain the previous values.

Curiously, it also works with a BEFORE UPDATE. Off to review the docs.

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Trigger for Audit Table
On Fri, Mar 09, 2007 at 06:50:39PM -0500, Tom Lane wrote:
> Bill Moseley <[EMAIL PROTECTED]> writes:
> > I'm asking for a sanity check:
>
> > And then an audit table:
>
> >     create table template_history (
> >         id                  SERIAL PRIMARY KEY,
> >         template_id         integer NOT NULL REFERENCES template ON DELETE CASCADE,
> >         path                text NOT NULL,
> >         content             text NOT NULL,
> >         last_updated_time   timestamp(0) with time zone NOT NULL
> >     );
>
> Why would you want ON DELETE CASCADE? Or for that matter to have a
> foreign key here at all? Surely the point of an audit table is to
> remember history. If the audit entries all disappear the instant
> the main-table entry is deleted, it's not much of an audit tool.

In this case the templates are short-lived, so I only want to track
history while the "live" record is around. That is, it's expected that
the template and all its history will get wiped out once in a while.
Still, I agree with you that it would be better to use a different
approach and not cascade delete.

The foreign key is there so I can find the history related to the
primary record. That's what ties the history records together (the
path can change during the life of the template).

> > My trigger is very simple:
>
> >     CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
> >     BEGIN
> >         INSERT INTO template_history
> >             ( template_id, path, content, last_updated_time, person )
> >         select
> >             id, path, content, last_updated_time, person
> >         from
> >             template where id = 1;
> >         RETURN NEW;
> >     END'
> >     language 'plpgsql';
>
> This is not going to work because the row's not there yet.

This is a BEFORE *UPDATE* trigger, not a BEFORE INSERT, so the row is
there. The audit table is written when the primary record changes, and
the old version is written to the audit table, not the new version.
Yes, it's more common to write the audit row for every insert and
update (so the most recent version is also in the audit table).

> (I won't bother pointing out the thinko in the WHERE clause);

Darn cut-n-paste errors.

> and even if it did
> work it'd be unnecessarily inefficient. Just use the NEW row that's
> passed to the trigger:
>
>     INSERT INTO template_history(...) VALUES(NEW.id, NEW.path, ...)

Ok, but the id is a sequence. I need to test if NEW.id is set after
the insert -- seems like not, IIRC, and I'd need to use currval().

> If you have other BEFORE triggers on this table that can change the
> NEW row, then it might be better to make this an AFTER trigger so it can
> be sure the NEW row it sees won't change anymore. But AFTER triggers
> are distinctly less efficient, so if you're not intending to add more
> triggers then using a BEFORE trigger is probably the way to go.

It's just that pesky sequence I need access to after the insert
happens.

Thanks for the tips, Tom.

-- 
Bill Moseley
[EMAIL PROTECTED]
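Since the goal here is saving the pre-update version of the row, the OLD record (available in any row-level UPDATE trigger) sidesteps both the re-select and its WHERE-clause thinko. A sketch based on the tables in this thread (the "person" column is assumed from the trigger above, as it isn't in the posted DDL):

```sql
CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
BEGIN
    -- Save the row as it was before this UPDATE. OLD is already in
    -- memory, so no extra SELECT is needed, and template_history.id
    -- fills itself from its own sequence.
    INSERT INTO template_history
        ( template_id, path, content, last_updated_time, person )
    VALUES
        ( OLD.id, OLD.path, OLD.content, OLD.last_updated_time, OLD.person );
    RETURN NEW;
END' LANGUAGE 'plpgsql';

CREATE TRIGGER template_history_add
    BEFORE UPDATE ON template
    FOR EACH ROW EXECUTE PROCEDURE audit_template();
```

Using OLD rather than NEW keeps the "archive the previous version" semantics while still following Tom's advice to use the trigger's row variables instead of re-querying the table.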
[GENERAL] Trigger for Audit Table
I'm asking for a sanity check: This is a very simple audit table setup where I use a BEFORE UPDATE trigger to save an existing record. The table stores templates (for a CMS) and looks something like this:

    create table template (
        id                  SERIAL PRIMARY KEY,
        path                text UNIQUE NOT NULL,
        content             text NOT NULL,
        last_updated_time   timestamp(0) with time zone NOT NULL default now()
    );

And then an audit table:

    create table template_history (
        id                  SERIAL PRIMARY KEY,
        template_id         integer NOT NULL REFERENCES template ON DELETE CASCADE,
        path                text NOT NULL,
        content             text NOT NULL,
        last_updated_time   timestamp(0) with time zone NOT NULL
    );

(The "path" is not the primary key because the template's path might get renamed (moved), but I still want to track its history.)

My trigger is very simple:

    CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
        BEGIN
            INSERT INTO template_history
                ( template_id, path, content, last_updated_time, person )
            select
                id, path, content, last_updated_time, person
            from
                template where id = 1;
            RETURN NEW;
        END'
    language 'plpgsql';

    CREATE TRIGGER template_history_add BEFORE UPDATE ON template
        for each row execute procedure audit_template();

I realize this is a *BEFORE* UPDATE trigger, but I have this vague memory of seeing a post stating that you can't be sure the existing row has not been updated yet. Perhaps that was just a concern if another trigger was to modify the row. But, I can't seem to find that post now, which is why I'm asking for the sanity check.

Are there potential problems with this setup?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
On Mon, Feb 12, 2007 at 10:53:53AM -0500, Merlin Moncure wrote:
> On 2/12/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
> >> Can you describe in a little bit more detail what you mean by
> >> 'Adjacency List'?
> >
> > Adjacency list is the term used in the Celko book to refer to a table that
> > is recursively related to itself.
> >
> >     create table foo (
> >         id       integer primary key,
> >         parentid integer references foo (id),
> >         name     varchar not null
> >     );
>
> Above approach is ok but I can think of at least two other methods
> that are probably better.  First approach is to just store the whole
> path in every record for each file.  Yes, this is a pain for updates
> but searching and children discovery is simple.  In that case I would
> define the pkey as (path, file).

Yes, that's what I meant by using a de-normalized table -- including the full path in the row. That would provide fast access to each row via a path name. And the parent id makes it easy to find all children of a given node and, well, the parent too. Separating the path and file as you suggest would make finding all "files" at a given directory level simple, too.

But, I'm not thrilled about the possibility of the hard-coded path not matching the path up the tree to the root node. Which, of course, is why I posted. But, I'll give it a test.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
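A minimal sketch of the approach being discussed -- an adjacency list plus a de-normalized path column (table and column names here are illustrative, not from the thread):

```sql
create table node (
    id        serial primary key,
    parent_id integer references node (id),  -- NULL for the root
    name      varchar not null,
    path      varchar not null unique        -- de-normalized full path, e.g. '/docs/intro'
);

-- Fast lookup by full path:
--   select * from node where path = '/docs/intro';
-- "Directory view": all children of a given node:
--   select * from node where parent_id = 42;
```

The trade-off raised above applies directly: when a branch is moved, every descendant's path column must be rewritten to stay consistent with the parent_id chain.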
Re: [GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
On Mon, Feb 12, 2007 at 05:36:37PM +0000, Ian Harding wrote:
> You don't mention the ltree contrib module, have you looked at it?  It
> can easily meet your requirements without having to reinvent anything.
> It may be what you're referring to as Nested Sets, I don't know.  I
> use it and like it a lot.

Yes, I have seen it. I just thought it seemed like a very large "hammer" to use for my task -- quite a few more query methods than I need. But, perhaps I should look at it again and get a better understanding of what it can do.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
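For reference, a small sketch of what the ltree contrib module mentioned above looks like in use (assuming the module is installed; the table is illustrative, the operators are from the ltree documentation):

```sql
create table doc (
    id   serial primary key,
    path ltree not null          -- dotted labels, e.g. 'top.docs.intro'
);

-- All descendants of top.docs (<@ means "is a descendant of or equal to"):
select * from doc where path <@ 'top.docs';

-- Immediate children only, via an lquery pattern (exactly one extra label):
select * from doc where path ~ 'top.docs.*{1}';
```

Note that ltree paths use dot-separated labels rather than slash-separated file-system paths, so file names would need to be mapped to valid labels.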
[GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
I'm looking for a little guidance in representing a file system -- well, just the file and directory structure of a file system.

Often articles on representing a hierarchy discuss the advantages of using Nested Sets (or nested intervals), it seems. I'm not clear how well they apply to a file system-like hierarchy, though. The examples (and my limited understanding) of Nested Sets have the leaf nodes at the end of the branches, where in a file system a node can have both leaf nodes (files) and branches (directories). Also, the Nested Sets seem to solve problems I don't have -- such as finding all descendants of a given node.

My simple requirements are:

    -- Quickly be able to look up content by a full "path" name.

    -- Provide "directory" views that show the parent and a list of
       contents including any "sub-directories".

    -- Be able to easily move branches.

It will not be a large collection of "files" in the tree, so that's not an issue.

Seems like an Adjacency List along with a de-normalized "path" column in the leaf nodes would meet the requirements. But, as I see nested sets discussed so often, I wonder which is a better approach. I assume this is a reasonably common problem, so I'm curious how others have implemented it.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Missing domain socket after reboot.
On Thu, Jun 22, 2006 at 08:16:05AM -0400, Douglas McNaught wrote:
> Bill Moseley <[EMAIL PROTECTED]> writes:
>
> > Hum.  lsof knows about the file.
> >
> > $ lsof -p 1512 | grep /var/run
> > postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432
> >
> > Any ideas what happened to the socket?
>
> Maybe something in your bootup process tried to clean up /var/run and
> deleted it after the postmaster had started?

That's what I thought, but my quick look couldn't find anything in the init scripts, not that that's conclusive:

    $ fgrep /var/run * | grep rm
    apache2:      [ -f /var/run/apache2/ssl_scache ] && rm -f /var/run/apache2/*ssl_scache*
    bootclean.sh: rm -f /var/run/.clean
    bootmisc.sh:  rm -f /tmp/.clean /var/run/.clean /var/lock/.clean
    portmap:      rm -f /var/run/portmap.upgrade-state
    portmap:      rm -f /var/run/portmap.state
    rsync:        rm -f /var/run/rsync.pid
    rsync:        rm -f /var/run/rsync.pid
    rsync:        rm -f /var/run/rsync.pid
    umountnfs.sh: rm -f /tmp/.clean /var/lock/.clean /var/run/.clean

But maybe postgresql is started too early.

    $ ls /etc/rc?.d | grep postgres | head -1
    K20postgresql-8.1
    K20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    K20postgresql-8.1

Apache, for example, starts S91.

    /etc/rc2.d:
    K10atdS20courier-imap S20mysqld-helper S21nfs-common K10cron
    S20courier-imap-ssl S20netatalk S21quotarpc K10syslog-ng S20courier-mta
    S20nfs-kernel-server S23ntp-server S10sysklogd S20courier-pop S20ntop
    S25mdadm S11klogd S20courier-pop-ssl S20oidentdS30sysctl S14pppS20darwinss
    S20postfixS89cron S15logicalS20exim4 S20postgresql-8.1 S91apache2
    S16mountnfsforlogical.sh S20grlogcheckS20rmnologin S91ifp_httpd
    S18atdS20httpd S20rsync S99jabber S18portmapS20httpd2S20saslauthd
    S99stop-bootlogd S19spamassassin S20inetd S20sshS99ud S19syslog-ng
    S20jabberS20syslog-ng S20binfmt-support S20makedev S20sysstat
    S20courier-authdaemon S20mysqldS20xmail

> Be interesting to see if you can reproduce it...

Next reboot I'll look again.
It's a production machine, so I can't really bring it up one service at a time.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
[GENERAL] Missing domain socket after reboot.
After a reboot today Postgresql 8.1 came back up and started accepting connections over TCP, but the unix socket file was missing. This is on Debian Stable, and I can't imagine what might have removed the file.

Running psql I get:

    $ psql test
    psql: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Yep, missing:

    $ ls -la /var/run/postgresql
    total 8
    drwxrwsr-x   2 postgres postgres 4096 2006-06-21 17:03 .
    drwxr-xr-x  16 root     root     4096 2006-06-21 21:10 ..

Config looks ok:

    /etc/postgresql/8.1/main$ fgrep unix_socket_dir postgresql.conf
    unix_socket_directory = '/var/run/postgresql'

Startup option:

    $ ps ux -u postgres | grep unix_socket
    postgres  1512  0.0  0.3 17564 3476 ?  S  17:02  0:00
        /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main
        -c unix_socket_directory=/var/run/postgresql
        -c config_file=/etc/postgresql/8.1/main/postgresql.conf
        -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf
        -c ident_file=/etc/postgresql/8.1/main/pg_ident.conf

Hum. lsof knows about the file.

    $ lsof -p 1512 | grep /var/run
    postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432

Any ideas what happened to the socket? I had to stop and start the postmaster to get the socket back.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Anyone install 8.1 on Debian Stable?
On Tue, Apr 25, 2006 at 06:58:20PM +0100, Gavin Hamill wrote:
> As per the instructions on the site - you use both :)

(I meant: do you pin the specific package, or use the -t option?)

> psql for both 7.x and 8.1 will use version 3 of the libpq API, so
> there's no issue like with mysql 4.0 versus 4.1's new auth system.

I see. I'll build DBD::Pg from source. The dbdpg-general list recommended linking DBD::Pg against libpq4 instead of libpq3.

> > And I assume I'll need to rebuild DBD::Pg -- and any tricks getting
>
> No rebuilding should be necessary for the same reasons above -
> however if you do need to build something, grab
> postgresql-server-dev-8.1 :)

Great. Thanks for all the hand holding. ;)

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Anyone install 8.1 on Debian Stable?
On Tue, Apr 25, 2006 at 04:57:20PM +0100, Gavin Hamill wrote: > Very simple www.backports.org :) > > They have 8.1.3 and it works perfectly.. even the -contrib package is > there for cube/earthdistance, and the -dev package is there if you want > to compile Slony, etc. > > Follow http://www.backports.org/instructions.html for > /etc/apt/preferences, and then off you go... Ok. So as someone that tried to understand pinning once and gave up, do you pin the packages or use: apt-get -t sarge-backports install postgresql method? I guess it's in the docs, but how do you use psql with both versions? And I assume I'll need to rebuild DBD::Pg -- and any tricks getting it to link with the correct client library? It's the "managed" part of the Dreamhost server that has me worried. ;) Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Anyone install 8.1 on Debian Stable?
Anyone installed 8.1 on Stable? Did you build from source or use a backport? I've got a managed dedicated server at Dreamhost and trying to decide if building from source or using a backport is a better approach. I've had problems in the past with using backports on other servers (problems showed up when later upgrading the server). And I'm a bit concerned about a package conflicting with Dreamhost's management setup. (I had 7.4 installed and after some maintenance and a reboot the package was uninstalled.) And IIRC, the backports tend to bring in a number of dependency packages. If I build from source I need to update /etc/ld.so.conf to point to /usr/local (for linking with the driver) and install my own init.d scripts -- both of which I worry about in the managed environment. Plus, the socket, logs, pid are all not in the standard debian locations. So, anyone that has been through this have any advice? There isn't an init.d script in the distribution, right? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] plpgsql and transactions
On Thu, Apr 06, 2006 at 03:48:15PM -0500, Terry Lee Tucker wrote:
> > -- fires a trigger that updates more than one table
> > insert into something (default);
> >
> > and:
> >
> > begin;
> > -- fires a trigger that updates more than one table
> > insert into something (default);
> > commit;
>
> In the latter, you have expanded the scope of the transaction; which,
> sometimes you might want to do.

Yes, I might. But I'd like to understand it so I know when I might or might not want to do it.

Say I have a BEFORE INSERT trigger that does a table lock. When is that lock released? At the end of the trigger? Or after the INSERT has completed?

For example, say I want to set a column on the row I'm inserting based on what's already in the table. So I lock the table in the trigger, check the current status of the table, and set the column based on that current status. I want to make sure that between the time the trigger completes and when the insert finally happens, another session can't also do an insert and see the same table state.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
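A sketch of the scenario described above (table and column names are illustrative). In PostgreSQL a table-level lock taken inside a trigger is held until the end of the enclosing transaction -- not the end of the trigger or of the statement -- so the gap the author worries about is covered for as long as the transaction remains open:

```sql
CREATE OR REPLACE FUNCTION set_position() RETURNS TRIGGER AS '
    BEGIN
        -- This lock is held until the surrounding transaction
        -- commits or rolls back, not just until the trigger returns.
        LOCK TABLE queue IN SHARE ROW EXCLUSIVE MODE;

        -- Derive the new column value from the table''s current state.
        SELECT coalesce(max(position), 0) + 1 INTO NEW.position FROM queue;
        RETURN NEW;
    END'
language 'plpgsql';

CREATE TRIGGER queue_position BEFORE INSERT ON queue
    FOR EACH ROW EXECUTE PROCEDURE set_position();
```

With a single-statement INSERT (its own implicit transaction), the lock therefore lasts through the INSERT itself; inside an explicit BEGIN/COMMIT it lasts until the COMMIT.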
Re: [GENERAL] plpgsql and transactions
On Thu, Apr 06, 2006 at 01:44:57PM -0500, Terry Lee Tucker wrote:
> Triggers fire inside a transaction.

Ah, thanks. Makes sense, since each statement is in an implicit transaction. Granted, it would help to see the trigger, but these are basically the same?

    -- fires a trigger that updates more than one table
    insert into something (default);

and:

    begin;
    -- fires a trigger that updates more than one table
    insert into something (default);
    commit;

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[GENERAL] plpgsql and transactions
In a BEFORE INSERT trigger, depending on input values, I need to lock a table and do a few selects. Of course, the "lock table" isn't much use if not currently in a transaction. So my question is this: can I tell if I'm inside a transaction or not and issue a BEGIN if not. And then also set a flag so that after the INSERT I can detect that I issued a BEGIN and do a COMMIT? Or, maybe better is to just throw an exception if not already inside a transaction. BTW -- it seems odd to me that you can issue a lock table outside of an explicit begin/commit and not get a warning. When would issuing a lock table outside an explicit transaction be of any use? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Table locks and serializable transactions.
I need to insert a row, but how that row is inserted depends on the number of items existing in the table. I initially thought SERIALIZABLE would help, but that only keeps me from seeing changes until the commit in that session. Am I correct that if I need to insert a row into a table that contains column info based on the state of the table I need to lock the table in "share row exclusive mode"? In my case I have a table that holds registrations, and a registration has a column "status" that can be "confirmed", "wait list", or "cancel". Any inserts should be "wait list" if the number of existing "confirmed" is > $max_confirmed OR if any rows are marked "wait list". Obviously, I don't want to let another insert happen in another session between the select and insert. So, in that case is "share row exclusive mode" the way to go? I'm not that clear how locking and serializable work together: The serializable isolation level would only be needed if I wanted to see a frozen view of other selects (on other tables) during the transaction. That is, the locked table can't have updates in other sessions due to the lock so I'll see a frozen view of that table regardless of serializable. In other words, using serializable doesn't add anything if the table is already locked in the transaction and all I'm looking at is that one locked table. Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
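A sketch of the lock-then-check pattern described above (table name follows the post; the column values and the count checks are illustrative application logic). SHARE ROW EXCLUSIVE conflicts with itself, so two sessions cannot both be between the SELECT and the INSERT at the same time:

```sql
BEGIN;
LOCK TABLE registration IN SHARE ROW EXCLUSIVE MODE;

-- Examine the table state under the lock; no other session can
-- insert until this transaction commits.
SELECT count(*) FROM registration WHERE status = 'confirmed';
SELECT count(*) FROM registration WHERE status = 'wait list';

-- Application logic: if confirmed >= max_confirmed, or any rows are
-- wait-listed, insert as 'wait list'; otherwise as 'confirmed'.
INSERT INTO registration (person, status) VALUES (42, 'confirmed');

COMMIT;
```

This matches the reasoning in the post: once the table is locked for the transaction, SERIALIZABLE adds nothing for reads of that one table, since no other session can change it before the commit.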
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 09:31:27PM -0500, Greg Stark wrote: > Anything else and the collation just won't work properly. It will be > expecting UTF-8 and be fed ISO-8859-1 strings, resulting in weird > and sometimes inconsistent sort orders. So if I have utf8 encoded text and the lc_collate is anything but utf8 then sorting will be all wrong for any chars that don't map to ASCII (>127). Kind of a mess. > There's a certain amount of feeling that using any locale other than C is > probably not ever the right thing given the current functionality. Just about > any database has some strings in it that are really just ascii strings like > char(1) primary keys and other internal database strings. You may not want > them being subject to the locale's collation for comparison purposes and you > may not want the overhead of variable width character encodings. Is the Holy Grail encoding and lc_collate settings per column? Changing topics, but I'm going to play with different cluster settings for collate. If I create a cluster in given directory is there any problems with moving that cluster (renaming the directory)? Thanks for your comments, Greg. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 01:40:09PM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > - To clarify the first point, if the database is encoded utf-8 and > > lc_collate is en_US then Postgresql does NOT try to convert utf-8 to > > 8859-1 before sorting. > > Basically, this is a horribly bad idea and you should never do it. > The database encoding should always match what the locale assumes > for its character set (unless the locale is "C", which doesn't care). What's a bad idea? Having a lc_collate on the cluster that doesn't support the encodings in the databases? > We'd enforce that you never do it if we knew a portable way to determine > the character set assumed by an LC_COLLATE setting. Again, not sure what "it" is, but I do find it confusing when the cluster can have only one lc_collate, but the databases on that cluster can have more than one encoding. That's why I was asking how postgresql handles (possibly) different encodings. Are you saying that if a database is encoded as utf8 then the cluster should be initiated with something like en_US.utf8? And then all databaes on that cluster should be encoded the same? I suspect I don't understand how LC_COLLATE works that well. I thought the locale defines the order of the characters, but not the encoding of those characters. Maybe that's not correct. I assumed the same locale should sort the same chars represented in different encodings the same way. Maybe that's not the case: $ LC_ALL=en_US.UTF-8 locale charmap UTF-8 $ LC_ALL=en_US locale charmap ISO-8859-1 $ LC_ALL=C locale charmap ANSI_X3.4-1968 -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 05:20:19PM +0100, Peter Eisentraut wrote:
> > 2) What client encoding is used if the client does not specify one?
>
> the server encoding

What's the server encoding? The environment when the cluster is started? How do you find out what it's running as?

Does that mean that if the encoding is anything other than "C" then Postgresql will convert? That is, if my database is utf8 and the server is en_US, then text will be sent to the client as 8859-1? No, that's not correct, as I'm not seeing that. So I guess I'm not clear on that point.

> > 5) I suppose there's no way to answer this, short of running
> > benchmarks, but any ideas what using a lc_collate with utf-8 would do
> > to performance? Is it a big hit?
>
> I don't know why that would be a problem.

Just that sorting utf8 is a bit more work than sorting raw bytes.

Thanks for the help,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
[GENERAL] Mixing different LC_COLLATE and database encodings
I've been going through the docs and list archives trying to get clear on encoding issues, but still have a few question. Do I have these statements correct? - LC_COLLATE is set on the cluster at initdb time. From that point on all database text is sorted based on that *regardless* of the encoding set on an individual database. - So for lc_collate="C" sorting is just based on the byte values, and if lc_collate="en_US" then sorting is based on the us_US order, and the bytes are assumed to be 8859-1 (if that matters). - To clarify the first point, if the database is encoded utf-8 and lc_collate is en_US then Postgresql does NOT try to convert utf-8 to 8859-1 before sorting. - If the "client encoding" and the database encoding differ then Postgresql will convert between the two encodings during I/O. - The exception is if *either* the client or the server's encoding is "SQL_ASCII" then no client<=>server conversion is done. Sound about right? 1) What else is the database's encoding used for besides to determine how to convert text in input and output based on the client encoding? 2) What client encoding is used if the client does not specify one? For example, I'm looking through Perl's DBD::Pg and I don't see any place where it calls PQsetClientEncoding(). http://search.cpan.org/src/DBDPG/DBD-Pg-1.41/ http://www.postgresql.org/docs/7.4/interactive/multibyte.html#MULTIBYTE-TRANSLATION-TABLE 3) The vast majority of my utf-8 encoded text that I need to display sorted probably maps to 8859-1 characters. I think I already answered this above, but: Am I correct that Postgresql is *not* converting text from the database encoding to the cluster encoding before sorting? That is with "C" it's just sorting in byte order, and with en_US it's just assuming that the bytes are 8859-1 and ignoring that it's really utf-8? 
That is, if I have text that's in utf-8 but includes characters that would map to 8859-1 (say accented chars), that sorting will not be correct because it's not converted to 8859-1 when sorting?

4) If the above is true, then if I wanted my utf-8 encoded text to be sorted correctly I'd need to re-initdb using --locale=en_US.UTF-8, correct?

5) I suppose there's no way to answer this, short of running benchmarks, but any ideas what using lc_collate with utf-8 would do to performance? Is it a big hit?

Not related to Postgresql, but testing some of this is confusing due to my environment. How do I get my xterm to work with utf8? Does ssh do something with encoding? If I have a utf8 xterm window open on my machine, then ssh to the server running postgresql where the default locale is "POSIX", then running:

    LANG=en_US.utf8 psql utf8test
    utf8test=> \encoding
    UNICODE
    utf8test=> select first_name from person where last_name = 'Anderson';

Then I see:

    Zo

But, if on that same remote machine I run a unicode xterm (uxterm in Debian), then in that xterm window I do:

    utf8test=> \encoding
    UNICODE
    utf8test=> select first_name from person where last_name = 'Anderson';

    Zoë   (correct)

It's much slower running xterm remotely than using my local xterm and ssh, so it would be nice to be able to display the utf8.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
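For the "what's the server encoding?" question above, the relevant settings can be inspected from any session with standard commands (nothing thread-specific here):

```sql
SHOW server_encoding;   -- encoding the database was created with
SHOW lc_collate;        -- collation locale, fixed at initdb time
SHOW client_encoding;   -- encoding the server converts to/from for this client

-- Ask the server to convert all I/O for this session to UTF-8:
SET client_encoding TO 'UTF8';
```

If the client never sets client_encoding, it defaults to the server (database) encoding, which is why no conversion is observed in the experiment described above.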
Re: [GENERAL] Limiting with a left outer join
On Sat, Feb 11, 2006 at 12:35:34AM -0700, Michael Fuhr wrote:
> On Fri, Feb 10, 2006 at 11:59:30AM -0800, Bill Moseley wrote:
> > How do I make a join on the class table but not affect the left outer
> > join?
>
> Are you looking for something like this?
>
>     LEFT OUTER JOIN (class INNER JOIN class_domain ON ...) c ON ...

Why, yes I am. Thank you.

I'll post my select below, just in case anyone cares to review it for sanity. ;)

Something is not quite right about my schema, I fear. The idea of the domains is to limit viewing of classes and workshops to different groups of users. A given workshop may be available to more than one group. But, one problem is it's possible that a class and its parent workshop may not have a domain in common. Maybe that's something the application code needs to enforce, and not the database.

BTW -- Is there a way to turn something like this into a view? The 2 domain bind parameters will always match, and the only other input parameters are the two review mode booleans. That is, the input to the query is a domain id, and "review_mode" must be false.

    SELECT  w.id,
            count(c.id) as class_count,
            w.name as name,
            scheduled_message,
            no_scheduled_message,
            (CASE WHEN workshop_comment_end_time > now()
                  THEN workshop_comment ELSE NULL END) AS workshop_comment,
            (CASE WHEN new_workshop_end_time > now()
                  THEN '1' ELSE NULL END) AS is_new,
            w.review_mode as review_mode,
            workshop_category.name as workshop_cat

    FROM    workshop w
            INNER JOIN workshop_category ON (
                workshop_category.id = w.workshop_category
                AND w.review_mode IS FALSE )
            INNER JOIN workshop_domain ON (
                workshop_domain.workshop = w.id
                AND workshop_domain.domain = ? )
            LEFT OUTER JOIN
                (class INNER JOIN class_domain ON (
                    class_domain.class = class.id
                    AND class_domain.domain = ?
                    AND class.review_mode IS FALSE
                    AND class.register_cutoff_time >= now() )
                ) c ON ( c.workshop = w.id )

    GROUP BY 1,3,4,5,6,7,8,9, w.start_display_time, w.stop_display_time

    HAVING  ( count(c.id) > 0 )
            OR (
                (now() between w.start_display_time and w.stop_display_time)
                OR (w.stop_display_time IS NULL
                    AND -- probably don't need to check for NOT NULL here
                    w.start_display_time IS NOT NULL
                    AND w.start_display_time <= now())
                OR (w.start_display_time IS NULL
                    AND w.stop_display_time IS NOT NULL
                    AND w.stop_display_time > now())
            )

    ORDER BY w.id

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Limiting with a left outer join
I've been staring at this too long. Could someone give me a bit of SQL help?

I want to show a list of upcoming workshops, and the number of sessions (classes) offered for each workshop:

     workshop_id | classes_offered | review_mode | workshop_cat
    -------------+-----------------+-------------+--------------
               3 |               0 | t           | On Ground
              29 |               5 | f           | On Ground
              30 |               0 | f           | On Ground
              31 |               1 | f           | On Line
              61 |               3 | f           | On Ground
              62 |               2 | f           | On Ground
              63 |               1 | f           | On Line

A class is an instance of a given workshop (location and date given). A class references a workshop. Now, I'm using a LEFT OUTER JOIN to list workshops that don't have any classes assigned yet. Those are the zeros above.

Where I'm stuck is I need to apply limits to what rows to select. For example, I don't want to include classes or workshops that are in "review_mode". Also, both workshops and classes can belong to "domains" (via link tables), so I need to only look at those, too. Trying to do the "class_domain" join is where I'm stuck. Here's without that join, which seems to work:

    FROM    workshop w
        INNER JOIN workshop_category ON (
            workshop_category.id = w.workshop_category
            AND w.review_mode IS FALSE )
        INNER JOIN workshop_domain ON (
            workshop_domain.workshop = w.id
            AND workshop_domain.domain = 1 )
        LEFT OUTER JOIN class c ON (
            c.workshop = w.id
            AND c.register_cutoff_time >= now()
            AND c.review_mode IS FALSE )

The class table also has a "class_domain" table (like the workshop_domain). But, I'm not seeing how to make that join. This pulls all the zeros out of the results:

    [...]
    LEFT OUTER JOIN class c ON (
        c.workshop = w.id
        AND c.register_cutoff_time >= now()
        AND c.review_mode IS FALSE )
    INNER JOIN class_domain ON (
        class_domain.class = c.id
        AND class_domain.domain = 1 )

It's these left outer joins that always get me. What I think I need is something like:

    [...]
    LEFT OUTER JOIN class c ON (
        c.workshop = w.id
        AND c.register_cutoff_time >= now()
        AND c.review_mode IS FALSE
        AND class_domain.class = c.id
        AND class_domain.domain = 1 )

But, that's not part of the join, of course. How do I make a join on the class table but not affect the left outer join?

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
On Mon, Nov 21, 2005 at 02:06:35PM -0800, Bill Moseley wrote:
> The now working query (thanks to you!) is:

No, that doesn't work. It's dropping the people that have never been assigned a class to teach (i.e. don't have a row in the "instructors" link table).

> FROM class INNER JOIN instructors ON class.id = instructors.class
>     LEFT OUTER JOIN person ON person.id = instructors.person,
>     person_role

I really seem to need the multiple left outer join. This works:

    SELECT person.id AS id, last_name,
        person_role.role AS role,
        count(instructors.class),
        sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
        sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
    FROM person
        LEFT OUTER JOIN instructors ON (person.id = instructors.person)
        LEFT OUTER JOIN class ON (instructors.class = class.id),
        person_role
    WHERE person_role.person = person.id
    --  AND person_role.role = 2
    GROUP BY person.id, last_name, person_role.role;

I'm not clear how to move that "person_role.person = person.id" into the FROM statement. Does it matter?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
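For the question at the end of the post: the implicit comma join can be rewritten as an explicit INNER JOIN with the condition moved into the ON clause. For an inner join the two forms are equivalent here, since the condition only involves person; the explicit form mainly reads more clearly alongside the outer joins:

```sql
FROM person
    LEFT OUTER JOIN instructors ON (person.id = instructors.person)
    LEFT OUTER JOIN class ON (instructors.class = class.id)
    INNER JOIN person_role ON (person_role.person = person.id)
-- WHERE person_role.role = 2   -- optional filter, as in the original
```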
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
(truncated query output)

     9 | 4 | 4 | 0 | 4
    31 | 4 | 4 | 0 | 4

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
On Mon, Nov 21, 2005 at 11:45:34AM -0600, Bruno Wolff III wrote: > On Mon, Nov 21, 2005 at 05:40:10 -0800, > Bill Moseley <[EMAIL PROTECTED]> wrote: > > > > Here's where I'm missing something. Trying to do an outer join on > > to bring in the class row with its class_time column: > > You don't say exactly why you are having a problem with this, but I think you > would be better off doing an inner join between instructors and class and > then do an outer join of that result to person. Sorry, I thought I was so far off it might be obvious. I suspect I'm making the query harder than it really is. This query just eats CPU and doesn't seem to finish, but I didn't let it run more than a minute (which is forever as far as I'm concerned). The tables are not that big (10,000 people, 1500 classes) > > SELECT person.id AS id, last_name, > > count(instructors.class) as total, > > sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as > > future_class_count, > > sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as > > past_class_count > > > > > > FROM (person LEFT OUTER JOIN instructors ON (person.id = > > instructors.person)) t > > LEFT OUTER JOIN class on ( t.class = class.id ), > > person_role > > > > WHERE person_role.person = person.id > > AND person_role.role = 3 > > > > GROUP BY person.id, last_name; Well, I'm stabbing in the dark now. You mean like: SELECT person.id AS id, first_name, last_name, count(instructors.class) as total_classes, sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count, -- which is better? sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count, sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count FROM (class INNER JOIN instructors ON ( class.id = instructors.class )) t LEFT OUTER JOIN person ON ( person.id = t.person ), person_role WHERE person_role.person = person.id AND person_role.role = 3 GROUP BYperson.id, first_name, last_name; Still eats CPU. 
 GroupAggregate  (cost=1750458.67..1890662.91 rows=10212 width=39)
   ->  Sort  (cost=1750458.67..1767958.67 rows=700 width=39)
         Sort Key: person.id, person.first_name, person.last_name
         ->  Nested Loop  (cost=111.27..140276.35 rows=700 width=39)
               ->  Nested Loop  (cost=91.27..256.35 rows=7000 width=35)
                     ->  Hash Join  (cost=71.27..96.35 rows=7 width=31)
                           Hash Cond: ("outer".id = "inner"."class")
                           ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=12)
                           ->  Hash  (cost=71.25..71.25 rows=7 width=27)
                                 ->  Nested Loop  (cost=3.20..71.25 rows=7 width=27)
                                       ->  Hash Join  (cost=3.20..30.77 rows=7 width=12)
                                             Hash Cond: ("outer".person = "inner".person)
                                             ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=8)
                                             ->  Hash  (cost=3.01..3.01 rows=75 width=4)
                                                   ->  Index Scan using person_role_role_index on person_role  (cost=0.00..3.01 rows=75 width=4)
                                                         Index Cond: (role = 3)
                                       ->  Index Scan using person_pkey on person  (cost=0.00..5.77 rows=1 width=23)
                                             Index Cond: ("outer".person = person.id)
               ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
                     ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=4)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
         ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=4)
(22 rows)

-- 
Bill Moseley
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
[GENERAL] SQL Help: Multiple LEFT OUTER JOINs
I need to generate a table of teachers, and the count of classes they taught in the past and are scheduled to teach in the future.

 id | last_name | total | future_class_count | past_class_count
----+-----------+-------+--------------------+------------------
  3 | Smith     |    12 |                  3 |                9
  8 | Jones     |     0 |                  0 |                0

table person
    id
    last_name

table class
    id
    class_time

table role
    id
    role_name   -- for limiting to a type of teacher

-- link tables

table person_role
    person  references person
    role    references role

-- This table ties a person to a class, thus making them an instructor
table instructors
    person  references person
    class   references class

I can easily get instructors and the total count of their classes:

SELECT person.id AS id, last_name, count(instructors.class)
  FROM person
       LEFT OUTER JOIN instructors ON (person.id = instructors.person),
       person_role
 WHERE person_role.person = person.id
   AND person_role.role = 3       -- limit to this type of teacher
 GROUP BY id, last_name;

Here's where I'm missing something. Trying to do an outer join to bring in the class row with its class_time column:

SELECT person.id AS id, last_name,
       count(instructors.class) as total,
       sum (CASE WHEN class_time >  now() THEN 1 ELSE 0 END) as future_class_count,
       sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
  FROM (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
       LEFT OUTER JOIN class on ( t.class = class.id ),
       person_role
 WHERE person_role.person = person.id
   AND person_role.role = 3
 GROUP BY person.id, last_name;

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 11:34:21PM -0500, Tom Lane wrote:
> No, I'm saying that the underlying data (the join result before applying
> DISTINCT ON) looks like this:
>
> bill=# select "class".id, person.id AS person_id
> bill-#   FROM "class", "location", region, person, instructors
> bill-#  WHERE "class"."location" = "location".id AND "class".id = instructors."class"
> bill-#    AND instructors.person = person.id AND "location".region = region.id
> bill-#  ORDER BY "class".id;
>   id  | person_id
> ------+-----------
>     1 |        49
>     2 |        27
>     3 |        19
>     4 |        82
>     5 |        12
> ...
>  1238 |        61
>  1238 |        60
>  1239 |        40
>  1240 |        67
>  1241 |        11
>  1243 |        26
>  1243 |        84
>  1244 |        26
>  1244 |        84
> (1311 rows)
>
> The DISTINCT ON will take just one of the two rows with id = 1243, and
> just one of the rows with id = 1244, and *it is effectively random which
> one gets picked*. So when you then select rows with person_id = 84, you
> may or may not see these rows in the end result.

Yikes! The problem is *when* DISTINCT ON happens, right?

And, Tom, you actually explained this to me on the list back on Aug 25th, but that's when I was using the view in a different way. You noted that the order was "unpredictable", but at that time it didn't matter to me which row was selected.

http://archives.postgresql.org/pgsql-general/2005-08/msg01291.php

This current problem was due to my assumption of how PG executes the query: my assumption was that the select would first do the joins (including the limit by person_id = 84) and *then* weed out the duplicate class.ids. But if PG is first doing the joins on all the tables (before limiting by person_id = 84), then weeding out the duplicate class.ids, and then finally limiting by person_id = 84, I can see where I might end up with the missing row.

Frankly, I expected the first to happen because it would use an index to select just the records of person_id = 84, then do the joins on that small set of records.
Didn't seem likely that the database would join all the records first and then limit by class.id. Seems like the hard way to do the query. But the query planner works in strange and mysterious ways. ;)

Does that also explain why PG was sometimes returning the "correct" number of rows, depending on which of the two query plans above was used?

> Exactly. So your view is going to return the class id along with a
> randomly selected one of the instructor ids. It seems to me that
> filtering this result on instructor id is perhaps a bit ill-advised,
> even if you fix the view so that the chosen instructor id isn't so
> random (eg, you could fix it to display the lowest-numbered instructor
> id for the particular class). Even then, are you searching for the
> instructor id that the view happens to show for that class, or some
> other one?

Well, clearly, my "one-size-fits-all view" doesn't work in this case. I just need another view without DISTINCT when limiting by instructor. It was that red herring of removing a seemingly random column from the view that made it hard to see what was really happening.

Thanks very much for all your time.

-- 
Bill Moseley
[EMAIL PROTECTED]
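For the archives: the nondeterminism Tom describes can be removed by extending the ORDER BY with a tie-breaker, since DISTINCT ON keeps the first row of each group in sort order. A sketch against the thread's schema (untested), pinning the pick to the lowest-numbered instructor as Tom suggests:

```sql
-- DISTINCT ON (class.id) keeps the first row per class.id in ORDER BY
-- order; the second sort key makes that choice deterministic.
SELECT DISTINCT ON (class.id)
       class.id, instructors.person AS person_id
  FROM class
  JOIN instructors ON instructors.class = class.id
 ORDER BY class.id, instructors.person;   -- tie-breaker: lowest instructor id wins
```

This fixes *which* row the view shows, but as Tom notes it still doesn't make filtering the view on instructor id meaningful, because the other instructors' rows are discarded before the filter sees them.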
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 07:48:06PM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > [ strange behavior ] > > Oh, duh, it's not a PG bug: the problem is that the view is > underspecified. You have > > SELECT DISTINCT ON (class.id) > ... a bunch of stuff ... > FROM ... a bunch of tables ... > ORDER BY class.id; > > The difficulty with this is that DISTINCT ON will take the first row in > each group with the same class.id. And since you're only sorting by > class.id, "the first row" is ill-defined. Sorry, but I fear I'm missing something. That ORDER BY is added by PG -- it's not part of my view when I define it. I assume PG adds that so it can do the DISTINCT ON. Still, I don't have any duplicate class.id rows in this select that I can see. class.id 1243 and 1244 are not the same, yet PG is (sometimes) throwing out one of them. Are you saying that somehow PG thinks they are the same class.id and is thus removing one? I'm asking for a list of all classes taught by instructor 84. 
ws2=> select * from instructors where person = 84 order by class;
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)

The reason I'm using DISTINCT ON is because the "class_list" view is supposed to just return a list of unique classes, and a class might have more than one instructor, which would result in extra rows -- as shown here:

ws2=> select * from instructors where class in
      (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     26 |  1243
     84 |  1244
     26 |  1244
(9 rows)

So when I don't want duplicates:

ws2=> select distinct on (class) * from instructors where class in
      (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)

I don't care in this case about the DISTINCT ON throwing out the duplicates -- I just care about distinct classes, not that all the instructors are included in this select. And even if I throw in all my other joins I get the same thing:

ws2=> SELECT DISTINCT ON (class.id)
ws2->        class.id AS id
ws2->   FROM class, location, region, person, instructors
ws2->  WHERE class.location = location.id      -- join with location
ws2->    AND class.id = instructors.class      -- join the instructors
ws2->    AND instructors.person = person.id    -- join the person(s)
ws2->    AND location.region = region.id       -- join the location to a region
ws2->    AND person.id = 84;
  id
------
  727
  739
  804
  813
  867
 1243
 1244
(7 rows)

> I'm not sure why qsort's
> behavior seems to depend on the width of the rows, but there's no doubt
> that it's sorting different rows to the front of each group depending
> on which view you use.

I just don't see what groups there are in this case, though.

> It could also be that you don't want to be using DISTINCT ON at all;
> have you thought through exactly what this view ought to produce for
> each class.id?

Yes, I think so.
A list of columns related to it, with the exception that when there are duplicate instructors I want one of those duplicates thrown out (and I don't care which one). When I do a query that generates duplicate class.ids, such as when a class has more than one instructor:

ws2=> select class.id  AS class_id,
ws2->        person.id AS person_id
ws2->   FROM class, location, region, person, instructors
ws2->  WHERE class.location = location.id      -- join with location
ws2->    AND class.id = instructors.class      -- join the instructors
ws2->    AND instructors.person = person.id    -- join the person(s)
ws2->    AND location.region = region.id       -- join the location to a region
ws2->    AND class_time > now();
 class_id | person_id
----------+-----------
      561 |        95
      614 |        95
      747 |       111
      762 |       111
      772 |       111
      883 |        13
      924 |        26
      935 |        26
      945 |        26
     1243 |        84
     1243 |        26
     1244 |        84
     1244 |        26
(13 rows)

You can see some classes are listed twice, so using distinct on gets just my list of unique classes:

ws2=> SELECT DISTINCT ON (class.id)
ws2->
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 10:53:21AM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > The first plan below returns the correct number of rows, the second plan > > does > > not. These are after I did the reindex, btw. > > Bizarre. What are the datatypes of the columns being joined on? If > they're string types, what's your database locale and encoding? The primary keys are all SERIAL, and the FKs are integer. Nothing too odd. The odd thing is the row that is not returned is basically a clone of another row -- which is why I diff'ed them in my first posting. BTW, this might be obvious, but the reason I'm doing DISTINCT ON class.id is that the instructors table is a link table and a class can have more than one instructor. I only want a list of classes, not one per instructor (which could duplicate them). I'm still a novice with Pg, so I assume this is what you are asking (although none of my joins are on text fields). ws2=> SHOW LC_CTYPE; -[ RECORD 1 ]--- lc_ctype | en_US ws2=> SHOW SERVER_ENCODING; -[ RECORD 1 ]---+--- server_encoding | LATIN1 So my joins are: WHERE class.location = location.id -- join with location AND class.id = instructors.class -- join the instructors AND instructors.person = person.id -- join the person(s) AND location.region = region.id;-- join the location to a region And the .id are all SERIAL integer and the FKs are all integer. 
Trying to avoid sending too much unnecessary data to the list, but here's a sample of the tables:

ws2=> \d region
                           Table "public.region"
   Column   |  Type   |                       Modifiers
------------+---------+--------------------------------------------------------
 id         | integer | not null default nextval('public.region_id_seq'::text)
 active     | boolean | not null default true
 sort_order | integer | not null default 1
 name       | text    | not null
Indexes:
    "region_pkey" primary key, btree (id)
    "region_name_key" unique, btree (name)

ws2=> \d instructors
    Table "public.instructors"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 class  | integer | not null
Indexes:
    "instructors_pkey" primary key, btree (person, "class")
    "instructors_class_index" btree ("class")
    "instructors_person_index" btree (person)
Foreign-key constraints:
    "$1" FOREIGN KEY (person) REFERENCES person(id)
    "$2" FOREIGN KEY ("class") REFERENCES "class"(id)

ws2=> \d class
                              Table "public.class"
        Column         |            Type             |         Modifiers
-----------------------+-----------------------------+---------------------------
 id                    | integer                     | not null default nextval('public.class_id_seq'::text)
 name                  | text                        | not null
 old_id                | integer                     |
 location              | integer                     | not null
 workshop              | integer                     | not null
 class_time            | timestamp(0) with time zone | not null
 class_end_time        | timestamp(0) with time zone | not null
 class_size            | integer                     | not null
 begin_reg_time        | timestamp(0) with time zone |
 class_list_sent_time  | timestamp(0) with time zone |
 class_list_sent_email | text                        |
 reminder_sent_time    | timestamp(0) with time zone |
 ride_list_sent_time   | timestamp(0) with time zone |
 html_description      | text                        | not null
 short_description     | text                        |
 special_instructions  | text                        |
 on_hold_message       | text                        |
 review_mode           | boolean                     | not null default false
 workshop_group        | integer                     | not null
 distance_ed           | boolean                     | not null default false
 contract_class        | boolean                     | not null default false
 online_evaluation     | boolean                     | not null default true
 price_scheme          | integer                     | not null
 duration              | text                        |
 register_cutoff_time  | timestamp(0) with time zone | not null
 cutoff_message        | text                        |
 full_message          | text                        |
 wait_list_size
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 03:42:19PM +0100, Andreas Seltenreich wrote:
> How does the query plan change when you make those changes? If it only
> occurs if a certain index is used, it might be corrupt (=> REINDEX).

I did a "reindex database ws2;" and no change.

I'm not very good at reading the query plans. For one thing, they always send me off on some tangent wondering why it's doing a Seq Scan instead of an index scan. ;)

The first plan below returns the correct number of rows, the second plan does not. These are after I did the reindex, btw.

ws2=> explain select id from bar where person_id = 84;
 Subquery Scan bar  (cost=1225.81..1243.32 rows=6 width=4)
   Filter: (person_id = 84)
   ->  Unique  (cost=1225.81..1230.82 rows=1000 width=334)
         ->  Sort  (cost=1225.81..1228.31 rows=1003 width=334)
               Sort Key: "class".id
               ->  Hash Join  (cost=802.15..1175.81 rows=1003 width=334)
                     Hash Cond: ("outer".person = "inner".id)
                     ->  Hash Join  (cost=67.50..203.81 rows=1003 width=315)
                           Hash Cond: ("outer".region = "inner".id)
                           ->  Hash Join  (cost=45.00..163.77 rows=1002 width=279)
                                 Hash Cond: ("outer"."location" = "inner".id)
                                 ->  Hash Join  (cost=22.50..118.74 rows=1001 width=141)
                                       Hash Cond: ("outer"."class" = "inner".id)
                                       ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=8)
                                       ->  Hash  (cost=20.00..20.00 rows=1000 width=137)
                                             ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=137)
                                 ->  Hash  (cost=20.00..20.00 rows=1000 width=142)
                                       ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=142)
                           ->  Hash  (cost=20.00..20.00 rows=1000 width=40)
                                 ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=40)
                     ->  Hash  (cost=649.12..649.12 rows=10212 width=23)
                           ->  Seq Scan on person  (cost=0.00..649.12 rows=10212 width=23)
(22 rows)

This returns one row less, and the only change to the view is this commented-out column:

    -- class.full_message AS full_message,  -- this

ws2=> explain select id from bar where person_id = 84;
 Subquery Scan bar  (cost=1222.54..1240.05 rows=6 width=4)
   Filter: (person_id = 84)
   ->  Unique  (cost=1222.54..1227.55 rows=1000 width=366)
         ->  Sort  (cost=1222.54..1225.05 rows=1003 width=366)
               Sort Key: "class".id
               ->  Hash Join  (cost=779.65..1172.54 rows=1003 width=366)
                     Hash Cond: ("outer".person = "inner".id)
                     ->  Hash Join  (cost=45.00..204.14 rows=1003 width=347)
                           Hash Cond: ("outer".region = "inner".id)
                           ->  Hash Join  (cost=22.50..164.10 rows=1002 width=311)
                                 Hash Cond: ("outer"."location" = "inner".id)
                                 ->  Merge Join  (cost=0.00..119.06 rows=1001 width=173)
                                       Merge Cond: ("outer".id = "inner"."class")
                                       ->  Index Scan using class_pkey on "class"  (cost=0.00..52.00 rows=1000 width=169)
                                       ->  Index Scan using instructors_class_index on instructors  (cost=0.00..52.00 rows=1000 width=8)
                                 ->  Hash  (cost=20.00..20.00 rows=1000 width=142)
                                       ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=142)
                           ->  Hash  (cost=20.00..20.00 rows=1000 width=40)
                                 ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=40)
                     ->  Hash  (cost=649.12..649.12 rows=10212 width=23)
                           ->  Seq Scan on person  (cost=0.00..649.12 rows=10212 width=23)
(21 rows)

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] Wrong rows selected with view
PostgreSQL 7.4.8 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050816 (prerelease) (Debian 4.0.1-5)

Hopefully this is something simple -- I assume it's a problem with my SQL. But it looks really weird to me at this late hour.

I have some tables for managing workshops, and I have a view (below). I noticed for a given select it was returning the wrong number of rows. I expected seven rows back, but only got six. I rewrote the view with just the joins and it returned the correct number of rows. So I started the brute force method of removing one thing at a time in the view to see what would make it start returning the correct number of rows. That just confused me more.

Below, if I comment out *any* single column that's marked "-- this" then I get the correct number of rows. If I comment out any rows marked "-- nope" then there's no change (get six rows returned). But if I start commenting out more than one "-- nope" then I get seven rows.

Can someone point out my error?

create view bar AS
    SELECT DISTINCT ON (class.id)                              -- this
        class.id AS id,
        class.name AS name,                                    -- this
        class.class_time AS class_time,                        -- this
        class.begin_reg_time AS begin_reg_time,                -- this
        (CASE WHEN class.register_cutoff_time > class.class_time  -- this
              THEN class.register_cutoff_time
              ELSE class.class_time END) AS register_stop_time,
        class.location AS location,                            -- nope
        class.workshop AS workshop,                            -- nope
        class.review_mode AS review_mode,                      -- nope
        class.workshop_group AS workshop_group,                -- nope
        location.name AS location_name,                        -- this
        location.address AS address,                           -- this
        location.city AS city,                                 -- nope
        location.state AS state,                               -- this
        location.zip AS zip,                                   -- nope
        region.id AS region,                                   -- nope
        region.name AS region_name,                            -- nope
        region.sort_order AS region_sort,                      -- nope
        person.id AS person_id,
        UPPER( person.last_name || person.first_name ) AS instructor,  -- this
        class.cutoff_message AS cutoff_message,                -- this
        class.full_message AS full_message,                    -- this
        class.wait_description AS wait_description             -- this
    FROM class, location, region, person, instructors
    WHERE class.location = location.id        -- join with location
      AND class.id = instructors.class        -- join the instructors
      AND instructors.person = person.id      -- join the person(s)
      AND location.region = region.id;        -- join the location to a region

select id from bar where person_id = 84;
drop view bar;

  id
------
  727
  739
  804
  813
  867
 1244
(6 rows)

Comment out one column:

        -- class.full_message AS full_message,                 -- this
        class.wait_description AS wait_description             -- this
    FROM class, location, region, person, instructors
    WHERE class.location = location.id        -- join with location
      AND class.id = instructors.class        -- join the instructors
      AND instructors.person = person.id      -- join the person(s)
      AND location.region = region.id;        -- join the location to a region

select id from bar where person_id = 84;
drop view bar;

  id
------
  727
  739
  804
  813
  867
 1243
 1244
(7 rows)

It's always class.id 1243 that doesn't show up. Not sure this helps, but:

[EMAIL PROTECTED]:~$ echo '\x \\ select * from class where id = 1243' | psql ws2 > 1243
[EMAIL PROTECTED]:~$ echo '\x \\ select * from class where id = 1244' | psql ws2 > 1244
[EMAIL PROTECTED]:~$ diff -U 0 1243 1244
--- 1243        2005-11-15 20:16:26.619412721 -0800
+++ 1244        2005-11-15 20:16:30.438646443 -0800
@@ -3 +3 @@
-id                      | 1243
+id                      | 1244
@@ -8,4 +8,4 @@
-class_time              | 2005-12-12 07:00:00-08
-class_end_time          | 2005-12-12 14:00:00-08
-class_size              | 55
-begin_reg_time          | 2005-11-15 17:36:00-08
+class_time              | 2005-12-25 07:15:00-08
+class_end_time          | 2005-12-25 11:00:00-08
+class_size              | 33
+begin_reg_time          |
@@ -27 +27 @@
-register_cutoff_time    | 2005-12-11 19:00:00-08
+register_cutoff_time    | 2005-12-24 19:15:00-08

Thanks,

-- 
Bill "stabbing in the dark" Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Question about a query plan
I'm still trying to understand EXPLAIN ANALYZE output.

ws2=> select count(*) from person_role;
 count
-------
   123
(1 row)

ws2=> select count(*) from person;
 count
-------
 11033
(1 row)

ws2=> EXPLAIN ANALYZE select id, first_name, last_name
      from person, person_role
      where id = 94
        and person_role.person = person.id
        and (person_role.role = 2);

 Nested Loop  (cost=0.00..8.28 rows=1 width=23) (actual time=0.198..0.237 rows=1 loops=1)
   ->  Index Scan using person_pkey on person  (cost=0.00..5.44 rows=1 width=23) (actual time=0.054..0.056 rows=1 loops=1)
         Index Cond: (id = 94)
   ->  Seq Scan on person_role  (cost=0.00..2.83 rows=1 width=4) (actual time=0.130..0.165 rows=1 loops=1)
         Filter: ((role = 2) AND (person = 94))
 Total runtime: 0.379 ms
(6 rows)

Why does it say "Seq Scan" on person_role? The query has both the "person" and "role" to use as a primary key -- which is indexed. Indeed, "rows=1" so it looks like an index fetch. Perhaps I'm reading that incorrectly?

ws2=> \d person_role;
  Table "public.person_role"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 role   | integer | not null
Indexes:
    "person_role_pkey" primary key, btree (person, role)
Foreign-key constraints:
    "$2" FOREIGN KEY (role) REFERENCES role(id) ON DELETE RESTRICT
    "$1" FOREIGN KEY (person) REFERENCES person(id) ON DELETE CASCADE

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
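For the archives: with only 123 rows, person_role almost certainly fits in a single page, so reading the whole table is cheaper than descending a btree and then fetching the heap page anyway. One way to see the planner's costing of the index alternative is to disable sequential scans for the session and re-run the EXPLAIN (a diagnostic sketch only; enable_seqscan is session-local and not meant for production use):

```sql
-- Force the planner to price the index-scan alternative for comparison.
SET enable_seqscan = off;
EXPLAIN ANALYZE
  SELECT id, first_name, last_name
    FROM person, person_role
   WHERE id = 94
     AND person_role.person = person.id
     AND person_role.role = 2;
RESET enable_seqscan;
```

If the forced plan's estimated cost is higher than the seq-scan plan's 2.83, that is the whole explanation: the planner picked the cheaper path, and "rows=1" describes the result, not the access method.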
Re: [GENERAL] Setting WHERE on a VIEW with aggregate function.
On Fri, Sep 16, 2005 at 04:56:25PM -0700, Roger Hand wrote: > So it would appear to me that you won't able to meet your goal by simply > using a view. > > However, there are other solutions you can use which may or may not be > appropriate. I can think of three at the moment. > > #1: Function Solution: I'll take a look at this. It's about time I started to learn about functions a bit. > This would be the best solution if you are in control of the > application source code. In Java, for example, it's relatively > simple to call this function and return the result as a result set. > If you're working in Java I'd be glad to show you same sample code. I'm using Perl and DBI (really Class::DBI but DBI is not far away). > #2: Simplify the Select Criteria Solution: > = > > A slightly less flexible approach, but one that may be workable, would be to > add two boolean columns (with default value of '0') to the class table: > "completed" and "current". Then once a semester you run a simple query that > updates them. Something like: > > UPDATE class SET current = '1' where class_time = '2005-09-01'; > UPDATE class SET completed = '1' where class_time < '2005-09-01'; Classes start daily (and at different hours). I could cron once an hour I suppose, but I'd rather not de-normalize the data. Maybe I can just create three views (future, recent, old) and live with that. The temporary table is another possibility I'll look into. Thank you very much for spending time on this. I really appreciate it. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
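For reference, the "function solution" Roger mentions can be sketched in plain SQL as a set-returning function, which makes the cutoff time a parameter -- exactly what a plain view can't express. All names here are hypothetical and this is untested; it uses the 7.x-era quoted-body syntax matching the PostgreSQL version in the thread:

```sql
-- A parameterized "view": the class_time cutoff becomes $1.
CREATE TYPE instructor_count AS (
    person_id   integer,
    first_name  text,
    class_count bigint
);

CREATE FUNCTION instructor_counts_since(timestamptz)
RETURNS SETOF instructor_count AS '
    SELECT person.id, first_name, count(instructors.class)
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
       AND class_time > $1
     GROUP BY person.id, first_name
' LANGUAGE sql STABLE;
```

Usage would then be `SELECT * FROM instructor_counts_since(now());`, or any other timestamp, with the filter applied before the aggregation rather than after.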
Re: [GENERAL] Setting WHERE on a VIEW with aggregate function.
On Fri, Sep 16, 2005 at 12:06:19PM -0700, Roger Hand wrote: > > select * from instructor_counts where class_time > now(); > > > > But class_time is not part of the VIEW so that's not valid. > > No problem, just make it a part of the view. See the classes section below. > > CREATE VIEW future_instructor_counts > AS > SELECT * FROM > > (SELECT >person.id AS person_id, > first_name, > last_name) personinfo > > INNER JOIN > > -- Add class_time field! > (SELECT class.id, class_time FROM class > WHERE class_time > now() ) classes > > INNER JOIN > > (SELECT >id, count(class) AS class_count > FROM instructors GROUP BY id) classcount > > ON personinfo.person_id = instructors.id > AND classes.id = instructors.id I couldn't get that to work -- Postgresql isn't that helpful just reporting "ERROR: syntax error at or near ";" at character 496" even after adding a FROM in the first select. So, I'm stabbing in the dark to get it to work. > [Disclaimer: I've not tested this code at all. It could help if you sent > table definitions and maybe even dummy > data via insert commands.] 
Ok -- this should be cut-n-paste:

CREATE TABLE class (
    id          integer PRIMARY KEY,
    class_time  timestamp(0) with time zone,
    name        text
);

CREATE TABLE person (
    id          integer PRIMARY KEY,
    first_name  text
);

create table instructors (
    person  integer NOT NULL REFERENCES person,
    class   integer NOT NULL REFERENCES class,
    PRIMARY KEY (person, class)
);

INSERT INTO person (id,first_name) values (1,'Joe');
INSERT INTO person (id,first_name) values (2,'Mary');
INSERT INTO person (id,first_name) values (3,'Bob');
INSERT INTO person (id,first_name) values (4,'Cindy');

INSERT INTO class (id,name, class_time) values (1,'Math', now());
INSERT INTO class (id,name, class_time) values (2,'Math', now() + interval '1 day');
INSERT INTO class (id,name, class_time) values (3,'Science', now());
INSERT INTO class (id,name, class_time) values (4,'PE', now() + interval '1 day');

INSERT INTO instructors (person, class) values (1,1); -- joe teaches math now
INSERT INTO instructors (person, class) values (1,2); -- joe teaches math tomorrow
INSERT INTO instructors (person, class) values (2,2); -- with Mary
INSERT INTO instructors (person, class) values (3,3); -- Bob teaches science now
INSERT INTO instructors (person, class) values (4,3); -- Cindy teaches science now

-- view
CREATE VIEW instructor_counts AS
    SELECT person.id AS person_id,
           first_name,
           count(instructors.class) AS class_count
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
    -- AND class_time > now()
     GROUP BY person_id, first_name;

select * from instructor_counts order by class_count desc;

-- Returns:
 person_id | first_name | class_count
-----------+------------+-------------
         1 | Joe        |           2
         2 | Mary       |           1
         3 | Bob        |           1
         4 | Cindy      |           1
(4 rows)

My GOAL above is to be able to add a WHERE class_time > $some_time.

Here's where I left off, which I never could get to work. The individual selects work, but it seems like I need to say c.class_id = i.class in addition.
But I can't even get this without syntax errors:

CREATE VIEW instructor_counts AS
    SELECT *
      FROM (SELECT person.id AS person_id, first_name FROM person) p
     INNER JOIN
           (SELECT class.id AS class_id, class_time FROM class) c
     INNER JOIN
           (SELECT person, count(class) AS class_count
              FROM instructors GROUP BY person) i
        ON ( p.person_id = i.person );

That also looks like the selects are going to be full table scans.

-- 
Bill Moseley
[EMAIL PROTECTED]
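For the archives, the syntax error is that every INNER JOIN needs its own ON clause; the attempt above has two joins but only one ON. A syntactically valid version is below (the linking sub-select aliased x is my own addition, and this is an untested sketch). Note it still doesn't solve the original problem: class_count is aggregated inside sub-select i before any outer WHERE runs, so filtering this view on class_time would not restrict which classes are counted -- which is the underlying reason a plain view can't express the parameterized query.

```sql
-- Valid chained sub-select joins: one ON per JOIN.
CREATE VIEW instructor_counts AS
    SELECT p.person_id, p.first_name, c.class_id, c.class_time, i.class_count
      FROM (SELECT person.id AS person_id, first_name FROM person) p
     INNER JOIN
           (SELECT person, class FROM instructors) x
        ON x.person = p.person_id
     INNER JOIN
           (SELECT class.id AS class_id, class_time FROM class) c
        ON c.class_id = x.class
     INNER JOIN
           (SELECT person, count(class) AS class_count
              FROM instructors GROUP BY person) i
        ON i.person = p.person_id;
```

A `WHERE class_time > now()` against this view filters the joined rows, but class_count still reflects *all* of each person's classes, past and future.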
[GENERAL] Setting WHERE on a VIEW with aggregate function.
I have a view to generate a list of instructors and a count of their future classes. "instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts AS
    SELECT person.id AS person_id,
           first_name,
           last_name,
           count(instructors.class) AS class_count
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
       AND class_time > now()
     GROUP BY person_id, first_name, last_name;

I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove the "AND class_time > now()" so that a timestamp can be passed in the select? That is, I'd like to be able to do this:

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid. And if it were included then I wouldn't have an aggregate function any more -- no more grouping.

2) I think I'm missing something obvious. I know that I need to specify all my non-aggregate columns in the "GROUP BY", but I don't understand why. Really, the results are just grouped by person.id, so why the need to specify the other columns? And if you don't specify all the columns then PostgreSQL reports:

    ERROR: column "person.id" must appear in the GROUP BY clause or be used in an aggregate function

Is there a reason PostgreSQL doesn't just add the column automatically? It does in other cases (like a missing table in a join).

Thanks

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Question about a query plan
On Fri, Sep 16, 2005 at 10:02:28AM -0500, Thomas O'Connell wrote: > > On Sep 16, 2005, at 9:18 AM, Bill Moseley wrote: > > >First question is why the planner is not using an index scan when I > >use "now()" or CURRENT_TIMESTAMP? > > It also used to be the case (pre-8.0; I couldn't find in the release > notes whether this was an 8.0 or 8.1 fix) that now() and > CURRENT_TIMESTAMP were not indexable, I think because of mutability. > > For older versions of postgres, it's recommended that you determine > the time in the client and use constant data in your query. Interesting. I have a few VIEWs that include now(), but I guess I could adjust and pass in the date from the client. Thanks for the tip. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Question about a query plan
On Fri, Sep 16, 2005 at 04:45:57PM +0200, Peter Eisentraut wrote: > The planner thinks your query will return 414 rows, so it thinks the > sequential scan is faster. In reality, your query only retuns 28 rows, so > you need to create better statistics, either by running ANALYZE or VACUUM (or > both) or tweaking the statistics parameters of the columns. I did wonder about the planner stats, so I had run ANALYZE on the database with no change. I just now ran VACUUM (and VACUUM ANALYZE), and again see no change. Perhaps my table is just too small for this test. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Question about a query plan
EXPLAIN select person, first_name, count(class)
  from instructors, person
 where instructors.class in (select id from class where class_time > now())
   AND person.id = instructors.person
 group by person, first_name;

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 HashAggregate  (cost=734.06..735.15 rows=437 width=17)
   ->  Merge Join  (cost=706.81..730.78 rows=437 width=17)
         Merge Cond: ("outer".id = "inner".person)
         ->  Index Scan using person_pkey on person  (cost=0.00..1703.82 rows=12246 width=13)
         ->  Sort  (cost=706.81..707.90 rows=437 width=8)
               Sort Key: instructors.person
               ->  Hash IN Join  (cost=656.65..687.64 rows=437 width=8)
                     Hash Cond: ("outer"."class" = "inner".id)
                     ->  Seq Scan on instructors  (cost=0.00..20.08 rows=1308 width=8)
                     ->  Hash  (cost=655.62..655.62 rows=414 width=4)
                           ->  Seq Scan on "class"  (cost=0.00..655.62 rows=414 width=4)
                                 Filter: (class_time > now())

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] GROUP BY requirement
I'm wondering if adding a GROUP BY (as required by Postgres) will change the results of a select on a view. I have the following view which joins a "class" with a teacher. A teacher is a "person" and I have an "instructors" link table.

CREATE VIEW class_list (id, class_time, instructor) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

I also have a table "registration" that links students with a class. The registration table has a "reg_status" column to say if they are confirmed or on the wait_list. So when showing the above I'd also like to see how many students are confirmed and on the wait_list.

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor, confirmed_cnt, wait_list_cnt) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name,
        sum(CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) AS confirmed_cnt,
        sum(CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) AS wait_list_cnt
    FROM class, instructors, person, registration
    WHERE instructors.person = person.id
      AND class.id = instructors.class
      AND class.id = registration.class
    GROUP BY class.id, class.class_time, person.first_name;

PostgreSQL requires the GROUP BY. But I'm not clear how the GROUP BY might change the results between the two views above. http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY says:

    When GROUP BY is present, it is not valid for the SELECT list
    expressions to refer to ungrouped columns except within aggregate
    functions, since there would be more than one possible value to
    return for an ungrouped column.

Frankly, I cannot see how it might change the results of a select between the two views. Am I missing something?
-- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
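One way to settle the question above empirically: if the GROUP BY changed nothing, the set difference between the two views should be empty in both directions. A sketch, assuming the original DISTINCT ON view is kept around under a second, invented name (cl_distinct):

```sql
-- Rows in the DISTINCT ON view that are missing from the GROUP BY view:
SELECT id, class_time, instructor FROM cl_distinct
EXCEPT
SELECT id, class_time, instructor FROM cl;

-- ...and the other direction:
SELECT id, class_time, instructor FROM cl
EXCEPT
SELECT id, class_time, instructor FROM cl_distinct;
```

Both queries returning zero rows would confirm the GROUP BY did not change the result set for this data.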
Re: [GENERAL] Help with a subselect inside a view
On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote: > > CREATE VIEW cl (id, class_time, instructor) > > AS > > SELECT DISTINCT ON(class.id) > >class.id, class.class_time, person.first_name > > FROM class, instructors, person > > WHERE instructors.person = person.id > >AND class.id = instructors.class; > > This is allowed because the code automatically adds "ORDER BY class.id" > within the view (as you would see if you examined the view with \d). I see that now. Might be helpful for the docs to say that for folks like me. > It's fairly pointless though, because as the manual notes, you can't get > any well-defined behavior without additional ORDER BY columns to > prioritize the rows within class.id groups. As is, you're getting > random choices of class_time and first_name within the groups. > (Though maybe in this application, you don't care.) I'm not sure I follow what you are saying. I understand that I have no control over which "first_name" I end up with (and I don't really care), but class_time is a column in the "class" table which I'm using DISTINCT ON on, so that should be unique as well. So I assume you meant random choice of first_name, not class_time. Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Help with a subselect inside a view
And about being efficient:

On Thu, Aug 25, 2005 at 08:01:26AM -0700, Bill Moseley wrote:
> DROP VIEW cl;
> CREATE VIEW cl (id, class_time, instructor) AS
>     SELECT DISTINCT ON(class.id)
>         class.id, class.class_time, person.first_name
>     FROM class, instructors, person
>     WHERE instructors.person = person.id
>       AND class.id = instructors.class;

And in a case like the above, I'm displaying the list a page at a time. So I first do a count to find total rows and then a select:

    select count(*) from cl where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

I looked at the EXPLAIN ANALYZE for both and both do the join, it seems. I guess it has to be that way. So would it be smart to do the initial count on "class" instead of the view first?

    select count(*) from class where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

That is, Postgresql won't figure out that it only needs to look at one table, right?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Help with a subselect inside a view
On Thu, Aug 25, 2005 at 08:05:36AM -0500, Bruno Wolff III wrote:
> On Wed, Aug 24, 2005 at 23:12:17 -0700,
>   Bill Moseley <[EMAIL PROTECTED]> wrote:
> > I need a little SQL help:
> >
> > I'm trying to get a subselect working inside a view.
>
> Unfortunately you didn't show us what you tried. My guess would be that
> you didn't enclose the subselect in parenthesis.

No, it wasn't that. I just didn't want to look too foolish. ;)

DROP VIEW cl;
CREATE VIEW cl (id, instructor) AS
    SELECT class.id, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = (
          SELECT instructors.id
          FROM instructors, person
          WHERE instructors.class = class.id
            AND person.id = instructors.person
          LIMIT 1
      );

Which returns a row for every row in the "instructors" table.

> The distinct on solution that was suggested is probably a better way to
> go anyway.

Turns out it is. Thanks.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Help with a subselect inside a view
Hi David,

On Thu, Aug 25, 2005 at 01:22:02AM -0700, David Fetter wrote:
> This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.
>
> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

    The DISTINCT ON expression(s) must match the leftmost ORDER BY
    expression(s). The ORDER BY clause will normally contain additional
    expression(s) that determine the desired precedence of rows within
    each DISTINCT ON group.

I read that and thought it wasn't a drop-in replacement for my code due to the leftmost ORDER BY requirement. But it seems to work even if that requirement is not met. Perhaps I'm not understanding the wording above? Or is Postgresql adding in the order automatically?

My original VIEW with duplicates:

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

select * from cl where id = 555;

 id  |       class_time       | instructor
-----+------------------------+------------
 555 | 2005-09-30 09:00:00-07 | Cheryl
 555 | 2005-09-30 09:00:00-07 | Bob
(2 rows)

And with DISTINCT ON():

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

select * from cl where id = 555;

 id  |       class_time       | instructor
-----+------------------------+------------
 555 | 2005-09-30 09:00:00-07 | Cheryl
(1 row)

Here's a query where the leftmost ORDER BY doesn't match the DISTINCT ON, which I thought was not possible:

select * from cl where class_time > now() order by instructor limit 3;

 id  |       class_time       | instructor
-----+------------------------+------------
 544 | 2005-08-31 09:00:00-07 | Cheryl
 555 | 2005-09-30 09:00:00-07 | Cheryl
 737 | 2005-08-30 09:00:00-07 | Cynthia

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
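If the arbitrary choice within each group ever matters, the row that DISTINCT ON keeps can be pinned down with trailing ORDER BY keys inside the view. A sketch using the same tables from the thread:

```sql
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT DISTINCT ON (class.id)
           class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class
    -- class.id must be the leftmost key to satisfy DISTINCT ON; the
    -- second key decides WHICH instructor row is kept per class
    -- (here: the first alphabetically by first name).
    ORDER BY class.id, person.first_name;
```

Without the extra key, the planner is free to keep any instructor row for a class, which is the "random choice" Tom Lane describes in his reply.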
[GENERAL] Help with a subselect inside a view
I need a little SQL help: I'm trying to get a subselect working inside a view.

I have a table "class" that has related tables (a class has a location, a location has an address with columns city, state, zip). I want to use a VIEW to display columns related to a given class. But a class can also have one or more instructors. So I have a link table:

     Table "public.instructors"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 class  | integer | not null
Foreign-key constraints:
    "$1" FOREIGN KEY (person) REFERENCES person(id)
    "$2" FOREIGN KEY ("class") REFERENCES "class"(id)

I can do the following, but in the (very rare) case where there may be two instructors assigned to the class I will get two rows back.

CREATE VIEW class_list (
    id, name, class_time, location, location_name,
    address, city, state, zip, instructor_name
) AS
    SELECT class.id, class.name, class.class_time, class.location,
           location.name, address.id, address.city, address.state,
           address.zip, person.last_name
    FROM class, location, address, instructors, person
    WHERE class.location = location.id
      AND location.address = address.id
      -- Not what I want:
      AND instructors.person = person.id
      AND instructors.class = class.id;

I'm completely happy to just fetch one of the instructors, and don't care which one. I just need only one row per class. (I assume that's my hint right there.) I can select a single instructor from a given class like:

SELECT person.id
FROM instructors, person
WHERE instructors.class = 555
  AND person.id = instructors.person
LIMIT 1;

So I thought I might be able to add that as a subselect to the VIEW, but I have not been able to make it work. I suspect I'm missing something obvious. Thanks.

Oh BTW -- If I do a count(*) and a WHERE that only includes columns in the "class" table on the VIEW, will Postgresql still do the joins? Or will it only do the select on the "class" table? I suspect it will do the joins to make sure the relations can be found.
-- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
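For what it's worth, one way to guarantee a single row per class is to move the instructor lookup into a correlated scalar subselect in the SELECT list rather than joining instructors in the WHERE clause. A sketch against the schema above, trimmed to a few columns:

```sql
CREATE VIEW class_list (id, name, class_time, instructor_name) AS
    SELECT class.id, class.name, class.class_time,
           -- Correlated scalar subselect: yields at most one
           -- instructor per class, chosen arbitrarily by LIMIT 1.
           (SELECT person.last_name
            FROM instructors, person
            WHERE instructors.class = class.id
              AND person.id = instructors.person
            LIMIT 1)
    FROM class;
```

Because the subselect appears in the SELECT list, the view emits exactly one row per class even when several instructors are linked, and NULL when none are.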
[GENERAL] Optimize a VIEW question
I'm just starting out using VIEWs -- and mostly new to postgresql -- and I'm trying to learn a little about reading the EXPLAIN ANALYZE output. I have the following view set up and I'm wondering where to create indexes, and mostly whether I'm doing anything in an incorrect or very expensive way.

CREATE VIEW class_list (
    id, name, class_time, location, workshop, review_mode, workshop_group,
    location_name, address, city, state, zip, region, region_name
) AS
    SELECT class.id, class.name, class.class_time, class.location,
           class.workshop, class.review_mode, class.workshop_group,
           location.name, address.id, address.city, address.state,
           address.zip, region.id, region.name
    FROM class, location, address, region
    WHERE class.location = location.id
      AND location.address = address.id
      AND location.region = region.id;

I'm not clear about the Seq Scan below. The region table is quite small, so am I correct that is why the planner is doing a seq scan on that table?

\d region

          Table "public.region"
   Column   |  Type   | Modifiers
------------+---------+---------------------------------------------------------
 id         | integer | not null default nextval('public.region_id_seq'::text)
 active     | boolean | not null default true
 sort_order | integer | not null default 1
 name       | text    | not null
Indexes:
    "region_pkey" primary key, btree (id)
    "region_name_key" unique, btree (name)

EXPLAIN ANALYZE select * from class_list where workshop = 28;

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=51.78..93.07 rows=9 width=157) (actual time=1.306..1.468 rows=6 loops=1)
   ->  Hash Join  (cost=51.78..76.87 rows=8 width=129) (actual time=1.245..1.299 rows=6 loops=1)
         Hash Cond: ("outer".id = "inner".region)
         ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=36) (actual time=0.016..0.027 rows=10 loops=1)
         ->  Hash  (cost=51.76..51.76 rows=8 width=97) (actual time=1.019..1.019 rows=0 loops=1)
               ->  Hash Join  (cost=26.68..51.76 rows=8 width=97) (actual time=0.201..1.007 rows=6 loops=1)
                     Hash Cond: ("outer".id = "inner"."location")
                     ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=44) (actual time=0.014..0.694 rows=104 loops=1)
                     ->  Hash  (cost=26.66..26.66 rows=7 width=57) (actual time=0.150..0.150 rows=0 loops=1)
                           ->  Index Scan using class_workshop_index on "class"  (cost=0.00..26.66 rows=7 width=57) (actual time=0.057..0.137 rows=6 loops=1)
                                 Index Cond: (workshop = 28)
   ->  Index Scan using address_pkey on address  (cost=0.00..2.01 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=6)
         Index Cond: ("outer".address = address.id)
 Total runtime: 1.853 ms
(14 rows)

By the way -- at one point I managed to hang postgresql (7.4.8-16 on Debian Sid). I have not been able to make it happen again, but it seemed odd.

(gdb) bt
#0  0x081e51ee in tuplestore_gettuple ()
#1  0x0810c7f0 in ExecMaterial ()
#2  0x08102cb2 in ExecProcNode ()
#3  0x0810d8d5 in ExecNestLoop ()
#4  0x08102ceb in ExecProcNode ()
#5  0x081093a4 in ExecAgg ()
#6  0x08102c79 in ExecProcNode ()
#7  0x08101ecc in ExecutorRun ()
#8  0x0816f58b in PortalSetResultFormat ()
#9  0x0816f8c7 in PortalRun ()
#10 0x0816da9f in PostgresMain ()
#11 0x08148b4e in ClosePostmasterPorts ()
#12 0x0814a4e1 in PostmasterMain ()
#13 0x0811c2e7 in main ()

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Input timestamp from epoch?
I'm entering data into a Postgresql database where the input dates are currently in unix epoch values. Is there a way for Postgresql to parse that epoch into a "timestamp(0) with time zone"?

I read section 8.5 "Date/Time Types" and I can see I can input an (well THE) epoch, and I can EXTRACT(EPOCH FROM [timestamp]), but I'm not seeing where I can use an epoch value as input for a timestamp(0) with time zone.

I'm not convinced that I want to store the values as timestamps. My time values are all within the epoch range, so that's not an issue. They represent a point in time (an event), and I need to display them in various timezones (depending on where the event is happening). And the epoch is reasonably easy to work with. I guess a timestamp(0) with time zone is basically the same thing -- but gives me date operations in Postgresql. Probably faster for my client application to parse epoch values from the database, though.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
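For reference, an epoch value can be converted to a timestamptz in SQL by adding the seconds to the epoch origin; a sketch (1124926538 is just a sample value, and the to_timestamp() form only exists on newer releases):

```sql
-- Interval arithmetic from the epoch origin; works on old releases too:
SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 1124926538 * INTERVAL '1 second';

-- Newer releases provide a direct conversion function:
SELECT to_timestamp(1124926538);
```

Either result can then be rendered in a per-event zone with AT TIME ZONE when displaying it.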
Re: [GENERAL] Waiting on a transaction
On Thu, Aug 18, 2005 at 01:33:18PM -0400, Vivek Khera wrote: Hey Vivek! Nice to hear from you over here. ;) > The trick is dealing with statement timeouts on shared pool > connections over mod_perl and Apache::DBI. I haven't satisfied > myself yet that the timeout will be unset when the next connection > uses the DB... You mean other than setting "alarm 0;"? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] How to "ping" the database
On Wed, Aug 17, 2005 at 04:25:48PM -0400, Tom Lane wrote:
> Of course, this begs the question of what ping is really supposed to
> test and when it's supposed to be allowable. The above will not work
> if in the middle of retrieving a query result, for example.

Well, there's that. I'm not really sure why there's a need for a ping -- and I've heard others question it, too.

Perl's DBI has a connect_cached() function that is supposed to return a cached connection if it's still alive. So that is one place "ping" is used. If ping fails then a new connection is created.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Schema design question
On Wed, Aug 17, 2005 at 07:41:20PM +, Matt Miller wrote:

Thanks for responding, Matt:

> create table course (id serial primary key,
>                      description varchar);
> create table teacher (id serial primary key,
>                       name varchar);
> create table course_teacher (course_id integer not null,
>                              teacher_id integer not null);
[...]
> create table class (id serial primary key,
>                     course_id integer not null,
>                     teacher_id integer not null,
>                     starts_on date,
>                     location varchar);

There may be more than one teacher in the class, so instead I'd need another "class_teacher" link table.

I guess what "bugged" me about this type of layout is that a course and a class share so many columns. Duplication just looks wrong -- and I worry about changing a column type on one table and forgetting to change it on the other table. I also have to remember to copy all columns every time a specific class is created.

On the other hand, if I used a single table to represent both types of entities, then selects are always going to have something like WHERE type = 'course' added onto the WHERE. That's extra processing for no good reason.

> I'm sure there are many ways to get there. To me, the way I've
> described is the most-direct way to represent the relationships you've
> described.

And thanks very much for your help.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Schema design question
I originally was planning on asking about Postgresql inheritance, but after reading the docs again I do not think that it applies to my design. So, now I'm asking a rather basic schema design question.

I have two related objects where one inherits column values from another. No, that's incorrect. The "child" receives default values from the "parent" when the child is created.

A more concrete example: tables that represent classes taught at a school. The parent object is a general course description, and the child object is a specific instance of a course -- a "class" -- which is a course taught at a given time and location. A course can be taught multiple times, obviously. A course (and thus a class) can have multiple instructors -- a many-to-many relationship. So I have a link table for that. A class normally uses the course's default instructors, but they may be different for specific class instances.

How would you lay out the tables for something like this? I can think (out loud) of three ways to set this up:

1) Separate tables for "course" and "class", and when a class is created simply copy column data from the course to the class.

   Pro: Selects are simple.
   Con: Column duplication in the two tables -- the two tables look a lot alike.
        Need to have duplicate link tables (one pointing to each table).

2) Create a third "common_values" table that both the "course" and "class" tables reference. Then when creating a class from a course, clone the common values row to a new row that the class can reference.

   Pro: No duplication of columns in multiple tables.
        Only need one linking table for instructors (but still need to create new links when creating the new row).
   Con: Need to always do joins on selects (not really a problem).

3) Create a single table with a flag to indicate if the row is a "course" or a "class".

   Pro: Simple selects and no column duplication between tables.
   Con: Columns for a course might be ok as NULL, but would be required for a specific class.
Again, a "course" and "class" are very similar. But, once a class is created from a course it really is its own entity. For example, if the course description changes in the future I don't want it to change on previous classes. There also needs to be a link between the two. For example, you might want to show a list of courses, and then see what classes are scheduled for a given course, so a class should reference its parent course. Thanks very much, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
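Option 2 above could be sketched like this (all table and column names invented for illustration):

```sql
-- Shared columns live in one place; course and class each point at
-- their own private copy of a common_values row.
CREATE TABLE common_values (
    id          SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);

CREATE TABLE course (
    id     SERIAL  PRIMARY KEY,
    common INTEGER NOT NULL REFERENCES common_values
);

CREATE TABLE class (
    id         SERIAL  PRIMARY KEY,
    course     INTEGER NOT NULL REFERENCES course,  -- link to parent course
    common     INTEGER NOT NULL REFERENCES common_values,
    class_time TIMESTAMP WITH TIME ZONE
);

-- Creating a class clones the course's common row, so later edits to
-- the course description do not change already-scheduled classes:
-- INSERT INTO common_values (name, description)
--     SELECT name, description FROM common_values WHERE id = <course's common id>;
```

The clone step is what satisfies the requirement that a class becomes its own entity once created, while the course link still lets you list classes per course.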
[GENERAL] How to "ping" the database
The Perl DBI interface to Postgresql, module DBD::Pg, has a ping() method that is supposed to determine if a database connection is alive. It can be seen here (see: dbd_db_ping):

    http://search.cpan.org/src/DBDPG/DBD-Pg-1.43/dbdimp.c

It pings by calling:

    status = _result(imp_dbh, "SELECT 'DBD::Pg ping test'");

This fails when a transaction fails -- for example, when doing a serialized transaction and another session performs an update between the serialized transaction's SELECT and UPDATE. In this situation no SELECTs are allowed until a ROLLBACK. In Perl, this failure of ping results in a new database connection being created, even though the connection is still valid.

I'm about to post a bug report on DBD::Pg, but I'm wondering if anyone here could suggest a better way to implement ping() that doesn't fail just because Postgresql is not allowing SELECTs.

What I did in my code was: if ping fails, call rollback and then try ping one more time. But I'm not clear if that works in the more general case, or what might happen if the connection really is broken.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Waiting on a transaction
On Tue, Aug 16, 2005 at 08:25:25PM +0200, Martijn van Oosterhout wrote: > On Tue, Aug 16, 2005 at 11:05:35AM -0700, Bill Moseley wrote: > > I've read over the docs on Concurrency Control but still not clear > > about when transactions block other updates, and how to deal with that > > on the application level. > > > > If I do a BEGIN and an UPDATE in one psql session and then try and do > > an UPDATE in another psql session that UPDATE waits until either a > > COMMIT or ROLLBACK. > > > > Is it common for applications using Postgresql to set a timer on > > updates and abort? > > It is not normal to hold a transaction open while doing nothing. If you > always send transactions without delays the issue doesn't come up > because you never have to wait long enough for it matter. Ok, that will be the normal case. I was just wondering because I was running test code today and it hung. I wondered what was happening and found out I left a psql window open last night in the middle of a transaction. That shouldn't happen in production. So then I wondered if my application should set an alarm and timeout with an error if, by odd chance, an update hangs. Trying to be a bit more robust -- not that the application could recover, but at least it could spit out an error other than hang. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Waiting on a transaction
I've read over the docs on Concurrency Control but I'm still not clear about when transactions block other updates, and how to deal with that at the application level.

If I do a BEGIN and an UPDATE in one psql session and then try to do an UPDATE in another psql session, that UPDATE waits until either a COMMIT or ROLLBACK.

Is it common for applications using Postgresql to set a timer on updates and abort?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
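One server-side alternative to an application alarm is statement_timeout, which aborts any statement (including an UPDATE blocked behind another transaction's lock) after the given number of milliseconds. A sketch, with an invented table:

```sql
BEGIN;
-- Give up if this UPDATE stays blocked for more than 5 seconds;
-- SET LOCAL confines the setting to the current transaction.
SET LOCAL statement_timeout = 5000;
UPDATE accounts SET balance = balance - 10 WHERE id = 42;
COMMIT;
-- On timeout the statement fails with an error and the client must
-- issue a ROLLBACK before retrying.
```

This keeps the timeout policy in the database rather than relying on each client to wrap updates in its own timer.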
Re: [GENERAL] Sorting by related tables
On Mon, Aug 15, 2005 at 11:30:32PM +0200, Andreas Seltenreich wrote: > > This would be one possibility. If you don't want your application to > deal with transactions being aborted because of non-serializable > transactions, you could alternatively use explicit locking (SELECT ... > FOR UPDATE) combined with the Read Committed isolation level (the > default). SELECT FOR UPDATE locks just the rows that are selected, right? If I understand correctly, that would not work for my case because I'm updating different rows than I'm selecting. My tables are small, so I'm thinking of just manually updating all the rows in sequence to adjust the order when needed -- to make things a bit more simple. But it is a problem that I am curious about how best to solve in a scalable way. Thanks very much for your feedback. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Sorting by related tables
On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
> > 3) Oh, and I have also this for checking IF there are items in
> > "region" that are "above" the item in question -- to see IF an item
> > can or cannot be moved up in the sort order relative to others.
> >
> > SELECT id FROM __TABLE__
> > WHERE
> >     sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
> >     AND id != ?;
> >
> > If that returns any rows then I know I can call the UPDATE to move the
> > item up.
>
> I guess you want a boolean value here? SELECT EXISTS around your above
> query as a subselect should do the trick. You also want to use LIMIT 1
> in the statement, to avoid fetching unnecessary records.

Is there much of a difference between using LIMIT 1 and using an EXISTS subselect? Frankly, I'm not clear what you are specifically suggesting with EXISTS. I'm using Perl's Class::DBI object mapping module, so returning a single row is an easy way to check this as a boolean result in Perl.

> > Again, a very basic question: What method should be used to be sure
> > that nothing changes between the SELECT and the UPDATE?
>
> You can achieve that using transactions. Concurrency control is
> explained here: <http://www.postgresql.org/docs/8.0/static/mvcc.html>.

My comment was that I want to do the above SELECT and then *only* do an UPDATE if the SELECT returns at least one row. So, I should do:

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

before the SELECT. And when I UPDATE I need to be prepared to do a ROLLBACK if I get an error and repeat the process. (And, I assume, take some precaution to give up after some number of tries.) Does that seem reasonable?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
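The EXISTS form Andreas describes would look something like this, with __TABLE__ spelled out as the region table from the thread:

```sql
-- Returns a single boolean row: can this item move up?
SELECT EXISTS (
    SELECT 1 FROM region
    WHERE sort_order <= (SELECT sort_order FROM region WHERE id = ?)
      AND id != ?
    LIMIT 1
);
```

Semantically it answers the same question; the practical difference is that EXISTS always returns exactly one row containing true or false, so the client never fetches candidate rows at all.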
[GENERAL] Sorting by related tables
I have a few beginner questions about using related tables for sorting.

create table region (
    id         SERIAL PRIMARY KEY,
    name       text,
    -- order this table should be sorted in
    -- a "1" is the top sort level
    sort_order integer
);

create table city (
    id     SERIAL PRIMARY KEY,
    name   text,
    region integer REFERENCES region
);

I want a way to adjust the sort order of the "region" table ("move item up" or "move item down" in a web interface) without requiring knowledge of the existing "sort_order" for the rows in the region table (i.e. requiring them to already be in an order).

Here's my "move up" (which is a lower sort_order value) statement. (__TABLE__ is "region" and ? are $\d bind parameters)

UPDATE __TABLE__ SET sort_order =
    CASE
        -- subtract one from the item's sort, unless it's already "1"
        WHEN id = ? AND sort_order > 1
            THEN sort_order - 1
        -- for other items that are greater or equal to sort-1
        WHEN id != ? AND sort_order >=
                (select sort_order from __TABLE__ where id = ?) - 1
            THEN sort_order + 1
        -- all others, leave alone
        ELSE sort_order
    END;

This works reasonably well for small tables, but doesn't scale, and the logic likely has holes. And behavior when adding new rows to the region table is not defined.

1) How do most people do this? Use linked lists?

create table region (
    id        SERIAL PRIMARY KEY,
    name      text,
    list_head boolean, -- flag if this is the head of the linked list
    next      integer REFERENCES region
);

2) As a SQL beginner, I'm not seeing how to display rows from "city" sorted in order based on the order in the "region" table.

3) Oh, and I also have this for checking IF there are items in "region" that are "above" the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others.

SELECT id FROM __TABLE__
WHERE sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
  AND id != ?;

If that returns any rows then I know I can call the UPDATE to move the item up.
Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
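On question 2 from the post above: displaying cities in region order only takes a join plus an ORDER BY on the region's sort column. A sketch against the tables as defined:

```sql
-- Cities grouped by their region's position, alphabetical within
-- each region.
SELECT region.name AS region_name, city.name AS city_name
FROM city, region
WHERE city.region = region.id
ORDER BY region.sort_order, city.name;
```

The city table never needs its own ordering column; it simply inherits the ordering of whatever region row it references.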