[GENERAL] Select last week
I'm looking at ways to select rows with a timestamp column of "last week"
relative to the current server time.

In 8.3 there's "isoyear", so is this a good approach?

    extract( isoyear from foo_time ) = extract( isoyear from now() - interval '1 week' )
    AND extract( week from foo_time ) = extract( week from now() - interval '1 week' )

What about before isoyear was available in PostgreSQL? Find the week and
then add a week for the range?

    date_trunc( 'second', foo_time )
        BETWEEN date_trunc( 'week', now() - interval '1 week' )
        AND date_trunc( 'week', now() - interval '1 week' ) + interval '1 week' - interval '1 second'

Is there a better approach?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
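A common alternative (a sketch, not from the original post) is a half-open range on `date_trunc('week', ...)`, which avoids both the isoyear/week pair and the "- interval '1 second'" trick:

```sql
-- Sketch: half-open range over the previous ISO week. Assumes a table
-- "foo" with a timestamptz column "foo_time" (names are illustrative).
-- date_trunc('week', ...) snaps to Monday 00:00, so last week is
-- [this_week_start - 1 week, this_week_start).
SELECT *
FROM   foo
WHERE  foo_time >= date_trunc('week', now()) - interval '1 week'
AND    foo_time <  date_trunc('week', now());
```

The half-open comparison keeps sub-second timestamps intact and, unlike wrapping foo_time in extract() or date_trunc(), lets a plain index on foo_time be used.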
Re: [GENERAL] Tracking down a deadlock
I seem to have resolved this issue with a:

    LOCK TABLE account IN EXCLUSIVE MODE;

at the very start of the transaction, to prevent that transaction from
running in parallel.

The transaction has quite a few selects and inserts and a few updates.
I was hoping not to lock at the start of the transaction, which
effectively serializes that code. The update that was causing the
deadlock was just about the last command in the transaction. If I
remove that update, the deadlocks go away.

I had hoped that a LOCK near the end of the transaction (before the
UPDATE that deadlocks) would work. Oddly, it didn't: the deadlock was
then reported on the LOCK itself, plus my deadlock_timeout (60 seconds)
didn't seem to apply in that case. A mystery.

Thanks for the help,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
Re: [GENERAL] Tracking down a deadlock
On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote:
> > There are no other updates to that account table in the transaction,
> > so I'm confused how that is causing a deadlock.
>
> Is there more than one row with the target id?

No. It's a single SERIAL primary key.

> Does the account table have any foreign-key references to or from it?

Many. I had dropped all constraints, except foreign keys, from the
account table to see if that would help. (It didn't.) One CHECK
constraint involved checking the sum of two columns in the account
table, which seemed like a potential place for a deadlock. But I
didn't think the foreign keys would be a problem.

About 8 tables reference the account table, and the account table has
about 5 columns that reference other tables.

> It's sometimes possible to get a deadlock associated with trying to lock
> FK-referenced rows that several updated rows have in common.

The transaction has a number of selects and inserts not related to the
account table. There is an insert into a log table that references the
account table, though, that happens right before the update to the
account table.

I can't picture the deadlock, but I'm only familiar with the obvious
examples. What's the approach for dealing with this kind of deadlock
(assuming it's FK-related, as you suggest)? I assume at this point I
need to try explicit locking earlier in the transaction. Any
suggestions which lock to use and on what? SHARE ROW EXCLUSIVE on the
account table before issuing the update?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
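One row-level option (a sketch, not from the thread's resolution; table and column names taken from the thread, with $1 standing for the account id) is to take the exclusive row lock up front rather than locking the whole table:

```sql
BEGIN;

-- Grab the exclusive lock on the one contended account row first, so
-- later FK checks and the final UPDATE can't deadlock against a
-- concurrent transaction doing the same work in a different order.
SELECT 1 FROM account WHERE id = $1 FOR UPDATE;

-- ... the transaction's other selects/inserts, including the insert
-- into the log table that references account ...

UPDATE account SET foo = 123 WHERE id = $1;

COMMIT;
```

This serializes only transactions touching the same account row, instead of all writers to the table as LOCK TABLE ... IN EXCLUSIVE MODE does.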
Re: [GENERAL] Tracking down a deadlock
On Sat, May 02, 2009 at 11:48:21AM -0400, Tom Lane wrote:
> Bill Moseley writes:
> > Not getting any nibbles, so allow me to try a short question:
> > If I have a deadlock situation (that will be reported as such by
> > Postgresql once the deadlock_timeout passes), does pg_stat_activity
> > show the queries that are blocking each other?
>
> In 8.2 or later it should do so; in prior versions there could be some
> lag involved. Another thing to keep in mind is that backends like to
> cache copies of the pg_stat_activity view --- if you are watching it
> to see what is happening, your view is only current as of the start
> of your current transaction. Or you can do pgstat_clear_snapshot()
> to force collection of new info.

Thanks Tom,

I'm not clear how to run pgstat_clear_snapshot(), but I looked again
before I ran my test script, and pg_stat_activity doesn't list any
queries waiting. So, I don't believe it's showing stale data.

Then when I run the test script (which runs the same transaction in
two processes at the same time) and get a deadlock, the same query is
shown twice, both with "waiting" set true:

    UPDATE account set foo = 123 where id = $1

And if I remove that update from the transaction I no longer have the
deadlock. So, it seems like that is the problem update.

Is Postgresql telling me that it's deadlocked on two transactions
trying to run that same update? There are no other updates to that
account table in the transaction, so I'm confused how that is causing
a deadlock.

Is there something else I can do to understand what exactly is the
reason for the deadlock?

Thanks,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
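To see which backend holds the lock a waiter is stuck on, pg_locks can be self-joined on the lock identity columns (a sketch usable on 8.2/8.3, which predate the pg_blocking_pids() convenience function):

```sql
-- Sketch: pair each ungranted lock with the granted lock on the same
-- object. Each result row shows a waiting pid and the pid holding the
-- conflicting lock.
SELECT w.pid  AS waiting_pid,
       w.locktype,
       h.pid  AS holding_pid,
       h.mode AS held_mode
FROM   pg_locks w
JOIN   pg_locks h
  ON   h.granted
 AND   NOT w.granted
 AND   h.locktype       = w.locktype
 AND   h.database       IS NOT DISTINCT FROM w.database
 AND   h.relation       IS NOT DISTINCT FROM w.relation
 AND   h.page           IS NOT DISTINCT FROM w.page
 AND   h.tuple          IS NOT DISTINCT FROM w.tuple
 AND   h.transactionid  IS NOT DISTINCT FROM w.transactionid;
```

Joining the resulting pids against pg_stat_activity.procpid then shows the blocking queries themselves.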
Re: [GENERAL] Tracking down a deadlock
Not getting any nibbles, so allow me to try a short question:

If I have a deadlock situation (that will be reported as such by
Postgresql once the deadlock_timeout passes), does pg_stat_activity
show the queries that are blocking each other? I'm wondering if I'm
misinterpreting what I'm seeing below.

On Thu, Apr 30, 2009 at 10:30:26AM -0700, Bill Moseley wrote:
> I need a bit of help understanding what might be causing a deadlock.
>
> To duplicate the problem I'm running a test script that forks two
> child processes. Each child runs the same transaction and thus the
> order of execution is exactly the same. (i.e. not like the typical
> deadlock where the order of updates might be reversed between two
> sessions.)
>
> The transaction inserts a new document into a document management
> system. The transaction does a number of selects and inserts. At the
> end of the transaction they both try to update the same row in the
> "account" table.
>
> It does not happen every time I run my test script -- but if I run it
> enough I get a deadlock. If I fork more child processes I can make it
> happen more often. So, it does seem like a timing issue.
>
> No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
> I'm running in the default "read committed" isolation level.
> The initial problem was reported on PostgreSQL 8.3.5, but
> I'm now testing on PostgreSQL 8.2.9.
>
> I've set my deadlock_timeout high so I can review the locks.
> I see these entries:
>
> select * from pg_locks where not granted;
>
>    locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
> --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  transactionid |         |          |      |       |      18410123 |         |       |          |    18410135 | 13420 | ShareLock     | f
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
> (2 rows)
>
> select * from pg_locks where locktype='tuple';
>
>    locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
> --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 | ExclusiveLock | t
>  tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
> (2 rows)
>
> And pg_stat_activity shows two of the exact same queries in "waiting"
> state. The "current_query" is just:
>
>     UPDATE account set foo = 123 where id = $1
>
> and $1 is indeed the same for both.
>
> If I comment out that update to the "account" table from the
> transaction I never get a deadlock.
>
> Maybe I'm missing something, but that by itself doesn't seem like a
> deadlock situation.
>
> The "account" table does have a number of constraints, and one looks
> like:
>
>     CHECK( ( foo + bar ) <= 0 );
>
> Could those be responsible? For a test I dropped all the constraints
> (except foreign keys) and I'm still getting a deadlock.
>
> In general, do the constraints need to be deferrable and then defer
> constraints at the start of the transaction?
>
> What else can I do to debug?

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
[GENERAL] Tracking down a deadlock
I need a bit of help understanding what might be causing a deadlock.

To duplicate the problem I'm running a test script that forks two
child processes. Each child runs the same transaction and thus the
order of execution is exactly the same. (i.e. not like the typical
deadlock where the order of updates might be reversed between two
sessions.)

The transaction inserts a new document into a document management
system. The transaction does a number of selects and inserts. At the
end of the transaction they both try to update the same row in the
"account" table.

It does not happen every time I run my test script -- but if I run it
enough I get a deadlock. If I fork more child processes I can make it
happen more often. So, it does seem like a timing issue.

No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
I'm running in the default "read committed" isolation level.
The initial problem was reported on PostgreSQL 8.3.5, but
I'm now testing on PostgreSQL 8.2.9.

I've set my deadlock_timeout high so I can review the locks.
I see these entries:

    select * from pg_locks where not granted;

       locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
    --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
     transactionid |         |          |      |       |      18410123 |         |       |          |    18410135 | 13420 | ShareLock     | f
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
    (2 rows)

    select * from pg_locks where locktype='tuple';

       locktype   | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |     mode      | granted
    --------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 | ExclusiveLock | t
     tuple         | 2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 | ExclusiveLock | f
    (2 rows)

And pg_stat_activity shows two of the exact same queries in "waiting"
state. The "current_query" is just:

    UPDATE account set foo = 123 where id = $1

and $1 is indeed the same for both.

If I comment out that update to the "account" table from the
transaction I never get a deadlock.

Maybe I'm missing something, but that by itself doesn't seem like a
deadlock situation.

The "account" table does have a number of constraints, and one looks
like:

    CHECK( ( foo + bar ) <= 0 );

Could those be responsible? For a test I dropped all the constraints
(except foreign keys) and I'm still getting a deadlock.

In general, do the constraints need to be deferrable and then defer
constraints at the start of the transaction?

What else can I do to debug?

Thanks,

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt
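For reference, identical transactions can in fact deadlock through a foreign key on 8.x, because an FK check takes a row-level share lock on the referenced row. A sketch of the interleaving (names are illustrative; "log" stands for any table with an FK to "account"):

```sql
-- Behavior of PostgreSQL 8.x: an INSERT into a referencing table runs
-- an implicit SELECT ... FOR SHARE on the referenced account row.
--
-- Session A: INSERT INTO log (account_id) VALUES (1);
--            -- FK check: share lock on account row 1 (granted)
-- Session B: INSERT INTO log (account_id) VALUES (1);
--            -- share lock on the same row (granted; share locks coexist)
-- Session A: UPDATE account SET foo = 123 WHERE id = 1;
--            -- needs exclusive lock; waits on B's share lock
-- Session B: UPDATE account SET foo = 123 WHERE id = 1;
--            -- needs exclusive lock; waits on A -> deadlock detected
```

So the statement order is the same in both sessions, yet each session holds a share lock the other must wait for before upgrading to an exclusive lock.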
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On Tue, Sep 18, 2007 at 11:26:17AM +0300, Marko Kreen wrote:
> Few bytes being same is normal. Those are PGP packet header,
> telling "this is symmetrically encrypted session key packet,
> with length X" plus some more details.

I see. So, you are saying there's no need to generate my own IV to
prepend to the plain text before encrypting.

> If more than a few bytes are same, and if the salt is not randomised,
> it _could_ be a sign of a problem. Either a pgcrypto bug or failure
> to initialize the random generator. If you suspect a problem, please
> send me a few example encryptions with keys and your setup details
> (postgres version, openssl or not, os version)

No, it was only a few bytes that were similar, so the headers explain
that.

> > Besides following the PCI DSS and external audit procedures, the plan
> > is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a
> > credit card storage server. The server and db are SSL only and the
> > key is passed from the application and never stored anyplace (except
> > in memcached on other servers during the session). The key is a
> > user's plain text password plus an application-specific secret. So,
> > each row has its own key. Passwords must be changed periodically,
> > etc.
>
> I don't know details of your setup, but I strongly suggest you
> look into using public-key crypto. That allows you separate keys
> for encryption and decryption. So in a webserver where users only
> input credit cards, you keep only public keys, so anybody cracking
> that won't be able to decrypt data.

I need to look at that more. But I've seen that suggested where one
needs to decrypt the data at a later time. We don't have that need.
Our plan was to never store any keys.

Every user must log in to the application with a password. Their
account passwords are only stored hashed on disk, so we don't know
their passwords. The plan is to encrypt their plain-text password with
a secret known only by the application and store that in memcached.
It's this plain-text password that will be sent to a separate server
to encrypt (and decrypt) their credit card data when the user makes a
transaction.

We only need to store the credit card data to allow subsequent charges
to their "card on file" -- and that only happens when a user logs in
and processes a transaction. We don't have any way to decrypt the data
without this password stored in the session.

If someone hacks an application server they could pluck active users'
passwords from memcached and also find the application's secret word.
Then if they also hacked the credit card server they could decrypt the
data using the passwords they were able to sniff.

See any glaring holes?

Thanks for the help!

-- 
Bill Moseley
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
I'm just starting with pgcrypto, and I'm curious if it's
needed/recommended to use an initialization vector/value (IV) with the
pgp_sym_encrypt() function.

The docs hint that an IV is used automatically, but encrypting plain
text that starts the same seems to result in initial common cipher
text. So, I'm not clear. The docs say:

    2. Data is prefixed with block of random bytes. This is equal to
       using random IV.

So, I'm currently generating a substring of an md5 hash of a few items
and prepending that to the plain text I need to encrypt as the IV.
Then when I decrypt I remove that prefix.

BTW, this is for credit card storage, which is a business requirement.
Besides following the PCI DSS and external audit procedures, the plan
is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a
credit card storage server. The server and db are SSL only and the key
is passed from the application and never stored anyplace (except in
memcached on other servers during the session). The key is a user's
plain text password plus an application-specific secret. So, each row
has its own key. Passwords must be changed periodically, etc.

I'd welcome any comments or recommendations from others that have
implemented something similar.

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
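The behavior the docs describe can be checked directly: pgp_sym_encrypt() generates a random session key and random data prefix internally, so no hand-rolled IV is needed. A quick sketch (requires the pgcrypto contrib module; values are illustrative):

```sql
-- Encrypting the same plaintext twice yields different ciphertext;
-- only the few PGP packet-header bytes match.
SELECT pgp_sym_encrypt('4111111111111111', 'secret-key',
                       'cipher-algo=aes256') AS ct1,
       pgp_sym_encrypt('4111111111111111', 'secret-key',
                       'cipher-algo=aes256') AS ct2;

-- Round trip: decryption needs only the passphrase, since the session
-- key and random prefix travel inside the PGP message itself.
SELECT pgp_sym_decrypt(
           pgp_sym_encrypt('4111111111111111', 'secret-key',
                           'cipher-algo=aes256'),
           'secret-key');
```

This also means the manual prepend-an-md5-prefix scheme adds no security and can simply be dropped.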
[GENERAL] Schema sanity check
I have an email client application where a person can have many
mailboxes. Thus, I have "person" and "mailbox" tables:

    create table person (
        id      SERIAL PRIMARY KEY,
        name    text
    );

    create table mailbox (
        id      SERIAL PRIMARY KEY,
        name    text,
        owner   integer NOT NULL REFERENCES person ON DELETE CASCADE,
        UNIQUE ( owner, id )  -- see below
    );

Now, a person might like to have a default mailbox that opens up when
they start the application. So I add a column to the person table:

    ALTER TABLE person ADD COLUMN default_mailbox integer
        REFERENCES mailbox ON DELETE SET NULL;

Of course, I want to make sure that the person actually owns that
mailbox, so I add a constraint (which is why the UNIQUE is required
above):

    ALTER TABLE person ADD CONSTRAINT default_mailbox_owner
        FOREIGN KEY (id, default_mailbox) REFERENCES mailbox(owner, id);

Is this a sane way to set up a "default mailbox"? The other option is
to have a column on the mailbox table to flag that it is a
default_mailbox -- but then I'd have to ensure there's only one row
for each "person" flagged that way.

Two more related questions:

First, if I delete a default mailbox, the default_mailbox column will
be set to NULL. If instead I never delete a mailbox but rather add a
boolean column "deleted", ON DELETE is no longer any help. Is my only
option to use a trigger to set to NULL any default_mailbox column(s)
that reference the mailbox when it is set "deleted"?

Second question: after a while the obvious problem happens and users
have too many mailboxes, and they want a way to group them into
"mailbox_groups" that are containers for mailboxes. So, we create a
new table and alter the mailbox table.
Each user has their own set of mailbox groups, so I include an "owner"
column:

    create table mailbox_group (
        id      SERIAL PRIMARY KEY,
        name    text,
        owner   integer NOT NULL REFERENCES person ON DELETE CASCADE
    );

    ALTER TABLE mailbox ADD COLUMN mailbox_group int NOT NULL
        REFERENCES mailbox_group(id);

Now, I'm wondering about the sanity of the design, since this results
in "owner" columns on both the mailbox and mailbox_group tables. Do I
add a constraint to make sure that mailbox.mailbox_group references a
group that has a matching owner? Or do I remove the "owner" column
from the mailbox table and alter all my access to mailbox to do a join
with the mailbox_group table (to find the owner)?

(Or do I wonder why I didn't expose the database only through views in
the first place?)

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
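For what it's worth, the same UNIQUE(owner, id) trick already used for default_mailbox can enforce the matching-owner rule declaratively (a sketch against the tables above; constraint names are illustrative):

```sql
-- Let (owner, id) on mailbox_group be a composite FK target.
ALTER TABLE mailbox_group ADD UNIQUE ( owner, id );

-- A mailbox may only point at a group with the same owner.
ALTER TABLE mailbox
    ADD CONSTRAINT mailbox_group_same_owner
    FOREIGN KEY ( owner, mailbox_group )
    REFERENCES mailbox_group ( owner, id );
```

This keeps the redundant "owner" column on mailbox but makes the redundancy impossible to get out of sync.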
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 09:20:53PM +0200, PFC wrote:
> > From an outside perspective it just seems odd that potentially a
> > large amount of data would be pulled off disk into memory that is
> > never used. Perhaps there's an overriding reason for this.
>
> Yeah, where would you put this data if you didn't put it where it is
> now?

Swish-e isn't a database by any means, but it does have a way to store
column-like meta data for each "row". When it does a search it only
explicitly pulls from disk the meta data that it's asked to return.
Granted, the OS is reading from disk more than the application is
asking for, but the application is only allocating memory for the data
it's going to return. And the column (meta data) is not always stored
together on disk.

Without knowing Pg internals I wasn't aware of how the actual table
data was organized and fetched into memory.

> "Premature optimization is the root of all evil"

Exactly what prompted this thread. ;)

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 01:42:18PM -0400, Chris Browne wrote:
> [EMAIL PROTECTED] (PFC) writes:
> >> SELECT o.id
> >> FROM order o
> >> JOIN customer c on o.customer = c.id
> >>
> >> Does that bring into memory all columns from both order and customer?
> >> Maybe that's not a good example due to indexes.
> >
> > No, it just pulls the columns you ask from the table, nothing
> > less, nothing more.
>
> That's not quite 100% accurate.
>
> In order to construct the join, the entire pages of the relevant
> tuples in tables "order" and "customer" will need to be drawn into
> memory.
>
> Thus, if there are a whole bunch of columns on each table, the data in
> those extra columns (e.g. - all columns aside from "id", the one that
> was asked for in the result set) will indeed be drawn into memory.

Is that specific to Postgresql? From an outside perspective it just
seems odd that potentially a large amount of data would be pulled off
disk into memory that is never used. Perhaps there's an overriding
reason for this.

> If you alter tables "customer" and "order", taking some columns off,
> and stowing them in separate tables, then you'll find that more tuples
> of "customer" and "order" will fit into a buffer page, and that the
> join will be assembled with somewhat less memory usage.
>
> Whether or not that is a worthwhile change to make will vary
> considerably.

Makes designing the schema a bit tough. ;)

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 07:51:44AM +0200, Dawid Kuroczko wrote:
> On 5/15/07, Bill Moseley <[EMAIL PROTECTED]> wrote:
> > On Tue, May 15, 2007 at 06:33:26AM +0200, Dawid Kuroczko wrote:
> >> Well, views are not going to help with memory consumption here.
> >> It is the table contents that gets cached in buffer cache, not the
> >> views contents. So if you have a view which returns only one
> >> column from a 15-column table, you will be caching that 15-column
> >> data nonetheless. View, as the name states, is converted into
> >> a select on a real table.
> >
> > Are you saying that in Postgresql:
> >
> >     select first_name, last_name from user_table;
> >
> > uses the same memory as this?
> >
> >     select first_name, last_name,
> >         password, email,
> >         [10 other columns]
> >     from user_table;
>
> Yes. You read a whole page (8KB) into the buffer cache,
> then extract these columns from that buffer. From the
> buffer cache point of view, the whole tuple is contained in the
> cache.

Sorry, I don't mean to drag this thread out much longer. But, I have
one more question regarding joins.

Say I have a customer table and an order table. I want a list of all
order ids for a given customer:

    SELECT o.id
    FROM order o
    JOIN customer c on o.customer = c.id

Does that bring into memory all columns from both order and customer?
Maybe that's not a good example due to indexes.

See, I've seen this splitting of one-to-one tables a number of times
(such as the user and user_preferences example) and I'm not sure if
that's just poor schema design, premature optimization, or someone
making smart use of their knowledge of the internal workings of
Postgresql.

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
Can anyone provide input on this question? I'm curious how to look at
this from a disk and memory usage perspective.

Would using a bit column type help much? I'm not thrilled by the loss
of referential integrity.

On Mon, May 14, 2007 at 01:37:18PM -0700, Bill Moseley wrote:
> Say I have a table "color" that has about 20 different rows ("red",
> "blue", "green", etc.). I want the user to be able to select zero or
> more favorite colors. I would typically use a link table:
>
>     create table favorite_colors (
>         color   int references color(id),
>         user    int references user(id)
>     );
>
> Now, that table can have a large number of rows if I have a large
> number of users and if everyone likes all the colors.
>
> For some value of "large", is there a time when one might consider
> using a single column in the user or user_prefs table to represent
> their color choices instead of a link table?
>
>     table user_prefs (
>         ...
>         favorite_colors bit varying,
>         ...
>     );
>
> Where each bit represents the primary key of the colors table.
>
> Seems like poor design, but I'm wondering if there might be overriding
> concerns at times.
>
> For example, if I have 1 million users and they each like all colors
> and thus have a 20 million row link table, how much space would be
> saved by using a bit column as above?

-- 
Bill Moseley
[EMAIL PROTECTED]
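A back-of-envelope sketch of the space question (not from the thread; the per-tuple overhead figure is approximate for 8.x-era PostgreSQL):

```sql
-- Link table: 20M rows x (~28 bytes tuple header + 8 bytes for two
-- int columns) is roughly 700 MB of heap, before indexes.
-- Bit column: 1M user rows each gain a ~20-bit varbit value -- a few
-- bytes of varbit overhead plus 3 data bytes per row.
--
-- An individual value's on-disk size can be measured directly:
SELECT pg_column_size(B'10000000000000000001'::bit varying);
```

So the bit column wins dramatically on space in the everyone-likes-everything worst case, at the cost of referential integrity and easy per-color queries, which is the trade-off being weighed above.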
Re: [GENERAL] Performance issues of one vs. two split tables.
On Tue, May 15, 2007 at 06:33:26AM +0200, Dawid Kuroczko wrote:
> Well, views are not going to help with memory consumption here.
> It is the table contents that gets cached in buffer cache, not the
> views contents. So if you have a view which returns only one
> column from a 15-column table, you will be caching that 15-column
> data nonetheless. View, as the name states, is converted into
> a select on a real table.

Are you saying that in Postgresql:

    select first_name, last_name from user_table;

uses the same memory as this?

    select first_name, last_name,
        password, email,
        [10 other columns]
    from user_table;

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Performance issues of one vs. two split tables.
On Mon, May 14, 2007 at 10:52:13PM +0200, PFC wrote:
> > Say I have a "user" table that has first, last, email, password, and
> > last_accessed columns. This user table will be accessed often. (It's
> > not really "user", but that's not important in this discussion)
> >
> > Say that there's also about 10 columns of settings or preferences for
> > each user. Are there any cases or reasons to have a separate
> > "user_preferences" table vs. just placing all the columns together in
> > one table?
>
> I did something like that on MySQL some time ago.
>
> In the Users table there was stuff that other users need to see
> (like his login name, etc), and stuff that only this user needs to
> see (like his preferences).
>
> So, when displaying posts in the forum, for instance, only a small
> part of the fields in the Users table was needed; the rest was just
> dead weight that made the table unable to fit in RAM.

Well, that's part of my question. If the common selects don't select
those columns, how much "dead weight" is brought along due to the
extra columns defined in the table, if any?

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] Performance issues of one vs. two split tables.
Is there any benefit to splitting up a table into two tables that will
always have a one-to-one relationship?

Say I have a "user" table that has first, last, email, password, and
last_accessed columns. This user table will be accessed often. (It's
not really "user", but that's not important in this discussion.)

Say that there's also about 10 columns of settings or preferences for
each user. Are there any cases or reasons to have a separate
"user_preferences" table vs. just placing all the columns together in
one table?

Another related question:

Say I have a table "color" that has about 20 different rows ("red",
"blue", "green", etc.). I want the user to be able to select zero or
more favorite colors. I would typically use a link table:

    create table favorite_colors (
        color   int references color(id),
        user    int references user(id)
    );

Now, that table can have a large number of rows if I have a large
number of users and if everyone likes all the colors.

For some value of "large", is there a time when one might consider
using a single column in the user or user_prefs table to represent
their color choices instead of a link table?

    table user_prefs (
        ...
        favorite_colors bit varying,
        ...
    );

Where each bit represents the primary key of the colors table.

Seems like poor design, but I'm wondering if there might be overriding
concerns at times. For example, if I have 1 million users and they
each like all colors and thus have a 20 million row link table, how
much space would be saved by using a bit column as above?

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Trigger for Audit Table
On Fri, Mar 09, 2007 at 01:27:51PM -0800, [EMAIL PROTECTED] wrote:
> You can/should create it as an AFTER UPDATE trigger. The OLD row will
> contain the previous values.

Curiously, it also works with a BEFORE UPDATE. Off to review the docs.

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Trigger for Audit Table
On Fri, Mar 09, 2007 at 06:50:39PM -0500, Tom Lane wrote:
> Bill Moseley <[EMAIL PROTECTED]> writes:
> > I'm asking for a sanity check:
>
> > And then an audit table:
>
> >     create table template_history (
> >         id                  SERIAL PRIMARY KEY,
> >         template_id         integer NOT NULL REFERENCES template ON DELETE CASCADE,
> >         path                text NOT NULL,
> >         content             text NOT NULL,
> >         last_updated_time   timestamp(0) with time zone NOT NULL
> >     );
>
> Why would you want ON DELETE CASCADE? Or for that matter to have a
> foreign key here at all? Surely the point of an audit table is to
> remember history. If the audit entries all disappear the instant
> the main-table entry is deleted, it's not much of an audit tool.

In this case the templates are short-lived, so I only want to track
history while the "live" record is around. That is, it's expected that
the template and all its history will get wiped out once in a while.
Still, I agree with you that it would be better to use a different
approach and not cascade delete.

The foreign key is there so I can find the history related to the
primary record. That's what ties the history records together (the
path can change during the life of the template).

> > My trigger is very simple:
>
> >     CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
> >     BEGIN
> >         INSERT INTO template_history
> >             ( template_id, path, content, last_updated_time, person )
> >         select
> >             id, path, content, last_updated_time, person
> >         from
> >             template where id = 1;
> >         RETURN NEW;
> >     END'
> >     language 'plpgsql';
>
> This is not going to work because the row's not there yet.

This is a BEFORE *UPDATE* trigger, not a BEFORE INSERT, so the row is
there. The audit table is written when the primary record changes, and
the old version is written to the audit table, not the new version.
Yes, it's more common to write the audit row for every insert and
update (so the most recent version is also in the audit table).

> (I won't bother pointing out the thinko in the WHERE clause);

Darn cut-n-paste errors.

> and even if it did
> work it'd be unnecessarily inefficient. Just use the NEW row that's
> passed to the trigger:
>
>     INSERT INTO template_history(...) VALUES(NEW.id, NEW.path, ...)

Ok, but the id is a sequence. I need to test if NEW.id is set after
the insert -- seems like not, IIRC, and I'd need to use currval().

> If you have other BEFORE triggers on this table that can change the
> NEW row, then it might be better to make this an AFTER trigger so it can
> be sure the NEW row it sees won't change anymore. But AFTER triggers
> are distinctly less efficient, so if you're not intending to add more
> triggers then using a BEFORE trigger is probably the way to go.

It's just that pesky sequence I need access to after the insert
happens.

Thanks for the tips, Tom.

-- 
Bill Moseley
[EMAIL PROTECTED]
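Since the goal here is saving the pre-update version of the row, the OLD record (available in any row-level UPDATE trigger) sidesteps both the re-select and its WHERE-clause thinko. A sketch based on the tables in this thread (the "person" column is assumed from the trigger above, as it isn't in the posted DDL):

```sql
CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
BEGIN
    -- Save the row as it was before this UPDATE. OLD is already in
    -- memory, so no extra SELECT is needed, and template_history.id
    -- fills itself from its own sequence.
    INSERT INTO template_history
        ( template_id, path, content, last_updated_time, person )
    VALUES
        ( OLD.id, OLD.path, OLD.content, OLD.last_updated_time, OLD.person );
    RETURN NEW;
END' LANGUAGE 'plpgsql';

CREATE TRIGGER template_history_add
    BEFORE UPDATE ON template
    FOR EACH ROW EXECUTE PROCEDURE audit_template();
```

Using OLD rather than NEW keeps the "archive the previous version" semantics while still following Tom's advice to use the trigger's row variables instead of re-querying the table.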
[GENERAL] Trigger for Audit Table
I'm asking for a sanity check: This is a very simple audit table setup where I use a BEFORE UPDATE trigger to save an existing record. The table stores templates (for a CMS) and looks something like this:

    create table template (
        id                  SERIAL PRIMARY KEY,
        path                text UNIQUE NOT NULL,
        content             text NOT NULL,
        last_updated_time   timestamp(0) with time zone NOT NULL default now()
    );

And then an audit table:

    create table template_history (
        id                  SERIAL PRIMARY KEY,
        template_id         integer NOT NULL REFERENCES template ON DELETE CASCADE,
        path                text NOT NULL,
        content             text NOT NULL,
        last_updated_time   timestamp(0) with time zone NOT NULL
    );

(The "path" is not the primary key because the template's path might get renamed (moved), but I still want to track its history.)

My trigger is very simple:

    CREATE OR REPLACE FUNCTION audit_template() RETURNS TRIGGER AS '
        BEGIN
            INSERT INTO template_history
                ( template_id, path, content, last_updated_time, person )
            select
                id, path, content, last_updated_time, person
            from
                template where id = 1;
            RETURN NEW;
        END'
    language 'plpgsql';

    CREATE TRIGGER template_history_add BEFORE UPDATE ON template
        for each row execute procedure audit_template();

I realize this is a *BEFORE* UPDATE trigger, but I have this vague memory of seeing a post stating that you can't be sure the existing row has not been updated yet. Perhaps that was just a concern if another trigger was to modify the row. But, I can't seem to find that post now, which is why I'm asking for the sanity check.

Are there potential problems with this setup?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
On Mon, Feb 12, 2007 at 10:53:53AM -0500, Merlin Moncure wrote:
> On 2/12/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
> >> Can you describe in a little bit more detail what you mean by
> >> 'Adjacency List'?
> >
> > Adjacency list is the term used in the Celko book to refer to a table that
> > is recursively related to itself.
> >
> >     create table foo (
> >         id       integer primary key,
> >         parentid integer references foo (id),
> >         name     varchar not null
> >     );
>
> Above approach is ok but I can think of at least two other methods
> that are probably better.  First approach is to just store the whole
> path in every record for each file.  Yes, this is a pain for updates
> but searching and children discovery is simple.  In that case I would
> define the pkey as (path, file).

Yes, that's what I meant by using a de-normalized table -- including the full path in the row. That would provide fast access to each row via a path name. And the parent id makes it easy to find all children of a given node and, well, the parent too. Separating the path and file as you suggest would make finding all "files" at a given directory level simple, too.

But, I'm not thrilled about the possibility of the hard-coded path not matching the path up the tree to the root node. Which, of course, is why I posted. But, I'll give it a test.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
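A minimal sketch of the approach being discussed -- an adjacency list plus a de-normalized path column (table and column names here are illustrative, not from the thread):

```sql
create table node (
    id        serial primary key,
    parent_id integer references node (id),  -- NULL for the root
    name      varchar not null,
    path      varchar not null unique        -- de-normalized full path, e.g. '/docs/intro'
);

-- Fast lookup by full path:
--   select * from node where path = '/docs/intro';
-- "Directory view": all children of a given node:
--   select * from node where parent_id = 42;
```

The trade-off raised above applies directly: when a branch is moved, every descendant's path column must be rewritten to stay consistent with the parent_id chain.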
Re: [GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
On Mon, Feb 12, 2007 at 05:36:37PM +0000, Ian Harding wrote:
> You don't mention the ltree contrib module, have you looked at it?  It
> can easily meet your requirements without having to reinvent anything.
> It may be what you're referring to as Nested Sets, I don't know.  I
> use it and like it a lot.

Yes, I have seen it. I just thought it seemed like a very large "hammer" to use for my task -- quite a few more query methods than I need. But, perhaps I should look at it again and get a better understanding of what it can do.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
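For reference, a small sketch of what the ltree contrib module mentioned above looks like in use (assuming the module is installed; the table is illustrative, the operators are from the ltree documentation):

```sql
create table doc (
    id   serial primary key,
    path ltree not null          -- dotted labels, e.g. 'top.docs.intro'
);

-- All descendants of top.docs (<@ means "is a descendant of or equal to"):
select * from doc where path <@ 'top.docs';

-- Immediate children only, via an lquery pattern (exactly one extra label):
select * from doc where path ~ 'top.docs.*{1}';
```

Note that ltree paths use dot-separated labels rather than slash-separated file-system paths, so file names would need to be mapped to valid labels.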
[GENERAL] Adjacency List or Nested Sets to model file system hierarchy?
I'm looking for a little guidance in representing a file system -- well, just the file and directory structure of a file system.

Often articles on representing a hierarchy discuss the advantages of using Nested Sets (or nested intervals), it seems. I'm not clear how well they apply to a file system-like hierarchy, though. The examples (and my limited understanding) of Nested Sets have the leaf nodes at the end of the branches, where in a file system a node can have both leaf nodes (files) and branches (directories). Also, the Nested Sets seem to solve problems I don't have -- such as finding all descendants of a given node.

My simple requirements are:

    -- Quickly be able to look up content by a full "path" name.

    -- Provide "directory" views that show the parent and a list of
       contents including any "sub-directories".

    -- Be able to easily move branches.

It will not be a large collection of "files" in the tree, so that's not an issue.

Seems like an Adjacency List along with a de-normalized "path" column in the leaf nodes would meet the requirements. But, as I see nested sets discussed so often, I wonder which is a better approach. I assume this is a reasonably common problem, so I'm curious how others have implemented it.

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Missing domain socket after reboot.
On Thu, Jun 22, 2006 at 08:16:05AM -0400, Douglas McNaught wrote:
> Bill Moseley <[EMAIL PROTECTED]> writes:
>
> > Hum.  lsof knows about the file.
> >
> > $ lsof -p 1512 | grep /var/run
> > postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432
> >
> > Any ideas what happened to the socket?
>
> Maybe something in your bootup process tried to clean up /var/run and
> deleted it after the postmaster had started?

That's what I thought, but my quick look couldn't find anything in the init scripts, not that that's conclusive:

    $ fgrep /var/run * | grep rm
    apache2:      [ -f /var/run/apache2/ssl_scache ] && rm -f /var/run/apache2/*ssl_scache*
    bootclean.sh: rm -f /var/run/.clean
    bootmisc.sh:  rm -f /tmp/.clean /var/run/.clean /var/lock/.clean
    portmap:      rm -f /var/run/portmap.upgrade-state
    portmap:      rm -f /var/run/portmap.state
    rsync:        rm -f /var/run/rsync.pid
    rsync:        rm -f /var/run/rsync.pid
    rsync:        rm -f /var/run/rsync.pid
    umountnfs.sh: rm -f /tmp/.clean /var/lock/.clean /var/run/.clean

But maybe postgresql is started too early.

    $ ls /etc/rc?.d | grep postgres | head -1
    K20postgresql-8.1
    K20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    K20postgresql-8.1

Apache, for example, starts S91.

    /etc/rc2.d:
    K10atdS20courier-imap S20mysqld-helper S21nfs-common K10cron
    S20courier-imap-ssl S20netatalk S21quotarpc K10syslog-ng S20courier-mta
    S20nfs-kernel-server S23ntp-server S10sysklogd S20courier-pop S20ntop
    S25mdadm S11klogd S20courier-pop-ssl S20oidentdS30sysctl S14pppS20darwinss
    S20postfixS89cron S15logicalS20exim4 S20postgresql-8.1 S91apache2
    S16mountnfsforlogical.sh S20grlogcheckS20rmnologin S91ifp_httpd
    S18atdS20httpd S20rsync S99jabber S18portmapS20httpd2S20saslauthd
    S99stop-bootlogd S19spamassassin S20inetd S20sshS99ud S19syslog-ng
    S20jabberS20syslog-ng S20binfmt-support S20makedev S20sysstat
    S20courier-authdaemon S20mysqldS20xmail

> Be interesting to see if you can reproduce it...

Next reboot I'll look again.
It's a production machine, so I can't really bring it up one service at a time.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
[GENERAL] Missing domain socket after reboot.
After a reboot today Postgresql 8.1 came back up and started accepting connections over TCP, but the unix socket file was missing. This is on Debian Stable, and I can't imagine what might have removed the file.

Running psql I get:

    $ psql test
    psql: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Yep, missing:

    $ ls -la /var/run/postgresql
    total 8
    drwxrwsr-x   2 postgres postgres 4096 2006-06-21 17:03 .
    drwxr-xr-x  16 root     root     4096 2006-06-21 21:10 ..

Config looks ok:

    /etc/postgresql/8.1/main$ fgrep unix_socket_dir postgresql.conf
    unix_socket_directory = '/var/run/postgresql'

Startup option:

    $ ps ux -u postgres | grep unix_socket
    postgres  1512  0.0  0.3 17564 3476 ?  S  17:02  0:00
        /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main
        -c unix_socket_directory=/var/run/postgresql
        -c config_file=/etc/postgresql/8.1/main/postgresql.conf
        -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf
        -c ident_file=/etc/postgresql/8.1/main/pg_ident.conf

Hum. lsof knows about the file.

    $ lsof -p 1512 | grep /var/run
    postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432

Any ideas what happened to the socket? I had to stop and start the postmaster to get the socket back.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Anyone install 8.1 on Debian Stable?
On Tue, Apr 25, 2006 at 06:58:20PM +0100, Gavin Hamill wrote:
> As per the instructions on the site - you use both :)

(I meant: do you pin the specific package, or use the -t option?)

> psql for both 7.x and 8.1 will use version 3 of the libpq API, so
> there's no issue like with mysql 4.0 versus 4.1's new auth system.

I see. I'll build DBD::Pg from source. The dbdpg-general list recommended linking DBD::Pg against libpq4 instead of libpq3.

> > And I assume I'll need to rebuild DBD::Pg -- and any tricks getting
>
> No rebuilding should be necessary for the same reasons above -
> however if you do need to build something, grab
> postgresql-server-dev-8.1 :)

Great. Thanks for all the hand holding. ;)

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Anyone install 8.1 on Debian Stable?
On Tue, Apr 25, 2006 at 04:57:20PM +0100, Gavin Hamill wrote: > Very simple www.backports.org :) > > They have 8.1.3 and it works perfectly.. even the -contrib package is > there for cube/earthdistance, and the -dev package is there if you want > to compile Slony, etc. > > Follow http://www.backports.org/instructions.html for > /etc/apt/preferences, and then off you go... Ok. So as someone that tried to understand pinning once and gave up, do you pin the packages or use: apt-get -t sarge-backports install postgresql method? I guess it's in the docs, but how do you use psql with both versions? And I assume I'll need to rebuild DBD::Pg -- and any tricks getting it to link with the correct client library? It's the "managed" part of the Dreamhost server that has me worried. ;) Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Anyone install 8.1 on Debian Stable?
Anyone installed 8.1 on Stable? Did you build from source or use a backport? I've got a managed dedicated server at Dreamhost and trying to decide if building from source or using a backport is a better approach. I've had problems in the past with using backports on other servers (problems showed up when later upgrading the server). And I'm a bit concerned about a package conflicting with Dreamhost's management setup. (I had 7.4 installed and after some maintenance and a reboot the package was uninstalled.) And IIRC, the backports tend to bring in a number of dependency packages. If I build from source I need to update /etc/ld.so.conf to point to /usr/local (for linking with the driver) and install my own init.d scripts -- both of which I worry about in the managed environment. Plus, the socket, logs, pid are all not in the standard debian locations. So, anyone that has been through this have any advice? There isn't an init.d script in the distribution, right? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] plpgsql and transactions
On Thu, Apr 06, 2006 at 03:48:15PM -0500, Terry Lee Tucker wrote:
> > -- fires a trigger that updates more than one table
> > insert into something (default);
> >
> > and:
> >
> > begin;
> > -- fires a trigger that updates more than one table
> > insert into something (default);
> > commit;
>
> In the latter, you have expanded the scope of the transaction; which,
> sometimes you might want to do.

Yes, I might. But I'd like to understand it so I know when I might or might not want to do it.

Say I have a BEFORE INSERT trigger that does a table lock. When is that lock released? At the end of the trigger? Or after the INSERT has completed?

For example, say I want to set a column on the row I'm inserting based on what's already in the table. So I lock the table in the trigger, check the current status of the table, and set the column based on that current status. I want to make sure that between the time the trigger completes and when the insert finally happens, another session can't also do an insert and see the same table state.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
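A sketch of the scenario described above (table and column names are illustrative). In PostgreSQL a table-level lock taken inside a trigger is held until the end of the enclosing transaction -- not the end of the trigger or of the statement -- so the gap the author worries about is covered for as long as the transaction remains open:

```sql
CREATE OR REPLACE FUNCTION set_position() RETURNS TRIGGER AS '
    BEGIN
        -- This lock is held until the surrounding transaction
        -- commits or rolls back, not just until the trigger returns.
        LOCK TABLE queue IN SHARE ROW EXCLUSIVE MODE;

        -- Derive the new column value from the table''s current state.
        SELECT coalesce(max(position), 0) + 1 INTO NEW.position FROM queue;
        RETURN NEW;
    END'
language 'plpgsql';

CREATE TRIGGER queue_position BEFORE INSERT ON queue
    FOR EACH ROW EXECUTE PROCEDURE set_position();
```

With a single-statement INSERT (its own implicit transaction), the lock therefore lasts through the INSERT itself; inside an explicit BEGIN/COMMIT it lasts until the COMMIT.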
Re: [GENERAL] plpgsql and transactions
On Thu, Apr 06, 2006 at 01:44:57PM -0500, Terry Lee Tucker wrote:
> Triggers fire inside a transaction.

Ah, thanks. Makes sense, since each statement is in an implicit transaction. Granted, it would help to see the trigger, but these are basically the same?

    -- fires a trigger that updates more than one table
    insert into something (default);

and:

    begin;
    -- fires a trigger that updates more than one table
    insert into something (default);
    commit;

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[GENERAL] plpgsql and transactions
In a BEFORE INSERT trigger, depending on input values, I need to lock a table and do a few selects. Of course, the "lock table" isn't much use if not currently in a transaction. So my question is this: can I tell if I'm inside a transaction or not and issue a BEGIN if not. And then also set a flag so that after the INSERT I can detect that I issued a BEGIN and do a COMMIT? Or, maybe better is to just throw an exception if not already inside a transaction. BTW -- it seems odd to me that you can issue a lock table outside of an explicit begin/commit and not get a warning. When would issuing a lock table outside an explicit transaction be of any use? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Table locks and serializable transactions.
I need to insert a row, but how that row is inserted depends on the number of items existing in the table. I initially thought SERIALIZABLE would help, but that only keeps me from seeing changes until the commit in that session. Am I correct that if I need to insert a row into a table that contains column info based on the state of the table I need to lock the table in "share row exclusive mode"? In my case I have a table that holds registrations, and a registration has a column "status" that can be "confirmed", "wait list", or "cancel". Any inserts should be "wait list" if the number of existing "confirmed" is > $max_confirmed OR if any rows are marked "wait list". Obviously, I don't want to let another insert happen in another session between the select and insert. So, in that case is "share row exclusive mode" the way to go? I'm not that clear how locking and serializable work together: The serializable isolation level would only be needed if I wanted to see a frozen view of other selects (on other tables) during the transaction. That is, the locked table can't have updates in other sessions due to the lock so I'll see a frozen view of that table regardless of serializable. In other words, using serializable doesn't add anything if the table is already locked in the transaction and all I'm looking at is that one locked table. Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
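A sketch of the lock-then-check pattern described above (table name follows the post; the column values and the count checks are illustrative application logic). SHARE ROW EXCLUSIVE conflicts with itself, so two sessions cannot both be between the SELECT and the INSERT at the same time:

```sql
BEGIN;
LOCK TABLE registration IN SHARE ROW EXCLUSIVE MODE;

-- Examine the table state under the lock; no other session can
-- insert until this transaction commits.
SELECT count(*) FROM registration WHERE status = 'confirmed';
SELECT count(*) FROM registration WHERE status = 'wait list';

-- Application logic: if confirmed >= max_confirmed, or any rows are
-- wait-listed, insert as 'wait list'; otherwise as 'confirmed'.
INSERT INTO registration (person, status) VALUES (42, 'confirmed');

COMMIT;
```

This matches the reasoning in the post: once the table is locked for the transaction, SERIALIZABLE adds nothing for reads of that one table, since no other session can change it before the commit.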
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 09:31:27PM -0500, Greg Stark wrote: > Anything else and the collation just won't work properly. It will be > expecting UTF-8 and be fed ISO-8859-1 strings, resulting in weird > and sometimes inconsistent sort orders. So if I have utf8 encoded text and the lc_collate is anything but utf8 then sorting will be all wrong for any chars that don't map to ASCII (>127). Kind of a mess. > There's a certain amount of feeling that using any locale other than C is > probably not ever the right thing given the current functionality. Just about > any database has some strings in it that are really just ascii strings like > char(1) primary keys and other internal database strings. You may not want > them being subject to the locale's collation for comparison purposes and you > may not want the overhead of variable width character encodings. Is the Holy Grail encoding and lc_collate settings per column? Changing topics, but I'm going to play with different cluster settings for collate. If I create a cluster in given directory is there any problems with moving that cluster (renaming the directory)? Thanks for your comments, Greg. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 01:40:09PM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > - To clarify the first point, if the database is encoded utf-8 and > > lc_collate is en_US then Postgresql does NOT try to convert utf-8 to > > 8859-1 before sorting. > > Basically, this is a horribly bad idea and you should never do it. > The database encoding should always match what the locale assumes > for its character set (unless the locale is "C", which doesn't care). What's a bad idea? Having a lc_collate on the cluster that doesn't support the encodings in the databases? > We'd enforce that you never do it if we knew a portable way to determine > the character set assumed by an LC_COLLATE setting. Again, not sure what "it" is, but I do find it confusing when the cluster can have only one lc_collate, but the databases on that cluster can have more than one encoding. That's why I was asking how postgresql handles (possibly) different encodings. Are you saying that if a database is encoded as utf8 then the cluster should be initiated with something like en_US.utf8? And then all databaes on that cluster should be encoded the same? I suspect I don't understand how LC_COLLATE works that well. I thought the locale defines the order of the characters, but not the encoding of those characters. Maybe that's not correct. I assumed the same locale should sort the same chars represented in different encodings the same way. Maybe that's not the case: $ LC_ALL=en_US.UTF-8 locale charmap UTF-8 $ LC_ALL=en_US locale charmap ISO-8859-1 $ LC_ALL=C locale charmap ANSI_X3.4-1968 -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Mixing different LC_COLLATE and database encodings
On Sat, Feb 18, 2006 at 05:20:19PM +0100, Peter Eisentraut wrote:
> > 2) What client encoding is used if the client does not specify one?
>
> the server encoding

What's the server encoding? The environment when the cluster is started? How do you find out what it's running as?

Does that mean that if the encoding is anything other than "C" then Postgresql will convert? That is, if my database is utf8 and the server is en_US, then text will be sent to the client as 8859-1? No, that's not correct, as I'm not seeing that. So I guess I'm not clear on that point.

> > 5) I suppose there's no way to answer this, short of running
> > benchmarks, but any ideas what using a lc_collate with utf-8 would do
> > to performance? Is it a big hit?
>
> I don't know why that would be a problem.

Just that sorting utf8 is a bit more work than sorting raw bytes.

Thanks for the help,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
[GENERAL] Mixing different LC_COLLATE and database encodings
I've been going through the docs and list archives trying to get clear on encoding issues, but still have a few question. Do I have these statements correct? - LC_COLLATE is set on the cluster at initdb time. From that point on all database text is sorted based on that *regardless* of the encoding set on an individual database. - So for lc_collate="C" sorting is just based on the byte values, and if lc_collate="en_US" then sorting is based on the us_US order, and the bytes are assumed to be 8859-1 (if that matters). - To clarify the first point, if the database is encoded utf-8 and lc_collate is en_US then Postgresql does NOT try to convert utf-8 to 8859-1 before sorting. - If the "client encoding" and the database encoding differ then Postgresql will convert between the two encodings during I/O. - The exception is if *either* the client or the server's encoding is "SQL_ASCII" then no client<=>server conversion is done. Sound about right? 1) What else is the database's encoding used for besides to determine how to convert text in input and output based on the client encoding? 2) What client encoding is used if the client does not specify one? For example, I'm looking through Perl's DBD::Pg and I don't see any place where it calls PQsetClientEncoding(). http://search.cpan.org/src/DBDPG/DBD-Pg-1.41/ http://www.postgresql.org/docs/7.4/interactive/multibyte.html#MULTIBYTE-TRANSLATION-TABLE 3) The vast majority of my utf-8 encoded text that I need to display sorted probably maps to 8859-1 characters. I think I already answered this above, but: Am I correct that Postgresql is *not* converting text from the database encoding to the cluster encoding before sorting? That is with "C" it's just sorting in byte order, and with en_US it's just assuming that the bytes are 8859-1 and ignoring that it's really utf-8? 
That is, if I have text that's in utf-8 but includes characters that would map to 8859-1 (say accented chars), that sorting will not be correct because it's not converted to 8859-1 when sorting?

4) If the above is true, then if I wanted my utf-8 encoded text to be sorted correctly I'd need to re-initdb using --locale=en_US.UTF-8, correct?

5) I suppose there's no way to answer this, short of running benchmarks, but any ideas what using lc_collate with utf-8 would do to performance? Is it a big hit?

Not related to Postgresql, but testing some of this is confusing due to my environment. How do I get my xterm to work with utf8? Does ssh do something with encoding? If I have a utf8 xterm window open on my machine, then ssh to the server running postgresql where the default locale is "POSIX", then running:

    LANG=en_US.utf8 psql utf8test
    utf8test=> \encoding
    UNICODE
    utf8test=> select first_name from person where last_name = 'Anderson';

Then I see:

    Zo

But, if on that same remote machine I run a unicode xterm (uxterm in Debian), then in that xterm window I do:

    utf8test=> \encoding
    UNICODE
    utf8test=> select first_name from person where last_name = 'Anderson';

    Zoë   (correct)

It's much slower running xterm remotely than using my local xterm and ssh, so it would be nice to be able to display the utf8.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
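For the "what's the server encoding?" question above, the relevant settings can be inspected from any session with standard commands (nothing thread-specific here):

```sql
SHOW server_encoding;   -- encoding the database was created with
SHOW lc_collate;        -- collation locale, fixed at initdb time
SHOW client_encoding;   -- encoding the server converts to/from for this client

-- Ask the server to convert all I/O for this session to UTF-8:
SET client_encoding TO 'UTF8';
```

If the client never sets client_encoding, it defaults to the server (database) encoding, which is why no conversion is observed in the experiment described above.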
Re: [GENERAL] Limiting with a left outer join
On Sat, Feb 11, 2006 at 12:35:34AM -0700, Michael Fuhr wrote:
> On Fri, Feb 10, 2006 at 11:59:30AM -0800, Bill Moseley wrote:
> > How do I make a join on the class table but not affect the left outer
> > join?
>
> Are you looking for something like this?
>
>     LEFT OUTER JOIN (class INNER JOIN class_domain ON ...) c ON ...

Why, yes I am. Thank you.

I'll post my select below, just in case anyone cares to review it for sanity. ;)

Something is not quite right about my schema, I fear. The idea of the domains is to limit viewing of classes and workshops to different groups of users. A given workshop may be available to more than one group. But, one problem is it's possible that a class and its parent workshop may not have a domain in common. Maybe that's something the application code needs to enforce, and not the database.

BTW -- Is there a way to turn something like this into a view? The 2 domain bind parameters will always match, and the only other input parameters are the two review mode booleans. That is, the input to the query is a domain id, and "review_mode" must be false.

    SELECT  w.id,
            count(c.id) as class_count,
            w.name as name,
            scheduled_message,
            no_scheduled_message,
            (CASE WHEN workshop_comment_end_time > now()
                  THEN workshop_comment ELSE NULL END) AS workshop_comment,
            (CASE WHEN new_workshop_end_time > now()
                  THEN '1' ELSE NULL END) AS is_new,
            w.review_mode as review_mode,
            workshop_category.name as workshop_cat

    FROM    workshop w
            INNER JOIN workshop_category ON (
                workshop_category.id = w.workshop_category
                AND w.review_mode IS FALSE )
            INNER JOIN workshop_domain ON (
                workshop_domain.workshop = w.id
                AND workshop_domain.domain = ? )
            LEFT OUTER JOIN
                (class INNER JOIN class_domain ON (
                    class_domain.class = class.id
                    AND class_domain.domain = ?
                    AND class.review_mode IS FALSE
                    AND class.register_cutoff_time >= now() )
                ) c ON ( c.workshop = w.id )

    GROUP BY 1,3,4,5,6,7,8,9, w.start_display_time, w.stop_display_time

    HAVING  ( count(c.id) > 0 )
            OR (
                (now() between w.start_display_time and w.stop_display_time)
                OR (w.stop_display_time IS NULL
                    AND -- probably don't need to check for NOT NULL here
                    w.start_display_time IS NOT NULL
                    AND w.start_display_time <= now())
                OR (w.start_display_time IS NULL
                    AND w.stop_display_time IS NOT NULL
                    AND w.stop_display_time > now())
            )

    ORDER BY w.id

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Limiting with a left outer join
I've been staring at this too long. Could someone give me a bit of SQL help?

I want to show a list of upcoming workshops, and the number of sessions (classes) offered for each workshop:

     workshop_id | classes_offered | review_mode | workshop_cat
    -------------+-----------------+-------------+--------------
               3 |               0 | t           | On Ground
              29 |               5 | f           | On Ground
              30 |               0 | f           | On Ground
              31 |               1 | f           | On Line
              61 |               3 | f           | On Ground
              62 |               2 | f           | On Ground
              63 |               1 | f           | On Line

A class is an instance of a given workshop (location and date given). A class references a workshop. Now, I'm using a LEFT OUTER JOIN to list workshops that don't have any classes assigned yet. Those are the zeros above.

Where I'm stuck is I need to apply limits to what rows to select. For example, I don't want to include classes or workshops that are in "review_mode". Also, both workshops and classes can belong to "domains" (via link tables), so I need to only look at those, too. Trying to do the "class_domain" join is where I'm stuck. Here's without that join, which seems to work:

    FROM    workshop w
        INNER JOIN workshop_category ON (
            workshop_category.id = w.workshop_category
            AND w.review_mode IS FALSE )
        INNER JOIN workshop_domain ON (
            workshop_domain.workshop = w.id
            AND workshop_domain.domain = 1 )
        LEFT OUTER JOIN class c ON (
            c.workshop = w.id
            AND c.register_cutoff_time >= now()
            AND c.review_mode IS FALSE )

The class table also has a "class_domain" table (like the workshop_domain). But, I'm not seeing how to make that join. This pulls all the zeros out of the results:

    [...]
    LEFT OUTER JOIN class c ON (
        c.workshop = w.id
        AND c.register_cutoff_time >= now()
        AND c.review_mode IS FALSE )
    INNER JOIN class_domain ON (
        class_domain.class = c.id
        AND class_domain.domain = 1 )

It's these left outer joins that always get me. What I think I need is something like:

    [...]
    LEFT OUTER JOIN class c ON (
        c.workshop = w.id
        AND c.register_cutoff_time >= now()
        AND c.review_mode IS FALSE
        AND class_domain.class = c.id
        AND class_domain.domain = 1 )

But, that's not part of the join, of course. How do I make a join on the class table but not affect the left outer join?

Thanks,

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
On Mon, Nov 21, 2005 at 02:06:35PM -0800, Bill Moseley wrote:
> The now working query (thanks to you!) is:

No, that doesn't work. It's dropping the people that have never been assigned a class to teach (i.e. don't have a row in the "instructors" link table).

> FROM class INNER JOIN instructors ON class.id = instructors.class
>     LEFT OUTER JOIN person ON person.id = instructors.person,
>     person_role

I really seem to need the multiple left outer join. This works:

    SELECT person.id AS id, last_name,
        person_role.role AS role,
        count(instructors.class),
        sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
        sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
    FROM person
        LEFT OUTER JOIN instructors ON (person.id = instructors.person)
        LEFT OUTER JOIN class ON (instructors.class = class.id),
        person_role
    WHERE person_role.person = person.id
    --  AND person_role.role = 2
    GROUP BY person.id, last_name, person_role.role;

I'm not clear how to move that "person_role.person = person.id" into the FROM statement. Does it matter?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
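For the question at the end of the post: the implicit comma join can be rewritten as an explicit INNER JOIN with the condition moved into the ON clause. For an inner join the two forms are equivalent here, since the condition only involves person; the explicit form mainly reads more clearly alongside the outer joins:

```sql
FROM person
    LEFT OUTER JOIN instructors ON (person.id = instructors.person)
    LEFT OUTER JOIN class ON (instructors.class = class.id)
    INNER JOIN person_role ON (person_role.person = person.id)
-- WHERE person_role.role = 2   -- optional filter, as in the original
```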
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
(truncated query output)

     9 | 4 | 4 | 0 | 4
    31 | 4 | 4 | 0 | 4

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] SQL Help: Multiple LEFT OUTER JOINs
On Mon, Nov 21, 2005 at 11:45:34AM -0600, Bruno Wolff III wrote: > On Mon, Nov 21, 2005 at 05:40:10 -0800, > Bill Moseley <[EMAIL PROTECTED]> wrote: > > > > Here's where I'm missing something. Trying to do an outer join on > > to bring in the class row with its class_time column: > > You don't say exactly why you are having a problem with this, but I think you > would be better off doing an inner join between instructors and class and > then do an outer join of that result to person. Sorry, I thought I was so far off it might be obvious. I suspect I'm making the query harder than it really is. This query just eats CPU and doesn't seem to finish, but I didn't let it run more than a minute (which is forever as far as I'm concerned). The tables are not that big (10,000 people, 1500 classes) > > SELECT person.id AS id, last_name, > > count(instructors.class) as total, > > sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as > > future_class_count, > > sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as > > past_class_count > > > > > > FROM (person LEFT OUTER JOIN instructors ON (person.id = > > instructors.person)) t > > LEFT OUTER JOIN class on ( t.class = class.id ), > > person_role > > > > WHERE person_role.person = person.id > > AND person_role.role = 3 > > > > GROUP BY person.id, last_name; Well, I'm stabbing in the dark now. You mean like: SELECT person.id AS id, first_name, last_name, count(instructors.class) as total_classes, sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count, -- which is better? sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count, sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count FROM (class INNER JOIN instructors ON ( class.id = instructors.class )) t LEFT OUTER JOIN person ON ( person.id = t.person ), person_role WHERE person_role.person = person.id AND person_role.role = 3 GROUP BYperson.id, first_name, last_name; Still eats CPU. 
 GroupAggregate  (cost=1750458.67..1890662.91 rows=10212 width=39)
   ->  Sort  (cost=1750458.67..1767958.67 rows=700 width=39)
         Sort Key: person.id, person.first_name, person.last_name
         ->  Nested Loop  (cost=111.27..140276.35 rows=700 width=39)
               ->  Nested Loop  (cost=91.27..256.35 rows=7000 width=35)
                     ->  Hash Join  (cost=71.27..96.35 rows=7 width=31)
                           Hash Cond: ("outer".id = "inner"."class")
                           ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=12)
                           ->  Hash  (cost=71.25..71.25 rows=7 width=27)
                                 ->  Nested Loop  (cost=3.20..71.25 rows=7 width=27)
                                       ->  Hash Join  (cost=3.20..30.77 rows=7 width=12)
                                             Hash Cond: ("outer".person = "inner".person)
                                             ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=8)
                                             ->  Hash  (cost=3.01..3.01 rows=75 width=4)
                                                   ->  Index Scan using person_role_role_index on person_role  (cost=0.00..3.01 rows=75 width=4)
                                                         Index Cond: (role = 3)
                                       ->  Index Scan using person_pkey on person  (cost=0.00..5.77 rows=1 width=23)
                                             Index Cond: ("outer".person = person.id)
               ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
                     ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=4)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
         ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=4)
(22 rows)

-- 
Bill Moseley
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
[GENERAL] SQL Help: Multiple LEFT OUTER JOINs
I need to generate a table of teachers, and the count of classes they taught in the past and are scheduled to teach in the future.

 id | last_name | total | future_class_count | past_class_count
----+-----------+-------+--------------------+------------------
  3 | Smith     |    12 |                  3 |                9
  8 | Jones     |     0 |                  0 |                0

table person
    id
    last_name

table class
    id
    class_time

table role
    id
    role_name   -- for limiting to a type of teacher

-- link tables

table person_role
    person  references person
    role    references role

-- This table ties a person to a class, thus making them an instructor
table instructors
    person  references person
    class   references class

I can easily get instructors and the total count of their classes:

SELECT person.id AS id, last_name, count(instructors.class)
  FROM person
       LEFT OUTER JOIN instructors ON (person.id = instructors.person),
       person_role
 WHERE person_role.person = person.id
   AND person_role.role = 3       -- limit to this type of teacher
 GROUP BY id, last_name;

Here's where I'm missing something. Trying to do an outer join to bring in the class row with its class_time column:

SELECT person.id AS id, last_name,
       count(instructors.class) as total,
       sum (CASE WHEN class_time >  now() THEN 1 ELSE 0 END) as future_class_count,
       sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
  FROM (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
       LEFT OUTER JOIN class on ( t.class = class.id ),
       person_role
 WHERE person_role.person = person.id
   AND person_role.role = 3
 GROUP BY person.id, last_name;

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 11:34:21PM -0500, Tom Lane wrote:
> No, I'm saying that the underlying data (the join result before applying
> DISTINCT ON) looks like this:
>
> bill=# select "class".id, person.id AS person_id
> bill-#   FROM "class", "location", region, person, instructors
> bill-#  WHERE "class"."location" = "location".id AND "class".id = instructors."class"
> bill-#    AND instructors.person = person.id AND "location".region = region.id
> bill-#  ORDER BY "class".id;
>   id  | person_id
> ------+-----------
>     1 |        49
>     2 |        27
>     3 |        19
>     4 |        82
>     5 |        12
> ...
>  1238 |        61
>  1238 |        60
>  1239 |        40
>  1240 |        67
>  1241 |        11
>  1243 |        26
>  1243 |        84
>  1244 |        26
>  1244 |        84
> (1311 rows)
>
> The DISTINCT ON will take just one of the two rows with id = 1243, and
> just one of the rows with id = 1244, and *it is effectively random which
> one gets picked*. So when you then select rows with person_id = 84, you
> may or may not see these rows in the end result.

Yikes! The problem is *when* DISTINCT ON happens, right?

And, Tom, you actually explained this to me on the list back on Aug 25th, but that's when I was using the view in a different way. You noted that the order was "unpredictable", but at that time it didn't matter to me which row was selected.

http://archives.postgresql.org/pgsql-general/2005-08/msg01291.php

This current problem was due to my assumption of how PG executes the query: my assumption was that the select would first do the joins (including the limit by person_id = 84) and *then* weed out the duplicate class.ids. But if PG is first doing the joins on all the tables (before limiting by person_id = 84), then weeding out the duplicate class.ids, and then finally limiting by person_id = 84, I can see where I might end up with the missing row.

Frankly, I expected the first to happen because it would use an index to select just the records of person_id = 84, then do the joins on that small set of records.
Didn't seem likely that the database would join all the records first and then limit by class.id. Seems like the hard way to do the query. But the query planner works in strange and mysterious ways. ;)

Does that also explain why PG was sometimes returning the "correct" number of rows, depending on which of the two query plans above was used?

> Exactly. So your view is going to return the class id along with a
> randomly selected one of the instructor ids. It seems to me that
> filtering this result on instructor id is perhaps a bit ill-advised,
> even if you fix the view so that the chosen instructor id isn't so
> random (eg, you could fix it to display the lowest-numbered instructor
> id for the particular class). Even then, are you searching for the
> instructor id that the view happens to show for that class, or some
> other one?

Well, clearly, my "one-size-fits-all view" doesn't work in this case. I just need another view without DISTINCT when limiting by instructor. It was that red herring of removing a seemingly random column from the view that made it hard to see what was really happening.

Thanks very much for all your time.

-- 
Bill Moseley
[EMAIL PROTECTED]
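For the archives: the nondeterminism Tom describes can be removed by extending the ORDER BY with a tie-breaker, since DISTINCT ON keeps the first row of each group in sort order. A sketch against the thread's schema (untested), pinning the pick to the lowest-numbered instructor as Tom suggests:

```sql
-- DISTINCT ON (class.id) keeps the first row per class.id in ORDER BY
-- order; the second sort key makes that choice deterministic.
SELECT DISTINCT ON (class.id)
       class.id, instructors.person AS person_id
  FROM class
  JOIN instructors ON instructors.class = class.id
 ORDER BY class.id, instructors.person;   -- tie-breaker: lowest instructor id wins
```

This fixes *which* row the view shows, but as Tom notes it still doesn't make filtering the view on instructor id meaningful, because the other instructors' rows are discarded before the filter sees them.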
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 07:48:06PM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > [ strange behavior ] > > Oh, duh, it's not a PG bug: the problem is that the view is > underspecified. You have > > SELECT DISTINCT ON (class.id) > ... a bunch of stuff ... > FROM ... a bunch of tables ... > ORDER BY class.id; > > The difficulty with this is that DISTINCT ON will take the first row in > each group with the same class.id. And since you're only sorting by > class.id, "the first row" is ill-defined. Sorry, but I fear I'm missing something. That ORDER BY is added by PG -- it's not part of my view when I define it. I assume PG adds that so it can do the DISTINCT ON. Still, I don't have any duplicate class.id rows in this select that I can see. class.id 1243 and 1244 are not the same, yet PG is (sometimes) throwing out one of them. Are you saying that somehow PG thinks they are the same class.id and is thus removing one? I'm asking for a list of all classes taught by instructor 84. 
ws2=> select * from instructors where person = 84 order by class;
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)

The reason I'm using DISTINCT ON is because the "class_list" view is supposed to just return a list of unique classes, and a class might have more than one instructor, which would result in extra rows -- as shown here:

ws2=> select * from instructors where class in
      (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     26 |  1243
     84 |  1244
     26 |  1244
(9 rows)

So when I don't want duplicates:

ws2=> select distinct on (class) * from instructors where class in
      (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)

I don't care in this case about the DISTINCT ON throwing out the duplicates -- I just care about distinct classes, not that all the instructors are included in this select. And even if I throw in all my other joins I get the same thing:

ws2=> SELECT DISTINCT ON (class.id)
ws2->        class.id AS id
ws2->   FROM class, location, region, person, instructors
ws2->  WHERE class.location = location.id      -- join with location
ws2->    AND class.id = instructors.class      -- join the instructors
ws2->    AND instructors.person = person.id    -- join the person(s)
ws2->    AND location.region = region.id       -- join the location to a region
ws2->    AND person.id = 84;
  id
------
  727
  739
  804
  813
  867
 1243
 1244
(7 rows)

> I'm not sure why qsort's
> behavior seems to depend on the width of the rows, but there's no doubt
> that it's sorting different rows to the front of each group depending
> on which view you use.

I just don't see what groups there are in this case, though.

> It could also be that you don't want to be using DISTINCT ON at all;
> have you thought through exactly what this view ought to produce for
> each class.id?

Yes, I think so.
A list of columns related to it, with the exception that when there are duplicate instructors I want one of those duplicates thrown out (and I don't care which one). When I do a query that generates duplicate class.ids, such as when a class has more than one instructor:

ws2=> select class.id  AS class_id,
ws2->        person.id AS person_id
ws2->   FROM class, location, region, person, instructors
ws2->  WHERE class.location = location.id      -- join with location
ws2->    AND class.id = instructors.class      -- join the instructors
ws2->    AND instructors.person = person.id    -- join the person(s)
ws2->    AND location.region = region.id       -- join the location to a region
ws2->    AND class_time > now();
 class_id | person_id
----------+-----------
      561 |        95
      614 |        95
      747 |       111
      762 |       111
      772 |       111
      883 |        13
      924 |        26
      935 |        26
      945 |        26
     1243 |        84
     1243 |        26
     1244 |        84
     1244 |        26
(13 rows)

You can see some classes are listed twice, so using distinct on gets just my list of unique classes:

ws2=> SELECT DISTINCT ON (class.id)
ws2->
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 10:53:21AM -0500, Tom Lane wrote: > Bill Moseley <[EMAIL PROTECTED]> writes: > > The first plan below returns the correct number of rows, the second plan > > does > > not. These are after I did the reindex, btw. > > Bizarre. What are the datatypes of the columns being joined on? If > they're string types, what's your database locale and encoding? The primary keys are all SERIAL, and the FKs are integer. Nothing too odd. The odd thing is the row that is not returned is basically a clone of another row -- which is why I diff'ed them in my first posting. BTW, this might be obvious, but the reason I'm doing DISTINCT ON class.id is that the instructors table is a link table and a class can have more than one instructor. I only want a list of classes, not one per instructor (which could duplicate them). I'm still a novice with Pg, so I assume this is what you are asking (although none of my joins are on text fields). ws2=> SHOW LC_CTYPE; -[ RECORD 1 ]--- lc_ctype | en_US ws2=> SHOW SERVER_ENCODING; -[ RECORD 1 ]---+--- server_encoding | LATIN1 So my joins are: WHERE class.location = location.id -- join with location AND class.id = instructors.class -- join the instructors AND instructors.person = person.id -- join the person(s) AND location.region = region.id;-- join the location to a region And the .id are all SERIAL integer and the FKs are all integer. 
Trying to avoid sending too much unnecessary data to the list, but here's a sample of the tables:

ws2=> \d region
                           Table "public.region"
   Column   |  Type   |                       Modifiers
------------+---------+--------------------------------------------------------
 id         | integer | not null default nextval('public.region_id_seq'::text)
 active     | boolean | not null default true
 sort_order | integer | not null default 1
 name       | text    | not null
Indexes:
    "region_pkey" primary key, btree (id)
    "region_name_key" unique, btree (name)

ws2=> \d instructors
    Table "public.instructors"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 class  | integer | not null
Indexes:
    "instructors_pkey" primary key, btree (person, "class")
    "instructors_class_index" btree ("class")
    "instructors_person_index" btree (person)
Foreign-key constraints:
    "$1" FOREIGN KEY (person) REFERENCES person(id)
    "$2" FOREIGN KEY ("class") REFERENCES "class"(id)

ws2=> \d class
                              Table "public.class"
        Column         |            Type             |         Modifiers
-----------------------+-----------------------------+---------------------------
 id                    | integer                     | not null default nextval('public.class_id_seq'::text)
 name                  | text                        | not null
 old_id                | integer                     |
 location              | integer                     | not null
 workshop              | integer                     | not null
 class_time            | timestamp(0) with time zone | not null
 class_end_time        | timestamp(0) with time zone | not null
 class_size            | integer                     | not null
 begin_reg_time        | timestamp(0) with time zone |
 class_list_sent_time  | timestamp(0) with time zone |
 class_list_sent_email | text                        |
 reminder_sent_time    | timestamp(0) with time zone |
 ride_list_sent_time   | timestamp(0) with time zone |
 html_description      | text                        | not null
 short_description     | text                        |
 special_instructions  | text                        |
 on_hold_message       | text                        |
 review_mode           | boolean                     | not null default false
 workshop_group        | integer                     | not null
 distance_ed           | boolean                     | not null default false
 contract_class        | boolean                     | not null default false
 online_evaluation     | boolean                     | not null default true
 price_scheme          | integer                     | not null
 duration              | text                        |
 register_cutoff_time  | timestamp(0) with time zone | not null
 cutoff_message        | text                        |
 full_message          | text                        |
 wait_list_size
Re: [GENERAL] Wrong rows selected with view
On Wed, Nov 16, 2005 at 03:42:19PM +0100, Andreas Seltenreich wrote:
> How does the query plan change when you make those changes? If it only
> occurs if a certain index is used, it might be corrupt (=> REINDEX).

I did a "reindex database ws2;" and no change.

I'm not very good at reading the query plans. For one thing, they always send me off on some tangent wondering why it's doing a Seq Scan instead of an index scan. ;)

The first plan below returns the correct number of rows, the second plan does not. These are after I did the reindex, btw.

ws2=> explain select id from bar where person_id = 84;
 Subquery Scan bar  (cost=1225.81..1243.32 rows=6 width=4)
   Filter: (person_id = 84)
   ->  Unique  (cost=1225.81..1230.82 rows=1000 width=334)
         ->  Sort  (cost=1225.81..1228.31 rows=1003 width=334)
               Sort Key: "class".id
               ->  Hash Join  (cost=802.15..1175.81 rows=1003 width=334)
                     Hash Cond: ("outer".person = "inner".id)
                     ->  Hash Join  (cost=67.50..203.81 rows=1003 width=315)
                           Hash Cond: ("outer".region = "inner".id)
                           ->  Hash Join  (cost=45.00..163.77 rows=1002 width=279)
                                 Hash Cond: ("outer"."location" = "inner".id)
                                 ->  Hash Join  (cost=22.50..118.74 rows=1001 width=141)
                                       Hash Cond: ("outer"."class" = "inner".id)
                                       ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=8)
                                       ->  Hash  (cost=20.00..20.00 rows=1000 width=137)
                                             ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=137)
                                 ->  Hash  (cost=20.00..20.00 rows=1000 width=142)
                                       ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=142)
                           ->  Hash  (cost=20.00..20.00 rows=1000 width=40)
                                 ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=40)
                     ->  Hash  (cost=649.12..649.12 rows=10212 width=23)
                           ->  Seq Scan on person  (cost=0.00..649.12 rows=10212 width=23)
(22 rows)

This returns one row less, and the only change to the view is this commented-out column:

    -- class.full_message AS full_message,  -- this

ws2=> explain select id from bar where person_id = 84;
 Subquery Scan bar  (cost=1222.54..1240.05 rows=6 width=4)
   Filter: (person_id = 84)
   ->  Unique  (cost=1222.54..1227.55 rows=1000 width=366)
         ->  Sort  (cost=1222.54..1225.05 rows=1003 width=366)
               Sort Key: "class".id
               ->  Hash Join  (cost=779.65..1172.54 rows=1003 width=366)
                     Hash Cond: ("outer".person = "inner".id)
                     ->  Hash Join  (cost=45.00..204.14 rows=1003 width=347)
                           Hash Cond: ("outer".region = "inner".id)
                           ->  Hash Join  (cost=22.50..164.10 rows=1002 width=311)
                                 Hash Cond: ("outer"."location" = "inner".id)
                                 ->  Merge Join  (cost=0.00..119.06 rows=1001 width=173)
                                       Merge Cond: ("outer".id = "inner"."class")
                                       ->  Index Scan using class_pkey on "class"  (cost=0.00..52.00 rows=1000 width=169)
                                       ->  Index Scan using instructors_class_index on instructors  (cost=0.00..52.00 rows=1000 width=8)
                                 ->  Hash  (cost=20.00..20.00 rows=1000 width=142)
                                       ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=142)
                           ->  Hash  (cost=20.00..20.00 rows=1000 width=40)
                                 ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=40)
                     ->  Hash  (cost=649.12..649.12 rows=10212 width=23)
                           ->  Seq Scan on person  (cost=0.00..649.12 rows=10212 width=23)
(21 rows)

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] Wrong rows selected with view
PostgreSQL 7.4.8 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050816 (prerelease) (Debian 4.0.1-5)

Hopefully this is something simple -- I assume it's a problem with my SQL. But it looks really weird to me at this late hour.

I have some tables for managing workshops, and I have a view (below). I noticed for a given select it was returning the wrong number of rows. I expected seven rows back, but only got six. I rewrote the view with just the joins and it returned the correct number of rows. So I started the brute force method of removing one thing at a time in the view to see what would make it start returning the correct number of rows. That just confused me more.

Below, if I comment out *any* single column that's marked "-- this" then I get the correct number of rows. If I comment out any rows marked "-- nope" then there's no change (get six rows returned). But if I start commenting out more than one "-- nope" then I get seven rows.

Can someone point out my error?

create view bar AS
    SELECT DISTINCT ON (class.id)                              -- this
        class.id AS id,
        class.name AS name,                                    -- this
        class.class_time AS class_time,                        -- this
        class.begin_reg_time AS begin_reg_time,                -- this
        (CASE WHEN class.register_cutoff_time > class.class_time  -- this
              THEN class.register_cutoff_time
              ELSE class.class_time END) AS register_stop_time,
        class.location AS location,                            -- nope
        class.workshop AS workshop,                            -- nope
        class.review_mode AS review_mode,                      -- nope
        class.workshop_group AS workshop_group,                -- nope
        location.name AS location_name,                        -- this
        location.address AS address,                           -- this
        location.city AS city,                                 -- nope
        location.state AS state,                               -- this
        location.zip AS zip,                                   -- nope
        region.id AS region,                                   -- nope
        region.name AS region_name,                            -- nope
        region.sort_order AS region_sort,                      -- nope
        person.id AS person_id,
        UPPER( person.last_name || person.first_name ) AS instructor,  -- this
        class.cutoff_message AS cutoff_message,                -- this
        class.full_message AS full_message,                    -- this
        class.wait_description AS wait_description             -- this
    FROM class, location, region, person, instructors
    WHERE class.location = location.id        -- join with location
      AND class.id = instructors.class        -- join the instructors
      AND instructors.person = person.id      -- join the person(s)
      AND location.region = region.id;        -- join the location to a region

select id from bar where person_id = 84;
drop view bar;

  id
------
  727
  739
  804
  813
  867
 1244
(6 rows)

Comment out one column:

        -- class.full_message AS full_message,                 -- this
        class.wait_description AS wait_description             -- this
    FROM class, location, region, person, instructors
    WHERE class.location = location.id        -- join with location
      AND class.id = instructors.class        -- join the instructors
      AND instructors.person = person.id      -- join the person(s)
      AND location.region = region.id;        -- join the location to a region

select id from bar where person_id = 84;
drop view bar;

  id
------
  727
  739
  804
  813
  867
 1243
 1244
(7 rows)

It's always class.id 1243 that doesn't show up. Not sure this helps, but:

[EMAIL PROTECTED]:~$ echo '\x \\ select * from class where id = 1243' | psql ws2 > 1243
[EMAIL PROTECTED]:~$ echo '\x \\ select * from class where id = 1244' | psql ws2 > 1244
[EMAIL PROTECTED]:~$ diff -U 0 1243 1244
--- 1243        2005-11-15 20:16:26.619412721 -0800
+++ 1244        2005-11-15 20:16:30.438646443 -0800
@@ -3 +3 @@
-id                      | 1243
+id                      | 1244
@@ -8,4 +8,4 @@
-class_time              | 2005-12-12 07:00:00-08
-class_end_time          | 2005-12-12 14:00:00-08
-class_size              | 55
-begin_reg_time          | 2005-11-15 17:36:00-08
+class_time              | 2005-12-25 07:15:00-08
+class_end_time          | 2005-12-25 11:00:00-08
+class_size              | 33
+begin_reg_time          |
@@ -27 +27 @@
-register_cutoff_time    | 2005-12-11 19:00:00-08
+register_cutoff_time    | 2005-12-24 19:15:00-08

Thanks,

-- 
Bill "stabbing in the dark" Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Question about a query plan
I'm still trying to understand EXPLAIN ANALYZE output.

ws2=> select count(*) from person_role;
 count
-------
   123
(1 row)

ws2=> select count(*) from person;
 count
-------
 11033
(1 row)

ws2=> EXPLAIN ANALYZE select id, first_name, last_name
      from person, person_role
      where id = 94
        and person_role.person = person.id
        and (person_role.role = 2);

 Nested Loop  (cost=0.00..8.28 rows=1 width=23) (actual time=0.198..0.237 rows=1 loops=1)
   ->  Index Scan using person_pkey on person  (cost=0.00..5.44 rows=1 width=23) (actual time=0.054..0.056 rows=1 loops=1)
         Index Cond: (id = 94)
   ->  Seq Scan on person_role  (cost=0.00..2.83 rows=1 width=4) (actual time=0.130..0.165 rows=1 loops=1)
         Filter: ((role = 2) AND (person = 94))
 Total runtime: 0.379 ms
(6 rows)

Why does it say "Seq Scan" on person_role? The query has both the "person" and "role" to use as a primary key -- which is indexed. Indeed, "rows=1" so it looks like an index fetch. Perhaps I'm reading that incorrectly?

ws2=> \d person_role;
  Table "public.person_role"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 role   | integer | not null
Indexes:
    "person_role_pkey" primary key, btree (person, role)
Foreign-key constraints:
    "$2" FOREIGN KEY (role) REFERENCES role(id) ON DELETE RESTRICT
    "$1" FOREIGN KEY (person) REFERENCES person(id) ON DELETE CASCADE

Thanks,

-- 
Bill Moseley
[EMAIL PROTECTED]
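For the archives: with only 123 rows, person_role almost certainly fits in a single page, so reading the whole table is cheaper than descending a btree and then fetching the heap page anyway. One way to see the planner's costing of the index alternative is to disable sequential scans for the session and re-run the EXPLAIN (a diagnostic sketch only; enable_seqscan is session-local and not meant for production use):

```sql
-- Force the planner to price the index-scan alternative for comparison.
SET enable_seqscan = off;
EXPLAIN ANALYZE
  SELECT id, first_name, last_name
    FROM person, person_role
   WHERE id = 94
     AND person_role.person = person.id
     AND person_role.role = 2;
RESET enable_seqscan;
```

If the forced plan's estimated cost is higher than the seq-scan plan's 2.83, that is the whole explanation: the planner picked the cheaper path, and "rows=1" describes the result, not the access method.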
Re: [GENERAL] Setting WHERE on a VIEW with aggregate function.
On Fri, Sep 16, 2005 at 04:56:25PM -0700, Roger Hand wrote: > So it would appear to me that you won't able to meet your goal by simply > using a view. > > However, there are other solutions you can use which may or may not be > appropriate. I can think of three at the moment. > > #1: Function Solution: I'll take a look at this. It's about time I started to learn about functions a bit. > This would be the best solution if you are in control of the > application source code. In Java, for example, it's relatively > simple to call this function and return the result as a result set. > If you're working in Java I'd be glad to show you same sample code. I'm using Perl and DBI (really Class::DBI but DBI is not far away). > #2: Simplify the Select Criteria Solution: > = > > A slightly less flexible approach, but one that may be workable, would be to > add two boolean columns (with default value of '0') to the class table: > "completed" and "current". Then once a semester you run a simple query that > updates them. Something like: > > UPDATE class SET current = '1' where class_time = '2005-09-01'; > UPDATE class SET completed = '1' where class_time < '2005-09-01'; Classes start daily (and at different hours). I could cron once an hour I suppose, but I'd rather not de-normalize the data. Maybe I can just create three views (future, recent, old) and live with that. The temporary table is another possibility I'll look into. Thank you very much for spending time on this. I really appreciate it. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
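For reference, the "function solution" Roger mentions can be sketched in plain SQL as a set-returning function, which makes the cutoff time a parameter -- exactly what a plain view can't express. All names here are hypothetical and this is untested; it uses the 7.x-era quoted-body syntax matching the PostgreSQL version in the thread:

```sql
-- A parameterized "view": the class_time cutoff becomes $1.
CREATE TYPE instructor_count AS (
    person_id   integer,
    first_name  text,
    class_count bigint
);

CREATE FUNCTION instructor_counts_since(timestamptz)
RETURNS SETOF instructor_count AS '
    SELECT person.id, first_name, count(instructors.class)
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
       AND class_time > $1
     GROUP BY person.id, first_name
' LANGUAGE sql STABLE;
```

Usage would then be `SELECT * FROM instructor_counts_since(now());`, or any other timestamp, with the filter applied before the aggregation rather than after.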
Re: [GENERAL] Setting WHERE on a VIEW with aggregate function.
On Fri, Sep 16, 2005 at 12:06:19PM -0700, Roger Hand wrote: > > select * from instructor_counts where class_time > now(); > > > > But class_time is not part of the VIEW so that's not valid. > > No problem, just make it a part of the view. See the classes section below. > > CREATE VIEW future_instructor_counts > AS > SELECT * FROM > > (SELECT >person.id AS person_id, > first_name, > last_name) personinfo > > INNER JOIN > > -- Add class_time field! > (SELECT class.id, class_time FROM class > WHERE class_time > now() ) classes > > INNER JOIN > > (SELECT >id, count(class) AS class_count > FROM instructors GROUP BY id) classcount > > ON personinfo.person_id = instructors.id > AND classes.id = instructors.id I couldn't get that to work -- Postgresql isn't that helpful just reporting "ERROR: syntax error at or near ";" at character 496" even after adding a FROM in the first select. So, I'm stabbing in the dark to get it to work. > [Disclaimer: I've not tested this code at all. It could help if you sent > table definitions and maybe even dummy > data via insert commands.] 
Ok -- this should be cut-n-paste:

CREATE TABLE class (
    id          integer PRIMARY KEY,
    class_time  timestamp(0) with time zone,
    name        text
);

CREATE TABLE person (
    id          integer PRIMARY KEY,
    first_name  text
);

create table instructors (
    person  integer NOT NULL REFERENCES person,
    class   integer NOT NULL REFERENCES class,
    PRIMARY KEY (person, class)
);

INSERT INTO person (id,first_name) values (1,'Joe');
INSERT INTO person (id,first_name) values (2,'Mary');
INSERT INTO person (id,first_name) values (3,'Bob');
INSERT INTO person (id,first_name) values (4,'Cindy');

INSERT INTO class (id,name, class_time) values (1,'Math', now());
INSERT INTO class (id,name, class_time) values (2,'Math', now() + interval '1 day');
INSERT INTO class (id,name, class_time) values (3,'Science', now());
INSERT INTO class (id,name, class_time) values (4,'PE', now() + interval '1 day');

INSERT INTO instructors (person, class) values (1,1); -- joe teaches math now
INSERT INTO instructors (person, class) values (1,2); -- joe teaches math tomorrow
INSERT INTO instructors (person, class) values (2,2); -- with Mary
INSERT INTO instructors (person, class) values (3,3); -- Bob teaches science now
INSERT INTO instructors (person, class) values (4,3); -- Cindy teaches science now

-- view
CREATE VIEW instructor_counts AS
    SELECT person.id AS person_id,
           first_name,
           count(instructors.class) AS class_count
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
    -- AND class_time > now()
     GROUP BY person_id, first_name;

select * from instructor_counts order by class_count desc;

-- Returns:
 person_id | first_name | class_count
-----------+------------+-------------
         1 | Joe        |           2
         2 | Mary       |           1
         3 | Bob        |           1
         4 | Cindy      |           1
(4 rows)

My GOAL above is to be able to add a WHERE class_time > $some_time.

Here's where I left off, which I never could get to work. The individual selects work, but it seems like I need to say c.class_id = i.class in addition.
But I can't even get this without syntax errors:

CREATE VIEW instructor_counts AS
    SELECT *
      FROM (SELECT person.id AS person_id, first_name FROM person) p
     INNER JOIN
           (SELECT class.id AS class_id, class_time FROM class) c
     INNER JOIN
           (SELECT person, count(class) AS class_count
              FROM instructors GROUP BY person) i
        ON ( p.person_id = i.person );

That also looks like the selects are going to be full table scans.

-- 
Bill Moseley
[EMAIL PROTECTED]
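For the archives, the syntax error is that every INNER JOIN needs its own ON clause; the attempt above has two joins but only one ON. A syntactically valid version is below (the linking sub-select aliased x is my own addition, and this is an untested sketch). Note it still doesn't solve the original problem: class_count is aggregated inside sub-select i before any outer WHERE runs, so filtering this view on class_time would not restrict which classes are counted -- which is the underlying reason a plain view can't express the parameterized query.

```sql
-- Valid chained sub-select joins: one ON per JOIN.
CREATE VIEW instructor_counts AS
    SELECT p.person_id, p.first_name, c.class_id, c.class_time, i.class_count
      FROM (SELECT person.id AS person_id, first_name FROM person) p
     INNER JOIN
           (SELECT person, class FROM instructors) x
        ON x.person = p.person_id
     INNER JOIN
           (SELECT class.id AS class_id, class_time FROM class) c
        ON c.class_id = x.class
     INNER JOIN
           (SELECT person, count(class) AS class_count
              FROM instructors GROUP BY person) i
        ON i.person = p.person_id;
```

A `WHERE class_time > now()` against this view filters the joined rows, but class_count still reflects *all* of each person's classes, past and future.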
[GENERAL] Setting WHERE on a VIEW with aggregate function.
I have a view to generate a list of instructors and a count of their future classes. "instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts AS
    SELECT person.id AS person_id,
           first_name,
           last_name,
           count(instructors.class) AS class_count
      FROM class, instructors, person
     WHERE class.id  = instructors.class
       AND person.id = instructors.person
       AND class_time > now()
     GROUP BY person_id, first_name, last_name;

I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove the "AND class_time > now()" so that a timestamp can be passed in the select? That is, I'd like to be able to do this:

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid. And if it were included then I wouldn't have an aggregate function any more -- no more grouping.

2) I think I'm missing something obvious. I know that I need to specify all my non-aggregate columns in the "GROUP BY", but I don't understand why. Really, the results are just grouped by person.id, so why the need to specify the other columns? And if you don't specify all the columns then PostgreSQL reports:

    ERROR: column "person.id" must appear in the GROUP BY clause or be used in an aggregate function

Is there a reason PostgreSQL doesn't just add the column automatically? It does in other cases (like a missing table in a join).

Thanks

-- 
Bill Moseley
[EMAIL PROTECTED]
Re: [GENERAL] Question about a query plan
On Fri, Sep 16, 2005 at 10:02:28AM -0500, Thomas O'Connell wrote: > > On Sep 16, 2005, at 9:18 AM, Bill Moseley wrote: > > >First question is why the planner is not using an index scan when I > >use "now()" or CURRENT_TIMESTAMP? > > It also used to be the case (pre-8.0; I couldn't find in the release > notes whether this was an 8.0 or 8.1 fix) that now() and > CURRENT_TIMESTAMP were not indexable, I think because of mutability. > > For older versions of postgres, it's recommended that you determine > the time in the client and use constant data in your query. Interesting. I have a few VIEWs that include now(), but I guess I could adjust and pass in the date from the client. Thanks for the tip. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Question about a query plan
On Fri, Sep 16, 2005 at 04:45:57PM +0200, Peter Eisentraut wrote: > The planner thinks your query will return 414 rows, so it thinks the > sequential scan is faster. In reality, your query only retuns 28 rows, so > you need to create better statistics, either by running ANALYZE or VACUUM (or > both) or tweaking the statistics parameters of the columns. I did wonder about the planner stats, so I had run ANALYZE on the database with no change. I just now ran VACUUM (and VACUUM ANALYZE), and again see no change. Perhaps my table is just too small for this test. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Question about a query plan
EXPLAIN select person, first_name, count(class)
  from instructors, person
 where instructors.class in (select id from class where class_time > now())
   AND person.id = instructors.person
 group by person, first_name;

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 HashAggregate  (cost=734.06..735.15 rows=437 width=17)
   ->  Merge Join  (cost=706.81..730.78 rows=437 width=17)
         Merge Cond: ("outer".id = "inner".person)
         ->  Index Scan using person_pkey on person  (cost=0.00..1703.82 rows=12246 width=13)
         ->  Sort  (cost=706.81..707.90 rows=437 width=8)
               Sort Key: instructors.person
               ->  Hash IN Join  (cost=656.65..687.64 rows=437 width=8)
                     Hash Cond: ("outer"."class" = "inner".id)
                     ->  Seq Scan on instructors  (cost=0.00..20.08 rows=1308 width=8)
                     ->  Hash  (cost=655.62..655.62 rows=414 width=4)
                           ->  Seq Scan on "class"  (cost=0.00..655.62 rows=414 width=4)
                                 Filter: (class_time > now())

-- 
Bill Moseley
[EMAIL PROTECTED]
[GENERAL] GROUP BY requirement
I'm wondering if adding a GROUP BY (as required by Postgres) will change the results of a select on a view. I have the following view which joins a "class" with a teacher. A teacher is a "person" and I have an "instructors" link table.

CREATE VIEW class_list (id, class_time, instructor) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

I also have a table "registration" that links students with a class. The registration table has a "reg_status" column to say if they are confirmed or on the wait_list. So when showing the above I'd also like to see how many students are confirmed and on the wait_list.

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor, confirmed_cnt, wait_list_cnt) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name,
        sum(CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) AS confirmed_cnt,
        sum(CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) AS wait_list_cnt
    FROM class, instructors, person, registration
    WHERE instructors.person = person.id
      AND class.id = instructors.class
      AND class.id = registration.class
    GROUP BY class.id, class.class_time, person.first_name;

PostgreSQL requires the GROUP BY. But I'm not clear how the GROUP BY might change the results between the two views above. http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY says:

    When GROUP BY is present, it is not valid for the SELECT list
    expressions to refer to ungrouped columns except within aggregate
    functions, since there would be more than one possible value to
    return for an ungrouped column.

Frankly, I cannot see how it might change the results of a select between the two views. Am I missing something?
-- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
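One way to settle the question above empirically: if the GROUP BY changed nothing, the set difference between the two views should be empty in both directions. A sketch, assuming the original DISTINCT ON view is kept around under a second, invented name (cl_distinct):

```sql
-- Rows in the DISTINCT ON view that are missing from the GROUP BY view:
SELECT id, class_time, instructor FROM cl_distinct
EXCEPT
SELECT id, class_time, instructor FROM cl;

-- ...and the other direction:
SELECT id, class_time, instructor FROM cl
EXCEPT
SELECT id, class_time, instructor FROM cl_distinct;
```

Both queries returning zero rows would confirm the GROUP BY did not change the result set for this data.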
Re: [GENERAL] Help with a subselect inside a view
On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote: > > CREATE VIEW cl (id, class_time, instructor) > > AS > > SELECT DISTINCT ON(class.id) > >class.id, class.class_time, person.first_name > > FROM class, instructors, person > > WHERE instructors.person = person.id > >AND class.id = instructors.class; > > This is allowed because the code automatically adds "ORDER BY class.id" > within the view (as you would see if you examined the view with \d). I see that now. Might be helpful for the docs to say that for folks like me. > It's fairly pointless though, because as the manual notes, you can't get > any well-defined behavior without additional ORDER BY columns to > prioritize the rows within class.id groups. As is, you're getting > random choices of class_time and first_name within the groups. > (Though maybe in this application, you don't care.) I'm not sure I follow what you are saying. I understand that I have no control over which "first_name" I end up with (and I don't really care), but class_time is a column in the "class" table which I'm using DISTINCT ON on, so that should be unique as well. So I assume you meant random choice of first_name, not class_time. Thanks, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Help with a subselect inside a view
And about being efficient:

On Thu, Aug 25, 2005 at 08:01:26AM -0700, Bill Moseley wrote:
> DROP VIEW cl;
> CREATE VIEW cl (id, class_time, instructor) AS
>     SELECT DISTINCT ON(class.id)
>         class.id, class.class_time, person.first_name
>     FROM class, instructors, person
>     WHERE instructors.person = person.id
>       AND class.id = instructors.class;

And in a case like the above, I'm displaying the list a page at a time. So I first do a count to find total rows and then a select:

    select count(*) from cl where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

I looked at the EXPLAIN ANALYZE for both and both do the join, it seems. I guess it has to be that way. So would it be smart to do the initial count on "class" instead of the view first?

    select count(*) from class where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

That is, Postgresql won't figure out that it only needs to look at one table, right?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Help with a subselect inside a view
On Thu, Aug 25, 2005 at 08:05:36AM -0500, Bruno Wolff III wrote:
> On Wed, Aug 24, 2005 at 23:12:17 -0700,
>   Bill Moseley <[EMAIL PROTECTED]> wrote:
> > I need a little SQL help:
> >
> > I'm trying to get a subselect working inside a view.
>
> Unfortunately you didn't show us what you tried. My guess would be that
> you didn't enclose the subselect in parenthesis.

No, it wasn't that. I just didn't want to look too foolish. ;)

DROP VIEW cl;
CREATE VIEW cl (id, instructor) AS
    SELECT class.id, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = (
          SELECT instructors.id
          FROM instructors, person
          WHERE instructors.class = class.id
            AND person.id = instructors.person
          LIMIT 1
      );

Which returns a row for every row in the "instructors" table.

> The distinct on solution that was suggested is probably a better way to
> go anyway.

Turns out it is. Thanks.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Help with a subselect inside a view
Hi David,

On Thu, Aug 25, 2005 at 01:22:02AM -0700, David Fetter wrote:
> This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.
>
> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

    The DISTINCT ON expression(s) must match the leftmost ORDER BY
    expression(s). The ORDER BY clause will normally contain additional
    expression(s) that determine the desired precedence of rows within
    each DISTINCT ON group.

I read that and thought it wasn't a drop-in replacement for my code due to the leftmost ORDER BY requirement. But it seems to work even if that requirement is not met. Perhaps I'm not understanding the wording above? Or is Postgresql adding in the order automatically?

My original VIEW with duplicates:

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

select * from cl where id = 555;

 id  |       class_time       | instructor
-----+------------------------+------------
 555 | 2005-09-30 09:00:00-07 | Cheryl
 555 | 2005-09-30 09:00:00-07 | Bob
(2 rows)

And with DISTINCT ON():

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT DISTINCT ON(class.id)
        class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class;

select * from cl where id = 555;

 id  |       class_time       | instructor
-----+------------------------+------------
 555 | 2005-09-30 09:00:00-07 | Cheryl
(1 row)

Here's a query where the leftmost ORDER BY doesn't match the DISTINCT ON, which I thought was not possible:

select * from cl where class_time > now() order by instructor limit 3;

 id  |       class_time       | instructor
-----+------------------------+------------
 544 | 2005-08-31 09:00:00-07 | Cheryl
 555 | 2005-09-30 09:00:00-07 | Cheryl
 737 | 2005-08-30 09:00:00-07 | Cynthia

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
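If the arbitrary choice within each group ever matters, the row that DISTINCT ON keeps can be pinned down with trailing ORDER BY keys inside the view. A sketch using the same tables from the thread:

```sql
CREATE VIEW cl (id, class_time, instructor) AS
    SELECT DISTINCT ON (class.id)
           class.id, class.class_time, person.first_name
    FROM class, instructors, person
    WHERE instructors.person = person.id
      AND class.id = instructors.class
    -- class.id must be the leftmost key to satisfy DISTINCT ON; the
    -- second key decides WHICH instructor row is kept per class
    -- (here: the first alphabetically by first name).
    ORDER BY class.id, person.first_name;
```

Without the extra key, the planner is free to keep any instructor row for a class, which is the "random choice" Tom Lane describes in his reply.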
[GENERAL] Help with a subselect inside a view
I need a little SQL help: I'm trying to get a subselect working inside a view.

I have a table "class" that has related tables (a class has a location, a location has an address with columns city, state, zip). I want to use a VIEW to display columns related to a given class. But a class can also have one or more instructors. So I have a link table:

     Table "public.instructors"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 class  | integer | not null
Foreign-key constraints:
    "$1" FOREIGN KEY (person) REFERENCES person(id)
    "$2" FOREIGN KEY ("class") REFERENCES "class"(id)

I can do the following, but in the (very rare) case where there may be two instructors assigned to the class I will get two rows back.

CREATE VIEW class_list (
    id, name, class_time, location, location_name,
    address, city, state, zip, instructor_name
) AS
    SELECT class.id, class.name, class.class_time, class.location,
           location.name, address.id, address.city, address.state,
           address.zip, person.last_name
    FROM class, location, address, instructors, person
    WHERE class.location = location.id
      AND location.address = address.id
      -- Not what I want:
      AND instructors.person = person.id
      AND instructors.class = class.id;

I'm completely happy to just fetch one of the instructors, and don't care which one. I just need only one row per class. (I assume that's my hint right there.) I can select a single instructor from a given class like:

SELECT person.id
FROM instructors, person
WHERE instructors.class = 555
  AND person.id = instructors.person
LIMIT 1;

So I thought I might be able to add that as a subselect to the VIEW, but I have not been able to make it work. I suspect I'm missing something obvious. Thanks.

Oh BTW -- If I do a count(*) and a WHERE that only includes columns in the "class" table on the VIEW, will Postgresql still do the joins? Or will it only do the select on the "class" table? I suspect it will do the joins to make sure the relations can be found.
-- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
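For what it's worth, one way to guarantee a single row per class is to move the instructor lookup into a correlated scalar subselect in the SELECT list rather than joining instructors in the WHERE clause. A sketch against the schema above, trimmed to a few columns:

```sql
CREATE VIEW class_list (id, name, class_time, instructor_name) AS
    SELECT class.id, class.name, class.class_time,
           -- Correlated scalar subselect: yields at most one
           -- instructor per class, chosen arbitrarily by LIMIT 1.
           (SELECT person.last_name
            FROM instructors, person
            WHERE instructors.class = class.id
              AND person.id = instructors.person
            LIMIT 1)
    FROM class;
```

Because the subselect appears in the SELECT list, the view emits exactly one row per class even when several instructors are linked, and NULL when none are.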
[GENERAL] Optimize a VIEW question
I'm just starting out using VIEWs -- and mostly new to postgresql -- and I'm trying to learn a little about reading the EXPLAIN ANALYZE output. I have the following view set up and I'm wondering where to create indexes, and mostly whether I'm doing anything in an incorrect or very expensive way.

CREATE VIEW class_list (
    id, name, class_time, location, workshop, review_mode, workshop_group,
    location_name, address, city, state, zip, region, region_name
) AS
    SELECT class.id, class.name, class.class_time, class.location,
           class.workshop, class.review_mode, class.workshop_group,
           location.name, address.id, address.city, address.state,
           address.zip, region.id, region.name
    FROM class, location, address, region
    WHERE class.location = location.id
      AND location.address = address.id
      AND location.region = region.id;

I'm not clear about the Seq Scan below. The region table is quite small, so am I correct that is why the planner is doing a seq scan on that table?

\d region

          Table "public.region"
   Column   |  Type   | Modifiers
------------+---------+---------------------------------------------------------
 id         | integer | not null default nextval('public.region_id_seq'::text)
 active     | boolean | not null default true
 sort_order | integer | not null default 1
 name       | text    | not null
Indexes:
    "region_pkey" primary key, btree (id)
    "region_name_key" unique, btree (name)

EXPLAIN ANALYZE select * from class_list where workshop = 28;

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=51.78..93.07 rows=9 width=157) (actual time=1.306..1.468 rows=6 loops=1)
   ->  Hash Join  (cost=51.78..76.87 rows=8 width=129) (actual time=1.245..1.299 rows=6 loops=1)
         Hash Cond: ("outer".id = "inner".region)
         ->  Seq Scan on region  (cost=0.00..20.00 rows=1000 width=36) (actual time=0.016..0.027 rows=10 loops=1)
         ->  Hash  (cost=51.76..51.76 rows=8 width=97) (actual time=1.019..1.019 rows=0 loops=1)
               ->  Hash Join  (cost=26.68..51.76 rows=8 width=97) (actual time=0.201..1.007 rows=6 loops=1)
                     Hash Cond: ("outer".id = "inner"."location")
                     ->  Seq Scan on "location"  (cost=0.00..20.00 rows=1000 width=44) (actual time=0.014..0.694 rows=104 loops=1)
                     ->  Hash  (cost=26.66..26.66 rows=7 width=57) (actual time=0.150..0.150 rows=0 loops=1)
                           ->  Index Scan using class_workshop_index on "class"  (cost=0.00..26.66 rows=7 width=57) (actual time=0.057..0.137 rows=6 loops=1)
                                 Index Cond: (workshop = 28)
   ->  Index Scan using address_pkey on address  (cost=0.00..2.01 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=6)
         Index Cond: ("outer".address = address.id)
 Total runtime: 1.853 ms
(14 rows)

By the way -- at one point I managed to hang postgresql (7.4.8-16 on Debian Sid). I have not been able to make it happen again, but it seemed odd.

(gdb) bt
#0  0x081e51ee in tuplestore_gettuple ()
#1  0x0810c7f0 in ExecMaterial ()
#2  0x08102cb2 in ExecProcNode ()
#3  0x0810d8d5 in ExecNestLoop ()
#4  0x08102ceb in ExecProcNode ()
#5  0x081093a4 in ExecAgg ()
#6  0x08102c79 in ExecProcNode ()
#7  0x08101ecc in ExecutorRun ()
#8  0x0816f58b in PortalSetResultFormat ()
#9  0x0816f8c7 in PortalRun ()
#10 0x0816da9f in PostgresMain ()
#11 0x08148b4e in ClosePostmasterPorts ()
#12 0x0814a4e1 in PostmasterMain ()
#13 0x0811c2e7 in main ()

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Input timestamp from epoch?
I'm entering data into a Postgresql database where the input dates are currently in unix epoch values. Is there a way for Postgresql to parse that epoch into a "timestamp(0) with time zone"?

I read section 8.5 "Date/Time Types" and I can see I can input an (well THE) epoch, and I can EXTRACT(EPOCH FROM [timestamp]), but I'm not seeing where I can use an epoch value as input for a timestamp(0) with time zone.

I'm not convinced that I want to store the values as timestamps. My time values are all within the epoch range, so that's not an issue. They represent a point in time (an event), and I need to display them in various timezones (depending on where the event is happening). And the epoch is reasonably easy to work with. I guess a timestamp(0) with time zone is basically the same thing -- but gives me date operations in Postgresql. Probably faster for my client application to parse epoch values from the database, though.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
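For reference, an epoch value can be converted to a timestamptz in SQL by adding the seconds to the epoch origin; a sketch (1124926538 is just a sample value, and the to_timestamp() form only exists on newer releases):

```sql
-- Interval arithmetic from the epoch origin; works on old releases too:
SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 1124926538 * INTERVAL '1 second';

-- Newer releases provide a direct conversion function:
SELECT to_timestamp(1124926538);
```

Either result can then be rendered in a per-event zone with AT TIME ZONE when displaying it.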
Re: [GENERAL] Waiting on a transaction
On Thu, Aug 18, 2005 at 01:33:18PM -0400, Vivek Khera wrote: Hey Vivek! Nice to hear from you over here. ;) > The trick is dealing with statement timeouts on shared pool > connections over mod_perl and Apache::DBI. I haven't satisfied > myself yet that the timeout will be unset when the next connection > uses the DB... You mean other than setting "alarm 0;"? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] How to "ping" the database
On Wed, Aug 17, 2005 at 04:25:48PM -0400, Tom Lane wrote:
> Of course, this begs the question of what ping is really supposed to
> test and when it's supposed to be allowable. The above will not work
> if in the middle of retrieving a query result, for example.

Well, there's that. I'm not really sure why there's a need for a ping -- and I've heard others question it, too.

Perl's DBI has a connect_cached() function that is supposed to return a cached connection if it's still alive. So that is one place "ping" is used. If ping fails then a new connection is created.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Schema design question
On Wed, Aug 17, 2005 at 07:41:20PM +, Matt Miller wrote:

Thanks for responding, Matt:

> create table course (id serial primary key,
>                      description varchar);
> create table teacher (id serial primary key,
>                       name varchar);
> create table course_teacher (course_id integer not null,
>                              teacher_id integer not null);
[...]
> create table class (id serial primary key,
>                     course_id integer not null,
>                     teacher_id integer not null,
>                     starts_on date,
>                     location varchar);

There may be more than one teacher in the class, so instead I'd need another "class_teacher" link table.

I guess what "bugged" me about this type of layout is that a course and a class share so many columns. Duplication just looks wrong -- and I worry about changing a column type on one table and forgetting to change it on the other table. I also have to remember to copy all columns every time a specific class is created.

On the other hand, if I used a single table to represent both types of entities, then selects are always going to have something like WHERE type = 'course' added onto the WHERE. That's extra processing for no good reason.

> I'm sure there are many ways to get there. To me, the way I've
> described is the most-direct way to represent the relationships you've
> described.

And thanks very much for your help.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Schema design question
I originally was planning on asking about Postgresql inheritance, but after reading the docs again I do not think that it applies to my design. So, now I'm asking a rather basic schema design question.

I have two related objects where one inherits column values from another. No, that's incorrect. The "child" receives default values from the "parent" when the child is created.

A more concrete example: tables that represent classes taught at a school. The parent object is a general course description, and the child object is a specific instance of a course -- a "class" -- which is a course taught at a given time and location. A course can be taught multiple times, obviously. A course (and thus a class) can have multiple instructors -- a many-to-many relationship. So I have a link table for that. A class normally uses the course's default instructors, but they may be different for specific class instances.

How would you lay out the tables for something like this? I can think (out loud) of three ways to set this up:

1) Separate tables for "course" and "class", and when a class is created simply copy column data from the course to the class.

   Pro: Selects are simple.
   Con: Column duplication in the two tables -- the two tables look a lot alike.
        Need to have duplicate link tables (one pointing to each table).

2) Create a third "common_values" table that both the "course" and "class" tables reference. Then when creating a class from a course, clone the common values row to a new row that the class can reference.

   Pro: No duplication of columns in multiple tables.
        Only need one linking table for instructors (but still need to create new links when creating the new row).
   Con: Need to always do joins on selects (not really a problem).

3) Create a single table with a flag to indicate if the row is a "course" or a "class".

   Pro: Simple selects and no column duplication between tables.
   Con: Columns for a course might be ok as NULL, but would be required for a specific class.
Again, a "course" and "class" are very similar. But, once a class is created from a course it really is its own entity. For example, if the course description changes in the future I don't want it to change on previous classes. There also needs to be a link between the two. For example, you might want to show a list of courses, and then see what classes are scheduled for a given course, so a class should reference its parent course. Thanks very much, -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
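Option 2 above could be sketched like this (all table and column names invented for illustration):

```sql
-- Shared columns live in one place; course and class each point at
-- their own private copy of a common_values row.
CREATE TABLE common_values (
    id          SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);

CREATE TABLE course (
    id     SERIAL  PRIMARY KEY,
    common INTEGER NOT NULL REFERENCES common_values
);

CREATE TABLE class (
    id         SERIAL  PRIMARY KEY,
    course     INTEGER NOT NULL REFERENCES course,  -- link to parent course
    common     INTEGER NOT NULL REFERENCES common_values,
    class_time TIMESTAMP WITH TIME ZONE
);

-- Creating a class clones the course's common row, so later edits to
-- the course description do not change already-scheduled classes:
-- INSERT INTO common_values (name, description)
--     SELECT name, description FROM common_values WHERE id = <course's common id>;
```

The clone step is what satisfies the requirement that a class becomes its own entity once created, while the course link still lets you list classes per course.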
[GENERAL] How to "ping" the database
The Perl DBI interface to Postgresql, module DBD::Pg, has a ping() method that is supposed to determine if a database connection is alive. It can be seen here (see: dbd_db_ping):

    http://search.cpan.org/src/DBDPG/DBD-Pg-1.43/dbdimp.c

It pings by calling:

    status = _result(imp_dbh, "SELECT 'DBD::Pg ping test'");

This fails when a transaction fails -- for example, when doing a serialized transaction and another session performs an update between the serialized transaction's SELECT and UPDATE. In this situation no SELECTs are allowed until a ROLLBACK. In Perl, this failure of ping results in a new database connection being created, even though the connection is still valid.

I'm about to post a bug report on DBD::Pg, but I'm wondering if anyone here could suggest a better way to implement ping() that doesn't fail just because Postgresql is not allowing SELECTs.

What I did in my code was: if ping fails, call rollback and then try ping one more time. But I'm not clear if that works in the more general case, or what might happen if the connection really is broken.

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Waiting on a transaction
On Tue, Aug 16, 2005 at 08:25:25PM +0200, Martijn van Oosterhout wrote: > On Tue, Aug 16, 2005 at 11:05:35AM -0700, Bill Moseley wrote: > > I've read over the docs on Concurrency Control but still not clear > > about when transactions block other updates, and how to deal with that > > on the application level. > > > > If I do a BEGIN and an UPDATE in one psql session and then try and do > > an UPDATE in another psql session that UPDATE waits until either a > > COMMIT or ROLLBACK. > > > > Is it common for applications using Postgresql to set a timer on > > updates and abort? > > It is not normal to hold a transaction open while doing nothing. If you > always send transactions without delays the issue doesn't come up > because you never have to wait long enough for it matter. Ok, that will be the normal case. I was just wondering because I was running test code today and it hung. I wondered what was happening and found out I left a psql window open last night in the middle of a transaction. That shouldn't happen in production. So then I wondered if my application should set an alarm and timeout with an error if, by odd chance, an update hangs. Trying to be a bit more robust -- not that the application could recover, but at least it could spit out an error other than hang. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Waiting on a transaction
I've read over the docs on Concurrency Control but I'm still not clear about when transactions block other updates, and how to deal with that at the application level.

If I do a BEGIN and an UPDATE in one psql session and then try to do an UPDATE in another psql session, that UPDATE waits until either a COMMIT or ROLLBACK.

Is it common for applications using Postgresql to set a timer on updates and abort?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
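One server-side alternative to an application alarm is statement_timeout, which aborts any statement (including an UPDATE blocked behind another transaction's lock) after the given number of milliseconds. A sketch, with an invented table:

```sql
BEGIN;
-- Give up if this UPDATE stays blocked for more than 5 seconds;
-- SET LOCAL confines the setting to the current transaction.
SET LOCAL statement_timeout = 5000;
UPDATE accounts SET balance = balance - 10 WHERE id = 42;
COMMIT;
-- On timeout the statement fails with an error and the client must
-- issue a ROLLBACK before retrying.
```

This keeps the timeout policy in the database rather than relying on each client to wrap updates in its own timer.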
Re: [GENERAL] Sorting by related tables
On Mon, Aug 15, 2005 at 11:30:32PM +0200, Andreas Seltenreich wrote: > > This would be one possibility. If you don't want your application to > deal with transactions being aborted because of non-serializable > transactions, you could alternatively use explicit locking (SELECT ... > FOR UPDATE) combined with the Read Committed isolation level (the > default). SELECT FOR UPDATE locks just the rows that are selected, right? If I understand correctly, that would not work for my case because I'm updating different rows than I'm selecting. My tables are small, so I'm thinking of just manually updating all the rows in sequence to adjust the order when needed -- to make things a bit more simple. But it is a problem that I am curious about how best to solve in a scalable way. Thanks very much for your feedback. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Sorting by related tables
On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
> > 3) Oh, and I have also this for checking IF there are items in
> > "region" that are "above" the item in question -- to see IF an item
> > can or cannot be moved up in the sort order relative to others.
> >
> > SELECT id FROM __TABLE__
> > WHERE
> >     sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
> >     AND id != ?;
> >
> > If that returns any rows then I know I can call the UPDATE to move the
> > item up.
>
> I guess you want a boolean value here? SELECT EXISTS around your above
> query as a subselect should do the trick. You also want to use LIMIT 1
> in the statement, to avoid fetching unnecessary records.

Is there much of a difference between using LIMIT 1 and using an EXISTS subselect? Frankly, I'm not clear what you are specifically suggesting with EXISTS. I'm using Perl's Class::DBI object mapping module, so returning a single row is an easy way to check this as a boolean result in Perl.

> > Again, a very basic question: What method should be used to be sure
> > that nothing changes between the SELECT and the UPDATE?
>
> You can achieve that using transactions. Concurrency control is
> explained here: <http://www.postgresql.org/docs/8.0/static/mvcc.html>.

My comment was that I want to do the above SELECT and then *only* do an UPDATE if the SELECT returns at least one row. So, I should do:

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

before the SELECT. And when I UPDATE I need to be prepared to do a ROLLBACK if I get an error and repeat the process. (And, I assume, take some precaution to give up after some number of tries.) Does that seem reasonable?

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
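The EXISTS form Andreas describes would look something like this, with __TABLE__ spelled out as the region table from the thread:

```sql
-- Returns a single boolean row: can this item move up?
SELECT EXISTS (
    SELECT 1 FROM region
    WHERE sort_order <= (SELECT sort_order FROM region WHERE id = ?)
      AND id != ?
    LIMIT 1
);
```

Semantically it answers the same question; the practical difference is that EXISTS always returns exactly one row containing true or false, so the client never fetches candidate rows at all.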
[GENERAL] Sorting by related tables
I have a few beginner questions about using related tables for sorting.

create table region (
    id         SERIAL PRIMARY KEY,
    name       text,
    -- order this table should be sorted in
    -- a "1" is the top sort level
    sort_order integer
);

create table city (
    id     SERIAL PRIMARY KEY,
    name   text,
    region integer REFERENCES region
);

I want a way to adjust the sort order of the "region" table ("move item up" or "move item down" in a web interface) without requiring knowledge of the existing "sort_order" for the rows in the region table (i.e. requiring them to already be in an order).

Here's my "move up" (which is a lower sort_order value) statement. (__TABLE__ is "region" and ? are $\d bind parameters)

UPDATE __TABLE__ SET sort_order =
    CASE
        -- subtract one from the item's sort, unless it's already "1"
        WHEN id = ? AND sort_order > 1
            THEN sort_order - 1
        -- for other items that are greater or equal to sort-1
        WHEN id != ? AND sort_order >=
                (select sort_order from __TABLE__ where id = ?) - 1
            THEN sort_order + 1
        -- all others, leave alone
        ELSE sort_order
    END;

This works reasonably well for small tables, but doesn't scale, and the logic likely has holes. And behavior when adding new rows to the region table is not defined.

1) How do most people do this? Use linked lists?

create table region (
    id        SERIAL PRIMARY KEY,
    name      text,
    list_head boolean, -- flag if this is the head of the linked list
    next      integer REFERENCES region
);

2) As a SQL beginner, I'm not seeing how to display rows from "city" sorted in order based on the order in the "region" table.

3) Oh, and I also have this for checking IF there are items in "region" that are "above" the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others.

SELECT id FROM __TABLE__
WHERE sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
  AND id != ?;

If that returns any rows then I know I can call the UPDATE to move the item up.
Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
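On question 2 from the post above: displaying cities in region order only takes a join plus an ORDER BY on the region's sort column. A sketch against the tables as defined:

```sql
-- Cities grouped by their region's position, alphabetical within
-- each region.
SELECT region.name AS region_name, city.name AS city_name
FROM city, region
WHERE city.region = region.id
ORDER BY region.sort_order, city.name;
```

The city table never needs its own ordering column; it simply inherits the ordering of whatever region row it references.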