[GENERAL] Spurious rows returned with left join?
I think I have stumbled on a bug, though I'm not entirely sure about that. Things do seem to get a little fuzzy when using outer joins. Consider the following:

create table t1(t1_a int);
insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);

create table t2(t2_a int, t2_b int);
insert into t2 values (1, 1);
insert into t2 values (2, 1);

create table t3(t3_b int, t3_c int);
insert into t3 values (1, 9);
insert into t3 values (1, 10);

select * from t1
  left join t2 on t1_a = t2_a
  left join t3 on t2_b = t3_b
order by t1_a, t2_b;

 t1_a | t2_a | t2_b | t3_b | t3_c
------+------+------+------+------
    1 |    1 |    1 |    1 |    9
    1 |    1 |    1 |    1 |   10
    2 |    2 |    1 |    1 |    9
    2 |    2 |    1 |    1 |   10
    3 |      |      |      |
    3 |      |      |      |

Note that I get 2 rows where t1_a = 3.

My pgsql version is: PostgreSQL 8.1.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) (RedHat ES4 rpms from postgresql.org)

I have tried this same select on pgsql 7.4.8, CVS tip, as well as SQL Server 2005, MySQL and Firebird, all of which return just one row for t1_a = 3.

Since writing the original query I've realized that this is partly a problem with the "inner joins before outer joins" query-writing principle (law?), and the query would perhaps be better written as:

select t1.*, t2.*, t3.*
from t2
  join t3 on t2_b = t3_b
  right join t1 on t1_a = t2_a;

which does return just one row for t1_a = 3 on all postgres versions I currently have available.

Edmund
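For what it's worth, the same "inner join first" ordering can be expressed without switching to RIGHT JOIN by parenthesizing the join — a sketch, assuming the t1/t2/t3 tables defined above:

select *
from t1
  left join (t2 join t3 on t2_b = t3_b) on t1_a = t2_a
order by t1_a;

Because the t2/t3 inner join is evaluated as a unit, an unmatched t1 row is null-extended exactly once.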
Re: [GENERAL] Spurious rows returned with left join?
Tom Lane wrote:
> Edmund Bacon <[EMAIL PROTECTED]> writes:
>> Consider the following: ... Note that I get 2 rows where t1_a = 3.
> Are you getting a Merge Right Join plan for that? If so, you're likely getting bit by this bug: 2006-03-17 14:38 tgl

That's correct. After doing an ANALYZE on t1, t2, t3 the plan changed to a Hash Left Join, and I got the expected results.

> This'll be fixed in 8.1.4, or if you're in a hurry you can get the patch from our CVS server.
> regards, tom lane

No great worries. As remarked in the original post, by doing the inner join first I got the right results. I didn't see anything in the archives about this, so I thought it might be an unknown issue. Admittedly, I didn't scour the archives too thoroughly...

Edmund
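For anyone else checking whether they're hitting the same bug, the plan type is visible with a plain EXPLAIN — a sketch against the t1/t2/t3 tables from the first post:

explain select * from t1 left join t2 on t1_a = t2_a left join t3 on t2_b = t3_b;

On an affected pre-8.1.4 server, a top-level Merge Right Join here is the fingerprint; after ANALYZE the planner may switch to a hash join, which sidesteps the bug without patching.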
Re: [GENERAL] joining tables
[EMAIL PROTECTED] writes:
> Hi, If you have two tables, each with a column called keys and a column called values, and they are both incomplete, such as:
>
> table 1:
>  keys | values
> ------+--------
>     1 | (null)
>     2 | two
>     3 | (null)
>
> table 2:
>  keys | values
> ------+--------
>     1 | one
>     2 | (null)
>     3 | three
>
> is there a way to join them, in order to get:
>
>  keys | values
> ------+--------
>     1 | one
>     2 | two
>     3 | three
>
> The closest I could get was with NATURAL FULL JOIN:
>
>  keys | values
> ------+--------
>     1 | one
>     1 | (null)
>     2 | two
>     2 | (null)
>     3 | three
>     3 | (null)
>
> Thanks

Try something like:

SELECT keys,
       CASE WHEN table1."values" IS NOT NULL
            THEN table1."values"
            ELSE table2."values"
       END AS "values"
FROM table1 FULL JOIN table2 USING (keys);

As written, this returns the value from table1 in preference to table2 whenever table1 has one; swap the branches if you want table2 to win.
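A shorter equivalent, assuming the same table1/table2 as above — COALESCE returns its first non-null argument, so it collapses the CASE into one call:

SELECT keys,
       COALESCE(table1."values", table2."values") AS "values"
FROM table1 FULL JOIN table2 USING (keys);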
Re: [GENERAL] find last day of month
Andrus Moor <[EMAIL PROTECTED]> writes:
> I have a table containing a month column in format mm.yyyy:
>
> create table months (tmkuu char(7));
> insert into months values ('01.2005');
> insert into months values ('02.2005');
>
> How to create a select statement which converts this column to a date type containing the last day of the month, like '2005-01-31', '2005-02-28'?

As long as you are using DateStyle = DMY the following should work:

ebacon=# select * from months;
  tmkuu
---------
 01.2005
 02.2005
 12.2004
(3 rows)

ebacon=# select (('01.' || tmkuu)::date + interval '1 month' - interval '1 day')::date from months;
    date
------------
 2005-01-31
 2005-02-28
 2004-12-31
(3 rows)
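A DateStyle-independent variant of the same trick uses to_date() with an explicit format mask — a sketch against the months table above:

select (to_date(tmkuu, 'MM.YYYY') + interval '1 month' - interval '1 day')::date
from months;

Since to_date() parses the string against the mask, the result no longer depends on the session's DateStyle setting.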
Re: [GENERAL] Select all invalid e-mail addresses
Andrus <[EMAIL PROTECTED]> writes:
> I have a database of e-mail addresses. I want to select the email addresses which are not valid: do not contain exactly one @ character, contain ; ' , characters or spaces, etc. What is the WHERE clause for this?

There was a thread here not so long ago about matching valid email addresses. It's not so simple. You probably want to do a regex match, e.g.

select ... where email ~ 'regex'

However, the regex for all valid email possibilities is *VERY* complex; see:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

You should probably search the mailing list archives. ISTR that there were some suggestions on how one might simplify the search space.
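As a rough first cut — not RFC-compliant, just the sanity filter the original poster described (exactly one @, no spaces, none of ; ' ,) — something like this sketch would flag the obviously bad rows (the table name addresses is hypothetical):

select email
from addresses
where email !~ '^[^@[:space:];'',]+@[^@[:space:];'',]+$';

Anything that fails the pattern has no @, more than one @, or contains whitespace or one of the forbidden punctuation characters.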
Re: [GENERAL] Generating random values.
[EMAIL PROTECTED] (Joshua D. Drake) writes:
> Fernando Lujan wrote:
>> Hi folks, I have a table which contains my users... I want to insert for each user a random password, so I need a random function. Is there such a function in Postgres? I just found RANDOM, which generates values between 0.0 and 1.0. Any help or suggestion will be appreciated. :)
> I would do something like:
>     select substring(md5(random() || random()), 5, 8);
> Sincerely, Joshua D. Drake

Great! A simple, dumb program can generate all your passwords very quickly. My 2.4 GHz Pentium 4 did it in under 10 minutes. A token set of 16 characters and a fixed length of 8 characters just isn't a very big search space.
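If the contrib pgcrypto module is installed (an assumption — it is an optional contrib package, and its function set varies by version), its gen_random_bytes() gives a much larger search space than 8 hex digits:

-- 9 random bytes = 72 bits of entropy, base64-encoded to 12 characters
select encode(gen_random_bytes(9), 'base64');

Better still is to generate passwords in the application from a proper CSPRNG rather than in the database at all.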
Re: [GENERAL] Failure to use indexes (fwd)
Greetings all!

Given the quiet, I assume that there is no experience with index issues on inherited tables? Just seeing if anybody may have any ideas or suggested workarounds (I seem to have found one by constructing a query that does all the joins between inherited tables explicitly - this causes the indexes to be used - still trying to make sure it is a legitimate method).

Regards!
Ed

-- Forwarded message --
Date: Fri, 29 Jul 2005 14:23:29 -0400 (EDT)
From: Edmund Dengler <[EMAIL PROTECTED]>
To: Postgresql-General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Failure to use indexes

Greetings!

I managed to get an even simpler example:

select *
from eventlog.record_main
where luid in (
    select luid
    from eventlog_partition._20050706__raw_record
    order by luid
    limit 1
);

If I use = rather than in, postgresql uses index scanning. As soon as I use in (i.e., a set to compare against), I get sequential scanning, even though the set size is only a single element.

Regards!
Ed

On Fri, 29 Jul 2005, Edmund Dengler wrote:
> Greetings! I am using inherits to partition several tables. When I perform a query on another table, and then try to join against an inherited table set, the optimizer does not use any indexes to perform the join. This is 8.0.1. Is this fixed in 8.0.2 or 8.0.3?
> (query and EXPLAIN output as in the original post, and so on)
Re: [GENERAL] Failure to use indexes (fwd)
Greetings!

I have already increased the stats from 10 to 100. In addition, if I specify individual tables, then the indexes are used. However, when I go through the inherits, the indexes are not used. I will try expanding the statistics, but suspect it is not the root cause of the problem.

Regards!
Ed

On Tue, 2 Aug 2005, Scott Marlowe wrote:
> On Tue, 2005-08-02 at 16:06, Dr NoName wrote:
>> The solution to my problem was to increase the statistics value and do another analyze. You can also change the default_statistics_target parameter in postgresql.conf. Don't know if that's related to the problem you're seeing, but it's worth a try.
> Cool postgresql trick:
>     alter database test set default_statistics_target=200;
> You can change the default for a database's new tables too.
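Statistics can also be raised per column, which is less invasive than changing the database-wide default — a sketch using the table names from this thread:

alter table eventlog.record_main alter column luid set statistics 200;
analyze eventlog.record_main;

ALTER TABLE ... SET STATISTICS only takes effect at the next ANALYZE, hence the second statement.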
[GENERAL] Failure to use indexes
Greetings!

I am using inherits to partition several tables. When I perform a query on another table, and then try to join against an inherited table set, the optimizer does not use any indexes to perform the join. This is 8.0.1. Is this fixed in 8.0.2 or 8.0.3?

The query:

explain
select *
from (
    select *
    from eventlog.record_classification as record_classification
    where time_written >= '2005-07-06 00:00:00+00'::timestamptz
      and time_written < '2005-07-06 00:00:00+00'::timestamptz
    order by time_written, luid
    offset 0 limit 500
) as classification
join eventlog.record_main as main using (luid, time_written);

The explanation:

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=51.15..20191003.89 rows=208027 width=178)
   Hash Cond: (("outer".luid = "inner".luid) AND ("outer".time_written = "inner".time_written))
   ->  Append  (cost=0.00..14641488.64 rows=554738383 width=96)
         ->  Seq Scan on record_main main  (cost=0.00..0.00 rows=1 width=96)
         ->  Seq Scan on _20050723__record_main main  (cost=0.00..94078.62 rows=3564462 width=96)
         ->  Seq Scan on _20050724__record_main main  (cost=0.00..110075.12 rows=4170512 width=96)
         ->  Seq Scan on _20050725__record_main main  (cost=0.00..122836.02 rows=4654002 width=96)
         ->  Seq Scan on _20050726__record_main main  (cost=0.00..142347.71 rows=5393271 width=96)
         ->  Seq Scan on _20050727__record_main main  (cost=0.00..130858.80 rows=4957980 width=96)
         (and so on, currently 123 such inherited tables)
   ->  Hash  (cost=51.07..51.07 rows=15 width=98)
         ->  Subquery Scan classification  (cost=50.89..51.07 rows=15 width=98)
               ->  Limit  (cost=50.89..50.92 rows=15 width=98)
                     ->  Sort  (cost=50.89..50.92 rows=15 width=98)
                           Sort Key: record_classification.time_written, record_classification.luid
                           ->  Result  (cost=0.00..50.59 rows=15 width=98)
                                 ->  Append  (cost=0.00..50.59 rows=15 width=98)
                                       ->  Seq Scan on record_classification  (cost=0.00..0.00 rows=1 width=98)
                                             Filter: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
                                       ->  Index Scan using _20050705__record_classification_time_written_idx on _20050705__record_classification record_classification  (cost=0.00..3.46 rows=1 width=54)
                                             Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
                                       ->  Index Scan using _20050701__record_classification_time_written_idx on _20050701__record_classification record_classification  (cost=0.00..3.59 rows=1 width=54)
                                       ->  Index Scan using _20050702__record_classification_time_written_idx on _20050702__record_classification record_classification  (cost=0.00..3.69 rows=1 width=54)
                                       ->  Index Scan using _20050703__record_classification_time_written_idx on _20050703__record_classification record_classification  (cost=0.00..3.70 rows=1 width=54)
                                       (and so on)
                                       ->  Index Scan using _20050714__record_classification_time_written_idx on _20050714__record_classification record_classification  (cost=0.00..3.69 rows=1 width=53)
(164 rows)

Sample tables:

eventlog=# \d eventlog_partition._20050723__record_main
Re: [GENERAL] Failure to use indexes
Greetings!

I managed to get an even simpler example:

select *
from eventlog.record_main
where luid in (
    select luid
    from eventlog_partition._20050706__raw_record
    order by luid
    limit 1
);

If I use = rather than in, postgresql uses index scanning. As soon as I use in (i.e., a set to compare against), I get sequential scanning, even though the set size is only a single element.

Regards!
Ed

On Fri, 29 Jul 2005, Edmund Dengler wrote:
> Greetings! I am using inherits to partition several tables. When I perform a query on another table, and then try to join against an inherited table set, the optimizer does not use any indexes to perform the join. This is 8.0.1. Is this fixed in 8.0.2 or 8.0.3?
> (query and EXPLAIN output quoted in full in the original post, and so on)
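The "=" form that does use the indexes is just the scalar-subquery spelling of the same thing — a sketch, using the tables named above:

select *
from eventlog.record_main
where luid = (
    select luid
    from eventlog_partition._20050706__raw_record
    order by luid
    limit 1
);

In 8.0-era planners, "=" against a scalar subquery is treated as an index-friendly comparison with a single value, while IN (subquery) is planned as a general join against a set, which across an inheritance tree tends to fall back to a seqscan of every child table.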
Re: [GENERAL] Copying bytea data out via pgsql
[EMAIL PROTECTED] (Leonel Nunez) writes:
> John Wells wrote:
>> Guys, I have a number of jpegs and tiffs that are stored in a bytea field in a PostgreSQL database by a Java program using Hibernate. I need to copy these out to a flat file via psql for viewing, etc. I've tried:
>>     psql -c 'select binarydata_field from my_image_table where id=1' mydb > flatfile
>> but although that seems to provide the data, it does not seem to pull it in a valid image format. Is there some special handling I need to do here?
> You have on the flatfile info about the query. Let's say:
>     leonel=> select count(*) from image;
>      count
>     -------
>          1
>     (1 row)
> You got the field name, the -------, the image and (1 row) on the flatfile. You need to make a script to put the image on that flatfile: just read it as any query and put the contents of that field on the file. I use this little Perl script ...

or you could use some of the other options to psql, e.g.

psql -A -t -c "select ..."

From the man page:

-A, --no-align
    Switches to unaligned output mode. (The default output mode is otherwise aligned.)
-t, --tuples-only
    Turn off printing of column names and result row count footers, etc. This is equivalent to the \t command.

You might also want to consider the -o filename option.
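Even with -A -t, note that psql prints bytea in its escaped text form (backslash-octal escapes), not raw bytes, so the output still isn't a valid image file as-is. One hedged workaround is to pull the data through encode() and decode it outside the database — a sketch, assuming a command-line base64 decoder such as openssl is available:

psql -A -t -c "select encode(binarydata_field, 'base64') from my_image_table where id=1" mydb \
    | openssl base64 -d > image.jpg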
[GENERAL] Deletes and large tables
Greetings!

We have a table with more than 250 million rows. I am trying to delete the first 100,000 rows (based on a bigint primary key), and I had to cancel after 4 hours of the system not actually finishing the delete. I wrote a script to delete individual rows 10,000 at a time using transactions, and am finding each individual delete takes on the order of 0.1 seconds to 2-3 seconds.

There are 4 indexes on the table, one of which is very hashlike (i.e., distribution is throughout the index for sequential rows). Is this considered normal for delete speed? Currently using 8.0.1. Drives are capable of 40MB/sec sustained transfers, but only achieving about 2-10kB/sec, and mostly CPU bound.

Regards!
Ed
Re: [GENERAL] Deletes and large tables
Greetings!

This table has 3 foreign keys, but that should not matter during deletes. In addition, the tables being referred to are small, and should be in cache. There are no tables depending on it for references, so no dependent triggers should be running. Also, if this was a foreign key issue, I would expect to be I/O bound and not CPU bound.

Regards!
Ed

On Fri, 10 Jun 2005, Richard Huxton wrote:
> Edmund Dengler wrote:
>> Greetings! We have a table with more than 250 million rows. I am trying to delete the first 100,000 rows (based on a bigint primary key) ...
> I don't suppose it's off checking foreign keys in a lot of tables, is it?
> -- Richard Huxton, Archonet Ltd
Re: [GENERAL] Deletes and large tables
Just did a sanity check. I dumped the DB schema, and there is indeed a foreign key reference into the table. Now interestingly, the table pointing in has no index on the column, but is a relatively small table with only entries near the end of the large table. So it looks like I was getting CPU bound because of a sequential scan of the smaller table per delete. Dropped the constraint, and deletes are now much faster.

Regards!
Ed

On Fri, 10 Jun 2005, David Gagnon wrote:
>> This table has 3 foreign keys, but that should not matter during deletes. In addition, the tables being referred to are small, and should be in cache.
> I'm talking about FKs that point to this table... not FKs defined for this table that point to other tables. If table A is referenced by 10 other tables, 10 referential checks are needed.
>> There are no tables depending on it for references, so no dependent triggers should be running. Also, if this was a foreign key issue, I would expect I/O issues/bounds and not CPU.
> Maybe... I'm honestly not sure. Like I said in my previous mail... I had a similar problem (maybe not the same). It was taking 10 minutes to delete 10k lines in a table. I turned on some log info in postgresql.conf and I saw that for each row deleted, 4 selects were issued to check FKs. I dropped those FKs, and after that the delete took less than a second.
> Hope it helps
> /David
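An alternative to dropping the constraint outright is to index the referencing column, so each per-row FK check becomes an index probe instead of a seqscan — a sketch with hypothetical table and column names, since the actual schema wasn't posted:

create index small_table_big_id_idx on small_table (big_table_id);

PostgreSQL requires a unique index on the referenced side of a foreign key but none on the referencing side, which is why this one is easy to miss.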
[GENERAL] INHERITS and planning
Greetings!

Is there an issue with planning when a large number of INHERITS tables exist? We have 2 base tables, and use INHERITS to partition the data. When we get to around 2000-2200 sub-tables (approx 1000-1100 per base table), planning a SELECT statement on the base table (i.e., to search all sub-tables) starts slowing down dramatically (it feels like something exponential OR some kind of in-memory to on-disk transition). I haven't done enough to really plot out the planning times, but definitely around 1600 tables we were getting sub-second plans, and around 2200 we were above 30 seconds.

Also, are there any plans to support proper partitioning/binning of data rather than going through INHERITS? I know something similar to Oracle has been mentioned as upcoming. I would like to put in a vote to support auto-binning, in which a function is called to define the bin. The Oracle model really only supports: (1) explicit partitioning (i.e., every new partition must be defined), or (2) hash binning. What we deal with is temporal data, and we would like to bin on the hour or day automatically, hopefully to support truncating whole bins. This helps us 2 ways: (1) data deletion is bulk (we currently drop a full inherited table), (2) cancelling a VACUUM/SELECT doesn't take forever while the execution engine finishes the current table (we have had cancels take 2 hours because the VACUUM was on a very large single table).

Regards!
Ed
[GENERAL] Optimizer and inherited tables
(Sorry, wrong subject line got sent)

Greetings!

Does anybody know how well the optimizer works when dealing with inherited tables? I am currently using 8.0.1.

I have a table called eventlog.record_main, and a number of inherited tables to partition the data (called eventlog_partition._<day>__record_main). luid is the primary key (all tables have this indexed via the primary key). The reason for doing this is that a single table would be way too big (there are on average 6-7 million rows per table), so that vacuum and deletes would be inefficient. Inserting has been much more efficient using this mechanism.

When I try the following query, I get sequential scans:

explain select * from eventlog.record_main order by luid limit 5;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=160800332.75..160800332.77 rows=5 width=92)
   ->  Sort  (cost=160800332.75..161874465.60 rows=429653138 width=92)
         Sort Key: eventlog.record_main.luid
         ->  Result  (cost=0.00..11138614.37 rows=429653138 width=92)
               ->  Append  (cost=0.00..11138614.37 rows=429653138 width=92)
                     ->  Seq Scan on record_main  (cost=0.00..0.00 rows=1 width=92)
                     ->  Seq Scan on _20050401__record_main record_main  (cost=0.00..223029.64 rows=8620164 width=92)
                     ->  Seq Scan on _20050402__record_main record_main  (cost=0.00..201536.46 rows=7789446 width=92)
                     ->  Seq Scan on _20050403__record_main record_main  (cost=0.00..211277.34 rows=8165934 width=92)
                     ->  Seq Scan on _20050404__record_main record_main  (cost=0.00..219381.70 rows=8479170 width=92)
                     ->  Seq Scan on _20050405__record_main record_main  (cost=0.00..226305.94 rows=8746794 width=92)
                     (and so on)

Yet, when I run the query on a single table, I get index usage:

explain select * from eventlog_partition._20050601__record_main order by luid limit 5;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..0.15 rows=5 width=92)
   ->  Index Scan using _20050601__record_main_pkey on _20050601__record_main  (cost=0.00..163375.23 rows=5460021 width=92)
(2 rows)

This means that any query that limits the rows will run extremely inefficiently. Given a limit of 5, at most only 5 rows need to be considered in each partition sub-table, so an optimal plan would run a sub-query in each table limited to 5 rows, and then merge the results.

Any ideas/fixes/patches?

Regards!
Ed
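The merge described at the end of the post can be hand-rolled with UNION ALL — a sketch over three of the partitions named above (in practice you would generate one arm per child table):

select * from (
    (select * from eventlog_partition._20050401__record_main order by luid limit 5)
    union all
    (select * from eventlog_partition._20050402__record_main order by luid limit 5)
    union all
    (select * from eventlog_partition._20050403__record_main order by luid limit 5)
) merged
order by luid
limit 5;

Each arm can use its partition's primary-key index and return at most 5 rows, so the outer sort only ever sees a handful of rows.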
Re: [GENERAL] how to Count without group by
[EMAIL PROTECTED] (Yudie Pg) writes:
> Hello, I have a table with a structure like this:
>
> create table product(
>     sku          int4 not null,
>     category     int4 null,
>     display_name varchar(100) null,
>     rank         int4 null
> );
>
> Let's say the example data is (sku, category, display_name, rank):
>
> 10001, 5, postgresql, 132
> 10002, 5, mysql, 243
> 10003, 5, oracle, 323
> 10006, 7, photoshop, 53
> 10007, 7, flash mx, 88
> 10008, 9, Windows XP, 44
> 10008, 9, Linux, 74
>
> Expected query result (sku, category, display_name, category_count):
>
> 10001, 5, postgresql, 3
> 10006, 7, photoshop, 2
> 10008, 9, Windows XP, 2
>
> The idea is getting the highest ranking in each product category and a COUNT of how many products are in the category, with a SINGLE query. The first 3 columns can be done with
>     select distinct on (category) ... order by category, rank desc
> but that is still missing the category_count. I wish no subquery were needed, to keep the simplest query plan. Thank you. Yudie G.

I do not believe you can do this without a subquery - you are trying to get 2 separate pieces of information from your data:

* some data about the record having MAX(rank) for each category, and
* the count of records in each category.

Note, however, that you can get MAX(rank) and COUNT(category) in one sequential pass of the data, e.g.

SELECT category, MAX(rank), COUNT(category) FROM product GROUP BY category;

Joining this with the original table is not too difficult:

SELECT sku, category, display_name, category_count
FROM product
JOIN (SELECT category, MAX(rank) AS rank, COUNT(category) AS category_count
      FROM product
      GROUP BY category) subq
USING (category, rank)
ORDER BY sku;

Depending on what your data looks like, you might improve things by having an index on category, and perhaps on (category, rank).

Note that there may be a problem with this query: if you have more than one product with the same rank in the same category, you may get more than one record for that category. Apply DISTINCT ON as necessary.

-- Remove -42 for email
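Spelling out that last remark — a sketch of the same query with DISTINCT ON layered on top to force one row per category (which row wins a rank tie is then decided by the trailing ORDER BY column):

SELECT DISTINCT ON (category)
       sku, category, display_name, category_count
FROM product
JOIN (SELECT category, MAX(rank) AS rank, COUNT(category) AS category_count
      FROM product
      GROUP BY category) subq
USING (category, rank)
ORDER BY category, sku;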
Re: [GENERAL] Count and Results together
[EMAIL PROTECTED] (Jan Sunavec) writes:
> I am using libpg.so.

I assume that you mean libpq?

> I tried to find a solution for this problem on the internet, but I haven't found anything yet. I have an idea to get the rowcount through some function written in C. Or is there any plan to add this feature to PostgreSQL?

Have you read your documentation? Perhaps this page would be of interest to you?

http://www.postgresql.org/docs/8.0/interactive/libpq-exec.html#LIBPQ-EXEC-SELECT-INFO

-- Remove -42 for email
Re: [GENERAL] CSV delim quoting differences PgCOPY, Excel etc...
[EMAIL PROTECTED] (Jerry Sievers) writes:
> Hello. Anyway, I am often enough having to load Pg databases using SQL COPY from CSV output written by Excel, that I've had to write a script to change the quoting behavior from Excel's putting double quotes around a field having embedded delim chars, to Pg's backslash quoting. I've got a hunch that one or both of the aforementioned softwares can change their quoting behavior. (It just seems reasonable... famous last words.) Is it true that neither Pg SQL \copy nor Excel can adapt as I wish? What about psql \copy? I fail to find in the docs for the Pg products any support for this. If Excel can adapt, at least one experienced user I've asked doesn't know about it. Thanks.

As of version 8.0, Pg has been taught how to do the right thing with CSV files:

COPY mytable TO 'myfile' WITH CSV;

-- Remove -42 for email
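The same option works in the loading direction, which is the case the original poster describes — Excel's double-quote convention is exactly what CSV mode expects. A sketch, with a hypothetical file path:

COPY mytable FROM '/path/to/excel-export.csv' WITH CSV;

-- client-side equivalent: reads the file with psql's permissions
-- rather than the server's
\copy mytable from 'excel-export.csv' with csv

If the export carries a header row, check your version's COPY docs for a HEADER option; it arrived later than the basic CSV support.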
Re: [GENERAL] Debugging deadlocks
[EMAIL PROTECTED] (Bruno Wolff III) writes:
> Using domains is a good way to keep column constraints in just one place.

Speaking of domains, how do you find out what the range of a domain is? E.g.:

test=# create domain fruit as text check( value in ('apple', 'orange', 'banana', 'pear'));
CREATE DOMAIN
test=# \dD fruit
         List of domains
 Schema | Name  | Type | Modifier
--------+-------+------+----------
 public | fruit | text |
(1 row)

test=# \dD+ fruit
         List of domains
 Schema | Name  | Type | Modifier
--------+-------+------+----------
 public | fruit | text |
(1 row)

A quick look through pg_catalog doesn't suggest anything that would return the check conditions - is there any way to do this?

-- Remove -42 for email
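One place the check expression does live is pg_constraint, whose contypid column links domain constraints to the domain's pg_type row — a sketch that should recover the CHECK clause for the fruit domain above:

SELECT t.typname, c.conname, pg_get_constraintdef(c.oid) AS checkdef
FROM pg_constraint c
JOIN pg_type t ON t.oid = c.contypid
WHERE t.typname = 'fruit';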
Re: [GENERAL] field separator change from the shell command line
[EMAIL PROTECTED] writes:
> Hello, I'm trying to change the usual | table field separator from the shell command line:
>     psql -d ect -f pl_lost.sql -o pl_lost.out.txt -F \t -U asaadmin
> But it doesn't work. It keeps the same | separator in the output file. Can anyone please help me? I need to output to a tab separated file. Thanks in advance,

Changing the field separator only works with unaligned output, so you need to add -A to your command line switches (or --no-align); -F is silently ignored in the default aligned mode. Add -t as well if you also want to drop the column headers and row-count footer.

-- Remove -42 for email
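Putting it together — one hedged way to get an actual tab character through the shell is ANSI-C quoting (bash/ksh $'\t'; in other shells, type a literal tab inside quotes):

psql -d ect -f pl_lost.sql -o pl_lost.out.txt -A -F $'\t' -U asaadmin

Note that an unquoted -F \t usually reaches psql as the letter t, so the fields would come out t-separated even once -A makes the separator take effect.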
Re: [GENERAL] Stuck with a query...
[EMAIL PROTECTED] (Greg Stark) writes:
> Geoff Caplan <[EMAIL PROTECTED]> writes:
>> Hi folks, Sorry to ask a newbie SQL question but I'm struggling...
> There's no efficient way to write this in standard SQL. However Postgres has an extension, DISTINCT ON, that would do it:
>
>     select url, count(*)
>     from (select distinct on (session_id) url
>           from clickstream
>           order by session_id, sequence_num desc) last_pages
>     group by url;
>
> This isn't going to be a superfast query. It has to sort all the clickstream records by session and sequence, take just the last one, then probably sort those again.

As an experiment I tried a more 'standard SQL' approach to this problem:

SELECT url, count(1)
FROM clickstream
WHERE (session_id, sequence_num) IN
      (SELECT session_id, max(sequence_num)
       FROM clickstream
       GROUP BY session_id)
GROUP BY url;

On a table with about 100,000 rows this runs in about 720ms on my system, compared to the DISTINCT ON version, which runs in about 1000ms. Adding an index on (session_id, sequence_num) reduced the run time to about 690ms, but made no difference to the DISTINCT ON version. With only about 10,000 rows, there's no appreciable difference. This surprised me, because I expected the DISTINCT ON to be better.

-- Remove -42 for email
Re: [GENERAL] Novice Question
[EMAIL PROTECTED] (Michael Romagnoli) writes:
> What kind of command would I run if I wanted to copy an entire table (along with renaming it, and, of course, all data from the first table - some of which is binary)?

SELECT * INTO newtable FROM oldtable;

Note that this doesn't construct indexes, foreign keys, constraints, etc.

If by 'binary data' you mean BLOBs, I'd expect the above to work. Other than that, AFAIUI you have no reasonable expectation that your data is stored in any meaningful binary format by the database. All data could be internally stored as strings (though that might be very slow).

-- Remove -42 for email
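The same copy can also be spelled with CREATE TABLE AS, which the PostgreSQL docs recommend over SELECT INTO:

CREATE TABLE newtable AS SELECT * FROM oldtable;

If the indexes and constraints are wanted too, a hedged alternative on reasonably recent versions is CREATE TABLE newtable (LIKE oldtable) followed by INSERT INTO newtable SELECT * FROM oldtable - check whether your version's LIKE clause copies the pieces you need.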
Re: [GENERAL] basic trigger using OLD not working?
[EMAIL PROTECTED] (Rick Casey) writes:

> CREATE OR REPLACE FUNCTION logPedigreesDel() RETURNS TRIGGER AS '
> begin
>     RAISE EXCEPTION ''OLD.famindid = '', OLD.famindid;

      RAISE EXCEPTION ''OLD.famindid = %'', OLD.famindid;
                                       ^
> return OLD;
> end;
> ' LANGUAGE plpgsql;

RAISE needs a % placeholder in the format string for each parameter that follows it.

-- Remove -42 for email
[GENERAL] Multi-column indexes
Greetings!

I have a technical question concerning multi-column indexes and their implementation. I tried looking for the answer in the docs but couldn't find anything.

I have the following table:

eventlog=> \d agent.record
                              Table "agent.record"
       Column       |           Type           | Modifiers
--------------------+--------------------------+----------------------------------------------------------
 luid               | bigint                   | not null default nextval('agent.record_luid_seq'::text)
 host_luid          | bigint                   |
 remote_system_luid | bigint                   |
 log_luid           | bigint                   | not null
 time_logged        | timestamp with time zone | not null default now()
 record             | bytea                    | not null
 error              | boolean                  |
 error_reason       | text                     |
Indexes:
    "record_pkey" primary key, btree (luid)
    "record_to_process_idx" unique, btree (host_luid, log_luid, luid) WHERE (error IS NULL)
    "record_to_process_idx2" unique, btree (luid) WHERE (error IS NULL)
    "record_last_logged_idx" btree (time_logged, host_luid, log_luid, luid)
Foreign-key constraints:
    "$1" FOREIGN KEY (host_luid) REFERENCES eventlog.host(luid) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (remote_system_luid) REFERENCES eventlog.remote_system(luid)
    "$3" FOREIGN KEY (log_luid) REFERENCES eventlog.log(luid) ON UPDATE CASCADE ON DELETE CASCADE

consisting of 27306578 rows. So I try running the following query:

explain analyze
select record
from agent.record
where host_luid = 3::bigint
  and log_luid = 2::bigint
  and error is null
order by host_luid desc, log_luid desc, luid desc
limit 1;

I get the following query plan:

 Limit  (cost=0.00..1.47 rows=1 width=286) (actual time=249064.949..249064.950 rows=1 loops=1)
   ->  Index Scan Backward using record_to_process_idx on record  (cost=0.00..13106.73 rows=8898 width=286) (actual time=249064.944..249064.944 rows=1 loops=1)
         Index Cond: ((host_luid = 3::bigint) AND (log_luid = 2::bigint))
         Filter: (error IS NULL)
 Total runtime: 249065.004 ms
(5 rows)

Now, this plan seems kinda slow, in the sense of scanning backwards. And it takes quite a long time (compared to seeking the last row based only on luid, for example). It feels that if I have host_luid values of (1,2,3,4,5), the above is scanning through _all_ 5 entries, then 4 entries, and then finally gets to 3.

So, now to my question: is this really happening? I guess it breaks down to how these indexes are implemented. Are multi-column indexes implemented as true multiple-level indexes, in the sense that there is a level 1 index on host_luid, pointing to a level 2 index on log_luid, pointing to a level 3 index on luid? Or are they the equivalent of a single (host_luid, log_luid, luid) index (i.e., as if I created a functional index consisting of host_luid || ',' || log_luid || ',' || luid)?

My initial guess was that postgresql would search first to the host_luid desc, then from there to the specific log_luid desc, and then from there to the luid (i.e., the equivalent of 3 btree jumps), essentially skipping over the inappropriate host_luids of 5 and 4. But it seems to be scanning through them, even though I have a low cost for random page accesses within my postgresql.conf file. Are there components of the index that allow it to skip backwards lots of pages rather than loading them from disk?

Any ideas? How do multi-column indexes really work? I would hate to have to define specific indexes for each host_luid, as this is an unmaintainable situation.

Thanks!
Ed
Re: [GENERAL] Multi-column indexes
Hi Tom!

Yep, there are a large number of host_luid/log_luid combinations (there are approximately 5-10 hosts and 1-3 logs per system we are running). Thanks for the recommended workaround, I'll have a try at it at some point tomorrow.

Regards!
Ed

On Sat, 15 Jan 2005, Tom Lane wrote:
> Edmund Dengler <[EMAIL PROTECTED]> writes:
>> "record_to_process_idx" unique, btree (host_luid, log_luid, luid) WHERE (error IS NULL)
>> explain analyze select record from agent.record where host_luid = 3::bigint and log_luid = 2::bigint and error is null order by host_luid desc, log_luid desc, luid desc limit 1;
>> Limit (cost=0.00..1.47 rows=1 width=286) (actual time=249064.949..249064.950 rows=1 loops=1) ...
>
> Are there a whole lotta rows with that host_luid/log_luid combination?
>
> What's happening is that the index search initially finds the first such row, and then it has to step to the last such row to start the backwards scan. This is fixed as of 8.0, but all earlier releases are going to be slow in that scenario. It's got nothing to do with single vs multi column indexes, it is just a shortcoming of the startup code for backwards index scans. (I get the impression that the original implementation of Postgres' btree indexes only supported unique indexes, because there were a number of places where it was horridly inefficient for large numbers of equal keys. I think this 8.0 fix is the last such issue.)
>
> Since your index has an additional column, there is a hack you can use to get decent performance in 7.4 and before. Add a dummy condition on the last column:
>
>     where host_luid = 3::bigint
>       and log_luid = 2::bigint
>       and luid <= someverylargevalue::bigint
>       and error is null
>     order by host_luid desc, log_luid desc, luid desc
>     limit 1
>
> Now, instead of positioning to the first row with value (3,2,anything), the initial btree descent will home in on the end of that range, and so the expensive stepping over all the rows between is avoided.
>
> regards, tom lane
[GENERAL] select into temp tables without using EXECUTE in plpgsql
I have a question surrounding something that seems to have been a relative FAQ. It concerns the use of temporary tables in plpgsql, which initially resulted in the good old 'oid not found' error. I learnt from the mailing list that I should be running my queries with EXECUTE, because expressions used in a PL/pgSQL function are otherwise prepared and saved only once. However, I would like to select distinct values from my temp table, which seems impossible, as:

    SELECT INTO is not currently supported within EXECUTE. So, the only way to extract a result from a dynamically-created SELECT is to use the FOR ... EXECUTE form described later. (http://www.postgresql.org/docs/7.1/static/plpgsql-description.html)

I either cannot find or do not understand the documentation for this FOR ... EXECUTE form being described somewhere later in the documentation.

So to recap: I'm creating a temp table fine. I'm EXECUTING an insert into the temp table fine. Then I cannot select from the table in the form:

    SELECT INTO int1 count(distinct(value)) from TEMP1;

Then it happily drops the table (without the select). Any advice on how to select from a temp table into a variable would be gratefully received.

Many Thanks
Edmund
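The FOR ... EXECUTE form the docs allude to loops over the rows of a dynamically executed query, so a one-row aggregate can be captured like this — a sketch of just the relevant fragment, assuming a record variable rec, an integer variable int1, and the TEMP1 table from the post (note the doubled single quotes required inside an old-style quoted function body):

FOR rec IN EXECUTE ''SELECT count(DISTINCT value) AS n FROM TEMP1'' LOOP
    int1 := rec.n;
END LOOP;

The loop runs exactly once, because the aggregate returns a single row, leaving the count in int1.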
Re: [GENERAL] disabling constraints
[EMAIL PROTECTED] (Vivek Khera) writes:
> "DP" == David Parker <[EMAIL PROTECTED]> writes:
> DP> I would like to be able to truncate all of the tables in a schema without worrying about FK constraints. I tried issuing a SET CONSTRAINTS ALL DEFERRED before truncating, but I still get constraint errors. Is there a way to do something like:
>
> Why don't you truncate your tables in an order that won't violate FK's? Or do you have circular references?

Because that doesn't work:

test=# create table able(id serial primary key, data text);
NOTICE:  CREATE TABLE will create implicit sequence "able_id_seq" for serial column "able.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "able_pkey" for table "able"
CREATE TABLE
test=# create table baker(id int references able(id) deferrable, data text);
CREATE TABLE
test=# truncate able;
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "baker" references "able" via foreign key constraint "$1".
test=# begin;
BEGIN
test=# set constraints all deferred;
SET CONSTRAINTS
test=# truncate able;
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "baker" references "able" via foreign key constraint "$1".
test=# rollback;
ROLLBACK
test=#

-- Remove -42 for email
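One era-appropriate workaround is to drop the constraint, truncate, and put it back, all inside one transaction — a sketch against the able/baker pair above (the constraint name "$1" comes from the DETAIL line; quote it as shown):

BEGIN;
ALTER TABLE baker DROP CONSTRAINT "$1";
TRUNCATE baker;
TRUNCATE able;
ALTER TABLE baker ADD FOREIGN KEY (id) REFERENCES able(id) DEFERRABLE;
COMMIT;

Newer releases also accept several tables in one TRUNCATE statement (and, later still, TRUNCATE ... CASCADE), which sidesteps the whole problem - check your version's docs.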
[GENERAL] UTF-8 and =, LIKE problems
I am running a web-based accounting package (SQL-Ledger) that supports multiple languages on PostgreSQL. When the database encoding is set to Unicode, multilingual operation is possible.

However, when a user's input language is set to say English, and the user enters data such as 79, the data that is sent back to PostgreSQL for storage is U+FF17 U+FF19, which are the Unicode fullwidth characters for 79. So far so good. Now, if the user switches languages and enters 79 as a search key, the previously entered row will not be found with the LIKE or = operators, and all other comparison operations will fail too. The problem is that the browser now sends back U+0037 U+0039, the ordinary halfwidth characters for 79.

Semantically, one might expect U+FF17 U+FF19 to be identical to U+0037 U+0039, but of course they aren't if a simple-minded byte-by-byte or character-by-character comparison is done.

In the ideal case, one would probably want to convert all fullwidth chars to their halfwidth equivalents, because the numbers look weird on the screen (e.g., ７９ Ｂｒｉｓｂａｎｅ Ｓｔｒｅｅｔ instead of 79 Brisbane Street). Is there any way to get PostgreSQL to do so? Failing this, is there any way to get PostgreSQL to be a bit smarter in doing comparisons? I think I'm SOL, but I thought I'd ask anyway.

...Edmund.
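For the digits specifically, a hedged server-side option is translate(), which does a character-for-character substitution — a sketch, assuming a hypothetical addresses table with an addr column, and assuming your PostgreSQL version handles multibyte characters correctly in translate():

SELECT translate(addr, '０１２３４５６７８９', '0123456789') AS normalized
FROM addresses;

Run at insert time (or in a trigger), this would normalize fullwidth digits to ASCII before they are stored; letters would need the corresponding Ａ-Ｚ and ａ-ｚ ranges added to both argument strings.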
Re: [GENERAL] Issue adding foreign key
[EMAIL PROTECTED] (George Woodring) writes:
> I have 2 existing tables in my db:
>
> iss=> \d pollgrpinfo
>          Table "public.pollgrpinfo"
>     Column     |          Type          | Modifiers
> ---------------+------------------------+-----------
>  pollgrpinfoid | integer                | not null
>  pollgrpid     | integer                | not null
>  name          | character varying(100) |
>  descript      | character varying(200) |
> Indexes:
>     "pollgrpinfo_pkey" primary key, btree (pollgrpinfoid)
>     "pollgrpinfo_pollgrpid_key" unique, btree (pollgrpid)
>
> iss=> \d notpoll
>           Table "public.notpoll"
>    Column    |          Type          | Modifiers
> -------------+------------------------+---------------------
>  notpollid   | integer                | not null
>  pollgrpid   | integer                |
>  notgroupsid | integer                |
>  alerting    | character(1)           | default 'y'::bpchar
>  disuser     | character varying(50)  |
>  distime     | integer                |
>  alertingcom | character varying(200) |
> Indexes:
>     "notpoll_pkey" primary key, btree (notpollid)
>     "notpoll_pollgrpid_key" unique, btree (pollgrpid)
>     "notpoll_alerting_index" btree (alerting)
> Triggers:
>     RI_ConstraintTrigger_2110326 AFTER INSERT OR UPDATE ON notpoll FROM notgroups NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins"('notgroups_exists', 'notpoll', 'notgroups', 'UNSPECIFIED', 'notgroupsid', 'notgroupsid')
>
> I am trying to add a foreign key to the notpoll table:
>
> iss=> alter table notpoll add constraint pollgrp_exists foreign key(pollgrpid) references pollgrpinfo on delete cascade;
> ERROR:  insert or update on table "notpoll" violates foreign key constraint "pollgrp_exists"
> DETAIL:  Key (pollgrpid)=(7685) is not present in table "pollgrpinfo".

When expressed as FOREIGN KEY (foo) REFERENCES mytable, postgresql assumes that foo references the PRIMARY KEY of mytable. If you are not referencing the primary key, you need to tell postgresql which column, e.g. FOREIGN KEY (foo) REFERENCES mytable(foo).

-- Remove -42 for email
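Spelled out against the tables above, that would be (a sketch; the unique index on pollgrpinfo.pollgrpid is what makes it a legal FK target):

alter table notpoll
  add constraint pollgrp_exists
  foreign key (pollgrpid) references pollgrpinfo (pollgrpid)
  on delete cascade;

If the same "is not present" error comes back, the 7685 really is an orphan; this would find any others:

select n.pollgrpid
from notpoll n
where n.pollgrpid is not null
  and not exists (select 1 from pollgrpinfo p where p.pollgrpid = n.pollgrpid);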
Re: [GENERAL] adding missing FROM-clause
[EMAIL PROTECTED] (C G) writes:
> Dear All, I have a simple join query:
>
>     SELECT c1 FROM t1 INNER JOIN t2 ON t2.c2 = t1.c2 WHERE t3.c3 = t2.c3;
>
> which gives the expected result, but I get the message
>     NOTICE: adding missing FROM-clause entry for table "t3"
> How do I get rid of this NOTICE, i.e. how should I construct my select query?

SELECT c1 FROM t1, t2, t3 WHERE t2.c2 = t1.c2 AND t3.c3 = t2.c3;

or

SELECT c1 FROM t1 INNER JOIN t2 ON t2.c2 = t1.c2 INNER JOIN t3 ON t3.c3 = t2.c3;

The above can also be written as

SELECT c1 FROM t1 JOIN t2 USING (c2) JOIN t3 USING (c3);

or even

SELECT c1 FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;

This last might be problematic if t3 has a column named c1.

Question: Is there any advantage to specifying USING() rather than ON? I know that if I do SELECT * FROM t1 JOIN t2 USING (col) then I only get 1 instance of col in the returned rows, but I'm wondering if there is any advantage to the planner in specifying either USING() or ON?

-- Remove -42 for email
Re: [GENERAL] ON DELETE trigger blocks delete from my table
[EMAIL PROTECTED] (Naeem Bari) writes:
> I understand. Makes sense. Is there any way for my trigger function to know that it is being called on a delete or on an update? Because I do need to return new on update... and I really don't want to write 2 different functions, one for update and one for delete...

Yes, plpgsql sets a variable TG_OP to 'INSERT', 'UPDATE' or 'DELETE'. So, for example:

IF ( TG_OP = ''DELETE'' ) THEN
    RETURN old;
ELSE
    RETURN new;
END IF;

-- Remove -42 for email
Re: [GENERAL] earthdistance is not giving correct results.
[EMAIL PROTECTED] (mike cox) writes:
> I'm running PostgreSQL 8.0 beta 1. I'm using earthdistance to find the distance between two different latitude and longitude locations. Unfortunately, the result seems to be wrong. Here is what I'm doing:
>
> select earth_distance(ll_to_earth('122.55688','45.513746'), ll_to_earth('122.396357','47.648845'));
>
> The result I get is this: ...

I believe ll_to_earth() is expecting ll_to_earth(latitude, longitude). Also, I think earth_distance returns its value in meters.
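With the argument order flipped (and numeric arguments rather than strings), the call would look like this sketch — coordinates as given in the post, distance converted from meters to kilometers:

select earth_distance(ll_to_earth(45.513746, 122.55688),
                      ll_to_earth(47.648845, 122.396357)) / 1000.0 as km;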
Re: [GENERAL] SELECT based on function result
Robert Fitzpatrick wrote:
> I have a function that tells me if a record is positive or negative based on several field values. I use it in select statements:
>
> ohc=> SELECT sample_id, is_wipe_positive(tblleadwipe.sample_id) AS positive FROM tblleadwipe WHERE hud_building_id IS NOT NULL;
>  sample_id | positive
> -----------+----------
>         73 | f
>         81 | t
> (2 rows)
>
> I see that I cannot change my WHERE statement to WHERE positive = 't' because the column positive does not exist. Now I am looking for the best way to return all records that are found positive or negative using a query. Can anyone offer any guidance on how I can return all the positives (or negatives)? Or do I need to write another function that does that?
> -- Robert

Any reason why:

SELECT sample_id, positive
FROM (SELECT sample_id,
             is_wipe_positive(tblleadwipe.sample_id) AS positive
      FROM tblleadwipe
      WHERE hud_building_id IS NOT NULL) foo
WHERE positive = 't';

won't work?
Re: [GENERAL] Enough RAM for entire Database.. cost aside, is this
Greetings!

On Fri, 2 Jul 2004, Mike Rylander wrote:
> I find that experience does not bear this out. There is a saying a coworker of mine has about apps that try to solve problems, in this case caching, that are well understood and generally handled well at other levels of the software stack... he calls them "too smart by half" :)

But on the other hand, general algorithms which are designed to work under a wide variety of circumstances may fail in specific cases. I am thinking of VACUUM, which would kill most caching algorithms simply because we cannot tell the O/S "by the by, this set of pages will not be used again, and therefore it would be fine to use almost none of the general cache to store them". All algorithms make assumptions about value distribution and usage. Caches depend on locality of reference, and we do not have an easy way to say when this is broken.

Regards!
Ed
[GENERAL] Unable to use index?
Hi folks!

A query I am running does not seem to use indexes that are available (running version 7.4.2). I have the following table:

=> \d replicated
                          Table "public.replicated"
     Column      |           Type           | Modifiers
-----------------+--------------------------+------------------------------------------------------
 rep_id          | bigint                   | not null default nextval('replicated_id_seq'::text)
 rep_component   | character varying(100)   |
 rep_key1        | integer                  |
 rep_key2        | bigint                   |
 rep_key3        | smallint                 |
 rep_replicated  | timestamp with time zone |
 rep_remotekey1  | integer                  |
 rep_remotekey2  | bigint                   |
 rep_remotekey3  | smallint                 |
 rep_key2b       | bigint                   |
 rep_remotekey2b | bigint                   |
 rep_key4        | text                     |
Indexes:
    "replicated_pkey" primary key, btree (rep_id)
    "replicate_key1_idx" btree (rep_key1, rep_key2, rep_key3)
    "replicated_item2_idx" btree (rep_component, rep_key2, rep_key3)
    "replicated_item_idx" btree (rep_component, rep_key1, rep_key2, rep_key3)
    "replicated_key2_idx" btree (rep_key2, rep_key3)
    "replicated_key4_idx" btree (rep_key4)

=> analyze verbose replicated;
INFO:  analyzing "public.replicated"
INFO:  "replicated": 362140 pages, 3 rows sampled, 45953418 estimated total rows
ANALYZE

The following does not use an index, even though two are available for the specific selection of rep_component:

=> explain analyze select * from replicated where rep_component = 'ps_probe' limit 1;

 Limit  (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
   ->  Seq Scan on replicated  (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1)
         Filter: ((rep_component)::text = 'ps_probe'::text)
 Total runtime: 34401.925 ms
(4 rows)

Yet, if I do the following, an index will be used, and it runs much faster (even when I swapped the order of execution):

=> explain analyze select * from replicated where rep_component = 'ps_probe' order by rep_component limit 1;

 Limit  (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1)
   ->  Index Scan using replicated_item2_idx on replicated  (cost=0.00..6838123.76 rows=4114363 width=101) (actual time=51.157..51.157 rows=1 loops=1)
         Index Cond: ((rep_component)::text = 'ps_probe'::text)
 Total runtime: 51.265 ms
(4 rows)

Any reason why the index is not chosen? Maybe I need to up the number of rows sampled for statistics?

Regards!
Ed
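On the last question — per-column statistics can be raised without touching postgresql.conf; a sketch against the table above:

alter table replicated alter column rep_component set statistics 100;
analyze replicated;

That said, the plans above suggest the estimates may not be the real problem: the planner thinks roughly 4 million rows match, so with a bare LIMIT 1 a seqscan looks cheapest (it expects to hit a match almost immediately), while the ORDER BY variant nudges it onto the index. If the matching rows are actually clustered far into the table, the seqscan guess goes badly wrong, which is what the 34-second actual time shows.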
[GENERAL] unsubscribe
unsubscribe

EDMUND ZYNDA [EMAIL PROTECTED] FocalBase Internet Solutions p. 410.751.2093 x219 f. 410.751.2653 www.focalbase.com
Re: [GENERAL] SET within a function?
I guess it comes back to the semantics of NULL. As has been pointed out in many a database course, what we mean by NULL changes, and how we want to use NULL changes with circumstances. Normally, when I am comparing rows, I do want NULL <> NULL. In this specific instance, no value has been assigned to the specific column for this row, so NULL is appropriate. However, there are cases where I am trying to explicitly test for the existence of a specific row in the table, and in this case I _do_ want a NULL == NULL type of comparison. I could try to specify a dummy value (in this case, I could put in -1), but then I am trying to create a second class of NULLs, and this is usually not considered good design.

Note that a prime example of how PostgreSQL itself is not consistent (in the strictest sense) is GROUP BY, which treats NULL == NULL (interesting side bar: is there a way to cause GROUP BY to treat NULLs as not equal to each other?). As a theoretical question, how is this justified if NULL should not equal NULL (other than that it is in the spec)? Also, is there a particular reason for not having a strict equality operator (or is it simply because it is not in the specification)? Performance? No support from the back-end? Something else?

Regards, Ed

On Wed, 15 Oct 2003, Tom Lane wrote:

Edmund Dengler [EMAIL PROTECTED] writes: ... I have no real choice in this as there is no way to specify that NULL == NULL.

The conventional wisdom on this is that if you think you need NULL == NULL to yield true, then you are misusing NULL, and you'd better reconsider your data representation. The standard semantics for NULL really do not support any other interpretation of NULL than "I don't know what this value is". If you are trying to use NULL to mean something else, you will face nothing but misery. Choose another representation for whatever you do mean.

regards, tom lane

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
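For illustration, a quick sketch of the GROUP BY behaviour in question (table and data are hypothetical; the grouping shown is what the SQL rules require):

CREATE TABLE nulltest (x integer);
INSERT INTO nulltest VALUES (1);
INSERT INTO nulltest VALUES (NULL);
INSERT INTO nulltest VALUES (NULL);

SELECT x, count(*) FROM nulltest GROUP BY x;

 x | count
---+-------
 1 |     1
   |     2
(2 rows)

-- Both NULL rows land in a single group, even though "NULL = NULL"
-- evaluates to unknown in a WHERE clause.

As for a strict equality test, NOT (a IS DISTINCT FROM b) gives null-safe comparison semantics on servers that support IS DISTINCT FROM.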
Re: [GENERAL] SET within a function?
The problem I would face is that this still needs to be a sequential scan of the table rather than an index lookup.

Regards, Ed

On Tue, 14 Oct 2003, Arthur Ward wrote:

Is the rewrite only for the literal 'X = NULL', or will it do a test against a value such as 'X = OLD.X' (and rewrite it when OLD.X is NULL)? Is there any way to match NULLs to each other (as I am looking for a literal row, not using NULL as the UNKNOWN)? I suppose I could put in a dummy value for the 'Not a valid value', but it seems quite awkward when I really do want the NULL.

I ended up writing an equivalent function for the project I'm working on. It goes like this in plpgsql:

IF $1 IS NULL THEN
    RETURN $2 IS NULL;
ELSIF $2 IS NULL THEN
    -- We already know $1 is not null.
    RETURN FALSE;
ELSE
    -- Both args are not null.
    RETURN $1 = $2;
END IF;

That's the basic idea. I put a wrapper around this to generate a copy of it for all the data types used in my database.

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
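Fleshed out for a single data type, that idea might look like the sketch below (the function name and the integer signature are illustrative only, the single-quoted body is the pre-8.0 style of this era, and plpgsql is assumed to be installed via createlang):

CREATE OR REPLACE FUNCTION null_eq(integer, integer) RETURNS boolean AS '
BEGIN
    IF $1 IS NULL THEN
        RETURN $2 IS NULL;
    ELSIF $2 IS NULL THEN
        RETURN false;    -- $1 is known to be not null here
    ELSE
        RETURN $1 = $2;  -- both arguments are not null
    END IF;
END;
' LANGUAGE plpgsql;

-- usage:
--   SELECT ... FROM table1 WHERE null_eq(x1, 5);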
[GENERAL] SET within a function?
Hi all!

I am doing some trigger functions that need to find a tuple in another table. The problem is that this second table is doing some summarization work, and I need nulls to equal each other. Basically, in the trigger I do a:

SELECT INTO ... x
  FROM table1
 WHERE ...(some straightforward x = old.x)...
   AND (x1 = old.x1 OR (x1 IS NULL AND old.x1 IS NULL))
   AND (x2 = old.x2 OR (x2 IS NULL AND old.x2 IS NULL))
   AND (x3 = old.x3 OR (x3 IS NULL AND old.x3 IS NULL));

The problem is that an index is used to perform the straightforward stuff, and then the x1, x2, x3 conditions are applied as a filter over that scan rather than being looked up directly. Unfortunately, the data set I have can be clustered pretty badly around the straightforward stuff, so the scan can take multiple seconds per call.

I think that if I could do a 'SET TRANSFORM_NULL_EQUALS TO ON' then this might fix the issue (don't know, haven't tried it yet). My question is: can this be done within a function such that at the end of the function the value is reset back to what it was upon entry (kind of like 'SET LOCAL', except for just the length of the function call)? Is this possible?

Thanks! Ed

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
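One possible workaround is to collapse the NULLs into a sentinel value just for indexing purposes. A sketch only: it reintroduces the dummy-value idea discussed elsewhere in this thread, the -1 sentinel is hypothetical and must never occur in real data, and indexes on arbitrary expressions require 7.4 or later:

CREATE INDEX table1_null_eq_idx ON table1
    ((COALESCE(x1, -1)), (COALESCE(x2, -1)), (COALESCE(x3, -1)));

SELECT ... FROM table1
 WHERE COALESCE(x1, -1) = COALESCE(old.x1, -1)
   AND COALESCE(x2, -1) = COALESCE(old.x2, -1)
   AND COALESCE(x3, -1) = COALESCE(old.x3, -1);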
Re: [GENERAL] SET within a function?
Is the rewrite only for the literal 'X = NULL', or will it do a test against a value such as 'X = OLD.X' (and rewrite it when OLD.X is NULL)? Is there any way to match NULLs to each other (as I am looking for a literal row, not using NULL as the UNKNOWN)? I suppose I could put in a dummy value for the 'Not a valid value', but it seems quite awkward when I really do want the NULL.

Regards! Ed

On Mon, 13 Oct 2003, Bruno Wolff III wrote:

On Mon, Oct 13, 2003 at 21:16:33 -0400, Edmund Dengler [EMAIL PROTECTED] wrote:

I think if I could do a 'SET TRANSFORM_NULL_EQUALS TO ON' then this might fix the issue (don't know, haven't tried it yet). My question is: can this be done within a function such that at the end of the function the value is reset back to what it was upon entry (kind of like 'SET LOCAL', except for just the length of the function call)? Is this possible?

I don't think that will do what you want. That setting is used to rewrite "= null" as "is null", not to change things so that nulls match each other.

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
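To make the distinction concrete, a quick sketch of what the setting actually does (it only rewrites comparisons against a literal NULL constant):

SET TRANSFORM_NULL_EQUALS TO ON;

SELECT 1 WHERE 3 = NULL;     -- rewritten to "3 IS NULL"; returns no rows
SELECT 1 WHERE NULL = NULL;  -- rewritten to "NULL IS NULL"; returns one row

-- A comparison between two columns or variables, e.g. x1 = old.x1,
-- is NOT rewritten, so NULLs still never match each other there.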
Re: [GENERAL] Buglist
Well, if they are locked waiting on vacuum, then vacuum should upgrade its priority to that of the highest waiting process (priority inheritance). This way, vacuum will run at a priority level equivalent to whoever is waiting on it.

Regards, Ed

On Thu, 21 Aug 2003, Andrew Sullivan wrote:

On Wed, Aug 20, 2003 at 11:41:41PM +0200, Karsten Hilbert wrote:

You mean, like, nice 19 or so?

ISTR someone reporting problems with locking on the performance list from doing exactly that. The problem is that the vacuum back end might take a lock and then not get any processor time -- in which case everybody else gets their processor slice but can't do anything, because they have to wait until the niced vacuum process gets back in line.

A

-- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Bulk Insert / Update / Delete
Wasn't there a feature in some SQL database which was the equivalent of UPDATE OR INSERT ... based on the primary key? Would this accomplish what you want? (I know I have wanted this feature a couple of times, as I simply have code or triggers to do the equivalent.) Is this a desirable feature for PostgreSQL?

Regards, Ed

On Thu, 21 Aug 2003, Philip Boonzaaier wrote:

Hi Ron

That is just the point. If Postgres cannot tell me which records exist and need updating, and which do not and need inserting, then what can? In the old world of indexed ISAM files it is very simple - try to get the record (row) by primary key. If it is there, update it; if it is not, insert it. Now, one can do this with a higher level language and SQL combined, but is SQL that weak? What happens when you merge two tables? Surely SQL must somehow determine what needs INSERTING and what needs UPDATING. Or does one try to merge, get a failure, and resort to writing something in Perl or C?

Please help to un-confuse me!

Regards Phil

- Original Message - From: Ron Johnson [EMAIL PROTECTED] To: PgSQL General ML [EMAIL PROTECTED] Sent: Tuesday, August 19, 2003 6:45 PM Subject: Re: [GENERAL] Bulk Insert / Update / Delete

On Tue, 2003-08-19 at 22:03, Philip Boonzaaier wrote:

Hi Jason

Thanks for your prompt response. I'm pretty new to SQL, so please excuse the following rather stupid question: how do I use the (NOT) EXISTS or (NOT) IN clause? Would it be feasible, using your suggestion, to simply put two SQL statements in the same query - first UPDATE when EXISTS, then INSERT when NOT EXISTS - to accomplish this in one go?

Regards Phil

How will you know which records were updated, and thus which need to be inserted? A temporary table and pl/pgsql should do the trick.

- Original Message - From: Jason Godden [EMAIL PROTECTED] To: Philip Boonzaaier [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, August 19, 2003 4:42 PM Subject: Re: [GENERAL] Bulk Insert / Update / Delete

Hi Philip,

Pg is more ANSI compliant than most (GoodThing (TM)). You can use the 'when' conditional, but not to do what you need. If I understand you correctly you should be able to achieve the same result using two separate queries and the (NOT) EXISTS or (NOT) IN clause. Failing that, have a look at the fine docs on pl/pgsql and other PostgreSQL procedural languages, which allow you to use loops and conditional statements like 'if'.

Rgds, J

On Wed, 20 Aug 2003 12:21 pm, Philip Boonzaaier wrote:

I want to be able to generate SQL statements that will go through a list of data, effectively row by row, and enquire on the database if this exists in the selected table. If it exists, then the columns must be UPDATED; if not, they must be INSERTED. Logically then, I would like to

SELECT * FROM TABLE WHERE <values entered here>;

and then

IF FOUND
    UPDATE TABLE SET <values entered here>
ELSE
    INSERT INTO TABLE VALUES <values entered here>
END IF;

The IF statement gets rejected by the parser. So it would appear that PostgreSQL does not support an IF in this type of query, or maybe not at all. Does anyone have any suggestions as to how I can achieve this?

-- - Ron Johnson, Jr. [EMAIL PROTECTED] Jefferson, LA USA 484,246 sq mi are needed for 6 billion people to live, 4 persons per lot, in lots that are 60'x150'. That is ~ California, Texas and Missouri. Alternatively, France, Spain and The United Kingdom.
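For the record, the usual workaround in pl/pgsql looks something like the sketch below (the table and column names are made up, the single-quoted body is the pre-8.0 style, and it ignores the race where two sessions try to insert the same key at once):

CREATE OR REPLACE FUNCTION upsert_item(integer, text) RETURNS boolean AS '
DECLARE
    p_key ALIAS FOR $1;
    p_val ALIAS FOR $2;
BEGIN
    UPDATE items SET val = p_val WHERE key = p_key;
    IF FOUND THEN
        RETURN true;    -- row existed and was updated
    END IF;
    INSERT INTO items (key, val) VALUES (p_key, p_val);
    RETURN false;       -- row was inserted
END;
' LANGUAGE plpgsql;

-- usage: SELECT upsert_item(42, 'forty-two');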
---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Buglist
What I am pointing out is that this is all the same issue, and that solutions to the "we can't do priorities because of locking issues" objection have existed for many years. I/O is the same as processor time: it is a resource that needs managing. So the intelligence can be made to exist; it just needs to be made.

Now on to other questions: can vacuuming be done without locks? Can it be done in parts (i.e., lock only a bit)? Can the I/O be better managed? Is this a general model that would work well? I have plenty of queries that I would love to run on an "as the system allows" basis, or with a "keep a bit of spare cycles or I/O for the important stuff" policy, but which I cannot specify. So a vote from me for any mechanism that allows priorities to be specified. If this is a desired feature, then comes the hard part of working out what is feasible, what can be done in a reasonable amount of time, and doing it.

Regards! Ed

On Thu, 21 Aug 2003, Andrew Sullivan wrote:

On Thu, Aug 21, 2003 at 12:05:28PM -0400, Edmund Dengler wrote:

Well, if they are locked waiting on vacuum, then vacuum should upgrade its priority to that of the highest waiting process (priority inheritance). This way, vacuum will run at a priority level equivalent to whoever is waiting on it.

Right, but all that intelligence is something that isn't in there now. And anyway, the real issue is I/O, not processor.

A

-- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Buglist
What about using priority inheritance to deal with the issue of priority inversion (a standard methodology within the real-time world)? Then we could have priorities, but still have low priority processes bumped up if a high priority one is waiting on them.

Regards, Ed

On Wed, 20 Aug 2003, Tom Lane wrote:

Andrew Sullivan [EMAIL PROTECTED] writes:

I disagree. Triggering a vacuum on a db that is nearly saturating the disk bandwidth has a significant impact.

Vivek is right about this. If your system is already very busy, then a vacuum on a largish table is painful. I don't actually think having the process done in real time will help, though -- it seems to me what would be more useful is an even lazier vacuum: something that could be told "clean up as cycles are available, but make sure you stay out of the way". Of course, that's easy to say glibly, and mighty hard to do, I expect.

I'd love to be able to do that, but I can't think of a good way. Just nice'ing the VACUUM process is likely to be counterproductive because of locking issues (priority inversion). Though if anyone cares to try it on a heavily-loaded system, I'd be interested to hear the results...

regards, tom lane

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Unused Indexes
You need to cast the int literals to bigint:

select id where col1 = 1::bigint and col2 = 1::bigint

(In 7.3 the planner takes an unadorned "1" as int4, and the resulting int4-vs-bigint comparison does not match a bigint index, so a sequential scan is used no matter how enable_seqscan is set.)

Regards, Ed

-----Original Message----- From: Tim McAuley [EMAIL PROTECTED] Date: Wed, 30 Jul 2003 13:46:46 To: [EMAIL PROTECTED] Subject: [GENERAL] Unused Indexes

Hi,

I have a table which I have populated with over 5000 entries. There is a combined index placed on two of the columns (both bigint). I am trying a simple select (i.e. select id where col1 = 1 and col2 = 1) covering these two columns and it keeps using a seq scan. Is this correct? I would have thought that with this number of entries an index scan should be used.

I am testing this using PostgreSQL 7.3.3 on Windows 2000 using cygwin. Doing "set enable_seqscan to off" does not change the results of the explain operation. I also tried setting a single index on just one of the columns and running an appropriate search; it still uses a seq scan. At what stage will the planner normally start using an index scan?

Any hints appreciated.

Tim

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
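A quick sketch of the difference (table name hypothetical):

-- The untyped literals are taken as int4, so the bigint index is never considered:
EXPLAIN SELECT id FROM mytable WHERE col1 = 1 AND col2 = 1;

-- With the literals cast, the combined bigint index becomes usable:
EXPLAIN SELECT id FROM mytable WHERE col1 = 1::bigint AND col2 = 1::bigint;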
[GENERAL] Storing double-byte strings in text fields.
Hello,

I am putting together a web site to display a collection of Chinese woodblock prints. I want to be able to store double-byte values (that is to say Big5, Unicode etc. encoded) in a text field for things such as the artist's name and the title of the print. I have the following questions:

Is this possible using a plain vanilla version of Postgres, i.e. without the multi-lingual support enabled? As I understand it, multi-lingual support allows me to store table and field names etc. in non-ASCII, but doesn't really affect what goes into the fields.

Are programs such as pg_dump and the COPY method 8-bit clean, or will they mess up the text? I have done some quick trials and it all seems OK, but I want to be sure before committing.

If the above is not the case, will the multi-lingual support fix my problems? I tried it out but had problems with the backend crashing on certain queries. I'd also rather not use it, as it will be easier to port my system to other servers if it just needs a plain vanilla install.

I am currently using PostgreSQL 7.0.3 on RedHat 6.2 (x86) and also on YellowDog 1.2 (PPC). The web server is Apache 1.3.12 with PHP 4.0.x.

Thanks, Edmund.

-- Edmund von der Burg [EMAIL PROTECTED]
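For reference, in the 7.0.x series multibyte support is a compile-time option, after which each database can be created with its own encoding. A sketch only, assuming a build from source (the database name is made up; flags should be verified against the 7.0 documentation):

# build the server with multibyte support
./configure --enable-multibyte
make
make install

# create a database whose text fields store UNICODE (UTF-8)
createdb -E UNICODE prints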
[GENERAL] [Fwd: PostgreSQL - Desparate!]
Pardon the intrusion, but I have a dilemma which I cannot find the answer to after searching the newsgroups, documentation, and mailing lists: my postmaster will not start. Below is a message I've posted in several newsgroups. If you could make ANY suggestions on getting postmaster to work, I would appreciate it greatly.

-Thanks

Whenever I attempt to start the postmaster, I get the following error:

FindBackend: found "/usr/bin/postgres" using argv[0]
FATAL: StreamServerPort: bind() failed: errno=13
Is another postmaster already running on that port?
If not, remove socket node (/tmp/.s.PGSQL.portnr) and retry.

It's been working fine for several weeks until recently. There are no files in /tmp relating to PostgreSQL. The port I'm using is not in use by another process, determined with the "ps auxw | grep postma" command, and also "fuser -vun tcp 5432". Here's the command I'm using to start postmaster:

nohup /usr/bin/postmaster -i -B 256 -p 5432 -d -D /var/lib/pgsql > /home/postgres/pgsql.log 2> /home/postgres/pgsql.err

I'm using PostgreSQL 6.3.2 under RedHat Linux 5.1. Any suggestions would be greatly appreciated.

-Thanks
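Worth noting: errno=13 is EACCES ("permission denied") rather than "address already in use", so the bind() is most likely failing on filesystem permissions for the socket in /tmp, not on a second postmaster. A diagnostic sketch (paths taken from the message above):

# /tmp must be world-writable with the sticky bit for the
# postgres user to create the socket file there
ls -ld /tmp          # expect: drwxrwxrwt ...

# look for a stale socket left behind by another user
ls -l /tmp/.s.PGSQL.5432

# confirm the server is started as the owner of the data directory
id
ls -ld /var/lib/pgsql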