Re: [PERFORM] Query hanging/not finishing inconsistently

2006-06-20 Thread Meetesh Karia
Hi all,

I just saw another email on the mailing list to this effect as well.  We recently updated the kernel versions on our machines to the latest stable versions (which contained both HyperThreading and IO bug fixes) and we updated Postgres to version 8.0.8.  We thought we were in the clear when we didn't encounter a hang for 6+ days.  But, once again we ran into the same situation where a query that normally executes in ~15ms wouldn't finish.  As before, there were no ungranted locks and threads weren't waiting on a lock.  I attached gdb to one of the stuck postgres processes and got the following stack trace:
#0  0x008967a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00977e5b in semop () from /lib/tls/libc.so.6
#2  0x08167298 in PGSemaphoreLock ()
#3  0x0818bcb5 in LWLockAcquire ()
#4  0x080a47f5 in SimpleLruWritePage ()
#5  0x080a48ad in SimpleLruReadPage ()
#6  0x080a519a in SubTransGetParent ()
#7  0x080a51f2 in SubTransGetTopmostTransaction ()
#8  0x0821371c in HeapTupleSatisfiesSnapshot ()
#9  0x080822a2 in heap_release_fetch ()
#10 0x080880fb in index_getnext ()
#11 0x08128507 in ExecReScanHashJoin ()
#12 0x08122a09 in ExecScan ()
#13 0x081287f9 in ExecIndexScan ()
#14 0x0811dfdd in ExecProcNode ()
#15 0x0812a49f in ExecNestLoop ()
#16 0x0811df9d in ExecProcNode ()
#17 0x0812b74d in ExecSort ()
#18 0x0811df5d in ExecProcNode ()
#19 0x0812b941 in ExecUnique ()
#20 0x0811df2c in ExecProcNode ()
#21 0x0811ce18 in ExecutorRun ()
#22 0x081947ec in PortalSetResultFormat ()
#23 0x08194df4 in PortalRun ()
#24 0x08192ef7 in PostgresMain ()
#25 0x08169780 in ClosePostmasterPorts ()
#26 0x0816b0ae in PostmasterMain ()
#27 0x0813a5a6 in main ()

We then upgraded glibc to 2.3.4-2.19, but we encountered the problem within a day.  Our latest attempt at isolating the problem has been to reboot the machine with a 'noht' kernel param.  The machine has been up for 1 day, 13:18 since then and we haven't seen the problem yet.
Has anyone been able to solve this problem?

Thanks,
Meetesh

On 5/22/06, Meetesh Karia <[EMAIL PROTECTED]> wrote:

Hi Craig,

Thanks for your response.  This did start recently and it wasn't after a kernel update, but it was after we moved the db from Machine B to Machine A (which have slightly different kernel versions).  However, the problem took about a week to show up after we moved from one machine to the other.  Unfortunately, the problem only reappears after 15 mins once it occurs the first time.  If it occurs again today I'll attach gdb to it and see whether it's stuck on a mutex.

Meetesh

On 5/22/06, Craig A. James <[EMAIL PROTECTED]> wrote:



Meetesh Karia wrote:
> Hi all,
>
> We've recently started having a problem where a query that normally
> executes in ~15ms starts to take upwards of 20s to complete.  When the
> connection that ran the query is returned to the connection pool, it
> appears as though a transaction is still in progress so the connection
> pool tries to cancel the transaction and close the connection.  This
> fails and the connection is removed from the connection pool.  At this
> point, the situation rapidly degrades and we run out of connections to
> the postgres server.
>
> An inspection of the pg_stat_activity table shows that practically every
> connection is running the above-mentioned query and some of those
> queries have been active for many minutes!  We've looked at the pg_locks
> table as well and the only exclusive locks are on transactions that are
> open.  All other locks are AccessShareLocks.  Also, as far as we can
> tell (from looking at the Hibernate stats), every db session that is
> opened is closed.
>
> When this happens, if I kill one of the running postgres processes (just
> by picking the last process returned from "ps -ef | grep postgres"), the
> other queries will immediately finish and the system will respond.
> However, within 15 minutes, we'll be back in the same state as before.
> At that point, I've cycled Apache, Tomcat and Postgres and the system
> then seems to come back.

This sounds suspiciously like a question I asked a few weeks ago, on April 4.  I have a process that just gets stuck.  After some questions from various experts in this forum, I used gdb(1) to attach to one of the frozen Postgres backend processes, and here's what I found:

On 5/12/2006, I wrote:
> Thanks, good advice.  You're absolutely right, it's stuck on a
> mutex.  After doing what you suggest, I discovered that the query
> in progress is a user-written function (mine).  When I log in as
> root, and use "gdb -p <pid>" to attach to the process, here's
> what I find.  Notice the second function in the stack, a mutex
> lock:
>
> (gdb) bt
> #0  0x0087f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x0096cbfe in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
> #2  0x008ff67b in _L

Re: [PERFORM] Query hanging/not finishing inconsistently

2006-05-22 Thread Meetesh Karia
Hi Craig,

Thanks for your response.  This did start recently and it wasn't after a kernel update, but it was after we moved the db from Machine B to Machine A (which have slightly different kernel versions).  However, the problem took about a week to show up after we moved from one machine to the other.  Unfortunately, the problem only reappears after 15 mins once it occurs the first time.  If it occurs again today I'll attach gdb to it and see whether it's stuck on a mutex.

Meetesh

On 5/22/06, Craig A. James <[EMAIL PROTECTED]> wrote:

Meetesh Karia wrote:
> Hi all,
>
> We've recently started having a problem where a query that normally
> executes in ~15ms starts to take upwards of 20s to complete.  When the
> connection that ran the query is returned to the connection pool, it
> appears as though a transaction is still in progress so the connection
> pool tries to cancel the transaction and close the connection.  This
> fails and the connection is removed from the connection pool.  At this
> point, the situation rapidly degrades and we run out of connections to
> the postgres server.
>
> An inspection of the pg_stat_activity table shows that practically every
> connection is running the above-mentioned query and some of those
> queries have been active for many minutes!  We've looked at the pg_locks
> table as well and the only exclusive locks are on transactions that are
> open.  All other locks are AccessShareLocks.  Also, as far as we can
> tell (from looking at the Hibernate stats), every db session that is
> opened is closed.
>
> When this happens, if I kill one of the running postgres processes (just
> by picking the last process returned from "ps -ef | grep postgres"), the
> other queries will immediately finish and the system will respond.
> However, within 15 minutes, we'll be back in the same state as before.
> At that point, I've cycled Apache, Tomcat and Postgres and the system
> then seems to come back.

This sounds suspiciously like a question I asked a few weeks ago, on April 4.  I have a process that just gets stuck.  After some questions from various experts in this forum, I used gdb(1) to attach to one of the frozen Postgres backend processes, and here's what I found:

On 5/12/2006, I wrote:
> Thanks, good advice.  You're absolutely right, it's stuck on a
> mutex.  After doing what you suggest, I discovered that the query
> in progress is a user-written function (mine).  When I log in as
> root, and use "gdb -p <pid>" to attach to the process, here's
> what I find.  Notice the second function in the stack, a mutex
> lock:
>
> (gdb) bt
> #0  0x0087f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x0096cbfe in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
> #2  0x008ff67b in _L_mutex_lock_3220 () from /lib/tls/libc.so.6
> #3  0x4f5fc1b4 in ?? ()
> #4  0x00dc5e64 in std::string::_Rep::_S_empty_rep_storage () from /usr/local/pgsql/lib/libchmoogle.so
> #5  0x009ffcf0 in ?? () from /usr/lib/libz.so.1
> #6  0xbfe71c04 in ?? ()
> #7  0xbfe71e50 in ?? ()
> #8  0xbfe71b78 in ?? ()
> #9  0x009f7019 in zcfree () from /usr/lib/libz.so.1
> #10 0x009f7019 in zcfree () from /usr/lib/libz.so.1
> #11 0x009f8b7c in inflateEnd () from /usr/lib/libz.so.1
> #12 0x00c670a2 in ~basic_unzip_streambuf (this=0xbfe71be0) at zipstreamimpl.h:332
> #13 0x00c60b61 in OpenBabel::OBConversion::Read (this=0x1, pOb=0xbfd923b8, pin=0xffea) at istream:115
> #14 0x00c60fd8 in OpenBabel::OBConversion::ReadString (this=0x8672b50, pOb=0xbfd923b8) at obconversion.cpp:780
> #15 0x00c19d69 in chmoogle_ichem_mol_alloc () at stl_construct.h:120
> #16 0x00c1a203 in chmoogle_ichem_normalize_parent () at stl_construct.h:120
> #17 0x00c1b172 in chmoogle_normalize_parent_sdf () at vector.tcc:243
> #18 0x0810ae4d in ExecMakeFunctionResult ()
> #19 0x0810de2e in ExecProject ()
> #20 0x08115972 in ExecResult ()
> #21 0x08109e01 in ExecProcNode ()
> #22 0x0020 in ?? ()
> #23 0xbed4b340 in ?? ()
> #24 0xbf92d9a0 in ?? ()
> #25 0xbed4b0c0 in ?? ()
> #26 0x in ?? ()
>
> It looks to me like my code is trying to read the input parameter
> (a fairly long string, maybe 2K) from a buffer that was gzip'ed
> by Postgres for the trip between the client and server... somewhere
> along the way, a mutex gets set, and then ... it's stuck forever.
>
> ps(1) shows that this thread had been running for about 7 hours,
> and the job status showed that this function had been
> successfully called about 1 million times, before this mutex lock
> occurred.

This is not an issue that's been resolved.  Nobody had ever seen this before.  Tom Lane suggested it might be a libc/c++ bug, but unfortunately in my case this lockup occurs so rarely (every few days) that it will be very difficult to know if we've fixed the problem.

If gdb(1) reveals that your process is stuck in a mutex, then you might have a better chance testing this hypothesis, since your problem happens within 15 minutes or so.

Did this start recently, perhaps right after a kernel update?

Craig



[PERFORM] Query hanging/not finishing inconsistently

2006-05-22 Thread Meetesh Karia
Hi all,

We've recently started having a problem where a query that normally executes in ~15ms starts to take upwards of 20s to complete.  When the connection that ran the query is returned to the connection pool, it appears as though a transaction is still in progress so the connection pool tries to cancel the transaction and close the connection.  This fails and the connection is removed from the connection pool.  At this point, the situation rapidly degrades and we run out of connections to the postgres server.
An inspection of the pg_stat_activity table shows that practically every connection is running the above-mentioned query and some of those queries have been active for many minutes!  We've looked at the pg_locks table as well and the only exclusive locks are on transactions that are open.  All other locks are AccessShareLocks.  Also, as far as we can tell (from looking at the Hibernate stats), every db session that is opened is closed.
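
For reference, a minimal sketch of the inspections described above, written against the 8.0-era catalog columns (procpid/current_query in pg_stat_activity, granted in pg_locks); the 5-minute threshold is purely illustrative:

-- long-running statements (requires stats_command_string = on)
SELECT procpid, query_start, current_query
FROM pg_stat_activity
WHERE query_start < now() - interval '5 minutes';

-- lock requests that have been made but not granted
SELECT pid, relation, transaction, mode
FROM pg_locks
WHERE NOT granted;
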
When this happens, if I kill one of the running postgres processes (just by picking the last process returned from "ps -ef | grep postgres"), the other queries will immediately finish and the system will respond.  However, within 15 minutes, we'll be back in the same state as before.  At that point, I've cycled Apache, Tomcat and Postgres and the system then seems to come back.
This problem appears to be unrelated to load and in fact, the majority of the time there is very little load on the site when this occurs.  We've run load tests on our dev boxes but we've been unable to reproduce the problem.  We're currently working on playing back the clicks on the site previous to the weird state the site gets in and at the same time, we were wondering if anyone has experienced a problem like this or has any suggestions.
The query in question is:

select distinct s.screening_id, f.film_id, f.title, s.period_start, f.runtime, c.value, v.short_name, s.parent_id,
    stats.avg_rating, coalesce(stats.num_adds, 0) as num_adds, coalesce(stats.unique_visits, 0) as unique_visits,
    f.*, s.*
from lte_screening s
    inner join lte_film f on s.film_id = f.film_id
    inner join lte_venue v on s.venue_id = v.venue_id
    inner join lte_film_classifier c on c.film_id = f.film_id
    left join lte_film_stats stats on stats.context = :context and stats.film_id = s.film_id
where c.name = ? and s.period_start is not null and s.festival_id = ?
    and s.period_start between ? + ? and ? + ?
order by s.period_start, f.title;

And the result of explain analyze:
QUERY PLAN
Unique  (cost=1117.42..1118.71 rows=11 width=866) (actual time=18.306..18.386 rows=15 loops=1)
  ->  Sort  (cost=1117.42..1117.44 rows=11 width=866) (actual time=18.300..18.316 rows=15 loops=1)
        Sort Key: s.period_start, f.title, s.screening_id, f.film_id, f.runtime, c.value, v.short_name, s.parent_id, stats.avg_rating, COALESCE(stats.num_adds, 0), COALESCE(stats.unique_visits, 0::bigint), f.film_id, f.sku, f.title, f."template", f.release_date, f.runtime, f."language", f.country, f.mpaa_rating, f.synopsis, f."owner", f.ext_sales_rank, f.small_image_url, f.medium_image_url, f.large_image_url, f.detail_page, f.to_delete, f.coalesce_to, (subplan), (subplan), s.screening_id, s.period_start, s.period_end, s.ticket_price, s.tickets_avail, s.tickets_sold, s."type", s.venue_id, s.festival_id, s.film_id, s.parent_id, s.ext_id, s.purchase_url, s.status, s.status_update_time
        ->  Nested Loop Left Join  (cost=2.62..1117.23 rows=11 width=866) (actual time=2.656..17.773 rows=15 loops=1)
              ->  Nested Loop  (cost=2.62..976.00 rows=11 width=846) (actual time=2.347..16.162 rows=15 loops=1)
                    ->  Hash Join  (cost=2.62..929.09 rows=10 width=831) (actual time=2.217..15.480 rows=15 loops=1)
                          Hash Cond: ("outer".venue_id = "inner".venue_id)
                          ->  Nested Loop  (cost=0.00..926.32 rows=10 width=818) (actual time=1.915..14.974 rows=15 loops=1)
                                ->  Seq Scan on lte_screening s  (cost=0.00..886.67 rows=10 width=159) (actual time=1.830..14.314 rows=15 loops=1)
                                      Filter: ((period_start IS NOT NULL) AND (festival_id = 316372) AND (period_start >= '2006-05-19 05:00:00'::timestamp without time zone) AND (period_start <= '2006-05-20 04:59:59'::timestamp without time zone))
                                ->  Index Scan using lte_film_pkey on lte_film f  (cost=0.00..3.95 rows=1 width=659) (actual time=0.026..0.028 rows=1 loops=15)
                                      Index Cond: ("outer".film_id = f.film_id)
                          ->  Hash  (cost=2.50..2.50 rows=50 width=21) (actual time=0.215..0.215 rows=0 loops=1)
                                ->  Seq Scan on lte_venue v  (cost=0.00..2.50 rows=50 width=21) (actual time=0.012..0.126 rows=52 loops=1)
                    ->  Index Scan using idx_classifier_film on lte_film_classifier c  (cost=0.00..4.67 rows=2 width

Re: [PERFORM] Query take 101 minutes, help, please

2005-09-07 Thread Meetesh Karia
PG is creating the union of the January, February and March tables first, and that union doesn't have an index on it.  If you're going to run many queries against the union of those three tables, you might want to place their contents into one table and create an index on it.

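A minimal sketch of that approach; the combined table and index names here are hypothetical, not from the original thread:

-- materialize the union once, index it, then query against it
CREATE TABLE "Internet_Q1" AS
    SELECT "Rut Cliente" FROM "Internet_Enero"
    UNION
    SELECT "Rut Cliente" FROM "Internet_Febrero"
    UNION
    SELECT "Rut Cliente" FROM "Internet_Marzo";

CREATE INDEX idx_internet_q1_rut ON "Internet_Q1" ("Rut Cliente");
ANALYZE "Internet_Q1";

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Q1");
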
Otherwise, try something like this:

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Enero")
  AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Febrero")
  AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Marzo");

You could also compare the performance of that to this and choose the one that works the best:

SELECT "Rut Cliente"
FROM "Internet_Abril" a
LEFT JOIN "Internet_Enero" e ON a."Rut Cliente" = e."Rut Cliente"
LEFT JOIN "Internet_Febrero" f ON a."Rut Cliente" = f."Rut Cliente"
LEFT JOIN "Internet_Marzo" m ON a."Rut Cliente" = m."Rut Cliente"
WHERE e."Rut Cliente" IS NULL AND f."Rut Cliente" IS NULL AND m."Rut Cliente" IS NULL;
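
To compare the two, presumably the simplest check is to run each candidate under EXPLAIN ANALYZE and compare the reported total runtimes:

-- prepend to either candidate query; reports the actual plan and timing
EXPLAIN ANALYZE
SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Enero")
  AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Febrero")
  AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Marzo");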

Meetesh

On 9/7/05, Christian Compagnon <[EMAIL PROTECTED]> wrote:

Hello,

I'm a newbie in postgresql, I've installed it on a Windows XP machine (I can't use linux, it's a company machine), and I'm curious why this query takes so long:

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente" NOT IN ((SELECT "Rut Cliente" FROM "Internet_Enero") UNION (SELECT "Rut Cliente" FROM "Internet_Febrero") UNION (SELECT "Rut Cliente" FROM "Internet_Marzo"));

It takes about 100 minutes to complete the query.  All tables have indexes created ("Rut Cliente" is a VarChar), and the tables have 50,000 records each.  The explain for the query tells the following:

QUERY PLAN
Seq Scan on "Internet_Abril"  (cost=19406.67..62126112.70 rows=24731 width=13)
  Filter: (NOT (subplan))
  SubPlan
    ->  Materialize  (cost=19406.67..21576.07 rows=136740 width=13)
          ->  Unique  (cost=17784.23..18467.93 rows=136740 width=13)
                ->  Sort  (cost=17784.23..18126.08 rows=136740 width=13)
                      Sort Key: "Rut Cliente"
                      ->  Append  (cost=0.00..3741.80 rows=136740 width=13)
                            ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1233.38 rows=45069 width=13)
                                  ->  Seq Scan on "Internet_Enero"  (cost=0.00..782.69 rows=45069 width=13)
                            ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1104.06 rows=40353 width=13)
                                  ->  Seq Scan on "Internet_Febrero"  (cost=0.00..700.53 rows=40353 width=13)
                            ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1404.36 rows=51318 width=13)
                                  ->  Seq Scan on "Internet_Marzo"  (cost=0.00..891.18 rows=51318 width=13)

Any help will be appreciated, it's for my thesis.

Regards,
Christian



Re: [PERFORM] Planner incorrectly choosing seq scan over index scan

2005-08-03 Thread Meetesh Karia
Btw - I tried playing around with some of the other planner cost constants but I wasn't able to get the planner to choose the index scan.  It seems like the issue is that the estimated cost for fetching one row from the index (3.02) is a little high in my case.  Is there any way that I can adjust that cost estimate?  Are there any side effects of doing that?  Or is my best solution to simply set enable_hashjoin to off for this query?
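
A sketch of the kind of session-level experiment this implies, using the stock 8.0 planner GUCs; the value shown is illustrative, not a recommendation, and the query is abbreviated from the original post:

-- nudge the per-index-tuple cost estimate downward for this session only
-- (the 8.0 default is 0.001), then re-check the chosen plan
SET cpu_index_tuple_cost = 0.0005;
EXPLAIN ANALYZE
SELECT c.sourceId, c.targetId
FROM candidates617004 c, lte_user s, lte_user t
WHERE c.sourceId = s.user_id AND c.targetId = t.user_id
ORDER BY c.sourceId, c.targetId;
RESET cpu_index_tuple_cost;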

Thanks,
Meetesh

On 8/2/05, Meetesh Karia <[EMAIL PROTECTED]> wrote:
Thanks Tom,

That modifies the query plan slightly, but the planner still decides to
do a hash join for the lte_user table aliased 't'.  Though, if I
make this change and set enable_hashjoin to off, the query plan (and
execution time) gets even better.

enable_hashjoin = on
--------------------
QUERY PLAN
Sort  (cost=10113.35..10122.02 rows=3467 width=48) (actual time=1203.000..1203.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=8711.19..9909.50 rows=3467 width=48) (actual time=1156.000..1203.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Hash Join  (cost=8711.19..9776.46 rows=3467 width=40) (actual time=1156.000..1187.000 rows=3467 loops=1)
              Hash Cond: ("outer".targetid = "inner".user_id)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..16.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Hash  (cost=8012.55..8012.55 rows=279455 width=16) (actual time=1141.000..1141.000 rows=0 loops=1)
                    ->  Seq Scan on lte_user t  (cost=0.00..8012.55 rows=279455 width=16) (actual time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms

enable_hashjoin = off
---------------------
QUERY PLAN
Sort  (cost=10942.56..10951.22 rows=3467 width=48) (actual time=188.000..188.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=0.00..10738.71 rows=3467 width=48) (actual time=0.000..188.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Nested Loop  (cost=0.00..10605.67 rows=3467 width=40) (actual time=0.000..157.000 rows=3467 loops=1)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Index Scan using lte_user_pkey on lte_user t  (cost=0.00..3.02 rows=1 width=16) (actual time=0.028..0.037 rows=1 loops=3467)
                    Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms

Thanks,
Meetesh

On 8/2/05, Tom Lane <[EMAIL PROTECTED]> wrote:

Meetesh Karia <[EMAIL PROTECTED]> writes:
> Sure. The lte_user table is just a collection of users. user_id is assigned
> uniquely using a sequence. During some processing, we create a candidates
> table (candidates617004 in our case). This table is usually a temp table.
> sourceid is a user_id (in this case it is always 617004) and targetid is
> also a user_id (2860 distinct values out of 3467). The rest of the
> information is either only used in the select clause or not used at all
> during this processing.

If you know that sourceid has only a single value, it'd probably be helpful to call out that value in the query, ie,

	where ... AND c.sourceId = 617004 ...

			regards, tom lane
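
Applied to the query from the original post below, Tom's suggestion presumably looks like this; only the added equality on c.sourceId is new:

select c.sourceId, c.targetId,
    abs(c.tr - c.sr) as xmy,
    (c.sr - s.ar) * (c.tr - t.ar) as xy,
    (c.sr - s.ar) * (c.sr - s.ar) as x2,
    (c.tr - t.ar) * (c.tr - t.ar) as y2
from candidates617004 c, lte_user s, lte_user t
where c.sourceId = s.user_id
    and c.targetId = t.user_id
    and c.sourceId = 617004   -- the constant called out explicitly
order by c.sourceId, c.targetId;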




Re: [PERFORM] Planner incorrectly choosing seq scan over index scan

2005-08-02 Thread Meetesh Karia
Thanks Tom,

That modifies the query plan slightly, but the planner still decides to
do a hash join for the lte_user table aliased 't'.  Though, if I
make this change and set enable_hashjoin to off, the query plan (and
execution time) gets even better.

enable_hashjoin = on
--------------------
QUERY PLAN
Sort  (cost=10113.35..10122.02 rows=3467 width=48) (actual time=1203.000..1203.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=8711.19..9909.50 rows=3467 width=48) (actual time=1156.000..1203.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Hash Join  (cost=8711.19..9776.46 rows=3467 width=40) (actual time=1156.000..1187.000 rows=3467 loops=1)
              Hash Cond: ("outer".targetid = "inner".user_id)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..16.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Hash  (cost=8012.55..8012.55 rows=279455 width=16) (actual time=1141.000..1141.000 rows=0 loops=1)
                    ->  Seq Scan on lte_user t  (cost=0.00..8012.55 rows=279455 width=16) (actual time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms

enable_hashjoin = off
---------------------
QUERY PLAN
Sort  (cost=10942.56..10951.22 rows=3467 width=48) (actual time=188.000..188.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=0.00..10738.71 rows=3467 width=48) (actual time=0.000..188.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Nested Loop  (cost=0.00..10605.67 rows=3467 width=40) (actual time=0.000..157.000 rows=3467 loops=1)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Index Scan using lte_user_pkey on lte_user t  (cost=0.00..3.02 rows=1 width=16) (actual time=0.028..0.037 rows=1 loops=3467)
                    Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms

Thanks,
Meetesh

On 8/2/05, Tom Lane <[EMAIL PROTECTED]> wrote:

Meetesh Karia <[EMAIL PROTECTED]> writes:
> Sure. The lte_user table is just a collection of users. user_id is assigned
> uniquely using a sequence. During some processing, we create a candidates
> table (candidates617004 in our case). This table is usually a temp table.
> sourceid is a user_id (in this case it is always 617004) and targetid is
> also a user_id (2860 distinct values out of 3467). The rest of the
> information is either only used in the select clause or not used at all
> during this processing.

If you know that sourceid has only a single value, it'd probably be helpful to call out that value in the query, ie,

	where ... AND c.sourceId = 617004 ...

			regards, tom lane


Re: [PERFORM] Planner incorrectly choosing seq scan over index scan

2005-08-01 Thread Meetesh Karia
Thanks John.  I've answered your questions below:
> Has lte_user and candidates617004 been recently ANALYZEd? All estimates,
> except for the expected number of rows from lte_user, seem to be okay.

I ANALYZEd both tables just before putting together my first email.  And, unfortunately, modifying the statistics target didn't help either.

> Is user_id the primary key for lte_user?

Yes

> I'm trying to figure out how you can get 50k rows, by searching a
> primary key, against 3.5k rows. Is user_id only part of the primary
> key for lte_user?

Hmmm ... I missed that before.  But, that surprises me too.  Especially since sourceId in the candidates table has only 1 value.  Also, user_id is the complete primary key for lte_user.
> Can you give us the output of:
> \d lte_user
> \d candidates617004

Sure, here they are:

lte=# \d lte_user
                  Table "public.lte_user"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 user_id     | bigint                      | not null
 firstname   | character varying(255)      |
 lastname    | character varying(255)      |
 address1    | character varying(255)      |
 address2    | character varying(255)      |
 city        | character varying(255)      |
 state       | character varying(255)      |
 zip         | character varying(255)      |
 phone1      | character varying(255)      |
 phone2      | character varying(255)      |
 username    | character varying(255)      |
 password    | character varying(255)      |
 deleted     | boolean                     | not null
 ext_cust_id | character varying(255)      |
 aboutme     | character varying(255)      |
 birthday    | timestamp without time zone |
 fm_id       | bigint                      |
 ar          | double precision            |
Indexes:
    "lte_user_pkey" PRIMARY KEY, btree (user_id)
    "idx_user_extid" btree (ext_cust_id)
    "idx_user_username" btree (username)
Foreign-key constraints:
    "fk_user_fm" FOREIGN KEY (fm_id) REFERENCES fm(fm_id)

lte=# \d candidates617004
 Table "public.candidates617004"
  Column  |       Type       | Modifiers
----------+------------------+-----------
 fmid     | bigint           |
 sourceid | bigint           |
 sr       | double precision |
 targetid | bigint           |
 tr       | double precision |
> Also, if you could describe the table layouts, that would help.

Sure.  The lte_user table is just a collection of users.  user_id is assigned uniquely using a sequence.  During some processing, we create a candidates table (candidates617004 in our case).  This table is usually a temp table.  sourceid is a user_id (in this case it is always 617004) and targetid is also a user_id (2860 distinct values out of 3467).  The rest of the information is either only used in the select clause or not used at all during this processing.

Did I miss something in the table layout description that would be helpful?

Thanks for your help!
Meetesh
>         ->  Sort  (cost=9912.07..9920.73 rows=3467 width=40) (actual time=1156.000..1156.000 rows=3467 loops=1)
>               Sort Key: c.sourceid
>               ->  Hash Join  (cost=8710.44..9708.21 rows=3467 width=40) (actual time=1125.000..1156.000 rows=3467 loops=1)
>                     Hash Cond: ("outer".targetid = "inner".user_id)
>                     ->  Seq Scan on candidates617004 c  (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000 rows=3467 loops=1)
>                     ->  Hash  (cost=8011.95..8011.95 rows=279395 width=16) (actual time=1125.000..1125.000 rows=0 loops=1)
>                           ->  Seq Scan on lte_user t  (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..670.000 rows=279395 loops=1)
> Total runtime: 1406.000 ms
>
> enable_hashjoin disabled
> ------------------------
> QUERY PLAN
> Sort  (cost=14355.37..14364.03 rows=3467 width=48) (actual time=391.000..391.000 rows=3467 loops=1)
>   Sort Key: c.sourceid, c.targetid
>   ->  Nested Loop  (cost=271.52..14151.51 rows=3467 width=48) (actual time=203.000..359.000 rows=3467 loops=1)
>         ->  Merge Join  (cost=271.52..3490.83 rows=3467 width=40) (actual time=203.000..218.000 rows=3467 loops=1)
>               Merge Cond: ("outer".user_id = "inner".sourceid)
>               ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..126.000 rows=50034 loops=1)
>               ->  Sort  (cost=271.52..280.19 rows=3467 width=32) (actual time=15.000..30.000 rows=3467 loops=1)
>                     Sort Key: c.sourceid
>                     ->  Seq Scan on candidates617004 c  (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000 rows=3467 loops=1)
>         ->  Index Scan using lte_user_pkey on lte_user t  (cost=0.00..3.03 rows=1 width=16) (actual time=0.031..0.036 rows=1 loops=3467)
>               Index Cond: ("outer".targetid = t.user_id)
> Total runtime

Re: [PERFORM] Planner incorrectly choosing seq scan over index scan

2005-08-01 Thread Meetesh Karia
Are you referring to the statistics gathering target for ANALYZE?  Based on your email, I just tried the following and then re-ran the explain analyze but got the same "incorrect" plan:
alter table candidates617004
    alter column sourceId set statistics 1000,
    alter column targetId set statistics 1000;
analyze candidates617004;

alter table lte_user
    alter column user_id set statistics 1000;
analyze lte_user;
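
One way to sanity-check that the larger targets actually changed the gathered statistics is to query the standard pg_stats view; the columns picked here are just the relevant ones:

SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename IN ('candidates617004', 'lte_user')
    AND attname IN ('sourceid', 'targetid', 'user_id');
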
Thanks for your suggestion,
Meetesh
On 8/2/05, Tobias Brox <[EMAIL PROTECTED]> wrote:

[Meetesh Karia - Tue at 12:19:27AM +0200]
> We're using 8.0.3 and we're seeing a problem where the planner is choosing a
> seq scan and hash join over an index scan. If I set enable_hashjoin to off,
> then I get the plan I'm expecting and the query runs a lot faster. I've also
> tried lowering the random page cost (even to 1) but the planner still
> chooses to use the hash join.

Have you tried increasing the statistics collection?

--
Tobias Brox, +47-91700050
Nordicbet, IT dept



[PERFORM] Planner incorrectly choosing seq scan over index scan

2005-08-01 Thread Meetesh Karia
Hi all,

We're using 8.0.3 and we're seeing a problem where the planner is
choosing a seq scan and hash join over an index scan.  If I set
enable_hashjoin to off, then I get the plan I'm expecting and the query
runs a lot faster.  I've also tried lowering the random page cost
(even to 1) but the planner still chooses to use the hash join.
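
The plan variants below were presumably produced with session-level toggles along these lines; the headings that follow mark which setting each plan used:

SET enable_hashjoin = off;     -- for the "enable_hashjoin disabled" plan
SET random_page_cost = 1.5;    -- for the "random_page_cost set to 1.5" plan
-- re-run the query below under EXPLAIN ANALYZE after each change, then:
RESET enable_hashjoin;
RESET random_page_cost;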

Does anyone have any thoughts/suggestions?  I saw that there was a
thread recently in which the planner wasn't correctly estimating the
cost for queries using LIMIT.  Is it possible that something
similar is happening here (perhaps because of the sort) and that the
patch Tom proposed would fix it?

Thanks.  Here are the various queries and plans:

Normal settings
---------------
explain analyze
    select
    c.sourceId,
    c.targetId,
    abs(c.tr - c.sr) as xmy,
    (c.sr - s.ar) * (c.tr - t.ar) as xy,
    (c.sr - s.ar) * (c.sr - s.ar) as x2,
    (c.tr - t.ar) * (c.tr - t.ar) as y2
    from
    candidates617004 c,
    lte_user s,
    lte_user t
    where
    c.sourceId = s.user_id
    and c.targetId = t.user_id
    order by
    c.sourceId,
    c.targetId;

QUERY PLAN
Sort  (cost=13430.57..13439.24 rows=3467 width=48) (actual time=1390.000..1390.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Merge Join  (cost=9912.07..13226.72 rows=3467 width=48) (actual time=1344.000..1375.000 rows=3467 loops=1)
        Merge Cond: ("outer".user_id = "inner".sourceid)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..95.000 rows=50034 loops=1)
        ->  Sort  (cost=9912.07..9920.73 rows=3467 width=40) (actual time=1156.000..1156.000 rows=3467 loops=1)
              Sort Key: c.sourceid
              ->  Hash Join  (cost=8710.44..9708.21 rows=3467 width=40) (actual time=1125.000..1156.000 rows=3467 loops=1)
                    Hash Cond: ("outer".targetid = "inner".user_id)
                    ->  Seq Scan on candidates617004 c  (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000 rows=3467 loops=1)
                    ->  Hash  (cost=8011.95..8011.95 rows=279395 width=16) (actual time=1125.000..1125.000 rows=0 loops=1)
                          ->  Seq Scan on lte_user t  (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..670.000 rows=279395 loops=1)
Total runtime: 1406.000 ms

enable_hashjoin disabled
------------------------
QUERY PLAN
Sort  (cost=14355.37..14364.03 rows=3467 width=48) (actual time=391.000..391.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=271.52..14151.51 rows=3467 width=48) (actual time=203.000..359.000 rows=3467 loops=1)
        ->  Merge Join  (cost=271.52..3490.83 rows=3467 width=40) (actual time=203.000..218.000 rows=3467 loops=1)
              Merge Cond: ("outer".user_id = "inner".sourceid)
              ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..126.000 rows=50034 loops=1)
              ->  Sort  (cost=271.52..280.19 rows=3467 width=32) (actual time=15.000..30.000 rows=3467 loops=1)
                    Sort Key: c.sourceid
                    ->  Seq Scan on candidates617004 c  (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user t  (cost=0.00..3.03 rows=1 width=16) (actual time=0.031..0.036 rows=1 loops=3467)
              Index Cond: ("outer".targetid = t.user_id)
Total runtime: 406.000 ms

random_page_cost set to 1.5
---------------------------
QUERY PLAN
Sort  (cost=12702.62..12711.29 rows=3467 width=48) (actual time=1407.000..1407.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Merge Join  (cost=9912.07..12498.77 rows=3467 width=48) (actual time=1391.000..1407.000 rows=3467 loops=1)
        Merge Cond: ("outer".user_id = "inner".sourceid)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..46.000 rows=50034 loops=1)
        ->  Sort  (cost=9912.07..9920.73 rows=3467 width=40) (actual time=1188.000..1188.000 rows=3467 loops=1)
              Sort Key: c.sourceid
              ->  Hash Join  (cost=8710.44..9708.21 rows=3467 width=40) (actual time=1157.000..1188.000 rows=3467 loops=1)
                    Hash Cond: ("outer".targetid = "inner".user_id)
                    ->  Seq Scan on candidates617004 c  (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
                    ->  Hash  (cost=8011.95..8011.95 rows=279395 width=16) (actual time=1157.000..1157.000 rows=0 loops=1)
                          ->  Seq Scan on lte_user t  (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..750.000 rows=279395 loops=1)
Total runtime: 1422.000 ms

random_page_cost set to 1.5 and enable_hashjoin set to false
------------------------------------------------------------

[PERFORM] How are text columns stored?

2005-06-27 Thread Meetesh Karia
Hi all,

I'm running PG 8.0.3 on WinXP and I'm coming across some performance issues related to text columns.  Basically, it appears as though PG is storing the text data inline with the rest of the row data, since queries that don't touch the text column are slower when there is data in the text column than when there isn't.  According to section 8.3 of the doc:

"Long values are also stored in background tables so they do not interfere with 
rapid access to the shorter column values."

So, how long does a value have to be to be considered "long"?
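
For what it's worth, a sketch of one way to experiment with this, using the my_table_import table defined below: the per-column storage strategy can be changed with SET STORAGE, though values are only moved out of line once a row exceeds the TOAST threshold (roughly 2 kB):

-- force out-of-line, uncompressed storage for the text column;
-- the default strategy (EXTENDED) compresses first and only then moves out of line
ALTER TABLE my_table_import
    ALTER COLUMN text_field1 SET STORAGE EXTERNAL;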

If necessary, here is some more specific information about what I'm doing:

1) I create a new table and use 'COPY FROM' to populate it.  When
the data in the text column is limited to a max of 60 characters, this
part takes 2-3 minutes less than when the data is at its full
size.  The table will be populated with ~750k rows.  Here's
an example of the table I create (no, I didn't name the fields
"vc_field1", "vc_field2", etc ;) ):

    create table my_table_import
    (
    vc_field1 varchar(255),
    vc_field2 varchar(255),
    vc_field3 varchar(255),
    f_field1 float8,
    text_field1 text,
    ts_field1 timestamp,
    v_field4 varchar(255),
    i_field1 int8,
    i_field2 int8
    );
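
The COPY in step 1 presumably looks something like this; the file path and CSV format are assumptions, not from the original message:

    COPY my_table_import FROM 'C:/data/my_table_import.csv' WITH CSV;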

2) I populate i_field1 and i_field2 from lookup tables.  This step
takes about 7 mins longer with the full text data than with the limited
data.

    update my_table_import
    set i_field1 = f.i_field1,
    i_field2 = u.i_field2
    from lookup1 as f, lookup2 as u
    where vc_field2 = f.vc_field2
    and vc_field1 = u.vc_field1;

3) I then create an index on this table and run a couple of queries on
it.  Each of these queries takes about 10 minutes longer with the
full text data then without it.  Here's the index that I create
and an example of one of the queries that I run:

    create index idx_my_table_import_i1_i2 on my_table_import (i_field1, i_field2);
    analyze my_table_import;

    select i_field1, i_field2, max(ts_field1) as ts_field1, count(*) as dup_count
        from my_table_import
    where i_field1 between 0 and 
    group by i_field1, i_field2

Thanks for the help,
Meetesh Karia


Re: [PERFORM] ETL optimization

2005-06-23 Thread Meetesh Karia
I don't know what this will change wrt how often you need to run VACUUM
(I'm a SQL Server guy), but instead of an update and insert, try a
delete and insert.  You'll only have to find the duplicate rows
once and your insert doesn't need a where clause.

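A minimal sketch of that delete-then-insert approach, using the t1/t2 placeholder names from Bricklen's message below (the row-wise IN keeps it valid on 8.0):

BEGIN;
-- drop every target row the incoming aggregated batch will replace
DELETE FROM t1
WHERE (a, b, c) IN (SELECT a, b, c FROM t2);
-- the insert then needs no NOT EXISTS guard
INSERT INTO t1 (x, a, b, c)
SELECT sum(x) AS x, a, b, c
FROM t2
GROUP BY a, b, c;
COMMIT;
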
Meetesh

On 6/23/05, Bricklen Anderson <[EMAIL PROTECTED]> wrote:

Situation:

I'm trying to optimize an ETL process with many upserts (~100k aggregated rows) (no duplicates allowed).  The source table (t2) holds around 14 million rows, and I'm grabbing them 100,000 rows at a time from t2, resulting in about 100,000 distinct rows in the destination table (t1).

What I've tried:

i. FOR EXECUTE LOOP over my result set (aggregated results, 100k-ish rows), and try an update first, check the ROW_COUNT, if 0, then do an insert.
...run time: approx. 25 mins

ii. in a function (pseudo code), (table name is dynamic):
...
up_stm :=
'UPDATE '||t1||' SET x=t2.x
FROM
(select sum(x),a,b,c
from t2
group by a,b,c) as t2
WHERE '||t1||'.a=t2.a AND '||t1||'.b=t2.b AND '||t1||'.c=t2.c';
EXECUTE up_stm;

ins_stm :=
'INSERT INTO '||t1||' (x,a,b,c) select x,a,b,c
FROM (select sum(x) as x,a,b,c from t2 group by a,b,c) as t2
WHERE NOT EXISTS
(select true from '||t1||'
where '||t1||'.a=t2.a
and '||t1||'.b=t2.b
and '||t1||'.c=t2.c
limit 1)';
EXECUTE ins_stm;
...

takes about 7 minutes.  The performance of this is reasonable, but there is room for improvement.

I think it's the NOT EXISTS subquery on the insert that makes the first run slow.  Any revisions that may be faster (for the subquery)?  Note, this subquery is necessary so that duplicates don't get into the target table (t1).

Subsequent runs will be mostly updates (and still slow), with few inserts.  I'm not seeing a way for that update statement to be sped up, but maybe someone else does?

iii. UNIQUE constraint on table "t1".  This didn't seem to perform too badly with fewer rows (preliminary tests), but as you'd expect, on error the whole transaction would roll back.  Is it possible to skip a row if it causes an error, as opposed to aborting the transaction altogether?

To summarize, I'm looking for the most efficient and fastest way to perform my upserts.  Tips and/or references to pertinent docs are also appreciated!  If any more information is necessary, please let me know.

(postgresql 8.0.3, linux)

Cheers,
Bricklen