Re: [PERFORM] TPC-R benchmarks

2003-10-07 Thread Timothy D. Witham

On Thu, 2003-09-25 at 14:32, Jenny Zhang wrote:
 I am running TPC-H with scale factor of 1 on RedHat7.2 with the kernel
 2.5.74.  Q17 can always finish in about 7 seconds on my system.  The
 execution plan is:

  I just want to point out that we at the OSDL are not running
a TPC-X anything.  We have fair-use implementations of these
benchmarks, but because of differences our performance tests can
not be compared with the TPC's benchmark results.

 
  Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
    ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
          Join Filter: ("inner".l_quantity < (subplan))
          ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
                Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
          ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
                Index Cond: ("outer".p_partkey = lineitem.l_partkey)
          SubPlan
            ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                  ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                        Index Cond: (l_partkey = $0)
 (11 rows)
 
 Hope this helps,
 Jenny
 On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
  I am running TPC-R benchmarks with a scale factor of 1, which corresponds
  to approximately 1 GB database size on PostgreSQL 7.3.4 installed on
  CygWin on Windows XP. I dedicated 128 MB of shared memory to my postgres
  installation.
  Most of the queries were able to complete in a matter of minutes, but
  query 17 was taking hours and hours. The query is shown below. Is there
  any way to optimize it?
   
  select
   sum(l_extendedprice) / 7.0 as avg_yearly
  from
   lineitem,
   part
  where
   p_partkey = l_partkey
   and p_brand = 'Brand#11'
   and p_container = 'SM PKG'
   and l_quantity < (
select
 0.2 * avg(l_quantity)
from
 lineitem
where
 l_partkey = p_partkey
   );
   
  Thanks.
   
  Oleg
  
  *
  
  This e-mail may contain privileged or confidential material intended for the named 
  recipient only.
  If you are not the named recipient, delete this message and all attachments.
  Unauthorized reviewing, copying, printing, disclosing, or otherwise using 
  information in this e-mail is prohibited.
  We reserve the right to monitor e-mail sent through our network. 
  
  *
 
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
-- 
Timothy D. Witham - Lab Director - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton OR, 97005
(503)-626-2455 x11 (office)(503)-702-2871 (cell)
(503)-626-2436 (fax)


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] TPC-R benchmarks

2003-10-07 Thread Josh Berkus
Tom,

I've found the problem with TPC-R query #19.  And it, unfortunately, appears 
to be a problem in the PostgreSQL query planner.

To sum up the below:  it appears that whenever a set of WHERE conditions 
exceeds a certain level of complexity, the planner just ignores all 
applicable indexes and goes for a seq scan.   While this may be unavoidable 
to some degree, it seems to me that we need to raise the threshold of 
complexity at which it does this.

tpcr=# select version();
                                    version
-----------------------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226 
(prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient 
on subsets of the query.  Thus:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
   
   
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=10380.70..10380.70 rows=1 width=30) (actual time=161.61..161.61 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..10380.67 rows=13 width=30) (actual time=81.54..161.47 rows=17 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=62 width=4) (actual time=81.21..137.24 rows=98 loops=1)
               Index Cond: (p_brand = 'Brand#33'::bpchar)
               Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container = 'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.84 rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 161.71 msec



This also works for a similar query:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#52'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 14 and l_quantity <= 14 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

 Aggregate  (cost=11449.36..11449.36 rows=1 width=30) (actual time=195.72..195.72 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11449.29 rows=28 width=30) (actual time=56.42..195.39 rows=48 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=139 width=4) (actual time=56.15..153.17 rows=166 loops=1)
               Index Cond: (p_brand = 'Brand#52'::bpchar)
               Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container = 'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container = 'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.29 rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <= 24::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 195.82 msec
(9 rows)


If, however, I combine the two where clauses with an OR, the planner gets 
confused and insists on loading the entire tables into memory (even though I 
don't have that much memory):

explain
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in 

Re: [PERFORM] TPC-R benchmarks

2003-10-03 Thread Oleg Lebedev
Josh,

I declared all the indexes that you suggested and ran vacuum full
analyze. The query plan has not changed and it's still trying to use
seqscan. I tried to disable seqscan, but the plan didn't change. Any
other suggestions?
I started explain analyze on the query, but I doubt it will finish any
time soon.
Thanks.

Oleg


-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 02, 2003 11:27 PM
To: Oleg Lebedev; scott.marlowe
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg,

 I have another question. How do I optimize my indexes for the query 
 that contains a lot of ORed blocks, each of which contains a bunch of 
 ANDed expressions? The structure of each ORed block is the same except

 the right-hand-side values vary.

Given the example, I'd do a multicolumn index on p_brand, p_container, p_size
and a second multicolumn index on l_partkey, l_quantity, l_shipmode.  Hmmm
... or maybe separate indexes, one on l_partkey and one on l_quantity,
l_shipmode & l_shipinstruct.   Test both configurations.

Mind you, if this is also an OLTP table, then you'd want to test those
multi-column indexes to determine the least columns you need for the indexes
still to be used, since more columns = more index maintenance.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [PERFORM] TPC-R benchmarks

2003-10-02 Thread Oleg Lebedev
I ran VACUUM FULL ANALYZE yesterday and then re-ran the query with
EXPLAIN ANALYZE.
I got the same query plan and execution time.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 4:20 PM
To: Oleg Lebedev
Cc: Josh Berkus; scott.marlowe; [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg Lebedev [EMAIL PROTECTED] writes:
 All right, my query just finished running with EXPLAIN ANALYZE. I show

 the plan below and also attached it as a file. Any ideas?

Uh, have you done an ANALYZE (or VACUUM ANALYZE) on this database? It
sure looks like the planner thinks the tables are a couple of orders of
magnitude smaller than they actually are.  Certainly the estimated sizes
of the joins are way off :-(

If you did analyze, it might help to increase the statistics target and
re-analyze.

regards, tom lane


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] TPC-R benchmarks

2003-10-02 Thread Oleg Lebedev
As Scott recommended, I did the following:
# set enable_nestloop = false;
# vacuum full analyze;

After this I re-ran the query and its execution time went down from 2
hours to 2 minutes. I attached the new query plan to this posting.
Is there any way to optimize it even further?
What should I do to make this query run fast without hurting the
performance of the other queries?
Thanks.

Oleg

-Original Message-
From: scott.marlowe [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 4:00 PM
To: Oleg Lebedev
Cc: Josh Berkus; [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


For troubleshooting, can you try it with set enable_nestloop = false
and 
rerun the query and see how long it takes?  

It looks like the estimates of rows returned are WAY off (the estimate is
too low compared to what really comes back).

Also, you might try to alter the table.column to have a higher target on
the rows p_partkey and ps_partkey and any others where the estimate is so
far off of the reality.

On Wed, 1 Oct 2003, Oleg Lebedev wrote:

 All right, my query just finished running with EXPLAIN ANALYZE. I show

 the plan below and also attached it as a file. Any ideas?
 
 [quoted EXPLAIN ANALYZE plan snipped]
 
 
 -Original Message-
 From: Oleg Lebedev
 Sent: Wednesday, October 01, 2003 12:00 PM
 To: Josh Berkus; scott.marlowe
 Cc: [EMAIL PROTECTED]
 Subject: Re: [PERFORM] TPC-R benchmarks
 Importance: Low
 
 
 Sure, below is the query. I attached the plan to this posting.
 
 select
   nation,
   o_year,
   sum(amount) as sum_profit

Re: [PERFORM] TPC-R benchmarks

2003-10-02 Thread scott.marlowe
Have you tried increasing the statistics target for those columns that are 
getting bad estimates yet and then turning back on enable_nestloop and 
rerunning analyze and seeing how the query does?  

The idea being to try and get a good enough estimate of your statistics so 
the planner stops using nestloops on its own rather than forcing it to 
with enable_nestloop = false.
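Run in a single psql session, that check might look like the following sketch (the query text itself is elided; this assumes the statistics targets on the badly estimated columns have already been raised):

```sql
SET enable_nestloop = true;   -- hand plan choice back to the planner
ANALYZE;                      -- rebuild statistics under the new targets
EXPLAIN ANALYZE
SELECT ...;                   -- re-run the problem query and compare the
                              -- estimated vs. actual row counts in the plan
```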

On Thu, 2 Oct 2003, Oleg Lebedev wrote:

 As Scott recommended, I did the following:
 # set enable_nestloop = false;
 # vacuum full analyze;
 
 After this I re-ran the query and its execution time went down from 2
 hours to 2 minutes. I attached the new query plan to this posting.
 Is there any way to optimize it even further?
 What should I do to make this query run fast without hurting the
 performance of the other queries?
 Thanks.
 
 Oleg
 
 -Original Message-
 From: scott.marlowe [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 01, 2003 4:00 PM
 To: Oleg Lebedev
 Cc: Josh Berkus; [EMAIL PROTECTED]
 Subject: Re: [PERFORM] TPC-R benchmarks
 
 
 For troubleshooting, can you try it with set enable_nestloop = false and
 rerun the query and see how long it takes?
 
 It looks like the estimates of rows returned are WAY off (the estimate is
 too low compared to what really comes back).
 
 Also, you might try to alter the table.column to have a higher target on
 the rows p_partkey and ps_partkey and any others where the estimate is so
 far off of the reality.
 
 On Wed, 1 Oct 2003, Oleg Lebedev wrote:
 
  All right, my query just finished running with EXPLAIN ANALYZE. I show
 
  the plan below and also attached it as a file. Any ideas?
  
  [quoted EXPLAIN ANALYZE plan snipped]

Re: [PERFORM] TPC-R benchmarks

2003-10-02 Thread scott.marlowe
On Thu, 2 Oct 2003, Oleg Lebedev wrote:

 I was trying to get the pg_stats information to Josh and decided to
 recreate the indexes on all my tables. After that I ran vacuum full
 analyze, re-enabled nestloop and ran explain analyze on the query. It
 ran in about 2 minutes.
 I attached the new query plan. I am not sure what did the trick, but 2
 minutes is much better than 2 hours. But then again, I can't take long
 lunches anymore :)
 Is there any way to make this query run even faster without increasing
 the memory dedicated to postgres?
 Thanks.

As long as the estimated row counts and real ones match up, and postgresql 
seems to be picking the right plan, there's probably not a lot to be done.  
You might want to look at increasing sort_mem a bit, but don't go crazy, 
as being too high can result in swap storms under load, which are a very 
bad thing.
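If you do experiment with sort_mem, a per-session setting keeps the risk contained; the value below is only an example (sort_mem is in kilobytes and applies per sort, so concurrent sorts multiply it):

```sql
SET sort_mem = 16384;   -- 16 MB per sort, for this session only
-- a server-wide change would go in postgresql.conf instead
```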

I'd check for index growth.  You may have been reloading your data over 
and over and had an index growth problem.  Next time, instead of recreating 
the indexes completely, you might want to try reindex indexname.
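In 7.3 that would look like this, using one of the index names from this thread as an example:

```sql
REINDEX INDEX i_l_partkey;   -- rebuild a single bloated index
REINDEX TABLE lineitem;      -- or rebuild all indexes on the table at once
```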

Also, 7.4 mostly fixes the index growth issue, especially as it applies to 
truncating/reloading a table over and over, so moving to 7.4 beta3/4 and 
testing might be a good idea (if you aren't there already).

What you want to avoid is having postgresql switch back to that nestloop 
join on you in the middle of the day, and to prevent that you might need 
to have higher statistics targets so the planner gets the right number 
all the time.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] TPC-R benchmarks

2003-10-02 Thread Josh Berkus
Oleg,

 I have another question. How do I optimize my indexes for the query that
 contains a lot of ORed blocks, each of which contains a bunch of ANDed
 expressions? The structure of each ORed block is the same except the
 right-hand-side values vary.

Given the example, I'd do a multicolumn index on p_brand, p_container, p_size 
and a second multicolumn index on l_partkey, l_quantity, l_shipmode.  Hmmm 
... or maybe separate indexes, one on l_partkey and one on l_quantity, 
l_shipmode & l_shipinstruct.   Test both configurations.

Mind you, if this is also an OLTP table, then you'd want to test those 
multi-column indexes to determine the least columns you need for the indexes 
still to be used, since more columns = more index maintenance.
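Spelled out, the two configurations might look like this (the index names, and the inclusion of l_shipinstruct, are illustrative):

```sql
-- Option 1: two multicolumn indexes
CREATE INDEX idx_part_bcs ON part (p_brand, p_container, p_size);
CREATE INDEX idx_li_pqs   ON lineitem (l_partkey, l_quantity, l_shipmode);

-- Option 2: separate, narrower indexes
CREATE INDEX idx_li_partkey ON lineitem (l_partkey);
CREATE INDEX idx_li_qsi     ON lineitem (l_quantity, l_shipmode, l_shipinstruct);
```

Then compare the EXPLAIN ANALYZE output for the query under each option.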

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread George Essig
Tom Lane wrote:

 When benchmarking with data sets considerably larger than available
 buffer cache, I rather doubt that small random_page_cost would be a 
 good idea.  Still, you might as well experiment to see.

From experience, I know the difference in response time can be huge when
postgres incorrectly chooses a sequential scan over an index scan.  In
practice, do people experience as great a difference when postgres
incorrectly chooses an index scan over a sequential scan?  My intuition is
that the speed difference is a lot less for incorrectly choosing an index
scan.  If this is the case, it would be safer to choose a small value for
random_page_cost.
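A low-risk way to run that experiment is per-session, leaving the server default untouched (the query text is elided):

```sql
SHOW random_page_cost;       -- the stock default is 4
SET random_page_cost = 2;    -- make random I/O look cheaper, favoring index scans
EXPLAIN ANALYZE
SELECT ...;                  -- re-run the query of interest and compare plans
RESET random_page_cost;      -- back to the default
```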

George Essig



Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread Josh Berkus
Oleg,

 I continue struggling with the TPC-R benchmarks and wonder if anyone
 could help me optimize the query below. ANALYZE statistics indicate that
 the query should run relatively fast, but it takes hours to complete. I
 attached the query plan to this posting.

Even though it takes hours to complete, I think we need you to run EXPLAIN 
ANALYZE instead of just EXPLAIN.   Without the real-time statistics, we 
simply can't see what's slowing the query down.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread Oleg Lebedev
The output of the query should contain about 200 rows. So I guess the
planner is off in assuming that the query should return 1 row.

I will start EXPLAIN ANALYZE now.

Thanks.

Oleg

-Original Message-
From: scott.marlowe [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 7:23 AM
To: Oleg Lebedev
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


On Tue, 30 Sep 2003, Oleg Lebedev wrote:

 I continue struggling with the TPC-R benchmarks and wonder if anyone 
 could help me optimize the query below. ANALYZE statistics indicate 
 that the query should run relatively fast, but it takes hours to 
 complete. I attached the query plan to this posting. Thanks.

What are the differences between estimated and real rows and such of an 
explain analyze on that query?  Are there any estimates that are just way 
off?




Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread Oleg Lebedev
All right, my query just finished running with EXPLAIN ANALYZE.
I show the plan below and also attached it as a file.
Any ideas?

   ->  Sort  (cost=54597.49..54597.50 rows=1 width=121) (actual time=6674562.03..6674562.15 rows=175 loops=1)
         Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
         ->  Aggregate  (cost=54597.45..54597.48 rows=1 width=121) (actual time=6668919.41..6674522.48 rows=175 loops=1)
               ->  Group  (cost=54597.45..54597.47 rows=3 width=121) (actual time=6668872.68..6672136.96 rows=348760 loops=1)
                     ->  Sort  (cost=54597.45..54597.46 rows=3 width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1)
                           Sort Key: nation.n_name, date_part('year'::text, orders.o_orderdate)
                           ->  Hash Join  (cost=54596.00..54597.42 rows=3 width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1)
                                 Hash Cond: ("outer".n_nationkey = "inner".s_nationkey)
                                 ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25 loops=1)
                                 ->  Hash  (cost=54596.00..54596.00 rows=3 width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1)
                                       ->  Nested Loop  (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 rows=348760 loops=1)
                                             Join Filter: ("inner".s_suppkey = "outer".l_suppkey)
                                             ->  Nested Loop  (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 rows=348760 loops=1)
                                                   ->  Nested Loop  (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 rows=348760 loops=1)
                                                         Join Filter: ("outer".p_partkey = "inner".ps_partkey)
                                                         ->  Nested Loop  (cost=0.00..22753.33 rows=9343 width=49) (actual time=146.85..3541433.10 rows=348760 loops=1)
                                                               ->  Seq Scan on part  (cost=0.00..7868.00 rows=320 width=4) (actual time=33.64..15651.90 rows=11637 loops=1)
                                                                     Filter: (p_name ~~ '%green%'::text)
                                                               ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..46.15 rows=29 width=45) (actual time=10.71..302.67 rows=30 loops=11637)
                                                                     Index Cond: ("outer".p_partkey = lineitem.l_partkey)
                                                         ->  Index Scan using pk_partsupp on partsupp  (cost=0.00..3.39 rows=1 width=19) (actual time=0.09..0.09 rows=1 loops=348760)
                                                               Index Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND (partsupp.ps_suppkey = "outer".l_suppkey))
                                                   ->  Index Scan using pk_orders on orders  (cost=0.00..3.01 rows=1 width=12) (actual time=8.62..8.62 rows=1 loops=348760)
                                                         Index Cond: (orders.o_orderkey = "outer".l_orderkey)
                                             ->  Index Scan using pk_supplier on supplier  (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.08 rows=1 loops=348760)
                                                   Index Cond: ("outer".ps_suppkey = supplier.s_suppkey)
 Total runtime: 6674724.23 msec
(28 rows)


-Original Message-
From: Oleg Lebedev 
Sent: Wednesday, October 01, 2003 12:00 PM
To: Josh Berkus; scott.marlowe
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks
Importance: Low


Sure, below is the query. I attached the plan to this posting.

select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) -
ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;


-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 11:42 AM
To: Oleg Lebedev; scott.marlowe
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg

Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread scott.marlowe
For troubleshooting, can you try it with set enable_nestloop = false and 
rerun the query and see how long it takes?  

It looks like the estimates of rows returned are WAY off (the estimate is 
too low compared to what really comes back).

Also, you might try to alter the table.column to have a higher target on 
the rows p_partkey and ps_partkey and any others where the estimate is so 
far off of the reality.

On Wed, 1 Oct 2003, Oleg Lebedev wrote:

 All right, my query just finished running with EXPLAIN ANALYZE.
 I show the plan below and also attached it as a file.
 Any ideas?
 
  [quoted EXPLAIN ANALYZE plan snipped]
 
 
 -Original Message-
 From: Oleg Lebedev 
 Sent: Wednesday, October 01, 2003 12:00 PM
 To: Josh Berkus; scott.marlowe
 Cc: [EMAIL PROTECTED]
 Subject: Re: [PERFORM] TPC-R benchmarks
 Importance: Low
 
 
 Sure, below is the query. I attached the plan to this posting.
 
 select
   nation,
   o_year,
   sum(amount) as sum_profit
 from
   (
   select
   n_name as nation,
   extract(year from o_orderdate) as o_year,
   l_extendedprice * (1 - l_discount) -
 ps_supplycost * l_quantity as amount
   from
   part,
   supplier,
   lineitem,
   partsupp,
   orders,
   nation
   where
   s_suppkey = l_suppkey
   and ps_suppkey = l_suppkey
   and ps_partkey = l_partkey

Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread Tom Lane
Oleg Lebedev [EMAIL PROTECTED] writes:
 All right, my query just finished running with EXPLAIN ANALYZE.
 I show the plan below and also attached it as a file.
 Any ideas?

Uh, have you done an ANALYZE (or VACUUM ANALYZE) on this database?
It sure looks like the planner thinks the tables are a couple of orders
of magnitude smaller than they actually are.  Certainly the estimated
sizes of the joins are way off :-(

If you did analyze, it might help to increase the statistics target and
re-analyze.
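
A minimal sketch of that re-analyze step (the statistics target of 100 and the choice of columns are illustrative, not from this thread):

```sql
-- Raise the planner's per-column sample size for the join columns,
-- then refresh the statistics. 100 is just an example target.
ALTER TABLE lineitem ALTER COLUMN l_partkey SET STATISTICS 100;
ALTER TABLE partsupp ALTER COLUMN ps_partkey SET STATISTICS 100;
ANALYZE lineitem;
ANALYZE partsupp;
```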

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] TPC-R benchmarks

2003-10-01 Thread Josh Berkus
Oleg,

 All right, my query just finished running with EXPLAIN ANALYZE.
 I show the plan below and also attached it as a file.
 Any ideas?

Yes.  Your problem appears to be right here:

        ->  Nested Loop  (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 rows=348760 loops=1)
              Join Filter: ("inner".s_suppkey = "outer".l_suppkey)
              ->  Nested Loop  (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 rows=348760 loops=1)
                    ->  Nested Loop  (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 rows=348760 loops=1)
                          Join Filter: ("outer".p_partkey = "inner".ps_partkey)
                          ->  Nested Loop  (cost=0.00..22753.33 rows=9343 width=49) (actual time=146.85..3541433.10 rows=348760 loops=1)

For some reason, the row estimate on the supplier/lineitem join is bad, as
is the estimate on part/partsupp.  Let me first check two things:

1) You have an index on l_suppkey and on ps_partkey.
2) You have run ANALYZE on your whole database before running the query.

If both of those are true, I'd like to see the lines in pg_stats that apply to 
ps_partkey and l_suppkey; please do a:

SELECT * FROM pg_stats WHERE attname = 'l_suppkey' or attname = 'ps_partkey'
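
Point (1) can be verified from the catalogs as well; a quick sketch (assuming the default TPC table names used elsewhere in this thread):

```sql
-- List the indexes defined on the two tables involved in the bad estimates.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('lineitem', 'partsupp');
```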


-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev
I left my TPC-R query #17 working over the weekend and it took 3988 mins
~ 10 hours to complete. And this is considering that I am using a TPC-R
database created with a scale factor of 1, which corresponds to ~1 GB of
data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 MB RAM.

Here is an excerpt from my postgresql.conf file (the rest of the
settings are commented out):

#
#   Shared Memory Size
#
shared_buffers = 16384  # 2*max_connections, min 16, typically
8KB each

#
#   Non-shared Memory Sizes
#
sort_mem = 32768

#
#   Optimizer Parameters
#
effective_cache_size = 32000# typically 8KB each

Any suggestions on how to optimize these settings?
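
For reference, the values above work out as follows, assuming the default 8 KB page size (the arithmetic is mine, not from the thread):

```
# shared_buffers       = 16384 pages  -> 128 MB of shared buffer cache
# effective_cache_size = 32000 pages  -> ~250 MB assumed OS/kernel cache
# sort_mem             = 32768 KB     -> 32 MB per sort or hash operation
```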

I agree with Jenny that declaring additional indexes on the TPC-R tables
may alter the validity of the benchmarks. Are there any official TPC
benchmarks submitted by PostgreSQL? 

Thanks.

Oleg


*

This e-mail may contain privileged or confidential material intended for the named 
recipient only.
If you are not the named recipient, delete this message and all attachments.
Unauthorized reviewing, copying, printing, disclosing, or otherwise using information 
in this e-mail is prohibited.
We reserve the right to monitor e-mail sent through our network. 

*




Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Mary Edie Meredith
On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
 I left my TPC-R query #17 working over the weekend and it took 3988 mins
 ~ 10 hours to complete. And this is considering that I am using a TPC-R
 database created with a scale factor of 1, which corresponds to ~1 GB of
 data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 MB RAM.

Was this run with or without the l_partkey index that Jenny suggested? 

 
 Here is an excerpt from my postgresql.conf file (the rest of the
 settings are commented out):
 
 #
 # Shared Memory Size
 #
 shared_buffers = 16384# 2*max_connections, min 16, typically
 8KB each
 
 #
 # Non-shared Memory Sizes
 #
 sort_mem = 32768
 
 #
 # Optimizer Parameters
 #
 effective_cache_size = 32000  # typically 8KB each
 
 Any suggestions on how to optimize these settings?
 
 I agree with Jenny that declaring additional indexes on the TPC-R tables
 may alter the validity of the benchmarks. Are there any official TPC
 benchmarks submitted by PostgreSQL? 

Actually, for the TPC-R you _are_ allowed to declare additional indexes. 
With TPC-H you are restricted to a specific set listed in the spec (an
index on l_partkey is allowed for both).

What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the
query for the purposes of making the query run faster.

Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp  

I do not see one for PostgreSQL.


Regards,

Mary 

-- 
Mary Edie Meredith [EMAIL PROTECTED]
Open Source Development Lab

 


Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev

It took 10 hours to compute the query without the index on
lineitem.l_partkey.
Once I created the index on lineitem.l_partkey, it took only 32 secs to
run the same query.  
After VACUUM ANALYZE it took 72 secs to run the query.
All the subsequent runs took under 3 seconds!
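
In SQL, the sequence described above amounts to (the index name follows the i_l_partkey convention used elsewhere in the thread):

```sql
-- Create the missing index on the FK column, then refresh statistics.
CREATE INDEX i_l_partkey ON lineitem (l_partkey);
VACUUM ANALYZE lineitem;
```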

That's quite amazing!

I just checked  



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Tom Lane
Shridhar Daithankar [EMAIL PROTECTED] writes:
 Also if you have fast disk drives, you can reduce random page cost to 2 or 1.5.

Note however that most of the people who have found smaller
random_page_cost to be helpful are in situations where most of their
data fits in RAM.  Reducing the cost towards 1 simply reflects the fact
that there's no sequential-fetch advantage when grabbing data that's
already in RAM.

When benchmarking with data sets considerably larger than available
buffer cache, I rather doubt that small random_page_cost would be a good
idea.  Still, you might as well experiment to see.
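
Experimenting per-session keeps the change reversible; a sketch (the value 2.0 is only a starting point, not a recommendation from this thread):

```sql
-- Try a lower value for this session only, before touching postgresql.conf.
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT ...;   -- re-run the query under test here
RESET random_page_cost;
```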

regards, tom lane



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev

Oops, my previous message got cut off.
Here is the end of it:
I just checked the restrictions on the TPC-R and TPC-H schemas and it
seems that all indexes are allowed in TPC-R and only those that index
parts of primary or foreign keys are allowed in TPC-H.
Thanks.

Oleg 


Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread greg

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


 I'm pretty certain that there are no TPC-certified test results for
 Postgres, because to date no organization has cared to spend the money
 needed to perform a certifiable test.

Anyone have a rough idea of the costs involved?


- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200309291344
-BEGIN PGP SIGNATURE-
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/eG+avJuQZxSWSsgRApDFAJ4md34LacZhJbjnydjNGzqfLy2IzQCg5m/8
XiD273M2ugzCWd7YF5zbkio=
=jGkx
-END PGP SIGNATURE-





Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Andrew Sullivan
On Mon, Sep 29, 2003 at 05:43:26PM -, [EMAIL PROTECTED] wrote:
 
 Anyone have a rough idea of the costs involved?

I did a back-of-an-envelope calculation one day and stopped when I
got to $10,000.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
[EMAIL PROTECTED]                       M2P 2A8
                                        +1 416 646 3304 x110




Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev
Yes Josh,
l_partkey is part of the foreign key on the lineitem table, and it was
OK to create an index on it according to the TPC-R specs. I just created
indices on the rest of the FK columns in the TPC-R database and will
continue my evaluations.
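
A sketch of what indexing the remaining FK columns could look like (Oleg's exact column set is not given, so these names and columns are illustrative):

```sql
-- Indexes on foreign-key columns of the TPC-R schema; PostgreSQL does
-- not create these automatically when the FK constraint is declared.
CREATE INDEX i_l_suppkey  ON lineitem (l_suppkey);
CREATE INDEX i_l_orderkey ON lineitem (l_orderkey);
CREATE INDEX i_ps_suppkey ON partsupp (ps_suppkey);
CREATE INDEX i_o_custkey  ON orders (o_custkey);
```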
Thanks.

Oleg 

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 29, 2003 12:11 PM
To: Oleg Lebedev; Mary Edie Meredith
Cc: Jenny Zhang; pgsql-performance
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg,

 I just checked the restrictions on the TPC-R and TPC-H schemas and it 
 seems that all indexes are allowed in TPC-R and only those that index 
 parts of primary or foreign keys are allowed in TPC-H.

That would be appropriate for this case though, yes?   That column is part of
a foreign key, unless I've totally lost track.

As I remarked before, Postgres does *not* automatically create indexes for
FKs.   Many, but not all, other database products do, so comparing PostgreSQL
against those products without the index is unfair.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-09-26 Thread Mary Edie Meredith
The TPC-H/R rules allow only minor changes to the SQL that are necessary
due to SQL implementation differences. They do not allow changes made to
improve performance.  It is their way of testing the optimizer's ability to
recognize an inefficient SQL statement and do the rewrite.

The rule makes sense for TPC-H, which is supposed to represent ad-hoc
queries.  One might argue that for TPC-R, which is supposed to represent
reporting with pre-knowledge of the query, a re-write should be allowed.
However, that is currently not the case. Since the RDBMSs represented on
the TPC council are competing with TPC-H, their optimizers already do the
re-write, so (IMHO) there is no motivation to relax the rules for TPC-R.


-- 
Mary Edie Meredith [EMAIL PROTECTED]
Open Source Development Lab




Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Jenny Zhang
I am running TPC-H with scale factor of 1 on RedHat7.2 with the kernel
2.5.74.  Q17 can always finish in about 7 seconds on my system.  The
execution plan is:

 Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
   ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
         Join Filter: ("inner".l_quantity > (subplan))
         ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
               Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
         ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
               Index Cond: ("outer".p_partkey = lineitem.l_partkey)
         SubPlan
           ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                 ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                       Index Cond: (l_partkey = $0)
(11 rows)

Hope this helps,
Jenny
On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
 I am running TPC-R benchmarks with a scale factor of 1, which corresponds
 to an approximately 1 GB database size, on PostgreSQL 7.3.4 installed under
 CygWin on Windows XP. I dedicated 128 MB of shared memory to my postgres
 installation.
 Most of the queries were able to complete in a matter of minutes, but
 query 17 was taking hours and hours. The query is shown below. Is there
 any way to optimize it?
  
 select
  sum(l_extendedprice) / 7.0 as avg_yearly
 from
  lineitem,
  part
 where
  p_partkey = l_partkey
  and p_brand = 'Brand#11'
  and p_container = 'SM PKG'
  and l_quantity < (
   select
0.2 * avg(l_quantity)
   from
lineitem
   where
l_partkey = p_partkey
  );
  
 Thanks.
  
 Oleg
 


Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Jenny Zhang
The index is created by:
create index i_l_partkey on lineitem (l_partkey);
I do not have any foreign key defined.  Does the spec require foreign
keys?

When you create a foreign key reference, does PG create an index
automatically?

Can you try with the index?

Jenny
On Thu, 2003-09-25 at 14:39, Oleg Lebedev wrote:
 Seems like in your case postgres uses an i_l_partkey index on the lineitem
 table. I have a foreign key constraint defined between the lineitem and
 part tables, but didn't create any special indexes. Here is my query plan:
 
   ->  Aggregate  (cost=1517604222.32..1517604222.32 rows=1 width=31)
         ->  Hash Join  (cost=8518.49..1517604217.39 rows=1969 width=31)
               Hash Cond: ("outer".l_partkey = "inner".p_partkey)
               Join Filter: ("outer".l_quantity > (subplan))
               ->  Seq Scan on lineitem  (cost=0.00..241889.15 rows=6001215 width=27)
               ->  Hash  (cost=8518.00..8518.00 rows=197 width=4)
                     ->  Seq Scan on part  (cost=0.00..8518.00 rows=197 width=4)
                           Filter: ((p_brand = 'Brand#11'::bpchar) AND (p_container = 'SM PKG'::bpchar))
               SubPlan
                 ->  Aggregate  (cost=256892.28..256892.28 rows=1 width=11)
                       ->  Seq Scan on lineitem  (cost=0.00..256892.19 rows=37 width=11)
                             Filter: (l_partkey = $0)
 


Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Josh Berkus
Jenny,

 create index i_l_partkey on lineitem (l_partkey);
 I do not have any foreign key defined.  Does the spec require foreign
 keys?

 When you create a foreign key reference, does PG create an index
 automatically?

No.   An index is not required to enforce a foreign key, and one is sometimes not 
useful (for example, FK fields with only 3 possible values).

So it may be that you need to create an index on that field.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Tom Lane
Oleg Lebedev [EMAIL PROTECTED] writes:
 Seems like in your case postgres uses an i_l_partkey index on the lineitem
 table. I have a foreign key constraint defined between the lineitem and
 part tables, but didn't create any special indexes. Here is my query plan:

The planner is obviously unhappy with this plan (note the large cost
numbers), but it can't find a way to do better.  An index on
lineitem.l_partkey would help, I think.

The whole query seems like it's written in a very inefficient fashion;
couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
multiple join rows?  But I dunno whether the TPC rules allow for
significant manual rewriting of the given query.
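
A hedged sketch of the kind of manual rewrite Tom is alluding to: compute each part's average quantity once in a derived table, rather than re-running the correlated subquery for every joined row (whether the TPC rules permit this is exactly the open question):

```sql
-- Illustrative rewrite of Q17; table/column names are from the TPC schema,
-- the derived-table formulation is mine, not from the benchmark kit.
select sum(l.l_extendedprice) / 7.0 as avg_yearly
from lineitem l
join part p on p.p_partkey = l.l_partkey
join (select l_partkey, 0.2 * avg(l_quantity) as qty_limit
      from lineitem
      group by l_partkey) lim on lim.l_partkey = l.l_partkey
where p.p_brand = 'Brand#11'
  and p.p_container = 'SM PKG'
  and l.l_quantity < lim.qty_limit;
```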

regards, tom lane
