[PERFORM] planner, *_collapse_limit

2012-07-26 Thread Marcus Engene

Hi,

I've read a little bit about join_collapse_limit and from_collapse_limit 
and I see their reason to exist.


A stupid question: in Algorithms 101 you're usually told to write a chess 
program, and then you usually do a breadth-first min-max tree search. A 
low-level opponent interrupts this possibly enormous traversal early, thus 
returning a possibly bad move; at a higher level it's allowed to work 
longer and will likely find a better path in the tree.


I understood the *_collapse_limits to be there to stop a worst-case 
join from sending the optimizer haywire, but it feels sad that trivially 
big joins are cut off even when they're not too nasty.


Why would it not make sense to put a time/space constraint on the join 
heuristics instead of, or in combination with, how the limits presently 
work? If we hit the ceiling, the best plan produced so far is used. In 
the chess analogy, the current behavior is like having only a handful of 
pieces left but constraining the min-max traversal to a low depth (rather 
than by time/memory): it quickly traverses the few options and is then 
cut short unnecessarily.
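
For what it's worth, PostgreSQL does ship one heuristic along these lines: above geqo_threshold FROM items, the planner switches to the genetic query optimizer (GEQO), which searches a bounded portion of the join-order space instead of exhausting it. A sketch of the relevant settings (values are illustrative, not recommendations):

```sql
-- Above this many FROM items, fall back to the genetic (heuristic)
-- optimizer instead of the exhaustive join-order search.
SET geqo_threshold = 12;

-- Trade planning time for plan quality within the heuristic search:
-- 1 = fast but rougher plans, 10 = slower but better plans.
SET geqo_effort = 5;
```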


Best regards,
Marcus


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] odd planner again, pg 9.0.8

2012-07-25 Thread Marcus Engene

On 7/25/12 6:39 PM, Tom Lane wrote:

Marcus Engene  writes:

Lacking index hints I have a bit of a problem with a slow select.

I don't think you need index hints.  What you probably do need is to
increase join_collapse_limit and/or from_collapse_limit to deal with
this complex query as a whole.


There are several selects looking similar to this in our application
that suddenly jumped from a handful of ms to many seconds.

Perhaps you had those settings adjusted properly and somebody turned
them off again?

regards, tom lane

Wonderful, Mr. Lane, now the query executes amazingly fast! I increased 
from_collapse_limit from its default of 8 to 10 and it behaves as expected.
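
For readers hitting the same wall: these limits can be raised for just the current session, or even a single transaction, rather than server-wide. A sketch (the value 10 is the one from this thread; tune to taste):

```sql
-- Raise the thresholds below which the planner flattens subqueries
-- and explicit JOINs into one big join search problem.
SET from_collapse_limit = 10;
SET join_collapse_limit = 10;

-- Or scope the change to a single transaction:
BEGIN;
SET LOCAL from_collapse_limit = 10;
-- ... run the complex query here ...
COMMIT;
```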


Thank you!
Marcus

 Sort  (cost=10628.68..10631.95 rows=1307 width=89) (actual 
time=26.430..26.493 rows=919 loops=1)

   Sort Key: ppcr.item_common
   Sort Method:  quicksort  Memory: 154kB
   ->  Nested Loop  (cost=0.00..10561.03 rows=1307 width=89) (actual 
time=0.093..25.612 rows=919 loops=1)
 ->  Nested Loop  (cost=0.00..3433.41 rows=460 width=85) 
(actual time=0.061..13.257 rows=919 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..3134.45 rows=460 
width=85) (actual time=0.057..10.972 rows=919 loops=1)
 ->  Nested Loop Left Join (cost=0.00..2706.99 
rows=460 width=32) (actual time=0.053..9.092 rows=919 loops=1)
   ->  Nested Loop  (cost=0.00..2391.21 
rows=460 width=20) (actual time=0.047..6.964 rows=919 loops=1)
 ->  Nested Loop (cost=0.00..1212.82 
rows=460 width=12) (actual time=0.039..3.756 rows=919 loops=1)
   ->  Nested Loop 
(cost=0.00..36.70 rows=460 width=4) (actual time=0.028..0.436 rows=919 
loops=1)
 Join Filter: (ppc.objectid 
= ppcr.pic_curate)
 ->  Seq Scan on pic_curate 
ppc  (cost=0.00..1.02 rows=1 width=4) (actual time=0.006..0.006 rows=1 
loops=1)
   Filter: 
(user_curator = 2)
 ->  Seq Scan on 
pic_curate_row ppcr  (cost=0.00..24.19 rows=919 width=8) (actual 
time=0.019..0.147 rows=919 loops=1)
   ->  Index Scan using 
uploading_x2 on uploading pul  (cost=0.00..2.54 rows=1 width=8) (actual 
time=0.003..0.003 rows=1 loops=919)
 Index Cond: 
(pul.item_common = ppcr.item_common)
 ->  Index Scan using item_common_pkey 
on item_common pic  (cost=0.00..2.55 rows=1 width=8) (actual 
time=0.003..0.003 rows=1 loops=919)
   Index Cond: (pic.objectid = 
ppcr.item_common)
   ->  Index Scan using item_movieclip_pkey on 
item_movieclip pim  (cost=0.00..0.67 rows=1 width=16) (actual 
time=0.002..0.002 rows=0 loops=919)
 Index Cond: (pim.item_common = 
pic.objectid)
 ->  Index Scan using item_image_pkey on item_image 
pii  (cost=0.00..0.92 rows=1 width=57) (actual time=0.002..0.002 rows=0 
loops=919)

   Index Cond: (pii.item_common = pic.objectid)
   ->  Index Scan using user_pkey on user pu 
(cost=0.00..0.64 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=919)

 Index Cond: (pu.objectid = pic.user)
 ->  Index Scan using pricing_x1 on pricing pp (cost=0.00..3.63 
rows=3 width=12) (actual time=0.004..0.004 rows=1 loops=919)

   Index Cond: (pp.item_common = ppcr.item_common)
   Filter: ((date_trunc('sec'::text, now()) >= 
pp.startdate) AND (date_trunc('sec'::text, now()) <= pp.stopdate))

 SubPlan 1
   ->  Index Scan using codec_gfx_pkey on codec_gfx pcg 
(cost=0.00..2.26 rows=1 width=27) (actual time=0.000..0.000 rows=0 
loops=919)

 Index Cond: (objectid = $0)
 SubPlan 2
   ->  Seq Scan on item_snd pis  (cost=0.00..1.90 rows=1 
width=15) (actual time=0.007..0.008 rows=0 loops=919)

 Filter: (objectid = $1)
 Total runtime: 26.795 ms
(34 rows)





[PERFORM] odd planner again, pg 9.0.8

2012-07-25 Thread Marcus Engene

Hi,

Lacking index hints I have a bit of a problem with a slow select.

 select
 pic.objectid as pic_objectid
,pu.objectid as pu_objectid
,ppcr.preproc_me as pul_preproc_me
,pp.price_amount as pp_price_amount
,pim.aut_item_container as pim_aut_item_container
,COALESCE((select coalesce(pcg.name_preferred, pcg.name) from 
codec_gfx pcg where pcg.objectid = pim.aut_codec_gfx), 'unknown') as 
pcg_name
,COALESCE((select pis.name from item_snd pis where pis.objectid = 
pim.aut_codec_snd), 'unknown') as pis_name
--,(select pii2.price_arr from item_image pii2 where 
pii2.item_common = pic.objectid) as pii_price_arr

,pii.price_arr as pii_price_arr
 from
 (
 select
 ppcr.item_common
,pul.preproc_me as preproc_me
,pul.ul_finished_at as ul_finished_at
,pul.to_curator_at as to_curator_at
 from
 pic_curate ppc
,pic_curate_row ppcr
,uploading pul
 where
 ppc.user_curator = 2 AND
 ppcr.pic_curate = ppc.objectid AND
 ppcr.item_common = pul.item_common
 ) ppcr
,item_common pic
 left outer join item_movieclip pim on (pim.item_common = pic.objectid)
 left outer join item_soundclip pisc on (pisc.item_common = 
pic.objectid)

 left outer join item_image pii on (pii.item_common = pic.objectid)
,user pu
,pricing pp
 where
 pic.objectid = ppcr.item_common AND
 pu.objectid = pic.user AND
 pp.item_common = ppcr.item_common AND
 date_trunc ('sec', current_timestamp) BETWEEN pp.startdate and 
pp.stopdate

 order by
 ppcr.item_common

Item_common is the main table; it has some 10M rows.

This query executes with...

 Nested Loop  (cost=256.16..2770236.40 rows=3028 width=523) (actual 
time=0.141..64428.788 rows=919 loops=1)
   ->  Nested Loop  (cost=256.16..2753774.01 rows=1066 width=515) 
(actual time=0.095..64414.614 rows=919 loops=1)
 ->  Nested Loop  (cost=256.16..2753472.18 rows=1066 width=501) 
(actual time=0.089..64411.782 rows=919 loops=1)
   ->  Merge Join  (cost=256.16..2750791.56 rows=1066 
width=477) (actual time=0.080..64318.897 rows=919 loops=1)

 Merge Cond: (pic.objectid = ppcr.item_common)
 ->  Merge Left Join (cost=251.72..2733545.74 
rows=10970452 width=473) (actual time=0.038..63075.673 rows=10831339 
loops=1)

   Merge Cond: (pic.objectid = pisc.item_common)
   ->  Merge Left Join (cost=251.72..2689409.45 
rows=10970452 width=457) (actual time=0.031..59173.547 rows=10831339 
loops=1)
 Merge Cond: (pic.objectid = 
pii.item_common)
 ->  Merge Left Join 
(cost=251.72..1844762.76 rows=10970452 width=404) (actual 
time=0.022..36763.334 rows=10831339 loops=1)
   Merge Cond: (pic.objectid = 
pim.item_common)
   ->  Index Scan using 
item_common_pkey on item_common pic  (cost=0.00..1764469.78 
rows=10970452 width=380) (actual time=0.010..20389.141 rows=10831339 
loops=1)
   ->  Index Scan using 
item_movieclip_pkey on item_movieclip pim  (cost=0.00..34287.89 
rows=1486673 width=28) (actual time=0.007..839.065 rows=1440175 loops=1)
 ->  Index Scan using item_image_pkey 
on item_image pii  (cost=0.00..707403.77 rows=8785343 width=57) (actual 
time=0.007..14972.056 rows=8701222 loops=1)
   ->  Index Scan using item_soundclip_pkey on 
item_soundclip pisc  (cost=0.00..10690.67 rows=481559 width=20) (actual 
time=0.007..252.650 rows=478672 loops=1)
 ->  Materialize  (cost=0.00..109.95 rows=1066 
width=4) (actual time=0.019..1.792 rows=919 loops=1)
   ->  Nested Loop  (cost=0.00..107.28 
rows=1066 width=4) (actual time=0.018..1.429 rows=919 loops=1)
 Join Filter: (ppc.objectid = 
ppcr.pic_curate)
 ->  Index Scan using 
pic_curate_row_pkey on pic_curate_row ppcr  (cost=0.00..58.27 rows=3199 
width=8) (actual time=0.010..0.650 rows=919 loops=1)
 ->  Materialize (cost=0.00..1.03 
rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=919)
   ->  Seq Scan on pic_curate ppc  
(cost=0.00..1.02 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)

 Filter: (user_curator = 2)
   ->  Index Scan using uploading_x2 on uploading pul  
(cost=0.00..2.50 rows=1 width=24) (actual time=0.100..0.100 rows=1 
loops=919)

 Index Cond: (pul.item_common = ppcr.item_common)
 ->  Index Scan using user_pkey on user pu (cost=0.00..0.27 
rows=1 width=14) (actual time=0.002..0.002 rows=1 loops=919)

   Index Cond: (pu.o

Re: [PERFORM] WAL in RAM

2011-11-29 Thread Marcus Engene

On 10/29/11 10:11, Scott Marlowe wrote:

In over 10 years of using hardware RAID controllers with battery
backup on many many machines, I have had exactly zero data loss due to
a failed battery backup.  Of course proper monitoring is important, to
make sure the batteries aren't old and dead, but every single BBU RAID
controller I have used automatically switched from write back to write
through when they detected a bad battery pack.

Proper testing is essential whether it's BBU Caching or using an SSD,
and failure to do so is inconceivable if your data is at all
important.  Given the current high failure rate of SSDs due to
firmware issues (and it's not just the intel drives experiencing such
failures) I'm much more confident in Areca, 3Ware, and LSI BBU RAID
controllers right now than I am in SSDs.
   


Rimu got me a setup with 2x 5805 BBU controllers configured as two RAID10
arrays of 15k rpm SAS drives, and on top of that 2x Xeon E5645 (the hex core).
Since I heard warnings that with non-software RAID the machine can be
unresponsive during boot while doing a rebuild, I took small 300 GB drives.
Not that 15k SAS drives come in much bigger sizes, but still.

I chickened out from pg 9.1 due to the low minor number.

I also set...
wal_buffers = 16MB
...up from the default of 64kB, which could possibly explain some of
the choke problems during write bursts.
   

As per others' suggestions I don't feel encouraged to put WAL on SSD,
after finding several texts by Greg Smith and others warning about this. I do
have 2x OCI Sandforce 1500 drives (with supercap) for some burst-load tables.

The reason I started to think about putting WAL on a RAM drive to begin with
was that the performance figures for unlogged tables looked very promising
indeed. And the tests were of the sort that's occupying my bandwidth:
accumulating statistical writes.

The present pg9 machine runs Pg 9.0.4 on Debian Squeeze, 2x Xeon, 72GB,
software 4xRAID6 (sorry) + 2xSSD. It's an OLTP website with 10M products and
SOLR for FTS. During peak it's using ~3-4% CPU, and it's 99.9% reads or
thereabouts. It's the peaks we want to take down. RAID6 or not, with a
spindle as the bottleneck there is just a certain max number of writes/s.
 

First things first, get off RAID-6.  A 4 drive RAID-6 gives no more
storage than a 4 drive RAID-10, and is painfully slow by comparison.
Looking at SSDs for WAL is putting the cart about 1,000 miles ahead of
the horse at this point.  You'd be much better off migrating to a
single SSD for everything than running on a 4 disk RAID-6.

   


Message received and understood :)

Having read up a bit too much on drive-reliability paranoia, and going
from 7k2 to 15k rpm drives, I feel a bit uneasy, but this mama is fast.
I suppose a little of that could be credited to the freshly restored dump
rather than the year-plus of entropy in the other machine. But I also did
some update/write torture and it was hard to provoke any I/O wait.

I put OS & WAL on one array and the general data files on the other.
The data directory that used to be on the SSD drive was also put on the
WAL raid.

Thanks for your advices!
Marcus



Re: [PERFORM] Performance Problem with postgresql 9.03, 8GB RAM,Quadcore Processor Server--Need help!!!!!!!

2011-11-01 Thread Marcus Engene

Hi Hashim,

After I upgraded from Postgres 8.3/8.4 to 9.0 I had all sorts of 
problems with queries with many joins. Queries that used to take 1 ms 
suddenly took half a minute for no apparent reason.


I have 72GB of RAM, which I think makes the planner go bonkers and be too 
eager to do a seq scan. I tried to compensate with a ridiculously low 
cpu_index_tuple_cost, but it had little effect.


If I were you, I would try to remove some of the joined tables and see 
what happens. When does it start to run very slowly? How does the plan 
look right before it's super slow?



One workaround I've used: when something looking like this

select
...
from
table_linking_massive_table tlmt
   ,massive_table mt
   ,some_table1 st1
   ,some_table2 st2
   ,some_table3 st3
   ,some_table4 st4
where
tlmt.group_id = 123223 AND
mt.id = tlmt.massive_table AND
st1.massive_table = mt.id AND
st2.massive_table = mt.id AND
st3.massive_table = mt.id AND
st4.massive_table = mt.id

...suddenly gets slow, it has helped to rewrite it as

select
...
from
(
select
...
from
table_linking_massive_table tlmt
   ,massive_table mt
where
tlmt.group_id = 123223 AND
mt.id = tlmt.massive_table
) as mt
   ,some_table1 st1
   ,some_table2 st2
   ,some_table3 st3
   ,some_table4 st4
where
st1.massive_table = mt.id AND
st2.massive_table = mt.id AND
st3.massive_table = mt.id AND
st4.massive_table = mt.id

This seems to force Postgres to evaluate the mt subselect first and not 
get ideas about how to join. It's been a few years since I used Oracle, 
but if I remember correctly Oracle looked at the order of the tables in 
the where section; in this example Oracle would be encouraged to use tlmt 
as the base table and take it from there. Postgres doesn't seem to care 
about this order. Not caring is possibly more forgiving with automatically 
generated SQL, but it also implies the planner always makes the best 
decisions, which it obviously does not. I might be talking rubbish here; 
these are my empirical observations.
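
Another commonly used way to pin down the evaluation order on PostgreSQL of this era is the OFFSET 0 trick: a subquery containing OFFSET 0 cannot be pulled up into the outer query, so the planner is forced to treat it as a unit. A sketch using the same hypothetical tables as above:

```sql
select mt.id, st1.*
from
(
select mt.*
from
table_linking_massive_table tlmt
   ,massive_table mt
where
tlmt.group_id = 123223 AND
mt.id = tlmt.massive_table
offset 0   -- optimization fence: prevents subquery pull-up
) as mt
   ,some_table1 st1
where
st1.massive_table = mt.id;
```

The subquery-with-alias rewrite in the message above and OFFSET 0 achieve much the same thing; OFFSET 0 is just more explicit about the intent.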


I'm sure you'll get better answers, but this is what I've done.

I assume you have done your analyze & indexing correctly etc.

Best regards,
Marcus

On 11/1/11 4:03, Mohamed Hashim wrote:
Any ideas or suggestions on how to improve my database's performance?


Regards
Hashim

On Sat, Oct 29, 2011 at 9:40 AM, Mohamed Hashim wrote:


Thanks Alban & Gregg.


i will describe little more about that table

* We are using PHP application with Apache server & Postgresql
  9.0.3 in a dedicated server.
* stk_source table is mainly used to track the transactions
  from parent to child

            Table "_100410.stk_source"
        Column         |   Type    |                     Modifiers
-----------------------+-----------+----------------------------------------------------
 source_id             | integer   | not null default nextval('source_id_seq'::regclass)
 stock_id              | integer   |
 source_detail         | integer[] |
 transaction_reference | integer   |
 is_user_set           | boolean   | default false


We store transaction_type and transaction_id in the source_detail
column, which is an integer array, for each transaction.

We use various functions to get the info based on transaction type

For eg:

In function to get the batch details we have used as

FOR batch_id_rec in select distinct(batch_id) from
order_status_batches osb join batch_status_stock bss on
osb.status_id=bss.batch_status_id where stock_id in (select
source_detail[2] from stk_source where stock_id IN (SELECT
std_i.stock_id
FROM order_details_shipments ods
JOIN shipment_pack_stock sps ON sps.pack_id=ods.pack_id
AND ods.order_id=sps.order_id AND ods.item_id=sps.item_id
JOIN stock_transaction_detail_106 std ON
std.transaction_id=sps.transaction_id
JOIN stock_transaction_detail_106 std_i ON std.stock_id =
std_i.stock_id AND std_i.transaction_type = 'i'::bpchar
WHERE shipment_item_id=$1 ) and source_detail[1]=3) LOOP

...



..

Similarly we have used in php pages and views

SELECT abd.bill_no as bill_no,to_char(abd.bill_date,'dd/mm/')
AS date,mp.product_desc as
product_desc,std.quantity,std.area,rip.price AS rate,
FROM acc_bill_items_106 abi
JOIN acc_bill_details_106_table abd ON abd.bill_id=abi.bill_id
AND abd.bill_status='act'
JOIN stk_source_table ss ON ss.source_detail[2]=abi.item_id
and ss.source_detail[1]=1
JOIN stock_transaction

Re: [PERFORM] WAL in RAM

2011-10-29 Thread Marcus Engene

On 10/28/11 5:45, Kevin Grittner wrote:

Marcus Engene  wrote:

   

Every now and then I have write peaks which causes annoying delay
on my website
 


   

Does anyone here have any recommendations here?
 


For our largest machines we put WAL on a RAID1 drive pair dedicated
to that task, on its own controller with battery-backed cache
configured for write-back.  It does make a big difference, because
when a DBA accidentally got this wrong once, we saw the problem you
describe, and moving WAL to the dedicated drives/controller caused
the problem to go away.

If problems remain, look for posts by Greg Smith on how to tune
this.  You may want to extend your checkpoint completion target,
make the background writer more aggressive, reduce shared buffers,
or tune the OS.  But if you can afford to put WAL on a dedicated
file system something like the above, that would be a better place
to start, IMO.

-Kevin

   


The problem I have with battery-backed RAID controllers is the battery 
part. Batteries are simply not reliable and require testing etc., which I, 
as a rather insignificant customer at a generic datacenter, cannot have 
done properly. I have however found this thing, which I find promising:

http://news.cnet.com/8301-21546_3-10273658-10253464.html
An Adaptec 5Z-series controller, which has a supercap and flushes to an SSD 
on mishap. Perhaps that's the answer to everything?


As per others' suggestions I don't feel encouraged to put WAL on SSD, 
after finding several texts by Greg Smith and others warning about this. I 
do have 2x OCI Sandforce 1500 drives (with supercap) for some burst-load 
tables.


The reason I started to think about putting WAL on a RAM drive to begin 
with was that the performance figures for unlogged tables looked very 
promising indeed. And the tests were of the sort that's occupying my 
bandwidth: accumulating statistical writes.


The present pg9 machine runs Pg 9.0.4 on Debian Squeeze, 2x Xeon, 72GB, 
software 4xRAID6 (sorry) + 2xSSD. It's an OLTP website with 10M products and 
SOLR for FTS. During peak it's using ~3-4% CPU, and it's 99.9% reads or 
thereabouts. It's the peaks we want to take down. RAID6 or not, with a 
spindle as the bottleneck there is just a certain max number of writes/s.


Thanks for your answers so far!

Best regards,
Marcus




[PERFORM] WAL in RAM

2011-10-28 Thread Marcus Engene

Hi list,

Every now and then I have write peaks which cause annoying delays on my 
website. No particular reason, it seems; the laws of probability just 
dictate that there will be peaks now and then.


Anyway, thinking of ways to make the peaks more bearable, I saw the new 
9.1 feature to bypass WAL. The problem is mainly that it's some statistics 
tables ("x users clicked this link this month") that clog the write cache, 
not the more important writes. I could live with restoring a nightly dump of 
these tables and losing a day's worth of logs.
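
The 9.1 feature referred to is unlogged tables: writes to them bypass WAL entirely, at the cost of the table being emptied after a crash, which fits "statistics I can afford to lose". A minimal sketch with a hypothetical click-counter table:

```sql
-- Writes to an unlogged table generate no WAL; the contents are lost on a
-- crash or unclean shutdown, which is acceptable for rebuildable statistics.
CREATE UNLOGGED TABLE link_click_stats (
    link_id  integer NOT NULL,
    month    date    NOT NULL,
    clicks   bigint  NOT NULL DEFAULT 0,
    PRIMARY KEY (link_id, month)
);
```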


Though I'm not keen on jumping onto early major versions, an old idea of 
putting WAL in RAM came back. Not RAM in main memory, but some device 
pretending to be a drive, with proper battery backup.


a) There seems to exist odd hardware with RAM modules, if you're lucky 
with battery backup too.
b) Some drive manufacturers have made hybrid RAM-spindle drives 
(compare with the possibly more common SSD-spindle hybrids).


b) sounds slightly more appealing since it basically means I put 
everything on those drives and it is magically faster. The a) 
alternatives also seemed to be non-ECC, which is a no-no and disturbing.


Does anyone here have any recommendations here?

Pricing is not very important but reliability is.

Thanks,
Marcus



Re: [PERFORM] bitmask index

2011-07-05 Thread Marcus Engene

On 6/22/11 11:42, Greg Smith wrote:

On 06/22/2011 05:27 PM, Marcus Engene wrote:
I have some tables with bitmask integers. Set bits are the 
interesting ones. Usually they are sparse.


If it's sparse, create a partial index that just includes rows where 
the bit is set:  
http://www.postgresql.org/docs/current/static/indexes-partial.html


You need to be careful the query uses the exact syntax as the one that 
created the index for it to be used.  But if you do that, it should be 
able to pull the rows that match out quickly.
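
Applied to the bitmask example from the original post, that suggestion would look something like this (table and bit position are taken from the earlier message; the index name is made up). Note the caveat above: the query predicate must match the index predicate for the planner to use it.

```sql
-- Index only the (sparse) rows where bit 13 is set.
CREATE INDEX a_table_bit13 ON a_table (objectid)
WHERE 0 <> (misc_bits & (1 << 13));

-- Repeat the same predicate so the planner can match the partial index.
SELECT objectid
FROM a_table
WHERE 0 <> (misc_bits & (1 << 13));
```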



I ended up having a separate table with an index on it.

A partial index solved another problem, though. Usually I'm a little 
annoyed with the optimizer and the developers' religious "fix the planner 
instead of adding index hints". I must say that I'm willing to reconsider 
my usual stance on that.


We have a large table of products where status=20 is a rare intermediate 
status. I added a...


CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
WHERE status = 20;

...and a slow 5 s select for users with existing status=20 items became 
very fast. The planner, I guess, now sees the relatively few status=20 
items (out of millions of items) instead of just ~5 distinct values of 
status, and thus no longer ignores the index. Super!


To my great amazement, the planner also managed to use the index when 
counting how many status=20 items there are in total:


pond90=> explain analyze select
pond90-> coalesce(sum(tt.antal),0) as nbr_in_queue
pond90-> from
pond90-> (
pond90(> select
pond90(> pu.username
pond90(>,t.antal
pond90(> from
pond90(> (
pond90(> select
pond90(> sum(1) as antal
pond90(>,pond_user
pond90(> from
pond90(> pond_item_common
pond90(> where
pond90(> status = 20
pond90(> group by pond_user
pond90(> ) as t
pond90(>,pond_user pu
pond90(> where
pond90(> pu.objectid = t.pond_user
pond90(> order by t.antal desc
pond90(> ) as tt;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=38079.45..38079.46 rows=1 width=8) (actual 
time=166.439..166.440 rows=1 loops=1)
   ->  Sort  (cost=38079.13..38079.18 rows=21 width=18) (actual 
time=166.009..166.085 rows=648 loops=1)

 Sort Key: (sum(1))
 Sort Method:  quicksort  Memory: 67kB
 ->  Nested Loop  (cost=37903.66..38078.67 rows=21 width=18) 
(actual time=157.545..165.561 rows=648 loops=1)
   ->  HashAggregate  (cost=37903.66..37903.92 rows=21 
width=4) (actual time=157.493..157.720 rows=648 loops=1)
 ->  Bitmap Heap Scan on pond_item_common  
(cost=451.43..37853.37 rows=10057 width=4) (actual time=9.061..151.511 
rows=12352 loops=1)

   Recheck Cond: (status = 20)
   ->  Bitmap Index Scan on 
pond_item_common_x8  (cost=0.00..448.91 rows=10057 width=0) (actual 
time=5.654..5.654 rows=20051 loops=1)

 Index Cond: (status = 20)
   ->  Index Scan using pond_user_pkey on pond_user pu  
(cost=0.00..8.30 rows=1 width=14) (actual time=0.011..0.012 rows=1 
loops=648)

 Index Cond: (pu.objectid = pond_item_common.pond_user)
 Total runtime: 166.709 ms
(13 rows)

My hat's off to the dev gang. Impressive!

Best,
Marcus




[PERFORM] bitmask index

2011-06-22 Thread Marcus Engene

Hi list,

I use Postgres 9.0.4.

I have some tables with bitmask integers. Set bits are the interesting 
ones. Usually they are sparse.


-- Many rows & columns
CREATE TABLE a_table
(
 objectid   INTEGER PRIMARY KEY NOT NULL
,misc_bits  INTEGER DEFAULT 0 NOT NULL
...
)
WITHOUT OIDS;

...and when I use it I...

select
...
from
a_table
where
0 <> (misc_bits & (1 << 13))

Now the dear tables have swollen and these scans aren't as nice anymore.

What indexing strategies would you use here?

External table?:

create table a_table_feature_x
(
 objectid   INTEGER PRIMARY KEY NOT NULL -- fk to a_table.objectid
)
WITHOUT OIDS;


Internal in the big mama table?:

CREATE TABLE a_table
(
 objectid   INTEGER PRIMARY KEY NOT NULL
,misc_bits  INTEGER DEFAULT 0 NOT NULL
,feature_x  VARCHAR(1) -- 'y' or null
...
)
WITHOUT OIDS;

CREATE INDEX a_table_x1 ON a_table(feature_x); -- I assume nulls are not 
here



Some other trick?


Thanks,
Marcus



Re: [PERFORM] wildcard makes seq scan on prod db but not in test

2011-05-09 Thread Marcus Engene

On 5/9/11 9:59, Kevin Grittner wrote:


You don't need to do that; you can specify an opclass for the index
to tell it that you don't want to order by the normal collation, but
rather in a way which will allow the index to be useful for pattern
matching:

http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html
-Kevin

   


Hi,

Thanks for the explanation. Works brilliantly!

Best regards,
Marcus


For future googlers:

http://www.postgresonline.com/journal/archives/78-Why-is-my-index-not-being-used.html

drop index bond_item_common_x7;

CREATE INDEX bond_item_common_x7 ON bond_item_common USING 
btree(lower(original_filename) varchar_pattern_ops);


bond90=> explain analyze
select pic2.objectid
from bond_item_common pic2
where
 lower(pic2.original_filename) like 'this is a test%' ;
 QUERY PLAN
--...
 Bitmap Heap Scan on bond_item_common pic2  (cost=705.84..82746.05 
rows=23870 width=4) (actual time=0.015..0.015 rows=0 loops=1)

   Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
   ->  Bitmap Index Scan on bond_item_common_x7  (cost=0.00..699.87 
rows=23870 width=0) (actual time=0.014..0.014 rows=0 loops=1)
 Index Cond: ((lower((original_filename)::text) ~>=~ 'this is a 
test'::text) AND (lower((original_filename)::text) ~<~ 'this is a 
tesu'::text))

 Total runtime: 0.033 ms




Re: [PERFORM] wildcard makes seq scan on prod db but not in test

2011-05-09 Thread Marcus Engene

On 5/9/11 8:57, Kevin Grittner wrote:


That could be a difference in collations.  What do you get from the
query on this page for each database?:

http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin

   
There's indeed a different collation. Why does this affect the plan? Can I 
force a column to be ASCII?


The (fast) test server:
 version  | PostgreSQL 9.0.4 on x86_64-apple-darwin10.7.0, 
compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. 
build 5664), 64-bit

 effective_cache_size | 512MB
 lc_collate   | C
 lc_ctype | UTF-8
 maintenance_work_mem | 128MB
 max_connections  | 100
 max_stack_depth  | 2MB
 port | 5435
 server_encoding  | UTF8
 shared_buffers   | 512MB
 temp_buffers | 8192
 TimeZone | Europe/Zurich
 wal_buffers  | 1MB
 work_mem | 128MB
(14 rows)

The (slow) production server:
 version  | PostgreSQL 9.0.4 on 
x86_64-unknown-linux-gnu, compiled by

GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit
 checkpoint_completion_target | 0.9
 checkpoint_segments  | 64
 effective_cache_size | 48GB
 lc_collate   | en_US.UTF-8
 lc_ctype | en_US.UTF-8
 listen_addresses | localhost,10.0.0.3,74.50.57.76
 maintenance_work_mem | 1GB
 max_connections  | 600
 max_stack_depth  | 2MB
 port | 5435
 server_encoding  | UTF8
 shared_buffers   | 8GB
 temp_buffers | 32768
 TimeZone | UTC
 work_mem | 128MB
(16 rows)

Thanks,
Marcus




[PERFORM] wildcard makes seq scan on prod db but not in test

2011-05-09 Thread Marcus Engene

Dear list,

I have a table with a few million rows and this index:
CREATE INDEX bond_item_common_x7 ON bond_item_common 
((lower(original_filename)));


There are about 2M rows in bonddump and 4M rows in bond90.

bonddump is on an 8GB RAM machine; bond90 is on a 72GB RAM machine.

The table is analyzed properly in both places.

I'm an index hint zealot, but aware of our different stances in the 
matter. :)


Dropping the wildcard for the like, both databases uses the index.

Is there a way to convince Postgres not to do full table scans quite so 
eagerly? This is just one of several examples where it happily spends lots 
of time sequentially going through tables.


Thanks,
Marcus




psql (9.0.4)
Type "help" for help.

bonddump=# explain analyze  select pic2.objectid
bonddump-#  from bond_item_common pic2
bonddump-#  where
bonddump-#  lower(pic2.original_filename) like 'this is a 
test%' ;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using bond_item_common_x7 on bond_item_common pic2  
(cost=0.01..8.69 rows=208 width=4) (actual time=26.415..26.415 rows=0 
loops=1)
   Index Cond: ((lower((original_filename)::text) >= 'this is a 
test'::text) AND (lower((original_filename)::text) < 'this is a 
tesu'::text))

   Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
 Total runtime: 26.519 ms
(4 rows)




psql (9.0.4)
bond90=> explain analyze  select pic2.objectid
bond90->  from bond_item_common pic2
bond90->  where
bond90->  lower(pic2.original_filename) like 'this is a test%' ;
  QUERY PLAN
--
 Seq Scan on bond_item_common pic2  (cost=0.00..839226.81 rows=475 
width=4) (actual time=10599.401..10599.401 rows=0 loops=1)

   Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
 Total runtime: 10599.425 ms
(3 rows)
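
One standard first diagnostic for a seq-scan-vs-index discrepancy like the one above (not a fix, and not specific to this thread's eventual collation answer) is to forbid seq scans for the session and re-run the EXPLAIN; if the index plan is dramatically faster, the planner's cost estimates, not the index, are the problem:

```sql
-- Diagnostic only: makes sequential scans prohibitively expensive
-- to the planner, so it will pick an index plan if one exists.
SET enable_seqscan = off;

EXPLAIN ANALYZE
select pic2.objectid
from bond_item_common pic2
where lower(pic2.original_filename) like 'this is a test%';

-- Restore afterwards; disabling seq scans globally is a bad idea.
RESET enable_seqscan;
```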

