Re: [PERFORM] SAN/NAS options

2005-12-13 Thread Mark Kirkwood

Charles Sprickman wrote:

Hello all,

It seems that I'm starting to outgrow our current Postgres setup.  We've 
been running a handful of machines as standalone db servers.  This is 
all in a colocation environment, so everything is stuffed into 1U 
Supermicro boxes.  Our standard build looks like this:


Supermicro 1U w/SCA backplane and 4 bays
2x2.8 GHz Xeons
Adaptec 2015S "zero channel" RAID card
2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
2GB RAM
FreeBSD 4.11
PGSQL data from 5-10GB per box

Recently I started studying what we were running up against in our 
nightly runs that do a ton of updates/inserts to prep things for the 
tasks the db does during the business day (light mix of 
selects/inserts/updates). While we have plenty of disk bandwidth 
(according to bonnie), we are really dying on IOPS. I'm guessing this is 
a mix of a rather anemic RAID controller (ever notice how adaptec 
doesn't publish any real performance specs on raid cards?) and having 
only two or four spindles (effectively 1 or 2 on writes).


So that's where we are...

I'm new to the whole SAN thing, but did recently pick up a few used 
NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, 
also used) to toy with.  I started wondering if I could put something 
together to both get our storage on one set of boxes and allow me to get 
data striped across more drives.  Our budget is not huge and we are not 
adverse to getting used gear where appropriate.


What do you folks recommend?  I'm just starting to look at what's out 
there for SANs and NAS, and from what I've seen, our options are:




Leaving the whole SAN issue for a moment:

It would be interesting to see if moving to FreeBSD 6.0 would help you - 
the vfs layer is no longer throttled by the (SMP) GIANT lock in this 
version, and that may make quite a difference (given you have SMP boxes).


Another interesting thing to try is rebuilding the database ufs 
filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 
16K/2K - can't recall the default on 4.x). I found this to give a factor 
of 2 speedup on random disk access (specifically queries doing indexed 
joins).


Is it mainly your 2 disk machines that are IOPS bound? if so, a cheap 
option may be to buy 2 more cheetahs for them! If it's the 4's, well how 
about a 2U U320 diskpack from whomever supplies you the Supermicro boxes?


I have just noticed Luke's posting - I would second the advice to avoid 
SAN - in my experience it's an expensive way to buy storage.


best wishes

Mark


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SAN/NAS options

2005-12-13 Thread Luke Lonergan
Charles,

> Lastly, one thing that I'm not yet finding in trying to 
> educate myself on SANs is a good overview of what's come out 
> in the past few years that's more affordable than the old 
> big-iron stuff.  For example I saw some brief info on this 
> list's archives about the Dell/EMC offerings.  Anything else 
> in that vein to look at?

My two cents: SAN is a bad investment, go for big internal storage.

The 3Ware or Areca SATA RAID adapters kick butt and if you look in the
newest colos (I was just in ours "365main.net" today), you will see rack
on rack of machines with from 4 to 16 internal SATA drives.  Are they
all DB servers?  Not necessarily, but that's where things are headed.

You can get a 3U server with dual opteron 250s, 16GB RAM and 16x 400GB
SATAII drives with the 3Ware 9550SX controller for $10K - we just
ordered 4 of them.  I don't think you can buy an external disk chassis
and a Fibre channel NIC for that.

Performance?  800MB/s RAID5 reads, 400MB/s RAID5 writes.  Random IOs are
also very high for RAID10, but we don't use it so YMMV - look at Areca
and 3Ware.

Managability? Good web management interfaces with 6+ years of
development from 3Ware, e-mail, online rebuild options, all the goodies.
No "snapshot" or offline backup features like the high-end SANs, but do
you really need it?

Need more power or storage over time? Run a parallel DB like Bizgres
MPP, you can add more servers with internal storage and increase your
I/O, CPU and memory.

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] SAN/NAS options

2005-12-13 Thread Charles Sprickman

Hello all,

It seems that I'm starting to outgrow our current Postgres setup.  We've been 
running a handful of machines as standalone db servers.  This is all in a 
colocation environment, so everything is stuffed into 1U Supermicro boxes.  Our 
standard build looks like this:


Supermicro 1U w/SCA backplane and 4 bays
2x2.8 GHz Xeons
Adaptec 2015S "zero channel" RAID card
2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
2GB RAM
FreeBSD 4.11
PGSQL data from 5-10GB per box

Recently I started studying what we were running up against in our nightly runs 
that do a ton of updates/inserts to prep things for the tasks the db does 
during the business day (light mix of selects/inserts/updates). While we have 
plenty of disk bandwidth (according to bonnie), we are really dying on IOPS. 
I'm guessing this is a mix of a rather anemic RAID controller (ever notice how 
adaptec doesn't publish any real performance specs on raid cards?) and having 
only two or four spindles (effectively 1 or 2 on writes).


So that's where we are...

I'm new to the whole SAN thing, but did recently pick up a few used NetApp 
shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) to toy 
with.  I started wondering if I could put something together to both get our 
storage on one set of boxes and allow me to get data striped across more 
drives.  Our budget is not huge and we are not adverse to getting used gear 
where appropriate.


What do you folks recommend?  I'm just starting to look at what's out there for 
SANs and NAS, and from what I've seen, our options are:


NetApp Filers - the pluses with these are that if we use NFS, we don't have to 
worry about either large filesystem support in FreeBSD (2TB practical limit), 
or limitation on "growing" partitions as the NetApp just deals with that.  I 
also understand these make backups a bit simpler.  I have a great, trusted, 
spare-stocking source for these.


Apple X-Serve RAID - well, it's pretty cheap.  Honestly, that's all I know 
about it - they don't talk about IOPS numbers, and I have no idea what lurks in 
that box as a RAID controller.


SAN box w/integrated RAID - it seems like this might not be a good choice since 
the RAID hardware in the box may be where I hit any limits.  I also imagine I'm 
probably overpaying for some OEM RAID controller integrated into the box.  No 
idea where to look for used gear.


SAN box, JBOD - this seems like it might be affordable as well.  A few big 
shelves full of drives a SAN "switch" to plug all the shelves and hosts into 
and a FC RAID card in each host.  No idea where to look for used gear here 
either.


You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. Unlike 
Solaris or other commercial offerings, there is no nice volume management 
available.  While I'd love to keep managing a dozen or so FreeBSD boxes, I 
could be persuaded to go to Solaris x86 if the volume management really shines 
and Postgres performs well on it.


Lastly, one thing that I'm not yet finding in trying to educate myself on SANs 
is a good overview of what's come out in the past few years that's more 
affordable than the old big-iron stuff.  For example I saw some brief info on 
this list's archives about the Dell/EMC offerings.  Anything else in that vein 
to look at?


I hope this isn't too far off topic for this list.  Postgres is the main 
application that I'm looking to accomodate.  Anything else I can do with 
whatever solution we find is just gravy...


Thanks!

Charles


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-13 Thread Luke Lonergan
Simon, 

> Yes, I'd expect something like this right now in 8.1; the 
> numbers stack up to PostgreSQL doing equivalent join speeds, 
> but w/o star join.

I do expect a significant improvement from 8.1 using the new bitmap index 
because there is no need to scan the full Btree indexes.  Also, the new bitmap 
index has a fast compressed bitmap storage and access that make the AND 
operations speedy with no loss like the bitmap scan lossy compression, which 
may enhance the selectivity on very large datasets.

> You've confused the issue here since:
> - Oracle performs star joins using a bit map index transform. 
> It is the star join that is the important bit here, not the 
> just the bitmap part.
> - PostgreSQL does actually provide bitmap index merge, but 
> not star join
> (YET!)

Yes, that is true, a star join optimization may be a big deal, I'm not sure.  
I've certainly talked to people with that experience from RedBrick, Teradata 
and Oracle.
 
> [I've looked into this, but there seem to be multiple patent 
> claims covering various aspects of this technique, yet at 
> least other 3 vendors manage to achieve this. So far I've not 
> dug too deeply, but I understand the optimizations we'd need 
> to perform in PostgreSQL to do this.]

Hmm - I bet there's a way.

You should test the new bitmap index in Bizgres - it rocks hard.  We're 
prepping a Postgres 8.1.1 patch soon, but you can get it in Bizgres CVS now.

- Luke


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Table Partitions / Partial Indexes

2005-12-13 Thread Mike C
On 12/14/05, Simon Riggs <[EMAIL PROTECTED]> wrote:
Maybe not for queries, but if you use a date range then you never needto run a DELETE and never need to VACUUM.You could split the data into two-day chunks.
That's an interesting idea, thanks.
> Am I using a horrid method for partitioning the data? (% 10)No, but what benefit do you think it provides. I'm not sure I see...

I was trying to get both the indexes to be smaller without loosing
selectivity, and make any table scans/index scans faster from having to
read less data.

> Should there be that big of an improvement for multiple tables given> that all the data is still stored on the same filesystem?
You could store partitions in separate tablespaces/filesystems.
Ideally that's what I would do, but to make the most of that I would
have to have a dedicated RAID setup for each partition right? (Which is
a bit pricey for the budget).

Cheers,

Mike


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-13 Thread Simon Riggs
On Thu, 2005-12-08 at 12:26 +0100, Pål Stenslet wrote:
> I'm currently benchmarking several RDBMSs with respect to analytical
> query performance on medium-sized multidimensional data sets. The data
> set contains 30,000,000 fact rows evenly distributed in a
> multidimensional space of 9 hierarchical dimensions. Each dimension
> has 8000 members.

> I have established similar conditions for the query in PostgreSQL, and
> it runs in about 30 seconds. Again the CPU utilization is high with no
> noticable I/O. The query plan is of course very different from that of
> Oracle, since PostgreSQL lacks the bitmap index merge operation. It
> narrows down the result one dimension at a time, using the
> single-column indexes provided. It is not an option for us to provide
> multi-column indexes tailored to the specific query, since we want
> full freedom as to which dimensions each query will use.

> Are these the results we should expect when comparing PostgreSQL to
> Oracle for such queries, or are there special optimization options for
> PostgreSQL that we may have overlooked? (I wouldn't be suprised if
> there are, since I spent at least 2 full days trying to trigger the
> star optimization magic in my Oracle installation.)

Yes, I'd expect something like this right now in 8.1; the numbers stack
up to PostgreSQL doing equivalent join speeds, but w/o star join.

You've confused the issue here since:
- Oracle performs star joins using a bit map index transform. It is the
star join that is the important bit here, not the just the bitmap part.
- PostgreSQL does actually provide bitmap index merge, but not star join
(YET!)

[I've looked into this, but there seem to be multiple patent claims
covering various aspects of this technique, yet at least other 3 vendors
manage to achieve this. So far I've not dug too deeply, but I understand
the optimizations we'd need to perform in PostgreSQL to do this.]

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-13 Thread Tomeh, Husam
 
Postgres 8.1 performance rocks (compared with 8.0) specially with the use 
in-memory index bitmaps. Complex queries that used to take 30+ minutes, it 
takes now a few minutes to complete in 8.1.  Many thanks to the all wonderful 
developers for the huge 8.1 performance boost.

---
 
  Husam Tomeh

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Sunday, December 11, 2005 12:39 PM
To: Pål Stenslet
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex 
multidimensional query?

Perhaps you should be trying this on PG 8.1?  In any case, without
specific details of your schema or a look at EXPLAIN ANALYZE results,
it's unlikely that anyone is going to have any useful comments for you.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

**
This message contains confidential information intended only for the use of the 
addressee(s) named above and may contain information that is legally 
privileged.  If you are not the addressee, or the person responsible for 
delivering it to the addressee, you are hereby notified that reading, 
disseminating, distributing or copying this message is strictly prohibited.  If 
you have received this message by mistake, please immediately notify us by 
replying to the message and delete the original message immediately thereafter.

Thank you.

   FADLD Tag
**


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Table Partitions / Partial Indexes

2005-12-13 Thread Simon Riggs
On Mon, 2005-12-12 at 15:07 +1300, Mike C wrote:

> Partitioning on date range doesn't make much sense for this setup,
> where a typical 1-month query spans both tables (as the billing month
> for the customer might start midway through a calendar month).

Maybe not for queries, but if you use a date range then you never need
to run a DELETE and never need to VACUUM.

You could split the data into two-day chunks.

> Am I using a horrid method for partitioning the data? (% 10) 

No, but what benefit do you think it provides. I'm not sure I see...

> Should there be that big of an improvement for multiple tables given
> that all the data is still stored on the same filesystem?

You could store partitions in separate tablespaces/filesystems.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] query from partitions

2005-12-13 Thread Simon Riggs
On Tue, 2005-12-13 at 15:59 +, Richard Huxton wrote:
> Ключников А.С. wrote:
> > And
> > select * from base 
> > where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
> > 10 seconds
> > 
> > select * from base
> > where id in (select id from device where id = 1 or id = 2) and
> > datatime between '2005-05-15' and '2005-05-17';
> > 10 minits
> > 
> > Why?
> 
> Run EXPLAIN ANALYSE on both queries to see how the plan has changed.
> 
> My guess for why the plans are different is that in the first case your 
> query ends up as ...where (id=1 or id=2)...
> 
> In the second case, the planner doesn't know what it's going to get back 
> from the subquery until it's executed it, so can't tell it just needs to 
> scan base_1,base_2. Result: you'll scan all child tables of base.
> 
> I think the planner will occasionally evaluate constants before 
> planning, but I don't think it will ever execute a subquery and then 
> re-plan the outer query based on those results. Of course, someone might 
> pop up and tell me I'm wrong now...

Thats right. Partitioning doesn't work for joins in 8.1.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Lots of postmaster processes (fwd)

2005-12-13 Thread Ameet Kini


Resending it here as it may be more relevant here...
Ameet

-- Forwarded message --
Date: Tue, 13 Dec 2005 11:24:26 -0600 (CST)
From: Ameet Kini <[EMAIL PROTECTED]>
To: pgsql-admin@postgresql.org
Subject: Lots of postmaster processes



In our installation of the postgres 7.4.7, we are seeing a lot of the
following postmaster processes (around 50) being spawned by the initial
postmaster process once in a while:

postgres  3977 1  1 Nov03 ?15:11:38
/s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7

..

postgres 31985  3977  0 10:08 ?00:00:00
/s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7

postgres 31986  3977  0 10:08 ?00:00:00
/s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7

postgres 31987  3977  0 10:08 ?00:00:00
/s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7

postgres 31988  3977  0 10:08 ?00:00:00
/s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7

..


At the same time when these processes being spawned, sometimes there is
also the checkpoint subprocess. I am not sure if that is related. The
document doesn't provide any information. The other activity going on at
the same time is a 'COPY' statement from a client application.

These extra processes put a considerable load on the machine and cause it
to hang up.

Thanks,
Ameet

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] query from partitions

2005-12-13 Thread Ключников А . С .
* Richard Huxton  [2005-12-13 15:59:11 +]:

> Ключников А.С. wrote:
> >And
> >select * from base 
> > where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
> >10 seconds
> >
> >select * from base
> > where id in (select id from device where id = 1 or id = 2) and
> > datatime between '2005-05-15' and '2005-05-17';
> >10 minits
> >
> >Why?
> 
> Run EXPLAIN ANALYSE on both queries to see how the plan has changed.
explain select distinct on(id) * from base where id in (1,2) and
data_type=2 and datatime < '2005-11-02' order by id, datatime desc;

Unique  (cost=10461.14..10527.30 rows=2342 width=38)
   ->  Sort  (cost=10461.14..10494.22 rows=13232 width=38)
 Sort Key: public.base.id, public.base.datatime
 ->  Result  (cost=0.00..9555.29 rows=13232 width=38)
   ->  Append  (cost=0.00..9555.29 rows=13232 width=38)
 ->  Seq Scan on base  (cost=0.00..32.60 rows=1
width=38)
   Filter: (((id = 1) OR (id = 2)) AND (data_type =
2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone))
 ->  Seq Scan on base_batch base  (cost=0.00..32.60
rows=1 width=38)
...

->  Seq Scan on base_1_2004 base  (cost=0.00..32.60 rows=1 width=38)
   Filter: (((id = 1) OR (id = 2)) AND (data_type =
2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone))
(записей: 34)


and
explain select distinct on(id) * from base where id in (select id from
device where id = 1 or id = 2) and data_type=2 and datatime < '2005-11-02'
order by id, datatime desc;

Unique  (cost=369861.89..369872.52 rows=2126 width=38)
   ->  Sort  (cost=369861.89..369867.21 rows=2126 width=38)
 Sort Key: public.base.id, public.base.datatime
 ->  Hash IN Join  (cost=5.88..369744.39 rows=2126 width=38)
   Hash Cond: ("outer".id = "inner".id)
   ->  Append  (cost=0.00..368654.47 rows=212554 width=38)
 ->  Seq Scan on base  (cost=0.00..26.95 rows=2
width=38)
   Filter: ((data_type = 2) AND (datatime <
'2005-11-02 00:00:00'::timestamp without time zone))
 ->  Seq Scan on base_batch base  (cost=0.00..26.95
rows=2 width=38)
   Filter: ((data_type = 2) AND (datatime <
'2005-11-02 00:00:00'::timestamp without time zone))
 ->  Seq Scan on base_lines_05_12 base
(cost=0.00..26.95 rows=2 width=38)

 ->  Hash  (cost=5.88..5.88 rows=2 width=4)
 ->  Seq Scan on device  (cost=0.00..5.88 rows=2
width=4)
   Filter: ((id = 1) OR (id = 2))
(записей: 851)

> 
> My guess for why the plans are different is that in the first case your 
> query ends up as ...where (id=1 or id=2)...
> 
> In the second case, the planner doesn't know what it's going to get back 
> from the subquery until it's executed it, so can't tell it just needs to 
> scan base_1,base_2. Result: you'll scan all child tables of base.
> 
> I think the planner will occasionally evaluate constants before 
> planning, but I don't think it will ever execute a subquery and then 
> re-plan the outer query based on those results. Of course, someone might 
> pop up and tell me I'm wrong now...
> 
> -- 
>   Richard Huxton
>   Archonet Ltd
> 
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>   choose an index scan if your joining column's datatypes do not
>   match

-- 
С уважением,
Ключников А.С.
Ведущий инженер ПРП "Аналитприбор"
432030 г.Ульяновск, а/я 3117
тел./факс +7 (8422) 43-44-78
mailto: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] query from partitions

2005-12-13 Thread Steinar H. Gunderson
On Tue, Dec 13, 2005 at 06:18:19PM +0300, Ключников А.С. wrote:
> select * from base
>   where id in (select id from device where id = 1 or id = 2) and
>   datatime between '2005-05-15' and '2005-05-17';
> 10 minits

That's a really odd way of saying "1 or 2". It probably has to go through all
the records in device, not realizing it can just scan for two of them (using
two index scans). I'd guess an EXPLAIN ANALYZE would confirm something like
this happening (you'd want to run that and post the results here anyhow).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Small table or partial index?

2005-12-13 Thread Francisco Reyes

Jim C. Nasby writes:


On Fri, Dec 02, 2005 at 06:28:09PM -0500, Francisco Reyes wrote:

I am in the process of designing a new system.
There will be a long list of words such as

-word table
word_id integer
word varchar
special boolean

Some "special" words are used to determine if some work is to be done and 
will be what we care the most for one type of operation. 


Tough call. The key here is the amount of time required to do a join. It
also depends on if you need all the special words or not. Your best bet
is to try and benchmark both ways.



In your opinion do you think performance will be comparable?
I am hoping I will have time to test, but not sure if will have time and the 
tables will be pretty large. :-(


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] query from partitions

2005-12-13 Thread Richard Huxton

Ключников А.С. wrote:

And
select * from base 
	where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';

10 seconds

select * from base
where id in (select id from device where id = 1 or id = 2) and
datatime between '2005-05-15' and '2005-05-17';
10 minits

Why?


Run EXPLAIN ANALYSE on both queries to see how the plan has changed.

My guess for why the plans are different is that in the first case your 
query ends up as ...where (id=1 or id=2)...


In the second case, the planner doesn't know what it's going to get back 
from the subquery until it's executed it, so can't tell it just needs to 
scan base_1,base_2. Result: you'll scan all child tables of base.


I think the planner will occasionally evaluate constants before 
planning, but I don't think it will ever execute a subquery and then 
re-plan the outer query based on those results. Of course, someone might 
pop up and tell me I'm wrong now...


--
  Richard Huxton
  Archonet Ltd


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] query from partitions

2005-12-13 Thread Ключников А . С .
Hi.

create table device(id int);

insert into device values(1);
insert into device values(2);
.
insert into device values(250);

create table base (
id int,
data float,
datatime timestamp,
mode int,
status int);

create table base_1 (
check ( id = 1 and datatime >= DATE 
'2005-01-01' 
and datatime < DATE '2006-01-01' )
) INHERITS (base);

create table base_2 (
check ( id = 2 and datatime >= DATE '2005-01-01'
and datatime < DATE '2006-01-01' )
) INHERITS (base);

create table base_250


And
select * from base 
where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
10 seconds

select * from base
where id in (select id from device where id = 1 or id = 2) and
datatime between '2005-05-15' and '2005-05-17';
10 minits

Why?

-- 
mailto: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] partitioning

2005-12-13 Thread Marc Cousin
Yes, that's how I solved it... and I totally agree that it's hard for the 
planner to guess what to do on the partitions. But maybe there should be 
something in the docs explaining the limitations ...

I'm only asking for the biggest 100 ids from the table, so I thought maybe the 
planner would take the 100 biggest from all partitions or something like that 
and return me the 100 biggest from those results. It didn't and that's quite 
logical.

What I meant is that I understand why the planner chooses this plan, but maybe 
it should be written somewhere in the docs that some plans will be worse 
after partitionning.

Le Mardi 13 Décembre 2005 12:50, vous avez écrit :
> I just saw that there is no where clause in the query, that you had
> fed to explain plan.
> you need to include a where clause based on id_machine column to see the
> effect.
>
> On 12/13/05, Pandurangan R S <[EMAIL PROTECTED]> wrote:
> > Did you set constraint_exclusion = true in postgresql.conf file?
> >
> > On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I've been working on trying to partition a big table (I've never
> > > partitioned a table in any other database till now).
> > > Everything went ok, except one query that didn't work afterwards.
> > >
> > > I've put the partition description, indexes, etc ..., and the explain
> > > plan attached.
> > >
> > > The query is extremely fast without partition (index scan backards on
> > > the primary key)
> > >
> > > The query is : "select * from logs order by id desc limit 100;"
> > > id is the primary key.
> > >
> > > It is indexed on all partitions.
> > >
> > > But the explain plan does full table scan on all partitions.
> > >
> > > While I think I understand why it is doing this plan right now, is
> > > there something that could be done to optimize this case ? Or put a
> > > warning in the docs about this kind of behaviour. I guess normally
> > > someone would partition to get faster queries :)
> > >
> > > Anyway, I thought I should mention this, as it has been quite a
> > > surprise.
> > >
> > >
> > >
> > > ---(end of
> > > broadcast)--- TIP 1: if posting/reading through
> > > Usenet, please send an appropriate subscribe-nomail command to
> > > [EMAIL PROTECTED] so that your message can get through to the
> > > mailing list cleanly
> >
> > --
> > Regards
> > Pandu

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] partitioning

2005-12-13 Thread Pandurangan R S
I just saw that there is no where clause in the query, that you had
fed to explain plan.
you need to include a where clause based on id_machine column to see the effect.

On 12/13/05, Pandurangan R S <[EMAIL PROTECTED]> wrote:
> Did you set constraint_exclusion = true in postgresql.conf file?
>
> On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I've been working on trying to partition a big table (I've never 
> > partitioned a
> > table in any other database till now).
> > Everything went ok, except one query that didn't work afterwards.
> >
> > I've put the partition description, indexes, etc ..., and the explain plan
> > attached.
> >
> > The query is extremely fast without partition (index scan backards on the
> > primary key)
> >
> > The query is : "select * from logs order by id desc limit 100;"
> > id is the primary key.
> >
> > It is indexed on all partitions.
> >
> > But the explain plan does full table scan on all partitions.
> >
> > While I think I understand why it is doing this plan right now, is there
> > something that could be done to optimize this case ? Or put a warning in the
> > docs about this kind of behaviour. I guess normally someone would partition
> > to get faster queries :)
> >
> > Anyway, I thought I should mention this, as it has been quite a surprise.
> >
> >
> >
> > ---(end of broadcast)---
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >subscribe-nomail command to [EMAIL PROTECTED] so that your
> >message can get through to the mailing list cleanly
> >
> >
> >
> >
>
>
> --
> Regards
> Pandu
>

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] partitioning

2005-12-13 Thread Pandurangan R S
Did you set constraint_exclusion = true in postgresql.conf file?

On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've been working on trying to partition a big table (I've never partitioned a
> table in any other database till now).
> Everything went ok, except one query that didn't work afterwards.
>
> I've put the partition description, indexes, etc ..., and the explain plan
> attached.
>
> The query is extremely fast without partition (index scan backards on the
> primary key)
>
> The query is : "select * from logs order by id desc limit 100;"
> id is the primary key.
>
> It is indexed on all partitions.
>
> But the explain plan does full table scan on all partitions.
>
> While I think I understand why it is doing this plan right now, is there
> something that could be done to optimize this case ? Or put a warning in the
> docs about this kind of behaviour. I guess normally someone would partition
> to get faster queries :)
>
> Anyway, I thought I should mention this, as it has been quite a surprise.
>
>
>
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
>
>
>
>


--
Regards
Pandu

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] partitioning

2005-12-13 Thread Marc Cousin
Hi,

I've been working on trying to partition a big table (I've never partitioned a 
table in any other database till now).
Everything went ok, except one query that didn't work afterwards.

I've put the partition description, indexes, etc ..., and the explain plan 
attached.

The query is extremely fast without partition (index scan backards on the 
primary key)

The query is : "select * from logs order by id desc limit 100;"
id is the primary key.

It is indexed on all partitions.

But the explain plan does full table scan on all partitions.

While I think I understand why it is doing this plan right now, is there 
something that could be done to optimize this case ? Or put a warning in the 
docs about this kind of behaviour. I guess normally someone would partition 
to get faster queries :)

Anyway, I thought I should mention this, as it has been quite a surprise.
CREATE TABLE logs_150 (CHECK ( id_machine = 150)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_163 (CHECK ( id_machine = 163)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_289 (CHECK ( id_machine = 289)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_319 (CHECK ( id_machine = 319)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_238 (CHECK ( id_machine = 238)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_148 (CHECK ( id_machine = 148)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_176 (CHECK ( id_machine = 176)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_164 (CHECK ( id_machine = 164)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_316 (CHECK ( id_machine = 316)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_313 (CHECK ( id_machine = 313)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_217 (CHECK ( id_machine = 217)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_167 (CHECK ( id_machine = 167)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_287 (CHECK ( id_machine = 287)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_279 (CHECK ( id_machine = 279)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_248 (CHECK ( id_machine = 248)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_280 (CHECK ( id_machine = 280)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_165 (CHECK ( id_machine = 165)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_330 (CHECK ( id_machine = 330)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_149 (CHECK ( id_machine = 149)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_157 (CHECK ( id_machine = 157)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_272 (CHECK ( id_machine = 272)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_311 (CHECK ( id_machine = 311)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_309 (CHECK ( id_machine = 309)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_318 (CHECK ( id_machine = 318)) INHERITS (logs) TABLESPACE 
data_logs;


CREATE UNIQUE INDEX logs_150_pkey ON logs_150 (id) TABLESPACE index_logs;CREATE 
INDEX date_150 ON logs_150 (date) TABLESPACE index_logs ;CREATE INDEX event_150 
ON logs_150 (evenement) TABLESPACE index_logs;


.


logs=> explain select * from logs order by id desc limit 100;
 QUERY PLAN
-
 Limit  (cost=16524647.29..16524647.54 rows=100 width=295)
   ->  Sort  (cost=16524647.29..16568367.11 rows=17487927 width=295)
 Sort Key: public.logs.id
 ->  Result  (cost=0.00..827622.27 rows=17487927 width=295)
   ->  Append  (cost=0.00..827622.27 rows=17487927 width=295)
 ->  Seq Scan on logs  (cost=0.00..826232.78 rows=17451978 
width=165)
 ->  Seq Scan on logs_150 logs  (cost=0.00..199.04 
rows=6104 width=144)
 ->  Seq Scan on logs_163 logs  (cost=0.00..261.79 
rows=7079 width=169)
 ->  Seq Scan on logs_289 logs  (cost=0.00..428.93 
rows=10693 width=200)
 ->  Seq Scan on logs_319 logs  (cost=0.00..31.92 rows=992 
width=129)
 ->  Seq Scan on logs_238 logs  (cost=0.00..28.01 rows=701 
width=199)
 ->  Seq Scan on logs_148 logs  (cost=0.00..80.15 rows=2015 
width=195)
 ->  Seq Scan on logs_176 logs  (cost=0.00..12.40 rows=240 
width=295)
 ->  Seq Scan on logs_164 logs  (cost=0.00..12.40 rows=240 
width=295)
 ->  Seq Scan on logs_316 logs  (cost=0.00..12.40 rows=240 
width=295)
 ->  Seq Scan on logs_313 logs  (cost=0.00..12.40 rows=240 
width=295)
 ->  Seq Scan on logs_217 logs  (cost=0.00..12.40 rows=240 
width=295)
 ->  Seq Scan on logs_167 logs  (cost=0.00..57.36 rows=1536 
width=170)
 ->  Seq Scan on logs_287 logs  (cost=