Re: [PERFORM] SAN/NAS options
Charles Sprickman wrote: Hello all, It seems that I'm starting to outgrow our current Postgres setup. We've been running a handful of machines as standalone db servers. This is all in a colocation environment, so everything is stuffed into 1U Supermicro boxes. Our standard build looks like this: Supermicro 1U w/SCA backplane and 4 bays 2x2.8 GHz Xeons Adaptec 2015S "zero channel" RAID card 2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped) 2GB RAM FreeBSD 4.11 PGSQL data from 5-10GB per box Recently I started studying what we were running up against in our nightly runs that do a ton of updates/inserts to prep things for the tasks the db does during the business day (light mix of selects/inserts/updates). While we have plenty of disk bandwidth (according to bonnie), we are really dying on IOPS. I'm guessing this is a mix of a rather anemic RAID controller (ever notice how adaptec doesn't publish any real performance specs on raid cards?) and having only two or four spindles (effectively 1 or 2 on writes). So that's where we are... I'm new to the whole SAN thing, but did recently pick up a few used NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) to toy with. I started wondering if I could put something together to both get our storage on one set of boxes and allow me to get data striped across more drives. Our budget is not huge and we are not adverse to getting used gear where appropriate. What do you folks recommend? I'm just starting to look at what's out there for SANs and NAS, and from what I've seen, our options are: Leaving the whole SAN issue for a moment: It would be interesting to see if moving to FreeBSD 6.0 would help you - the vfs layer is no longer throttled by the (SMP) GIANT lock in this version, and that may make quite a difference (given you have SMP boxes). Another interesting thing to try is rebuilding the database ufs filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 16K/2K - can't recall the default on 4.x). I found this to give a factor of 2 speedup on random disk access (specifically queries doing indexed joins). Is it mainly your 2 disk machines that are IOPS bound? if so, a cheap option may be to buy 2 more cheetahs for them! If it's the 4's, well how about a 2U U320 diskpack from whomever supplies you the Supermicro boxes? I have just noticed Luke's posting - I would second the advice to avoid SAN - in my experience it's an expensive way to buy storage. best wishes Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SAN/NAS options
Charles, > Lastly, one thing that I'm not yet finding in trying to > educate myself on SANs is a good overview of what's come out > in the past few years that's more affordable than the old > big-iron stuff. For example I saw some brief info on this > list's archives about the Dell/EMC offerings. Anything else > in that vein to look at? My two cents: SAN is a bad investment, go for big internal storage. The 3Ware or Areca SATA RAID adapters kick butt and if you look in the newest colos (I was just in ours "365main.net" today), you will see rack on rack of machines with from 4 to 16 internal SATA drives. Are they all DB servers? Not necessarily, but that's where things are headed. You can get a 3U server with dual opteron 250s, 16GB RAM and 16x 400GB SATAII drives with the 3Ware 9550SX controller for $10K - we just ordered 4 of them. I don't think you can buy an external disk chassis and a Fibre channel NIC for that. Performance? 800MB/s RAID5 reads, 400MB/s RAID5 writes. Random IOs are also very high for RAID10, but we don't use it so YMMV - look at Areca and 3Ware. Managability? Good web management interfaces with 6+ years of development from 3Ware, e-mail, online rebuild options, all the goodies. No "snapshot" or offline backup features like the high-end SANs, but do you really need it? Need more power or storage over time? Run a parallel DB like Bizgres MPP, you can add more servers with internal storage and increase your I/O, CPU and memory. - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] SAN/NAS options
Hello all, It seems that I'm starting to outgrow our current Postgres setup. We've been running a handful of machines as standalone db servers. This is all in a colocation environment, so everything is stuffed into 1U Supermicro boxes. Our standard build looks like this: Supermicro 1U w/SCA backplane and 4 bays 2x2.8 GHz Xeons Adaptec 2015S "zero channel" RAID card 2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped) 2GB RAM FreeBSD 4.11 PGSQL data from 5-10GB per box Recently I started studying what we were running up against in our nightly runs that do a ton of updates/inserts to prep things for the tasks the db does during the business day (light mix of selects/inserts/updates). While we have plenty of disk bandwidth (according to bonnie), we are really dying on IOPS. I'm guessing this is a mix of a rather anemic RAID controller (ever notice how adaptec doesn't publish any real performance specs on raid cards?) and having only two or four spindles (effectively 1 or 2 on writes). So that's where we are... I'm new to the whole SAN thing, but did recently pick up a few used NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) to toy with. I started wondering if I could put something together to both get our storage on one set of boxes and allow me to get data striped across more drives. Our budget is not huge and we are not adverse to getting used gear where appropriate. What do you folks recommend? I'm just starting to look at what's out there for SANs and NAS, and from what I've seen, our options are: NetApp Filers - the pluses with these are that if we use NFS, we don't have to worry about either large filesystem support in FreeBSD (2TB practical limit), or limitation on "growing" partitions as the NetApp just deals with that. I also understand these make backups a bit simpler. I have a great, trusted, spare-stocking source for these. Apple X-Serve RAID - well, it's pretty cheap. Honestly, that's all I know about it - they don't talk about IOPS numbers, and I have no idea what lurks in that box as a RAID controller. SAN box w/integrated RAID - it seems like this might not be a good choice since the RAID hardware in the box may be where I hit any limits. I also imagine I'm probably overpaying for some OEM RAID controller integrated into the box. No idea where to look for used gear. SAN box, JBOD - this seems like it might be affordable as well. A few big shelves full of drives a SAN "switch" to plug all the shelves and hosts into and a FC RAID card in each host. No idea where to look for used gear here either. You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. Unlike Solaris or other commercial offerings, there is no nice volume management available. While I'd love to keep managing a dozen or so FreeBSD boxes, I could be persuaded to go to Solaris x86 if the volume management really shines and Postgres performs well on it. Lastly, one thing that I'm not yet finding in trying to educate myself on SANs is a good overview of what's come out in the past few years that's more affordable than the old big-iron stuff. For example I saw some brief info on this list's archives about the Dell/EMC offerings. Anything else in that vein to look at? I hope this isn't too far off topic for this list. Postgres is the main application that I'm looking to accomodate. Anything else I can do with whatever solution we find is just gravy... Thanks! Charles ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Simon, > Yes, I'd expect something like this right now in 8.1; the > numbers stack up to PostgreSQL doing equivalent join speeds, > but w/o star join. I do expect a significant improvement from 8.1 using the new bitmap index because there is no need to scan the full Btree indexes. Also, the new bitmap index has a fast compressed bitmap storage and access that make the AND operations speedy with no loss like the bitmap scan lossy compression, which may enhance the selectivity on very large datasets. > You've confused the issue here since: > - Oracle performs star joins using a bit map index transform. > It is the star join that is the important bit here, not the > just the bitmap part. > - PostgreSQL does actually provide bitmap index merge, but > not star join > (YET!) Yes, that is true, a star join optimization may be a big deal, I'm not sure. I've certainly talked to people with that experience from RedBrick, Teradata and Oracle. > [I've looked into this, but there seem to be multiple patent > claims covering various aspects of this technique, yet at > least other 3 vendors manage to achieve this. So far I've not > dug too deeply, but I understand the optimizations we'd need > to perform in PostgreSQL to do this.] Hmm - I bet there's a way. You should test the new bitmap index in Bizgres - it rocks hard. We're prepping a Postgres 8.1.1 patch soon, but you can get it in Bizgres CVS now. - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Table Partitions / Partial Indexes
On 12/14/05, Simon Riggs <[EMAIL PROTECTED]> wrote: Maybe not for queries, but if you use a date range then you never needto run a DELETE and never need to VACUUM.You could split the data into two-day chunks. That's an interesting idea, thanks. > Am I using a horrid method for partitioning the data? (% 10)No, but what benefit do you think it provides. I'm not sure I see... I was trying to get both the indexes to be smaller without loosing selectivity, and make any table scans/index scans faster from having to read less data. > Should there be that big of an improvement for multiple tables given> that all the data is still stored on the same filesystem? You could store partitions in separate tablespaces/filesystems. Ideally that's what I would do, but to make the most of that I would have to have a dedicated RAID setup for each partition right? (Which is a bit pricey for the budget). Cheers, Mike
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
On Thu, 2005-12-08 at 12:26 +0100, Pål Stenslet wrote: > I'm currently benchmarking several RDBMSs with respect to analytical > query performance on medium-sized multidimensional data sets. The data > set contains 30,000,000 fact rows evenly distributed in a > multidimensional space of 9 hierarchical dimensions. Each dimension > has 8000 members. > I have established similar conditions for the query in PostgreSQL, and > it runs in about 30 seconds. Again the CPU utilization is high with no > noticable I/O. The query plan is of course very different from that of > Oracle, since PostgreSQL lacks the bitmap index merge operation. It > narrows down the result one dimension at a time, using the > single-column indexes provided. It is not an option for us to provide > multi-column indexes tailored to the specific query, since we want > full freedom as to which dimensions each query will use. > Are these the results we should expect when comparing PostgreSQL to > Oracle for such queries, or are there special optimization options for > PostgreSQL that we may have overlooked? (I wouldn't be suprised if > there are, since I spent at least 2 full days trying to trigger the > star optimization magic in my Oracle installation.) Yes, I'd expect something like this right now in 8.1; the numbers stack up to PostgreSQL doing equivalent join speeds, but w/o star join. You've confused the issue here since: - Oracle performs star joins using a bit map index transform. It is the star join that is the important bit here, not the just the bitmap part. - PostgreSQL does actually provide bitmap index merge, but not star join (YET!) [I've looked into this, but there seem to be multiple patent claims covering various aspects of this technique, yet at least other 3 vendors manage to achieve this. So far I've not dug too deeply, but I understand the optimizations we'd need to perform in PostgreSQL to do this.] Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Postgres 8.1 performance rocks (compared with 8.0) specially with the use in-memory index bitmaps. Complex queries that used to take 30+ minutes, it takes now a few minutes to complete in 8.1. Many thanks to the all wonderful developers for the huge 8.1 performance boost. --- Husam Tomeh -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Sunday, December 11, 2005 12:39 PM To: Pål Stenslet Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex multidimensional query? Perhaps you should be trying this on PG 8.1? In any case, without specific details of your schema or a look at EXPLAIN ANALYZE results, it's unlikely that anyone is going to have any useful comments for you. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ** This message contains confidential information intended only for the use of the addressee(s) named above and may contain information that is legally privileged. If you are not the addressee, or the person responsible for delivering it to the addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message immediately thereafter. Thank you. FADLD Tag ** ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Table Partitions / Partial Indexes
On Mon, 2005-12-12 at 15:07 +1300, Mike C wrote: > Partitioning on date range doesn't make much sense for this setup, > where a typical 1-month query spans both tables (as the billing month > for the customer might start midway through a calendar month). Maybe not for queries, but if you use a date range then you never need to run a DELETE and never need to VACUUM. You could split the data into two-day chunks. > Am I using a horrid method for partitioning the data? (% 10) No, but what benefit do you think it provides. I'm not sure I see... > Should there be that big of an improvement for multiple tables given > that all the data is still stored on the same filesystem? You could store partitions in separate tablespaces/filesystems. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] query from partitions
On Tue, 2005-12-13 at 15:59 +, Richard Huxton wrote: > Ключников А.С. wrote: > > And > > select * from base > > where id in (1,2) and datatime between '2005-05-15' and '2005-05-17'; > > 10 seconds > > > > select * from base > > where id in (select id from device where id = 1 or id = 2) and > > datatime between '2005-05-15' and '2005-05-17'; > > 10 minits > > > > Why? > > Run EXPLAIN ANALYSE on both queries to see how the plan has changed. > > My guess for why the plans are different is that in the first case your > query ends up as ...where (id=1 or id=2)... > > In the second case, the planner doesn't know what it's going to get back > from the subquery until it's executed it, so can't tell it just needs to > scan base_1,base_2. Result: you'll scan all child tables of base. > > I think the planner will occasionally evaluate constants before > planning, but I don't think it will ever execute a subquery and then > re-plan the outer query based on those results. Of course, someone might > pop up and tell me I'm wrong now... Thats right. Partitioning doesn't work for joins in 8.1. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Lots of postmaster processes (fwd)
Resending it here as it may be more relevant here... Ameet -- Forwarded message -- Date: Tue, 13 Dec 2005 11:24:26 -0600 (CST) From: Ameet Kini <[EMAIL PROTECTED]> To: pgsql-admin@postgresql.org Subject: Lots of postmaster processes In our installation of the postgres 7.4.7, we are seeing a lot of the following postmaster processes (around 50) being spawned by the initial postmaster process once in a while: postgres 3977 1 1 Nov03 ?15:11:38 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7 .. postgres 31985 3977 0 10:08 ?00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7 postgres 31986 3977 0 10:08 ?00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7 postgres 31987 3977 0 10:08 ?00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7 postgres 31988 3977 0 10:08 ?00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7 .. At the same time when these processes being spawned, sometimes there is also the checkpoint subprocess. I am not sure if that is related. The document doesn't provide any information. The other activity going on at the same time is a 'COPY' statement from a client application. These extra processes put a considerable load on the machine and cause it to hang up. Thanks, Ameet ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query from partitions
* Richard Huxton [2005-12-13 15:59:11 +]: > Ключников А.С. wrote: > >And > >select * from base > > where id in (1,2) and datatime between '2005-05-15' and '2005-05-17'; > >10 seconds > > > >select * from base > > where id in (select id from device where id = 1 or id = 2) and > > datatime between '2005-05-15' and '2005-05-17'; > >10 minits > > > >Why? > > Run EXPLAIN ANALYSE on both queries to see how the plan has changed. explain select distinct on(id) * from base where id in (1,2) and data_type=2 and datatime < '2005-11-02' order by id, datatime desc; Unique (cost=10461.14..10527.30 rows=2342 width=38) -> Sort (cost=10461.14..10494.22 rows=13232 width=38) Sort Key: public.base.id, public.base.datatime -> Result (cost=0.00..9555.29 rows=13232 width=38) -> Append (cost=0.00..9555.29 rows=13232 width=38) -> Seq Scan on base (cost=0.00..32.60 rows=1 width=38) Filter: (((id = 1) OR (id = 2)) AND (data_type = 2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone)) -> Seq Scan on base_batch base (cost=0.00..32.60 rows=1 width=38) ... -> Seq Scan on base_1_2004 base (cost=0.00..32.60 rows=1 width=38) Filter: (((id = 1) OR (id = 2)) AND (data_type = 2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone)) (записей: 34) and explain select distinct on(id) * from base where id in (select id from device where id = 1 or id = 2) and data_type=2 and datatime < '2005-11-02' order by id, datatime desc; Unique (cost=369861.89..369872.52 rows=2126 width=38) -> Sort (cost=369861.89..369867.21 rows=2126 width=38) Sort Key: public.base.id, public.base.datatime -> Hash IN Join (cost=5.88..369744.39 rows=2126 width=38) Hash Cond: ("outer".id = "inner".id) -> Append (cost=0.00..368654.47 rows=212554 width=38) -> Seq Scan on base (cost=0.00..26.95 rows=2 width=38) Filter: ((data_type = 2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone)) -> Seq Scan on base_batch base (cost=0.00..26.95 rows=2 width=38) Filter: ((data_type = 2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone)) -> Seq Scan on base_lines_05_12 base (cost=0.00..26.95 rows=2 width=38) -> Hash (cost=5.88..5.88 rows=2 width=4) -> Seq Scan on device (cost=0.00..5.88 rows=2 width=4) Filter: ((id = 1) OR (id = 2)) (записей: 851) > > My guess for why the plans are different is that in the first case your > query ends up as ...where (id=1 or id=2)... > > In the second case, the planner doesn't know what it's going to get back > from the subquery until it's executed it, so can't tell it just needs to > scan base_1,base_2. Result: you'll scan all child tables of base. > > I think the planner will occasionally evaluate constants before > planning, but I don't think it will ever execute a subquery and then > re-plan the outer query based on those results. Of course, someone might > pop up and tell me I'm wrong now... > > -- > Richard Huxton > Archonet Ltd > > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- С уважением, Ключников А.С. Ведущий инженер ПРП "Аналитприбор" 432030 г.Ульяновск, а/я 3117 тел./факс +7 (8422) 43-44-78 mailto: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query from partitions
On Tue, Dec 13, 2005 at 06:18:19PM +0300, Ключников А.С. wrote: > select * from base > where id in (select id from device where id = 1 or id = 2) and > datatime between '2005-05-15' and '2005-05-17'; > 10 minits That's a really odd way of saying "1 or 2". It probably has to go through all the records in device, not realizing it can just scan for two of them (using two index scans). I'd guess an EXPLAIN ANALYZE would confirm something like this happening (you'd want to run that and post the results here anyhow). /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Small table or partial index?
Jim C. Nasby writes: On Fri, Dec 02, 2005 at 06:28:09PM -0500, Francisco Reyes wrote: I am in the process of designing a new system. There will be a long list of words such as -word table word_id integer word varchar special boolean Some "special" words are used to determine if some work is to be done and will be what we care the most for one type of operation. Tough call. The key here is the amount of time required to do a join. It also depends on if you need all the special words or not. Your best bet is to try and benchmark both ways. In your opinion do you think performance will be comparable? I am hoping I will have time to test, but not sure if will have time and the tables will be pretty large. :-( ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] query from partitions
Ключников А.С. wrote: And select * from base where id in (1,2) and datatime between '2005-05-15' and '2005-05-17'; 10 seconds select * from base where id in (select id from device where id = 1 or id = 2) and datatime between '2005-05-15' and '2005-05-17'; 10 minits Why? Run EXPLAIN ANALYSE on both queries to see how the plan has changed. My guess for why the plans are different is that in the first case your query ends up as ...where (id=1 or id=2)... In the second case, the planner doesn't know what it's going to get back from the subquery until it's executed it, so can't tell it just needs to scan base_1,base_2. Result: you'll scan all child tables of base. I think the planner will occasionally evaluate constants before planning, but I don't think it will ever execute a subquery and then re-plan the outer query based on those results. Of course, someone might pop up and tell me I'm wrong now... -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] query from partitions
Hi. create table device(id int); insert into device values(1); insert into device values(2); . insert into device values(250); create table base ( id int, data float, datatime timestamp, mode int, status int); create table base_1 ( check ( id = 1 and datatime >= DATE '2005-01-01' and datatime < DATE '2006-01-01' ) ) INHERITS (base); create table base_2 ( check ( id = 2 and datatime >= DATE '2005-01-01' and datatime < DATE '2006-01-01' ) ) INHERITS (base); create table base_250 And select * from base where id in (1,2) and datatime between '2005-05-15' and '2005-05-17'; 10 seconds select * from base where id in (select id from device where id = 1 or id = 2) and datatime between '2005-05-15' and '2005-05-17'; 10 minits Why? -- mailto: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] partitioning
Yes, that's how I solved it... and I totally agree that it's hard for the planner to guess what to do on the partitions. But maybe there should be something in the docs explaining the limitations ... I'm only asking for the biggest 100 ids from the table, so I thought maybe the planner would take the 100 biggest from all partitions or something like that and return me the 100 biggest from those results. It didn't and that's quite logical. What I meant is that I understand why the planner chooses this plan, but maybe it should be written somewhere in the docs that some plans will be worse after partitionning. Le Mardi 13 Décembre 2005 12:50, vous avez écrit : > I just saw that there is no where clause in the query, that you had > fed to explain plan. > you need to include a where clause based on id_machine column to see the > effect. > > On 12/13/05, Pandurangan R S <[EMAIL PROTECTED]> wrote: > > Did you set constraint_exclusion = true in postgresql.conf file? > > > > On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > > > I've been working on trying to partition a big table (I've never > > > partitioned a table in any other database till now). > > > Everything went ok, except one query that didn't work afterwards. > > > > > > I've put the partition description, indexes, etc ..., and the explain > > > plan attached. > > > > > > The query is extremely fast without partition (index scan backards on > > > the primary key) > > > > > > The query is : "select * from logs order by id desc limit 100;" > > > id is the primary key. > > > > > > It is indexed on all partitions. > > > > > > But the explain plan does full table scan on all partitions. > > > > > > While I think I understand why it is doing this plan right now, is > > > there something that could be done to optimize this case ? Or put a > > > warning in the docs about this kind of behaviour. I guess normally > > > someone would partition to get faster queries :) > > > > > > Anyway, I thought I should mention this, as it has been quite a > > > surprise. > > > > > > > > > > > > ---(end of > > > broadcast)--- TIP 1: if posting/reading through > > > Usenet, please send an appropriate subscribe-nomail command to > > > [EMAIL PROTECTED] so that your message can get through to the > > > mailing list cleanly > > > > -- > > Regards > > Pandu ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] partitioning
I just saw that there is no where clause in the query, that you had fed to explain plan. you need to include a where clause based on id_machine column to see the effect. On 12/13/05, Pandurangan R S <[EMAIL PROTECTED]> wrote: > Did you set constraint_exclusion = true in postgresql.conf file? > > On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I've been working on trying to partition a big table (I've never > > partitioned a > > table in any other database till now). > > Everything went ok, except one query that didn't work afterwards. > > > > I've put the partition description, indexes, etc ..., and the explain plan > > attached. > > > > The query is extremely fast without partition (index scan backards on the > > primary key) > > > > The query is : "select * from logs order by id desc limit 100;" > > id is the primary key. > > > > It is indexed on all partitions. > > > > But the explain plan does full table scan on all partitions. > > > > While I think I understand why it is doing this plan right now, is there > > something that could be done to optimize this case ? Or put a warning in the > > docs about this kind of behaviour. I guess normally someone would partition > > to get faster queries :) > > > > Anyway, I thought I should mention this, as it has been quite a surprise. > > > > > > > > ---(end of broadcast)--- > > TIP 1: if posting/reading through Usenet, please send an appropriate > >subscribe-nomail command to [EMAIL PROTECTED] so that your > >message can get through to the mailing list cleanly > > > > > > > > > > > -- > Regards > Pandu > ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] partitioning
Did you set constraint_exclusion = true in postgresql.conf file? On 12/13/05, Marc Cousin <[EMAIL PROTECTED]> wrote: > Hi, > > I've been working on trying to partition a big table (I've never partitioned a > table in any other database till now). > Everything went ok, except one query that didn't work afterwards. > > I've put the partition description, indexes, etc ..., and the explain plan > attached. > > The query is extremely fast without partition (index scan backards on the > primary key) > > The query is : "select * from logs order by id desc limit 100;" > id is the primary key. > > It is indexed on all partitions. > > But the explain plan does full table scan on all partitions. > > While I think I understand why it is doing this plan right now, is there > something that could be done to optimize this case ? Or put a warning in the > docs about this kind of behaviour. I guess normally someone would partition > to get faster queries :) > > Anyway, I thought I should mention this, as it has been quite a surprise. > > > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > > > > -- Regards Pandu ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] partitioning
Hi, I've been working on trying to partition a big table (I've never partitioned a table in any other database till now). Everything went ok, except one query that didn't work afterwards. I've put the partition description, indexes, etc ..., and the explain plan attached. The query is extremely fast without partition (index scan backards on the primary key) The query is : "select * from logs order by id desc limit 100;" id is the primary key. It is indexed on all partitions. But the explain plan does full table scan on all partitions. While I think I understand why it is doing this plan right now, is there something that could be done to optimize this case ? Or put a warning in the docs about this kind of behaviour. I guess normally someone would partition to get faster queries :) Anyway, I thought I should mention this, as it has been quite a surprise. CREATE TABLE logs_150 (CHECK ( id_machine = 150)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_163 (CHECK ( id_machine = 163)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_289 (CHECK ( id_machine = 289)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_319 (CHECK ( id_machine = 319)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_238 (CHECK ( id_machine = 238)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_148 (CHECK ( id_machine = 148)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_176 (CHECK ( id_machine = 176)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_164 (CHECK ( id_machine = 164)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_316 (CHECK ( id_machine = 316)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_313 (CHECK ( id_machine = 313)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_217 (CHECK ( id_machine = 217)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_167 (CHECK ( id_machine = 167)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_287 (CHECK ( id_machine = 287)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_279 (CHECK ( id_machine = 279)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_248 (CHECK ( id_machine = 248)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_280 (CHECK ( id_machine = 280)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_165 (CHECK ( id_machine = 165)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_330 (CHECK ( id_machine = 330)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_149 (CHECK ( id_machine = 149)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_157 (CHECK ( id_machine = 157)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_272 (CHECK ( id_machine = 272)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_311 (CHECK ( id_machine = 311)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_309 (CHECK ( id_machine = 309)) INHERITS (logs) TABLESPACE data_logs; CREATE TABLE logs_318 (CHECK ( id_machine = 318)) INHERITS (logs) TABLESPACE data_logs; CREATE UNIQUE INDEX logs_150_pkey ON logs_150 (id) TABLESPACE index_logs;CREATE INDEX date_150 ON logs_150 (date) TABLESPACE index_logs ;CREATE INDEX event_150 ON logs_150 (evenement) TABLESPACE index_logs; . logs=> explain select * from logs order by id desc limit 100; QUERY PLAN - Limit (cost=16524647.29..16524647.54 rows=100 width=295) -> Sort (cost=16524647.29..16568367.11 rows=17487927 width=295) Sort Key: public.logs.id -> Result (cost=0.00..827622.27 rows=17487927 width=295) -> Append (cost=0.00..827622.27 rows=17487927 width=295) -> Seq Scan on logs (cost=0.00..826232.78 rows=17451978 width=165) -> Seq Scan on logs_150 logs (cost=0.00..199.04 rows=6104 width=144) -> Seq Scan on logs_163 logs (cost=0.00..261.79 rows=7079 width=169) -> Seq Scan on logs_289 logs (cost=0.00..428.93 rows=10693 width=200) -> Seq Scan on logs_319 logs (cost=0.00..31.92 rows=992 width=129) -> Seq Scan on logs_238 logs (cost=0.00..28.01 rows=701 width=199) -> Seq Scan on logs_148 logs (cost=0.00..80.15 rows=2015 width=195) -> Seq Scan on logs_176 logs (cost=0.00..12.40 rows=240 width=295) -> Seq Scan on logs_164 logs (cost=0.00..12.40 rows=240 width=295) -> Seq Scan on logs_316 logs (cost=0.00..12.40 rows=240 width=295) -> Seq Scan on logs_313 logs (cost=0.00..12.40 rows=240 width=295) -> Seq Scan on logs_217 logs (cost=0.00..12.40 rows=240 width=295) -> Seq Scan on logs_167 logs (cost=0.00..57.36 rows=1536 width=170) -> Seq Scan on logs_287 logs (cost=